Beyond large language models: rediscovering the role of classical statistics in modern data science

dc.contributor.author: Gutiérrez García-Pardo, Inmaculada
dc.contributor.author: Gómez González, Daniel
dc.contributor.author: Castro Cantalejo, Javier
dc.contributor.author: Bruce Bimber
dc.contributor.author: Julien Labarre
dc.date.accessioned: 2026-01-12T14:49:18Z
dc.date.available: 2026-01-12T14:49:18Z
dc.date.issued: 2024
dc.description.abstract: This study explores the synergy between large language models and classical statistics in contemporary data science. In the field of large language models, there is no one-size-fits-all model that satisfies the needs of other scientists. The models differ in their soft results, which may limit their application. To analyze these differences and the resulting lack of robustness, we propose a methodology that integrates classical principles of statistical experimental design with these advanced models, aiming to identify statistically significant differences among their outcomes. In particular, we present an experimental design that identifies the main factors, levels, treatments and interactions that influence the predictions made by different complex natural language processing models. The main aim of this research is to better understand the influence of certain controlled factors used in complex natural language processing models by applying classical statistical techniques, providing a comprehensive perspective on the relative effectiveness of different zero-shot classification models. It aims to offer practitioners insight into when and where certain models may be more or less sensitive, facilitating informed decision-making when applying these advanced language models. Additionally, computational results obtained from a pilot dataset are presented. These results illustrate the entire process of the proposed methodology, highlighting the importance of considering statistical evidence when making decisions.
dc.description.department: Depto. de Estadística y Ciencia de los Datos
dc.description.faculty: Fac. de Estudios Estadísticos
dc.description.refereed: TRUE
dc.description.status: pub
dc.identifier.citation: I. Gutiérrez, D. Gómez, J. Castro, B. Bimber and J. Labarre, "Beyond Large Language Models: Rediscovering the Role of Classical Statistics in Modern Data Science," 2024 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/FUZZ-IEEE60900.2024.10611766.
dc.identifier.doi: 10.1109/FUZZ-IEEE60900.2024.10611766
dc.identifier.issn: 1558-4739
dc.identifier.officialurl: https://ieeexplore.ieee.org/document/10611766
dc.identifier.uri: https://hdl.handle.net/20.500.14352/129910
dc.journal.title: IEEE International Conference on Fuzzy Systems
dc.language.iso: eng
dc.publisher: IEEE
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [en]
dc.rights.accessRights: restricted access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.cdu: 51
dc.subject.cdu: 311
dc.subject.cdu: 004
dc.subject.cdu: 519.22-7
dc.subject.ucm: Ciencias
dc.subject.ucm: Matemáticas (Matemáticas)
dc.subject.ucm: Estadística aplicada
dc.subject.unesco: 12 Matemáticas
dc.subject.unesco: 1209 Estadística
dc.subject.unesco: 1203.17 Informática
dc.title: Beyond large language models: rediscovering the role of classical statistics in modern data science
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 2f4cd183-2dd2-4b4e-8561-9086ff5c0b90
relation.isAuthorOfPublication: 4dcf8c54-8545-4232-8acf-c163330fd0fe
relation.isAuthorOfPublication: e556dae6-6552-4157-b98a-904f3f7c9101
relation.isAuthorOfPublication.latestForDiscovery: 2f4cd183-2dd2-4b4e-8561-9086ff5c0b90
