Automatic regrouping of strata in the goodness-of-fit chi-square test

dc.contributor.author: Núñez-Antón, Vicente
dc.contributor.author: Pérez Salamero González, Juan Manuel
dc.contributor.author: Regúlez Castillo, Marta
dc.contributor.author: Ventura-Marco, Manuel
dc.contributor.author: Vidal-Meliá, Carlos
dc.date.accessioned: 2023-06-15T07:49:10Z
dc.date.available: 2023-06-15T07:49:10Z
dc.date.issued: 2019
dc.description.abstract: Pearson’s chi-square test is widely employed in social and health sciences to analyse categorical data and contingency tables. For the test to be valid, the sample size must be large enough to provide a minimum number of expected elements per category. This paper develops functions for regrouping strata automatically, thus enabling the goodness-of-fit test to be performed within an iterative procedure. The usefulness and performance of these functions are illustrated by means of a simulation study and the application to different datasets. Finally, the iterative use of the functions is applied to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labour economics and the Spanish public pension system.
dc.description.faculty: Fac. de Ciencias Económicas y Empresariales
dc.description.faculty: Instituto Complutense de Análisis Económico (ICAE)
dc.description.refereed: TRUE
dc.description.sponsorship: Ministerio de Economía y Competitividad (MINECO)/FEDER
dc.description.sponsorship: Universidad del País Vasco
dc.description.status: pub
dc.eprint.id: https://eprints.ucm.es/id/eprint/55555
dc.identifier.doi: 10.2436/20.8080.02.83
dc.identifier.issn: 2013-8830
dc.identifier.officialurl: http://dx.doi.org/10.2436/20.8080.02.83
dc.identifier.uri: https://hdl.handle.net/20.500.14352/140.1
dc.issue.number: 1
dc.journal.title: SORT
dc.language.iso: eng
dc.page.final: 30
dc.page.initial: 1
dc.page.total: 25
dc.publisher: Institut d'Estadística de Catalunya (Idescat)
dc.relation.ispartofseries: Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE)
dc.relation.projectID: (ECO2015-65826-P; MTM2016-74931-P)
dc.relation.projectID: (IT 793-13 and IT-642-13)
dc.rights.accessRights: open access
dc.subject.keyword: Chi-square test
dc.subject.keyword: statistical software
dc.subject.keyword: VBA
dc.subject.keyword: Mathematica
dc.subject.keyword: Continuous Sample of Working Lives
dc.subject.ucm: Econometría (Economía)
dc.subject.unesco: 5302 Econometría
dc.title: Automatic regrouping of strata in the goodness-of-fit chi-square test
dc.type: journal article
dc.volume.number: 43
dspace.entity.type: Publication
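
The abstract describes regrouping strata so that every category reaches a minimum expected count before the goodness-of-fit test is run. The following is a minimal Python sketch of that idea under stated assumptions: the strata are ordered and only adjacent ones are merged, the threshold of 5 is the common rule of thumb rather than a value taken from the paper, and the function names `regroup_strata` and `pearson_chi_square` are hypothetical. It is not the authors' VBA or Mathematica implementation.

```python
from typing import List, Sequence, Tuple


def regroup_strata(
    observed: Sequence[float],
    expected: Sequence[float],
    min_expected: float = 5.0,  # assumed threshold (rule of thumb), not from the paper
) -> Tuple[List[float], List[float]]:
    """Merge adjacent strata until each group's expected count >= min_expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    obs_out: List[float] = []
    exp_out: List[float] = []
    obs_acc = 0.0
    exp_acc = 0.0
    for o, e in zip(observed, expected):
        # Accumulate strata until the running expected count is large enough.
        obs_acc += o
        exp_acc += e
        if exp_acc >= min_expected:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
            obs_acc = 0.0
            exp_acc = 0.0

    # A leftover tail that is still too small is folded into the last group.
    if exp_acc > 0 or obs_acc > 0:
        if exp_out:
            obs_out[-1] += obs_acc
            exp_out[-1] += exp_acc
        else:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
    return obs_out, exp_out


def pearson_chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic: sum over groups of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Illustrative counts only; they are not data from the paper.
    obs = [1, 3, 10, 12, 2, 2]
    exp = [2, 3, 10, 10, 3, 2]
    g_obs, g_exp = regroup_strata(obs, exp)
    stat = pearson_chi_square(g_obs, g_exp)
    dof = len(g_obs) - 1  # no parameters estimated from the data in this sketch
    print(g_obs, g_exp, stat, dof)
```

Merging only neighbouring strata keeps the groups interpretable when the categories have a natural order (for example, age bands). Other merging rules are possible and may suit unordered categories better.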

Original bundle

Name: 43.1.6.nunez-etal.pdf
Size: 519.81 KB
Format: Adobe Portable Document Format
