Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives

Núñez Antón, Vicente; Pérez Salamero González, Juan Manuel; Regúlez Castillo, Marta; Vidal-Meliá, Carlos

Publication date

2019

Publisher

Facultad de Ciencias Económicas y Empresariales. Instituto Complutense de Análisis Económico (ICAE)

URI

https://hdl.handle.net/20.500.14352/17484

Abstract

This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously drawn from a population larger than the population of interest. The problem is formulated as a convex mixed-integer nonlinear program (convex MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) while requiring a p-value high enough to achieve a good fit to the population of interest under Pearson’s chi-square goodness-of-fit test. The model gives the user the freedom to choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model, and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied, and the results show that it is possible to obtain a large dataset from the CSWL that represents the pensioner population far better for each of the waves analysed.
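The trade-off described in the abstract (a larger subsample with a poorer fit versus a smaller one with a better fit) can be illustrated with a small sketch. This is not the authors' convex MINLP formulation: it is a simple top-down scan, assumed here only for illustration, that allocates each target size proportionally to the population shares, caps each stratum at the number of records available in the sample, and keeps the first allocation whose Pearson chi-square p-value reaches a chosen threshold. The function names, the strata counts, and the threshold are all hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with x = stat / 2.
    """
    half = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x) = exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x) = erfc(sqrt(x))
    while a < df / 2.0 - 1e-9:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def gof_p_value(counts, shares):
    """Pearson chi-square goodness-of-fit p-value of counts against shares."""
    m = sum(counts)
    stat = sum((o - m * p) ** 2 / (m * p) for o, p in zip(counts, shares))
    return chi2_sf(stat, len(counts) - 1)


def largest_subsample(pop_counts, avail_counts, min_p):
    """Heuristic scan (an assumption, not the paper's solver).

    Tries target sizes from largest to smallest and returns the first
    allocation (counts per stratum, p-value) with p-value >= min_p,
    or None if no allocation qualifies.
    """
    total_pop = sum(pop_counts)
    shares = [c / total_pop for c in pop_counts]
    for target in range(sum(avail_counts), 0, -1):
        # Proportional allocation, capped by what the sample actually holds.
        counts = [min(a, round(target * p)) for a, p in zip(avail_counts, shares)]
        if sum(counts) == 0:
            continue
        p_value = gof_p_value(counts, shares)
        if p_value >= min_p:
            return counts, p_value
    return None


# Hypothetical strata: target-population counts and records available in the sample.
population = [5000, 3000, 2000]
available = [600, 300, 100]
print(largest_subsample(population, available, min_p=0.9))
```

Lowering min_p in this sketch yields a subsample at least as large, at the cost of a weaker fit, which mirrors the choice the model leaves to the user.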

UCM subjects

Mathematical optimization, Public economics

Collections

Working papers and technical reports
