Imputación múltiple y validación bootstrap en modelos pronósticos

dc.contributor.advisor: Fernández Félix, Borja Manuel
dc.contributor.advisor: López Herrero, María Jesús
dc.contributor.author: Peressini Álvarez, Melina
dc.date.accessioned: 2023-06-22T21:21:57Z
dc.date.available: 2023-06-22T21:21:57Z
dc.date.issued: 2022-09-21
dc.description.abstract: En el ámbito biomédico, los modelos pronósticos se emplean habitualmente para predecir la probabilidad de que un paciente presente una determinada condición. Su validación interna es necesaria para estimar su rendimiento predictivo en nuevos individuos, y puede llevarse a cabo empleando la técnica de remuestreo bootstrap. Ante la presencia de valores perdidos, las técnicas estadísticas clásicas requieren su tratamiento previo, que puede abordarse mediante imputación múltiple: (I) los valores perdidos se imputan múltiples veces, (II) el análisis estadístico se realiza en cada una de las muestras completas resultantes y (III) las estimaciones obtenidas para el parámetro de interés se combinan. En el marco de la validación interna bootstrap, la forma en que la imputación múltiple debe integrarse en el proceso de remuestreo se encuentra actualmente en estudio. En el presente trabajo, se realiza un estudio de simulación para evaluar diferentes estrategias cuando se tienen valores perdidos tanto en los predictores como en la variable de interés de un modelo logístico. En la estrategia MI-BS, se aplica en primer lugar la imputación múltiple y el remuestreo se realiza sobre cada una de las muestras imputadas. En la estrategia BS-MI, se realiza en primer lugar el remuestreo y la imputación múltiple se aplica sobre cada una de las muestras bootstrap. La estrategia BS-MI proporciona estimadores de rendimiento de menor sesgo en la mayoría de los escenarios estudiados. Las diferencias entre estrategias se encuentran cuando el número de eventos por variable (EPV) es reducido y se desdibujan conforme este aumenta.
dc.description.abstract: In the biomedical field, prognostic models are commonly used to predict the probability that a patient has a certain condition. Internal validation is necessary to estimate their predictive performance in new individuals, and can be carried out using bootstrap resampling. Classical statistical techniques require missing data to be handled before the analysis, and multiple imputation is a popular way to do so: (I) missing values are imputed several times, (II) the statistical analysis is performed on each of the resulting complete samples, and (III) the estimates obtained for the parameter of interest are pooled. Within bootstrap internal validation, how multiple imputation should be combined with resampling is currently under study. In this work, a simulation study is performed to evaluate different strategies when values are missing in both the covariates and the outcome of a logistic model. In the MI-BS strategy, multiple imputation is applied first and bootstrap resampling is performed on each of the imputed samples. In the BS-MI strategy, resampling is performed first and multiple imputation is applied to each of the bootstrap samples. BS-MI provides less biased performance estimators in most of the scenarios studied. Differences between the two strategies appear when the number of events per variable (EPV) is low, and become smaller as it increases.
dc.description.department: Depto. de Estadística y Ciencia de los Datos
dc.description.faculty: Fac. de Estudios Estadísticos
dc.description.refereed: TRUE
dc.description.status: unpub
dc.eprint.id: https://eprints.ucm.es/id/eprint/74868
dc.identifier.uri: https://hdl.handle.net/20.500.14352/73956
dc.language.iso: spa
dc.rights.accessRights: open access
dc.subject.cdu: 519.23
dc.subject.keyword: modelos pronósticos
dc.subject.keyword: validación interna
dc.subject.keyword: bootstrap
dc.subject.keyword: imputación múltiple
dc.subject.keyword: MICE
dc.subject.keyword: prognostic models
dc.subject.keyword: internal validation
dc.subject.keyword: multiple imputation
dc.subject.ucm: Estadística
dc.subject.unesco: 1209 Estadística
dc.title: Imputación múltiple y validación bootstrap en modelos pronósticos
dc.type: master thesis
dspace.entity.type: Publication
relation.isAdvisorOfPublication: 64a702cc-f8f5-468f-baeb-e37e92492a68
relation.isAdvisorOfPublication.latestForDiscovery: 64a702cc-f8f5-468f-baeb-e37e92492a68
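
The two orderings described in the abstract (MI-BS and BS-MI) can be illustrated with a short sketch. This is a hedged illustration under stated assumptions, not the thesis's implementation: the record mentions MICE, whereas the sketch below uses a naive hot-deck imputation as a stand-in, measures performance with the c-statistic (AUC), and applies an optimism-corrected bootstrap. All function names (`impute_once`, `fit_logistic`, `mi_bs`, `bs_mi`), the simulated data, and the values of M and B are hypothetical, and BS-MI is shown in only one of several possible variants.

```python
import numpy as np

def impute_once(data, rng):
    """Naive hot-deck stand-in for MICE (assumption): fill each NaN
    with a random observed value drawn from the same column."""
    out = data.copy()
    for j in range(out.shape[1]):
        miss = np.isnan(out[:, j])
        if miss.any():
            out[miss, j] = rng.choice(out[~miss, j], size=miss.sum())
    return out

def split(d):
    # Convention of this sketch: last column is the binary outcome.
    return d[:, :-1], d[:, -1]

def fit_logistic(X, y, n_iter=25, ridge=1e-3):
    """Logistic regression by Newton steps, with a tiny ridge for stability."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))
        grad = Xd.T @ (y - p) - ridge * beta
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def auc(beta, X, y):
    """c-statistic: probability that an event scores higher than a non-event."""
    score = np.column_stack([np.ones(len(X)), X]) @ beta
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def apparent(d):
    X, y = split(d)
    return auc(fit_logistic(X, y), X, y)

def optimism(train, test):
    """Performance on the training sample minus performance on the test sample."""
    Xtr, ytr = split(train)
    Xte, yte = split(test)
    beta = fit_logistic(Xtr, ytr)
    return auc(beta, Xtr, ytr) - auc(beta, Xte, yte)

def mi_bs(data, M, B, rng):
    """MI-BS: impute first, then bootstrap each imputed sample."""
    corrected = []
    n = len(data)
    for _ in range(M):
        comp = impute_once(data, rng)
        opt = np.mean([optimism(comp[rng.integers(0, n, n)], comp)
                       for _ in range(B)])
        corrected.append(apparent(comp) - opt)
    return float(np.mean(corrected))  # pool by averaging over imputations

def bs_mi(data, M, B, rng):
    """BS-MI: bootstrap the incomplete data first, then impute each bootstrap sample."""
    originals = [impute_once(data, rng) for _ in range(M)]
    app = np.mean([apparent(c) for c in originals])
    n = len(data)
    opt = []
    for _ in range(B):
        boot = data[rng.integers(0, n, n)]  # resample rows, NaNs included
        opt.append(np.mean([optimism(impute_once(boot, rng), c)
                            for c in originals]))
    return float(app - np.mean(opt))

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
lin = X @ np.array([1.0, -0.5, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-lin))).astype(float)
data = np.column_stack([X, y])
data[rng.random(data.shape) < 0.1] = np.nan  # ~10% missing in predictors and outcome

res_mi_bs = mi_bs(data, M=5, B=20, rng=rng)
res_bs_mi = bs_mi(data, M=5, B=20, rng=rng)
print(res_mi_bs, res_bs_mi)
```

The only structural difference between `mi_bs` and `bs_mi` is where `impute_once` sits relative to the row resampling: in MI-BS the bootstrap draws rows from an already completed sample, whereas in BS-MI each bootstrap sample still contains missing values and is imputed on its own. A single run like this says nothing about bias; comparing the strategies requires repeated simulation against a known true performance, as the abstract describes.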

Download

Original bundle

Name: Peressini_Álvarez_TFM_MBE_eprint.pdf
Size: 2.72 MB
Format: Adobe Portable Document Format