Imputación múltiple y validación bootstrap en modelos pronósticos (Multiple imputation and bootstrap validation in prognostic models)
Publication date
2022
Abstract
In the biomedical field, prognostic models are commonly used to predict the probability that a patient has a certain condition. Internal validation is necessary to estimate their predictive performance in new individuals, and it can be carried out using bootstrap resampling. Classical statistical techniques require missing values to be handled before analysis, and multiple imputation is a popular way to do so: (I) the missing values are imputed several times, (II) the statistical analysis is performed on each of the resulting complete samples, and (III) the estimates obtained for the parameter of interest are pooled. In the framework of bootstrap internal validation, how multiple imputation should be integrated into the resampling process is currently under study. In this work, a simulation study is performed to evaluate different strategies when values are missing in both the predictors and the outcome of a logistic model. In the MI-BS strategy, multiple imputation is applied first and bootstrap resampling is performed on each of the imputed samples. In the BS-MI strategy, resampling is performed first and multiple imputation is applied to each of the bootstrap samples. BS-MI provides less biased performance estimators in most of the scenarios studied. Differences between the two strategies appear when the number of events per variable (EPV) is low, and they fade as it increases.
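The difference between the two strategies is only the order in which imputation and resampling are nested, which a short Python sketch can make concrete. This is a minimal illustration and not the code used in the study. It makes several assumptions that the abstract does not state: performance is corrected with an optimism-style bootstrap (apparent performance minus average optimism), the metric is the Brier score, and the BS-MI variant shown evaluates each bootstrap model on imputed copies of the original sample. The names `make_toy_data`, `hot_deck`, `fit_logistic`, `brier`, `mi_bs` and `bs_mi` are hypothetical, and the hot-deck imputer is a crude stand-in for a proper multiple imputation method.

```python
import math
import random
from statistics import mean

def make_toy_data(n, seed):
    """Hypothetical data: one predictor x, binary outcome y; None marks a missing value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1 if rng.random() < 1 / (1 + math.exp(-x)) else 0
        rows.append((None if rng.random() < 0.2 else x,
                     None if rng.random() < 0.1 else y))
    return rows

def hot_deck(rows, rng):
    """Crude stochastic imputation (assumption): fill each gap with a random observed value."""
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows if y is not None]
    return [(rng.choice(xs) if x is None else x,
             rng.choice(ys) if y is None else y) for x, y in rows]

def fit_logistic(rows, steps=100, lr=0.5):
    """One-predictor logistic regression fitted by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in rows:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def brier(model, rows):
    """Brier score: mean squared difference between predicted probability and outcome."""
    b0, b1 = model
    return mean((1 / (1 + math.exp(-(b0 + b1 * x))) - y) ** 2 for x, y in rows)

def resample(rows, rng):
    """Bootstrap sample: len(rows) draws with replacement."""
    return [rng.choice(rows) for _ in rows]

def optimism(train, test, fit, score):
    """Score of the model on its own training data minus its score on the test data."""
    model = fit(train)
    return score(model, train) - score(model, test)

def mi_bs(rows, impute, fit, score, M=5, B=50, seed=0):
    """MI-BS: impute first, then bootstrap within each completed sample."""
    rng = random.Random(seed)
    corrected = []
    for _ in range(M):
        completed = impute(rows, rng)
        apparent = score(fit(completed), completed)
        opt = mean(optimism(resample(completed, rng), completed, fit, score)
                   for _ in range(B))
        corrected.append(apparent - opt)
    return mean(corrected)  # pool over the M imputations

def bs_mi(rows, impute, fit, score, M=5, B=50, seed=0):
    """BS-MI: bootstrap the incomplete data first, then impute each resample M times."""
    rng = random.Random(seed)
    originals = [impute(rows, rng) for _ in range(M)]  # imputed originals used as test sets
    apparent = mean(score(fit(d), d) for d in originals)
    opts = []
    for _ in range(B):
        boot = resample(rows, rng)  # this resample still contains missing values
        opts.append(mean(optimism(impute(boot, rng), orig, fit, score)
                         for orig in originals))
    return apparent - mean(opts)

if __name__ == "__main__":
    data = make_toy_data(80, seed=1)
    print("MI-BS corrected Brier:", mi_bs(data, hot_deck, fit_logistic, brier, M=3, B=10))
    print("BS-MI corrected Brier:", bs_mi(data, hot_deck, fit_logistic, brier, M=3, B=10))
```

In `mi_bs` the imputation model is fitted once per completed sample, so every bootstrap resample reuses values imputed from the full data. In `bs_mi` the imputation is repeated inside every resample, so the variability of the imputation step is part of what the bootstrap replays. A toy example of this size is not expected to reproduce the bias pattern reported in the abstract.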