Evaluation of internal validation techniques in predictive models in genomics
Publication date
2025
Abstract
Modelling with transcriptomic data poses several challenges, among them the high dimensionality and the small sample size of the datasets. This work aims to compare two widely used internal validation techniques, repeated cross-validation (repcv) and bootstrap optimism correction (optboot), by evaluating how well they estimate the performance of a Lasso-penalized logistic regression model in this setting. A simulation study with synthetic data is carried out to look for differences in hyperparameter tuning, predictor selection and model performance. The results indicate that repcv yields simpler models with better true performance and, in addition, less bias in the performance estimate. It can be concluded that, unlike previous results in low-dimensional settings, repcv provides more reliable estimates in p ≫ n scenarios. This work aims to show the importance of tailoring validation techniques to the scenario under study.
Modelling using transcriptomic data poses various challenges, such as high dimensionality and small sample sizes. The goal of this project is to compare two widely known validation techniques, repeated k-fold cross-validation (repcv) and optimism-corrected bootstrap (optboot), in order to evaluate their performance in estimating the predictive ability of a logistic regression model penalized by Lasso. A simulation study is conducted using synthetic data to search for differences in hyperparameter tuning, predictor selection and model performance. Results show that repcv yields simpler models with better true performance and less biased performance estimates. We conclude that, contrary to previous findings in low-dimensional settings, repcv provides more reliable estimates in p ≫ n scenarios. With this project, we aim to highlight the importance of adapting the validation techniques to the specific characteristics of the problem at hand.
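The two validation schemes compared in the abstract can be sketched in plain Python. This is a minimal, non-authoritative illustration, not the study's code. It rests on several assumptions: the Lasso logistic model is fitted by a hypothetical proximal-gradient routine (`fit_lasso_logistic`), performance is measured as AUC, the cross-validation folds are stratified, the penalty λ (`lam`) is held fixed rather than tuned, and `make_data` is a hypothetical p ≫ n generator in which only a few features carry signal. All function names and default values are illustrative.

```python
# Illustrative sketch only. Assumptions: AUC as the metric, stratified folds,
# fixed lam (no tuning), hypothetical synthetic-data generator.
import math
import random


def sigmoid(z):
    z = max(min(z, 35.0), -35.0)  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_lasso_logistic(X, y, lam, lr=0.25, iters=200):
    """Approximate Lasso logistic regression: ISTA on mean log-loss + lam * sum |w_j|.

    Runs a fixed number of iterations; the intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r
            for j in range(p):
                gw[j] += r * xi[j]
        b -= lr * gb / n
        for j in range(p):
            z = w[j] - lr * gw[j] / n
            # Soft-thresholding (proximal operator of the L1 penalty) zeroes weak predictors.
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b


def predict(model, X):
    w, b = model
    # Linear predictor; AUC depends only on the ranking, so no sigmoid is needed.
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def auc(scores, y):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv_auc(X, y, lam, k=5, repeats=2, seed=0):
    """repcv: mean held-out AUC over `repeats` stratified k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):  # deal each class round-robin so every fold has both classes
            idx = [i for i, t in enumerate(y) if t == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            a = auc(predict(model, [X[i] for i in test]), [y[i] for i in test])
            if a is not None:
                scores.append(a)
    return sum(scores) / len(scores)


def optimism_corrected_auc(X, y, lam, B=10, seed=0):
    """optboot: apparent AUC minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit_lasso_logistic(X, y, lam), X), y)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        model = fit_lasso_logistic(Xb, yb, lam)  # refit the model on the bootstrap resample
        boot_apparent = auc(predict(model, Xb), yb)
        if boot_apparent is not None:
            # Optimism = AUC on the resample minus AUC of the same model on the original data.
            optimism.append(boot_apparent - auc(predict(model, X), y))
    return apparent - sum(optimism) / len(optimism)


def make_data(n=30, p=60, n_signal=5, shift=1.0, seed=1):
    """Hypothetical p >> n generator: only the first n_signal features differ by class."""
    rng = random.Random(seed)
    y = [1 if i < n // 2 else 0 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if (j < n_signal and yi == 1) else 0.0)
          for j in range(p)] for yi in y]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    print("repcv AUC:  ", round(repeated_cv_auc(X, y, lam=0.1), 3))
    print("optboot AUC:", round(optimism_corrected_auc(X, y, lam=0.1), 3))
```

Running the script prints both estimates for one simulated dataset. Tuning λ, which the study also compares, is left out to keep the sketch short; the pure-Python solver is chosen only so the example has no dependencies, not for speed.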