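Comparative study of machine learning methods for cell type identification using spatially resolved single-cell RNA-seq (original title: Estudio comparativo de métodos de machine learning para la identificación del tipo celular mediante single-cell RNAseq con resolución espacial)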
Publication date
2026
Abstract
Spatially resolved single-cell RNA sequencing (scRNA-seq) quantifies gene expression at the single-cell level while preserving tissue architecture. In oncology, this approach is fundamental to the study of the tumor microenvironment, a highly complex ecosystem in which diverse cell types interact. Cell annotation is a crucial step in characterizing that ecosystem; however, assigning cell types presents intrinsic challenges that depend strongly on data quality. This work evaluates the robustness and concordance of two methodological approaches to cell annotation. Using real data from non-small cell lung cancer, a scenario of global data quality degradation was simulated, and an unsupervised method (Leiden) was compared with a semi-supervised method (InSituType). The results show that cell identification depends strongly on the methodology, on the hyperparameter configuration of each method, and on data quality. Under degradation the impact is asymmetric, affecting heterogeneous populations such as the stroma and the tumor. The two algorithms also respond to the loss of quality through divergent mechanisms: Leiden collapses the labeling toward general lineages, whereas InSituType tends to assign cells to pre-established profiles, which carries a potential risk of misclassification. In conclusion, running the algorithms is not enough to guarantee reliable annotation; the resulting labels must also be evaluated thoroughly afterwards to consolidate biologically representative results.
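The evaluation design summarized in the abstract (degrade the data, re-annotate, then compare the labels) can be illustrated with a small sketch. The abstract does not state how the degradation was simulated or which concordance measure was used, so both choices here are assumptions: binomial thinning of transcript counts stands in for the degradation, and the adjusted Rand index stands in for concordance. The function names and the toy labels are hypothetical, and the annotation step itself (Leiden or InSituType) is not reproduced.

```python
import random
from collections import Counter
from math import comb


def downsample_counts(counts, keep_fraction, seed=0):
    """Binomial thinning (an assumed degradation model, not the thesis's method).

    Each transcript of each gene in each cell is kept independently
    with probability keep_fraction.
    """
    rng = random.Random(seed)
    return [
        [sum(rng.random() < keep_fraction for _ in range(c)) for c in cell]
        for cell in counts
    ]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    Label names do not need to match; only the grouping of cells matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0  # fewer than two cells: nothing to disagree on
    sum_pairs = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (sum_pairs - expected) / (max_index - expected)


# Toy, made-up labels: fine-grained labels on the original data versus
# labels collapsed to a general lineage after degradation.
original_labels = ["tumor", "tumor", "fibroblast", "fibroblast", "T cell", "T cell"]
degraded_labels = ["tumor", "tumor", "stroma", "stroma", "stroma", "stroma"]

# Toy count matrix (cells x genes), thinned to roughly half its depth.
toy_counts = [[10, 0, 3], [2, 8, 1]]
thinned_counts = downsample_counts(toy_counts, keep_fraction=0.5)

concordance = adjusted_rand_index(original_labels, degraded_labels)
print(f"ARI between original and degraded labels: {concordance:.3f}")
```

In a real workflow the thinned matrix would be passed back through each annotation method, and the index would be computed per method and per degradation level. Because the adjusted Rand index is a single global score, it does not show which populations lose resolution; a per-cell-type confusion table would be needed to see the asymmetric impact the abstract describes.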













