Publication: Classification of COVID19 Patients Using Robust Logistic Regression
Coronavirus disease 2019 (COVID-19) has triggered a global pandemic affecting millions of people. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, is hypothesized to enter the human body via the airway epithelium, where it initiates a host response. The expression levels of upper-airway genes that interact with SARS-CoV-2 could be a telltale sign of infection. However, gene expression data are suspected of containing various contamination errors introduced by the techniques used to extract them, and clinical diagnoses may contain labelling errors because of the limited specificity and sensitivity of diagnostic tests. We propose fitting a regularized logistic regression model as a classifier for COVID-19 diagnosis, which simultaneously identifies genes related to the disease and predicts COVID-19 cases from the expression values of the selected genes. We apply a robust estimation method based on the density power divergence to obtain results that remain stable despite contamination or labelling errors in the data, and we compare its performance with that of the classical maximum likelihood estimator under different penalties, including the LASSO and the general adaptive LASSO penalties.
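The sketch below is an illustration of the general technique only. It is not the authors' implementation. It fits a LASSO-penalized logistic regression by minimizing a density power divergence (DPD) loss instead of the negative log-likelihood. For a tuning constant alpha > 0 and fitted probability $p_i = P(Y_i = 1 \mid x_i)$, the loss below uses the standard Bernoulli form of the DPD, averaged over observations: $p_i^{1+\alpha} + (1-p_i)^{1+\alpha} - (1 + 1/\alpha)\,[\,y_i p_i^{\alpha} + (1-y_i)(1-p_i)^{\alpha}\,]$. The following are assumptions, not taken from the paper:
- the optimizer (proximal gradient with backtracking);
- the values alpha = 0.3 and lam = 0.05;
- the adaptive weights 1 / (|b_j| + 1e-3);
- the synthetic data;
- all function names, which are hypothetical.

The paper's "general adaptive LASSO" penalty and its tuning-parameter selection are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow warnings for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def dpd_loss(b0, b, X, y, alpha):
    """Average DPD loss of the logistic (Bernoulli) model; requires alpha > 0."""
    p = sigmoid(b0 + X @ b)
    model_term = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    data_term = y * p ** alpha + (1 - y) * (1 - p) ** alpha
    return np.mean(model_term - (1 + 1 / alpha) * data_term)


def dpd_grad(b0, b, X, y, alpha):
    """Gradient of dpd_loss with respect to the intercept b0 and the slopes b."""
    p = sigmoid(b0 + X @ b)
    # w = 1 at alpha = 0 (the ordinary logistic score); for alpha > 0 it shrinks
    # toward 0 as p approaches 0 or 1, limiting the pull of confidently
    # misclassified (possibly mislabelled) observations.
    w = p ** alpha * (1 - p) + p * (1 - p) ** alpha
    r = (1 + alpha) * (p - y) * w / len(y)
    return r.sum(), X.T @ r


def soft_threshold(z, t):
    # proximal operator of t * |z|, applied elementwise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_penalized_dpd(X, y, alpha=0.3, lam=0.05, weights=None, n_iter=2000):
    """Proximal gradient with backtracking for
    dpd_loss + lam * sum_j weights[j] * |b_j|  (intercept unpenalized).
    weights=None gives a LASSO penalty; data-driven weights give an
    adaptive-LASSO-type penalty."""
    d = X.shape[1]
    weights = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    b0, b = 0.0, np.zeros(d)
    for _ in range(n_iter):
        f = dpd_loss(b0, b, X, y, alpha)
        g0, g = dpd_grad(b0, b, X, y, alpha)
        t = 1.0
        while True:
            nb0 = b0 - t * g0
            nb = soft_threshold(b - t * g, t * lam * weights)
            s0, s = nb0 - b0, nb - b
            # accept the step once the quadratic upper bound holds
            bound = f + g0 * s0 + g @ s + (s0 ** 2 + s @ s) / (2 * t)
            if dpd_loss(nb0, nb, X, y, alpha) <= bound + 1e-12 or t < 1e-10:
                break
            t *= 0.5
        b0, b = nb0, nb
    return b0, b


# Synthetic illustration only (not the COVID-19 data): 3 informative
# "genes", 17 noise features, and 10% of labels flipped to mimic
# labelling errors.
rng = np.random.default_rng(0)
n, d = 300, 20
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -2.0, 2.0]
X = rng.standard_normal((n, d))
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
flip = rng.choice(n, size=n // 10, replace=False)
y[flip] = 1 - y[flip]

b0, b = fit_penalized_dpd(X, y)  # LASSO penalty
print("LASSO-DPD selected:", np.flatnonzero(b))

# Adaptive-LASSO-style refit with hypothetical weights 1 / (|b_j| + 1e-3)
b0_ad, b_ad = fit_penalized_dpd(X, y, weights=1.0 / (np.abs(b) + 1e-3))
print("adaptive refit selected:", np.flatnonzero(b_ad))
```

Setting alpha closer to 0 drives the weights w toward 1, so the gradient approaches that of penalized maximum likelihood. Larger alpha trades some efficiency for robustness. Because the DPD loss is bounded, it is non-convex in the coefficients for alpha > 0, and the fit can depend on the starting point. Choosing lam and alpha, for example by cross-validation or an information criterion, is left out of this sketch.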
CRUE-CSIC (Transformative Agreements 2022)