Robust adaptive Lasso in high-dimensional logistic regression with an application to genomic classification of cancer patients

Publication date

2021

Abstract
Penalized logistic regression is extremely useful for binary classification with a large number of covariates (significantly higher than the sample size), having several real-life applications, including genomic disease classification. However, the existing methods based on the likelihood-based loss function are sensitive to data contamination and other noise and, hence, robust methods are needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models utilizing the popular density power divergence based loss function and the general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function and also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we clearly demonstrate the significantly improved performance of our proposed estimators over the existing ones, with particular gain in robustness. Our proposal is finally applied to analyse four different real datasets for cancer classification, obtaining robust and accurate models that simultaneously perform gene selection and patient classification.
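The combination described in the abstract (density power divergence loss plus an adaptively weighted L1 penalty) can be sketched in a few lines of code. The sketch below is illustrative only: the tuning values (alpha, lambda, step size, iteration count) and the simple two-stage weighting scheme are assumptions for the demonstration, not the paper's exact algorithm, and the optimizer is plain proximal gradient descent (ISTA) rather than whatever the authors implemented.

```python
import numpy as np

def dpd_logistic_grad(beta, X, y, alpha):
    """Gradient of the density power divergence (DPD) loss for a logistic
    model with pi_i = sigmoid(x_i' beta). The per-observation loss is
        pi^(1+a) + (1-pi)^(1+a) - (1 + 1/a) * f(y_i | pi)^a,
    which recovers the negative log-likelihood gradient as a -> 0."""
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    pi = np.clip(pi, 1e-10, 1.0 - 1e-10)
    u = (1.0 + alpha) * (
        (pi**alpha - (1.0 - pi)**alpha) * pi * (1.0 - pi)
        - y * pi**alpha * (1.0 - pi)
        + (1.0 - y) * (1.0 - pi)**alpha * pi
    )
    return X.T @ u / len(y)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dpd_weighted_lasso(X, y, alpha=0.3, lam=0.05, weights=None,
                       step=0.1, n_iter=500):
    """Proximal gradient descent (ISTA) for DPD loss + weighted L1 penalty."""
    n, p = X.shape
    w = np.ones(p) if weights is None else weights
    beta = np.zeros(p)
    for _ in range(n_iter):
        g = dpd_logistic_grad(beta, X, y, alpha)
        beta = soft_threshold(beta - step * g, step * lam * w)
    return beta

# Simulated example with a few mislabelled (contaminated) responses.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 2.0, -2.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
y[:5] = 1.0 - y[:5]  # label noise

beta_init = dpd_weighted_lasso(X, y)              # stage 1: uniform weights
adaptive_w = 1.0 / (np.abs(beta_init) + 1e-4)     # stage 2: adaptive weights
beta_hat = dpd_weighted_lasso(X, y, weights=adaptive_w)
```

The adaptive weights penalize coefficients that were small in the first stage more heavily, which is what yields the simultaneous gene selection and classification the abstract refers to; the DPD loss downweights observations that are poorly fit (such as mislabelled samples), giving the robustness to contamination.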