On the Asymptotic Distribution of Cook’s Distance in Logistic Regression Models

Taylor & Francis
It sometimes happens that one or more observations exert a disproportionate influence on the estimation of a model. A reliable tool is needed to identify such troublesome cases, so that the analyst can decide either to remove them from the sample, when the data were collected badly, or otherwise to use the model with caution, because its results may be affected by those observations. Since Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18.] proposed a measure for detecting influential cases in linear regression, the same measure has been extended to other models and several new measures have been suggested as single-case diagnostics. Cutoff values have been recommended for most of them (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, (2004).]). For Cook's statistic, however, no quantile-type cutoff is available, which has led analysts to rely on index plots alone as a diagnostic tool. Focusing on logistic regression, this paper derives the asymptotic distribution of Cook's distance in order to obtain a meaningful cutoff point for detecting influential and high-leverage observations.
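As an illustration of the quantity discussed above, the following is a minimal sketch of how Cook's distance is commonly approximated for a logistic regression with an intercept and a single covariate. It is not the paper's method: it uses the widely used one-step form, the squared Pearson residual times the leverage h_i, divided by (1 - h_i)^2. The division by the number of parameters (2 here) is one convention among several, and the function names and the toy data are hypothetical.

```python
import math


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1 * x (two parameters)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector X'(y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX = [[a, b], [b, c]]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: beta += (X'WX)^{-1} X'(y - p)
        b0 += (c * g0 - b * g1) / det
        b1 += (-b * g0 + a * g1) / det
    return b0, b1


def cooks_distance_logistic(x, y):
    """Return (leverages, one-step Cook's distances) for each observation."""
    b0, b1 = fit_logistic(x, y)
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * c - b * b
    n_params = 2  # intercept and slope; scaling by this is a convention
    leverages, distances = [], []
    for xi, yi, pi, wi in zip(x, y, p, w):
        # Leverage: h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
        h = wi * (c - 2.0 * b * xi + a * xi * xi) / det
        r = (yi - pi) / math.sqrt(wi)  # Pearson residual
        leverages.append(h)
        distances.append(r * r * h / (n_params * (1.0 - h) ** 2))
    return leverages, distances


# Hypothetical toy data (not taken from the paper).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
leverages, distances = cooks_distance_logistic(x, y)
for i, (h, d) in enumerate(zip(leverages, distances), start=1):
    print(f"case {i}: leverage={h:.3f}, Cook's distance={d:.3f}")
```

Printing the values case by case corresponds to the index plot mentioned in the abstract. Deciding which values are large enough to flag is the cutoff problem the paper addresses.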
[1] A. Agresti, Categorical Data Analysis, 2nd ed., John Wiley & Sons, New York, 2002.
[2] E.B. Anderson, An Introduction to Categorical Data Analysis, John Wiley & Sons, New York, 1996.
[3] P.J. Bickel, E.A. Hammel, and J.W. O’Connell, Sex bias in graduate admissions: data from Berkeley, Science 187 (1975), pp. 398–404.
[4] B.W. Brown, Prediction analysis for binary data, in Biostatistics Casebook, R.G. Miller, B. Efron, B.W. Brown and L.E. Moses, eds., John Wiley & Sons, New York, 1980, pp. 3–18.
[5] R.D. Cook, Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18.
[6] R.D. Cook and S. Weisberg, Residuals and Influence in Regression, Chapman & Hall, London, 1982.
[7] J.A. Díaz-García, G. González-Farías, and V.M. Alvarado-Castro, Exact distributions for sensitivity analysis in linear regression, Appl. Math. Sci. 1 (2007), pp. 1083–1100.
[8] J.J. Dik and M.C.M. de Gunst, The distribution of general quadratic forms in normal variables, Statist. Neerlandica 39 (1985), pp. 14–26.
[9] P. Feigl and M. Zelen, Estimation of exponential probabilities with concomitant information, Biometrics 21 (1965), pp. 826–838.
[10] T.S. Ferguson, A Course in Large Sample Theory, Chapman & Hall, London, 1996.
[11] D.J. Finney, The estimation from individual records of the relationship between dose and quantal response, Biometrika 34 (1947), pp. 320–334.
[12] A.S. Hadi and J.S. Simonoff, Procedures for the identification of multiple outliers in linear models, J. Amer. Statist. Assoc. 88 (1993), pp. 1264–1272.
[13] D.W. Hosmer and S. Lemeshow, Applied Logistic Regression, 2nd ed., John Wiley & Sons, New York, 2000.
[14] D.R. Jensen and D.E. Ramirez, Some exact properties of Cook’s DI, in Handbook of Statistics, N. Balakrishnan and C. Rao, eds., Vol. 16, Elsevier Science, Amsterdam, 1998, pp. 387–402.
[15] W. Johnson, Influence measures for logistic regression: Another point of view, Biometrika 72 (1985), pp. 59–65.
[16] K.E. Muller and M. Chen Mok, The distribution of Cook’s D statistics, Comm. Statist. Theory Methods 26 (1997), pp. 525–546.
[17] J. Muñoz-García, J.M. Muñoz-Pichardo, and L. Pardo, Cressie and Read power-divergences as influence measures for logistic regression models, Comput. Statist. Data Anal. 50 (2006), pp. 3199–3221.
[18] R.L. Obenchain, Letter to the editor, Technometrics 19 (1977), pp. 348–351.
[19] J.A. Pardo, L. Pardo, and M.C. Pardo, Minimum φ-divergence estimator in logistic regression models, Statist. Papers 47 (2005), pp. 91–108.
[20] D. Pregibon, Logistic regression diagnostics, Ann. Statist. 9 (1981), pp. 705–724.
[21] C.R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives, 2nd ed., Springer, New York, 1999.
[22] S. Weisberg, Applied Linear Regression, John Wiley & Sons, New York, 1980.
[23] D. Zelterman, Models for Discrete Data, Oxford University Press, New York, 2005.