On tests of independence based on minimum phi-divergence estimator with constraints: An application to modeling DNA

Menéndez Calleja, María Luisa
Zografos, Konstantinos
Elsevier Science
A new family of estimators, the minimum phi-divergence estimators, is introduced for the problem of independence in a two-way contingency table, and their asymptotic properties are studied. Based on these estimators, a new family of test statistics for the problem of independence is defined. This family of test statistics yields the likelihood ratio test statistic and the Pearson test statistic as special cases. A simulation study is presented to show that some of the new test statistics offer an attractive alternative to the classical Pearson and likelihood ratio test statistics for this problem. The procedures proposed in this paper can be used for testing positional independence in a DNA sequence, as illustrated by a numerical example.
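To make the "special cases" statement concrete, here is a minimal sketch of one well-known member of the phi-divergence family, the Cressie-Read power-divergence statistic, applied to a two-way table. It rests on several assumptions that are not taken from the paper: the expected counts use the ordinary maximum likelihood estimator under independence (products of the marginals) rather than a general minimum phi-divergence estimator, all observed counts are positive, and the function names and the 4 × 4 table of counts are purely illustrative. With these choices, `lam = 1` gives the Pearson statistic and `lam = 0` gives the likelihood ratio statistic.

```python
import math


def power_divergence_independence(table, lam):
    """Cressie-Read power-divergence statistic for independence.

    Assumptions (illustrative, not the paper's procedure): expected counts
    come from the MLE under independence, and every observed count is > 0.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            if lam == 0:        # limiting case: likelihood ratio statistic
                stat += 2.0 * obs * math.log(obs / exp)
            elif lam == -1:     # limiting case: modified likelihood ratio statistic
                stat += 2.0 * exp * math.log(exp / obs)
            else:
                stat += 2.0 / (lam * (lam + 1)) * obs * ((obs / exp) ** lam - 1.0)
    return stat


def pearson_chi2(table):
    """Classical Pearson chi-square statistic, computed directly for comparison."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Hypothetical counts of nucleotide pairs (A, C, G, T) observed at two
# sequence positions; made-up numbers, not data from the paper.
counts = [
    [12, 7, 9, 5],
    [6, 14, 8, 7],
    [9, 6, 15, 8],
    [5, 8, 7, 13],
]

degrees_of_freedom = (len(counts) - 1) * (len(counts[0]) - 1)
for lam in (1, 2 / 3, 0):
    print(lam, power_divergence_independence(counts, lam))
```

Under the null hypothesis of independence, each of these statistics is usually compared with a chi-square distribution with `(rows - 1) * (columns - 1)` degrees of freedom. The paper's contribution is to let the estimator, as well as the test statistic, vary over the phi-divergence family, which this sketch does not attempt.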