Testing the order of Markov dependence in DNA sequences

Thumbnail Image
Full text at PDC
Publication Date
Menéndez Calleja, María Luisa
Pardo Llorente, María del Carmen
Zografos, Konstantinos
Advisors (or tutors)
Journal Title
Journal ISSN
Volume Title
Google Scholar
Research Projects
Organizational Units
Journal Issue
DNA or protein sequences are usually modeled as probabilistic phenomena. The simplest model is created on the assumption that the nucleotides at the various sites are independently distributed. Usually the type of nucleotide at some site depends on the type at another site and therefore the DNA sequence is modeled as a Markov chain of random variables taking on the values A, G, C and T corresponding to the four nucleotides. First order or higher order Markov models provide better fit to a DNA sequence. Based on this remark, the aim of this paper is to present and study a family of test statistics for testing order Markov dependence in DNA sequences. This new family includes as a particular case the classical likelihood ratio test. A simulation study is presented in order to find test statistics, in this family, with a better behaviour than the likelihood ratio test.
Unesco subjects
Avery PJ, Henderson DA (1999) Fitting Markov chain models to discrete state series such as DNA sequences. Appl Stat 48:53–61 Bejerano G, Friedman N, Tishhy N (2004) Efficient exact p-value computation for small sample, sparse and surprising categorical data. J Comput Biol 11:867–886 Bell GI, Sánchez-Pescador R, Laybourn PJ, Najarian RC (1983) Exon duplication and divergence in the human preproglucagon gene. Nature 304:368–371 Billingsley P (1961a) Statistical methods in Markov chains. Ann Math Stat 32:13–39 Billingsley P (1961b) Statistical inference for Markov processes. The University of Chicago Press, Chicago Ewens WJ, Grant GR (2005) Statistical methods in bioinformatics (2nd edn). Springer, New York. Hoel PG (1954) A test for Markov chains. Biometrika 14:430–433 Menéndez ML, Pardo JA, Pardo L (2001) Csiszar’s ϕ-divergences for testing the order in a Markov chain. Stat Pap 42:313–328 Menéndez ML, Pardo JA, Pardo L, Zografos K (2006) On tests of independence based on minimum φ-divergence estimator with constraints: an application to modeling DNA. Comput Stat Data Anal 51(2):1100–1118 Patel NR (2003) An exact test for homogeneity of a Markov chain. Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, New York Pardo L,Morales D, Salicrú M, MenéndezML (1993) The ϕ-divergence statistic in bivariate multinomial populations including stratification. Metrika 40:223–235 Read TRC, Cressie NAC (1988) Goodness-of-fit statistics for discrete multivariate data. Springer, New York Reinert G, Schbath S, Waterman MS (2000) Probabilistic and statistical properties of words: and overview. J Comput Biol 7:1–46 Zografos K (1993) Asymptotic properties of φ-divergence statistic and applications in contingency tables. Int J Math Stat Sci 2:5–21