Wald type and Phi-divergence based test-statistics for isotonic binomial proportions

Martin, N.1, Mata, R.2 and Pardo, L.2
1 Dep. Statistics, Carlos III University of Madrid, 28903 Getafe (Madrid), Spain
2 Dep. Statistics and O.R., Complutense University of Madrid, 28040 Madrid, Spain

February 28, 2014

Abstract

In this paper new test-statistics are introduced and studied for the important problem of testing hypotheses that involve inequality constraints on proportions when the sample comes from independent binomial random variables: Wald type and phi-divergence based test-statistics. The classical likelihood ratio test is obtained as a particular case of the phi-divergence based test-statistics. An illustrative example is given, and the performance of all of them for small and moderate sample sizes is analyzed in an extensive simulation study.

Keywords and phrases: Wald-type statistics, Phi-divergence statistics, Inequality constraints, Log-linear model, Logistic regression.

1 Introduction

Ordinal categorical data appear frequently in the biomedical research literature, for example in the analysis of I independent binary random variables related to an increasing ordered categorical variable. It is well known that for such data it is not possible to use the classical test-statistics, such as the chi-square or the likelihood ratio test-statistic, with a chi-squared asymptotic distribution, but there exist appropriate order-restricted test-statistics with a chi-bar-squared asymptotic distribution. To illustrate this problem, a modification of an example given in Silvapulle and Sen (2005) is considered in this introductory section.

Table 1 contains a subset of data from a prospective study of maternal drinking and congenital malformations. Women completed a questionnaire, early in their pregnancy, concerning alcohol use in the first trimester; complete data and details are available elsewhere (Graubard and Korn, 1987).

i    ni       ni1    ni2
1    17114    48     17066
2    14502    38     14464
3    793      5      788
4    165      2      163

Table 1: Number of individuals with (j = 1) and without (j = 2) congenital sex-organ malformation, cross-classified according to the maternal alcohol consumption level (i = 1, 2, 3, 4).

Specifically, women were asked what was the amount of alcohol taken during the first three months of their pregnancy, and four categories of drink doses are considered (I = 4): no alcohol consumption (i = 1), an average number of alcoholic drinks per day less than one but greater than zero (i = 2), one or more and less than three alcoholic drinks per day (i = 3), and three or more alcoholic drinks per day (i = 4). In terms of a binary random variable with ni individuals in total (see the second column in Table 1), behaving independently with respect to having congenital malformations, the individuals not having congenital malformations are considered to be unsuccessful (j = 2, see the last column in Table 1) and successful otherwise (j = 1, see the third column in Table 1). Let πi be the probability of a success associated with the i-th alcohol dose. Let us consider some statistical inference questions that may arise in this example and in similar ones with binomial probabilities.

1. Is there any evidence of maternal alcohol consumption being related to malformation of the sex organs? To answer this question, the null and alternative hypotheses may be formulated as
H0 : π1 = π2 = π3 = π4   vs.   H1 : π1, π2, π3, π4 are not all equal,
respectively.
However, this formulation is unlikely to be appropriate, because the main issue of interest is the possible increase in the probability of malformation as alcohol consumption increases.

2. Is there any evidence that an increase in maternal alcohol consumption is associated with an increase in the probability of malformation? This question, as it stands, is too broad to allow a precise formulation of the null and the alternative hypotheses. One possibility is to formulate the problem in the following way,
H0 : π1 = π2 = π3 = π4   vs.   H1 : π1 ≤ π2 ≤ π3 ≤ π4 with at least one inequality being strict. (1)

Consider an experiment with I increasing ordinal categories for a variable X. Suppose that ni prefixed individuals are assigned to the i-th category and n = Σ_{i=1}^{I} ni. The individuals are followed over time for the development of an event of interest Y, and the events related to different individuals are independent. Let Ni1 be the random variable that represents the number of individuals related to successful events (Y = 1) out of the total assigned to the i-th category, ni, i = 1, ..., I. If we denote by πi = Pr(Y = 1 | X = i) the probability of a success associated with the i-th category, we have that Ni1 is a binomial random variable with parameters ni and πi, i = 1, ..., I. Let Ni2 denote the number of unsuccessful events associated with the i-th category, i.e. Ni2 = ni − Ni1; then the contingency table of a realization of (Ni1, Ni2), i = 1, ..., I, is given in the last two columns of the following table

n1   n11   n12 = n1 − n11
...  ...   ...
ni   ni1   ni2 = ni − ni1
...  ...   ...
nI   nI1   nI2 = nI − nI1 .

Our purpose in this paper is to propose new order-restricted test-statistics, Wald type and phi-divergence based test-statistics, for testing
H0 : π1 = π2 = · · · = πI, (2)
H1 : π1 ≤ π2 ≤ · · · ≤ πI with at least one inequality being strict.
The classical likelihood ratio test-statistic will appear as a particular case of the phi-divergence based test-statistics. A log-linear formulation of (1) is proposed in Section 2, which is fundamental for defining the Wald type test-statistics. In Section 3 the families of phi-divergence test-statistics are presented. Section 4 is devoted to solving, for an illustrative example, the problem presented in this Section 1. An extensive simulation study is carried out in Section 5.

2 Formulation for isotonic binomial proportions in terms of log-linear and logistic regression modeling: Wald type test-statistics

Reparametrizing the initial problem through log-linear modeling, the formulation of the null hypothesis is strongly simplified, since all the interaction parameters are zero under the null hypothesis; this is appealing, in particular, for creating Wald type test-statistics. Let
p = p(θ) = (p11(θ), p12(θ), p21(θ), p22(θ), ..., pI1(θ), pI2(θ))^T
= ((n1/n)π1, (n1/n)(1 − π1), (n2/n)π2, (n2/n)(1 − π2), ..., (nI/n)πI, (nI/n)(1 − πI))^T (3)
be the probability vector of the following saturated log-linear model
log pij(θ) = u + u1(i) + θ2(j) + θ12(ij), (4)
with
u1(I) = 0,  θ2(2) = 0,  θ12(i2) = 0, i = 1, ..., I − 1,  θ12(Ij) = 0, j = 1, 2, (5)
being the identifiability constraints,
θ = (θ2(1), θ12(11), ..., θ12(I−1,1))^T (6)
the vector of unknown parameters, and u = (u, u1(1), ..., u1(I−1))^T, with
u = u(θ) = log[ (nI/n) / (1 + exp{θ2(1)}) ], (7)
u1(i) = u1(i)(θ) = log[ (ni/nI) (1 + exp{θ2(1)}) / (1 + exp{θ2(1) + θ12(i1)}) ], i = 1, ..., I − 1, (8)
the redundant parameters, obtained from θ by taking into account that pi1(θ) + pi2(θ) = ni/n, i = 1, ..., I.
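The reparametrization above can be checked numerically. The following Python sketch (ours, not part of the original paper; the function name cell_probabilities and the numerical values are illustrative assumptions) computes the cell probabilities pij(θ) of (3)-(4) from a given θ through the redundant parameters (7)-(8), and verifies that they coincide with (ni/n)πi and (ni/n)(1 − πi).

import numpy as np

def cell_probabilities(theta, n_i):
    """Cell probabilities p_ij(theta) of the saturated log-linear model (4),
    built through the redundant parameters u and u_1(i) of (7)-(8)."""
    n_i = np.asarray(n_i, dtype=float)
    n = n_i.sum()
    theta21 = theta[0]                                  # theta_2(1)
    theta12 = np.append(theta[1:], 0.0)                 # theta_12(i1), with theta_12(I1) = 0
    u = np.log((n_i[-1] / n) / (1.0 + np.exp(theta21)))                     # (7)
    u1 = np.log((n_i / n_i[-1]) * (1.0 + np.exp(theta21))
                / (1.0 + np.exp(theta21 + theta12)))                        # (8), with u_1(I) = 0
    p_i1 = np.exp(u + u1 + theta21 + theta12)           # cells with j = 1
    p_i2 = np.exp(u + u1)                               # cells with j = 2
    return p_i1, p_i2

# illustrative values: I = 4, pi = (0.05, 0.10, 0.125, 0.135), n_i = (40, 30, 20, 10)
pi = np.array([0.05, 0.10, 0.125, 0.135])
n_i = np.array([40.0, 30.0, 20.0, 10.0])
logit_pi = np.log(pi / (1.0 - pi))
# theta_2(1) = logit(pi_I); theta_12(i1) = logit(pi_i) - logit(pi_I)
theta = np.append(logit_pi[-1], logit_pi[:-1] - logit_pi[-1])
p1, p2 = cell_probabilities(theta, n_i)
print(np.allclose(p1, n_i / n_i.sum() * pi))            # True: p_i1(theta) = (n_i/n) pi_i
print(np.allclose(p2, n_i / n_i.sum() * (1.0 - pi)))    # True: p_i2(theta) = (n_i/n)(1 - pi_i)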
In terms of the log-linear formulation, (2) is equivalent to
H0 : θ12(11) = θ12(21) = · · · = θ12(I−1,1) = 0, (9)
H1 : θ12(11) ≤ θ12(21) ≤ · · · ≤ θ12(I−1,1) ≤ 0 with at least one inequality being strict.
Notice that θ2(1) is a nuisance parameter, since it does not interfere in (9). In particular, under the null hypothesis π1 = π2 = · · · = πI = π0, the value of the nuisance parameter is θ2(1) = logit(π0) = log[π0/(1 − π0)], and thus it contains all the information about the homogeneous probability vector.

In matrix notation, we can express the vector of parameters of the log-linear model in terms of the following logistic regression,
logit(π) = Xθ, (10)
where π = (π1, π2, ..., πI)^T,

X = [ 1_{I−1}   I_{I−1}
      1         0_{I−1}^T ],

I_a is the identity matrix of order a, 1_a is the a-vector of ones and 0_a is the a-vector of zeros. Since a saturated model has been considered, X is a full rank matrix and thus we can consider
θ = X^{-1} logit(π), (11)
and, on the other hand, (9) in matrix notation is given by
H0 : Rθ = 0_{I−1}, (12)
H1 : Rθ ≤ 0_{I−1} and Rθ ≠ 0_{I−1},
with R = (0_{I−1}, G_{I−1}), where G_{I−1} is a square matrix of order I − 1 with 1's on the main diagonal and −1's on the upper superdiagonal. We shall consider three parameter spaces for θ,
Θ0 = {θ ∈ R^I : Rθ = 0_{I−1}} ⊂ Θ̃ = {θ ∈ R^I : Rθ ≤ 0_{I−1}} ⊂ Θ = R^I,
i.e. Θ̃ is the restricted parameter space, Θ is the unrestricted parameter space, and Θ0 is the parameter space under the null hypothesis.

It is well known that the Fisher information matrix for θ ∈ Θ in the logistic regression is given by
I_F(θ) = X^T diag{νi πi(1 − πi)}_{i=1}^{I} X, (13)
where νi = lim_{n→∞} ni/n, i = 1, ..., I. The following result provides the explicit expression of the Fisher information matrix under the null hypothesis given in (2) or (9).

Theorem 1 For θ0 ∈ Θ0 in the model (4) or (10), the Fisher information matrix is given by

I_F(θ0) = π0(1 − π0) [ 1       ν1    ν2    · · ·   νI−1
                       ν1      ν1    0     · · ·   0
                       ν2      0     ν2    · · ·   0
                       ...                         ...
                       νI−1    0     0     · · ·   νI−1 ]. (14)

Proof. It is immediate by plugging πi = π0, i = 1, ..., I, into (13).

If θ̂, θ̃ and θ̄ represent the maximum likelihood estimators (MLEs) of θ in the parameter spaces Θ0, Θ̃ and Θ, respectively, according to Silvapulle and Sen (2005, pages 154 and 166) we can consider three Wald type test-statistics,
W(θ̃, θ̂) = n θ̃^T R^T (R I_F^{-1}(θ̂) R^T)^{-1} R θ̃, (15)
H(θ̃, θ̂) = n (θ̃ − θ̂)^T I_F(θ̂) (θ̃ − θ̂), (16)
D(θ̄, θ̃, θ̂) = n (θ̄ − θ̂)^T I_F(θ̂) (θ̄ − θ̂) − n (θ̄ − θ̃)^T I_F(θ̃) (θ̄ − θ̃), (17)
where

I_F(θ̂) = π̂0(1 − π̂0) [ 1     (ν̂*)^T
                        ν̂*    diag(ν̂*) ],

I_F(θ̃) = X^T diag{ν̂i π̃i(1 − π̃i)}_{i=1}^{I} X,
ν̂* = (ν̂1, ..., ν̂I−1)^T = (n1/n, ..., nI−1/n)^T.
These test-statistics have, according to Proposition 4.4.1 in Silvapulle and Sen (2005), the same asymptotic distribution as the likelihood ratio test-statistic.

Proposition 2 Under the null hypothesis given in (2) or (9), the expression of W(θ̃, θ̂) given in (15) is as follows,
W(θ̃, θ̂) = n π̂0(1 − π̂0) (θ̃*)^T Σν̂* θ̃*,
where θ* = (θ12(11), ..., θ12(I−1,1))^T, Σν̂* = diag(ν̂*) − ν̂*(ν̂*)^T, π̂0 = N•1/n and N•1 = N11 + N21 + · · · + NI1.

Proof. From Theorem 1, the 2 × 2 block structure of the estimated Fisher information matrix and of its inverse is

I_F(θ̂) = π̂0(1 − π̂0) [ 1     (ν̂*)^T
                        ν̂*    diag(ν̂*) ]
and
I_F^{-1}(θ̂) = (1/(π̂0(1 − π̂0))) [ a    b^T
                                   c    D ],   with D = Σν̂*^{-1}, Σν̂* = diag(ν̂*) − ν̂*(ν̂*)^T.

But
R I_F^{-1}(θ̂) R^T = (1/(π̂0(1 − π̂0))) G_{I−1} Σν̂*^{-1} G_{I−1}^T
and Rθ̃ = G_{I−1} θ̃*. Therefore,
W(θ̃, θ̂) = n θ̃^T R^T (R I_F^{-1}(θ̂) R^T)^{-1} R θ̃ = n π̂0(1 − π̂0) (θ̃*)^T Σν̂* θ̃*.
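To make Proposition 2 concrete, the following Python sketch (ours; the function name wald_W is an illustrative assumption) evaluates W(θ̃, θ̂) through the closed-form expression n π̂0(1 − π̂0)(θ̃*)^T Σν̂* θ̃*, taking the order-restricted estimate π̃ as given (it can be obtained with Algorithm 3 below).

import numpy as np

def wald_W(N_i1, n_i, pi_tilde):
    """Wald type statistic of Proposition 2:
    W = n * pi0_hat * (1 - pi0_hat) * theta_tilde*^T Sigma_{nu_hat*} theta_tilde*."""
    N_i1 = np.asarray(N_i1, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    pi_tilde = np.asarray(pi_tilde, dtype=float)
    n = n_i.sum()
    pi0_hat = N_i1.sum() / n                        # MLE of the common proportion under H0
    nu_hat = n_i[:-1] / n                           # nu_hat* = (n_1/n, ..., n_{I-1}/n)
    logit = lambda p: np.log(p / (1.0 - p))
    theta_star = logit(pi_tilde[:-1]) - logit(pi_tilde[-1])   # theta~_12(i1), i = 1, ..., I-1
    Sigma = np.diag(nu_hat) - np.outer(nu_hat, nu_hat)        # Sigma_{nu_hat*}
    return n * pi0_hat * (1.0 - pi0_hat) * (theta_star @ Sigma @ theta_star)

# data of Table 1; pi_tilde pools the first two categories (see Section 4)
N_i1 = [48, 38, 5, 2]
n_i = [17114, 14502, 793, 165]
pi_tilde = [43 / 15808, 43 / 15808, 5 / 793, 2 / 165]
print(wald_W(N_i1, n_i, pi_tilde))    # approximately 2.598, cf. W in Table 2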
There is an explicit formula for the MLE of θ under the null hypothesis,
θ̂ = (θ̂2(1), θ̂12(11), ..., θ̂12(I−1,1))^T = (logit(π̂0), 0, ..., 0)^T. (18)
For the calculation of θ̃ = (θ̃2(1), θ̃12(11), ..., θ̃12(I−1,1))^T or θ̄ = (θ̄2(1), θ̄12(11), ..., θ̄12(I−1,1))^T, it is much easier to calculate first the corresponding MLE of the probability vector, π̃ = (π̃1, π̃2, ..., π̃I)^T or π̄ = (π̄1, π̄2, ..., π̄I)^T, and to plug it into (11). There is an explicit formula for the unrestricted MLE,
π̄ = (N11/n1, N21/n2, ..., NI1/nI)^T, (19)
and for calculating π̃ the following pool-adjacent-violators algorithm (PAVA) can be used.

Algorithm 3 (Order-restricted estimation of the probabilities) The MLE of π = (π1, π2, ..., πI)^T under the restriction π1 ≤ π2 ≤ · · · ≤ πI, π̃ = (π̃1, π̃2, ..., π̃I)^T, is calculated in the following way:
STEP 1: Set π̃ := π̄, where π̄ is given in (19).
STEP 2: While π̃i ≤ π̃i+1 does not hold for every i = 1, ..., I − 1, do:
  for i = 1, ..., I − 1, if π̃i > π̃i+1, set π̃i := (ni π̃i + ni+1 π̃i+1)/(ni + ni+1) and π̃i+1 := π̃i.

3 Phi-divergence test statistics

The classical order-restricted likelihood ratio test-statistic for testing (2) is given by
G² = 2 Σ_{i=1}^{I} ( Ni1 log(π̃i/π̂0) + (ni − Ni1) log((1 − π̃i)/(1 − π̂0)) )
(see, for instance, Mancuso et al. (2001)). The Kullback-Leibler divergence measure between two 2I-dimensional probability vectors p = (p11, p12, ..., pI1, pI2)^T and q = (q11, q12, ..., qI1, qI2)^T is given by
dKull(p, q) = Σ_{i=1}^{I} ( pi1 log(pi1/qi1) + pi2 log(pi2/qi2) ).
It is an easy exercise to verify that
G² = 2n ( dKull(p, p(θ̂)) − dKull(p, p(θ̃)) ), (20)
where p denotes the empirical probability vector,
p = p(θ̄) = ((n1/n)π̄1, (n1/n)(1 − π̄1), (n2/n)π̄2, (n2/n)(1 − π̄2), ..., (nI/n)π̄I, (nI/n)(1 − π̄I))^T = (N11/n, N12/n, N21/n, N22/n, ..., NI1/n, NI2/n)^T,
p̃ = p(θ̃) = ((n1/n)π̃1, (n1/n)(1 − π̃1), (n2/n)π̃2, (n2/n)(1 − π̃2), ..., (nI/n)π̃I, (nI/n)(1 − π̃I))^T,
p̂ = p(θ̂) = ((n1/n)π̂0, (n1/n)(1 − π̂0), (n2/n)π̂0, (n2/n)(1 − π̂0), ..., (nI/n)π̂0, (nI/n)(1 − π̂0))^T.
The classical order-restricted chi-square test-statistic for testing (2), known as Bartholomew's test-statistic, is given by
X² = (1/(π̂0(1 − π̂0))) Σ_{i=1}^{I} ni (π̃i − π̂0)², (21)
which can be written as
X² = 2n dPearson(p(θ̃), p(θ̂)), (22)
where dPearson(p, q) is the Pearson divergence measure, defined by
dPearson(p, q) = (1/2) Σ_{i=1}^{I} ( (pi1 − qi1)²/qi1 + (pi2 − qi2)²/qi2 ).
Details about this test-statistic can be found in Fleiss et al. (2003, Section 9.3). More general than the Kullback-Leibler and Pearson divergence measures are the φ-divergence measures, defined as
dφ(p, q) = Σ_{i=1}^{I} ( qi1 φ(pi1/qi1) + qi2 φ(pi2/qi2) ),
where φ : R+ → R is a convex function such that φ(1) = φ′(1) = 0, φ′′(1) > 0, 0φ(0/0) = 0 and 0φ(p/0) = p lim_{u→∞} φ(u)/u for p ≠ 0. For more details about φ-divergence measures see Pardo (2006).

Based on φ-divergence measures, we shall consider in this paper two families of order-restricted φ-divergence test-statistics valid for testing (2) or (9). The first one generalizes the order-restricted likelihood ratio test given in (20), in the sense that the Kullback-Leibler divergence measure is replaced by a φ-divergence measure, and its expression is
Tφ(p, p(θ̃), p(θ̂)) = (2n/φ′′(1)) ( dφ(p, p(θ̂)) − dφ(p, p(θ̃)) ) (23)
= (2/φ′′(1)) Σ_{i=1}^{I} ni ( π̂0 φ(Ni1/(ni π̂0)) − π̃i φ(Ni1/(ni π̃i)) + (1 − π̂0) φ((ni − Ni1)/(ni(1 − π̂0))) − (1 − π̃i) φ((ni − Ni1)/(ni(1 − π̃i))) ).
For φ(x) = x log x − x + 1, the likelihood ratio test-statistic G² is obtained.
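As a computational companion to Algorithm 3 and to the family (23), the following Python sketch (ours; the names pava and T_phi are illustrative) obtains π̃ with a standard weighted pool-adjacent-violators pass that merges violating blocks, and then evaluates Tφ(p, p(θ̃), p(θ̂)); with φ(x) = x log x − x + 1 it reproduces the likelihood ratio test-statistic G².

import numpy as np

def pava(pi_bar, n_i):
    """Weighted pool-adjacent-violators: non-decreasing estimate of pi
    (a standard block-merging implementation of the pooling step of Algorithm 3)."""
    vals = [float(v) for v in pi_bar]
    wts = [float(w) for w in n_i]
    counts = [1] * len(vals)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:                       # adjacent violators: pool them
            tot = wts[i] + wts[i + 1]
            vals[i] = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / tot
            wts[i] = tot
            counts[i] += counts[i + 1]
            del vals[i + 1], wts[i + 1], counts[i + 1]
            i = max(i - 1, 0)                           # pooling may create a new violation
        else:
            i += 1
    return np.repeat(vals, counts)

def T_phi(N_i1, n_i, phi, ddphi1=1.0):
    """Order-restricted phi-divergence test-statistic (23); ddphi1 = phi''(1)."""
    N_i1 = np.asarray(N_i1, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    pi_tilde = pava(N_i1 / n_i, n_i)                    # order-restricted MLE (Algorithm 3)
    pi0 = N_i1.sum() / n_i.sum()                        # MLE under H0
    s1, s2 = N_i1 / n_i, (n_i - N_i1) / n_i             # observed success/failure proportions
    term = (pi0 * phi(s1 / pi0) - pi_tilde * phi(s1 / pi_tilde)
            + (1 - pi0) * phi(s2 / (1 - pi0)) - (1 - pi_tilde) * phi(s2 / (1 - pi_tilde)))
    return 2.0 / ddphi1 * np.sum(n_i * term)

# phi of the Kullback-Leibler divergence, phi''(1) = 1: T_phi equals G^2
phi_kl = lambda x: x * np.log(x) - x + 1.0
print(T_phi([48, 38, 5, 2], [17114, 14502, 793, 165], phi_kl))   # approx. 5.406, cf. Table 2 (lambda = 0)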
The second one generalizes the order-restricted Pearson test-statistic given in (22), in the sense that the Pearson divergence measure is replaced by a φ-divergence measure, and its expression is
Sφ(p(θ̃), p(θ̂)) = (2n/φ′′(1)) dφ(p(θ̃), p(θ̂)) (24)
= (2/φ′′(1)) Σ_{i=1}^{I} ni ( π̂0 φ(π̃i/π̂0) + (1 − π̂0) φ((1 − π̃i)/(1 − π̂0)) ).
For φ(x) = (1/2)(x − 1)², the Pearson test-statistic X² is obtained.

The following theorem provides the link between both families of test-statistics, Tφ(p, p(θ̃), p(θ̂)) and Sφ(p(θ̃), p(θ̂)), and the Wald type test-statistics.

Theorem 4 For testing (2) or (9), the asymptotic distribution of
T ∈ {Tφ(p, p(θ̃), p(θ̂)), Sφ(p(θ̃), p(θ̂)), W(θ̃, θ̂), H(θ̃, θ̂), D(θ̄, θ̃, θ̂)}
is common and is given by
lim_{n→∞} Pr(T ≤ x) = Σ_{i=0}^{I−1} wi(I − 1, V) Pr(χ²_i ≤ x),
where
V = G_{I−1} diag^{-1}(ν*) G_{I−1}^T + (1/νI) e_{I−1} e_{I−1}^T, (25)
e_{I−1} is the (I − 1)-th unit vector of R^{I−1}, and {wi(I − 1, V)}_{i=0}^{I−1} is the set of weights such that Σ_{i=0}^{I−1} wi(I − 1, V) = 1, whose values are given in Theorem 5.

Proof. Let θ̂ be the I-dimensional vector given in (18). The second order Taylor expansion of the function dφ(θ) = dφ(p(θ), p(θ̂)) about θ̂ is
dφ(θ) = dφ(θ̂) + (θ − θ̂)^T ∂dφ(θ)/∂θ |_{θ=θ̂} + (1/2)(θ − θ̂)^T ∂²dφ(θ)/∂θ∂θ^T |_{θ=θ̂} (θ − θ̂) + o(‖θ − θ̂‖²), (26)
where
∂dφ(θ)/∂θ |_{θ=θ̂} = 0_I,   ∂²dφ(θ)/∂θ∂θ^T |_{θ=θ̂} = φ′′(1) I_F^{(n1,...,nI)}(θ̂),
and I_F^{(n1,...,nI)}(θ) = X^T diag{(ni/n) πi(1 − πi)}_{i=1}^{I} X. Let θ̄ = X^{-1} logit(π̄), where π̄ is given in (19). In particular, for θ = θ̄ we have
dφ(p(θ̄), p(θ̂)) = (φ′′(1)/2) (θ̄ − θ̂)^T I_F^{(n1,...,nI)}(θ̂) (θ̄ − θ̂) + o(‖θ̄ − θ̂‖²),
and for θ = θ̃,
dφ(p(θ̃), p(θ̂)) = (φ′′(1)/2) (θ̃ − θ̂)^T I_F^{(n1,...,nI)}(θ̂) (θ̃ − θ̂) + o(‖θ̃ − θ̂‖²).
In a similar way, expanding dφ(p(θ), p(θ̃)) about θ̃ and evaluating it at θ = θ̄, it is obtained that
dφ(p(θ̄), p(θ̃)) = (φ′′(1)/2) (θ̄ − θ̃)^T I_F^{(n1,...,nI)}(θ̃) (θ̄ − θ̃) + o(‖θ̄ − θ̃‖²).
Then, taking into account that p = p(θ̄) and that lim_{n→∞} I_F^{(n1,...,nI)}(θ) = I_F(θ),
Tφ(p, p(θ̃), p(θ̂)) = (2n/φ′′(1)) ( dφ(p, p(θ̂)) − dφ(p, p(θ̃)) )
= n (θ̄ − θ̂)^T I_F(θ̂) (θ̄ − θ̂) − n (θ̄ − θ̃)^T I_F(θ̃) (θ̄ − θ̃) + oP(1)
= D(θ̄, θ̃, θ̂) + oP(1),
and
Sφ(p(θ̃), p(θ̂)) = (2n/φ′′(1)) dφ(p(θ̃), p(θ̂)) = n (θ̃ − θ̂)^T I_F(θ̂) (θ̃ − θ̂) + oP(1) = H(θ̃, θ̂) + oP(1).
According to Proposition 4.4.1 in Silvapulle and Sen (2005),
G² = D(θ̄, θ̃, θ̂) + oP(1) = H(θ̃, θ̂) + oP(1) = W(θ̃, θ̂) + oP(1),
which means that the asymptotic distribution of T ∈ {Tφ(p, p(θ̃), p(θ̂)), Sφ(p(θ̃), p(θ̂)), W(θ̃, θ̂), H(θ̃, θ̂), D(θ̄, θ̃, θ̂)} is common. Such a distribution can be established from the likelihood ratio test-statistic used for the problem formulated in (6.13) of Silvapulle and Sen (2005),
lim_{n→∞} Pr(G² ≤ x) = Σ_{i=0}^{I−1} wi(I − 1, V) Pr(χ²_i ≤ x),
where V = B diag^{-1}(ν) B^T, with ν = (ν*^T, νI)^T = (ν1, ..., νI)^T and B = (G_{I−1}, −e_{I−1}). Using the partitioned structure of B and ν, (25) is obtained.

The following result is based on the third way for the computation of the weights given on page 79 of Silvapulle and Sen (2005).

Theorem 5 The set of weights {wi(I − 1, V)}_{i=0}^{I−1} of the asymptotic distribution given in Theorem 4 is computed as follows,
wi(I − 1, V) = Pr( arg min_{ζ ∈ R^{I−1}_+} (Z − ζ)^T V^{-1} (Z − ζ) ∈ R^{I−1}_+(i) ), i = 0, ..., I − 1, (27)
where Z ∼ N_{I−1}(0_{I−1}, V), R^{I−1}_+ = {ζ ∈ R^{I−1} : ζ ≥ 0_{I−1}} and R^{I−1}_+(i) ⊂ R^{I−1}_+ is the subset of vectors with i strictly positive components and I − 1 − i null components. In particular, for I ∈ {2, 3, 4}, (27) has the explicit expressions given in (3.24), (3.25) and (3.26) of Silvapulle and Sen (2005).
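Once the weights of Theorem 5 are available (for instance from the explicit expressions for I ≤ 4, or from Algorithm 6 below), the asymptotic p-value implied by Theorem 4 is a chi-bar-squared tail probability. The following Python sketch (ours) evaluates Pr(T > t) = Σ_{i=0}^{I−1} wi Pr(χ²_i > t), where χ²_0 is a point mass at zero; the weights used in the illustration are those obtained for the data of Section 4.

import numpy as np
from scipy.stats import chi2

def chi_bar_sq_pvalue(t, weights):
    """Asymptotic p-value of Theorem 4: sum_i w_i Pr(chi^2_i > t),
    with chi^2_0 interpreted as a point mass at zero."""
    t = float(t)
    weights = np.asarray(weights, dtype=float)
    pval = weights[0] * (1.0 if t <= 0 else 0.0)        # chi^2_0 component
    for i, w in enumerate(weights[1:], start=1):
        pval += w * chi2.sf(t, df=i)                    # upper tail of chi^2 with i d.f.
    return pval

# weights (w_0, w_1, w_2, w_3) obtained in the example of Section 4 (I = 4)
w = [0.17925, 0.4215, 0.32075, 0.07850]
print(chi_bar_sq_pvalue(2.5979, w))    # approximately 0.169, cf. the p-value of W in Table 2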
For computing (27) it is useful to know the following explicit expression,
V^{-1} = T_{I−1}^T Σν* T_{I−1},
where T_{I−1} = G_{I−1}^{-1} is an upper triangular matrix of 1's and Σν* = diag(ν*) − ν*(ν*)^T, and to simulate the probability according to the following algorithm. For the simulation,
V̂ = G_{I−1} diag^{-1}(ν̂*) G_{I−1}^T + (1/ν̂I) e_{I−1} e_{I−1}^T, (28)
V̂^{-1} = T_{I−1}^T Σν̂* T_{I−1}, (29)
are needed rather than V and V^{-1}, respectively.

Algorithm 6 (Estimation of the weights) The estimators of the weights given in Theorem 5, ŵi = wi(I − 1, V̂), are obtained by Monte Carlo in the following way:
STEP 1: For i = 0, ..., I − 1, set N(i) := 0.
STEP 2: Repeat the following steps R times (say, R = 1,000,000):
STEP 2.1: Generate an observation, z, from Z ∼ N_{I−1}(0_{I−1}, V̂). E.g., the NAG Fortran library subroutines G05CBF, G05EAF and G05EZF can be useful.
STEP 2.2: Compute ζ̂(z) = arg min_{ζ ∈ R^{I−1}_+} (1/2) ζ^T V̂^{-1} ζ − (V̂^{-1} z)^T ζ. E.g., the IMSL Fortran library subroutine DQPROG can be useful.
STEP 2.3: Count i*, the number of strictly positive components of ζ̂(z), and set N(i*) := N(i*) + 1.
STEP 3: Set ŵi := N(i)/R for i = 0, ..., I − 1.

4 Example

In this section the data set of the introduction (Table 1) is analyzed. The sample, a realization of N, is summarized in the following vector,
n = (n11, n12, n21, n22, n31, n32, n41, n42)^T = (48, 17066, 38, 14464, 5, 788, 2, 163)^T.
The estimated vectors of interest are
π̄ = (48/17114, 38/14502, 5/793, 2/165)^T = (0.0028, 0.0026, 0.0063, 0.0121)^T,
π̃ = (43/15808, 43/15808, 5/793, 2/165)^T = (0.0027, 0.0027, 0.0063, 0.0121)^T,
π̂0 = 93/32574 = 0.0029,
and
θ̄ = X^{-1} logit(π̄) = (−4.4006, −1.4730, −1.5412, −0.65946)^T,
θ̃ = X^{-1} logit(π̃) = (−4.4006, −1.5037, −1.5037, −0.65946)^T,
θ̂ = (logit(π̂0), 0, 0, 0)^T = (−5.8558, 0, 0, 0)^T.
For the asymptotic distribution, the weights can be calculated through

V̂ = [  83774.0156250   −14428.7177734         0
       −14428.7177734    15217.7109375     −788.9927979
            0             −788.9927979     1457.5666504 ],

calculating the correlation coefficients
ρ̂12 = −0.40411, ρ̂13 = 0, ρ̂23 = −0.16753,
the partial correlation coefficients
ρ̂12•3 = −0.4099, ρ̂13•2 = −0.075072, ρ̂23•1 = −0.18315,
and evaluating the following expressions,
ŵ3 = (1/(4π)) (2π − arccos(ρ̂12) − arccos(ρ̂13) − arccos(ρ̂23)) = 0.07850,
ŵ2 = (1/(4π)) (3π − arccos(ρ̂12•3) − arccos(ρ̂13•2) − arccos(ρ̂23•1)) = 0.32075,
ŵ1 = 0.5 − ŵ3 = 0.5 − 0.07850 = 0.4215,
ŵ0 = 0.5 − ŵ2 = 0.5 − 0.32075 = 0.17925.
If we take φλ(x) = (1/(λ(1 + λ))) (x^{λ+1} − x − λ(x − 1)), for λ ∈ R − {−1, 0}, the "power divergence family" is obtained,
dλ(p, q) = (1/(λ(λ + 1))) ( Σ_{i=1}^{I} Σ_{j=1}^{2} p_{ij}^{λ+1}/q_{ij}^{λ} − 1 ), for each λ ∈ R − {−1, 0}. (30)
It is also possible to cover the whole real line for λ by defining dλ(p, q) = lim_{t→λ} dt(p, q) for λ ∈ {−1, 0}. It is well known that d0(p, q) = dKull(p, q) and d1(p, q) = dPearson(p, q). This is very interesting, since it means that the power divergence based family of test-statistics contains G² and X² as special cases. Finally, the values of the test-statistics are summarized in Table 2. It can be seen that the null hypothesis cannot be rejected for W(θ̃, θ̂), H(θ̃, θ̂), D(θ̄, θ̃, θ̂), T−1.5(p, p(θ̃), p(θ̂)), T−1(p, p(θ̃), p(θ̂)), T−0.5(p, p(θ̃), p(θ̂)), S−1.5(p(θ̃), p(θ̂)), S−1(p(θ̃), p(θ̂)), S−0.5(p(θ̃), p(θ̂)), and should be rejected for T0(p, p(θ̃), p(θ̂)), T2/3(p, p(θ̃), p(θ̂)), T1(p, p(θ̃), p(θ̂)), S0(p(θ̃), p(θ̂)), S2/3(p(θ̃), p(θ̂)), S1(p(θ̃), p(θ̂)).
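For readers without access to the NAG and IMSL routines mentioned in Algorithm 6, the following Python sketch (ours) performs the same Monte Carlo estimation of the weights; the quadratic program of STEP 2.2 is solved as a non-negative least squares problem after a Cholesky factorization of V̂^{-1}. Since the weights depend on V̂ only through its correlation structure, any positive rescaling of V̂ leads to the same result, and with the data of this example the sketch reproduces, up to Monte Carlo error, the weights (ŵ0, ŵ1, ŵ2, ŵ3) ≈ (0.179, 0.421, 0.321, 0.079) obtained above.

import numpy as np
from scipy.optimize import nnls

def estimate_weights(V, R=100_000, seed=1, tol=1e-10):
    """Monte Carlo estimation of the chi-bar-squared weights (Algorithm 6).
    STEP 2.2: with V^{-1} = L L^T (Cholesky), the quadratic program becomes the
    non-negative least squares problem min_{zeta >= 0} ||L^T z - L^T zeta||^2."""
    V = np.asarray(V, dtype=float)
    d = V.shape[0]                                       # d = I - 1
    A = np.linalg.cholesky(np.linalg.inv(V)).T
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(d), V, size=R)  # STEP 2.1
    counts = np.zeros(d + 1)
    for z in Z:
        zeta, _ = nnls(A, A @ z)                         # STEP 2.2
        counts[np.sum(zeta > tol)] += 1                  # STEP 2.3
    return counts / R                                    # STEP 3: (w_0, ..., w_{I-1})

# V_hat of (28) for the data of Table 1, n_i = (17114, 14502, 793, 165)
n_i = np.array([17114.0, 14502.0, 793.0, 165.0])
nu = n_i / n_i.sum()
G = np.eye(3) - np.diag(np.ones(2), k=1)                 # G_{I-1}: 1's on the diagonal, -1's above
e = np.array([0.0, 0.0, 1.0])                            # e_{I-1}
V_hat = G @ np.diag(1.0 / nu[:-1]) @ G.T + np.outer(e, e) / nu[-1]
print(estimate_weights(V_hat))    # approximately (0.179, 0.421, 0.321, 0.079)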
Even though the sample size seems to be large enough, this is a case with small values of π1, π2, π3, π4, for which the asymptotic p-values used to make decisions are known not to be very reliable. In the simulation study we shall study such a case and, according to its results, the rejection of the null hypothesis is supported, since {Tλ(p, p(θ̃), p(θ̂))}λ∈{0, 2/3, 1} turn out to be the most reliable test-statistics. In conclusion, an increase in maternal alcohol consumption is associated with an increase in the probability of malformation.

test-statistic         λ = −1.5   λ = −1    λ = −0.5   λ = 0     λ = 2/3   λ = 1
Tλ(p, p(θ̃), p(θ̂))     3.3068     3.8173    4.4920     5.4057    7.2076    8.4895
p-value(Tλ)            0.1177     0.0911    0.0650     0.0413    0.0169    0.0090
Sλ(p(θ̃), p(θ̂))         3.2993     3.8124    4.4896     5.4057    7.2107    8.4942
p-value(Sλ)            0.1181     0.0913    0.0651     0.0413    0.0169    0.0090

W(θ̃, θ̂) = 2.5979            H(θ̃, θ̂) = 2.6363            D(θ̄, θ̃, θ̂) = 2.6462
p-value(W(θ̃, θ̂)) = 0.1686    p-value(H(θ̃, θ̂)) = 0.1653   p-value(D(θ̄, θ̃, θ̂)) = 0.1645

Table 2: Power divergence based test-statistics, Wald type test-statistics and their corresponding asymptotic p-values.

5 Simulation study

For testing (1), by considering I = 4 binomial random variables, the following scenarios will be considered:

• scenario A (small/big proportions): n1 = 40, n2 = 30, n3 = 20, n4 = 10.
  – scenario A-0: π1 = π2 = π3 = π4 = 0.05.
  – scenario A-1: π1 = 0.05, π2 = π3 = π4 = 0.1.
  – scenario A-2: π1 = 0.05, π2 = 0.1, π3 = π4 = 0.125.
  – scenario A-3: π1 = 0.05, π2 = 0.1, π3 = 0.125, π4 = 0.135.
• scenario B (small/big proportions): n1 = 60, n2 = 45, n3 = 30, n4 = 15.
  – scenario B-0: π1 = π2 = π3 = π4 = 0.05.
  – scenario B-1: π1 = 0.05, π2 = π3 = π4 = 0.1.
  – scenario B-2: π1 = 0.05, π2 = 0.1, π3 = π4 = 0.125.
  – scenario B-3: π1 = 0.05, π2 = 0.1, π3 = 0.125, π4 = 0.135.
• scenario C (small/big proportions): n1 = 100, n2 = 75, n3 = 50, n4 = 25.
  – scenario C-0: π1 = π2 = π3 = π4 = 0.05.
  – scenario C-1: π1 = 0.05, π2 = π3 = π4 = 0.1.
  – scenario C-2: π1 = 0.05, π2 = 0.1, π3 = π4 = 0.125.
  – scenario C-3: π1 = 0.05, π2 = 0.1, π3 = 0.125, π4 = 0.135.
• scenario D (intermediate proportions): n1 = 40, n2 = 30, n3 = 20, n4 = 10.
  – scenario D-0: π1 = π2 = π3 = π4 = 0.35.
  – scenario D-1: π1 = 0.35, π2 = π3 = π4 = 0.45.
  – scenario D-2: π1 = 0.35, π2 = 0.45, π3 = π4 = 0.475.
  – scenario D-3: π1 = 0.35, π2 = 0.45, π3 = 0.475, π4 = 0.485.
• scenario E (intermediate proportions): n1 = 60, n2 = 45, n3 = 30, n4 = 15.
  – scenario E-0: π1 = π2 = π3 = π4 = 0.35.
  – scenario E-1: π1 = 0.35, π2 = π3 = π4 = 0.45.
  – scenario E-2: π1 = 0.35, π2 = 0.45, π3 = π4 = 0.475.
  – scenario E-3: π1 = 0.35, π2 = 0.45, π3 = 0.475, π4 = 0.485.
• scenario F (intermediate proportions): n1 = 100, n2 = 75, n3 = 50, n4 = 25.
  – scenario F-0: π1 = π2 = π3 = π4 = 0.35.
  – scenario F-1: π1 = 0.35, π2 = π3 = π4 = 0.45.
  – scenario F-2: π1 = 0.35, π2 = 0.45, π3 = π4 = 0.475.
  – scenario F-3: π1 = 0.35, π2 = 0.45, π3 = 0.475, π4 = 0.485.

The simulation experiment is performed with R = 50,000 replications and, in each of them, apart from the Wald type test-statistics T^(h) ∈ {W^(h)(θ̃, θ̂), H^(h)(θ̃, θ̂), D^(h)(θ̄, θ̃, θ̂)}, h = 1, ..., R, all the power divergence test-statistics T^(h) ∈ {T^(h)_λ(p, p(θ̃), p(θ̂)), S^(h)_λ(p(θ̃), p(θ̂))}, h = 1, ..., R, with λ in the interval (−1.5, 3), are considered. From the p-values it is possible to calculate the proportion of replications rejected according to the nominal size α = 0.05, i.e.
(1/R) Σ_{h=1}^{R} I{p-value(T^(h)) ≤ α}, (31)
where I{•} represents the indicator function.
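A minimal sketch of how (31) can be approximated for one of the test-statistics is given below (ours; it covers only T0 = G², uses fewer replications than the 50,000 of the study, and reuses the functions pava, chi_bar_sq_pvalue and estimate_weights introduced in the sketches of the previous sections). The estimated size or power is the proportion of replications whose asymptotic p-value does not exceed α = 0.05.

import numpy as np
# reuses pava(...), chi_bar_sq_pvalue(...) and estimate_weights(...) from the previous sketches

def g2_statistic(N_i1, n_i):
    """Order-restricted likelihood ratio statistic G^2 of Section 3."""
    N_i1 = np.asarray(N_i1, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    pi_tilde = pava(N_i1 / n_i, n_i)
    pi0 = N_i1.sum() / n_i.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(N_i1 > 0, N_i1 * np.log(pi_tilde / pi0), 0.0)
        t2 = np.where(n_i - N_i1 > 0, (n_i - N_i1) * np.log((1.0 - pi_tilde) / (1.0 - pi0)), 0.0)
    return 2.0 * np.sum(t1 + t2)

def simulated_rejection_rate(pi, n_i, R=10_000, alpha=0.05, seed=1):
    """Proportion (31) of replications with asymptotic p-value <= alpha (the paper uses R = 50,000)."""
    pi = np.asarray(pi, dtype=float)
    n_i = np.asarray(n_i, dtype=int)
    d = len(n_i) - 1
    nu = n_i / n_i.sum()
    G = np.eye(d) - np.diag(np.ones(d - 1), k=1)        # G_{I-1}
    e = np.zeros(d)
    e[-1] = 1.0                                         # e_{I-1}
    V_hat = G @ np.diag(1.0 / nu[:-1]) @ G.T + np.outer(e, e) / nu[-1]
    w = estimate_weights(V_hat)                         # weights of Theorems 4 and 5
    rng = np.random.default_rng(seed)
    reject = 0
    for _ in range(R):
        N_i1 = rng.binomial(n_i, pi)                    # independent binomial observations
        if chi_bar_sq_pvalue(g2_statistic(N_i1, n_i), w) <= alpha:
            reject += 1
    return reject / R

# scenario A-0 (simulated size) and scenario A-1 (simulated power) for T_0 = G^2
print(simulated_rejection_rate([0.05, 0.05, 0.05, 0.05], [40, 30, 20, 10]))
print(simulated_rejection_rate([0.05, 0.10, 0.10, 0.10], [40, 30, 20, 10]))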
The scenarios ending in 0 correspond to the null hypothesis being true, and are useful to obtain the simulated significance levels, α̂T, T ∈ {W(θ̃, θ̂), H(θ̃, θ̂), D(θ̄, θ̃, θ̂), Tλ(p, p(θ̃), p(θ̂)), Sλ(p(θ̃), p(θ̂))}, with different sample sizes and kinds of test-statistics. The scenarios ending in either 1, 2 or 3 correspond to the null hypothesis being false, and are useful to obtain the simulated powers, β̂T, with different alternatives, sample sizes and types of test-statistics. For calculating both α̂T and β̂T, (31) is applied, each one in the corresponding scenario.

In Figures 1 and 2, α̂T and β̂T for all the aforementioned test-statistics are plotted in the different scenarios. The curves represent either the Tλ(p, p(θ̃), p(θ̂)) or the Sλ(p(θ̃), p(θ̂)) power divergence test-statistics, located respectively on the left or on the right of the panel of plots. The asterisk, square and circle symbols represent W(θ̃, θ̂), H(θ̃, θ̂) and D(θ̄, θ̃, θ̂), respectively, and all of them are repeated on the left as well as on the right in order to make their comparison with Tλ(p, p(θ̃), p(θ̂)) or Sλ(p(θ̃), p(θ̂)) easier. The black lines and symbols, representing α̂T, are useful to select the test-statistics close to the nominal level. According to the criterion given by Dale (1986), a reasonable exact significance level should verify
|logit(1 − α̂T) − logit(1 − α)| ≤ ε
for ε ∈ {0.35, 0.7}, the exact significance levels verifying the inequality with ε = 0.35 being "close to the nominal level" and those verifying it with ε = 0.7 being "fairly close to the nominal level". In this study only the test-statistics satisfying the condition with ε = 0.35 are considered, and the corresponding upper and lower bounds are plotted as two horizontal lines, with the line associated with the nominal level, α = 0.05, in the middle. Among the test-statistics with simulated significance levels close to the nominal level, the test-statistics with higher powers should be selected; but since in general high powers correspond to high significance levels, this choice is not straightforward. For this reason, taking β̂T0 − α̂T0 or β̂S1 − α̂S1 as baseline, the efficiencies relative to the likelihood ratio test (T0(p, p(θ̃), p(θ̂)) = G²),
ρT = ((β̂T − α̂T) − (β̂T0 − α̂T0)) / (β̂T0 − α̂T0), (32)
or to Bartholomew's test (S1(p(θ̃), p(θ̂)) = X²),
ρ*T = ((β̂T − α̂T) − (β̂S1 − α̂S1)) / (β̂S1 − α̂S1), (33)
are considered. In Table 3, the efficiency of S1 = X² is compared with respect to T0 = G²: if ρS1 < 0, T0 = G² is better than S1 = X² and the plot of the efficiencies in Figures 3 and 4 focuses only on (32); similarly, if ρS1 > 0, T0 = G² is worse than S1 = X² and the plot of the efficiencies in Figures 3 and 4 focuses only on (33).

        sc A       sc B       sc C       sc D       sc E       sc F
1       ρS1 < 0    ρS1 < 0    ρS1 < 0    ρS1 < 0    ρS1 < 0    ρS1 < 0
2       ρS1 < 0    ρS1 > 0    ρS1 < 0    ρS1 < 0    ρS1 < 0    ρS1 < 0
3       ρS1 < 0    ρS1 > 0    ρS1 < 0    ρS1 < 0    ρS1 < 0    ρS1 < 0

Table 3: Efficiency of Bartholomew's test with respect to the likelihood ratio test.

In view of the plots, it is possible to propose test-statistics with better performance in comparison with G² and X². From Figures 1 and 3, the so-called Cressie-Read test-statistic, T2/3(p, p(θ̃), p(θ̂)), can be recommended for small/big proportions, either for small or moderate sample sizes.
On the other hand, W(θ̃, θ̂), H(θ̃, θ̂) and the test-statistic based on the Hellinger distance, T−0.5(p, p(θ̃), p(θ̂)), can be recommended for intermediate proportions and moderate sample sizes; however, for small sample sizes the likelihood ratio test-statistic still remains the best one.

Figure 1: Simulated sizes (black) and powers (red, green, blue) for scenarios A, B, C (small/big proportions). Left panels: {Tλ(p, p(θ̃), p(θ̂))}λ∈(−1.5,3); right panels: {Sλ(p(θ̃), p(θ̂))}λ∈(−1.5,3); the symbols W, H, D correspond to W(θ̃, θ̂), H(θ̃, θ̂) and D(θ̄, θ̃, θ̂).

Figure 2: Simulated sizes (black) and powers (red, green, blue) for scenarios D, E, F (intermediate proportions). Same layout as Figure 1.
Figure 3: Efficiencies for scenarios A, B, C (small/big proportions). Left panels: {Tλ(p, p(θ̃), p(θ̂))}λ∈(−1.5,3); right panels: {Sλ(p(θ̃), p(θ̂))}λ∈(−1.5,3); the symbols W, H, D correspond to W(θ̃, θ̂), H(θ̃, θ̂) and D(θ̄, θ̃, θ̂).

Figure 4: Efficiencies for scenarios C, D, E (intermediate proportions). Same layout as Figure 3.

References

[1] Barlow, R. E., Bartholomew, D. J. and Brunk, H. D. (1972). Statistical Inference under Order Restrictions. Wiley.
[2] Dale, J. R. (1986). Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. Journal of the Royal Statistical Society, Series B, 48, 48-59.
[3] Dardanoni, V. and Forcina, A. (1998). A unified approach to likelihood inference on stochastic orderings in a nonparametric context. Journal of the American Statistical Association, 93, 1112-1122.
[4] Fleiss, J. L., Levin, B. and Paik, M. C. (2003). Statistical Methods for Rates and Proportions. Wiley-Interscience.
[5] Graubard, B. I. and Korn, E. L. (1987). Choice of column scores for testing independence in ordered 2 × I contingency tables. Biometrics, 43, 471-476.
[6] Martin, N. and Pardo, L. (2008). New families of estimators and test statistics in log-linear models. Journal of Multivariate Analysis, 99(8), 1590-1609.
[7] Martin, N., Mata, R. and Pardo, L. (2014). Phi-divergence statistics for the likelihood ratio order: an approach based on log-linear models. http://arxiv.org/pdf/1402.5384v1.pdf.
[8] Mancuso, J. Y., Ahn, H. and Chen, J. J. (2001). Order-restricted dose-related trend tests. Statistics in Medicine, 20, 2305-2318.
[9] Pardo, L. (2006). Statistical Inference Based on Divergence Measures. Statistics: Textbooks and Monographs. Chapman & Hall/CRC.
[10] Silvapulle, M. J. and Sen, P. K. (2005). Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley Series in Probability and Statistics. Wiley-Interscience (John Wiley & Sons).