JOURNAL OF 
COMPUTATIONAL AND 
APPLIED MATHEMATICS 

EISEVIER Journal of Computational and Applied Mathematics 84 (1997) 207-217 

A comparison of some estimators of the mixture proportion of 
mixed normal distributions’ 

M.C. Pardo 
Universitary School of Statistics, Cornplutense University of Madrid, 28040-Madrid, Sptzin 

Received 29 October 1996; received in revised form 16 June 1997 

Abstract 

Fisher’s method of maximum likelihood breaks down when applied to the problem of estimating the five parameters of 
a mixture of two normal densities from a continuous random sample of size n. Alternative methods based on minimum- 
distance estimation by grouping the underlying variable are proposed. Simulation results compare the efficiency as well 
as the robustness under symmetric departures from component normality of these estimators. Our results indicate that the 
estimator based on Rao’s divergence is better than other classic ones. 

Keywords: Minimum-distance estimator; Simulation; Relative efficiency 

AA4S classification: 62FlO; 62F35 

1. Introduction 

Distributions which result from the mixing of two or more component distributions are designated 
as “compound” or “mixed” distributions. Such distributions arise in a wide variety of practical 
situations ranging from distributions of wind velocities to distributions of physical dimensions of 
various mass-produced items. The moment solution to the problem of estimating the five parameters 
of an arbitrary mixture of two unspecific normal densities was studied as early as 1894 by Karl 
Pearson [ 191. Yet, despite the fact that many random phenomena have subsequently been shown to 
follow this distribution, it is only recently that the estimation problem has been seriously reconsidered. 
Hasselblad [ 121 seems to have been the first to reopen the question. Since then, the problem has also 
attracted the attention of Cohen [7], who shows how the computation of Pearson’s moment method 
can be lightened to some extent. Maximum-likelihood estimates computed with all the information 
available are reported to be the best under all circumstances, however, they plainly misbehave in 
estimating mixed distributions because the maximum-likelihood function is not a bounded function 

’ This work was supported by Grant DGICYT PB94-0308. 

0377-0427197/$17.00 @ 1997 Elsevier Science B.V. All rights reserved 
PII SO377-0427(97)00124-6 


208 M.C. Pardol Journal qf Computational and Applied Mathematics 84 (1997) 207-217 

in this case, Le Cam [15]. Then the maximum-likelihood procedure with original data cannot be 
universally recommended. So Day [9] and Behboodian [2] find an appropriate local maximum of 
the likelihood function by using iterative techniques. Fryer and Robertson [lo] compare the moment 
estimates and the multinomial maximum likelihood and minimum x2 estimates obtained by grouping 
the underlying variable. They show that the grouped estimates are more accurate than the moment 
estimates for most distributions. Recently, Woodward et al. [24, 251 have carried out an interesting 
comparison between the maximum-likelihood estimator and the minimum distance estimators based 
on the Cramer-Von Mises and Hellinger distances, respectively. 

In this paper we examine the use of minimum-distance estimation based on Burbea and Rao 
divergence [6] (MR,Z ) as an alternative to maximum-likelihood (ML) estimation by ‘grouping the 
underlying variable in both cases for the estimation of the parameters of the mixture density 

f(x) = ~fI(X) + (1 - ~).l”2(x> 

when the component distributions in the simulated samples are normal and when they are not. 
There is no doubt that the choice of the number of classes, M, to group the underlying variable 

is an important question. However, in this paper, it is not so important since we only compare 
estimators obtained by grouping the underlying variable. Therefore, we are interested in studying the 
behavior of the estimators under the same conditions. In fact, there are no papers related with the 
mixture of normal distributions which carry out the study of the choice of M. Fryer and Robertson 
[lo] said “The method of grouping was dictated to a large extent by practical considerations, and it 
is not claimed that the groupings are in any sense optimal”. We join with them in that feeling. 

Anyway, we suppose that we need the estimations to construct a goodness-of-fit test. There are 
many papers that study the problem of choosing cells in this situation. One alternative is to do the 
same partition to estimate the parameters than to test the null hypothesis. This choice is guided by 
two considerations: the power of the resulting test, and the desire to use the asymptotic distribution 
of the statistic as an approximation to the exact distribution for sample size II. Mann and Wald 
[ 161 initiated the study of the choice of cells in the Pearson test of fit to a continuous distribution. 
They recommended, first, that the cells be chosen to have equal probabilities under the hypothesized 
distribution. The advantages of such a choice for the Pearson tests are (1) unbiasedness, (2) maximal 
power, and (3) empirical studies have shown that the asymptotic distribution of these statistics is a 
more accurate approximation to the exact distribution. Mann and Wald then made recommendations 
on the number M of equiprobable cells to be used. They found that for a sample of size n (large) 
and significance level a, one should use approximately M=4 [2n2/c(a)2] I”, where c(a) is the upper 
a-point of the standard normal distribution. Retracting the Mann-Wald calculations using better 
approximations, as in Schorr [22], confirms that the optimum M is smaller than this value. He 
recommended to use A4 = 2n2/5. 

Another alternative is to consider different A4 values and to calculate for each one the corre- 
sponding estimator. The best A4 would be that corresponding to the estimator with less bias and 
mean-squared error. 

In Section 2, we provide background material on the minimum &,-divergence estimator (MR,? 
E). In Section 3 we carry out a simulation study for comparing the ML estimator (MLE), the 
minimum chi-square estimator (MCSE) and the MR,7 E with different a values for grouped data. 


M. C. Pardo I Journal of Computational and Applied Mathematics 84 (1997) 207-217 209 

2. The minimum Burbea and Rao distance estimator 

Consider the probability densities fe(x) with respect to a o-finite measure p on the statistical space 
(WxJs)oEo~R”‘” and a decomposition {A,, . . .,AM} of X. Then the formula PO(&) = qi(O), i = 
1 , . . . ,M defines a discrete statistical model. Let X1,. . . ,X, be a random sample drawn from the 
previous population and let ji = ni/n be the Ai relative frequency, i = 1, . . . ,M. If we are interested 
in estimating 6 by the maximum-likelihood method we have to maximize for fixed (ni, . . . ,nM) 

Po(N,=n I,..., N&f=n&f)= 
n! 

n,! . ..Q! q1w . . . qM(Q)n.“, 

so 

log Po(N, = rzl,. . . , NM = n,,,,) = -nDKULLBACK(Ej, Q(O)) + 1, 

where i=(j,,... > i),J, Q<e> = (q@>, . . . , qu(6))t, DKULLBACK the Kullback divergence [ 141 and 1 
an independent value of 19. Then to estimate 8 by the discrete model maximum-likelihood estimator 
is equivalent to minimize on 19 E 0 & RMO the Kullback divergence. 

Now, then the Kullback divergence is not the unique divergence measure, so we can choose as 0 
estimator the 8 value which verifies the following: 

D being every divergence measure. 
Depending on the divergence measure chosen, you have different estimators. On the one hand, if 

M (Fi - qi(d))2 WR Q(Q)> = n c 
/=I qiCO> 

the corresponding 3 is the well-known minimum x2 estimator, studied in this context by Fryer and 
Robertson [ lo]. 

On the other hand, if we consider the Burbea and Rao divergence [6], 

where 

M 

fL,dP) = c L (Pg - Pi> i=, 1 - CI 
a# 1, 

-Pz ln Pi a=1 

is the entropy of degree a due to Havrda and Charvat [ 131, the corresponding 8 will be called the 
minimum Q-divergence estimator. Rao [21] used the family of $,-entropies in genetic diversity 
between populations. In the particular case of a=2, we obtain the Gini-Simpson index. This measure 
of entropy was introduced by Gini [ 1 l] and by Simpson [23] in biometry and its properties have 
been studied by various authors (Bhargava and Doyle [3], Bhargava and Uppulari [4], Agresti and 


210 M. C. Pardo I Journal of Computational and Applied Mathematics 84 (1997) 207-217 

Agresti [ 11). Note that if we consider the Gini-Simpson index, then the associated R#-divergence is 
proportional to the square of the Euclidean distance 

In order to solve the problem to estimate the mixture proportion of mixed-normal distributions, 
we will define in a convenient way the R4Y-divergence estimator. The following definition was given 
in Pardo [ 171. 

Definition 1. Let us suppose that n observations are drawn at random and with replacement from a 
population with statistical space (X,/I,, Po)BEoc fl~fO, the minimum R#Y-divergence estimator of 6’ is 

every dbX E 0 that verifies 

where f is the relative frequency vector. So the minimum R,x -divergence estimator will be 8, = 

arg inf RbY@., Q(e)>. 0~63 c R”O 
The importance of the family of divergence measures considered in the previous definition can be 

seen in the aforementioned paper of Burbea and Rao [6]. For example, a surprising result is the fact 
that the RbT-divergence is convex on dw x du, where du = {(RI,. . . , pw)’ / CE, pi = 1, pi 2 0, i= 
1 Ye.., M}, if only if a~[1,2] for M>2, and if only if a~[1,2] or a~ [3,:] for M=2. This 
establishes the range of CI for which this measure is useful in practical applications. Some important 
properties of this divergence family can be seen in Pardo and Vajda [ 181. 

Throughout, we assume the model is correct and MO < M - 1. Furthermore, we restrict ourselves 
to unknown parameters 8’ satisfying the regularity conditions of Birch [5] which are neccesary to 
prove that the MLE for grouped data is asymptotically distributed as a normal. Consider 

A(0) = diag (Q(e)(“2’-‘) J(0), 

where 

J(e> = (Jjr(e))/z,k: ;,io 

is an M x MO Jacobian matrix being 

Jj,(@ = y. 
I 

Then, assuming that the function Q : 0 * + AM has continuous second partial derivatives in a neigh- 
borhood of 8’, the following asymptotic properties were shown in Pardo [ 171: 

(1) 

& = 80 + (A(8°)tA(00))-1A(80)tdiag (Q(s”)(J2’-I) (? - ~(0~)) + 0(ll B - Q(e”) II), 

where Ptiz is unique in a neighborhood of 8’. 


M. C. Pardo I Journal of Computational and Applied Mathematics 84 (1997) 207-217 211 

(2) 
II 

where 

C = B(6’) (diag(Q(0’)) - Q(O”)Q(Oo)‘) B(0’)’ 

with 

B(e”) = (A(O”)tA(~o))-‘A(Oo)t diag (Q(0)(a’2)-1) . 

(y 
Q(0+) is a &z-consistent estimator of Q(0’), i.e., 

fi II Q&J - Q@‘) II d O,U>. 

Remark 1. We note that if we consider the R-divergence or equivalently a + 1, we get that 

iti, = 8’ + (A(t!I”)tA(t90))-‘A(60)t diag (Q(@-“*) (P - Q(0’)) + o([[ p - Q(d”) II), 

where 

A(8) = diag (Q(t1°)-“2) J(0). 

and 

fi(e,, - coy + N(o,r(eo)-11, 

being I(0) the Fisher information matrix. So the 64, estimator is a BAN (best asymptotically normal) 
estimator. 

In the following section we will present a simulation study to know the behavior of our estimator. 

3. Simulation results 

In this section we report the results of simulations designed to empirically compare ML, minimum 
chi-square (MCS), MR$, and MR,, estimations of parameters for a mixture of normal, in which 
we analyze the efficiency as well as the robustness of them. Simulations reported in this section are 
based on mixing proportions 0.25, 0.5 and 0.75. For each of these mixing proportions, firstly, we 
considered mixtures of the densities f,(x) and f&), where f,(x) is the density for the random 
variable X = aY and f2(x) is the density associated with X = Y + b where a > 0 and b > 0 and the 
distribution of Y is normal. Secondly, we consider Y as a Student’s t with two or four degrees of 
freedom, or double exponential, to study the robustness under symmetric departures from component 
normality. Thus, “a” is the ratio of scale parameters which we take to be 1 and -\/z while “b” 
was selected to provide the desired overlap between the two distributions. We considered “overlap” 
[24] as the probability of misclassification using this rule: Classify an observation x as being from 
population 1 if x <x, and from population 2 if x 2 x, , where x, is the unique point between ,u~ 


212 M. C. PardolJournal of Computational and Applied Mathematics 84 (1997) 207-217 

Table 1 
Simulation results for mixtures of normal components 

0.1 Overlap 0.03 Overlap 

i a Estimator G nm 2 B% nm II? 

0.25 

0.5 

0.25 

0.5 

0.75 

MRp, E 0.070 
M%, E 0.068 
MLE 0.063 
MCSE 0.060 

MQ E -0.009 
M%, E -0.007 
MLE -0.015 
MCSE -0.016 

Mh,E -0.020 
MRs, E -0.021 
MLE -0.025 
MCSE -0.023 

MQE -0.099 
Mh, E -0.099 
MLE -0.101 
MCSE -0.098 

MQ E -0.166 
Mh, E -0.167 
MLE -0.167 
MCSE -0.173 

1.99 
1.89 
1.95 
1.95 

1.30 
1.28 
1.52 
1.52 

0.874 
0.937 
0.928 
0.977 

2.004 
2.010 
2.137 
2.030 

4.32 
4.39 
4.57 
4.97 

1.02 
0.96 

1.00 

0.85 
0.84 

1.00 

0.941 
1.010 

1.052 

0.937 
0.940 

0.949 

0.94 
0.96 

1.09 

0.042 
0.042 
0.037 
0.036 

-0.008 
-0.010 
-0.012 
-0.011 

0.001 
0.003 

-0.001 
-0.001 

-0.058 
-0.058 
-0.058 
-0.060 

-0.085 
-0.084 
-0.076 
-0.077 

0.82 1.11 
0.81 1.10 
0.73 
0.78 1.06 

0.71 0.89 
0.72 0.90 
0.79 
0.78 0.97 

0.571 0.993 
0.562 0.976 
0.575 
0.570 0.991 

1.027 0.951 
1.027 0.951 
1.079 
1.042 0.965 

1.56 1.07 
1.48 1.01 
1.46 
1.52 1.03 

and p2 such that Ifi = (1 - 1)f2(x,). The overlaps examined in the current study are 0.03 and 
0.1. 

For each set of configurations considered, 500 samples of size n = 100 were generated from 
the corresponding mixture distribution, and for each sample considered the ML, MCS, MR,, and 
MR,, estimates were obtained. The iterative procedure proposed in Section 2 was implemented 
using the IMSL subroutine ZXMIN which minimize a function of various variables. Although all 
these estimation procedures provide estimators of all five parameters, Tables 1 and 2 present only 
estimations of 1. In Table 3 estimations for all the parameters are given. 

For either of the estimators proposed in the previous section to be used in practice, one must 
have starting values for the iterative procedures. We chose to obtain these values by employing 
an ad hoc quasi-clustering technique used by Woodward et al. [24] that is easy to implement. 
They allow as possible values for the initial estimate of 3, only the values 0.1,. . . ,0.9. For each of 
these values of 1, the sample is divided into two subsamples, Y,, Y,,. . . , Y,, and Y,,+,, Ynlf2,. . . , Y,, 
where z is the ith order statistic and ni is “nil” rounded to the nearest integer. So, i is that value 
at which A( 1 - A)( ml 
c; = ((ry) - 

- ~2~)~ is maximized, j& = ml, G2 = m2, iTif = ((ml - ri0.25))/0.6745)2 and 
m2)/0.6745)2, where mj is the sample median of the jth subsample and Y/!~) is the qth 

percentile from the jth subsample. 
In Table 1 we present summary results of the simulation comparing the performance of the 

estimators for mixtures of normal components. Estimates of the bias and mean-squared error (MSE) 


Table 2 

MC. Pardol Journal of Computational and Applied Mathematics 84 (1997) 207-217 213 

Simulation results for mixtures of nonnormal components 

0.1 Overlap 0.03 Overlap 

1 a Estimator G n@ E % II&& z 

0.25 

0.5 

0.25 

0.5 

0.75 

0.25 

0.5 

0.25 

0.5 

1 

1 

d? 

Jz 

Jz 

1 

1 

ti 

J2 

MRlh2 E 0.059 

MbIE 0.062 
MLE 0.053 
MCSE 0.053 

M%, E 
M&j, E 
MLE 
MCSE 

-0.001 
-0.002 
-0.003 
-0.003 

M&E 0.012 

MR+, E 0.016 
MLE 0.013 
MCSE 0.012 

MR9? E -0.056 
MQ,, E -0.052 
MLE -0.047 
MCSE -0.053 

MR,$ -0.106 

M&h E -0.108 
MLE -0.098 
MCSE -0.090 

MRg2 E 
M&I E 
MLE 
MCSE 

0.084 
0.085 
0.085 
0.077 

MRgiz E -0.005 

MRs, E -0.003 
MLE -0.008 
MCSE -0.009 

M% E 
M&I E 
MLE 
MCSE 

-0.002 
-0.005 
-0.009 
-0.009 

M% E -0.094 

MRd, E -0.096 
MLE -0.099 
MCSE -0.097 

Double exponential components 

1.069 1.066 
1.103 1.100 
1.002 
0.986 0.9837 

0.794 1.106 
0.748 1.041 
0.718 
0.718 1 .ooo 

0.702 1.074 
0.690 1.057 
0.653 
0.632 0.967 

1.178 1.146 
1.071 1.042 
1.027 
1.128 1.098 

1.950 0.927 
2.027 0.963 
2.103 
1.859 0.883 

t(4) components 

2.608 0.951 
2.673 0.975 
2.742 
2.478 0.903 

1.548 0.888 
1.529 0.877 
1.742 
1.791 1.028 

0.962 0.940 
0.946 0.925 
1.022 
0.983 0.962 

2.290 0.892 
2.316 0.903 
2.565 
2.516 0.981 

0.061 0.861 
0.064 0.903 
0.053 0.721 
0.05 1 0.726 

0.0008 0.400 
-0.001 0.416 
-0.001 0.391 

0.001 0.415 

0.039 0.572 
0.041 0.607 
0.035 0.559 
0.034 0.560 

-0.032 0.588 
-0.03 1 0.577 
-0.029 0.552 
-0.026 0.566 

-0.090 1.392 
-0.087 1.366 
-0.073 1.079 
-0.074 1.133 

0.05 1 
0.051 
0.047 
0.047 

0.941 
0.949 
0.905 
0.900 

-0.006 0.792 
-0.006 0.778 
-0.007 0.856 
-0.008 0.857 

0.010 0.559 
0.008 0.516 
0.005 0.552 
0.006 0.559 

-0.062 1.102 
-0.063 1.145 
-0.058 1.208 
-0.057 1.121 

1.194 
1.253 

1.007 

1.023 
1.063 

1.060 

1.022 
1.086 

1.001 

1.066 
1.045 

1.026 

1.290 
1.266 

1.049 

1.039 
1.048 

0.994 

0.924 
0.908 

1 .ooo 

1.011 
0.934 

1.011 

0.912 
0.948 

0.928 


214 h4.C. Pardol Journal of Computational and Applied Mathematics 84 (1997) 207-217 

Table 2. (Contd.) 

0.1 Overlap 0.03 Overlap 

1 a Estimator fG n&i&% E fG nh%Z 2 

0.75 

0.25 

0.5 

0.25 

0.5 

0.75 

v5 

1 

1 

J5 

xh 

fi 

M%E -0.182 5.257 0.923 -0.104 1.894 

M&J, E -0.182 5.375 0.944 -0.106 2.037 
MLE -0.184 5.692 -0.097 1.920 
MCSE -0.181 5.567 0.978 -0.095 1.854 

0.986 
1.061 

0.965 

t(2) components 

MQE 
M&E 
MLE 
MCSE 

0.113 
0.113 
0.109 
0.108 

1.515 
1.551 
1.470 
1.448 

MhE 
Mh E 
MLE 
MCSE 

-0.011 
-0.011 
-0.014 
-0.017 

1.150 
1.067 
1.080 
1.120 

6.633 0.991 0.063 1.030 
6.657 0.995 0.063 1.055 
6.689 0.059 
6.629 0.991 0.057 0.985 

3.594 0.900 -0.002 1.064 
3.633 0.910 -0.003 0.988 
3.992 -0.005 
3.863 0.967 -0.005 1.036 

4.372 0.945 0.012 0.982 
4.566 0.987 0.008 0.996 
4.622 0.011 
4.659 1.007 0.013 0.908 

4.255 0.894 -0.070 0.889 
4.366 0.917 -0.069 0.878 
4.759 -0.070 
4.810 1.010 -0.069 0.914 

10.760 0.966 -0.122 0.991 
10.580 0.950 -0.120 1.008 
11.131 -0.115 

MCSE -0.232 11.259 1.011 -0.112 0.968 

Mh,E 
Mh, E 
MLE 
MCSE 

0.035 
0.034 
0.038 
0.032 

0.852 
0.864 
0.867 
0.787 

MhE 
M% E 
MLE 
MCSE 

-0.117 
-0.117 
-0.124 
-0.124 

1.697 
1.675 
1.907 
1.745 

MR+: E 

MQ, E 
MLE 

-0.225 
-0.224 
-0.23 1 

3.202 
3.258 
3.230 
3.129 

based on the simulations are given by 

and 

where n, the number of samples and iii denotes an estimate 
noted that n@i? is the quantity actually given in the tables. 

of A for the ith sample. It should be 
We also table empirical measures of 


M.C. Pardol Journal of Computational and Applied Mathematics 84 (1997) 207-217 215 

Table 3 
Estimated relative efficiencies of the MRd, E, MRb, E and MCSE relative to the MLE for mixture model parameters 

0.1 0.03 

A a Estimator PI 01 P2 fJ2 1 P’I 01 P2 02 1 

0.25 

0.5 

0.25 

0.5 

0.75 

0.25 

0.5 

0.25 

0.5 

0.75 

MR,> E 
MRd, E 
MCSE 

MR$*E 
MR& E 
MCSE 

MRgzE 
MR+, E 
MCSE 

MR+: E 

MR& E 
MCSE 

MRb2 E 

MR#, E 
MCSE 

0.97 0.87 0.99 
0.98 0.93 0.98 
1.06 1.00 0.97 

0.97 0.95 0.90 
0.96 0.99 0.82 
1.02 1.07 1.01 

0.95 0.84 1.02 
0.96 0.84 1.04 
0.98 0.98 1.02 

0.97 0.83 0.81 
0.97 0.83 0.82 
1.01 0.95 0.98 

0.94 0.92 0.92 
0.95 0.92 0.90 
0.94 0.93 0.97 

MRdJE 
MRb, E 
MCSE 

MRs, E 
MRS, E 
MCSE 

1.19 0.80 1.07 
0.98 0.76 1.16 
1.02 1.01 1.06 

0.94 1.04 0.95 
0.94 1.01 0.88 
0.99 1.06 1.03 

MRgz E 0.96 0.83 1.05 

MRO,E 0.96 0.81 1.07 
MCSE 0.99 0.97 1.02 

MR,p> E 
MRA E 
MCSE 

MR@*E 
MRV‘J, E 
MCSE 

0.93 
0.93 
0.99 

0.98 
1.00 
0.97 

0.95 0.89 
1.02 0.94 
1.06 0.96 

0.90 0.84 
0.91 0.88 
1.06 1.02 

1.72 1.02 
1.06 0.96 
1.01 1.00 

0.90 0.85 
0.89 0.84 
0.96 1 .oo 

1.05 0.94 
1.04 1.01 
0.99 1.05 

0.83 0.93 
0.76 0.94 
0.98 0.94 

0.90 0.94 
0.85 0.96 
0.97 1.09 

t(4) 

0.97 0.95 
0.95 0.97 
0.95 0.90 

0.80 0.88 
0.90 0.87 
1 .oo 1.02 

0.91 0.94 
0.95 0.92 
0.96 0.96 

0.81 0.89 
0.89 0.90 
0.96 0.98 

0.86 0.92 
0.91 0.94 

0.95 0.88 1.11 1.05 1.11 
0.99 0.93 1.07 1.05 1.10 
1.01 1.03 1.03 0.98 1.06 

1.00 0.97 0.89 0.86 0.89 
1.05 1 .oo 0.86 0.92 0.90 
1 .oo 1.00 0.97 0.95 0.97 

0.94 0.80 1.03 1.06 0.99 
0.92 0.79 1.04 1.06 0.97 
0.98 0.92 0.96 1.02 0.99 

1.03 1.08 0.85 0.81 0.95 
1.03 1.11 0.88 0.83 0.95 
1.02 1.12 0.92 0.90 0.96 

1.08 1.02 0.82 0.69 1.07 
1.06 0.95 0.79 0.69 1.01 
1.02 1.00 0.95 0.97 1.03 

0.82 0.70 1.13 1.10 1.03 
0.78 0.77 1.15 1.10 1.04 
1.03 0.90 1.00 1.02 0.99 

1.03 1.08 0.85 0.81 0.92 
1.03 1.11 0.88 0.83 0.90 
1.02 1.12 0.92 0.90 1.00 

0.83 0.93 0.98 1.04 1.01 
3.14 0.91 0.96 0.97 0.93 
0.91 1 .oo 0.89 0.97 1.01 

1.01 0.89 0.85 0.68 0.91 
1 .oo 0.96 0.81 0.69 0.94 
0.95 0.90 0.97 0.89 0.92 

1.08 1.07 0.76 
1.09 1.08 0.87 

0.66 
0.82 
1.09 

0.98 
1.06 
0.96 1.08 0.97 0.99 1.00 0.95 

the relative efficiencies of the MCSE, MR,,E and MR,,E with the MLE, i.e., 

k = I%&(# MLE) 

%&MLE) ’ 

- 

Analyzing the results of Table 1, we find that the estimated bias and the &6% associated with 
the MRti7E are generally smaller than those for the MLE and the MCSE. 


216 M.C. Pardo I Journal of Computational and Applied Mathematics 84 (1997) 207-217 

In Table 2 we display the results for the nonnormal components. In the case of double-exponential 
components is not clear which is the best because, in general, the MCSE has less Bxs than the 
others but the MLE has less @E. However, in the case of Student’s t components are very 
clear than the MR,7E is the best. It has less I%% and B% than the others. The superiority of 
it is even clear for t(2) components, i.e., when the departure from normality is more extreme. In 
this setting the performance of the MLE and MCSE further deteriorates with respect to that of 
the MRbzE. 

Although our emphasis here has been on the estimation of the mixing proportion, the estimation 
routines used here obtain estimation for all five of the parameters. So, it seems obvious to question 
about whether the results shown for II are similar for the rest of the parameters ,u], cl, p2 and rs2. In 
Table 3 we display empirical relative efficiencies for all the parameters for normal and t(4) mixtures. 
From the table we see that the results for the other parameters also exhibited patterns similar to 
those shown in Tables 1 and 2, i.e., the MR,7E is a very attractive alternative to both the MLE and 
the MCSE. 

4. Concluding remarks 

Our results indicate that the MR,7E is better than the MLE and MCSE at the true model and under 
the Students’ t components. While all of them perform comparably under the double-exponential 
components. As would be expected, the performance of the estimators declines as the overlap between 
the two components increases. 

References 

[l] A. Agresti, R.F. Agresti, Statistical analysis of qualitative variation, K.E. Schussler (Ed.,), Sot. Methodology, (1978) 
pp. 204-237. 

[2] J. Behboodian, On a mixture of normal distributions, Biometrika 57 (1970) 215-217. 
[3] T.N. Bhargava, P.H. Doyle, A geometric study of diversity, J. Theoret. Biol. 43 (1974) 241-251. 
[4] T.N. Bhargava, V.R.R. Uppului, On an axiomatic derivation of Gini diversity, with applications, Metron 30-VI 

(1975) 1-13. 
[5] M.W. Birch, A new proof of the Pearson-Fisher theorem, Ann. Math. Statist. 35 (1964) 817-824. 
[6] J. Burbea, C.R. Rao, On the convexity of some divergence measures based on entropy functions, IEEE Trans. Inform. 

Theory 28 (1982) 489495. 
[7] A.C. Cohen, Estimation in mixtures of two normal distributions, Technometrics 9 (1967) 15-28. 
[8] I. Csiszar, A class of measures of informativity of observation channels, Period. Math. Hungar. 2 (1972) 191-213. 
[9] N.E. Day, Estimating the components of a mixture of normal distributions, Biometrika 56(3) (1969) 463474. 

[lo] J.G. Fryer, C.A. Robertson, A comparison of some methods for estimating mixed normal distributions, Biometrika 
59(3) (1972) 6399648. 

[ 1 l] C. Gini, Variabilita e mutabilita, Studi Economico-Giuridici della Facolta di Giiurisprudenza dell Universita di 
Cagliari, a III, PArte II, (1912). 

[12] V. Hassenblad, Estimation of parameters for a mixture of normal distributions, Technometrics 8 (1966) 431434. 
[13] M.E. Havrda, F. Charvat, Quantification method of classification processes: concept of structural a-entropy, 

Kybernetika 3 (1967) 30-35. 
[ 141 S. Kullback, R. Leibler, On information and sufficiency. Arm. Math. Statist. 22 (1951) 79-86. 
[ 151 L. Le Cam, Maximum likelihood: Au introduction, Intemat. Statist. Rev. 58(2) (1990) 153-171. 


M. C. Pardo I Journal of Computational and Applied Mathematics 84 (1997) 207-217 217 

[16] H.B. Mann, A. Wald, On the choice of the number of class intervals in the application of the chi-square test, Ann. 
Math. Statist. 13 (1942) 306317. 

[17] M.C. Pardo, Asymptotic behaviour of an estimator based on Rao’s divergence, Kybemetika 33 (1997). 
[ 181 M.C. Pardo, I. Vajda, About distances of discrete distributions satisfying the data proccesing theorem of information 

theory, Trans. IEEE Inform. Theory 43(4) (1997) 1288-1293. 
[ 191 K. Pearson, Contributions to the mathematical theory of evolution, Philos. Trans. Roy. Sot. Ser.A 185 (1894) 71-110. 
[20] K. Pearson, On the criterion that a given system of deviations from the probable in the case of a correlated system 

of variables is such that it can be reasonably supposed to have arisen from random sampling, Philos. Mag. 50 (1900) 
157-172. 

[21] C.R. Rao, Diversity and dissimilarity coefficients: an unified approach, J. Theoret. Pop. Biol. 21 (1982) 24-43. 
[22] A.R. Schorr, On the choice of the class intervals in the application of the chi-square test, Math. Oper. Forsch. u. 

Statist. 5 (1974) 357-377. 
[23] E.H. Simpson, Measurement of diversity, Nature 163 (1949) 688. 
[24] W.A. Woodward, W.C. Parr, W.R. Schucany, H. Lindsay, A comparison of minimum distance and maximum 

likelihood estimation of a mixture proportion, J. Amer. Statist. Assoc. 79 (1984) 590-598. 
[25] W.A. Woodward, P. Whitney, P.W. Eslinger, Minimum Hellinger distance estimation of mixture proportions, J. 

Statist. Plann. Inference 48 (1995) 303-3 19.