ISSN: 2341-2356
Collection website: http://www.ucm.es/fundamentos-analisis-economico2/documentos-de-trabajo-del-icae

Working papers are in draft form and are distributed for discussion. They may not be reproduced without permission of the author/s.

Instituto Complutense de Análisis Económico
UNIVERSIDAD COMPLUTENSE MADRID
Working Paper No. 1725, October 2017

A New Inequality Measure that is Sensitive to Extreme Values and Asymmetries [1]

Michael McAleer [2]
Department of Quantitative Finance, National Tsing Hua University, Taiwan; Discipline of Business Analytics, University of Sydney Business School, Australia; Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, The Netherlands; Department of Quantitative Economics, Complutense University of Madrid, Spain; Institute of Advanced Sciences, Yokohama National University, Japan

Hang K. Ryu [3]
Department of Economics, Chung Ang University, Seoul, Korea

Daniel J. Slottje [4]
Department of Economics, SMU, Dallas

Abstract

There is a vast literature on the selection of an appropriate index of income inequality and on what desirable properties such a measure (or index) should possess. The Gini index is, of course, the most popular. There is a concurrent literature on the use of hypothetical statistical distributions to approximate and describe an observed distribution of incomes. Pareto and others observed early on that incomes tend to be heavily right-tailed in their distribution. These asymmetries led to approximating the observed income distributions with extreme value hypothetical statistical distributions, such as the Pareto distribution. But these income distribution functions (IDFs) continue to be described with a single index (such as the Gini) that poorly detects the extreme values present in the underlying empirical IDF. This paper introduces a new inequality measure that supplements, but does not replace, the Gini, and that measures more accurately the inherent asymmetries and extreme values that are present in observed income distributions. The new measure is based on a third-order term of a Legendre polynomial from the logarithm of a share function (or Lorenz curve). We advocate using the two measures together to provide a better description of inequality inherent in empirical income distributions with extreme values.

Keywords: Inequality Index, Extreme value distributions, Maximum entropy method, Orthonormal basis, Legendre polynomials.

JEL Classification: D31, D63
[1] This research was supported by the National Research Foundation of Korea (2017S1A3A2066657), the National Science Council, Ministry of Science and Technology (MOST), Taiwan, and the Australian Research Council.
[2] Department of Quantitative Finance, National Tsing Hua University, Taiwan; Discipline of Business Analytics, University of Sydney Business School, Australia; Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, The Netherlands; Department of Quantitative Economics, Complutense University of Madrid, Spain; Institute of Advanced Sciences, Yokohama National University, Japan. Email: michael.mcaleer@gmail.com
[3] Department of Economics, Chung Ang University, Seoul, Korea, 156-756, Tel.: +82-11-253-6500; Email: hangryu@cau.ac.kr
[4] Department of Economics, SMU, Dallas, TX 75275, Tel: 214-732-9170, Email: dan.slottje@fticonsulting.com

I. Introduction

Income inequality research has experienced a resurgence after losing some momentum in the late 1990s and the first decade of the twenty-first century. Piketty (1995, 2014) and Boushey et al. (2017) reignited some interest in the field; Piketty did so with his 2014 tome on “polarization.” There is a vast literature on the measurement of income inequality, cf. Cowell (2011) for an excellent bibliography of much of this work. This literature contains hundreds of papers on an appropriate index of income inequality and on what desirable properties such a measure (or index) should possess. We present and review some of this discussion below. There is also a concurrent literature on the use of hypothetical statistical distributions to approximate and describe an observed distribution of incomes.
Pareto (1896) and others observed early on that incomes tend to be heavily right-tailed in their distribution. These asymmetries led researchers to approximate the observed income distributions with extreme value hypothetical statistical distributions, such as the Pareto distribution. Statisticians have done considerable work on extreme value distributions in other applications. The generalized extreme value distribution (GEV) and its family members, including the Weibull, Gumbel, Frechet and others, have been extensively explored by statisticians and inequality researchers alike (cf. Coles (2001) and Cowell and Flachaire (2007)). James McDonald has long been a leading researcher on functional forms of hypothetical statistical distributions to describe IDFs (cf. McDonald (1984), McDonald et al. (2013) and Slottje (1987)).

Interestingly, even though it is recognized that incomes are distributed with asymmetric higher moments, inequality indices constructed to capture (with a single number) the level of inequality inherent in these observed income distributions are generally based on the mean and variance of the observed data. Cowell and Flachaire (2002, 2007) appear to be the only works that discuss the two concepts (that is, extreme values in the IDF and detecting them with an inequality index) in the same place. They do not introduce a new index or measure to deal with the issue. In their first paper, they note that the two most popular classes of measures, the Gini and entropy-based measures, have different sensitivities to the problem (cf. Cowell and Flachaire (2002)). In their second paper, the authors are primarily concerned with how sensitive commonly used inequality measures are to extreme values in the underlying distributions, and suggest some semi-parametric specifications of the commonly used measures to account for the extreme values (cf. Cowell and Flachaire (2007)).
The Gini coefficient and Theil's entropy measure (frequently generalized) are two very popular inequality indices, among others, that have not always performed well in describing some of the tail behavior in observed income distributions. Specifically, both measures fall short in detecting changes in various groups' shares (cf. Ryu (2013) and Ryu and Slottje (2017)). [5] Another way to approach the problem is to realize that there are many income distribution functions which will produce the same value of a Gini coefficient. The overall shape of the income share function may be well described by the Gini coefficient (or by Theil's entropy measure), but the poorest group's share and the precise details of the richest group's share generally are not described well by these measures. In this paper, a second inequality measure is introduced and added to the Gini coefficient to describe movements of the extreme values and asymmetries of observed income distributions as they change over time.

[5] See Maasoumi (1986, 1989) for excellent work on the generalized entropy class of measures.

In the next section we discuss desirable properties an inequality measure should possess. In Sections III and IV we introduce the new measure, which is based on the expansion of the logarithm of the share function (or of the Lorenz curve) in a Legendre polynomial series. Section V discusses an application by fitting the new measure to CPS data. Section VI concludes the paper.

II. Desirable Properties of an Income Inequality Index, I(y) [6]

There is significant consensus among inequality researchers that any income inequality index, I(y), should possess statistical properties that allow it to reasonably describe the inequality inherent in an observed IDF.
Given the inherent difficulty in describing the characteristics of an entire IDF with one number, the following properties are desirable:

• Anonymity or symmetry
The inequality measure should not depend on how individuals in an observed distribution are labeled. In other words, it does not matter who receives the income; all that matters is the distribution of income. This is generally expressed mathematically as:

$$I(P(y)) = I(y) \tag{1}$$

where $P(y)$ is any permutation of income $y$.

• Scale independence or homogeneity
As Cowell (2011, p. 63) notes, the measured inequality of the slices of the cake should not depend on the size of the cake. This property says that if (say) every person's income in an economy is multiplied by the same positive constant, then the overall metric of inequality should not change. This may be stated as:

$$I(ay) = I(y) \tag{2}$$

where $a$ is a positive real number.

• Population independence
Similarly, the inequality measure should be independent of the level of population. Cowell (2011, p. 63) notes the inequality of the cake distribution should not depend on the number of cake-receivers. This is generally written as:

$$I(y \cup y) = I(y) \tag{3}$$

where $y \cup y$ is the union of $y$ with itself.

• Transfer principle
The Pigou-Dalton, or transfer, principle states, in its weak form, that if income is transferred from a rich person to a poor person, while still preserving the order of income ranks, then the inequality measurement should not increase. In its strong form, the transfer principle says the measured level of inequality should decrease. As will be shown below in our paper, our new second measure satisfies this condition if it is considered together with the Gini coefficient (see the Appendix for proof).

• Non-negativity
The inequality index I(y) must be greater than or equal to zero.

• Egalitarian zero
The index I(y) is zero when everyone has the same income, meaning when all values $y_i$ are equal.

• Bounded above by maximum inequality
The index I(y) attains its maximum value of one, reflecting the maximum level of inequality (all $y_i$ are zero except one).

[6] This list is a collection whose individual properties are discussed in many places, including Cowell (2011), Ryu and Slottje (1998), Basmann and Slottje (1987), and Basmann, Hayes and Slottje (1991), among others.

In the discussion to follow, we introduce a new measure that will be shown to satisfy these properties.

III. New Measure of Inequality that Supplements the Gini Coefficient

Given our objective to find a new income inequality measure which is sensitive to extreme values, we propose to describe the income distribution with two summary measures rather than a single measure. The Gini coefficient, Theil's entropy measure, and other well-known measures are useful in describing the overall state of income inequality, but these measures do not provide precise information about the presence of extreme values in an underlying IDF, or about how changes in the extreme values over time affect the level of inequality reflected in the summary index.

In this paper, we conceptualize a complete set of distributions all having the same Gini value. A function derived using only the Gini coefficient will be called the basic model in the paper. This basic model is known to be imprecise in describing the presence of extreme values. A second inequality measure will supplement the Gini, and is designed to describe the movements of the poorest group's income share and the extreme values of the richest income group. The choice of the second inequality measure is extremely important.
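The properties listed in Section II can be checked numerically for a concrete index. The following is a minimal sketch for the Gini coefficient only, computed from the mean absolute difference; the function name `gini` and the income vector are illustrative assumptions and are not taken from the paper.

```python
def gini(incomes):
    """Gini coefficient from the mean absolute difference (population form)."""
    n = len(incomes)
    mean = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2 * n * n * mean)

# Hypothetical income vector, used only for illustration.
y = [10.0, 20.0, 30.0, 40.0, 100.0]

base = gini(y)
permuted = gini([40.0, 10.0, 100.0, 30.0, 20.0])  # anonymity, property (1)
scaled = gini([3.5 * v for v in y])               # scale independence, property (2)
replicated = gini(y + y)                          # population independence, property (3)
equal = gini([25.0] * 5)                          # egalitarian zero
# Transfer of 10 from the richest to the poorest person; rank order is preserved.
transferred = gini([20.0, 20.0, 30.0, 40.0, 90.0])

print(base, permuted, scaled, replicated, equal, transferred)
```

In this sketch the permuted, scaled and replicated vectors return the same value as the original vector, the equal-income vector returns zero, and the rank-preserving transfer lowers the index.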
The basic model can be derived using the first inequality measure, such as the Gini coefficient, Theil's entropy measure, and others. The basic model used in this paper is the Gini coefficient-based model. When the second inequality measure is added, it is desirable to derive the functional form corresponding to this second measure and to add this part to the basic model. In the applications section, the income distribution of the basic model and the distribution of the extended model will be compared.

To introduce the second inequality measure, two functional forms are considered in this paper. The first functional form is the expansion of the logarithm of the share function in terms of the Legendre polynomial series. The second functional form is the expansion of the Lorenz curve in terms of the Legendre polynomial series.

For the first functional form, the parameter of the first-order polynomial term can be derived from the Gini coefficient, and the parameter of the third-order polynomial term will be used as the second inequality measure. Note that the second-order term of the Legendre polynomial series is a symmetric function, so it cannot be used to describe a monotonically increasing function. For the second functional form, where the Lorenz curve is expanded in Legendre polynomials, the parameter of the zero-th Legendre polynomial term corresponds to the Gini coefficient, and the parameter of the first Legendre polynomial term can be used as the second inequality measure. Both forms will be explained below.

3.1 Orthonormal basis expansion of the logarithm of income share function

For the given income observations, there are many ways to approximate the functional form of the data generating model. If an orthonormal basis (ONB) expansion is applied, the parameter calculation is unaffected by the size of the series.
In comparison, the estimated parameters of the ordinary least squares regression method change their values when a new term is added in the regression series. The addition of higher-order terms in the series will allow the approximated function to converge to the data generating model. These functions with different series lengths form a complete set of income distributions corresponding to the basic model derived from the Gini coefficient. Orthonormal basis expansion allows us to superpose new terms on the basic model without disturbing the basic model.

Suppose we have a continuous share function $s(z)$ for $0 \le z \le 1$, where the poorest person is located at $z = 0$ and the richest at $z = 1$. We can approximate the logarithm of the share function with a sequence of orthonormal functions, $P_0(z), P_1(z), P_2(z), P_3(z), \ldots$ (Arfken (1985) presents an explanation of the ONB method):

$$\log s_N(z) = \sum_{n=0}^{N} a_n P_n(z) \tag{4}$$

An orthonormal sequence satisfies:

$$\int_Z P_n(z) P_m(z)\, dz = \delta_{nm}, \qquad n, m = 0, 1, 2, \ldots \tag{5}$$

where $\delta_{nm} = 1$ if $n = m$ and zero otherwise. The parameters of (4) can be found with:

$$a_m = \int P_m(z) \log s_N(z)\, dz = \int P_m(z) \left[ \sum_{n=0}^{N} a_n P_n(z) \right] dz \tag{6}$$

(see Ryu (1993) for the continuous version of ONB, and Ryu and Slottje (1996) and Milne (1949) for a discussion of the discrete version of ONB).

The orthogonal sequence $\{P_n\}$ in the space $L^2(Z)$ is called complete if there is no element $f \ne 0$ of $L^2(Z)$ which is orthogonal to all the elements of $\{P_n\}$. If:

$$\int_Z f(z) P_n(z)\, dz = 0 \quad \text{for } n = 0, 1, 2, \ldots \tag{7}$$

it follows that $f(z) = 0$ for almost all $z \in Z$.
Suppose the Legendre polynomials are used for $0 \le z \le 1$:

$$
\begin{aligned}
P_0(z) &= 1 \\
P_1(z) &= \sqrt{3}\,(2z - 1) \\
P_2(z) &= \sqrt{5}\,(6z^2 - 6z + 1) \\
P_3(z) &= \sqrt{7}\,(20z^3 - 30z^2 + 12z - 1) \\
P_4(z) &= \sqrt{9}\,(70z^4 - 140z^3 + 90z^2 - 20z + 1) \\
P_5(z) &= \sqrt{11}\,(252z^5 - 630z^4 + 560z^3 - 210z^2 + 30z - 1)
\end{aligned}
\tag{8}
$$

Fig.1 shows that $P_0(z)$ is flat and $P_1(z)$ is a linear function, but $P_n(z)$ has $n - 1$ peak values. To approximate the logarithm of the share function, the Legendre polynomials of even degree seem to be less useful because they have peak values at $z = 0$. Those of odd degree will be useful as they have their lowest values at $z = 0$ and their largest values at $z = 1$.

Consider the following basic model, which can be derived from the given Gini coefficient:

$$\log s_{Gini}(z) = a_0 + a_1 P_1(z) \quad \text{or} \quad s_{Gini}(z) = \exp[a_0 + a_1 P_1(z)] \tag{9}$$

Yitzhaki (2013) has shown that knowledge of the Gini coefficient is equivalent to knowledge of the first moment of the share function. To find the parameters of (9) from the Gini coefficient, consider:

$$a_0 + a_1 P_1(z) = a_0 + \sqrt{3}\, a_1 (2z - 1) = a_0 - \sqrt{3}\, a_1 + 2\sqrt{3}\, a_1 z = A + Bz \tag{10}$$

$$\mu_1 = \int_0^1 z\, s(z)\, dz = \frac{\int_0^1 z \exp[A + Bz]\, dz}{\int_0^1 \exp[A + Bz]\, dz} = \frac{\int_0^1 z \exp[Bz]\, dz}{\int_0^1 \exp[Bz]\, dz} = \frac{1}{1 - e^{-B}} - \frac{1}{B} = \frac{1 + \text{Gini}}{2} \tag{11}$$

where the parameter $A$ is removed with normalization of the share function. Knowledge of the Gini allows us to find $B$, $a_0$ and $a_1$ of (10). Therefore, the basic model is derived from the given Gini coefficient.
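The construction in (8) to (11) can be sketched in code. The block below checks that the polynomials in (8) are orthonormal on [0, 1] and then recovers B, a_0 and a_1 of the basic model from a given Gini coefficient. The bisection solver and the function names (`inner`, `first_moment`, `solve_B`, `basic_model`) are assumptions made for illustration; the paper does not state how the equation is solved, and the sketch assumes a Gini value strictly between 0 and about 0.99.

```python
import math

# Orthonormal shifted Legendre polynomials on [0, 1], transcribed from (8):
# (scale factor, coefficients in ascending powers of z).
LEGENDRE = [
    (1.0, [1]),
    (math.sqrt(3), [-1, 2]),
    (math.sqrt(5), [1, -6, 6]),
    (math.sqrt(7), [-1, 12, -30, 20]),
    (3.0, [1, -20, 90, -140, 70]),
    (math.sqrt(11), [-1, 30, -210, 560, -630, 252]),
]

def inner(n, m):
    """Exact integral of P_n(z) * P_m(z) over [0, 1], as in (5)."""
    sn, cn = LEGENDRE[n]
    sm, cm = LEGENDRE[m]
    total = sum(a * b / (i + j + 1)
                for i, a in enumerate(cn) for j, b in enumerate(cm))
    return sn * sm * total

def first_moment(B):
    """First moment of a share function proportional to exp(B*z) on [0, 1]."""
    if abs(B) < 1e-4:
        return 0.5 + B / 12.0  # series expansion near B = 0
    return 1.0 / (-math.expm1(-B)) - 1.0 / B

def solve_B(gini, lo=0.0, hi=200.0):
    """Bisection for B such that first_moment(B) = (1 + Gini) / 2."""
    target = (1.0 + gini) / 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if first_moment(mid) < target:  # the first moment increases with B
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def basic_model(gini):
    """Return (a0, a1, B) of model (9), with s(z) normalized to integrate to 1."""
    B = solve_B(gini)
    A = math.log(B) - math.log(math.expm1(B))  # makes exp(A + B*z) integrate to 1
    a1 = B / (2.0 * math.sqrt(3))              # B = 2*sqrt(3)*a1 in (10)
    a0 = A + math.sqrt(3) * a1                 # A = a0 - sqrt(3)*a1 in (10)
    return a0, a1, B

a0, a1, B = basic_model(0.5)  # illustrative Gini value
print(a0, a1, B)
```

Bisection is used here only because the first moment is monotone in B; any one-dimensional root finder would serve the same purpose.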
Fig.1 Plots of Legendre Polynomials [figure not reproduced; horizontal axis: z from 0.0 to 1.0; vertical axis: -3 to 3; series: P0, P1, P2, P3, P4, P5]

To consider the extreme values at the fat right tail of the share function, the following extended functional forms can be applied:

Basic model:
$$\log s_{Gini}(z) = a_0 + a_1 P_1(z) \tag{12}$$

Second order:
$$\log s_2(z) = a_0 + a_1 P_1(z) + a_2 P_2(z) \tag{13}$$

Third order:
$$\log s_3(z) = a_0 + a_1 P_1(z) + a_2 P_2(z) + a_3 P_3(z) \tag{14}$$

Fourth order:
$$\log s_4(z) = a_0 + a_1 P_1(z) + a_2 P_2(z) + a_3 P_3(z) + a_4 P_4(z) \tag{15}$$

Fifth order:
$$\log s_5(z) = a_0 + a_1 P_1(z) + a_2 P_2(z) + a_3 P_3(z) + a_4 P_4(z) + a_5 P_5(z) \tag{16}$$

The parameters can be found with:

$$a_m = \int P_m(z) \log s_N(z)\, dz \tag{17}$$

The parameter values calculated by (17) do not depend on the length of the series. For example, the $a_2$ parameters of (13), (14), (15), and (16) are the same. This is the benefit of the orthonormal function expansion. In comparison, the parameters estimated using a least squares method will fluctuate when we increase the length of the series. Therefore, we can superpose another function derived with the additional parameter on the basic Gini model without damaging the basic model.

We have assumed knowledge of a continuous function $s(z)$ and expanded the logarithmic transformation with an orthonormal basis (4), so that the parameters were found with (6) using the orthogonality of the Legendre functions. As an alternative method, suppose we do not know the functional form of the underlying share function $s(z)$. If nothing is known, the share function can be assumed to be a flat function.
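The claim that the parameters in (17) do not depend on the series length, while least squares coefficients on a non-orthogonal basis do, can be illustrated with a small sketch. The log-share curve `log_share` below is hypothetical and chosen only for illustration; the Simpson integrator and the monomial least squares fit via normal equations are likewise assumptions of this sketch rather than the authors' procedure.

```python
import math

SCALES = [1.0, math.sqrt(3), math.sqrt(5), math.sqrt(7), 3.0, math.sqrt(11)]
COEFFS = [[1], [-1, 2], [1, -6, 6], [-1, 12, -30, 20],
          [1, -20, 90, -140, 70], [-1, 30, -210, 560, -630, 252]]

def P(n, z):
    """Orthonormal shifted Legendre polynomial P_n(z) from (8)."""
    return SCALES[n] * sum(c * z ** k for k, c in enumerate(COEFFS[n]))

def integrate(f, steps=2000):
    """Composite Simpson rule on [0, 1]."""
    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def log_share(z):
    """Hypothetical increasing log-share curve (illustration only)."""
    return -6.0 + 2.0 * z + 3.0 * z ** 4

def onb_coefficients(N):
    """a_m = integral of P_m(z) * log s(z) for m = 0..N, as in (17)."""
    return [integrate(lambda z, m=m: P(m, z) * log_share(z))
            for m in range(N + 1)]

def monomial_least_squares(N):
    """Continuous least squares on 1, z, ..., z^N via the normal equations."""
    size = N + 1
    # Augmented matrix [H | rhs] with H[i][j] = integral of z^(i+j) = 1/(i+j+1).
    M = [[1.0 / (i + j + 1) for j in range(size)]
         + [integrate(lambda z, i=i: z ** i * log_share(z))]
         for i in range(size)]
    for col in range(size):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

a3 = onb_coefficients(3)        # a_0..a_3
a5 = onb_coefficients(5)        # a_0..a_5; the first four entries equal a3
c1 = monomial_least_squares(1)  # coefficients of 1, z
c2 = monomial_least_squares(2)  # coefficients of 1, z, z^2; the z term changes
print(a3, a5[:4], c1, c2)
```

Each ONB coefficient is a separate projection, so extending the series leaves the earlier coefficients untouched, whereas the coefficient on z in the monomial fit shifts as soon as z^2 is added.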
Suppose the moments of the share function are known, as follows:

$$\mu_m = \int z^m s(z)\, dz \quad \text{for } m = 0, 1, 2, \ldots, N \tag{18}$$

Then the following moments can be calculated based on (8):

$$\lambda_m = \int P_m(z)\, s(z)\, dz \quad \text{for } m = 0, 1, 2, \ldots, N \tag{19}$$

Zellner and Highfield (1988) and Ryu (1993) solved an entropy maximization problem:

$$\max_{s} W = -\int s(z) \log s(z)\, dz \tag{20}$$

subject to the moment conditions (19). Then:

$$s(z) = \exp\left[ \sum_{n=0}^{N} c_n P_n(z) \right] \quad \text{satisfying} \quad \lambda_m = \int P_m(z)\, s(z)\, dz \quad \text{for } m = 0, 1, 2, \ldots, N \tag{21}$$

If the Gini coefficient is known, this is equivalent to knowledge of $\lambda_0$ and $\lambda_1$, and so we have:

$$s(z) = \exp\left[ c_0 P_0(z) + c_1 P_1(z) \right] \tag{22}$$

which is equivalent to (12). The parameters of (22) can be determined from the given Gini coefficient, as derived in Ryu and Slottje (2017b).

Two alternative methods to approximate the share function have now been explained. The first method assumes knowledge of the continuous $s(z)$, which is expanded with a Legendre series. The second method does not assume the functional form of $s(z)$ but maximizes entropy subject to known values of moments. The derived functional forms are the same, but the parameter calculation methods are different.

As we add more terms to the series, the approximated function $\log s_N(z)$ approaches $\log s(z)$, with:

$$\int \left[ \log s_N(z) \right]^2 dz = \int \left[ \sum_{n=0}^{N} a_n P_n(z) \right]^2 dz = a_0^2 + a_1^2 + a_2^2 + \cdots + a_N^2 \tag{23}$$

Using 2016 CPS data (which will be discussed below in detail), we have:

$$a_0^2 = 27.921, \quad a_1^2 = 1.190, \quad a_2^2 = 0.0376, \quad a_3^2 = 0.1340, \quad a_4^2 = 0.0146, \quad a_5^2 = 0.0740 \tag{24}$$

where $a_0$ is used for normalization and $a_1$ is the slope term corresponding to the Gini coefficient. If we have to choose a term in addition to the basic model, then we can choose the term with the largest squared parameter value. In our case, $a_3^2$ has the largest value among the remaining terms.

Now suppose we wish to introduce a second inequality measure as a supplement to the Gini coefficient.
There are a few choices suitable for this purpose. Consider the following:

Typical model:
$$\log sh_N(z) = a_0 + a_1 P_1(z) + a_N P_N(z) \tag{25}$$

Basic model:
$$\log s_{Gini}(z) = a_0 + a_1 P_1(z) \tag{12}$$

Second order model:
$$\log s_2(z) = a_0 + a_1 P_1(z) + a_2 P_2(z) \tag{13}$$

Third order model:
$$\log sh_3(z) = a_0 + a_1 P_1(z) + a_3 P_3(z) \tag{26}$$

Fourth order model:
$$\log sh_4(z) = a_0 + a_1 P_1(z) + a_4 P_4(z) \tag{27}$$

Fifth order model:
$$\log sh_5(z) = a_0 + a_1 P_1(z) + a_5 P_5(z) \tag{28}$$

An approximated share function with the additional third-order term will be a monotonically increasing function if its slope is positive for the given values of positive $a_1$ and $a_3$:

$$\frac{\partial \log sh_3(z)}{\partial z} = \frac{\partial}{\partial z} \left[ a_0 + a_1 P_1(z) + a_3 P_3(z) \right] = 2\sqrt{3}\, a_1 + \sqrt{7}\, a_3 (60z^2 - 60z + 12) > 0 \tag{29}$$

If the monotonicity test is passed for (26), then the third-order parameter $a_3$ can be used as the second inequality measure. A similar monotonicity test can be performed for (28):

$$\frac{\partial \log sh_5(z)}{\partial z} = \frac{\partial}{\partial z} \left[ a_0 + a_1 P_1(z) + a_5 P_5(z) \right] > 0 \tag{30}$$

IV. Lorenz dominance and expansion of the basic model

Another way to understand the intuition behind our new measure is to think about it in terms of Lorenz dominance. There are many Lorenz curves which can generate the same Gini coefficient. If we expand the Lorenz curve with a Legendre polynomial series, the zero-th order parameter can be determined from the Gini coefficient. The basic model will be the second-order Legendre polynomial series with three parameters, which can be determined from two boundary conditions, $L(z = 0) = 0$ and $L(z = 1) = 1$, and the Gini coefficient. Inclusion of higher-order Legendre functions will modify the basic Lorenz curve, but all these Lorenz functions will have the same Gini coefficient due to the orthogonality of the Legendre series. A related discussion can be found in Choo and Ryu (1994).
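Returning to the monotonicity test in (29), the condition can be sketched as a small numerical check. For positive a_3 the quadratic 60z^2 - 60z + 12 is smallest at z = 1/2, where it equals -3, so the slope stays positive when a_3 is below 2√3 a_1 / (3√7). The function names are illustrative, and taking positive square roots of the squared values in (24) is an assumption of this sketch, since (24) reports only the squares.

```python
import math

def slope_third_order(z, a1, a3):
    """Derivative of a0 + a1*P1(z) + a3*P3(z) with respect to z, as in (29)."""
    return (2 * math.sqrt(3) * a1
            + math.sqrt(7) * a3 * (60 * z * z - 60 * z + 12))

def is_monotone_third_order(a1, a3, grid=1000):
    """Grid check that the slope in (29) is positive on [0, 1]."""
    return all(slope_third_order(i / grid, a1, a3) > 0
               for i in range(grid + 1))

def max_a3_for_monotonicity(a1):
    """Upper bound on a positive a3: the slope is smallest at z = 1/2."""
    return 2 * math.sqrt(3) * a1 / (3 * math.sqrt(7))

# Illustration with the 2016 values in (24); positive signs are assumed.
a1 = math.sqrt(1.190)
a3 = math.sqrt(0.1340)
print(is_monotone_third_order(a1, a3), max_a3_for_monotonicity(a1))
```

The closed-form bound and the grid check agree because the grid contains z = 1/2, where the slope attains its minimum for positive a_3.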
Suppose the Lorenz curve can be expanded through Legendre functions:

$$L_N(z) = \sum_{n=0}^{N} b_n P_n(z) \tag{31}$$

The parameters can be found from the following relation:

$$b_m = \int P_m(z) L_N(z)\, dz = \int P_m(z) \left[ \sum_{n=0}^{N} b_n P_n(z) \right] dz \tag{32}$$

The Gini coefficient determines the zero-th order parameter:

$$\frac{1 - \text{Gini}}{2} = \int_0^1 L(z)\, dz \approx \int_0^1 L_N(z)\, dz = b_0 \tag{33}$$

Notice that the above relation does not depend on the size of the series $N$, so all $L_N(z)$ will share the same Gini coefficient. The Lorenz curve should satisfy two boundary conditions:

$$L_N(z = 0) = 0 \quad \text{and} \quad L_N(z = 1) = 1 \tag{34}$$

Now using:

$$P_n(z = 0) = (-1)^n \sqrt{2n + 1} \quad \text{and} \quad P_n(z = 1) = \sqrt{2n + 1} \tag{35}$$

the second-order polynomial series, which we label as the basic model, is given as follows:

$$L_2(z) = b_0 P_0(z) + b_1 P_1(z) + b_2 P_2(z) \tag{36}$$

Suppose the Gini coefficient is known, that is, $b_0$ is known. Using the boundary conditions, $L_2(z = 0) = 0$ and $L_2(z = 1) = 1$, the parameters $b_1$ and $b_2$ can be calculated for the given Gini coefficient:

$$L_2(z) = \frac{1 - \text{Gini}}{2} + \frac{1}{2\sqrt{3}} P_1(z) + \frac{\text{Gini}}{2\sqrt{5}} P_2(z) = 3\,\text{Gini}\, z^2 + (1 - 3\,\text{Gini})\, z \tag{37}$$

This function is a nonnegative convex function if Gini $\le 1/3$: convexity requires $\partial^2 L_2(z) / \partial z^2 \ge 0$ for all $z$, and nonnegativity requires a nonnegative slope, $1 - 3\,\text{Gini}$, at $z = 0$. (i) If the Gini coefficient is greater than 1/3, (37) will not be a nonnegative convex function; (ii) if the Gini coefficient is zero, $L(z) = z$; (iii) if the Gini coefficient is 1/3, then $L(z) = z^2$.
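The properties of the basic Lorenz model (37) can be verified numerically. The sketch below evaluates (37), recovers the Gini coefficient from the area under the curve, and checks on a grid whether the curve meets the boundary conditions and is nondecreasing and convex. The helper names and the grid-based validity check are assumptions made for illustration.

```python
def lorenz_second_order(z, gini):
    """Basic Lorenz model (37): L_2(z) = 3*Gini*z^2 + (1 - 3*Gini)*z."""
    return 3.0 * gini * z * z + (1.0 - 3.0 * gini) * z

def gini_from_lorenz(L, steps=2000):
    """Gini = 1 - 2 * (area under L on [0, 1]), by the composite Simpson rule."""
    h = 1.0 / steps
    total = L(0.0) + L(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * L(i * h)
    return 1.0 - 2.0 * total * h / 3.0

def is_valid_lorenz(L, grid=1000, tol=1e-12):
    """Grid check: L(0) = 0, L(1) = 1, nondecreasing and convex."""
    v = [L(i / grid) for i in range(grid + 1)]
    ends = abs(v[0]) < 1e-9 and abs(v[-1] - 1.0) < 1e-9
    increasing = all(b - a >= -tol for a, b in zip(v, v[1:]))
    convex = all(v[i - 1] - 2 * v[i] + v[i + 1] >= -tol
                 for i in range(1, grid))
    return ends and increasing and convex

for g in (0.25, 0.45):  # illustrative Gini values on either side of 1/3
    curve = lambda z, g=g: lorenz_second_order(z, g)
    print(g, gini_from_lorenz(curve), is_valid_lorenz(curve))
```

In this sketch the Gini coefficient is recovered for both values, but only the curve with the Gini below 1/3 passes the validity check, since the other curve dips below zero near z = 0.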
The third-order polynomial series is:

$$L_3(z) = b_0 P_0(z) + b_1 P_1(z) + b_2 P_2(z) + b_3 P_3(z) \tag{38}$$

If we apply the boundary conditions $L_3(z = 0) = 0$ and $L_3(z = 1) = 1$, we have the following:

$$L_3(z) = \frac{1 - \text{Gini}}{2} + b_1 P_1(z) + \frac{\text{Gini}}{2\sqrt{5}} P_2(z) + \frac{1 - 2\sqrt{3}\, b_1}{2\sqrt{7}} P_3(z) \tag{39}$$

If $B = 1 - 2\sqrt{3}\, b_1$, rewrite (39) as:

$$L_3(z) = (1 - 3\,\text{Gini} + 5B)\, z + 3(\text{Gini} - 5B)\, z^2 + 10B z^3 \tag{40}$$

Sufficient conditions to make (40) a nonnegative convex function are:

$$B \ge 0, \quad \text{Gini} \ge 5B, \quad 1 - 3\,\text{Gini} + 5B \ge 0 \tag{41}$$

These conditions can be simplified as:

$$0 \le 5B \le \text{Gini} \le \frac{1 + 5B}{3} \tag{42}$$

This condition limits the ranges to $0 \le B \le 0.1$ and Gini $\le 0.5$. If the given data do not satisfy the above conditions, then the Lorenz curve derived by (40) may not be a nonnegative convex function. If the Gini coefficient is 0.5 and $B = 0.1$, then $L_3(z) = z^3$.

V. Applications

In order to illustrate the usefulness of the new measure, we present examples using Current Population Survey (CPS) data from 2000-2016. The CPS is sponsored jointly by the U.S. Bureau of the Census and the U.S. Bureau of Labor Statistics. The CPS produced a technical paper, TP66, which describes the design and methodology of the CPS, cf. www.bls.census.gov/cps/tp66.htm. We use CPS household income data disaggregated into centiles for the years 2000-2016. [7] The distribution of the data for each year can be summarized by the Gini index. Now using the logarithmic share function given in (26), we can calculate a secondary measure to supplement the Gini index.

In Fig.2, the approximated function converges to the observed income shares for 2016 as we increase the number of expansion terms. The Gini-based model in (12) is a basic model, and it performs poorly for the very richest income group.
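Conditions (41) and (42) on the third-order Lorenz model (40) can be expressed as a short check. The sketch below evaluates (40) directly in terms of Gini and B, tests the sufficient conditions (41), and returns the interval of B admitted by (42) for a given Gini coefficient. The function names and the Gini values used are illustrative assumptions.

```python
def lorenz_third_order(z, gini, B):
    """Third-order Lorenz model (40), written in terms of Gini and B."""
    return ((1.0 - 3.0 * gini + 5.0 * B) * z
            + 3.0 * (gini - 5.0 * B) * z ** 2
            + 10.0 * B * z ** 3)

def satisfies_conditions_41(gini, B):
    """Sufficient conditions (41) for a nonnegative convex curve."""
    return B >= 0 and gini >= 5.0 * B and 1.0 - 3.0 * gini + 5.0 * B >= 0

def admissible_B_range(gini):
    """Interval of B allowed by (42) for a given Gini, or None if it is empty."""
    low = max(0.0, (3.0 * gini - 1.0) / 5.0)
    high = gini / 5.0
    return (low, high) if low <= high else None

# Illustrative Gini values below and above the 0.5 bound implied by (42).
print(admissible_B_range(0.49))
print(admissible_B_range(0.52))
# With Gini = 0.5 and B = 0.1 the model reduces to z**3.
print(lorenz_third_order(0.5, 0.5, 0.1))
```

Because B enters (40) only through terms that integrate to zero against the constant function, every admissible B yields a curve with the same area under it, and hence the same Gini coefficient.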
The even-order models, the second-order model in (13) and the fourth-order model in (15), performed badly because the even-order Legendre polynomial terms are symmetric functions and do not fit a monotonically increasing function well. The third-order model in (14) seems to perform well, but the fifth-order model in (16) produced minor fluctuations in the middle range of the IDF.

[7] We are grateful to Martha Starr for providing these data to us.

In Fig.3, the Gini-based model produced a straight line and could not properly approximate the share values for the very poor and very rich groups. In comparison, when the third-order term is added, (26) showed an improved result for the poorest and very richest groups. In the middle ranges, slight improvements were observed.

Fig.2 Convergence of Legendre polynomials to observed log shares [figure not reproduced; horizontal axis: z from 0.0 to 1.0; series: observed log shares (2016), Gini based function, second, third, fourth and fifth order polynomials]

In Fig. 4, the performance of the third-order model of (26) is shown. Except for the very rich group, this model provided a relatively good performance. In Fig. 5, the performance of the fifth-order model of (28) is shown. Here, there is a small fluctuation around $z = 0.7$, but it produced a better performance for the richest group.

Fig.3 Approximated log shares with Gini and third order models [figure not reproduced; horizontal axis: z from 0.0 to 1.0; vertical axis: -12 to 0; series: Gini based model, Legendre third order model (26), observed log share]

Fig.4 Approximated observed shares with third order model [figure not reproduced; horizontal axis: z from 0.0 to 1.0; vertical axis: .00 to .24; series: observed shares, the third order model (26)]

Fig.5 Approximated observed shares with fifth order model [figure not reproduced; horizontal axis: z from 0.0 to 1.0; vertical axis: .00 to .24; series: observed share function, the fifth order model (28)]

In Fig. 6, we used the CPS data from the year 2000 and examined the performance of the Legendre polynomial series expansion of the Lorenz curve. To impose the convexity of an approximated Lorenz curve of a third-order polynomial series, the Gini coefficient should not be larger than 0.5, as stated below (42). The Gini coefficient for CPS data in 2000 is 0.490. The CPS data for the years 2012-2016 have Gini coefficients greater than 0.5. If the Gini coefficient is larger than 0.5, we need a higher-order Legendre polynomial series expansion instead of relying only on (39). In comparison, to impose nonnegativity and convexity on the approximated second-order Lorenz curve, the Gini coefficient should be at most 1/3, as stated below (37).

Fig.6 Approximated Lorenz curve for 2000 [figure not reproduced; both axes from 0.0 to 1.0, horizontal axis z; series: observed Lorenz curve for 2000, approximated Lorenz curve for 2000]

In Fig. 7, the movements of the Gini coefficient and the income share of the richest 5% are compared. They move more or less in the same direction, though the gap between the two curves decreased after 2012. This means the Gini coefficient is not as sensitive to extreme movement in the highest percentiles of income earners.

Fig.7 Comparison of Gini and richest 5% movements [figure not reproduced; horizontal axis: year, 1996 to 2020; series: Gini (left scale, .46 to .62), Rich5p (right scale, .26 to .42)]

Fig. 8 shows the third-order parameter ($a_3$) of an ONB expansion of the log share in (26). This parameter ($a_3$) moves in the opposite direction to the income share of the poorest 5 percent of income earners (the poor 5P curve). In 2015, the poorest 5P faced a significant loss in income share but recovered in 2016. The parameter ($a_3$) shows the opposite movements, indicating more inequality as the poorest group suffered a loss in income share. For the richest 5P and the parameter ($a_3$), a similar trend is observed, but the finer details differ.
Here, the ( 3a ) measure goes up as the richest share increases and goes down as the richest share decreases. .10 .15 .20 .25 .30 .35 .40 .45 .0016 .0020 .0024 .0028 .0032 .0036 .0040 .0044 1996 2000 2004 2008 2012 2016 2020 Year a3 (Left Scale) Rich5P (Left Scale) Poor5P (Right Scale) Fig.8 Comparison of a3(ONB), rich 5P, and poor 5P 28 Fig.9 shows the usefulness of the Gini coefficient, Theil’s entropy measure, and the third order parameter ( 3a ) in describing the movements of the poorest 5P and the richest 5P. The Gini coefficient and Theil’s measure are more or less the same in that they are both are reasonably good at describing the movement of the richest 5P. As explained in the discussion of Fig. 7, the third parameter ( 3a ) was stronger in describing the movement of the poorest 5P group’s share. .0 .1 .2 .3 .4 .5 .6 .0016 .0020 .0024 .0028 .0032 .0036 .0040 1996 2000 2004 2008 2012 2016 2020 Gini(Left Scale) Theil(Left Scale) a3(ONB Parameter, Left Scale) Rich 5P(Left Scale) Poor 5P(Right Scale) Fig.9 Compare Gini, Theil, a3(ONB), Rich 5P, and Poor 5P 29 To check the performance of the Gini, Theil, and the third parameter 3a , a curve- fitting exercise is performed where least squares estimation results are compared: 2 (0.0007662) (0.001435) 15 0.01124 0.01616 Gini+u , 0.8943P R= − = (43) 2 (0.0002582) (0.001768) 25 0.004793 0.01523 Theil , 0.8319P u R= − + = (44) 2 (0.0004459) (0.001144) (0.001157) 3 35 0.008385 0.007416 Gini 0.01025 , 0.9840P a u R= − − + = (45) 2 (0.01404) (0.02630) 45 0.3677 1.2824 Gini+u , 0.9937R R= − + = (46) 2 (0.003108) (0.02128) 55 0.1371 1.2544 Theil , 0.9957R u R= + + = (47) 2 (0.01355) (0.03475) (0.03514) 3 65 0.4111 1.4155 Gini 0.1559 , 0.9974R a u R= − + − + = (48) Equations (45) and (48) show that the poorest group and the richest group are both described well if the Gini coefficient and the third parameter 3a are used simultaneously, as these combinations provide the best fit of the data. VI. 
Conclusion

This paper introduced a new inequality measure to supplement the better-known Gini index, where the new measure is sensitive to the asymmetries and extreme values in the underlying IDF that the index is intended to measure. The inequality measurement literature contains hundreds of papers on an appropriate index of income inequality, and on what desirable properties such a measure (or index) should contain.

There is a concurrent literature on the use of hypothetical statistical distributions to approximate and describe an observed distribution of incomes. Even though some authors recognize that incomes are distributed with asymmetric higher moments, inequality indices constructed to capture the level of inequality inherent in these observed income distributions (with a single number) are generally based on the mean and variance of the observed data. This paper introduced a new inequality measure to supplement, but not to replace, the Gini coefficient that measures more accurately the inherent asymmetries and extreme values that are present in observed income distributions.

The new measure is based on a third-order term of a Legendre polynomial from the logarithm of a share function (or a first-order term of a Lorenz curve). In this paper, we advocated using the two measures together to provide a better description of inequality inherent in empirical income distributions with extreme values. We applied the new measure to examine inequality in U.S. CPS household income data for 2000-2016 in income centiles. The new measure was shown to be an excellent supplement to the Gini coefficient. The Gini index provides an intuitive overall measure of the inequality inherent in an IDF. Changes in the level of inequality inherent in the empirical IDF (particularly for the extreme portions of the IDF) were detected more accurately by the new measure than by calculating the Gini index alone.
References

Arfken, G. (1985), Mathematical Methods for Physicists, third edition, Academic Press, San Diego.
Basmann, R. and D. Slottje (1987), "A new index of income inequality," Economics Letters, 24: 385-389.
Basmann, R., K. Hayes, and D. Slottje (1991), "The Lorenz curve and the mobility function," Economics Letters, 35: 105-111.
Boushey, H., J. Delong, and M. Steinbaum (2017), After Piketty, Harvard University, Cambridge, MA.
Choo, H. and H. Ryu (1994), "Gini coefficient, Lorenz curves, and Lorenz dominance effect: An application to Korean income distribution data," Journal of Economic Development, 19(2): 47-65.
Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag.
Cowell, F. (2011), Measuring Inequality, 3rd edition, Oxford University Press, Oxford.
Cowell, F. and E. Flachaire (2002), "Sensitivity of Inequality Measures to Extreme Values," LSE STICERD Paper No. DARP 60.
Cowell, F. and E. Flachaire (2007), "Income Distributions and Inequality Measurement: The Problem of Extreme Values," Journal of Econometrics, 141: 1044-1072.
Maasoumi, E. (1986), "The Measurement and Decomposition of Multidimensional Inequality," Econometrica, 54: 991-998.
Maasoumi, E. (1989), "Continuously Distributed Attributes and Measures of Multivariate Inequality," Journal of Econometrics, 42: 131-144.
McDonald, J.B. (1984), "Some Generalized Functions for the Size Distributions of Income," Econometrica, 52: 647-663.
McDonald, J., J. Sorenson, and P. Turley (2013), "Skewness and Kurtosis Properties of Income Distribution Models," Review of Income and Wealth, 59: 360-374.
Milne, W. (1949), Numerical Calculus, Princeton University Press, Princeton.
Pareto, V. (1876), Cours d'Économie Politique Professé à l'Université de Lausanne.
Piketty, T. (1995), "Social Mobility and Redistributive Politics," Quarterly Journal of Economics, 110: 551-584.
Piketty, T.
(2014), Capital in the Twenty-First Century, Harvard University Press, Cambridge.
Ryu, H. (1993), "Maximum entropy estimation of density and regression functions," Journal of Econometrics, 56: 397-440.
Ryu, H. (2013), "A bottom poor sensitive Gini coefficient and maximum entropy estimation of income distributions," Economics Letters, 118: 370-374.
Ryu, H. and D. Slottje (1996), "Two Flexible Functional Form Approaches for Approximating the Lorenz Curve," Journal of Econometrics, 72: 251-274.
Ryu, H. and D. Slottje (1998), Measuring Trends in U.S. Income Inequality, Theory and Applications, Springer, New York.
Ryu, H. and D. Slottje (2017), "Maximum Entropy Estimation of Income Distributions from Basmann's WGM Class," Journal of Econometrics, 199(2): 221-231.
Slottje, D. (1987), "Relative Price Changes and Inequality in the Size Distribution of Various Components of Income," Journal of Business and Economic Statistics, 5: 19-26.
Yitzhaki, S. (2013), "More than a dozen ways of spelling Gini," Chapter 2 in The Gini Methodology, Springer, 11-13.
Zellner, A. and R. Highfield (1988), "Calculation of maximum entropy distributions and approximation of marginal posterior distributions," Journal of Econometrics, 37: 195-209.

Appendix: Pigou-Dalton Principle (PDP) for model (26)

The logarithm of the share function can be expanded in the Legendre series:

$$\log s_N(z) = a_0 P_0 + a_1 P_1 + a_2 P_2 + a_3 P_3 + \cdots + a_N P_N \tag{4}$$

Suppose we want to summarize income inequality with only a Gini coefficient. This corresponds to taking the basic Gini model (12), because higher-order Legendre polynomials do not influence the choice of $a_0$ and $a_1$:

Basic model:

$$\log s_{Gini}(z) = a_0 + a_1 P_1(z) \tag{12}$$

The Gini coefficient can be determined from $a_1$ and vice versa, as discussed in (11). Even if we include the higher-order terms of (4), $a_1$ will be the same in (4) and (12).

Now, to prove that the PDP condition holds for our new measure, suppose $i < j$ and $s(z_i) < s(z_j)$.
After a transfer of a small income share ($\Delta$) from the $j$th person to the $i$th person, the new income shares of these two people become $s(z_i) + \Delta$ and $s(z_j) - \Delta$. This means the slope of $\log s(z)$ is now lower. Thus $a_1$ and the Gini coefficient are lower, and $\int [\log s_N(z)]^2 dz$ has decreased. If $\int [\log s_{Gini}(z)]^2 dz$ is a good approximation of $\int [\log s_N(z)]^2 dz$, then $a_0^2 + a_1^2$ will decrease because we have:

$$\int [\log s_{Gini}(z)]^2 dz = a_0^2 + a_1^2 \tag{A1}$$

In the standard discussion, an income transfer from a rich person to a poor person is described by a lower value of the Gini coefficient, but here the same effect is represented by lower values of $\int [\log s_{Gini}(z)]^2 dz$ and $a_0^2 + a_1^2$.

Similarly, if the logarithm of the share function is approximated with the first-order and third-order Legendre polynomials, then the logarithm of the share function is summarized by the ONB parameters $a_1$ and $a_3$. For the third-order model:

$$\log sh_3(z) = a_0 + a_1 P_1(z) + a_3 P_3(z) \tag{26}$$

The parameters $a_1$ of (12) and (26) are the same, and can be derived from the given Gini coefficient. If the income share transfer decreases $\int [\log s_N(z)]^2 dz$, and if $\int [\log sh_3(z)]^2 dz$ is a good approximation of $\int [\log s_N(z)]^2 dz$, then the income share transfer lowers $a_0^2 + a_1^2 + a_3^2$:

$$\int [\log sh_3(z)]^2 dz = a_0^2 + a_1^2 + a_3^2 \tag{A2}$$

Therefore, a transfer of the kind described by the PDP decreases $a_0^2 + a_1^2 + a_3^2$, which completes the proof.
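
The identity in (A2) can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the ONB consists of shifted Legendre polynomials scaled to unit norm on [0, 1], so that $P_0 = 1$; the coefficient values and the function names (`onb_legendre`, `onb_coefficients`, `squared_norm`) are hypothetical and chosen only for illustration. The sketch builds a log share function of the form (26), recovers its coefficients by projection onto the basis, and compares the integral of the squared log share with the sum of squared coefficients.

```python
import numpy as np
from numpy.polynomial import legendre


def onb_legendre(n, z):
    """Shifted Legendre polynomial of order n, scaled to unit L2 norm on [0, 1]."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    # Standard Legendre polynomial evaluated at x = 2z - 1; its squared norm on
    # [0, 1] is 1 / (2n + 1), hence the sqrt(2n + 1) scaling (an assumed convention).
    return np.sqrt(2 * n + 1) * legendre.legval(2.0 * z - 1.0, coeffs)


def unit_interval_quadrature(n_nodes=64):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    x, w = legendre.leggauss(n_nodes)
    return 0.5 * (x + 1.0), 0.5 * w


def onb_coefficients(log_share, max_order, n_nodes=64):
    """Project a log share function onto the basis: a_n = integral of log s(z) P_n(z) dz."""
    z, w = unit_interval_quadrature(n_nodes)
    values = log_share(z)
    return np.array(
        [np.sum(w * values * onb_legendre(n, z)) for n in range(max_order + 1)]
    )


def squared_norm(log_share, n_nodes=64):
    """Integral of [log s(z)]^2 over [0, 1]."""
    z, w = unit_interval_quadrature(n_nodes)
    return float(np.sum(w * log_share(z) ** 2))


# Hypothetical coefficients (a0, a1, a2, a3); a2 = 0 mimics the third-order model (26).
a_true = np.array([-0.30, 0.80, 0.00, 0.10])


def log_share(z):
    """Illustrative log share function built from the hypothetical coefficients."""
    return sum(a * onb_legendre(n, z) for n, a in enumerate(a_true))


a_hat = onb_coefficients(log_share, max_order=3)
lhs = squared_norm(log_share)        # left-hand side of (A2)
rhs = float(np.sum(a_hat ** 2))      # a0^2 + a1^2 + a3^2 (a2 is zero here)

print("recovered coefficients:", np.round(a_hat, 6))
print("integral of squared log share:", round(lhs, 6))
print("sum of squared coefficients:", round(rhs, 6))
```

Under these assumptions the two printed totals agree up to quadrature error, which is the orthonormality property that (A1) and (A2) rely on. With observed centile data, the projection step would have to use a discrete approximation of the integral instead of the exact quadrature used here.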