This post originally appeared, in 2008, on an earlier version of *LaMarotte*. It is reprinted here without changes.
* * *

The Gini Coefficient measures income inequality in a country, or any region, using a single number. It was created by Corrado Gini, an Italian statistician, in 1912. I’ve known of it for a long time, and it has irritated me at right regular intervals. Why? When I was a junior in business analysis, one of my wise elders taught me that any number people produce in such analysis should come with all of the data necessary to replicate it with a simple calculator. My guru, who then labored as the chief of statistics for Anheuser-Busch, carried his calculator strapped to his suit-belt. He practiced what he preached. But when we look for an explanation of the Gini, we’re bowled over by references to the Lorenz curve and presented with stuff that looks like this:

$$G = 1 - 2\int_{0}^{1} L(X)\,dX$$

The last time this happened to me (yesterday) my irritation produced a determined search for a simple explanation. Therefore I am now prepared to explain the Gini Coefficient in plain language, namely how it is actually calculated and how the data must be arrayed for the calculation. For this explanation I’ll use the following chart, which is based on U.S. household data for 2008.

The raw data for this chart, which I’ve taken from this Census Bureau site, shows the cumulative share of household income as we proceed from the poorest toward all households, thus from the lowest fifth of all households up the line until all households are included. That is the blue line. Here is the way we must read the chart. The lowest 20 percent of households (the lowest quintile) accounts for 3.5 percent of total income. The lowest 40 percent of households (the lowest and second quintiles cumulated) account for 12.1 percent of total income… And so on to the last column where, surprise, *all* households account for *all* of the income. Clear so far?

The blue line represents actual results for 2008. The red line, by comparison, shows what the results would be if every group earned exactly the same amount. Not surprisingly, 60 percent of all households, then, would be earning 60 percent of the total income. This is not rocket science either.
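The two lines can be written down as plain lists of numbers. The following is a minimal sketch, assuming only the five cumulative percentages quoted in this post; the variable names are made up for illustration, and the per-quintile shares are simply recovered by subtracting each cumulative value from the next.

```python
# Cumulative share of total household income (percent), from the lowest
# quintile upward, as quoted in this post for 2008 (the blue line).
actual_cumulative = [3.5, 12.1, 26.7, 50.9, 100.0]

n = len(actual_cumulative)  # five groups, because we use quintiles

# The red line: if every group earned the same amount, the lowest k of n
# groups would hold k/n of all income.
equal_cumulative = [100.0 * k / n for k in range(1, n + 1)]

# Share held by each quintile on its own, recovered by differencing the
# cumulative values (rounded to one decimal, like the source figures).
previous = [0.0] + actual_cumulative[:-1]
per_group = [round(cur - prev, 1) for prev, cur in zip(previous, actual_cumulative)]

# Vertical gap between the ideal and the actual line at each point.
gap = [round(e - a, 1) for e, a in zip(equal_cumulative, actual_cumulative)]

print(equal_cumulative)
print(per_group)
print(gap)
```

The gap is widest in the middle of the chart and closes to zero at the last column, where both lines reach 100 percent.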

Notice now the area surrounded by these two curves. It represents the *inequality* in income, thus the difference between an *ideal* and an *actual* state of affairs. What the Gini coefficient (also called an Index or a Ratio) actually calculates and reduces to a single number is the magnitude of this difference. I’ll present the formula and how its elements are obtained. I have not penetrated deeply enough to explain the formula itself.

To begin with, we make note of the last number, 100 percent in our case. We’ll call that T for Total. Next, we calculate a value called Sigma. It is the sum of all of the cumulative percentages up to but excluding the last. In our case that is 3.5 + 12.1 + 26.7 + 50.9 = 93.2. That is Sigma. Finally we note the number of groups we used in the analysis. We used quintiles, therefore we used five groups. We generalize that number by calling it n. Now we insert these values into the formula used to obtain the Gini Ratio. That formula is:

Gini = 1 – ((2 / T) × Sigma + 1) / n, that is, 1 minus the quantity (2 divided by T, times Sigma, plus 1) divided by n.

Translated into numbers, this means Gini = 1 – (2/100 * 93.2 + 1) / 5. The part in parentheses is 1.864 + 1 = 2.864; divided by 5 that is 0.5728; and 1 – 0.5728 = 0.4272. That’s the Gini Ratio. You may encounter it multiplied by 100 for easier readability (here 42.7).
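One way to connect this calculator formula to the integral shown at the top is the following sketch. It assumes the integral is approximated by the trapezoid rule on the points of the blue line; that is a plausible reading, not something the sources consulted for this post state. The symbol $L_i$ (the cumulative share of the lowest $i$ groups, as a fraction of $T$) is introduced here only for illustration, with $L_0 = 0$ and $L_n = 1$.

$$\int_{0}^{1} L(X)\,dX \;\approx\; \sum_{i=1}^{n} \frac{1}{n}\cdot\frac{L_{i-1} + L_i}{2} \;=\; \frac{1}{2n}\left(2\sum_{i=1}^{n-1} L_i + 1\right) \;=\; \frac{1}{2n}\left(\frac{2\,\Sigma}{T} + 1\right)$$

$$G \;=\; 1 - 2\int_{0}^{1} L(X)\,dX \;\approx\; 1 - \frac{\frac{2}{T}\,\Sigma + 1}{n}$$

With $T = 100$, $\Sigma = 93.2$ and $n = 5$, the area under the blue line comes to about $0.2864$, and $1 - 2 \times 0.2864 = 0.4272$, the same value as above.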

If we apply the same approach to the top (red) line, we have T = 100 and Sigma = 20 + 40 + 60 + 80 = 200, and the formula becomes Gini = 1 – (2/100 * 200 + 1) / 5. This results in 0. In the ideal case, in other words, there is zero inequality.
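For anyone who prefers a few lines of code to a calculator, the two calculations above can be sketched as follows. The function name `gini_from_cumulative` is hypothetical, and the sketch assumes the input is a list of cumulative percentages for equal-sized groups, ordered from poorest to richest.

```python
def gini_from_cumulative(cumulative):
    """Gini ratio from cumulative income shares of equal-sized groups.

    Implements Gini = 1 - ((2 / T) * Sigma + 1) / n as described in the text.
    """
    total = cumulative[-1]        # T: the last number (100 percent here)
    sigma = sum(cumulative[:-1])  # Sigma: every number except the last
    n = len(cumulative)           # number of groups (5 for quintiles)
    return 1 - (2 / total * sigma + 1) / n


# Blue line: actual 2008 cumulative shares quoted in this post.
actual = gini_from_cumulative([3.5, 12.1, 26.7, 50.9, 100.0])

# Red line: every group earns exactly the same amount.
equal = gini_from_cumulative([20.0, 40.0, 60.0, 80.0, 100.0])

print(round(actual, 4))
print(round(equal, 4))
```

The first value rounds to 0.4272 and the second to 0, matching the hand calculations.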

Having followed this procedure, we have now generated a single number for each curve and we can therefore compare them. The rule here is:

*The lower the Gini, the more equal is the income distribution.* It can’t get any lower than zero, and it can never exceed 1. A result of 1 would mean that a single group has *all* the income and nobody else has *any*.

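A small caveat can be checked with the same arithmetic. Under the grouped formula used in this post, the most lopsided case (the top group holds everything) works out to 1 – 1/n, so with quintiles the ceiling is 0.8, and the value only approaches 1 as the number of groups grows. The sketch below assumes the same hypothetical `gini_from_cumulative` helper and a made-up `most_unequal` function for building the extreme case.

```python
def gini_from_cumulative(cumulative):
    """Gini = 1 - ((2 / T) * Sigma + 1) / n for equal-sized groups."""
    total = cumulative[-1]
    sigma = sum(cumulative[:-1])
    n = len(cumulative)
    return 1 - (2 / total * sigma + 1) / n


def most_unequal(n):
    """Cumulative shares when n - 1 groups earn nothing and one earns all."""
    return [0.0] * (n - 1) + [100.0]


# Quintiles, deciles, and percentiles: the ceiling rises toward 1.
ceilings = {n: gini_from_cumulative(most_unequal(n)) for n in (5, 10, 100)}

for n, value in ceilings.items():
    print(n, round(value, 4))
```

So a Gini computed from only five groups somewhat understates inequality compared with one computed from finer data, which is worth keeping in mind when comparing figures from different sources.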
Let me follow this up by looking at the Gini Ratio over some period of time. The following graph (its source is here) does that for us for the period 1967 to 2007.

Income inequality, although it rises and falls year to year, has been increasing steadily over the forty-year history presented above. The Gini is especially useful at this level of macro analysis. It holds a vast amount of detail in a single number. And now that I know how it is obtained, I find it much more acceptable. [For an updated Gini Ratio to 2013, see the previous post here, same date.]

An additional note. Country-to-country comparisons using Gini calculations are interesting but not much more than that. Several organizations (the CIA and UN are two) calculate this number for many countries. The U.S. falls generally into the upper ranges of inequality, but not at the very top. To find the peaks, we can single out Brazil and Mexico. China? China’s inequality is just about the same as ours. And Japan’s falls below ours. Bulgaria is hugging the bottom range, at least in the list of countries shown in **this** Wikipedia chart.