Tuesday, January 29, 2008

I Dream of Gini

Once again, please bear with me on this one, as sometimes it's necessary to endure a tedious first half to fully understand the more interesting second half. Those of you who've seen 'Cloverfield' know what I'm talking about.

I begin by introducing the Gini Index. Developed in the early 1900s by Italian mathematician Corrado Gini, the Gini Index is a statistic oft cited by the UN and other such organizations to measure the inequality of income distribution in countries around the world. It's calculated by taking everybody in a given society, lining them up from left to right in order of increasing income, cumulating their income as you go down the line, and measuring how much this running total differs from what the running total would be if the society's collective income were evenly distributed among all members. The end result is a value between 0 and 100, where 0 is perfect equality (everybody has exactly the same income) and 100 is perfect inequality (the collective income of the entire society belongs to only one member, and everybody else makes zero). If you want more details on calculating the Gini Index, click here.
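For the code-inclined, here is a minimal Python sketch of the line-them-up-and-cumulate procedure described above. The function name `gini_index` is my own, and this is one common textbook form of the calculation, not necessarily the exact formula behind the numbers in this post. One caveat: with a small group of n members, this form tops out at 100 × (n − 1) / n rather than a full 100, so "perfect inequality" only reaches 100 as the group gets large.

```python
def gini_index(values):
    """Return a Gini Index on a 0-100 scale for a list of non-negative amounts.

    Illustrative sketch only: the name and the exact formula are assumptions,
    not taken from the original post.
    """
    amounts = sorted(values)  # line everybody up in order of increasing income
    n = len(amounts)
    total = sum(amounts)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero amount")

    # Walk down the line, keeping a running total, and add up the running totals.
    running = 0
    sum_of_running_totals = 0
    for amount in amounts:
        running += amount
        sum_of_running_totals += running

    # Compare against the evenly distributed case, where this expression is 0.
    return 100 * (n + 1 - 2 * sum_of_running_totals / total) / n
```

With made-up inputs, `gini_index([5, 5, 5, 5])` gives 0.0 (everybody equal), while `gini_index([0, 0, 0, 10])` gives 75.0, the maximum possible for a group of four.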

For those readers who don't use the Gini Index on a day-to-day basis (what rock do you live under?), below is a sampling of the Gini values for income from selected countries:

Country:        Gini Index:
Denmark         24.7 (lowest in world)
India           32.5
United States   40.8
China           44.7
Brazil          58.0
Namibia         74.3 (highest in world)

So Denmark is the country with the most evenly distributed earning power among its citizens. Namibia, on the other hand, has the largest imbalance, with a huge share of income concentrated among very few citizens. The United States lies somewhere in the middle.

Now that your Gini value processor is somewhat calibrated, let's employ the Gini Index to things it was never intended to be used for, such as the home run distribution of baseball teams. If we treated baseball teams as countries and home runs as income, we could quantify how much of a team's power is concentrated among the few or how evenly it is spread across the lineup.

Here are the 30 MLB teams, ranked in order from most evenly distributed to least evenly distributed (most concentrated) home run hitting. I've also added a column "HR Total Rank" to indicate how the teams ranked in terms of total home runs hit:

HR Distribution Rank:   Team:              HR Total Rank:   HR Distribution (Gini Index):  Similar to Income Distribution of:
1 (most distributed)    Texas              8                30.8                           Netherlands
2                       Atlanta            12               36.0                           Italy
3                       Baltimore          23               37.2                           Vietnam
4                       Seattle            20               37.8                           Latvia
5                       Oakland            13               37.9                           Jamaica
6                       Detroit            13               38.2                           Portugal
7                       Boston             18               39.1                           Israel
8                       Kansas City        30 (fewest HRs)  39.4                           Burkina Faso
9                       Pittsburgh         22               39.5                           Morocco
10                      LA Dodgers         26               40.3                           Trinidad and Tobago
11                      Cleveland          9                41.1                           United States
12                      San Diego          14               41.6                           Senegal
13                      Arizona            15               41.9                           Thailand
14                      Tampa Bay          7                42.8                           Iran
15                      NY Yankees         4                43.4                           Hong Kong
16                      Milwaukee          1 (most HRs)     44.2                           Venezuela
17                      Cincinnati         3                44.3                           Cameroon
18                      Washington         27               44.4                           Ivory Coast
19                      Toronto            19               44.6                           China
20                      NY Mets            11               45.6                           Rwanda
21                      St Louis           24               46.4                           Philippines
22                      Florida            5                47.6                           Mexico
23                      Colorado           16               48.0                           Madagascar
24                      San Francisco      25               48.5                           Malaysia
25                      LA Angels          28               50.3                           Gambia
26                      Houston            17               50.4                           Malawi
27                      Philadelphia       2                50.7                           Niger
28                      Chicago Cubs       21               52.5                           Argentina
29                      Chicago White Sox  21               53.4                           Chile
30 (least distributed)  Minnesota          29               62.5                           Sierra Leone

So the Texas Rangers are the Denmark of Major League Baseball (although statistically they are closer to the Netherlands), topping the list of most evenly distributed HR production. Their top 6 players account for just over half of the team's home runs. Contrast that with Minnesota, where it takes only their top 2 guys (Torii Hunter and Justin Morneau) to account for half. The Dodgers rank 10th on the list, with nobody demonstrating great power but with 7 guys producing moderate power. They boast the distinction of being the Trinidad and Tobago of baseball.
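The "top 6 players" versus "top 2 guys" comparison above can be reproduced with a few lines of Python. This is only a sketch: the function name `players_needed_for_share` is hypothetical, and any home run counts you feed it are your own inputs, not data from this post.

```python
def players_needed_for_share(hr_counts, share=0.5):
    """Return how many of a team's top HR hitters it takes to reach `share` of the team total.

    Illustrative sketch only: the name and signature are assumptions.
    """
    total = sum(hr_counts)
    if total == 0:
        raise ValueError("team has no home runs")

    running = 0
    # Start with the biggest hitter and work down the lineup.
    for count, hr in enumerate(sorted(hr_counts, reverse=True), start=1):
        running += hr
        if running >= share * total:
            return count
    return len(hr_counts)
```

For a made-up lineup of `[30, 25, 10, 8, 7]`, the top two hitters already cover half of the 80 home runs, so the function returns 2. A perfectly balanced made-up lineup of six players with 10 apiece returns 3.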

Another thing to note is that there's no obvious correlation between HR total and HR distribution. While Minnesota ranked at or near the bottom in both categories, many of the best power teams (Milwaukee, Cincinnati, NY Yankees) and the worst (Washington, LA Dodgers, Kansas City) congregated around the middle of the HR distribution rankings, as Orel pointed out.

Well that's all the insight I have for now...if you have any, please share. Thanks for reading through.


Orel said...

This is cool. Is there a version of this stat that factors in total homers?

Eric Karros said...

Not to my knowledge. I suspect it'd be tough to meaningfully fit the two on a common scale. All else being equal, I think most would agree that for total homers, the higher the better. But for homer distribution, all else being equal, it's not as clear which is better, being more dispersed or more concentrated. It'd be like trying to combine mean and standard deviation into one stat. There's probably something related out there in the world of math.

Orel said...

Maybe we could see how Gini correlates with winning percentage—do teams that share the HR load perform better?

Eric Karros said...

well, winning % is probably dominated by too many other factors, as even HR total doesn't show a clear correlation.

here's a study I'd like to see but don't have the diligence to conduct: over the past 10 years or so, for teams at a given total HR level, is there a correlation btwn total runs scored and HR dispersion? however, one of the downsides of being too concentrated wouldn't be accounted for here, namely what happens if one of your top HR hitters goes down, cuz the games he misses wouldn't be factored into his stats anyways.

Anyways I am heading to airport will try to check in from time to time over next 2 wks -

Steve Sax said...

Are the Dodgers the only team which had a former player name that matched their country analogy (Trinidad Hubbard)?

Orel said...

Did Ruben Sierra ever play for the Twins?

Eric Karros said...

I don't think Israel Valdez ever played for Boston nor Fernando Venezuela for Milwaukee, so I think so