What makes a man turn neutral? Lust for gold? Power? Or were you just born with a heart full of neutrality?
- Zapp Brannigan, Futurama
I'm an avid fan of r/dataisbeautiful, and I've seen plenty of US map based visualizations over time. I've always wanted to create something spectacular that would stand out amongst all these beautiful maps, but I never got the inspiration until now.
While keeping an eye on the US elections like everyone else, I noticed something curious. Most large cities were voting Democrat, even in hardcore red states.
I've been meaning to learn R for quite some time now, so I thought this would be the perfect chance to get started. After a quick, painless installation, I am ready to go.
My goal here is to make a map that shows how each county voted in the years past and to then compare that to the county's population density.
Firstly, I need to plot the map of the US. Turns out, there's a pretty comprehensive package for that.
usmap by Paolo Di Lorenzo is built specifically for the purpose of plotting choropleths on the US map. Yeah, I had to look up what that means.
It's at this point that I realize I've got my priorities messed up. I first need to learn R. A crash course is in order. Bill Petti tells me everything I need to know. Now, I'm truly ready to go.
The usmap dataset has the FIPS codes that are necessary to plot a US choropleth. But before getting at that data, there was some data cleaning that I had to do due to the different naming schemes followed by usmap and the US Census. Some counties also changed after the last census. Shannon County, SD changed its name to Oglala Lakota County in 2015, and Bedford County, VA merged with the previously independent city of Bedford in a very confusing administrative restructure. Thank me when you win Trivia Night.
Due to the way the data differed, cleaning it proved to be easier in Excel. So, I exported the data as a CSV, cleaned it, and reimported it before proceeding.
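For anyone who would rather stay in R, the two county changes mentioned above boil down to a FIPS recode. What follows is only a minimal sketch, not what I actually did: it assumes the data is keyed by 5-digit FIPS strings, the `counties` data frame and its `pop` values are made up for illustration, and the FIPS codes are the ones I believe the Census uses for these counties.

```r
# Old FIPS code -> current FIPS code (assumed mapping for the two changes above)
fips_recode <- c(
  "46113" = "46102",  # Shannon County, SD -> Oglala Lakota County, SD
  "51515" = "51019"   # Bedford (independent city), VA -> Bedford County, VA
)

# Hypothetical input; the population figures are invented for illustration
counties <- data.frame(
  fips = c("46113", "51515", "51019", "36061"),
  pop  = c(100, 50, 200, 1000),
  stringsAsFactors = FALSE
)

# Replace outdated codes, leaving every other row untouched
hit <- counties$fips %in% names(fips_recode)
counties$fips[hit] <- unname(fips_recode[counties$fips[hit]])

# Rows that now share a FIPS code (the Bedford merger) are summed together
merged <- aggregate(pop ~ fips, data = counties, FUN = sum)
print(merged)
```

The lookup vector keeps the recode in one place, so any further renames would just be extra entries.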
Let's see how that looks.
Well, that was bad. It's probably because the population density is extremely right-skewed. There are a few small counties with ridiculously high population densities (NYC, I'm looking at you), while the majority of the counties have fewer than 100 people/sq. mile. The former don't show up because they are tiny. The skew can easily be fixed by logarithmic scaling. So let's try that.
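To see why the log helps, here is a small base R sketch. The densities are made-up values chosen only to mimic the skew, and `linear_pos` and `log_pos` are hypothetical names for where each county would land on a 0 to 1 colour scale.

```r
# Hypothetical densities (people per sq. mile); invented for illustration
density <- c(0.5, 3, 12, 45, 80, 150, 900, 4000, 70000)

# Position on a linear colour scale: one outlier squashes everything else near 0
linear_pos <- (density - min(density)) / (max(density) - min(density))

# Position on a log10 colour scale: the values spread out across the range
log_density <- log10(density)
log_pos <- (log_density - min(log_density)) /
  (max(log_density) - min(log_density))

print(round(data.frame(density, linear_pos, log_pos), 3))
```

On a ggplot2-based map, the same effect can usually be had by putting a log10 transform on the fill scale, for example `scale_fill_continuous(trans = "log10")`.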
We can see the coastal areas are densely populated, while nobody wants to live in the rural Midwest, Utah, Wyoming, or Alaska. Now that that works, we can move on to plotting how each county voted.
What's with Alaska, you ask? Maybe they are just religiously neutral? Maybe they vote white because they love the snow? This led me into another rabbit hole where I learnt that Alaska has boroughs and census areas instead of counties. But, for elections, they have voting districts, and these coincide with their boroughs in only 3 cases: Aleutians East, Aleutians West, and Anchorage, all of which are visible on the map. I've decided that that's more than enough detail for the scope of this blog, so I'm going to pretend Alaska doesn't exist for a while.
Now to see if we can make an animated map with all the results from 2000 - 2016.
gganimate is a wonderful library that helps animate ggplot-based graphs (usmap is built on ggplot).
There is a workaround to this that involves increasing the memory available to R using memory.limit(), but my computer freaked out when I did that. It seems that R runs entirely in RAM and hence has trouble with large animations. So, I decided to tone it down.
Why does it darken at the end of every year? I don't know. Why is most of Alaska missing? I don't know. Why is the animation not smooth? I don't know. Actually, I do. 20 fps. It doesn't matter though because I'm not exactly being sponsored by Disney here.
Now, to overlay population density data on this. My idea is to set alpha values based on the population density, so high density counties show up brighter. Unfortunately, I simply can't see a way to do that. My search led me to a StackOverflow question, but the solution provided there throws an error. After much trial and error, and banging my head against a wall for a ridiculous amount of time, I've decided that the grapes are sour and this is not the best way to check if population density has anything to do with voting habits.
I opt for a simpler method: statistical analysis. Our null hypothesis is that voting habits and population density are independent. I feel that a chi-squared test would suit us just fine, so I group the counties into categories based on population density. For lack of better terms, let's call them the wilderness (0-10 people/sq. mile), villages (10-100), towns (100-1000), cities (1000-10,000), megacities (10,000-60,000), and NYC (way too many).
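The grouping and the test can be sketched in base R roughly like this. It is only an illustration under loud assumptions: the data is simulated (none of it is real election data), the toy model simply makes a Democratic win more likely as density rises, and the names `density`, `winner`, and `category` are my own placeholders.

```r
# Simulated stand-in for county data; every value here is made up
set.seed(1)
n <- 600
density <- 10^runif(n, min = -1, max = 4.7)   # people per sq. mile

# Toy assumption: the chance of a Democratic win rises with log density
p_dem  <- plogis(log10(density) - 2.5)
winner <- ifelse(runif(n) < p_dem, "Democrat", "Republican")

# Bin density into the categories described above
breaks <- c(0, 10, 100, 1000, 10000, 60000, Inf)
labels <- c("wilderness", "villages", "towns", "cities", "megacities", "NYC")
category <- cut(density, breaks = breaks, labels = labels, right = FALSE)

# Contingency table of category vs. winner; drop categories with no counties
counts <- table(category, winner)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]

# Chi-squared test of independence between density category and winner
result <- chisq.test(counts)
print(counts)
print(result$p.value)
```

`right = FALSE` makes each bin include its lower bound, so a county at exactly 10 people/sq. mile counts as a village rather than wilderness. In this simulated data the NYC bin is empty and gets dropped before the test.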
With p-values that close to zero, our null hypothesis can be safely rejected. We can say with a high degree of confidence that voting habits and population density are associated. High density areas lean heavily Democrat. The 9 most densely populated cities in the US all consistently voted Democrat from 2000 to 2016. Of the 40 most densely populated counties, 29 consistently voted blue. In fact, the first and only city in that list to consistently vote Republican is Richmond, Virginia, which is the 29th most densely populated city in the US. The data backs up my guess.
Are crowds inherently blue, though? Does living in a city magically make you a liberal? Or does being a liberal make you move to cities? It is far more likely that there are some other variables at play here that affect both these factors, and hence the high correlation. Maybe educated folk tend to lean left and tend to live in cities. Maybe people with exposure to different cultures tend to be more progressive and enjoy living in cities. Or maybe the economic policies of the Democrats help city folk while Republican policies help rural folk. Future posts will delve deeper into this topic.
Let me end by reiterating the golden rule of statistics: correlation doesn't imply causation. We've shown that population density and voting habits are strongly associated (at least in the US), not that high population density makes people vote blue. If that were the case, the Dems would need only stuff people in a small room to gain votes.
Mission accomplished, I'd say. Despite not having a beautiful map to show off, I've learnt R. And in the end, isn't the journey all that matters?