Andrew Gelman over at FiveThirtyEight.com purports to show how whites at various income levels voted for president in 2008, which is an interesting enough concept. However, Gelman inexplicably avoids using exit poll data to make his charts. Instead, he bizarrely chose to make broad assumptions about race in this country (a touchy subject under the best of circumstances) by using pre-election polling data from Pew as the basis for his regression analysis.
To reiterate, since I could hardly believe it myself: Gelman based an entire study of the 2008 white vote not on, you know, election results, but on pre-election national telephone polling.
Originally I theorized that Gelman relied on the Pew data because it broke income down into granular levels (under $20K, $20-40K, $40-75K, $75-150K, over $150K) and provided racial breakdowns for those categories. But looking closer, I found that 1) Pew didn't break down those categories by race; Gelman filled in the blanks with statistical sleight of hand, and 2) the exit polling had eight income categories, even more granular than Pew's. What's more, the exit polling helpfully crossed race with income in two categories: whites (and non-whites) making over and under $50K.
In other words, there was a great deal more data, and BETTER data, that Gelman could've used to build his maps. So I decided to compare his results to the actual exit polling. The first state I checked was Colorado, since I remembered that Obama easily won the white vote there despite Gelman claiming otherwise. My suspicions were immediately confirmed: while Gelman claims only the under-$20K white demo went for Obama, the exit poll (real voters) shows Obama winning whites in both income groups, 54-45 percent among those making under $50K and 51-47 percent among those making over $50K.
So having one flubbed state under my belt, I decided to investigate further. Here are Gelman's maps:
Now remember, the exit polls don't offer race and income breakdowns at the level of income granularity Gelman worked with, but the map below shows, per the exit polls, how white voters making more than $50K per year voted.
What's different? Per exit polls, Obama handily won Colorado whites, as noted above. But he also easily won them in New Hampshire, 54-45, despite Gelman's claim that Obama lost the $40-75K cohort. And given that New Hampshire is 94% white, and that Obama won all Granite State voters making $75-100K by a hefty 58-41 margin, I think it's safe to say there's little chance Gelman is correct.
On the other hand, check it out: Michigan is red. Yup, McCain won whites making over $50K by a slight 50-48 margin. Are there enough Michigan voters making more than $150K a year to offset a supposed Obama victory among those making $40-150K? I bet a good analysis of the exit poll data, combined with your standard regression analysis, would yield some interesting answers. That's what Gelman should've looked at, not pre-election telephone poll data.
I should note that Gelman seems to have nailed New Jersey, Maryland, Pennsylvania, and a few of the other more surprising results, which suggests his work had merit as a predictive effort. But the various misses make it unacceptable as a tool for explaining election results. Telephone polling doesn't offer a solid foundation for making claims about election results.
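The Michigan question is really just weighted-average arithmetic. Here's a rough Python sketch of how you'd check it. The 50-48 McCain margin comes from the exit poll; the bracket margins (Obama +4 among middle-income whites, McCain +20 among over-$150K whites) are numbers I made up purely for illustration, and the function names are mine, not anything Gelman or the pollsters use.

```python
def combined_margin(groups):
    """Vote-weighted margin (Obama minus McCain, in points) for a list of
    (share_of_group, margin) pairs."""
    total = sum(share for share, _ in groups)
    return sum(share * margin for share, margin in groups) / total


def required_top_share(mid_margin, top_margin, observed_margin):
    """Fraction of the combined group the top bracket must make up so the
    two brackets blend to observed_margin.

    Solves f * top_margin + (1 - f) * mid_margin = observed_margin for f.
    """
    if top_margin == mid_margin:
        raise ValueError("brackets with equal margins can't offset each other")
    return (observed_margin - mid_margin) / (top_margin - mid_margin)


# From the exit poll: McCain 50, Obama 48 among Michigan whites over $50K.
OBSERVED_MI = 48 - 50  # -2, i.e. McCain +2

# HYPOTHETICAL bracket margins, made up for illustration only.
MID_MARGIN = 4.0    # assumed Obama +4 among middle-income whites
TOP_MARGIN = -20.0  # assumed McCain +20 among whites over $150K

f = required_top_share(MID_MARGIN, TOP_MARGIN, OBSERVED_MI)
print(f"Over-$150K share needed: {f:.0%}")
# Sanity check: blending the two brackets back should reproduce -2.
print(combined_margin([(1 - f, MID_MARGIN), (f, TOP_MARGIN)]))
```

With those made-up inputs, over-$150K voters would need to be a quarter of Michigan's over-$50K white electorate to drag the total to McCain +2. If the result lands outside 0 to 1, the assumed margins can't produce the observed result at all. Plug in real bracket margins and you'd have an actual answer instead of a guess (keeping in mind that the exit poll splits at $50K, not Gelman's $40K).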
Now let's look at numbers for poorer whites. First we'll look at Gelman's maps:
Below is the map of the under-$50K white vote based on actual exit polling.
Gelman has Obama winning Nevada in that (rough) income group, but the exit polls peg it at an exact 48-48 tie. He has Pennsylvania blue, when McCain won the under-$50K white vote 50-49. Gelman has North Dakota red, but it was, surprisingly, a 49-49 tie. And he has New Mexico blue, while the exit polls say McCain won 49-48. All of these are close enough that they could tip either way within the exit polling's margin of error, which is supposedly around 1 percent.
In Ohio, Obama won under-$50K whites 51-47, and in Indiana, 50-49. Gelman's maps split these states, with Obama winning the poorest voters and McCain the $20-40K group. We don't know if Gelman is right, but it's certainly possible. It would be better if we didn't have to guess.
He has Connecticut red for the under-$20K white vote, while Obama won the under-$50K white crowd by a solid 59-36. Looking at the income and race exit poll data in that state, it seems improbable that Obama lost the under-$20K crowd. His projections for Montana (where Obama won the under-$20K demo but lost those making $20-40K) could theoretically be correct, but probably aren't. According to the exit polls, Obama won the under-$50K white demo 51-45. The state is 90 percent white, and looking at the overall income numbers, Obama won the $15-30K crowd 58-38 and the $30-50K group 51-45. With those results, it's hard to see how Obama loses the $20-40K white crowd.
New Hampshire is solidly blue at 58-40, unlike on Gelman's maps; it's one of the most obvious misses in his analysis.
Look, I'll admit something up front: I suck at math. Half the shit above, even though it's at the "simple math" level, could be wrong. What I have a hard time understanding is why, given the wealth of exit poll data, someone would base an analysis of the 2008 vote on a pre-election telephone poll. I can't begin to fathom it.
Had Gelman done the same analysis with this exit poll data, then more power to him. Nate Silver himself has referred to exit poll data when comparing demographic results. And yes, exit polls have their own margins of error and sample composition problems, but they sure as heck beat anything done over the telephone. If nothing else, sample sizes of about 8,000 per state of confirmed (as opposed to "likely") voters make them a little better as data tools than 2,587 randomly dialed individuals across the entire country.
I'd love to see Gelman redo this analysis using exit poll data. When we talk about election results, we should base our analysis on election results.
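You can put rough bounds on this without any regression at all. If you know a candidate's share among all voters in a bracket and what fraction of that bracket is white, his worst case among whites is that every non-white voter in the bracket backed him, and his best case is that none did. This is what political scientists call the method of bounds. Below is a minimal Python sketch of it. The big assumption, which the exit polls don't confirm, is that each income bracket is as white as the state overall (90 percent for Montana, 94 percent for New Hampshire); the function names are my own.

```python
def white_share_bounds(overall_share, white_fraction):
    """Bounds on a candidate's share of the white vote in one bracket.

    overall_share:  his share among ALL voters in the bracket (0-1)
    white_fraction: fraction of the bracket that is white (0-1), ASSUMED here

    Lower bound: every non-white voter backed him.
    Upper bound: no non-white voter backed him.
    """
    nonwhite = 1.0 - white_fraction
    low = (overall_share - nonwhite) / white_fraction
    high = overall_share / white_fraction
    return max(0.0, low), min(1.0, high)


def decisive(winner_share, loser_share, white_fraction):
    """True if the winner's worst case among whites still beats the
    loser's best case among whites."""
    winner_low, _ = white_share_bounds(winner_share, white_fraction)
    _, loser_high = white_share_bounds(loser_share, white_fraction)
    return winner_low > loser_high


# (label, Obama share, McCain share, ASSUMED white fraction of the bracket)
brackets = [
    ("Montana $15-30K", 0.58, 0.38, 0.90),
    ("Montana $30-50K", 0.51, 0.45, 0.90),
    ("New Hampshire $75-100K", 0.58, 0.41, 0.94),
]

for label, obama, mccain, white in brackets:
    o_low, o_high = white_share_bounds(obama, white)
    m_low, m_high = white_share_bounds(mccain, white)
    print(f"{label}: Obama {o_low:.0%}-{o_high:.0%}, "
          f"McCain {m_low:.0%}-{m_high:.0%}, "
          f"decisive: {decisive(obama, mccain, white)}")
```

Under that assumption, the Montana $15-30K bracket is settled even in the worst case (Obama's floor among whites is about 53 percent, McCain's ceiling about 42), and so is the New Hampshire $75-100K bracket. The Montana $30-50K bracket isn't settled by bounds alone, but McCain carrying whites there would require non-white voters in that bracket to break for Obama by more than 60 points.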
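For a rough sense of what those sample sizes buy you, here's the textbook 95 percent margin-of-error formula for a proportion, 1.96 × sqrt(p(1 − p)/n), in Python. It assumes simple random sampling at p = 0.5, which flatters both polls: exit polls sample by precinct (cluster sampling), which widens the real error, and any subgroup like whites under $50K is only a fraction of the full sample. The 1,000-person subgroup below is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error (as a proportion) for a share p estimated
    from a simple random sample of size n; z=1.96 gives ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)


samples = {
    "exit poll, one state (~8,000 per the post)": 8000,
    "Pew national telephone poll": 2587,
    "HYPOTHETICAL 1,000-person subgroup": 1000,
}

for label, n in samples.items():
    print(f"{label}: +/- {margin_of_error(n) * 100:.1f} points")
```

That works out to roughly ±1.1 points for 8,000 respondents, ±1.9 for 2,587, and about ±3.1 for a 1,000-person subgroup. Keep in mind that in a two-way race the error on the gap between the candidates is roughly double the error on either one's share.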