Polling in the 2008 Presidential general election worked really well, at a time when the whole nation was paying attention. Polling averages correctly predicted the winner of every state except Indiana, which was close. So the polls were right.
But a funny thing happens if you go back and look carefully - the polling wasn't quite perfect... sometimes, it was off by quite a bit... and there's a pattern.
We'll start with the most heavily polled states - those with 10 or more polls. Polls from the final 10 days of the election were included in the averages if a trend was evident; otherwise, all polls from October 1st onward were included. Research 2000 polls were excluded. The polling margins were off by anywhere from three points in favor of Obama to three points in favor of McCain - really, quite good predictions. But look what happens if you plot the error against percent Obama:
The horizontal black line indicates perfection - what every poll is aiming for. If Obama's actual margin was larger than predicted, the point falls above this line; if smaller, it falls below. In other words, points above the line mean the election results favored Obama compared to the polls, and points below the line mean the results favored McCain compared to the polls.
What we see is not random error. Excluding NH, there's a lovely correlation. And if you read the previous post in this series, this is a familiar-looking graph...
So what does this mean for 2012? Polls will very likely underestimate Obama's performance in deep blue states and Romney's in deep red states. So don't worry if you see Obama polling at only 50% in Massachusetts. But in frequently polled states where Obama and Romney are nearly tied, the polling averages should be within a few points of the real vote margin.
Geeking out on polling data.
So what about the other states? Does the pattern hold up? In the other states, the polling was sometimes off by quite a bit. For instance, McCain got 6.5% of the vote in DC - half or less of the 13% and 15% that the two polls of DC showed.
So if we throw all the states up on the graph, this is what we get:
Again, if the polling averages were perfect, all the dots would fall on the solid horizontal black line. Clearly, this is not the case. Instead, in both the reddest and bluest states, the polls underestimated the margin of the winning candidate. For example, in Oklahoma, Obama lost by 31 points (with 34% of the vote) instead of the 26 points predicted, so its point is below the horizontal line at -5. In Vermont, Obama won by 37 points (with 67% of the vote) instead of the 28 predicted, so Vermont falls above the horizontal line at +9.
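The arithmetic behind those two points can be sketched in a few lines (a toy helper for illustration, not part of any polling toolkit), using the convention that a positive error means the results favored Obama relative to the polls:

```python
def polling_error(actual_margin, predicted_margin):
    """Signed error in points: actual Obama margin minus polled Obama margin.

    Positive = results favored Obama relative to the polls;
    negative = results favored McCain relative to the polls.
    """
    return actual_margin - predicted_margin

# The two examples from the text (Obama margin, in points):
print(polling_error(-31, -26))  # Oklahoma: -5, below the line
print(polling_error(37, 28))    # Vermont:  +9, above the line
```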
You'll note that the heavily polled swing states, where Obama got around 50-55% of the vote, mostly hug the purple regression line, except Indiana, the surprise state of the election, and... Iowa and New Hampshire. That can't be a coincidence; perhaps a two-year-long election makes voters there a little twitchy.
The other states tend to wander from the regression line a bit, but most of that is due to the small number of polls in their averages. Here's a graph of the absolute value of the polling error versus the number of polls in the polling average; it behaves as it should, except perhaps for NH, IA, and HI.
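That relationship - more polls in the average, smaller absolute error - is just what you'd expect if much of the scatter is random sampling noise. A quick simulation (with made-up numbers: assume each poll misses the true margin by independent noise with a roughly 3-point standard deviation) shows the average's error shrinking roughly like one over the square root of the number of polls:

```python
import random
import statistics

random.seed(0)

def mean_abs_error(n_polls, noise_sd=3.0, trials=2000):
    """Average absolute error of an n-poll average, over many simulated states."""
    errors = []
    for _ in range(trials):
        # Each simulated poll misses the true margin by Gaussian noise.
        polls = [random.gauss(0.0, noise_sd) for _ in range(n_polls)]
        errors.append(abs(statistics.fmean(polls)))
    return statistics.fmean(errors)

for n in (1, 4, 16):
    print(n, round(mean_abs_error(n), 2))
```

Of course, the systematic error this series is about doesn't shrink with more polls - averaging only washes out the random part.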
It's all the same.
We saw a similar regression in the first part of this series, with Governor and Senate races, and we can see the same thing in the 2004 numbers, although there are far fewer polls from 2004. In fact, they all plot pretty much along the same line:
Hawai'i has an impressive outlier there - I've heard it said that Hawai'i is hard to poll because of a cultural reluctance to respond to polls among its non-white majority. But if you compare exit polls to pre-election polls in the three cases where cross-tabs are available (one per election), the margin was off just about as much, if not more, among whites as among minorities. The graph above shows that most of the problems with polling in Hawai'i can be attributed to its Democratic tendencies, and the previous graph shows the typical dearth of polling there adds even more error on top of that.
So now we know that polling in the reddest of red states underestimated McCain's winning margin, while in the bluest of blue states it seriously underestimated Obama's winning margin. And the same thing happened in 2004. And the error of most Governor and Senate races is related to presidential vote share too. But why?
There's no relationship between percent Obama and percent undecided, so the trends can't be explained by more fence-sitting voters in non-competitive (i.e., not swamped with ads) states. And in VT, HI, and DC, McCain was actually polling much higher than his eventual performance, while in OK Obama was polling higher than the final results. So even apportioning all the undecideds to the winner can't explain everything. While certain individual polls can have a problem of too many "undecideds" who aren't really undecided (see the 2006 Idaho polls!), overall it's not the answer. However, if undecideds do vote, they may contribute to the Lemming Effect.
The Lemming Effect?
If all your friends jumped off a bridge, would you, too? For many people the answer is yes. We are all social animals; we like to do what our friends are doing, and we like to associate ourselves with winners. (Hmmm, wait a minute, are people jumping off bridges winners?) So do people change their vote at the last minute, or, if undecided, base their decision entirely on which candidate they think will win? Just because everybody else is doing it? There's a good test case that may be evidence of this behavior: the 2008 elections in Montana.
In 2008, Montana voted for McCain and a Republican US House member, a Democratic US Senator, and five Democrats for statewide office, with margins ranging from 25 points in favor of the Republican to 45 points in favor of the Democrat. Fortunately, PPP polled all these races just before the election.
What we see is that for the five races with a margin less than 10 points, and the Senate race, the change in margin from the poll to the vote was 4 points or less. But for the Governor's race, the margin changed 7 points in the winning Democrat's favor, while in the Representative's race, the margin changed 7 points in the winning Republican's favor. In both cases, the loser underperformed the poll by a few points, while the winner overperformed by a few points. Clearly, it wasn't a matter of a more partisan crowd turning out to vote, one way or another, nor could the results be explained by all the undecideds voting for the winner; somebody had to change their minds, too.
The Governor, Representative, and Senator all won with more than 60% of the vote, but the incumbent Senator was running against an apparent whackadoodle nutcase. An incumbency effect could have been at play, except that the margin in the Secretary of State's race, which also featured an incumbent, did not change at all. We also cannot discount last-minute events in the campaigns.
Another example is Maine. Poll averages showed Obama at 54% and Collins at 55%; Obama ended up with 58%, Collins with 61%.
The Lemming Effect does seem to play a role.
The Apathetic Losing Voter
If you know that some of your preferred candidates - or perhaps even all the candidates you want to vote for - are going to be crushed by a lopsided 65-35 margin, are you really going to bother to vote? Maybe not. Even if you told some pollster you would. Conversely, if you're living in a Presidential swing state, you're being pressed to vote - 54% of Ohio voters said a Presidential campaign had contacted them personally in 2008. So if voters of the losing party end up not bothering to vote in states that aren't swing states, we would expect those states, on average, to have lower voter turnout.
The census numbers do, indeed, show this, although the trend is very weak. So voters in the reddest and bluest states do sometimes end up sitting it out. Are they the voters for the losing team? Certainly it's consistent with the data, but we can't be sure of the relative strength of the Apathetic Voter Effect versus the Lemming Effect.
Moral of the story.
When you see polls out of New York or California showing Obama with barely 50% of the vote, don't panic. That's just how these things work. Polls underestimate the winner's margin in Presidential contests in highly partisan states, but because those states are so partisan, the polls still easily predict the winner. It happened in 2004, it happened in 2008, and from the current polls, it looks like it's happening again.
Beyond the Margin of Error is a series exploring problems in polling other than random error, which is what the margin of error measures.
When Polls Fail, or Why Elizabeth Warren Will Dash GOP Hopes. Why polls for close races for Governor and Senate are sometimes way off, and how to predict how far off they will be.