Although I'm no expert on political polling specifically, I am something of a professional expert on developing scientific simulation models, cross-validating them against published empirical data, and recalibrating a model's parameters when its answers diverge from the evidence.
Conceptually this process is identical whether you are simulating hurricane trajectories, the long-term outcomes of disease, or the results of political contests. The main difference between the fields is the source of the validation data. I have not conducted a thorough evaluation of the polls (and unfortunately don't have the time or the budget to do so), but I think what we saw in New Hampshire resulted not from last-minute changes in the electorate, but from uncertainties in modeling the voter pool, and the all too familiar tendency to second-guess your model and follow the herd.
Speaking of herds, follow me below the fold . . .
Let me briefly sketch out the three basic steps of predictive modeling and try to describe how flaws in these steps compounded across pollsters and produced incorrect results.
1. Model development – In this stage you try to design a model of reality. Any model is going to be wrong to some extent, as it's a simplification of a complex system, but you attempt to capture all the major elements of the system that seriously influence the outcome. For an election poll, model building involves constructing the sample of people you think will actually show up at the polls on Election Day. In a perfect world (for pollsters) everyone eligible would vote, and thus a simple random sample of the population would generate poll results within two standard deviations of the true population results 95% of the time. The problem, of course, is that not everybody votes, so a simple random sample of the population will not give you an estimate of the preferences of the people who do vote.
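To put a number on that sampling-error claim, here is a minimal sketch in Python. The sample size and the 52% split are invented for illustration, not taken from any actual poll:

```python
import math

# Hypothetical numbers for illustration: 600 respondents, 52% favoring candidate A.
n = 600
p = 0.52

# Standard error of a proportion estimated from a simple random sample.
se = math.sqrt(p * (1 - p) / n)

# An interval of roughly two standard errors covers the true value ~95% of the time.
low, high = p - 2 * se, p + 2 * se
print(f"Estimate: {p:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```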
For that, people build a model of the electorate. That is, they make educated guesses about which types of people will be voting on Election Day. They then sample these subgroups and weight their preferences according to the probability of participating in the election. The estimate is then equal to the sum of the individual preferences of the subgroups multiplied by the proportion of the electorate each of these groups is likely to comprise. A big question is where these educated guesses come from. There are two sources: first, historical data, and second, current polling data.
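Here is a rough sketch of that weighting arithmetic. The group names, turnout shares, and preferences are invented for illustration; they are not taken from any real New Hampshire turnout model:

```python
# Hypothetical turnout model: each subgroup's assumed share of the electorate
# and its polled preference for candidate A. All numbers are made up.
turnout_model = {
    # group:             (share_of_electorate, preference_for_A)
    "registered_dems":   (0.55, 0.48),
    "independents":      (0.35, 0.58),
    "first_time_voters": (0.10, 0.65),
}

# The poll estimate is the sum of each group's preference weighted by its
# assumed share of the electorate.
estimate = sum(share * pref for share, pref in turnout_model.values())
print(f"Modeled support for candidate A: {estimate:.1%}")
```

Notice that the numbers driving the answer are the assumed shares, not just the raw responses; get those shares wrong and the estimate moves with them.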
As has been stated ad nauseam, this election is a very different animal from previous elections for a few big reasons:
- two elections took place simultaneously, and participation in one excluded participation in the other
- both elections were competitive
- a major candidate in each race relied heavily on crossover independents to win, and offered independents an attractive voting option
- poll-driven success of one candidate might have increased Election Day participation in the other party's primary.
These conditions, combined with the large number of independents in the state and the lack of any historical benchmark election to compare this one to, seriously inhibit my ability to think up an appropriate turnout model for each election. In fact, considering all these conditions, it's hard to fault the pollsters for getting it wrong. What is their fault is producing any numbers at all without underscoring the hugely speculative nature of the results.
2. Cross-validation – What do you do personally when you face a great deal of uncertainty about a decision? If you're like most people you'll look for previous examples and look to the opinions of other people to try to verify that you're doing the right thing. Mind you, you might not do what everyone else did or what people did before, but you'll feel better about your decision if it's at least in tune with some external evidence, and the greater your uncertainty, the more likely you probably are to trust historical experience and what other people are saying rather than roll the dice on your own uncertain guess in a vacuum.
Consider what the validation data consisted of in this election.
- First, you have the strong and recently salient example of a post-Iowa bounce. What is supposed to happen after Iowa? Well, the winner is supposed to get a bounce.
- Second, you have the alternative evidence of other people's polls. You might think your polling method is the best, but seriously, at this point all polling takes place in the context of a team, and you probably can't vouch for every thought of every team member. You might expect your results to be better or worse than other teams' by a few points, but I guarantee you, you will be nervous if you're five to ten points off. You will be REALLY nervous if your results give an entirely different answer than everyone else's.
Now think about the situation you'd be in if you were a pollster with a strong model that predicted the correct result in New Hampshire – a Clinton victory.
- You would know that, even given your best intentions and best abilities, under the conditions of this election your model assumptions are little better than hunches (in this case your hunches would be correct, but you wouldn't know that prior to voting; you'd just know they were hunches).
- Your results would be in conflict with your expectations based on historical norms, which would predict a big Obama bounce after Iowa.
- Your results would point to a different winner than the other polls published that day.
- Quantitatively, your results would be way, way off other estimates (in the realm of 5 to 10 points), to a degree that cannot be explained by sampling error and can only be explained by modeling error on your part or on the part of the other pollster (a rough check of this is sketched after this list).
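To see why a five-to-ten-point gap is hard to pin on sampling error alone, here is a rough back-of-the-envelope check. The sample sizes and percentages are invented, just to show the scale of the numbers involved:

```python
import math

# Two hypothetical polls of ~600 respondents each, showing a candidate at 39% and 47%.
n1, p1 = 600, 0.39
n2, p2 = 600, 0.47

# Standard error of the difference between two independent sample proportions.
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

gap = abs(p2 - p1)
print(f"Gap: {gap:.1%}, standard error of the gap: {se_diff:.1%}")
# A gap close to three standard errors is unlikely to come from sampling error alone;
# the remainder has to come from differences in the underlying turnout models.
print(f"Gap is {gap / se_diff:.1f} standard errors wide")
```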
If the other poll was only a little different and there was only one other poll you’d probably be tempted to stick to your guns. However, as more polls contradicted your findings, you would be very, very tempted to go back and reassess your model. Which brings us to step 3.
3. Calibration – In calibration you essentially rethink the parameters of your model and consider how, if you set them up slightly differently, you might get a different answer. This sounds like cheating, but really it's not. In science there often are not all that many parameters with uncertainty in them, and when you reflect on your model you find that you really could think about something differently in a way that makes sense.
The problem with the New Hampshire model is that there was a very large number of parameters that could be altered, given the underlying uncertainty about who was going to turn out. The answer to the polling problem here is really that there is no answer. The "correct" answer would have been to recognize the complexity of the voting mix and note that the uncertainty of the model overwhelmed its ability to distinguish between Obama and Clinton.
A sign of a strong result in this election would be to find the same results under a lot of different conditions and assumptions. If you got the same winner when you predicted that 30% of the independents voted in the Democratic primary as when you predicted 60% would, that would be a strong result.
My hunch here is that in reality you could get much different results based on different mixes of your key parameters.
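Here is a sketch of what that kind of sensitivity check might look like, with invented preference numbers: sweep the assumed share of independents in the Democratic primary and watch whether the predicted winner holds up.

```python
# Invented preferences for illustration: registered Democrats lean one way,
# independents voting in the Democratic primary lean the other.
dem_pref_obama = 0.45   # Obama's share among registered Democrats
ind_pref_obama = 0.58   # Obama's share among crossover independents

# Sweep the assumed independent share of the Democratic-primary electorate.
for ind_share in (0.30, 0.40, 0.50, 0.60):
    obama = ind_share * ind_pref_obama + (1 - ind_share) * dem_pref_obama
    clinton = 1 - obama  # two-candidate simplification
    winner = "Obama" if obama > clinton else "Clinton"
    print(f"Independents at {ind_share:.0%}: Obama {obama:.1%}, "
          f"Clinton {clinton:.1%} -> {winner}")
```

With these made-up numbers the winner flips somewhere between a 30% and a 40% independent share, which is exactly the kind of instability that should be reported as uncertainty rather than as a projected winner.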
So what do I think happened here? I think the election started with a series of polls, a few with the correct assumptions and most with some error in their assumptions, with all of these assumptions being closer to hunches than to evidence. Over the week, people recalibrated their models to conform with historical trends (an expected Obama bounce), and with other polls that reflected those historical trends.
I recall there was one early CNN poll that had the race as a tie. For future races, I'd like to emphasize that although that result was in congruence with the eventual reality, it was not the right answer. The only correct answer for pollsters to give about this race was that the likely-voter model contained too many uncertainties to distinguish between Obama and Clinton. That's a hard thing to say in the heat of the moment and it won't sell ad space, but it was the right answer.