Long-time readers may recall when I unveiled the first Meta-Analysis of State Polls during the 2004 campaign. It was one of the first analyses of its kind. The response was overwhelming. The act of reducing the vagaries of state polls to a single number, the Median Electoral Vote Estimate, attracted hundreds of thousands of readers and coverage by The Wall Street Journal and even Fox News.
Now we’re back as election.princeton.edu, or the Princeton Election Consortium. It’s automated and safeguarded against biases. It offers information that other sites - even Poblano’s - cannot provide. As November approaches I will feature Princeton colleagues’ research that pertains to the election - and perhaps entice a few of them to guest-blog.
So...Does the onslaught of state-level polls have you dizzy? Do outliers leave you distracted? Have you ever wanted a way to reduce the confusion? Then read on.
Individual polls have problems. First, they sample only a small number of likely voters. Second, pollsters have to devise strategies to identify who the likely voters are. Third, polls are done in advance of the election, and opinion can shift by Election Day.
Meta-analysis of state polls can help with the first two problems. The basic idea is to use multiple recent polls (courtesy of Mark Blumenthal at Pollster.com) to determine what is happening in individual states, then use the results to make an overall estimate of who would win the Electoral College if the election were held today. Such an approach combines the collective judgement of all pollsters, thus taking advantage of the wisdom of crowds. The result is a very accurate snapshot of current polls. To put it quantitatively: The Meta-Analysis presented at election.princeton.edu is at least four times more accurate than any national poll, including the Pollster.com and RealClearPolitics averages. It also returns a result in the only units that matter: electoral votes.
This approach resembles other sites you may already know about. Many of you are fans of Poblano’s excellent site, fivethirtyeight.com, which combines data with his own detailed methodological judgements. There is also electoral-vote.com, an Ur-polling site. Purists may prefer Pollster.com or RealClearPolitics for straight-up polling numbers. What I provide is an approach that is purely poll-based, but simpler to follow than a river of data.
And now, answers to some questions you might be asking.
What happens when you are not sure who will win a state?
Based on a direct data feed from Pollster.com, a probability is calculated for every state based on recent polls. Here is an example from today's data:
On any given day, some states are neither pure blue (Obama probability > 95%) nor pure red (McCain probability > 95%). I take this uncertainty into account by calculating the distribution of probabilities for 0 to 538 electoral votes for each candidate for all possible combinations of states. The median of this distribution is the Electoral Vote (EV) Estimator. The current results are Obama 330, McCain 208.
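To make the per-state step concrete, here is a minimal sketch of one standard recipe for turning a handful of recent poll margins into a win probability: take the median margin, estimate its standard error, and convert the resulting z-score with a normal cumulative distribution. The specific numbers and the single-poll fallback are illustrative assumptions, not necessarily the site’s exact rule.

```python
from math import erf, sqrt
from statistics import median, stdev

def state_win_prob(margins):
    """Win probability for the leading candidate, from recent poll margins
    (leader minus trailer, in percentage points).

    Recipe: median margin -> standard error of that estimate -> z-score ->
    normal CDF.  The floor on the spread and the single-poll fallback of
    3 points are assumptions for this sketch.
    """
    m = median(margins)
    n = len(margins)
    # Guard against a zero spread when the polls happen to agree exactly
    sem = max(stdev(margins), 1e-6) / sqrt(n) if n > 1 else 3.0
    z = m / sem
    # Normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical example: three recent polls showing the leader up 4, 7, and 2
prob = state_win_prob([4.0, 7.0, 2.0])
```

A state with a stable multi-point lead across several polls comes out well above 95% and counts as safely blue or red; a state whose polls straddle zero lands near 50% and contributes real uncertainty to the electoral-vote distribution.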
I’ve heard that there are over 2 quadrillion possibilities. How many of them do you account for?
Every single one. In my analysis I treat all 50 states and the District of Columbia as going either way. That gives 2^51 permutations - 2,251,799,813,685,248, to be precise.
That’s a lot. You can’t possibly go through them one by one!
Correct. I don’t take a brute force approach, which at a rate of checking a million permutations per second would take over 70 years. Instead I use a math trick in the form of a polynomial that gives the exact probability corresponding to a particular number of electoral votes. This is equivalent to evaluating all possibilities, but takes less than one second.
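The polynomial trick can be sketched in a few lines. Each state contributes a polynomial (1 − p) + p·x^EV, where p is that state’s win probability; multiplying all of them together (a chain of convolutions) makes the coefficient of x^k the exact probability of winning exactly k electoral votes. The three-state race below is a toy with made-up probabilities - the real calculation runs over all 50 states plus DC - but the algorithm is the same, and it is equivalent to enumerating every permutation.

```python
def ev_distribution(states):
    """Exact electoral-vote distribution via polynomial multiplication.

    states: list of (electoral_votes, win_probability) pairs.
    Each state contributes the polynomial (1 - p) + p * x**ev; the product
    over all states is built up by convolution, so dist[k] ends up as the
    exact probability of winning exactly k electoral votes.
    """
    dist = [1.0]  # start with the constant polynomial 1
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # candidate loses the state
            new[k + ev] += prob * p      # candidate wins all its EV
        dist = new
    return dist

# Toy three-state race with hypothetical win probabilities.
states = [(10, 0.9), (20, 0.6), (5, 0.3)]
dist = ev_distribution(states)

# The median of the distribution is the EV estimator.
cumulative, median_ev = 0.0, 0
for k, prob in enumerate(dist):
    cumulative += prob
    if cumulative >= 0.5:
        median_ev = k
        break
# By contrast, brute-force enumeration of all 2**51 outcomes at a million
# per second would take about 71 years; this loop over 51 states runs in
# a fraction of a second.
```

Note that the work grows only with the number of states times the number of possible electoral-vote totals, not with the 2^51 permutations the distribution summarizes.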
Why don’t other analysts do that?
Good question. Our brains naturally think in terms of concrete, individual possible outcomes. So the first approach a hobbyist might take is to run a simulated election thousands of times. This is not as accurate as my approach. Perhaps it hasn’t occurred to others to take the polynomial shortcut, which is made possible by the fact that the Electoral College follows a relatively simple system in which EV are added up. For example, Poblano works on sabermetrics and fantasy baseball, where a lot of enjoyment comes from thinking about the outcomes of individual simulations. I am unsentimental; I just want the exact answer. (Note that I've patented the Meta-Analysis, mainly in case someone else such as a traditional media organization wants to make money from it.)
Some people can’t get enough of all that detail, and some want the most accurate bottom line. I think there’s room for both approaches.
I don’t understand all this math. Does your analysis give anything that would be useful to me?
Yes! The Meta-Analysis gives you three useful results.
- It gives an estimate that is extremely precise. At any moment it uses about 140 state-level polls - and only state-level polls. It reduces them all to a single electoral vote estimate. In principle, you no longer have to worry about what an individual poll means.
- The estimate is precise enough that tracking it over time gives an excellent barometer of how the campaign is going. Do you want to know how much Obama’s foreign trip affected the standings? Apparently, very little. See this graph:
- The calculation allows some cool tricks. For instance, it can be used to assign a value to your vote. As of today, a Virginia voter has about 16 times as much power as I do in New Jersey in influencing the overall outcome. In other words, that person’s vote is worth 16 jerseyvotes, and might be well worth going after.
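The jerseyvote idea can be sketched as follows: a voter’s power is roughly the chance that her state decides the Electoral College, times the chance that one vote decides her state. The toy below uses 2008 electoral-vote counts for four states but entirely hypothetical margins, turnouts, and a crude Gaussian tie factor - it is an illustration of the concept, not the site’s actual jerseyvote calculation.

```python
from itertools import product
from math import erf, exp, sqrt

# Toy four-state race: (name, electoral votes, polling margin in points,
# expected turnout).  EV counts are real for 2008; everything else is
# hypothetical.
states = [("VA", 13,  2.0, 3.6e6),
          ("NJ", 15, 15.0, 3.8e6),
          ("OH", 20,  0.0, 5.6e6),
          ("MO", 11, -1.0, 2.9e6)]
SIGMA = 5.0                                      # assumed polling uncertainty
THRESHOLD = sum(s[1] for s in states) // 2 + 1   # majority of 59 EV = 30

def win_prob(margin):
    """Normal CDF of margin/SIGMA: chance the current edge holds up."""
    return 0.5 * (1.0 + erf(margin / (SIGMA * sqrt(2.0))))

def pivot_prob(i):
    """Chance that state i decides the race: sum, over the other states'
    outcomes, of the cases where state i's EV span the threshold."""
    others = [s for j, s in enumerate(states) if j != i]
    total = 0.0
    for outcome in product([0, 1], repeat=len(others)):
        prob, ev = 1.0, 0
        for won, (_, e, m, _) in zip(outcome, others):
            p = win_prob(m)
            prob *= p if won else 1.0 - p
            ev += e * won
        if ev < THRESHOLD <= ev + states[i][1]:
            total += prob
    return total

def voter_power(i):
    """One vote matters only in a near-tied state: a crude Gaussian factor
    for the chance of a tie, divided by turnout (both assumptions)."""
    _, _, margin, turnout = states[i]
    tie = exp(-margin ** 2 / (2.0 * SIGMA ** 2)) / turnout
    return pivot_prob(i) * tie

# Express each voter's power in "jerseyvotes": power relative to NJ.
nj = [s[0] for s in states].index("NJ")
jerseyvotes = {s[0]: voter_power(j) / voter_power(nj)
               for j, s in enumerate(states)}
```

Even in this toy, the pattern in the post emerges: the swing-state voter in Virginia is worth many jerseyvotes, because New Jersey’s lopsided margin makes a tie there vanishingly unlikely.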
I’m a math geek. What’s in it for me?
The methods are completely transparent. The data and programs are available for download and inspection. You can even do your own calculations if you know the MATLAB programming environment. Hey, didn’t you always want to learn MATLAB? Now you have a reason.
Ahem. Didn’t you predict a Kerry victory in 2004, Professor Smartypants?
Actually, the method was fine, but I did make a serious error. In the closing weeks of the campaign, I assumed that undecided voters would vote against the incumbent, a tendency that had been noticed in previous pre-election polls. Compensating for the "incumbent rule" had the effect of putting a thumb on the scales, lightly - but unmistakably - biasing the outcome.
Leaving out this assumption, the prediction in 2004 was exactly correct: Bush 286 EV, Kerry 252 EV. In retrospect, it’s clear that the incumbent rule is subjective and cannot be relied upon. You can read about the confirmation of the prediction in The Wall Street Journal (pre-election story here).
I won’t be repeating that error. I decided that the best approach for 2008 is to suppress my own biases and stay as close to real data as possible. This year the analysis is being kept simple and transparent. Data and code for doing the calculations are freely available. Anyone can check my results.
And with that...I present election.princeton.edu.