I'll admit that this is a bit of a shameless plug for my personal blog, at joshuaramirez.net, but I wanted to share a post I threw up last night that is pertinent to tomorrow's election. Look below the orange squiggly for the post in its entirety, or click this link to go to my journal and read it there (with additional photos and graphs!).
With only a couple days remaining until the 2012 Presidential Election, I am glad to announce that I am ready to publish my predictions regarding what will occur on November 6th! I have run a final series of 100,000 simulations, using state-level polling data, as well as some statistical wizardry and methodology, all of which I will explain below.
If you just want to know what the most likely scenario is in terms of the winning candidate, electoral vote breakdown, and exact vote numbers, skip to the bottom. But if you would like my methodology and analysis alongside those figures, read on!
The Simulator
At the core of my prediction, as well as my math, are the outputs from a simulator I have built slowly over the past four months. The simulator itself is relatively straightforward, written entirely in C++, and is very spartan by design. But it is also well-built to take in state-level polling, adjust polls for known bias, and then determine what would happen if an election were held at that particular moment. Furthermore, it logs polls over time, as well as simulation results, giving me quite a bit of catalogued data.
In fact, I was able to take this data and plot the daily percentile results of each state election (when run through the simulator) on a time graph, which allowed me to track the volatility and direction of the data itself. This helped to establish a certain level of confidence in given average poll numbers on a state-by-state basis, and made for some interesting graphs to boot (see my original post for an example graph of Florida).
In terms of what I accounted for directly in my calculations, I only used state-level polling from firms that had run more than three polls and were found to have a bias of less than 4.5% in either direction (I put these rules in place to root out the institutions that had garbage results).
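As a rough illustration of that screening rule (this is a sketch, not the simulator's actual code), the filter could look something like the following. The `Pollster` struct, its field names, and the `filterPollsters` function are all hypothetical, and it assumes each firm's bias has already been estimated as a signed number of points.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record for one polling firm; the field names are illustrative only.
struct Pollster {
    std::string name;
    int pollCount;      // number of state polls the firm has released
    double biasPoints;  // assumed signed bias estimate, in points (+ Obama, - Romney)
};

// Keep only firms with MORE than three polls and an absolute bias UNDER 4.5 points.
std::vector<Pollster> filterPollsters(const std::vector<Pollster>& firms,
                                      int minPollsExclusive = 3,
                                      double maxAbsBias = 4.5) {
    assert(maxAbsBias >= 0.0);
    std::vector<Pollster> kept;
    for (const Pollster& f : firms) {
        if (f.pollCount > minPollsExclusive && std::fabs(f.biasPoints) < maxAbsBias) {
            kept.push_back(f);
        }
    }
    return kept;
}
```

Both comparisons are strict, so a firm with exactly three polls, or a bias of exactly 4.5 points, is dropped.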
As far as what my simulator does not account for:
National polling. I do this purposefully, since national polling is uninformative and useless - EVs on a state-by-state basis elect the president, not the national vote. A good example of these two not correlating is the year 2000, where Gore received over half a million more votes (0.5%) than Bush, but lost the Electoral College. So far, in fact, there have been four elections where the popular vote winner lost the election! Consequently, although national polling in aggregate can be a helpful indicator as to which candidate has momentum, the individual state polling is better in nearly every meaningful way.
Economic factors or conditions. I may love statistics, but I am only confident in analyzing the polling data, and I frankly have no idea how to take in something like job report numbers and use those to adjust the expected turnout of voters. Furthermore, I figure that anything that could change the dynamic of the race would affect the polling I am measuring anyhow. To that end, I leave analysis that includes other factors to people who are a bit better with this sort of thing than I can ever hope to be (cough Nate Silver cough).
Voting restrictions, Voter ID laws, reduced early voting, etc. I don't exclude these factors from my analysis out of some expectation that these things won't have an effect on the voting outcome. I exclude them because there isn't much precedent for them, and so there really isn't any reliable way to account for them that I can discern.
The Math
Here is what my simulator does, expressed in a simplified step-by-step logical path:
1. Accept all polls for a given day for all states as input.
2. Adjust polls for bias, which is tabulated by taking into account factors such as the known bias in the previous election, the difference from the mean of all polls in the previous month, and others.
3. Derive a mean data point for each state, from all polls for a given day, given as (Obama%, Romney%). This is then weighted slightly against the data from the previous day to try to reduce rapid fluctuations and changes in the simulator results.
4. Run 100,000 simulations with the calculated vote share percentages, using the estimated votes to be cast (derived from 2010 Census data) to find actual vote numbers.
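Step 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Share` struct and both function names are made up for the example, the daily mean is a plain unweighted average, and the blending weight of 0.25 is a placeholder, since the post does not say how heavily the previous day is weighted.

```cpp
#include <cassert>
#include <vector>

// One (Obama%, Romney%) data point for a single state. Hypothetical type.
struct Share {
    double obama;
    double romney;
};

// Plain average of the (already bias-adjusted) polls for one state on one day.
Share dailyMean(const std::vector<Share>& polls) {
    assert(!polls.empty());
    Share sum{0.0, 0.0};
    for (const Share& p : polls) {
        sum.obama += p.obama;
        sum.romney += p.romney;
    }
    const double n = static_cast<double>(polls.size());
    return Share{sum.obama / n, sum.romney / n};
}

// Blend today's mean with yesterday's value to damp day-to-day swings.
// previousWeight is an ASSUMED parameter; 0.25 is a placeholder, not the author's value.
Share smooth(const Share& today, const Share& yesterday, double previousWeight = 0.25) {
    assert(previousWeight >= 0.0 && previousWeight <= 1.0);
    const double w = previousWeight;
    return Share{(1.0 - w) * today.obama + w * yesterday.obama,
                 (1.0 - w) * today.romney + w * yesterday.romney};
}
```

For instance, polls of (50, 46) and (48, 48) average to (49, 47); blending that with a previous-day value of (47, 49) at a weight of 0.25 gives (48.5, 47.5).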
As you might expect, a lot of this math is rather tedious, and I won't bore you with the specific formulas and such that I used, in part because I programmed them into my simulator and automated most of it anyway. But most everything I did follows SOP for any statistician: collecting data points, establishing a mean, standard deviation, and confidence interval, running repeatable trials against a normal model, and so forth.
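For readers curious what "running repeatable trials against a normal model" might look like in practice, here is a minimal Monte Carlo sketch. It is not the simulator's real code. The `StateModel` struct and `obamaWinProbability` function are hypothetical, each state's margin is assumed to be normally distributed with a known standard deviation, and states are drawn independently of one another, which is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-state input for the sketch.
struct StateModel {
    int electoralVotes;
    double meanMargin;  // Obama% minus Romney%, in points
    double stdDev;      // assumed uncertainty of that margin, in points (must be > 0)
};

// Fraction of simulated elections in which Obama reaches evToWin electoral votes.
// Each state is sampled independently from a normal model (a simplifying assumption).
double obamaWinProbability(const std::vector<StateModel>& states,
                           int trials, std::uint32_t seed, int evToWin = 270) {
    assert(trials > 0);
    std::mt19937 rng(seed);  // fixed seed keeps a run repeatable
    int wins = 0;
    for (int t = 0; t < trials; ++t) {
        int obamaEv = 0;
        for (const StateModel& s : states) {
            std::normal_distribution<double> margin(s.meanMargin, s.stdDev);
            if (margin(rng) > 0.0) {
                obamaEv += s.electoralVotes;  // winner-take-all within the state
            }
        }
        if (obamaEv >= evToWin) {
            ++wins;
        }
    }
    return static_cast<double>(wins) / trials;
}
```

Calling it with `trials` set to 100,000 would mirror the number of simulations described above. Treating states as independent tends to understate how often several close states move together, so a fuller model would likely add a shared national swing term.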
The Results
Given that we are so close to the election, and in particular because the data points from pollsters are moving in a uniformly inward direction (that is, polls are tracking towards the mean of all polls, for one reason or another), I decided to make today's simulation the last. Below is the chart expressing results of this simulation, excluding the vote percentages for states that are won by a gap of more than five points:
To help with interpreting the data: The darker-colored states on the chart are those that are, for all intents and purposes, already decided (In extremely infrequent instances, certain states that are categorized in this way could flip, but they would just add to a landslide, rather than act as the decisive state). With just these states, we see Obama start with a base of 253 electoral votes, and Romney with 191 out of the gate.
The lighter-colored states are contests that are within the margin of error for the aggregate of all polls, but are tinted blue or red to indicate the current leader. I have colored all the states so as to put them on one side or the other, but would point out that, although Romney is awarded Florida, he carries it by barely more than a tenth of one percent.
All 50 states (and D.C.) are listed in order from the biggest to smallest gap in votes between Obama and Romney, where an Obama surplus is expressed as positive and a Romney surplus is expressed as negative.
Taking these results at face value, we get the following map:
...and, as it turns out, this map was the most frequent one to crop up during the 100,000 simulations (appearing about 18% of the time)! For reference, the second most frequent map was identical but gave Florida to Obama; the third most frequent was identical but gave Virginia to Romney.
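One simple way to find the most frequent map across many simulations is to encode each outcome as a key and count how often each key appears. The sketch below assumes a hypothetical encoding in which each of the 51 contests gets one bit of a 64-bit integer, set when Obama wins it; the `MapKey` alias and `mostFrequentMap` function are made up for illustration and are not taken from the simulator.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Assumed encoding: bit i is set when Obama wins contest i (51 contests fit in 64 bits).
using MapKey = std::uint64_t;

// Returns {most frequent map, its share of all simulations}.
// If two maps tie, the one with the smaller key is returned.
std::pair<MapKey, double> mostFrequentMap(const std::vector<MapKey>& outcomes) {
    assert(!outcomes.empty());
    std::map<MapKey, int> counts;
    for (MapKey k : outcomes) {
        ++counts[k];
    }
    MapKey best = outcomes.front();
    int bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return {best, static_cast<double>(bestCount) / outcomes.size()};
}
```

Sorting the counts in descending order instead of keeping only the maximum would also give the second and third most frequent maps.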
Also on the chart are two dividing lines on either side of Ohio, and the respective Electoral Vote tally for Obama if he were to win every state up to those lines. I added these to make it very clear that, indeed, this entire contest will come down to Ohio, since for Romney to win Ohio, he would (for the most part) have to win every state below it, and would reach 270 EVs if he did manage such a result.
To put it another way: It is extraordinarily implausible for Romney to lose a state like Florida or North Carolina, but win New Hampshire, given how consistently the states (and their polling data) have lined up relative to each other as of late. That means nearly every scenario in which he loses Ohio but wins the election is highly unlikely.
However, the most important numbers on the chart are the two colored ones on either side of the dotted line, for they indicate what I have decided to declare as my prediction for the election: Obama will be awarded 303 Electoral Votes, to Romney's 235. The preceding election map is what I have determined will be the most likely to occur, though I do think that Florida, Colorado and Virginia could quite easily flip, although this would not change the overall winner.
In addition, I have found Obama to have an 89.846% chance of winning the election.
However - and this ought to be stressed - I am not saying that Obama is going to win, period. A 10% chance of a Romney victory is still quite a significant possibility, and to that end, complacency on the part of either political camp is quite unwarranted. A good ground game and GOTV effort could flip any of the battleground states, as could torrents of money and advertising.
That being said, if I had to put a bet down on one candidate or the other, a second Obama term is looking like quite the favorite.