There are three kinds of lies: lies, damned lies, and statistics - Benjamin Disraeli
"Charlie the Congressman is up 3 points in North Carolina in a poll with a margin of error of 3 points. So it's a statistical dead heat" - Patty the Pundit
Statistics is a tricky business. As a field it is rooted in very advanced math, and it is hard to distill down to ten words for the corporate media anchors and journalists. Margin of error is one statistic that is almost never interpreted correctly. For example, if I were to place a bet on Charlie the Congressman (haha) winning because of that single poll, where the lead is no bigger than the margin of error, I would win about 84% of the time.
I'll attempt to explain margin of error a bit more (with a chart) after the fold.
There are two different kinds of error in any poll: systematic and statistical. Systematic errors occur when some choice or problem in the poll's design creates an (undesired) bias. Examples include undersampling cell-phone-only households, the (purported) Bradley effect, leading questions that push the respondent in a specific direction, incorrect sampling of certain demographics, incorrect likely-voter models, etc. There's no good way to address these kinds of systematic errors when looking at a single poll or pollster. Realistically, we hope to get a sense of them by comparing different pollsters, which is something sites like Sam Wang's Princeton Election Consortium and Nate Silver's FiveThirtyEight partially address. Of course, this is not a valid strategy for removing systematic errors if all the pollsters are making the same consistent mistake (like undersampling cell phone voters or missing the Bradley effect this election cycle).
Statistical errors are a different beast. They arise purely out of "randomness". For example, if I were randomly dialing phone numbers from the Chicago phone book, what are the chances I would dial two Asian families in a row? Based on the percentage of Asians in the Chicago area, it is possible to calculate the chance of that happening.
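To make the phone book example concrete, here is a minimal sketch. It assumes each dial is independent and reaches a family from the group with the same probability; the function name and the 5% share are placeholders for illustration, not real Chicago figures.

```python
def chance_in_a_row(share: float, calls: int = 2) -> float:
    """Chance that `calls` independent random dials all reach the same group.

    `share` is the fraction of households belonging to that group.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    # Independent events multiply, so two in a row is share * share.
    return share ** calls


# Placeholder share, for illustration only (not a real census number).
example_share = 0.05
two_in_a_row = chance_in_a_row(example_share)
print(f"Chance of two in a row: {two_in_a_row:.2%}")
```

The point is only that the chance is small but not zero, so a random sample will sometimes be lopsided purely by luck.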
In a typical poll, somewhere between 400 and 1000 people are sampled. Simply because there's a chance that certain demographics are over- or undersampled when choosing 400 people at random, there is going to be a statistical error on any response by that group. This statistical error can be calculated, and it depends entirely on the sample size and the chosen sampling technique. Effectively, the statistical error is the difference between my chosen sample and the whole population of voters in the answer to my polling question.
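As a rough illustration of how the statistical error shrinks with sample size, here is a sketch using the textbook formula for a simple random sample at 95% confidence. Real pollsters use more elaborate sampling and weighting, so the formula, the function name and the worst-case 50/50 split are assumptions of this sketch rather than any pollster's actual method.

```python
import math


def sampling_moe(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    n: number of people sampled
    p: assumed true share giving the answer (0.5 is the worst case)
    z: normal quantile (1.96 corresponds to 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


for n in (400, 1000):
    print(f"n = {n}: +/- {sampling_moe(n):.1%}")
```

Under these assumptions a sample of 400 gives roughly plus or minus 5 points and a sample of 1000 roughly plus or minus 3 points.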
Pollsters typically combine their systematic errors (or what they believe their systematic errors to be, anyway) with the statistical errors and report a single number for margin of error. This is the plus or minus three points you see on most polls.
The great thing is, knowing the results of a poll, it's easy to predict the chances of an upset (the person behind in the polls wins). I've produced a simple chart that makes the process easy.
Reading off the chart, a 3% lead in a poll with a 3% margin of error gives a ratio (lead / MoE) = (3.0 / 3.0) = 1. Reading 1.0 off the horizontal axis, the chance of an upset is roughly 16%. Clearly not a statistical tie. Even a 1% lead in a poll with a 3% margin of error makes the chance of an upset 37%, which would be good enough for betting odds if it were consistent across more than three states.
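For readers who would rather compute than read a chart, here is a minimal sketch. It assumes the error on the lead is normally distributed and that the margin of error acts as one standard deviation of the lead; that assumption is inferred from the 16% and 37% figures above, not taken from the chart itself. A pollster's published margin is often defined differently (for example as a 95% interval on one candidate's share), so treat the scaling as an assumption. The function name is made up for this example.

```python
import math


def upset_chance(lead: float, moe: float) -> float:
    """Chance the trailing candidate is really ahead.

    Assumes a normal error on the lead, with `moe` treated as one
    standard deviation of that lead (an assumption of this sketch).
    """
    if moe <= 0:
        raise ValueError("moe must be positive")
    ratio = lead / moe
    # Upper tail of the standard normal: P(Z > ratio).
    return 0.5 * math.erfc(ratio / math.sqrt(2.0))


print(f"3-point lead, MoE 3: {upset_chance(3.0, 3.0):.0%}")
print(f"1-point lead, MoE 3: {upset_chance(1.0, 3.0):.0%}")
```

With a lead of zero the function returns exactly 50%, which is what a real tie should look like.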
In the case of combining polls, like the poll aggregators do, only the statistical part of the error is reduced in a mathematically predictable way (for independent polls of similar size, roughly as one over the square root of the number of polls). Systematic error is more complicated and needs to be dealt with on a case-by-case basis, which is why Nate Silver does all the fancy reweighting and adjusting for pollster-specific bias, etc. If you believe his way of dealing with systematics, the numbers currently are:
OH :: Obama + 3.5 (MOE 3.6)
PA :: Obama + 8.5 (MOE 3.6)
FL :: Obama + 2.1 (MOE 3.6)
VA :: Obama + 6.2 (MOE 3.6)
Looking things up on the chart after dividing the lead by the margin of error, the probabilities of an upset (McCain wins the state) by state are:
OH :: ~ 16 %
PA :: ~ 3 %
FL :: ~ 28 %
VA :: ~ 4 %
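The same lookup can be sketched in code for the four states. As before, this assumes a normal error on the lead with the margin of error treated as one standard deviation, which is inferred from the figures quoted in this diary and is not necessarily how the chart was built. Under that assumption Ohio, Florida and Virginia come out close to the chart readings, while Pennsylvania comes out nearer 1% than 3%; chart readings that far out in the tail are approximate. The dictionary and function names are made up for this example.

```python
import math

# State leads (Obama, in points) and margins of error quoted above.
STATE_POLLS = {
    "OH": (3.5, 3.6),
    "PA": (8.5, 3.6),
    "FL": (2.1, 3.6),
    "VA": (6.2, 3.6),
}


def upset_chance(lead: float, moe: float) -> float:
    """Upper normal tail P(Z > lead / moe); moe is treated as one sigma."""
    if moe <= 0:
        raise ValueError("moe must be positive")
    return 0.5 * math.erfc((lead / moe) / math.sqrt(2.0))


upset_by_state = {
    state: upset_chance(lead, moe) for state, (lead, moe) in STATE_POLLS.items()
}

for state, chance in upset_by_state.items():
    print(f"{state} :: ~ {chance:.1%}")
```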
If I were a betting man, I would be feeling pretty good about now, since McCain probably needs all four of these states to win. Of course, this depends on all kinds of contingencies, like people actually turning out to vote in the expected numbers. So get out and vote; it's time to destroy the conservative movement for a generation.
I hope this is useful. This is my first diary at DailyKos. If you have any questions as to my methodology let me know.