Another way of looking at polling

by bobmul

(This content is not subject to review by Daily Kos staff prior to publication.)

Monday, Sep. 24, 2012 at 8:02:29am PDT

Generally, if there are a lot of polls whose results vary, taking their average gives a better picture of the state of affairs. In the current circumstances, however, the picture is fuzzy - Obama is leading by about 3 percent, a number that allows Romney and Republicans to say that the race is very close.

Here is a different way of looking at what the polls tell us. Imagine that the real circumstance is that exactly 50% of voters are for Obama and 50% for Romney. Then, if you do 1 poll and the sampling is not perfect, there is a 50-50 chance either candidate will appear to be ahead. Now do 2 polls. What is the chance that both will favor one candidate? Well, there are 4 possible outcomes which we can represent as: OO, OR, RO, RR. So, in 2 of the 4 cases, the 2 polls will agree. What about 3 polls? We can still write out all the results: OOO, OOR, ORO, ROO, ORR, ROR, RRO, RRR. With 8 outcomes only 2 are in complete alignment.

OK - what about N polls? The rule is that there are 2^N (2 to the Nth power) equally likely outcomes, but still only 2 outcomes in which all N polls are in full agreement. For a numerical example, consider 10 polls. 2^10 = 1024 or about a thousand. Then, only about 2 in 1000 of the outcomes will show one candidate or the other leading in all the polls. This is roughly 1 chance in 500 or 0.002 or 0.2%, a very low probability.
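For anyone who wants to check the counting argument by brute force, here is a minimal Python sketch. It assumes each poll is an independent coin flip between O and R (the 50-50 scenario above); the function name `agreement_probability` is my own label for illustration, not something from a library.

```python
from itertools import product

def agreement_probability(n_polls):
    """Chance that all n_polls favor the same candidate,
    assuming a true 50-50 split and independent polls."""
    # Every possible sequence of poll leaders, e.g. ('O', 'R', 'O')
    outcomes = list(product("OR", repeat=n_polls))
    # Keep only the sequences where every poll names the same leader
    unanimous = [o for o in outcomes if len(set(o)) == 1]
    return len(unanimous) / len(outcomes)

for n in (2, 3, 10):
    print(n, agreement_probability(n))
```

For 2, 3 and 10 polls this gives 2/4, 2/8 and 2/1024, matching the hand count. Enumeration gets slow quickly as N grows, so for larger N it is easier to use the 2/2^N rule directly.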

There is an even more interesting related case: an outcome in which one particular candidate, say Obama, comes out ahead in every poll, rather than either candidate doing so. (This is the difference between a one-tailed and a two-tailed statistical test.) If we focus on the outcome where 10/10 polls are in Obama's favor, the chance is about 1/1000 or 0.1%.

Let's say we actually observe the OOOOOOOOOO outcome. What do we conclude? Well, there are two competing inferences. In one, we continue to assume that the voting population is split 50-50 and decide something really unlikely happened. In the other, we abandon the original hypothesis and reason that if all the polls come out one way the voting population must contain more people favoring Obama.

And now for a little reality. In the RealClearPolitics list of presidential polls including today's (9/24) Rasmussen and yesterday's (9/23) Gallup trackers (writing this at 10:45AM EDT) going back to the start of September, 20/20 indicate that Obama is ahead. Is this likely? Not very - 2^20 is about 1,000,000 so the chance of the outcome (given equal numbers of Obama and Romney voters) is about 0.000001. Since, in most statistical testing, the "null hypothesis" that leads to the probability estimate is rejected if the probability is less than 0.05 or 0.01, it is virtually certain that there are currently (from the start of Sept to now) more people in favor of Obama.
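The 20/20 arithmetic is easy to reproduce. This is a small sketch under the same assumptions as above (a true 50-50 split and polls treated as independent of one another); the function names and the 0.05 and 0.01 thresholds written as constants are just for illustration.

```python
def p_all_for_one_candidate(n_polls):
    """One-tailed: chance that a specific candidate leads every poll."""
    return 0.5 ** n_polls

def p_all_agree(n_polls):
    """Two-tailed: chance that either candidate leads every poll."""
    return 2 * 0.5 ** n_polls

n = 20
p = p_all_for_one_candidate(n)
print("one-tailed:", p)
print("two-tailed:", p_all_agree(n))

# Compare against the usual cutoffs for rejecting the null hypothesis
for cutoff in (0.05, 0.01):
    print("below", cutoff, "?", p < cutoff)
```

The exact one-tailed value is 1/1,048,576, a bit under one in a million, so it falls far below either cutoff.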

How do you calculate the odds if the outcomes are not in full agreement - if, say, 15/18 polls favor Obama and 3/18 favor Romney? In short, the method is to use the binomial distribution, which some of you may have been brain-washed about in high school. Importantly, there are binomial calculators everywhere, including in Excel.

Here is one: http://stattrek.com/...

Using this for the 15/18 case and selecting from the calculator choices the one for P(X ≥ 15) gives a result of 0.0038. If we saw this result we would again reject the idea that the numbers of Obama and Romney voters are equal.
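If you would rather not rely on an online calculator, the same tail probability can be computed in a few lines of Python. This is a sketch using only the standard library (`math.comb` needs Python 3.8 or later); the function name `binomial_tail` is my own, and it again assumes independent polls with a 50-50 chance of favoring either candidate.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least
    k of n polls favor a given candidate."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 15 or more of 18 polls favoring Obama under a 50-50 split
print(round(binomial_tail(15, 18), 4))

# The all-agree case is just the far end of the same distribution
print(binomial_tail(20, 20))
```

The first line prints 0.0038, in agreement with the calculator, and the second reproduces the 20/20 figure of roughly one in a million.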