There's a lot of confusion about margin of error (MOE) and polling.  People call a race 'tied' when the difference is within the MOE, or say the difference is only significant if it's bigger than the MOE, or bigger than two MOE, or all sorts of other things.  But what we're really interested in is not the MOE.  It's who's going to win and, in non-winner-take-all races (like the primaries), by how much.

We can estimate this for any combination of percentages and sample sizes.


When we're estimating two proportions, the distribution is called a binomial; with more than two proportions, it's a multinomial.  If we look at, say, Obama and Clinton, then all the polls have an 'undecided'; some have 'other' as well.  We can make this three categories: Obama, Clinton, Other.  A recent Gallup poll had Obama 45, Clinton 44, leaving 11 for other.  I don't see the sample size, but let's guess it was 500.  So, let's simulate 100,000 replications of a multinomial of 500 people with those proportions.
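For anyone who wants to try this at home, here is a minimal sketch of that kind of simulation in plain Python. It is not necessarily how the numbers in this diary were produced; the function name `simulate_poll`, its parameters, and the seed are all made up for illustration, and the sample size of 500 is the guess from above.

```python
import random

def simulate_poll(n=500, shares=(0.45, 0.44, 0.11), reps=100_000, seed=1):
    """Simulate `reps` polls of `n` respondents each.

    Returns the fraction of simulated polls in which Obama leads,
    Clinton leads, and the two are exactly tied.
    """
    rng = random.Random(seed)  # seed is arbitrary, only for repeatability
    labels = ("Obama", "Clinton", "Other")
    obama_leads = clinton_leads = ties = 0
    for _ in range(reps):
        # One simulated poll: n independent draws from the three categories.
        sample = rng.choices(labels, weights=shares, k=n)
        o = sample.count("Obama")
        c = sample.count("Clinton")
        if o > c:
            obama_leads += 1
        elif c > o:
            clinton_leads += 1
        else:
            ties += 1
    return obama_leads / reps, clinton_leads / reps, ties / reps
```

Calling `simulate_poll()` runs the full 100,000 replications, which can take a little while in pure Python; something like `simulate_poll(reps=5_000)` gives a rougher answer much faster. Because it is a simulation, the exact figures will wobble a bit from run to run and seed to seed.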

What do we care about?  Who wins.  Well, Obama is ahead of Clinton 58.95% of the time, Clinton is ahead of Obama 39.58% of the time, and they get exactly equal numbers 1.83% of the time.  Since, in the real election, there are so many voters that a tie is almost impossible, let's assign those ties in proportion, and find that Obama wins 59.8% of the time and Clinton 40.2% of the time.
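The proportional split of the ties is just arithmetic: each candidate's share of the simulations that were not tied. A small sketch (the helper name `split_ties` is made up for illustration):

```python
def split_ties(obama_pct, clinton_pct):
    """Rescale the two 'ahead' percentages so they sum to 100,
    which hands out the tied simulations in proportion."""
    decided = obama_pct + clinton_pct
    return 100 * obama_pct / decided, 100 * clinton_pct / decided

obama, clinton = split_ties(58.95, 39.58)
print(f"Obama {obama:.1f}%, Clinton {clinton:.1f}%")
# prints: Obama 59.8%, Clinton 40.2%
```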

Now, how does sample size affect this?  Well, suppose the same results were based on samples of 300.  Then we simulate Obama winning 56.36% of the time, Clinton 41.24%, and a tie 2.40%.  Let's do some others.  Remember, all these are for results of Obama 45, Clinton 44, other 11.

| Sample size | Obama wins | Clinton wins | Tie |
|---|---|---|---|
| 100 | 52.06% | 44.05% | 3.89% |
| 300 | 56.36% | 41.24% | 2.40% |
| 500 | 58.95% | 39.58% | 1.83% |
| 1000 | 62.48% | 36.26% | 1.25% |
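If you'd rather not rely on simulation noise, the same three probabilities can be computed exactly by adding up the multinomial probabilities of every possible poll outcome. To be clear, this is a different technique from the simulation used for the table, offered only as a cross-check; the function name and defaults are made up for illustration, and the results may differ slightly from the simulated figures.

```python
from math import exp, lgamma, log

def exact_lead_probs(n, p_obama=0.45, p_clinton=0.44):
    """Exact P(Obama leads), P(Clinton leads), P(tie) in a poll of n people,
    summing multinomial probabilities over all (Obama, Clinton, Other) counts."""
    p_other = 1.0 - p_obama - p_clinton
    log_po, log_pc, log_px = log(p_obama), log(p_clinton), log(p_other)
    # log(k!) for k = 0..n, computed once so the inner loop stays cheap
    log_fact = [lgamma(k + 1) for k in range(n + 1)]
    obama = clinton = tie = 0.0
    for o in range(n + 1):
        for c in range(n - o + 1):
            x = n - o - c  # respondents in the 'Other' category
            log_prob = (log_fact[n] - log_fact[o] - log_fact[c] - log_fact[x]
                        + o * log_po + c * log_pc + x * log_px)
            prob = exp(log_prob)
            if o > c:
                obama += prob
            elif c > o:
                clinton += prob
            else:
                tie += prob
    return obama, clinton, tie

for n in (100, 300, 500, 1000):
    o, c, t = exact_lead_probs(n)
    print(f"{n:5d}  Obama {100 * o:.2f}%  Clinton {100 * c:.2f}%  tie {100 * t:.2f}%")
```

Working in logs avoids overflowing the factorials for samples of 1000. The double loop is fine for poll-sized samples but would get slow for very large n.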

We also care about the margin of victory.  Let's say we're curious about whether either would win by a margin of 5 percentage points or more.

| Sample size | Obama +5 or more | Clinton +5 or more |
|---|---|---|
| 100 | 32.5% | 25.4% |
| 300 | 22.5% | 13.0% |
| 500 | 16.9% | 7.7% |
| 1000 | 8.8% | 2.1% |
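The margin question can be simulated the same way. In the sketch below I'm assuming the margin is measured as a share of everyone polled (including 'other'), so "+5" in a poll of 500 means a lead of at least 25 respondents; that reading, the function name `margin_probs`, and its parameters are my assumptions for illustration.

```python
import random

def margin_probs(n, points=5, shares=(0.45, 0.44, 0.11), reps=100_000, seed=1):
    """Fraction of simulated polls in which Obama, or Clinton, leads by at
    least `points` percentage points of the whole sample of n respondents."""
    rng = random.Random(seed)  # seed is arbitrary, only for repeatability
    labels = ("Obama", "Clinton", "Other")
    obama_big = clinton_big = 0
    for _ in range(reps):
        sample = rng.choices(labels, weights=shares, k=n)
        diff = sample.count("Obama") - sample.count("Clinton")
        # Integer comparison avoids rounding trouble right at the threshold.
        if diff * 100 >= points * n:
            obama_big += 1
        elif -diff * 100 >= points * n:
            clinton_big += 1
    return obama_big / reps, clinton_big / reps
```

For example, `margin_probs(500)` covers the 500-person row, and `margin_probs(500, reps=5_000)` is a quicker, noisier version. Whether a lead of exactly 5 points counts as "5 or more" matters a little at these sample sizes, so small differences from the table are to be expected.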

Notice how, with larger samples, the chances of being way off decline.

I can give you these for any combination of sample size and polling results.

Of course, that all assumes the polls are perfectly done.  (Yeah, like that's gonna happen!)


• ##### Tip jar(22+ / 0-)

I'll be in and out all night

• ##### Did you know there are three types of people(4+ / 0-)

in the world?

Those who can count, and those that can't.

Ok, this is pretty technical.  Is this based on a Monte Carlo simulation?  Or is this strictly MOE calculations?

This world is broken, I want a new one.

• ##### Monte Carlo simulation(5+ / 0-)

Ironically, if computers had been invented before Fisher was born, nearly all statistics would be done this way, and no one would have heard of margin of error.

• ##### 20 years ago(5+ / 0-)

I taught my FIL (a professor in psych) how to do a Pearson's r using Excel.  He was flabbergasted that it could be done that fast.  He had grad students do it by hand for all his studies before then.

This world is broken, I want a new one.


• ##### Math. . . .(3+ / 0-)
• ##### What about 95% confidence intervals?(1+ / 0-)

It was my understanding that when a given poll's margin of error is quoted, say at +/-4%, it means there is a 95% probability that the true breakdown (one way vs the other) within the total population is within +/-4 points of the breakdown found in the much smaller group actually sampled (i.e. those who participated in the polling).
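For reference, the quoted figure for a single proportion usually comes from the textbook normal-approximation formula, 1.96 times the square root of p(1-p)/n. A small sketch, assuming simple random sampling; the function name is made up for illustration and the sample size of 600 is just a hypothetical that happens to give about +/-4%:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """95% margin of error for one proportion p from a simple random
    sample of size n, using the normal approximation (z = 1.96)."""
    return z * sqrt(p * (1 - p) / n)

print(f"{margin_of_error(0.5, 600):.3f}")  # prints 0.040
```

Note that this is the margin for one candidate's share on its own, not for the gap between two candidates.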

Of course, this assumes a perfectly representative sample, which you can usually assume with, e.g., coin flips, dice throws, and, to a close approximation, a well-shuffled card deck.  With political polling, however, a significant potential source of error is that you are forced to make assumptions about how well your sample represents the people who will actually vote on election day.  The most infamous example of this is the Literary Digest poll taken before the 1936 presidential election, which predicted a lopsided win by Alf Landon over Franklin D. Roosevelt and failed to make any statistical adjustment for the fact that the people it reached leaned overwhelmingly Republican.

We also have to consider that statistical methods effectively assume a perfectly flipped coin or an adequately shuffled deck, so that the sample is not skewed by the way it is taken (e.g. by the way polling questions are asked).  This is far trickier to do with political polling.

• ##### The thing is that the 95% confidence interval(3+ / 0-)

is not what we are likely to be interested in, when it comes to polls.  The reason it is so universally quoted is (more or less) an accident. Ronald Fisher, who invented a lot of this stuff, was compromising because he had no computers.

• ##### Reminds me of graduate stat classes (ugh)(4+ / 0-)

...but a heckuva lot easier to understand. You've got a real gift here, plf515, and I appreciate your willingness to share it.

• ##### I think maybe(1+ / 0-)

I might understand this. I'm always iffy about numbers, but this does seem to make sense. Great diary, as usual.

Hill Country Ride for AIDS my HCRA Page