Whenever we see a poll, we see a margin of error and, with it, a confidence interval. These are always wrong. They are wrong even if there are only two candidates, and they are even more wrong if there are more than two. But they are simple.

The truth is complicated.

This complication exists even if we assume that the sample is a perfectly random sample of the population of voters. This assumption is ludicrous, but without it, things get really hairy. In fact, the truth is more complicated than this diary makes it out to be.

If you have only two candidates then the results follow what is known as a binomial distribution. If you have more than two they follow what is known as a multinomial distribution. "Distribution" is itself a statistical term. It means an assignment of probability to each possible outcome; in this case, to each proportion of the vote a candidate might get. In sampling, we try to estimate a population distribution from a sample distribution. Of course, our estimate isn't perfect but, again assuming the sample is random, we can estimate how far off it might be.

There are a few problems with the way margins of error (MoE) are usually presented in polls.

First, we interpret them wrongly. Even if we used the right MoE (see below) our interpretation is off. A confidence interval (CI) is given by the estimate plus or minus the MoE. The correct interpretation of a 95% confidence interval is a statement about the procedure: if we drew many random samples from the same population and built an interval from each one, about 95% of those intervals would contain the population value. What we usually assume is that, since this one sample gave a particular estimate, we can be 95% sure that the population value is within this particular 95% CI. That's wrong. This interpretation is VERY common; I've even fallen into it myself.
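One way to see what the 95% refers to is to simulate many polls from a population whose true value we know, and count how often the interval built from each poll contains that value. This is only a sketch: the true proportion of 52%, the sample size of 400, the number of trials, and the function name `coverage` are all assumptions chosen for illustration.

```python
import random

def coverage(true_p=0.52, n=400, trials=2000, z=1.96, seed=0):
    """Fraction of simulated polls whose 95% interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated poll: n independent respondents, each a "yes"
        # with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        # Classical margin of error computed from this poll's own estimate.
        moe = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / trials

print(f"Share of intervals containing the true value: {coverage():.3f}")
```

The share should come out close to 0.95. The 95% describes how often the interval-building procedure succeeds over many polls, not how sure we can be about any single poll.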

A second wrong interpretation is that we assume either a) that all values within the CI are equally likely or b) that values outside the CI are impossible. Neither is correct. If our poll estimates that 52% will vote for Joe Shmo, then the most likely result is 52%; the farther you go from 52%, the less likely the result. The likelihood of any particular result is given by the likelihood function, and ANY result from 0% to 100% is possible; it's just that results far from 52% are very unlikely. (You COULD flip a fair coin 100 times and get 100 heads; it's not LIKELY, but it's POSSIBLE.)
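To make the likelihood function concrete, here is a small sketch using the binomial likelihood. It assumes a hypothetical poll of 400 people in which 208 (52%) back Joe Shmo; the sample size and the function name `likelihood` are illustrative assumptions, not figures from any real poll.

```python
from math import comb

def likelihood(p, k=208, n=400):
    """Binomial likelihood of population proportion p, given k of n said yes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

best = likelihood(0.52)  # the likelihood peaks at the sample proportion k/n
for p in (0.42, 0.47, 0.52, 0.57, 0.62):
    print(f"p = {p:.2f}  relative likelihood = {likelihood(p) / best:.4f}")

# The coin example: possible, but wildly unlikely.
print(f"P(100 heads in 100 fair flips) = {0.5 ** 100:.3e}")  # 7.889e-31
```

The relative likelihood falls off smoothly on both sides of 52%. It never reaches zero, and nothing special happens at the edge of the confidence interval.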

But we also give the wrong MoE, because we give a single MoE for each poll, and that's not right. The classical formula for a 95% MoE is

1.96*(pq/n)^.5,  

where p is the proportion saying something, q = 1-p and n is sample size.

This is an approximation, but a good one for polls, where n is usually pretty big and we aren't interested in very rare events. (It doesn't work well for estimating very rare things, like the prevalence of rare diseases.) Note that it gives a different MoE for each candidate. When two candidates get all (or almost all) of the votes, this difference doesn't matter too much. For example, if we poll 400 people and 60% say they will vote for Obama, 35% for Bachmann (should she be the Republican nominee) and 5% for someone else, then the MoEs for the two main candidates are
Obama  4.80%
Bachmann 4.67%
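The classical formula is easy to evaluate directly. The sketch below applies 1.96*(pq/n)^.5 to the 400-person example; the function name `margin_of_error` is just an illustrative choice.

```python
def margin_of_error(p, n, z=1.96):
    """Classical (normal approximation) margin of error for a proportion p."""
    q = 1 - p
    return z * (p * q / n) ** 0.5

n = 400
for name, p in [("Obama", 0.60), ("Bachmann", 0.35)]:
    # Multiply by 100 to report the result in percentage points.
    print(f"{name:<9} {100 * margin_of_error(p, n):.2f}%")
```

Because the result depends on p as well as n, each candidate gets a slightly different figure from the same poll.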

But the pollsters like to give ONE MoE, so they use an even simpler formula:
0.98/n^.5; this is only exactly correct if p = .5

For the above, it would give
Obama  4.9%
Bachmann 4.9%

not far off.
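The 0.98 in the simple formula is just 1.96 times the square root of .5 * .5, so the simple formula is the classical one evaluated at p = .5, which is where it is largest. A quick check, with illustrative function names:

```python
def margin_of_error(p, n, z=1.96):
    """Classical margin of error, which depends on p."""
    return z * (p * (1 - p) / n) ** 0.5

def simple_moe(n):
    """The single figure pollsters quote: the classical formula at p = .5."""
    return 0.98 / n ** 0.5

n = 400
# Scan p from 1% to 99% and find where the classical MoE peaks.
worst_p = max((i / 100 for i in range(1, 100)),
              key=lambda p: margin_of_error(p, n))
print(worst_p)                         # 0.5
print(f"{100 * simple_moe(n):.2f}%")   # 4.90%
```

So the single published figure is a worst case: it can only overstate the classical MoE for any one candidate, and it overstates it more the farther that candidate's share is from 50%.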

But what if we are polling a primary? A recent Iowa poll of 500 Republicans gave these results:

Bachmann 25%
Romney 21%
Pawlenty 9%
Cain 9%
Paul 6%
Gingrich 4%
Santorum 2%
Huntsman 1%

It said the MoE was 4.4%; that comes from the simple formula .98/n^.5. But the right MoEs, from the formula 1.96*(pq/n)^.5, are different for each candidate:

Bachmann 3.8%
Romney 3.6%
Pawlenty 2.5%
Cain 2.5%
Paul 2.1%
Gingrich 1.7%
Santorum 1.2%
Huntsman 0.9%

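There are still problems with Huntsman's, because the approximation is at its weakest when p is close to 0 (the rare-event problem mentioned above), but these are much more reasonable figures. The formula is asymptotically accurate; that is, it gets better as n gets bigger.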
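The per-candidate figures can be reproduced in a few lines. This sketch uses the poll percentages quoted above with n = 500; the variable names are illustrative.

```python
n = 500
poll = {
    "Bachmann": 0.25, "Romney": 0.21, "Pawlenty": 0.09, "Cain": 0.09,
    "Paul": 0.06, "Gingrich": 0.04, "Santorum": 0.02, "Huntsman": 0.01,
}

# The single figure from the simple formula, in percentage points.
simple = 100 * 0.98 / n ** 0.5

# Per-candidate MoE from the classical formula, in percentage points.
moes = {name: 100 * 1.96 * (p * (1 - p) / n) ** 0.5 for name, p in poll.items()}

for name, p in poll.items():
    print(f"{name:<9} {100 * p:>3.0f}%  +/- {moes[name]:.1f}  (simple: +/- {simple:.1f})")
```

Every candidate's MoE is smaller than the single 4.4% figure, and the gap widens as the candidate's share shrinks.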
