
The concept of margin of error has been much discussed here in the last few days, since clearly the morons who babble on TV don't get it. Now, I know most people don't want to delve into the dark arts of integral calculus, but TPM has a nice little explanation that makes it very clear what this is all about.

Margin of Error
In public opinion surveying, the margin of error refers to the expected range of variation in a poll's results if it were conducted multiple times under the same procedures. It does not pin down the true result; rather, it sets the range within which most of those repeated polls would fall. As an example, if a Democratic candidate has 47% in a poll and the Republican 45%, with a ±4% margin of error, then repeating the poll could potentially show the Democrat as high as 51% or as low as 43%, and the Republican anywhere from 41% to 49%. In statistical terms, the margin of error is typically set for a 95% confidence level, meaning that a poll's result will land within the stated margin of the true value 19 out of 20 times it is conducted, but also that in 1 of 20 cases a pollster will produce a result that falls outside it.
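For a sense of where a figure like ±4% comes from, here is a minimal sketch using the textbook formula z·sqrt(p(1−p)/n) with z = 1.96 for 95% confidence. It assumes a simple random sample and the normal approximation; real pollsters also adjust for weighting and design effects. The names `margin_of_error` and `interval` are illustrative, not taken from TPM.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a confidence interval for one candidate's share.

    Normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    p = 0.5 gives the largest (most conservative) margin, which is what
    pollsters usually report; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


def interval(share: float, moe: float) -> tuple:
    """Range a candidate's share could plausibly take if the poll were repeated."""
    return (share - moe, share + moe)


# Margin (in points) for a few sample sizes, assuming a simple random sample.
for n in (400, 600, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1))

print(interval(47, 4))  # Democrat in the example: (43, 51)
print(interval(45, 4))  # Republican in the example: (41, 49)
```

Under these assumptions a ±4% margin corresponds to roughly 600 respondents, and halving the margin takes about four times as many.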

  •  Correct. And I hate the way they call races tied
    Recommended by:
    sparkysgal, jamess, greblos

    when a candidate has a lead within the margin of error.

    48 to 48 is tied in a survey.

    50 to 46 is a 4-point lead within the margin of error.

    •  There are journalistic guidelines that CNN violated

      Basically, they say that if the lead is within one MOE, it's a toss-up; if it's greater than one MOE but less than two MOEs (since the MOE applies to both numbers), the candidate leads but within the MOE; and if it's outside two MOEs, the candidate is leading.
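      The rule of thumb above can be sketched as a small function. This is a minimal illustration, assuming the quoted MOE is for each candidate's share and a two-way race; `describe_lead` and its labels are hypothetical, not taken from any published guideline.

```python
def describe_lead(a: float, b: float, moe: float) -> str:
    """Label a two-candidate race using the one-MOE / two-MOE rule of thumb.

    The MOE is quoted for each candidate's share. An error that raises one
    share tends to lower the other, so the lead itself can be off by up to
    about twice the quoted margin.
    """
    lead = abs(a - b)
    if lead <= moe:
        return "toss-up"
    if lead <= 2 * moe:
        return "leads, within the margin of error"
    return "leading"


print(describe_lead(48, 48, 4))  # toss-up
print(describe_lead(51, 45, 4))  # leads, within the margin of error
print(describe_lead(54, 44, 4))  # leading
```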

    •  It is especially bad

      When you have several polls with 0-4 point leads, all "within the margin of error," but none showing the other candidate ahead, and you claim the race is "essentially tied."
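      If the race really were tied, which candidate leads in any one poll would be close to a coin flip, so the same candidate leading every poll gets unlikely fast. A minimal sign-test sketch, assuming the polls are independent and ignoring exact ties; the lead values and the name `sign_test_p` are made up for illustration.

```python
from math import comb


def sign_test_p(ahead: int, total: int) -> float:
    """One-sided sign test: chance that at least `ahead` of `total`
    independent polls favour the same candidate if the race is truly tied."""
    return sum(comb(total, k) for k in range(ahead, total + 1)) / 2 ** total


# Hypothetical leads (in points) from eight polls, each "within the margin of error".
leads = [1, 3, 2, 4, 0.5, 2, 1, 3]
ahead = sum(1 for lead in leads if lead > 0)
print(ahead, len(leads), sign_test_p(ahead, len(leads)))  # 8 8 0.00390625
```

      Each poll on its own is inconclusive, but eight of eight in the same direction would happen by chance less than half a percent of the time in a tied race.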

      Margin of error assumes noise in both directions.  That's why averaging polls has some value in REDUCING the margin of error.

      It's not as good as taking a larger sample size, because of differences in polling questions, who can be reached by the poll, blah blah. But it is better than treating each poll as if it stood alone.
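      Treating several polls as one big sample shows why averaging helps: the margin shrinks with the square root of the combined sample size. This sketch assumes the polls are independent simple random samples of the same population with comparable methods, which, as noted above, real polls are not quite; the shares, sample sizes, and function names are hypothetical.

```python
import math


def moe(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a share p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def pooled(shares, sizes):
    """Sample-size-weighted average share and the MOE of the pooled sample."""
    n = sum(sizes)
    p = sum(s * k for s, k in zip(shares, sizes)) / n
    return p, moe(p, n)


shares = [0.49, 0.51, 0.50, 0.52]  # hypothetical: one candidate's share in 4 polls
sizes = [600, 600, 600, 600]

print([round(100 * moe(s, k), 1) for s, k in zip(shares, sizes)])  # each about ±4
p, m = pooled(shares, sizes)
print(round(100 * p, 1), round(100 * m, 1))  # pooled margin is about ±2
```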

      Averaging trends is better still, because within a given pollster's series the methodology should be consistent, and that eliminates most of the "baseline" variation. So if several polls show an average 2% rise for a candidate, you can be pretty sure things are trending in his direction, even if each individual poll has an MOE of about 4%, because the MOE on the TREND for ALL the polls, considered as a single sampling universe, will be much smaller than any single poll's, and with enough polls a lot less than 2%.
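      A rough way to put numbers on the trend argument: one pollster's poll-to-poll change compares two independent samples, so its margin is about √2 times the single-poll margin, and averaging the changes from k independent pollsters divides that by √k. This is a sketch under those assumptions (house effects cancel within each pollster, samples are independent); `trend_moe` is a hypothetical name.

```python
import math


def trend_moe(single_moe: float, num_pollsters: int) -> float:
    """MOE of the average poll-to-poll change across several pollsters.

    Each pollster's change is the difference of two independent samples,
    so its MOE is sqrt(2) * single_moe. Averaging the changes from
    `num_pollsters` independent pollsters divides that by sqrt(num_pollsters).
    """
    per_pollster_change = math.sqrt(2) * single_moe
    return per_pollster_change / math.sqrt(num_pollsters)


for k in (1, 4, 10, 20):
    print(k, round(trend_moe(4.0, k), 1))
```

      Under these assumptions, with ±4% polls the trend margin drops below 2 points somewhere between 4 and 10 pollsters, which is why "several" polls need to be a fair number before a 2-point shift is convincing.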

  •  so

    In the example given in the post, is that 1-in-20 case what would be called an "outlier"?

    And what values would an outlier have, relative to the "high confidence" typical values?

    Also, how "freely" can the typical expected values "vary" in the other 19 cases?
    I.e., can they flip places if you kept repeating the same poll?

    thanks for the help here.

    Are you ready to Vote? Are you still 'allowed' to Vote?
    -- Are you sure?

    by jamess on Sat Oct 27, 2012 at 07:11:18 AM PDT

    •  ...

      An outlier could be the 1 in 20, or it could be a flawed survey. Margin of error assumes that things were properly done and any error is only random.

      The possible results are distributed along a bell curve, and the probability of any range of outcomes is the area under that curve. If you flip a coin 500 times, the most likely result is 250 heads. 251 or 249 heads is slightly lower on the curve, and so on. The area under the curve from 228 through 272 heads covers about 95% of the probable outcomes, and the chance of getting exactly 228 is much lower than the chance of getting 250. In the chart below, the 0 line is 250 and the dark blue area would be <228 and >272. The higher the curve, the higher the probability of getting that particular result.
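      The coin-flip numbers can be checked directly against the exact binomial distribution. A minimal sketch using only the standard library; `binom_pmf` is a helper written for this example.

```python
import math


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Exact probability of k heads in n flips of a coin with P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n = 500
mean = n * 0.5                    # 250 heads is the peak of the curve
sd = math.sqrt(n * 0.5 * 0.5)     # about 11.2 heads
half_width = round(1.96 * sd)     # about 22 heads for 95%
lo, hi = int(mean - half_width), int(mean + half_width)

coverage = sum(binom_pmf(k, n) for k in range(lo, hi + 1))
print(lo, hi, round(coverage, 3))               # 228 to 272 covers roughly 95%
print(binom_pmf(250, n) / binom_pmf(228, n))    # how much likelier 250 is than 228
```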

      95% is the usual level, but there is nothing magic about that number. The margin of error moves with the chosen confidence level. You can also look at it in the opposite direction and determine the confidence from the result: if your Ohio poll has Obama up 2, then, depending on the poll's margin of error, you might say there is roughly a 75% chance that Obama is truly in the lead. That's basically what Nate Silver is doing at 538 with his projections.
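      Both directions can be sketched with the standard library's `statistics.NormalDist`. This assumes a two-way race where the lead's standard error is twice the per-candidate standard error, and it reads the normal curve "backwards" as if nothing else were known about the race; `moe_at`, `prob_truly_ahead`, and the standard error of 2 points are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def moe_at(confidence: float, se: float) -> float:
    """Margin of error for a given confidence level and standard error."""
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2)
    return z * se


def prob_truly_ahead(lead: float, moe_per_candidate: float,
                     confidence: float = 0.95) -> float:
    """Rough chance the poll leader is really ahead.

    Converts the per-candidate MOE back to a standard error, doubles it
    for the lead (two-way race), and asks how much of the normal curve
    centred on the observed lead lies above zero.
    """
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2)
    se_lead = 2 * moe_per_candidate / z
    return STD_NORMAL.cdf(lead / se_lead)


se = 2.0  # hypothetical standard error of one candidate's share, in points
for c in (0.90, 0.95, 0.99):
    print(c, round(moe_at(c, se), 1))  # margin widens as confidence rises

for moe in (3.0, 4.0):
    print(moe, round(prob_truly_ahead(2.0, moe), 2))
```

      The answer is sensitive to the margin assumed: under these assumptions, the same 2-point lead gives a noticeably higher chance with a ±3 poll than with a ±4 poll.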

      Disclaimer: If the above comment can possibly be construed as snark, it probably is.

      by grubber on Sat Oct 27, 2012 at 09:17:10 AM PDT


  •  Most polling uses a 95% confidence interval

    which means, by definition, that 1 result in 20 (i.e., 5%) will fall outside it, and that result is an outlier.

    In the other 19 cases, the result falls within the MOE they're quoting.
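    The "19 out of 20" can be seen by simulating the same poll over and over. A minimal sketch, assuming a simple random sample of 600 voters and a true share of 47% (both hypothetical); `simulate_coverage` is a name made up for this example.

```python
import random


def simulate_coverage(true_share=0.47, n=600, trials=2000, z=1.96, seed=1):
    """Repeat a simulated poll many times and return the fraction of polls
    whose ±MOE interval contains the true share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        yes = sum(rng.random() < true_share for _ in range(n))
        p_hat = yes / n
        moe = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if abs(p_hat - true_share) <= moe:
            hits += 1
    return hits / trials


coverage = simulate_coverage()
print(coverage, "inside;", round(1 - coverage, 3), "outside (the '1 in 20')")
```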

    The narrower the MOE, the more tightly results must cluster to stay within the 95% confidence interval.

    HOW they cluster depends on the kind of data. There are a few common clusterings you see a lot in physics and in human behavior.

    One is the "bell curve," where results center around the average value. I.e., if the score is 0 with a MOE of 2, then with 1000 data points you'd expect about 50 results to fall outside -2 to +2, but you would also expect a hell of a lot more results near zero than near -1.5 or +1.5.
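    A quick simulation of that example, assuming normally distributed scores with the standard deviation chosen so that ±2 covers 95% of the curve (sd = 2 / 1.96); the seed and bin width are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)
moe = 2.0
sd = moe / 1.96  # so that ±2 covers 95% of a normal curve
scores = [rng.gauss(0.0, sd) for _ in range(1000)]

outside = sum(1 for s in scores if abs(s) > moe)           # expect about 50
near_zero = sum(1 for s in scores if abs(s) < 0.25)        # bin centred on 0
near_plus_1_5 = sum(1 for s in scores if abs(s - 1.5) < 0.25)  # same-width bin at +1.5
print(outside, near_zero, near_plus_1_5)
```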

    This is also called a "normal distribution" by statistics geeks.

    Probably the next most common, in my experience, is the log-normal distribution. This has a big hump near zero (or whatever the lower bound is) and a long tail. Imagine a cartoon rat, with its pointy nose toward zero, back arched, and a long tail stretched out flat. Again, with 1000 samples you'd get 50ish way off to the right, with the bulk near zero but not exactly zero.
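    The rat shape shows up in a quick simulation. This sketch assumes a log-normal whose underlying normal has mean 0 and standard deviation 1 (hypothetical parameters) and uses a one-sided 95% cutoff, since the long tail only runs to the right.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, sigma = 0.0, 1.0  # hypothetical parameters of the underlying normal
samples = [rng.lognormvariate(mu, sigma) for _ in range(1000)]

# One-sided 95% cutoff: exp(mu + z * sigma) with z for the 95th percentile.
cutoff = math.exp(mu + statistics.NormalDist().inv_cdf(0.95) * sigma)
beyond = sum(1 for x in samples if x > cutoff)  # about 50 "way off to the right"

# The long right tail pulls the mean above the median.
print(round(statistics.median(samples), 2), round(statistics.mean(samples), 2))
print(round(cutoff, 2), beyond)
```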

    The third most common is the Pareto, usually used for power-law data or, in business-speak, "where only a few things really matter and we're trying to find out which from a large list."

    Political campaigns will often try to use polling to figure out what those things are, by asking voters 10ish questions about what matters most to them. Unfortunately, unlike the failure modes of a disk drive, people's opinions are based on a cluster of things, and their responses will vary with random events in their own recent lives and with how the question is asked. So a Pareto chart is more helpful in a lot of business and engineering contexts than it is in polling political motivations.

    Anyway, Pareto charts are usually drawn as bars, each labeled with something a human can understand, with the biggest bar on the left. If the data really follow a power law and you have 20 things you are considering, the first bar will be a lot bigger than the second, and the second a lot bigger than the third (though by less than the gap between the first two), and there will be a long, long tail of small values. Think of it like a children's slide: steep at first, then flattening out fairly rapidly.
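    The data behind a Pareto chart is just the categories sorted largest-first with a running cumulative percentage. A minimal sketch with made-up answers to a "what matters most?" question; `pareto_table` and the counts are hypothetical, and the bars are drawn as text rather than with a plotting library.

```python
def pareto_table(counts: dict) -> list:
    """Sort categories largest-first and attach cumulative percentages,
    which is the data behind a Pareto chart's bars and cumulative line."""
    total = sum(counts.values())
    rows, running = [], 0
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((label, count, round(100 * running / total, 1)))
    return rows


# Hypothetical answers from 1000 voters to "what matters most to you?"
answers = {"taxes": 130, "economy": 420, "energy": 40, "health care": 230,
           "other": 30, "foreign policy": 90, "education": 60}

for label, count, cum_pct in pareto_table(answers):
    print(f"{label:15} {'#' * (count // 10):42} {count:4} {cum_pct:5.1f}%")
```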

    Data can also fit none of these distributions, being fairly evenly spread out. That tends to lead to very wide confidence intervals, though. Polling something like a presidential race is most likely to produce a normal distribution. Polling "what matters most" will produce either no particular distribution or a Pareto. Polling something like "odds that Fred will win" over a long period of time with the same sample (like the RAND Corporation poll) might look more like a log-normal distribution if the data is organized so that zero means a 100% chance that the sample's favored candidate is going to win.

Subscribe or Donate to support Daily Kos.
