Statistics is a complicated and often misunderstood and misrepresented science. In fact, statistics can seem so confusing and convoluted that when I was in engineering school, my peers and I called the class "Probabilities and Sadistics."
Sadly, because statistics is so complicated, journalists and talking heads often incorrectly represent a political poll.
I am going to keep this Diary as simple as possible so as not to confuse or lose any readers, and I will try not to go outside the scope of this Diary.
Example:
Let's say you poll 1,000 people and ask who they will vote for in November for President and 500 say they'll vote for Obama, while 450 say they'll vote for Romney.
Since:
500/1000 = 50%
450/1000 = 45%
Would you think that Obama leads 50% to 45%? Would you think Romney is down 5 points?
If you said, "yes" to either, then you are not correct. Now you might ask, why?
For one thing, we don't know who makes up the sample. In other words, in order to have a good sample of the electorate, you would need to make sure you have a "representative" sample of the general voting population.
Hopefully, no one reading this would expect any political polling sample to be absolutely perfect, mostly because it is impossible to know the exact makeup of the voting population until the election is over.
This time, for the example I gave above, let's assume you have a good, representative sample, you know, one that matches the voting population as best as can be imagined.
Again, your sample is 1,000 people where 500 say they're going to vote for President Obama and 450 say they're going to vote for Romney.
President Obama = 50%
Romney = 45%
So does that mean Obama leads 50% to 45% and Romney is down 5 points? Umm ... not exactly.
You see, if, and only if, the 1,000 people you polled were the entire voting population, then, and only then, would it mean President Obama leads 50% to 45% and Romney is down 5 points.
Meaning: your 1,000-person sample is still just a representation of the voting population, and this is where the Margin of Error becomes very important and cannot be ignored.
Because your sample could never create an absolutely perfect representation of the voting population, there will always be some sort of sampling error.
The Margin of Error characterizes the random sampling error in a survey. It is calculated from the standard error of the result, which in turn depends on the standard deviation and the size of the sample.
Keeping it simple, the larger the Margin of Error, the less precisely the poll results pin down the population and, therefore, the less confident you should be in the results of the poll.
The Margin of Error is usually expressed in terms of a "confidence interval." The confidence interval tells us that we can be certain to a specific degree (usually 95%) that the makeup of the voting population, as a whole, is within a specified amount of the data from the survey.
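As a rough sketch of where such a number comes from: assuming a simple random sample and a 95% confidence level (a z-value of about 1.96), the textbook Margin of Error for a share p out of n respondents is z * sqrt(p * (1 - p) / n). The function name and its defaults below are illustrative assumptions, not any pollster's actual method, and a Margin of Error published with a real poll can differ from this textbook value.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Textbook margin of error (as a fraction) for a share p
    measured in a simple random sample of n respondents.
    z=1.96 corresponds to a 95% confidence level (assumed here)."""
    return z * math.sqrt(p * (1 - p) / n)

# A candidate at 50% in a poll of 1,000 people
moe = margin_of_error(0.5, 1000)
print(f"{moe * 100:.1f}%")
```

For 1,000 respondents this prints 3.1%. The formula also shows why bigger samples help: quadrupling the sample size cuts the Margin of Error in half.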
Back to the example above:
Obama leads Romney 50% to 45% with a Margin of Error of 3.5%.
A reasonable, but somewhat incorrect, assumption would be for you to say "Obama leads by 5% and since Obama's numbers are outside the Margin of Error, Obama leads outside the Margin of Error."
The reason that would be an incorrect assumption is that the Margin of Error does not refer to the magnitude of the lead, but rather to the magnitude of each candidate's support in the poll.
Simply put, it means Obama has the support of 50% of those polled and Romney has the support of 45% of those polled with a 3.5% Margin of Error.
The accurate way to look at the poll is to employ the Margin of Error and realize that, for each candidate, the data show support anywhere from 3.5% below the cited figure to 3.5% above the cited figure. In other words, you must look at the upper and lower limits (also known as the upper bound and lower bound).
From the example employing the Margin of Error:
Obama's support lower limit: 50% - 3.5% = 46.5%.
Obama's support upper limit: 50% + 3.5% = 53.5%.
Romney's support lower limit: 45% - 3.5% = 41.5%.
Romney's support upper limit: 45% + 3.5% = 48.5%.
So, one looking at that could say, with a 95% certainty:
Obama's actual support is between 46.5% and 53.5%,
Romney's actual support is between 41.5% and 48.5%.
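The same arithmetic can be sketched in a few lines of code. The function and variable names here are made up for illustration; the numbers are the ones from the example.

```python
def support_interval(share, moe):
    """Return the (lower, upper) limits of a candidate's support,
    in percentage points, given the poll share and margin of error."""
    return share - moe, share + moe

MOE = 3.5  # Margin of Error from the example, in percentage points
poll = {"Obama": 50.0, "Romney": 45.0}

for name, share in poll.items():
    low, high = support_interval(share, MOE)
    print(f"{name}: {low}% to {high}%")
```

This prints "Obama: 46.5% to 53.5%" and "Romney: 41.5% to 48.5%", matching the limits worked out by hand.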
Statistically speaking, that is why it would be incorrect to say Obama has a 5 point lead in the example above and why it would be incorrect to say Romney is down 5 points in the example above.
Statistically speaking:
IF: Obama's actual support is at the lower limit of the confidence interval, 46.5%, and
IF: Romney's actual support is at the upper limit, 48.5%,
THEN: Romney could actually be in the lead, and that is why it would be incorrect to say Romney is down 5 points in the example above and incorrect to say Obama leads by 5%.
Therefore, if Obama's lead is not more than double the Margin of Error (in the examples herein, more than 7.0%), then his lead in the polls is not statistically significant.
In other words, IF: President Obama's lead is more than double the Margin of Error, for instance, let's say 7.01%, then Obama's lead is statistically significant and we can be 95% certain he is actually winning.
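The "more than double the Margin of Error" rule described here can be written as a one-line check. This is only a sketch of the rule of thumb as stated in this Diary, with a made-up function name; it is not a formal significance test.

```python
def lead_is_significant(leader_share, trailer_share, moe):
    """Rule of thumb from this Diary: the lead counts as significant
    only if it is MORE than twice the margin of error.
    All values are in percentage points."""
    lead = leader_share - trailer_share
    return lead > 2 * moe

MOE = 3.5  # Margin of Error from the example

print(lead_is_significant(50.0, 45.0, MOE))  # 5 point lead
print(lead_is_significant(52.0, 45.0, MOE))  # exactly 7 points
print(lead_is_significant(53.0, 45.0, MOE))  # 8 point lead
```

The first two calls print False (a 5 point lead, and a lead of exactly 7 points, are not more than double 3.5), and the third prints True.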
Now let's say a second poll comes out by the same pollster showing:
Romney = 48%
President Obama = 46%.
It would be incorrect to say Romney went from 5 down to 2 up thus giving Romney a 7 point swing.
It would also be incorrect to say President Obama went down 4 and Romney went up 2 so Romney has a 6 point swing.
Why would both of those be incorrect? Answer: Because of the Margin of Error and the upper and lower limit of the confidence interval.
You see, looking at the first example where President Obama had 50% and Romney had 45% and employing the Margin of Error, one could say that if the same poll were taken 100 times, then the results would land within the Margin of Error of the actual figures about 95 of those 100 times (19 times out of 20). And since you would use the upper and lower limits for each candidate found in example 1, it would be incorrect to say Romney had a 7 point swing and/or a 6 point swing.
To break that down:
For Romney
From example 1:
IF: Romney's actual support was the upper limit of the confidence interval, 48.5%
From example 2:
Data show Romney at 48%
... Meaning no 7 point swing
For Obama
From example 1:
IF: Obama's actual support is at the lower limit of the confidence interval, 46.5%
From example 2:
Data show President Obama at 46%
... Meaning no 6 point swing for Romney and no 4 point swing for Obama.
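One way to sketch that comparison in code is to ask whether each candidate's range in the first poll overlaps his range in the second poll. This assumes the second poll carries the same 3.5% Margin of Error as the first, which is not stated above, and the helper names are invented for illustration.

```python
MOE = 3.5  # assumed to apply to BOTH polls (an assumption, see above)

def interval(share, moe=MOE):
    """(lower, upper) limits of support, in percentage points."""
    return share - moe, share + moe

def intervals_overlap(a, b):
    """True if two (lower, upper) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

first_poll = {"Obama": 50.0, "Romney": 45.0}
second_poll = {"Obama": 46.0, "Romney": 48.0}

def compatible(name):
    """True if the two polls could reflect the same actual support."""
    return intervals_overlap(interval(first_poll[name]),
                             interval(second_poll[name]))

for name in first_poll:
    print(name, compatible(name))
```

Both candidates come back True: the ranges overlap, so the two polls are consistent with no real change in support for either candidate.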
Let me reiterate: if Obama's lead is not more than double the Margin of Error (in the examples herein, more than 7.0%), then his lead in the polls is not statistically significant, and the same goes for Romney.
Remember, 95% is not equal to 100%; thus, there would still be a 5% possibility that Romney could win.
The bottom line is, even if you had a gazillion polls showing one candidate with a 5 point lead, that lead could still be statistically insignificant for either one of the candidates.
Political polls employ Inferential Statistics, so in order to determine if either candidate's lead is significant across a collection of polls, you would have to employ Meta-Analysis.
I should note here that multiple polls by the same pollster can, but do not always, increase accuracy. I say not always because some pollsters just suck.
To me, the Margin of Error description in this Diary is the most important thing to understand if you want to know what a poll really means. Sadly, the significance of the Margin of Error is one thing far too many people, especially journalists and talking heads, do not understand.
I intentionally did not discuss weighting and/or other biases some pollsters put in polls. Some pollsters use weighting to correct the sample so that it represents the actual population as well as possible. Different pollsters can, and do, introduce biases in many directions through, among other things, weighting and the phrasing of the question. Because weighting problems and other biases in polls could take up an entire Diary, the discussion of "weighting" is outside the scope of this Diary.
I hope I have illustrated the importance of understanding the Margin of Error when examining political polls as simply and understandably as possible.