Are polls disproportionately missing younger people and minorities? What are the effects? I also include a primer on polling and margins of error. Let's dive in further below the fold.
I’ll be the first to admit that I have a cell phone fixation. I’m checking the internet on my Samsung Sync constantly. It’s my only addiction (drinking is a hobby, fantasy sports are an investment and smart, beautiful women are a calling) and I don’t apologize for it. My only regret is that I never get to take part in any polling. That got me thinking. I’m a young African-American with a college degree and no landline. That’s a prime Obama demographic and polling skips me entirely. Is there a cell phone bias? Is it larger than the debunked Bradley Effect? (If anything, there is probably a reverse Bradley Effect in swing states.)
Will this bias work out worse for McCain than it did for the 1986 Celtics?
Factcheck.org lists a Pew Research Center poll that finds landline-only polls return the same results as polls that include cell phones. The same study admits that it underrepresented minorities and people under 30. I’m curious how many other people like me are out there. The What Do I Know? blog discusses many of these same issues. They find that 25% of young people don’t even have landlines. Cable companies routinely threw in a digital landline as a package deal with cable and the internet. My roommates and I never even bothered to hook up the phone.
Being the amateur statistician that I am, I decided to conduct an (overly simplistic) experiment. The normal distribution curve says that all things being equal, 95% of the time, the actual results should fall within the expected results plus or minus the margin of error. Let’s look at an example. If a poll says Obama is up 2% with a 3% margin of error, it really won’t tell you who is going to win. A perfectly constructed poll could get actual results of Obama winning by 5% or losing by 1%. It can be absolutely wrong and be a flawless poll! Sometimes people say the polling was wrong even though it was within the expected margin of error. That’s what makes polling close contests so difficult. Things only get froggy when the results fall outside the expected margin of error. Think Obama vs. Hillary in New Hampshire. Even then, the best polls will be completely wrong 5% of the time. It’s only a systemic problem when the results are consistently outside the margin of error.
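For readers who want to see the arithmetic, here is a minimal sketch of where a 3% margin of error comes from and what range it implies for a 2-point lead. The sample size of 1,067 is a hypothetical value I picked because it produces roughly a 3% margin; it does not come from any particular poll. The function names are mine as well. One caveat: this follows the simplification used above and applies the margin directly to the lead. Strictly speaking, the published margin applies to each candidate's share, and the margin on the gap between two candidates in a close race is roughly twice as large, so the true range is even wider.

```python
import math

Z_95 = 1.96  # z-score that covers 95% of a normal distribution


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Margin of error for one candidate's share in a simple random sample.

    proportion=0.5 is the worst case and is what pollsters usually quote.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def lead_range(lead, moe):
    """Simplified range for a lead: the lead plus or minus the margin."""
    return (lead - moe, lead + moe)


# Hypothetical poll of 1,067 people (an assumed sample size).
moe = margin_of_error(1067)

# The example from the text: Obama up 2 points with a 3-point margin.
low, high = lead_range(2.0, 3.0)

print(f"margin of error: {moe:.1%}")
print(f"lead could be anywhere from {low:+.0f} to {high:+.0f}")
```

The range runs from a 1-point loss to a 5-point win, which is why a 2-point lead inside a 3-point margin cannot call the race.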
Now this makes perfect sense in the lab. In the real world, a seemingly infinite number of variables can invalidate the whole poll. My hypothesis is that more than 5% of the actual primary results fell outside the margin of error of the polls taken right before the vote. This would lead me to believe that there is an error in the way the polls were constructed. It won’t tell us exactly how, though. The media also has a bad tendency to conflate poll numbers, rendering them a lot less reliable. Let’s delve into some numbers and see what we find. If my methodology, assumptions or explanations are wrong, please discuss in the comments.
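The test I have in mind can be sketched in a few lines. Each record pairs a final poll's lead and margin of error with the actual result, and a "miss" is any race where the result landed outside the margin. The numbers below are invented placeholders to show the mechanics, not real primary results, and the helper names are my own.

```python
def is_miss(poll_lead, moe, actual_lead):
    """True when the actual result falls outside the poll's margin of error."""
    return abs(actual_lead - poll_lead) > moe


def miss_rate(records):
    """Share of races where the result landed outside the margin."""
    misses = sum(is_miss(*record) for record in records)
    return misses / len(records)


# (poll lead, margin of error, actual lead), all in percentage points.
# Invented placeholder values for illustration only.
records = [
    (2.0, 3.0, 4.5),   # off by 2.5, inside the margin
    (-1.0, 4.0, 2.0),  # off by 3.0, inside the margin
    (8.0, 3.0, -2.0),  # off by 10.0, a miss
    (5.0, 3.5, 6.0),   # off by 1.0, inside the margin
]

rate = miss_rate(records)
print(f"miss rate: {rate:.0%}")
print("above the 5% we would expect by chance" if rate > 0.05 else "within expectations")
```

With only four races a single miss swings the rate wildly, so a real version of this check needs a full cycle's worth of polls before the 5% comparison means anything.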
Lindsay Politics already conducted this research. This is a chart of their findings:
Friday, July 11, 2008
This is a study I did based on the polling of the 2004, 2006 and 2008 (Primary) elections. This is the average amount of races they blow per election cycle. 0 is the perfect score.
- Zogby 1.67
- Survey USA 1.74
- Quinnipiac 2.00
- Mason-Dixon 2.13
- Gallup 2.33
- RealClearPolitics* 2.67
- Rasmussen 3.00
- Research 2000 3.00
- Strategic Vision 4.00
- ARG 7.50
*- RCP is an average of the polls. The primaries were an outlier; without them it would have a stunning 0.5.
There are 100 races every presidential primary cycle, 50 Democratic and 50 Republican. We would expect the polls to completely whiff an average of five times. Almost all these polls performed far better than this. I assume the Zogby poll is their regular phone poll. Their internet poll is a lot less reliable. Another point of consideration is that it was easier to predict the 2004 primaries because Kerry and Bush locked up their nominations a lot earlier. The 2006 Congressional primaries also had a lot of incumbents. I’m not sure what Lindsay Politics considers "blowing the race" but I am assuming that they mean the results fell outside the acceptable margin of error.
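The "five whiffs out of 100" figure is just 100 races times a 5% chance of landing outside the margin. The sketch below works that out and also asks how likely it is for a sound pollster to miss only a couple of races by luck. It assumes every race is polled, that each race is an independent coin flip with a 5% miss chance, and that "blowing a race" means missing the margin of error. Those are my assumptions; I don't know how many races each pollster actually covered.

```python
import math

RACES = 100          # 50 Democratic + 50 Republican contests, per the text
MISS_CHANCE = 0.05   # a 95% margin of error leaves a 5% chance of a miss


def expected_misses(races=RACES, p=MISS_CHANCE):
    """Average number of misses if every race is polled once."""
    return races * p


def prob_at_most(k, races=RACES, p=MISS_CHANCE):
    """Binomial probability of k or fewer misses, assuming independent races."""
    return sum(
        math.comb(races, i) * p**i * (1 - p) ** (races - i)
        for i in range(k + 1)
    )


print(f"expected misses: {expected_misses():.1f}")
print(f"chance of 2 or fewer misses: {prob_at_most(2):.1%}")
```

Under these assumptions, two or fewer misses in 100 races would happen by chance only a little more than one time in ten, so scores that low across several pollsters suggest either that not every race was polled or that "blowing a race" means something stricter than missing the margin.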
The polls were far more accurate than I expected. This leads me to believe that another cause will be to blame if the polls disappoint as they did in 2004. My theory is that likely voter models need to be tweaked. Most polls separate likely voters from registered voters. Their models indicate that people who voted in three straight elections are more likely to vote again. This obviously works against college students who aren’t old enough to have voted in three cycles. The massive number of newly registered voters leads me to believe that more young people will vote than normal. I also expect much heavier African-American turnout than normal. I predict that polling places nationwide will stay open hours later to accommodate the extra demand. Expect long lines in swing states. The only problem is that no one knows how to tweak the models for this novel election.