With this year’s presidential election at long last entering its final stretch, the polls should be at the peak of their predictive power. However, if you followed the polling in the 2012 and 2014 elections, you might remember watching with surprise as one party—first the Democrats, then the Republicans—consistently outperformed the polls both years. These two election cycles offered stark reminders that polling is by its nature prone to error, and there are many factors that could cause a systematic polling miss for or against either party once again in 2016.
No chart better illustrates just how off public polling can be than the one at the top of this post, which shows the wide differences between the Obama campaign’s internal polling and Gallup’s data in 2012. Obama’s polling was incredibly stable, only barely budging during supposed “game-changers” like Romney’s “47 percent” gaffe and the president’s poor first debate performance. By contrast, Gallup exhibited wide swings, with Obama surging during the Democratic National Convention and plunging after the first debate. Gallup ultimately predicted Romney would win, but Obama prevailed by 4 points, right in line with his own polls.
We’re not saying we think the 2016 polls will miss in similar fashion, or that they’ll even miss at all. But they could—and certainly some will—so in this post, we’ll take a look at all the ways polls can go wrong and how each of those problems might affect the results.
How polls can go wrong
Polls are statistical samples of a small subset of the population that’s supposed to be representative of the electorate as a whole. Because we can’t be certain that any sample is perfectly representative, there is a margin of error associated with every poll simply due to random chance. Basic statistics says that the more respondents you contact, the smaller this error will be, since a larger sample is more likely to be representative. (It also costs more to survey more people, of course.) But even the phrase “margin of error,” when used in its technical sense, can be misleading, because a poll’s true margin of error is often greater than what you see reported in most news coverage, which usually only focuses on the kind of error that's a function of sample size.
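To make that concrete, here is a minimal sketch (in Python, using the standard formula for a simple random sample; the sample sizes below are hypothetical) of the sampling error that news coverage typically reports. It captures only the error that comes from random chance, not the other sources of error discussed below.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    p is the observed proportion (0.5 gives the widest, worst-case margin),
    and z=1.96 is the critical value for a 95% confidence level. This covers
    only random sampling error, not coverage or weighting problems.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Doubling the sample size does not halve the error; it only shrinks it by sqrt(2).
for n in (500, 1000, 2000, 4000):
    print(f"n={n:>4}: +/- {100 * margin_of_error(n):.1f} points")
# n= 500: +/- 4.4 points
# n=1000: +/- 3.1 points
# n=2000: +/- 2.2 points
# n=4000: +/- 1.5 points
```

That is why the familiar “plus or minus 3 points” on a 1,000-person poll costs so much to improve: cutting the reported error in half requires quadrupling the number of interviews.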
One reason for this larger error is that pollsters can’t actually reach every subset of the electorate with equal likelihood, and that happens for many reasons. Some types of voters are simply more likely to respond to surveys than others: In the U.S., women, older folks, and white people tend to be more willing to talk to pollsters. And while pollsters who don’t contact cellphone-only households are a vanishing breed, some do still call only landlines, meaning they can’t even reach the nearly half of the electorate that doesn’t have a traditional landline phone.
Americans have also become increasingly less likely to respond to polls of any sort, with response rates a mere fraction of what they were a few decades ago. As a result, pollsters have to weight certain respondents more heavily to accurately reflect the electorate, since some demographics, like young voters or Latinos, are particularly hard to reach. That weighting makes a poll’s true margin of error even greater than one might assume from its sample size alone. The New York Times’ Nate Cohn recently offered an excellent look at how several pollsters arrived at different election conclusions using the exact same raw data from the same group of respondents, because they made different judgments about how to weight that data, reflecting their different views of the electorate that’s likely to show up this year.
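As a rough illustration of what weighting does, consider a deliberately simplified sketch: the demographic shares and support numbers below are made up for illustration and don’t reflect any real pollster’s procedure. If young voters make up 20 percent of the expected electorate but only 10 percent of a poll’s respondents, each young respondent effectively counts twice, which also means the final estimate leans harder on whoever those few respondents happen to be.

```python
# Toy example of post-stratification weighting on a single demographic variable.
# All shares and support rates are assumptions for illustration only.
population_share = {"young": 0.20, "older": 0.80}   # assumed share of the electorate
sample_share     = {"young": 0.10, "older": 0.90}   # share of actual respondents

# Each respondent's weight is (population share) / (sample share) for their group.
weights = {group: population_share[group] / sample_share[group]
           for group in population_share}
print(weights)  # {'young': 2.0, 'older': 0.888...}

# Hypothetical support rates by group, and the resulting estimates.
support = {"young": 0.60, "older": 0.45}
unweighted = sum(sample_share[g] * support[g] for g in support)
weighted   = sum(sample_share[g] * weights[g] * support[g] for g in support)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
# unweighted: 46.5%, weighted: 48.0%
```

Two pollsters with the same raw interviews but different assumptions about those population shares would land on different numbers, which is exactly the dynamic Cohn documented.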
In 2012, many pollsters—although by no means all of them—ran into trouble because they underestimated the share of the electorate composed of Democratic-leaning demographics like minority voters. It’s possible that pollsters could be making similar mistakes this year, especially because the electorate’s precise composition is constantly evolving. Estimating the turnout of specific demographic groups is particularly fraught, and pollsters take different approaches to determining who counts as a likely voter, or whether a respondent has already voted early.
One hazard that may be plaguing pollsters recently is called “differential partisan response.” What that means is that at various points during a campaign, supporters of one candidate are more likely to answer polls than the other candidate’s supporters, or are more likely to claim they are undecided—even if voters haven’t changed their minds about whom they’re voting for or whether they’ll actually vote. This happens when supporters of one candidate feel discouraged: They still plan to vote the same way, but they’re just less apt to want to be polled when their spirits are low.
Indeed, this factor is likely one key explanation for the wide divergences seen above between Obama’s internal polling and Gallup, particularly after that notorious first debate that seemingly sent Democrats scurrying under their beds. Internet-based pollster YouGov, which was able to recontact the members of its panel rather than having to call up an entirely new set of respondents, likewise found that Obama’s poll numbers barely fell. Gallup, however, simply couldn’t reach these dismayed Democrats, while Obama’s legendarily sophisticated data operation knew how to account for them, hence Gallup’s huge drop and the Obama team’s much smaller one.
YouGov revisited the topic this week with a compelling investigation into how differential partisan response might have played out during recent events such as the scandal over Trump’s sexual assault tape and FBI Director James Comey’s bombshell “emails” letter. YouGov’s data suggests that these supposed game-changers really didn’t move underlying vote intentions all that much, but that other public polls swung wildly simply because one side’s partisans were less enthusiastic about answering polls. If that’s true, it would mean Clinton has long held a modest lead, rather than seeing it balloon and then collapse over the course of October.
For many years, analysts have recognized the value of averaging polls from different outfits, which can help minimize the impact of any individual inaccurate poll, but even the averages can be wrong if most of the underlying polls are. After 2012’s infamous polling miss, Gallup and others stopped releasing presidential horse-race polls for the 2016 cycle. The overall volume of polling has also declined compared to 2012, particularly among higher-quality polls that contact cellphones. That means a few badly inaccurate polls, such as the LA Times tracking poll and its highly questionable weighting practices, can have an outsized impact on the average.
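To see why a thin polling average is vulnerable, here’s a bare-bones sketch with hypothetical margins; real averages typically weight polls by recency, sample size, and pollster quality, but the basic arithmetic is the same.

```python
# A bare-bones polling average using hypothetical margins (Clinton minus Trump,
# in percentage points). Real aggregators are more sophisticated; this sketch
# just takes a plain mean to show the effect of one skewed poll.
polls = [4.0, 5.0, 3.0, 6.0, 4.0]   # five hypothetical polls
outlier = -1.0                      # one hypothetical poll with skewed weighting

average = sum(polls) / len(polls)
average_with_outlier = sum(polls + [outlier]) / (len(polls) + 1)

print(f"average of five polls: Clinton +{average:.1f}")                  # +4.4
print(f"after adding the outlier: Clinton +{average_with_outlier:.1f}")  # +3.5
# The fewer polls in the field, the more a single outlier moves the average.
```

With dozens of polls in the mix, one bad survey barely registers; with only a handful, it can shift the average by a point or more, which matters in a close race.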
Finally, 2016’s higher-than-usual proportion of third-party supporters and undecided voters could increase the chances that the polls are off. In most modern elections, third-party candidates tend to fade as Election Day nears, and that appears to be occurring in 2016. However, with two uniquely unpopular major-party nominees, it’s an open question just what proportion of these third-party respondents will ultimately come home to a major party or even vote at all. Polls excluding third parties generally give Clinton a slightly wider lead than those that include them. If more or fewer voters than expected ultimately stick with third-party candidates, that could throw the polls off as well.
Why polling errors might favor Republicans ...
So what are some reasons that might systematically favor Republicans? One possibility is the so-called “shy Trump voter,” who might think openly voicing his or her support of Trump is socially unacceptable. However, there’s little evidence of this phenomenon in 2016: While elites might think expressing support for Trump is gauche, actual Trump backers don’t seem to think so, as evidenced by their rabid behavior on social media and in real life.
Trump particularly likes to highlight the United Kingdom referendum on European Union membership back in June, or Brexit, as a reason he could win, claiming the polls were wrong. (As with Trump, elite opinion was solidly arrayed against Brexit.) However, Trump’s comparison is, like almost everything else he says, completely off-base, because many pollsters did indeed show the U.K. would vote to leave the E.U. It was the pundits who were incorrect.
Declining Democratic enthusiasm might also cause polls to overstate Democratic turnout. In particular, data from states that conduct early voting indicates lagging turnout among black voters, the most bedrock Democratic constituency. If pollsters have overestimated the African American share of the vote, their polls would in turn overstate Clinton’s support. However, the data is far from conclusive, and Clinton could still make up for faltering black turnout with other groups, like Latinos who are newly motivated to vote by Trump’s terrifying rhetoric, or well-educated, Republican-leaning white voters who’ve been turned off by the GOP nominee. Furthermore, pollsters might already be factoring these changing turnout patterns into their models of the likely voter pool.
… And why they might favor Democrats
On the flip side, Republicans might be less likely to vote than the polls assume. We’ve previously written in detail about three scenarios that could cause the polls to overstate Trump’s support, particularly if pollsters overestimate Republican turnout.
Trump has drawn condemnation from hostile Republican elites, many of whom (at one point, at least) have called on him to drop out of the race. Such abandonment by the party establishment helped cause Republican Rep. Todd Akin to dramatically underperform the polls in his 2012 Missouri Senate campaign following his legendary “legitimate rape” gaffe. No one can be sure why Akin fared even worse than his worst polls, but his unusual status as a “deplorable” pariah is one of the closest parallels we have to Trump’s situation. It’s possible, then, that Trump could similarly fail to inspire many Republicans to turn out for a nominee they despise and don’t think can win.
Trump himself might even be discouraging Republican turnout by falsely claiming the election is rigged. Political science research and some recent polls offer evidence that Trump could be having a negative impact on Republican turnout. While some of this dip in enthusiasm could be the result of the differential partisan response mentioned above, some of it might indeed be real, because if Trump supporters think the election won’t be fair, many will conclude there’s no point in voting. Whether this effect is substantial enough to cause a major polling miss, we just can’t say, but we’ll know soon enough.
Finally, Trump could underperform his polls specifically thanks to his lack of a traditional campaign. Hillary Clinton and the Democratic Party have pumped countless millions into building a turnout operation to ensure their supporters actually vote. Meanwhile, Trump has relied more on free media coverage, and while that might keep his name in the news, it doesn’t do as much to motivate infrequent voters to get to the polls as a neighbor knocking on their door and asking for their vote. This “field effect” could particularly throw off the polls in the swing states, where Clinton has spent millions to get her voters to the ballot box.
While pollsters will of course try to account for many of these factors, the polling misfires of 2012 and 2014 show they were already struggling before 2016 and Donald Trump introduced still more opportunities for error. Thus far, we do not have a solid basis to assume the polls are systematically biased in favor of one party or the other, but we shouldn’t be surprised if that turns out to be the case once the results are in.