An Excessively Detailed Response to "The Polls were Wrong in 2016"

by DrFrink

Saturday, Dec. 21, 2019 at 5:47:30am PST

Where are we now? In surprisingly good shape.

Listen, we’re all still a little shell-shocked from 2016, pundits and the media especially so. Looking at the polls, it seemed like Hillary was going to win, and whatever other legitimate narrative you want to invoke (the polls were tightening, Comey, etc.), the fact remains that she didn’t. Which means that in talking about the Dems’ prospects for 2020, the cynical and worldly-wise answer, from both left and right, is to point out that the polls were wrong in 2016 and leave it at that.

But do you know what? I've been looking at election statistics for a long time, and do you know what I've learned? By and large, candidates who are leading in the polls actually end up winning. Let’s take a look at the 2018 general election:

Federal and governor’s races in 2018, polling prediction vs. result.

Each dot represents a single 2018 race (Senate, governor, or House). My model is about as simple as you can imagine: I use public, nonpartisan polling from pollsters that 538 rates as a B or higher. I then apply a weighting (a Gaussian filter, in case anyone’s curious) so that polls much older than a month are pretty much ignored, and then plot the weighted averages against the final two-party results. There is no other “special sauce” and, in these cases, no attempt at debiasing.
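For the curious, here is a minimal sketch of what a Gaussian recency weighting could look like. It is not my actual code: the function names and the 15-day width (`sigma_days`) are assumptions chosen for illustration, picked so that a poll a month old keeps roughly 14% of its weight and a poll two months old is effectively ignored.

```python
import math

def two_party_share(dem_pct, rep_pct):
    """Democratic share of the two-party vote, in percent."""
    return 100.0 * dem_pct / (dem_pct + rep_pct)

def gaussian_weight(age_days, sigma_days=15.0):
    """Weight of a poll that is age_days old.

    sigma_days is an assumed width, not a value taken from the post.
    """
    return math.exp(-0.5 * (age_days / sigma_days) ** 2)

def weighted_share(polls, sigma_days=15.0):
    """Recency-weighted average of two-party shares.

    polls is a list of (age_days, dem_two_party_share_pct) pairs.
    """
    weights = [gaussian_weight(age, sigma_days) for age, _ in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable polls")
    return sum(w * share for w, (_, share) in zip(weights, polls)) / total
```

Calling `weighted_share([(2, 52.0), (10, 51.0), (75, 46.0)])` would lean almost entirely on the two recent polls; the 75-day-old one barely registers.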

The data are well fit by a straight line (Dems technically overperformed by ~0.5% on average, but that’s so close to “fair” that I’ll ignore it for now), with very little scatter, consistent with a systematic uncertainty (above and beyond sampling error) of about 3%.

While I'm plotting 2018, the same has been true for elections in 2017 (NJ, VA), 2019 (KY, LA), and special elections (AL).

All of the well-polled non-2018 races since 2016.

Polls, by and large, are pretty good at predicting the future.

“But wait!” our hypothetical conversational companion says, “the polls totally screwed up 2016!” Well, yes and no. National polls were off by 2 points, and state polls in high-consequence states were off by 3, which is actually pretty typical. Putting in ~3% systematic noise, here are the odds you might have given Hillary (along with relevant events in the timeline):

How a straight analysis of Hillary’s prospects would have looked in 2016, using polls only.

First, a note on the model:

It’s relatively straightforward. Each state’s outcome is assumed to be well modeled by high-quality polling of that state. If there is little or no state polling, the estimate is the state’s result in the last election plus the national improvement since then. For most states, this won’t matter, since competitive states are almost always well polled. I then allow the possibility that all polls are off by some systematic amount, around 3 points in either direction. Finally, I use these probabilities to simulate a large number of possible outcomes and count the number of times the Democrat wins. You can see the current race for a number of top candidates here.
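The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the model itself: the function names, the normally distributed polling error, the optional per-state noise (`state_sd`), and the default of 20,000 simulations are all hypothetical choices of mine for this example.

```python
import random

def fill_margin(poll_margin, last_margin, national_swing):
    """Use the state's polling if it exists; otherwise take the last
    election's margin plus the national swing since then."""
    if poll_margin is not None:
        return poll_margin
    return last_margin + national_swing

def simulate_win_probability(states, systematic_sd=3.0, state_sd=0.0,
                             n_sims=20000, seed=0):
    """Fraction of simulations in which the Democrat wins a majority.

    states is a list of (dem_margin_pts, electoral_votes) pairs.
    systematic_sd is the shared polling error (about 3 points in the post);
    state_sd is an assumed extra per-state error, off by default.
    """
    rng = random.Random(seed)
    needed = sum(ev for _, ev in states) // 2 + 1
    wins = 0
    for _ in range(n_sims):
        # One error shared by every state: all polls miss the same way.
        shift = rng.gauss(0.0, systematic_sd)
        dem_ev = 0
        for margin, ev in states:
            if margin + shift + rng.gauss(0.0, state_sd) > 0:
                dem_ev += ev
        if dem_ev >= needed:
            wins += 1
    return wins / n_sims
```

With a made-up three-bloc map such as `[(8.0, 200), (2.0, 100), (-6.0, 238)]`, the whole race hinges on the middle bloc, and the shared error is what keeps a 2-point lead from being a sure thing.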

There’s a lot going on in the plot above, but looking at just the quality polls, this model estimated that Hillary had an 84% chance (and falling) on election day. 538 gave her 70%. Betting markets gave her 80%. If your response is, “Well, she lost! So your model must be wrong,” then the conversation is tough to continue. An 84% favorite still loses 16% of the time; simply saying “the polls were wrong” is like saying a .160 hitter NEVER gets a hit.

What’s more, even just eyeballing the plot, you can get a sense that while the probabilities bounce around, they do so within a range. Looking at the last 4 presidential elections, I’ve found that the movement is equivalent to a typical swing of around 1.6% a month, but in random up/down directions (a so-called “random walk,” to the statistical types).
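To make the random walk concrete: if each month is an independent step of about 1.6 points, the typical total drift grows with the square root of the number of months, not linearly. The sketch below assumes the 1.6 figure is the standard deviation of a monthly step and that the drift combines in quadrature with the ~3-point polling error; both are simplifying assumptions of this example, and the function names are hypothetical.

```python
import math

def expected_drift_sd(months, monthly_swing=1.6):
    """Typical size of the total drift after `months` independent steps."""
    return monthly_swing * math.sqrt(months)

def prob_lead_holds(lead_pts, months, monthly_swing=1.6, systematic_sd=3.0):
    """Chance that a lead of lead_pts is still positive on election day,
    assuming normally distributed drift and polling error."""
    total_sd = math.sqrt(expected_drift_sd(months, monthly_swing) ** 2
                         + systematic_sd ** 2)
    # Normal CDF evaluated at lead_pts / total_sd.
    return 0.5 * (1.0 + math.erf(lead_pts / (total_sd * math.sqrt(2.0))))
```

The same lead is worth less the further out you are: `prob_lead_holds(5, 11)` comes out lower than `prob_lead_holds(5, 1)`, because eleven months leaves far more room to wander.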

So where are we now?

Well, the map at the top is Biden’s current map. The polling is probably pretty consistent with your intuition. Dems, overall, are ahead in places like AZ, OH, NC, FL, and GA, in addition to the Clinton states plus PA, MI, and WI.

We can also look at the trends over the last few months:

A model of Biden’s expected win probability and electoral margin, over the last few months

The last week has been a little rough, and while Cassandras will be quick to attribute the dip to impeachment backfiring, I’m not so sure. There was a budget deal, Space Force, and much else that may have moved the needle a bit, but you’ll also note, looking at the context, that the movement is pretty typical.

For what it’s worth, some candidates are hit harder than others. Here are Sanders’s and Warren’s trendlines in the same analysis:

Now, lest anyone accuse me of pushing one candidate over another, consider the alternate possibility that at this stage, head-to-head (H2H) numbers are probably best read in the following way: whoever is seen as the “frontrunner” probably has the numbers most in line with the competitiveness of the ultimate nominee. The Republican party has already settled (in every way imaginable). Based on history, Dems picking their nominee will, if anything, improve their relative standing. In other words, don’t read too much into “electability” based on the relative rankings of the current numbers.

Don't get complacent, but don't assume we're going to lose. The narrative that Dems are in trouble or that this is Trump's race to lose is frankly insane. Based on history, on actual data, the Dems are the overwhelming favorites to win next year. Please act like it.

—

Finally, a little pitch:

I have a Facebook page, The Progressive Physicist, where I do preliminary versions of these calculations, data dumps, and the like. Please consider liking and subscribing.