I just found a source for Democratic primary polls for all 50 states! There is a national sample size of 39,590, with samples varying between 8,380 for California and 90 for Wyoming. So be more skeptical about the results for small states.
The data comes from Morning Consult Intelligence, which compiles polling data from all of Morning Consult’s polls since June. If you like looking at polling data, make a (free) account and you can look at the data in more detail.
While this is an excellent resource, there are also some caveats that you should be aware of while looking at the data:
- First, Morning Consult is an internet pollster, so its methodology is (probably) somewhat weaker than that of live phone pollsters that call cell phones.
- Second, since the data was collected over a long period of time, it does not fully reflect the current state of the race. Since Sanders' support has gone up over time, the raw numbers overstate Clinton’s support and understate Sanders' support. However, we can correct for that, and we can still see which states are relatively stronger or weaker for each candidate.
First, here is the raw data:
Remember, this overstates Clinton's support and understates Sanders' support. The point is to see in which states Clinton is doing better than her national average, and in which states Sanders is doing better than his national average:
But there is a problem with this. For some states, the percentages of Sanders, Clinton, and Don’t Know/No Opinion don’t add up to 100%. Most states are either at 100% or close enough that it doesn’t make much difference. But for a few small states (Alaska, Hawaii, North Dakota, and Wyoming) the total is below 90%, enough that it could make a difference.
Second, here is data corrected for the adding up problem:
We can try to correct for that by normalizing each state's numbers so that they sum to 100%, which gives the table below. You should still take the states that don't quite add up to 100% with a grain of salt. From this, we can compare which states are relatively stronger and weaker for each candidate. Strong states for each candidate are highlighted in green, and weak states in red. From this, we can see that Clinton's support has been heavily concentrated in one region: the South. Sanders' support has been relatively more evenly distributed throughout the rest of the country, although he has polled relatively well in the West. There is one curious exception to that in this data: Nevada.
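The normalization step can be sketched in a few lines. This is only an illustration of rescaling three shares to sum to 100; the function name and the example row are hypothetical, not taken from the Morning Consult data.

```python
def normalize_shares(clinton, sanders, undecided):
    """Rescale three reported percentages so that they sum to 100."""
    total = clinton + sanders + undecided
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return (100.0 * clinton / total,
            100.0 * sanders / total,
            100.0 * undecided / total)

# Hypothetical small-state row whose reported shares sum to only 88.
c, s, u = normalize_shares(44.0, 33.0, 11.0)
print(round(c, 1), round(s, 1), round(u, 1))  # 50.0 37.5 12.5
```

Note that rescaling keeps the ratio between the candidates unchanged, so it cannot recover whatever responses are missing from the rows that fall short of 100%.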
This is important for understanding what is going to happen next in the campaign. South Carolina and Nevada (at least in this poll, though not necessarily in others) are two of Clinton's strongest states.
In addition, many of the states that are going to vote on Super Tuesday are literally the very best states for Clinton, including Alabama, Arkansas, Georgia, Tennessee, and Texas. That means that Clinton doesn't just need to beat Sanders there, but she needs landslides at least as strong as the Sanders landslide in New Hampshire.
For Clinton, this is both a blessing and a curse. It is a blessing because some of her strongest states are about to vote. But it is a curse because once they vote, if Clinton hasn’t built up an insurmountable pledged delegate lead, the states that are stronger for Sanders will start voting. So if Clinton doesn't outright knock Sanders out of the race quickly, then she will be in some serious trouble.
The Polling Drought and Effects of Poll Types on Clinton's and Sanders' Numbers
We are currently in a polling drought. There has only been a single national live phone poll since January 27 (before Iowa). But on the Pollster polling average, there have been 11 internet or automated robopolls since then.
Internet polls and robopolls have consistently understated Sanders' support and have consistently overstated Clinton’s support. The amount of that bias can be roughly quantified by taking national poll data from Huffington Post Pollster since November, smoothing it with a bandpass filter, and comparing the average distance from the trend for each type of poll.
Understatement of Sanders and overstatement of Clinton support by robopolls and internet polls (relative to live phone polls):
| Poll Type | Pro-Clinton Bias | Pro-Sanders Bias |
| --- | --- | --- |
| Internet | +0.1 | -1.8 |
| Automated | +3.8 | -4.9 |
| IVR/Online | +2.8 | -6.0 |
Relative to live phone polls conducted at the same time, Internet polls have on average overstated Clinton's support by 0.1% and understated Sanders’ support by 1.8%. Automated polls (robopolls) have on average overstated Clinton's support by 3.8% and understated Sanders' support by 4.9%. Combination IVR/Online polls have overstated Clinton's support by an average of 2.8% and understated Sanders’ support by 6.0%.
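As a rough sketch of how a bias figure like these can be computed: fit a trend through all polls, average each poll type's distance from the trend, and express it relative to live phone polls. The code below swaps in a simple centered moving average for the smoothing filter described above, and the poll numbers are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

def mode_bias(polls, window=3, reference="live"):
    """polls: iterable of (day, mode, share) tuples.

    Returns each mode's average distance from a moving-average trend,
    expressed relative to the reference mode.
    """
    polls = sorted(polls, key=lambda p: p[0])
    shares = [p[2] for p in polls]
    half = window // 2
    residuals = defaultdict(list)
    for i, (_, mode, share) in enumerate(polls):
        lo, hi = max(0, i - half), min(len(polls), i + half + 1)
        trend = mean(shares[lo:hi])  # centered moving average (assumption)
        residuals[mode].append(share - trend)
    avg = {m: mean(r) for m, r in residuals.items()}
    baseline = avg[reference]
    return {m: a - baseline for m, a in avg.items()}

# Hypothetical Sanders shares, alternating live phone and internet polls.
sample = [(1, "live", 40), (2, "internet", 36), (3, "live", 41),
          (4, "internet", 37), (5, "live", 42), (6, "internet", 38)]
bias = mode_bias(sample)
print(bias)  # negative for "internet" means lower than live phone polls
```

Running the same function once on Clinton shares and once on Sanders shares would give the two columns of a table like the one above.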
If all polls were live phone polls, a smoothed polling average would look like this (currently 47-41):
So we are likely to see a substantial tightening of the race when live phone polls start to be released again. I would guess the true state of the race is probably within 6 points, and possibly a bit closer depending on how much of a bounce Sanders is getting out of New Hampshire and the debate. Sooner or later we are bound to get a poll showing Sanders with a national lead, although at least the first one of those will have to be considered an outlier.
Third, here is data normalized with a “least resistance swing”:
We can normalize the Morning Consult polling data to this approximate state of the race (47-41) by applying a “least resistance swing.” This assumes that as Sanders has gained support, he has probably won over undecided voters in proportion to his pre-existing support. With that assumption, and with the race at about 47-41, the Morning Consult data would give us the following estimate of the current state of the race:
Overall that looks about right. But it probably overestimates the swing to Sanders in states that have already voted or where Sanders is already well known: New Hampshire, Iowa, and especially Vermont. Other than that, and with the other caveats for states like Wyoming and Alaska, this is a useful estimate of the current state of the race across all 50 states.
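One possible reading of the “least resistance swing” can be sketched as follows. The assumptions here are mine: Sanders' share in each state is scaled by the same factor (the target national share divided by his raw national share), the gain is taken from undecided voters first, and it is taken from Clinton only once a state's undecideds run out. The factor of 1.25 and the state rows are hypothetical.

```python
def least_resistance_swing(clinton, sanders, undecided, sanders_factor):
    """Scale Sanders' share, drawing the gain from undecideds first."""
    gain = sanders * sanders_factor - sanders
    # Sanders cannot gain more than everyone else currently holds.
    gain = min(gain, undecided + clinton)
    from_undecided = min(gain, undecided)
    from_clinton = gain - from_undecided  # only nonzero if undecideds run out
    return (clinton - from_clinton,
            sanders + gain,
            undecided - from_undecided)

# Hypothetical factor: target national share / raw national share.
factor = 1.25

# Hypothetical state with plenty of undecideds: Clinton is untouched.
state_a = least_resistance_swing(50.0, 30.0, 20.0, factor)
print(state_a)  # (50.0, 37.5, 12.5)

# Hypothetical state with few undecideds: the remainder comes from Clinton.
state_b = least_resistance_swing(40.0, 50.0, 10.0, factor)
print(state_b)  # (37.5, 62.5, 0.0)
```

Because the gain is proportional to existing support, states where Sanders was already strong swing the most, which is consistent with the caveat above about Vermont and New Hampshire.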
Democratic Primary Model
I incorporated this into my Democratic primary model, treating these as particularly low-weighted state polls. Since the last update, I also made a few other improvements to the model.
- First, I used GLM regression for a more rigorous estimate of the demographic characteristics of the Democratic primary electorate from the 2008 Presidential exit polls for states that did not conduct exit polls. Previously I had just entered plausible numbers as a starting point. I used the 2008 exit polls because those exist for all 50 states, while 2012 exit polls are not available for all states.
- Second, I am now taking into account variations in the proportions of urban, suburban, and rural Democratic voters across states, using 2008 exit polls (rather than just accounting for variations in rural, suburban, and urban voting age population).
- Third, I incorporated New Hampshire exit polls into the model. Together with the Iowa exit polls, this forms half of the weight that goes into the national polls/demographics portion of the model.
With these improvements, and including all new polls from RCP and Huffington Post, the model estimates that the current state of the race is this:
Accuracy of the Model
- The model correctly predicted the results of the Iowa Caucuses within 1 percent. It was also within a percentage point or two in all Iowa Congressional Districts.
- The model slightly underestimated Sanders in New Hampshire. It estimated Sanders would get 57% of the vote. Other than the general fact that it won’t always be exactly right, my best guess as to why it slightly underestimated him is that he probably gained a bit of ground following Iowa, but this wasn't picked up in national polls for the reasons explained above (lack of up-to-date national polls and the fact that most of the national polls were automated phone or internet polls).
Congressional District Level Projections:
On the CD level, here is the model’s current forecast:
This is part of an ongoing series using polling data, past exit poll data, census data, and other data sources to analyze the 2016 Democratic Primary.
Previous posts are:
- Poll Meta-Analysis: The Bernie and Hillary 2016 Coalitions, and how they compare to 2008 Obama/HRC
- Poll Data Analysis: The Current State of the Democratic Primary
- How the delegate math shakes out for Bernie and Hillary down to the Congressional District level
- Bernie Sanders Did Much Better With Non-Whites In Iowa Than You Think
- Democratic Primary Model, Feb 8 Update (Pre-NH)
- Democratic Primary Model, Feb 9 Update (Pre-NH)
For more detail on how delegates are allocated across different states, check out this excellent resource from Torilahure.