I have been working on making a model to forecast the Democratic primary (see links to previous posts at the bottom).
I have now made some further improvements, in particular adding state polls and incorporating the exit polls from the Iowa Caucuses into the model.
If you want to read about the methodology, scroll to the bottom.
Click here for a complete projection for all Congressional Districts with current national polls (37% Sanders — 50% Clinton). [warning, large image]
Click here for a complete projection for all Congressional Districts if Quinnipiac is right and the race is 44-42 with Clinton slightly ahead. [warning, large image]
Now that I have a fairly complete model, I will try to update this every day (or at least every day or two).
How good is the model?
For most states, that remains to be seen.
But it came very close to the mark in Iowa. It predicted that Clinton would win Iowa by about 50.7%-49.3% (not including O'Malley). It also did well at the Congressional District level: it correctly predicted that IA-03 would be Clinton's strongest district (predicted 52.0% Clinton-48.0% Sanders). It predicted IA-01 would vote 50.6% Clinton-49.4% Sanders, IA-02 would go 50.1% Clinton-49.9% Sanders, and IA-04 would go 50.2% Clinton-49.8% Sanders.
It was also right on the money predicting the statewide delegate split (23 Clinton, 21 Sanders). However, those are not actual delegates yet; they are calculated from state delegate equivalents, and the split could change in the additional rounds of caucuses that Iowa will hold in the coming months.
The model also predicted other things about the results correctly that many people were not expecting:
- That Sanders would do well among voters with less education and lower incomes (in contrast to the narrative that his support was supposedly concentrated among higher income, higher education white liberals).
- That Sanders would get solid support in rural areas, and not just in urban areas and college towns.
- That Clinton and Sanders would both draw support from portions of the Obama ’08 and Clinton ’08 coalitions. County results confirm this:
Clinton and Sanders also split the most heavily non-white precincts in Iowa.
The Impact of Iowa (and New Hampshire)
The most important outcome of Iowa (and New Hampshire) is the effect they will have on the race going forward: on the results in Nevada and South Carolina, on the media narrative, and especially on any momentum Sanders may gain in national polls following the results.
The Bounce Factor
Fladem has found that in the past, upstart candidates (such as Sanders) typically get a significant bounce following Iowa and New Hampshire, if they have good results and win at least one of the two states. Read the full diary from Fladem for the details (it's worth it). Fladem says:
My own work estimates the bounce based on a linear regression of the 10 instances since 1980 where a front runner has won. You plug the current national numbers and then you project the shift across the states.
BUT — I reduce the bounce based on the percentage of the African American vote — essentially I cut the bounce in half.
Here is the predicted bounce if Sanders wins Iowa and New Hampshire. The R squared for this is high: .735
Results of linear regression where the front runner is beaten in Iowa or New Hampshire:

| Variable | Coefficient |
|---|---|
| Intercept | 5.903613917 |
| Prior National Poll | 0.614479111 |
| Won either Iowa, NH, or both (yes or no) | 16.28478719 |

| | Clinton | Sanders |
|---|---|---|
| Current National Polling | 51.2 | 38 |
| Predicted if Sanders wins Iowa | 37.4 | 45.5 |
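The regression can be applied with a few lines of arithmetic. The sketch below plugs Sanders's current 38% into the three coefficients from the table; treating the "won either Iowa, NH, or both" term as a 0/1 indicator is my reading of the table, and the function name is hypothetical. It covers only the national prediction, not Fladem's state-level cut to the bounce based on the African American share of the vote.

```python
# Coefficients copied from the regression table (Fladem's model).
INTERCEPT = 5.903613917
COEF_PRIOR_POLL = 0.614479111
COEF_WON_EARLY_STATE = 16.28478719


def predicted_national_share(prior_poll: float, won_iowa_or_nh: bool) -> float:
    """Predicted post-Iowa/NH national share for the candidate who beat the front runner.

    Assumption: the 'won' variable is coded 1 (won at least one state) or 0 (won neither).
    """
    return (INTERCEPT
            + COEF_PRIOR_POLL * prior_poll
            + COEF_WON_EARLY_STATE * (1 if won_iowa_or_nh else 0))


sanders_now = 38.0
sanders_after = predicted_national_share(sanders_now, won_iowa_or_nh=True)
print(f"Predicted Sanders share: {sanders_after:.1f}")              # 45.5
print(f"Implied bounce: {sanders_after - sanders_now:.1f} points")  # 7.5
```

This reproduces the 45.5 in the table, which implies a national bounce of roughly 7.5 points.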
So far, we can't tell very well what sort of bounce Sanders is likely to get out of Iowa and New Hampshire. Surprisingly, there have not been many national polls since Iowa, and even fewer from high-quality live-phone pollsters. The only poll fitting that bill is one from Quinnipiac, which showed Sanders with a very large bounce (Clinton 44-Sanders 42, compared to Clinton 61-Sanders 30 in the previous Quinnipiac poll).
Other national polls have come from either robopollsters or mid- to low-quality internet pollsters:
- PPP: a robopollster. Clinton 53-Sanders 32. That doesn't look good for Sanders in isolation, but it is a significant improvement for him over PPP's last pre-Iowa national poll (Clinton 56-Sanders 28). The problem with robopollsters is that it is illegal to robo-poll cell phones. Pollsters that release separate crosstabs for cell phones and landlines have consistently shown that Clinton does much better with landline voters and Sanders does much better with cell phone voters. As a result, robopollsters significantly underestimated Sanders and overestimated Clinton in Iowa. PPP substantially overestimated Clinton's strength there (they had it at Clinton 48-Sanders 40), probably because of these methodological issues.
- Rasmussen: another robopollster. They have Clinton 50-Sanders 32. Their pre-Iowa poll had Clinton 46-Sanders 30, so this is similar but with fewer undecideds. For some of the same reasons as with PPP, this poll is likely to overestimate Clinton and underestimate Sanders.
- Reuters: an internet pollster. It has shown movement to Sanders, but it is not clear which group of voters to look at. Reuters' "registered voters" category is not ideal because it includes independents and Republicans who will not vote in the Democratic primary. The "likely Democratic primary voter" screen is not good either, because it seems to exclude all independents (even in closed Democratic primaries, self-identified independents made up a significant portion of the electorate in 2008). The ideal screen would include both self-identified Democrats and self-identified independents, but not those who say they won't vote in the Democratic primary. Unfortunately, Reuters doesn't offer anything like this, and without it the Reuters poll is not really usable. What a waste of what could be a decent poll.
- Morning Consult: an internet pollster. They have Clinton 51-Sanders 35. This is a slight improvement for Sanders over their pre-Iowa poll (50-34) and a bigger improvement over their poll a week earlier (48-31).
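To see why a landline-only sample can overstate Clinton, consider a simple mixture of two groups of voters. The numbers below are purely hypothetical (they are not taken from any actual poll's crosstabs): suppose Clinton leads among landline voters and Sanders leads among cell-phone-only voters. A robopoll that can only reach landlines reports the landline numbers, while the actual electorate is a blend of both groups.

```python
# Hypothetical support by phone type (percent); NOT real crosstab data.
landline = {"Clinton": 60.0, "Sanders": 30.0}
cell_only = {"Clinton": 40.0, "Sanders": 50.0}

# Hypothetical share of the primary electorate reachable only by cell phone.
CELL_ONLY_SHARE = 0.5


def blended(support_landline: dict, support_cell: dict, cell_share: float) -> dict:
    """Electorate-wide support as a weighted mix of landline and cell-only voters."""
    return {
        name: (1 - cell_share) * support_landline[name] + cell_share * support_cell[name]
        for name in support_landline
    }


robopoll_estimate = landline  # a robopoll can only reach the landline group
true_support = blended(landline, cell_only, CELL_ONLY_SHARE)
print(robopoll_estimate)  # {'Clinton': 60.0, 'Sanders': 30.0}
print(true_support)       # {'Clinton': 50.0, 'Sanders': 40.0}
```

Under these made-up numbers, the landline-only poll shows a 30-point Clinton lead when the true margin is 10 points; the size of the bias grows with both the cell-only share and the gap between the two groups.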
And that's really it. So there is not much to go on, and even less of quality to go on. Probably most national pollsters are waiting until after New Hampshire to do new national polls; if that is true, then we will be due for a slew of higher quality national polls around Thursday and Friday.
As a result of the lack of quality polls, we can't really tell with any certainty how the race has changed since Iowa. All the polls above except Rasmussen show some movement to Sanders after Iowa, but polling averages still mostly consist of pre-Iowa polls, and most of the post-Iowa polls are problematic for the reasons described above.
In the absence of higher-quality polls, we can at least look at Google Trends.
Google Search Trends after Iowa
On Google Trends, we can see that Sanders got a huge spike in attention nationally as the Iowa Caucus results were reported and the candidates gave their (victory?) speeches. Clinton got only a very small spike in comparison, and she was also dwarfed by the spikes for Cruz and Rubio on the GOP side.
Looking at the whole past week, Sanders has dominated search interest (beating out Clinton and all Republicans). In addition to the spike from the Iowa caucuses, he got large spikes from the town hall and also from the debate. In addition, Sanders got a larger and more sustained spike in search interest following his SNL appearance than any of the Republicans got from their debate:
Democratic Primary Projection (current national polls)
For the current level of national poll averages (roughly Sanders 37%, Clinton 50%), the model projects this:
The prediction for New Hampshire is of particular interest. The model has it at 58.1% Sanders and 41.9% Clinton (if undecideds split evenly). The demographic/national-polls portion is a bit more skeptical of how well Sanders will do; it predicts Sanders 55.4% and Clinton 44.6%. But that portion only has 17% weight in the overall projection. The other 83% of the weight goes to recent state polls of New Hampshire, which mostly show solid Sanders leads.
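The overall New Hampshire number is a weighted average of the two components. Only the 55.4% demographic figure, the overall 58.1%, and the 17%/83% weights are given, so this sketch backs out the state-poll component those numbers imply (about 58.7% Sanders). That implied value is derived arithmetic, not a figure reported by the model, and the function names are hypothetical.

```python
def blend(demo_share: float, state_poll_share: float, state_weight: float) -> float:
    """Weighted average of the demographic and state-poll components."""
    return (1 - state_weight) * demo_share + state_weight * state_poll_share


def implied_state_share(overall: float, demo_share: float, state_weight: float) -> float:
    """Solve the blend for the state-poll component, given the overall projection."""
    return (overall - (1 - state_weight) * demo_share) / state_weight


DEMO_SANDERS = 55.4     # demographic/national-poll component for NH
OVERALL_SANDERS = 58.1  # overall NH projection
STATE_WEIGHT = 0.83     # weight on NH state polls

state_component = implied_state_share(OVERALL_SANDERS, DEMO_SANDERS, STATE_WEIGHT)
print(f"Implied state-poll component: {state_component:.1f}")               # 58.7
print(f"Check: {blend(DEMO_SANDERS, state_component, STATE_WEIGHT):.1f}")  # 58.1
```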
By Congressional District, the model predicts Sanders will win NH-01 by about 57.4% to 42.6% and NH-02 by about 58.6% to 41.4%. If that happens, Sanders will win 5-3 delegate splits in both Congressional Districts and a 5-3 split statewide, for a total of 15 delegates to 9 for Clinton. Personally, I suspect the model may be underestimating the difference between NH-01 and NH-02 by a point or two.
If the result is a bit better for Clinton (near a 55.4%-44.6% win for Sanders), that would mean Sanders is not over-performing what you would expect based on demographics. On the other hand, if he does better, that may be a sign that there has indeed been a shift to Sanders that hasn't yet been fully picked up in the national poll averages.
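The delegate arithmetic can be checked with a simple proportional allocation. This minimal sketch assumes 8 pledged delegates in each of NH-01 and NH-02 and 8 statewide (at-large and PLEO delegates treated here as a single pool), which is consistent with the 15-9 total. It uses largest-remainder rounding as a stand-in; the DNC's actual rounding rules differ in the details, and the 15% viability threshold is ignored because it does not bind in a two-candidate race at these margins.

```python
import math


def allocate(delegates: int, shares: dict) -> dict:
    """Split delegates proportionally, giving leftovers to the largest remainders.

    A simplified stand-in for the DNC rounding rules; ignores the 15% threshold.
    """
    quotas = {name: delegates * share / 100 for name, share in shares.items()}
    seats = {name: math.floor(q) for name, q in quotas.items()}
    leftover = delegates - sum(seats.values())
    # Hand remaining delegates to the candidates with the largest fractional parts.
    for name in sorted(quotas, key=lambda n: quotas[n] - seats[n], reverse=True)[:leftover]:
        seats[name] += 1
    return seats


# Assumed pools: 8 delegates per district, 8 statewide (at-large + PLEO combined).
pools = {
    "NH-01": (8, {"Sanders": 57.4, "Clinton": 42.6}),
    "NH-02": (8, {"Sanders": 58.6, "Clinton": 41.4}),
    "Statewide": (8, {"Sanders": 58.1, "Clinton": 41.9}),
}

totals = {"Sanders": 0, "Clinton": 0}
for pool, (n, shares) in pools.items():
    seats = allocate(n, shares)
    print(pool, seats)
    for name, count in seats.items():
        totals[name] += count
print("Total", totals)  # Total {'Sanders': 15, 'Clinton': 9}
```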
What if Quinnipiac is Right?
What if Quinnipiac is right, and the race is about 44% Clinton to 42% for Sanders nationally? In that case, it should look something like this:
This is a continuing part of an ongoing series using polling data, past exit poll data, census data, and other data sources to analyze the 2016 Democratic Primary.
Previous posts are:
- Poll Meta-Analysis: The Bernie and Hillary 2016 Coalitions, and how they compare to 2008 Obama/HRC
- Poll Data Analysis: The Current State of the Democratic Primary
- How the delegate math shakes out for Bernie and Hillary down to the Congressional District level
- Bernie Sanders Did Much Better With Non-Whites In Iowa Than You Think
Methodology: short overview of how it works
- There is a component based on national polls, 2016 exit polls, and demographics.
- National poll crosstabs for region, party identification, race, gender, age, ideology, education, income, and neighborhood type are collected for national polls since
- This is used to infer what the vote is in different states, based on demographic data from 2008 Democratic primary exit polls. Census data and other information are used to estimate the vote at the Congressional District level, where many delegates are awarded.
- I incorporated the demographic crosstabs from the Iowa caucuses in the same way as for national polls. Currently I am giving 50% weight to the caucus exit polls and 50% weight to the other national polls. Before this, I was giving 100% weight to national polls.
- There is a component based on state polls.
- Only state polls since November 1, 2015 are considered.
- State polls include everything from Real Clear Politics, plus some additional polls I have been able to find.
- This does not include ‘polls’ from Overtime Politics, which I suspect may be fraudulent.
- State polls are not viewed in a vacuum, but instead are considered in light of national polls. For example, when the model is considering a poll from 1 month ago, it considers not just the results from that poll, but also how the race has shifted nationally since that poll was taken.
- Weights of individual state polls are adjusted for:
- Pollster Quality Rating (from 538)
- Recency of the poll
- The national-poll/demographics component and the state-poll component are averaged together with variable weights.
- The overall weight given to the state poll component depends on the quantity, quality (pollster rating), and recency of state polls.
- To give you an idea of how these weights work: at the upper end, the projection for New Hampshire currently puts 83% of the weight on state polls and only 17% on national polls and demographics. That is because there are a lot of recent polls of New Hampshire, at least some of which are of reasonably good quality, so the model trusts those state polls to be more or less accurate. On the other hand, for states with no polls since November 1, the projection relies entirely on the national poll/demographic portion of the model. The current weights for the two components are:
- So you can see how the weight put by the model on state polls increases with the quantity, quality, and recency of the available state polls. If there are a lot of good, recent state polls, the model will make most of its projection using those. If not, it uses more of the national poll/demographic model. The model isn't overly trustful of low quality state polls.
- For example, let's compare Texas and California. For Texas there are two polls: a fairly recent KTVT-CBS 11 poll from January 26, and a UT/Texas Tribune poll from November 8. The former is fairly recent but low quality, while the latter is of OK quality (YouGov) but is from a long time ago. In California, the only poll available is a high-quality Field poll from January 3. Even though there are more polls from Texas, and even though one of them is substantially more recent than the California Field poll, the model places greater overall weight on the state-poll component for California (43%) than for Texas (34%) because of the quality difference.
- In the special case of Vermont, I cheated. There are no polls from Vermont since November 1, but there is a poll from September showing Sanders up 65-14 over Clinton. I included this poll and over-weighted it despite its age. The rationale is that Vermont is the only state where voters already know Sanders as well as they know Clinton, so the race there will not have followed the national trajectory.
- Congressional District level projections are also adjusted for state polls as well as for national polls and demographics.
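The national-poll/demographic component works like a form of post-stratification: estimate each candidate's support within demographic groups, then reweight by each state's group composition. The sketch below shows the idea for a single demographic dimension, including the 50/50 mix of Iowa caucus crosstabs and national-poll crosstabs. The group labels, support figures, and state composition are all hypothetical; the actual model uses many crosstab dimensions and 2008 exit-poll compositions rather than this toy example.

```python
# Hypothetical Sanders support (percent) within each group; NOT real crosstabs.
national_crosstabs = {"age 18-44": 55.0, "age 45+": 30.0}
iowa_crosstabs = {"age 18-44": 75.0, "age 45+": 35.0}

# Weight on the Iowa caucus crosstabs (the post describes a 50/50 split).
IOWA_WEIGHT = 0.5

# Hypothetical share of each group in one state's Democratic primary electorate.
state_composition = {"age 18-44": 0.35, "age 45+": 0.65}


def mix_crosstabs(national: dict, iowa: dict, iowa_weight: float) -> dict:
    """Blend group-level support from national polls and the Iowa caucus crosstabs."""
    return {g: (1 - iowa_weight) * national[g] + iowa_weight * iowa[g] for g in national}


def project_state(support_by_group: dict, composition: dict) -> float:
    """Post-stratify: weight group-level support by each group's share of the electorate."""
    total = sum(composition.values())
    return sum(support_by_group[g] * share for g, share in composition.items()) / total


support = mix_crosstabs(national_crosstabs, iowa_crosstabs, IOWA_WEIGHT)
print(support)  # {'age 18-44': 65.0, 'age 45+': 32.5}
print(f"Projected Sanders share: {project_state(support, state_composition):.1f}")  # 43.9
```

The same weighting step could be repeated with Congressional District compositions estimated from census data to get district-level projections.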
In this way, the model attempts to take all of the available data and to put it together in a sensible way to make forecasts.
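Putting the pieces together, the state-poll component adjusts each poll for national movement since it was taken, weights polls by pollster quality and recency, and then blends the result with the demographic component using a weight that grows with the amount of good, recent polling. The sketch below is one plausible way to write that down. The functional forms (an additive national-shift adjustment, exponential recency decay, a 0-1 quality score standing in for the 538 grades, and a saturating formula for the overall state-poll weight) and all constants are assumptions for illustration, not the model's actual formulas, so it will not reproduce the specific weights quoted above.

```python
from dataclasses import dataclass


@dataclass
class StatePoll:
    sanders: float        # Sanders share in the poll (percent)
    days_old: int         # days since the poll was taken
    quality: float        # hypothetical 0-1 score standing in for a 538 grade
    national_then: float  # Sanders national average when the poll was taken


HALF_LIFE_DAYS = 30.0  # hypothetical recency half-life
SATURATION = 1.0       # hypothetical: controls how fast the state-poll weight grows


def poll_weight(poll: StatePoll) -> float:
    """Weight a poll by quality and by exponential decay with age."""
    return poll.quality * 0.5 ** (poll.days_old / HALF_LIFE_DAYS)


def state_component(polls: list, national_now: float) -> tuple:
    """Return (shift-adjusted weighted poll average, total evidence weight)."""
    weights = [poll_weight(p) for p in polls]
    # Assumed additive shift: move each poll by the national change since it was taken.
    adjusted = [p.sanders + (national_now - p.national_then) for p in polls]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, adjusted)) / total, total


def projection(polls: list, national_now: float, demo_share: float) -> float:
    """Blend the state-poll and demographic components."""
    if not polls:
        return demo_share  # no state polls: rely entirely on demographics
    poll_avg, evidence = state_component(polls, national_now)
    # More, better, and newer polls -> more evidence -> more weight on state polls.
    state_weight = evidence / (evidence + SATURATION)
    return state_weight * poll_avg + (1 - state_weight) * demo_share


example = [StatePoll(sanders=55.0, days_old=0, quality=1.0, national_then=38.0)]
print(projection(example, national_now=38.0, demo_share=45.0))  # 50.0
```

With a single fresh, top-quality poll, this toy version splits the weight evenly between the two components; adding more recent, high-quality polls pushes the state-poll weight toward 1, while stale or low-quality polls add little.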