Poll Data Analysis: The Current State of the Democratic Primary

by MattTX

Community

(This content is not subject to review by Daily Kos staff prior to publication.)

Monday, Jan. 25, 2016 at 12:11:51pm PST

The approximate current state of the national Democratic primary, based on current national polling averages and demographic averages from national polls since November, extrapolated to the state level using 2008 Democratic primary exit polls

This is a continuing part of an ongoing series using polling data, past exit poll data, census data, and other data sources to analyze the 2016 Democratic Primary.

Previous posts are:

Poll Meta-Analysis: The Bernie and Hillary 2016 Coalitions, and how they compare to 2008 Obama/HRC

In the previous post, we systematically compiled all available national polling crosstab data on the demographic categories of region, party identification, race, gender, age, ideology, education, income, and neighborhood type. We then took averages of how Bernie Sanders and Hillary Clinton have fared within different demographic subgroups, to get a better, more systematic, and more data-driven idea of who — at least so far — comprises the Bernie Sanders 2016 coalition, and who — at least so far — comprises the Hillary Clinton 2016 coalition.

The dominant media narrative about the Bernie ‘16 and Hillary ‘16 coalitions seems to be boiled down to a simple soundbite: (that Hillary ‘16 = Hillary ‘08 + African Americans and that Bernie ‘16 = Obama ‘08 - African Americans). The primary conclusion of our analysis was that this soundbite is unsupported by the consensus of publicly available national polling data.

Instead, the reality is more complex. Both Hillary Clinton in 2016 and Bernie Sanders in 2016 are drawing different portions of their support both from voters who supported Obama in 2008 and from voters who supported Clinton in 2008. In particular, Sanders’ support among white voters appears to be less concentrated among higher income, higher educated whites than was the case for Obama — he has greater relative appeal to working class white voters who supported Hillary Clinton in 2008. Likewise, though the evidence is less strong for this, Clinton appears to have a bit less of a solid hold over Hispanic voters than was the case for her in 2008. Finally, while Clinton is performing very well among African American voters, she is not achieving the same sorts of overwhelming 90-10 margins that Obama was able to achieve against her in 2008. And vote margins and vote shares, not “winning” any one particular group of voters, are what determine the outcome of an election.

So although the Sanders 2016 Democratic Primary coalition has some similarities to the Obama 2008 Democratic Primary coalition, it also has some important differences. And the same is true for the Hillary 2016 coalition in comparison to the Hillary 2008 coalition.

In response to the previous post, commenter snowman3 pointed out, quite correctly, that:

The subsamples for some of these polls are really small and I do not feel comfortable drawing such sweeping conclusions about such specific demographic groups. I think state level polls, especially when we’re beyond IA and NH will be much more instructive.

It is true that crosstab subsamples in polls can be small. That is the purpose of averaging all available polls. The point is not that this will be perfectly accurate and will eliminate all error. Rather, the point is that it is likely to be significantly more accurate than looking at crosstabs from individual polls in isolation, and that it significantly reduces measurement error.

State polls are obviously useful data points, and are worth incorporating. Time allowing, I intend to do that. However, there are very few quality state polls available in most states at the moment, and most states do not have up to date polls. There are also various other things that could be done to improve the analysis — all matters of time and effort.

Modeling the current state of the race

This is going to get a bit technical (I will try to keep it intuitive), so if you don’t care about the methodology, just skip over this section and the methodological notes to the first chart. I know this post (and the previous one) is a lot to swallow; hopefully it will get better after the first few posts get through methodological issues and the like. But for now, I need to explain what I am doing, rather than just throw numbers at you, dear reader.

From the previous post, we have an idea of the relative strengths and weaknesses of Sanders and Clinton among demographic groups, in comparison to their overall support. But this is based on polling data stretching back to November — and so the absolute levels of Clinton’s and Sanders’ support do not reflect current national polling averages (showing Sanders at about 38%, and Clinton at about 51%).

So how can we convert that into something that does reflect the current state of the race?

Uniform Swing

One simple way to do so is to apply a so-called “uniform swing.” If Sanders is at 33% (29% among Democrats and 43% among Independents) in our poll average, and Clinton is at 53% (57% among Democrats and 40% among Independents), and we want to convert that to a 38%-51% race, just add 5% to all of Sanders’ numbers and subtract 2% from all of Clinton’s numbers. Then we would be at Sanders 38% (34% among Democrats and 48% among Independents) and Clinton 51% (55% among Democrats and 38% among Independents).

As a simple rule of thumb, applying a “uniform” swing is just fine. In particular, that works well when two candidates’ coalitions are not very different demographically. However, when that is not the case, applying a “uniform” swing is probably not the best way to go about it — and a “uniform” swing is not actually really uniform.
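To make the arithmetic concrete, here is a minimal Python sketch of a "uniform" swing applied to the example numbers above. The function name `uniform_swing` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Poll averages from the example above, in percentage points.
sanders = {"overall": 33.0, "Democrats": 29.0, "Independents": 43.0}
clinton = {"overall": 53.0, "Democrats": 57.0, "Independents": 40.0}


def uniform_swing(support, target_overall):
    """Add the same number of points to the overall figure and every subgroup."""
    shift = target_overall - support["overall"]
    return {group: pct + shift for group, pct in support.items()}


sanders_now = uniform_swing(sanders, 38.0)  # +5 points everywhere
clinton_now = uniform_swing(clinton, 51.0)  # -2 points everywhere

print(sanders_now)
print(clinton_now)
```

Every subgroup moves by exactly the same number of points, which is what makes the swing linear in percentage terms.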

The Problem with “Uniform” Swing: It is not uniform on the individual level

The problem has to do with the mathematical definition of what a percentage is — a ratio of a numerator to a denominator. To understand why, think about it on the level of an individual voter, using an example from a general election. I am picking this example because it is easier to see the problem in cases of extreme polarization. Suppose Obama in 2008 or 2012 starts out at 40% among White voters and 95% among African Americans (let’s say 50% overall), and he improves to 53% overall. With a uniform swing, that would mean that he went from 40 to 43 among Whites and 95 to 98 among African Americans. But at the start, 60% of White voters did not support him, while only 5% of African Americans did not. That means that with a “uniform” swing, he would have to win over only 3 out of every 60 whites who did not already support him, but would have to win over 3 out of every 5 African Americans who did not already support him. In other words, if you were an undecided or non-Obama-supporting African American, with a “uniform” swing you would not be just a bit more likely to switch to Obama than if you were a white non-Obama supporter (which is plausible), but 12 times more likely to switch to Obama (which is far less plausible).

Instead, a truly uniform swing on the level of an individual voter would involve all voters who are not supporting a candidate having an equal probability of switching their support.

That would be a truly uniform swing. When one models a swing in that way, it becomes non-linear as expressed in percentages (whereas a so-called “uniform” swing is linear as expressed in percentages, which is what is problematic about it since percentages are mathematical proportions).
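The arithmetic in this section can be checked with a short Python sketch. It first computes the share of non-supporters each group would have to convert under a percentage-point "uniform" swing, then applies a swing in which every non-supporter has the same switching probability. Variable names are illustrative, and the numbers are the hypothetical Obama figures used above.

```python
# Support before the swing, in percent (hypothetical example from the text).
before = {"White": 40.0, "African American": 95.0}
overall_before, overall_after = 50.0, 53.0
points = overall_after - overall_before  # 3 points

# "Uniform" swing: every group gains 3 points, so the share of each group's
# non-supporters that must be won over differs enormously.
conversion = {g: points / (100.0 - s) for g, s in before.items()}
ratio = conversion["African American"] / conversion["White"]

# Individual-level uniform swing: every non-supporter switches with the same
# probability p, chosen so that overall support moves from 50 to 53.
p = points / (100.0 - overall_before)
after = {g: s + p * (100.0 - s) for g, s in before.items()}

print({g: round(c, 3) for g, c in conversion.items()}, round(ratio, 1))
print(round(p, 3), {g: round(s, 1) for g, s in after.items()})
```

Because the individual-level formula is linear in each group's starting support, the overall figure lands on 53 for whatever mix of groups produced the 50 to begin with. The group-level changes are no longer equal: the group with more non-supporters gains more points.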

“Least Resistance” Swing

But assigning equal probabilities of switching support to all voters is not the best way to go about modeling a swing either, or at least I don’t think it is.

So instead, I am using something a bit more complicated, which I will refer to as “least resistance” swing. What this does is to make the chance that an individual voter will switch their support to another candidate (for example, that a voter will switch from Clinton or undecided to Sanders) proportional to the chance that a voter with the same demographic characteristics would already support that candidate (Sanders). If demographic markers are reasonably stable correlates of political opinions, this makes sense — it is simply saying that there is something behaviorally correlated with certain demographic characteristics (for example, being younger, or self-identifying as an Independent) that makes one relatively more likely to support Sanders, and that there is something else behaviorally correlated with other demographics (for example, being older or being a self-identified Democrat) that makes one more likely to support Clinton. And it says that those behavioral factors, which have been important in determining candidate support so far, will continue to be important in the same sorts of proportions as the overall levels of support each candidate enjoys in the national race shift.
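One possible reading of this rule can be sketched in Python. This is an assumption-laden simplification, not the exact model used for the charts: it tracks a single candidate, treats everyone else as a non-supporter, and lets each non-supporter in a group switch with probability `k` times that group's current support. The function name, the constant `k`, and the group weights (chosen so that 29% and 43% average to 33%) are all illustrative.

```python
def least_resistance_swing(support, weights, target):
    """Shift group-level support so the weighted overall equals `target`.

    support, weights: dicts keyed by group, values as fractions in [0, 1].
    Each non-supporter in group g switches with probability k * support[g].
    """
    current = sum(weights[g] * support[g] for g in support)
    # Weighted pool of "available" switchers, scaled by existing support.
    pool = sum(weights[g] * support[g] * (1 - support[g]) for g in support)
    k = (target - current) / pool
    return {g: support[g] + k * support[g] * (1 - support[g]) for g in support}


# Sanders figures from the uniform swing example; weights are an ASSUMPTION
# implied by treating the electorate as only these two groups.
support = {"Democrats": 0.29, "Independents": 0.43}
weights = {"Democrats": 10 / 14, "Independents": 4 / 14}

new = least_resistance_swing(support, weights, 0.38)
gains = {g: new[g] - support[g] for g in support}
print({g: round(v, 4) for g, v in new.items()})
print({g: round(v, 4) for g, v in gains.items()})
```

Under this sketch the gain in each group is proportional to s(1 - s), where s is current support: the two opposing effects (ripeness and abundance) multiply, so gains are largest in groups near 50% and smallest in groups near 0% or 100%.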

The idea behind this is that if a candidate is going to increase their overall support, the “low hanging fruit” will tend to consist of people who are demographically similar to a candidate’s current supporters. However, since by definition a candidate is already doing relatively well among demographics that are more favorable to the candidate, a greater proportion of non-supporters are members of less favorable demographics. So the fruit among more favorable demographics may be more ripe for the picking, while the fruit among less favorable demographics is more abundant (at least relatively). These two factors have opposing effects, which means that when their net effect is expressed in percentage terms, there will be a (somewhat) nonlinear relationship between vote percentage overall and vote percentage in individual demographic subgroups. For those who know some (classical) statistics, this is the same sort of idea as why if you are estimating a model of probabilities, you should use a logit or probit model rather than an ordinary linear regression, or why if your dependent variable is a percentage, you should apply the logit transformation.
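For readers unfamiliar with the logit transformation, the following Python sketch shows why it produces the same kind of nonlinearity. It applies an identical shift on the logit scale to groups with different starting support. This illustrates the analogy only; the shift size of 0.2 and the function names are arbitrary assumptions, and nothing here is taken from the model behind the charts.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map log-odds back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))


def logit_shift(p, delta):
    """Apply the same log-odds shift regardless of starting support."""
    return inv_logit(logit(p) + delta)


for p in (0.20, 0.50, 0.95):
    shifted = logit_shift(p, 0.2)
    print(f"start {p:.2f} -> {shifted:.4f} (gain {shifted - p:.4f})")
```

The same shift yields the largest gain for a group near 50% and a much smaller gain for a group already near 95%, and support can never exceed 100%. For small shifts the gain is approximately the shift times s(1 - s), where s is the starting support.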

As long as there are no major structural shifts in the campaign — absent a significant change in the relative appeal of Sanders and Clinton to different demographic groups — it seems to me, at least, to make sense to model it this way.

However, in 2008, following the Iowa caucuses, there was a structural shift in the race — African Americans swung en masse to Obama, far out of proportion with their prior probability of supporting Obama. If Sanders does well in Iowa and in New Hampshire, he’ll likely get some sort of national bump (see this excellent work from fladem). To some degree, that will likely involve some sort of structural shift in the race — but it is difficult to tell just how much of one in advance.

If there is a structural shift, its — erm, structure — will likely be different than in 2008. More on all that later. But for the moment, we are looking at what we should expect a swing to look like in the absence of a structural shift, or else if a potential structural shift is relatively minor (in comparison to 2008 for Obama) in its effect on the internal demographic composition of Clinton’s and Sanders’ support.

Other Methodological Notes

Martin O’Malley — With apologies to Martin O’Malley, who probably deserves better than the treatment he has gotten from the DNC and the media, I am ignoring his candidacy and am focusing on Clinton and Sanders to simplify things. However, he could be an important factor in Iowa, because who his supporters support as a second choice could make a difference. If he remains in the race long enough, he could also potentially score a reasonable vote total in certain states such as Oklahoma (from conservative Democrats who may be reluctant to support either Sanders or Clinton), in a similar way to John Edwards in 2008.
Independent effects of demographics — In order to formulate these projections, I am assuming that the effects of different demographics on voting behavior are uniform and independent. For example, if a voter is a Moderate White Southerner, I am assuming that the effect of being “moderate” is the same as for a voter who is a Moderate African American Midwesterner. Likewise, I am assuming that, at least for the purposes of candidate support, being a woman has an effect which is independent of being, say, Hispanic. However, in reality there are aspects of being a Hispanic woman that cannot be understood just by considering being a woman and being Hispanic in isolation. In reality, each demographic group is heterogeneous. People are different on the individual level in ways that cannot be entirely explained by isolated demographic characteristics. Demographics can serve as good, relatively stable, and predictive proxies for underlying political behaviors — like which candidate an individual is likely to support and how likely the individual is to vote, but they are not perfect predictors.
Accuracy of exit polls — One of the dirty little secrets of American elections is that exit polls (and entrance polls, in the case of caucuses) are not perfectly accurate. To some degree this is not as much of an issue here, since I am using each demographic group’s share of the vote from exit polls and am not at all using exit polls’ estimates of candidate support. However, sometimes people think that exit polls are more accurate than regular polls because they don’t have a mathematical “margin of error” like normal polls do. However, exit polls do not always obtain a truly representative sample of the electorate. To conduct exit polls, exit poll companies pick a relatively small number of precincts which they think, based on their models, are likely to be representative of the electorate. Sometimes this is not the case — in particular, this can be a problem with Hispanic voters. Exit polls also often ignore absentee ballots entirely. When exit polls do not match the results of an election, the networks (or the exit polling company) actually go back and change the exit poll after the fact in order to better fit the results. If you want to see how that works, when the entrance polls for the Iowa caucuses first come out, save them in a browser tab or paste them into your nearest word processor. Then come back after a few hours, or the day after, and compare. The Clinton campaign and the Sanders campaign both have better data, but exit polls are pretty much the best publicly available and easy to use data source, and so I am taking them as is (at least for the moment).
Accuracy of national polls — Obviously, I am assuming that national polls are accurate (unbiased) measures of actual candidate support on average, and that their crosstabs are roughly accurate (not individually, which they are not — but at least on average over a fairly large number of polls).
Relative demographic stability of the Sanders 2016 and Hillary 2016 coalitions — As I mentioned somewhat in the previous post, I am assuming that there have not been large, unpredictable changes or structural breaks (at least from November until the present) in the demographic composition of Bernie Sanders’ coalition and of Hillary Clinton’s coalition. Strictly speaking, that will never be true — but it is a matter of degree as to how good of an approximation it is to model the Sanders and Hillary coalitions as such. In 2008, there were fairly stable, though not perfectly stable, demographic markers that predicted candidate support for either Barack Obama or Hillary Clinton. This made it possible to predict, using the results from early voting states, the results in later voting states. However, what I am trying to do here is actually a bit more difficult — I am trying to project what will happen before any voting has actually occurred using polls as a data source, rather than trying to project future voting using the results of previous voting in other states in the same primary as a data source. As actual voting occurs, and if I continue working on this, I plan to incorporate that information into my model, time allowing.

So, to reiterate what I said above in response to snowman3, the claim I am making is not that this analysis and the numbers below are perfectly accurate. Rather, it is that they are probably better than anything else you are going to find publicly available. The only groups of people who have a better picture of the national race are most likely the Clinton and Sanders campaigns, and probably also various progressive organizations and unions such as the AFL-CIO and SEIU. And the reason why they have a better grasp on what is going on is that they have more and better data, including individual level data.

State polls are indeed a good data source, and one should certainly consider those when projecting results. Indeed, if I keep on working on this project (which is most likely if the primary turns out to be competitive nationally), then I will almost certainly incorporate state polls directly when I have the time to do so. However, outside of a few states (Iowa, New Hampshire, and South Carolina), there are precious few state polls at the moment. In the overwhelming majority of states, there are at most a few state polls, many of which are from many months ago, and which are often from low quality pollsters. The demographics/national poll/exit poll-based model I am presenting here provides a way to help fill in the gaps, and provides a measuring stick against which state polls can be compared as they trickle out. I do not claim that it cannot be improved, or that it is the final word — just that it is significantly more worth paying attention to than individual polls and non-data-driven intuition.

With that, let’s start taking a look at what the numbers look like.

Vote by Race

Currently, the race is quite close nationally among White voters — with Clinton up by about 2%. Clinton has a large lead (about 45%) among African American voters. However, Sanders stands nationally at about 20% support from African Americans. Obviously, it would be better for him if that number were higher, but it is significant and important that this is higher than the 10% or so that Hillary Clinton received from African Americans in 2008. Moreover, if a national primary were held today (with undecideds split evenly between Sanders and Clinton), Sanders would be projected to get about 27% support from African Americans, with 73% going to Clinton. For Clinton, what is important is not “winning” a particular subgroup, such as African Americans, but rather it is a question of vote margins and of vote shares. For this, and many other reasons, simplistic analogies to 2008 are not supported by the data and will not hold unless polls are systematically mistaken.

With Hispanics, Clinton currently has about a 21 point lead (54% to 33%). If a national primary were held today, with national poll averages standing at 38% for Sanders and 51% for Clinton, we should expect that Sanders would get about 40% of the Hispanic vote, while Clinton would take about 60%. That is already better for Sanders than Obama was able to achieve against Clinton among Hispanic voters in 2008 (whom Clinton tended to win by about 2 to 1). However, as mentioned previously, the data on the Hispanic vote is relatively weaker and more uncertain than for the African American vote.
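The step from poll numbers to a projected vote "if a primary were held today" is just an even split of undecideds, which the following Python sketch reproduces. The function name is illustrative. The African American figure for Clinton (about 65%) is not stated directly above; it is inferred from Sanders at about 20% and a Clinton lead of about 45 points.

```python
def split_undecided(sanders, clinton):
    """Divide undecided voters evenly between the two candidates (percent)."""
    undecided = 100.0 - sanders - clinton
    return sanders + undecided / 2, clinton + undecided / 2


# Hispanic voters: Sanders 33%, Clinton 54% in the poll averages.
hispanic = split_undecided(33.0, 54.0)

# African American voters: Sanders about 20%, Clinton about 65% (inferred).
african_american = split_undecided(20.0, 65.0)

print(hispanic)
print(african_american)
```

Both results round to the projected shares quoted in the text. Note that an even split leaves the margin between the candidates unchanged while raising both shares.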

For “other” race, data is very limited and unreliable. For this group, which includes multi-racial individuals, Native Americans, and Asian Americans, I am simply using the aggregate polling data on “non-whites.” In reality, there is likely to be significant heterogeneity in the “other race” vote.

Another important thing to note is that a greater proportion of minority voters (particularly African Americans) tend to be undecided than are White voters. To some degree, it may be that White voters are tending to pay more attention to the election than are minority voters, at least at this point.

Next, we can apply our “least resistance” swing model to project what the race would look like if Sanders were to surge to a 45-45 national dead heat against Clinton (perhaps if he gets a national bump from finishing close in Iowa and winning in New Hampshire). The "path of least resistance" that the model projects involves Sanders gaining a greater amount of support (7.5% in percentage terms) from White voters than from African Americans (5.2%) or from Hispanics (7.0%). At the same time, Clinton loses a bit less support from African Americans (5.7%) than from Whites (6%). The share of undecided African Americans goes up slightly in this “swing of least resistance,” while it goes down slightly among White voters. These non-uniform effects arise from the nonlinearities and competing effects captured by the “least resistance” swing model. Here, again, we should note the differences between thinking in percentage terms on the aggregate level and thinking in terms of individual voters.

This seems fairly consistent with what we have seen so far as Sanders’ support has gone up over time — he has made inroads with African Americans and other minorities since the days when he was polling at under 10% with them, but more of his gains (in percentage terms) have come among white voters (and to a lesser extent also Hispanic voters). Essentially what we are doing here is assuming that if Sanders picks up support, he will pick up support in a similar sort of way (demographically speaking) that he has picked up support so far. Likewise, that Clinton would lose relatively less support among African Americans than among White voters is also consistent with what we have seen so far. However, due to the nonlinearities involved, as Sanders increases his national support, it mathematically has to be the case that at some point his African American vote percentage begins to rise at a faster and faster rate. In other words, part of the idea that African Americans constitute a “firewall” for Clinton may arise from confusion in relating aggregate percentages to individual level voter preferences.

Vote by Education

There is very little difference in candidate support by education. This contradicts the common presumption that Sanders is doing particularly well among college educated voters. That is simply not true, at least according to the consensus of national polls. Below, we will also see again that the same is true for income — Sanders is not doing notably better among higher income voters than among lower income voters, and there is very little polarization in candidate support by income (as well as by education). The only way that the idea that Sanders is disproportionately getting support from higher educated, higher income voters can be correct is if national polls conducted since November are systematically wrong.

The biggest difference between the two groups is the portion that is undecided — almost 5% more of voters without college educations are undecided than voters with college educations. It should be noted that voters with “no college” (without a college degree) includes college students who have not yet graduated. However, the same was true in 2008. And nonetheless, in the 2008 primary, Obama fared significantly better among voters with college degrees than among voters without college degrees — so we can see that there is a difference here between 2008 and 2016.

There is a fairly normal sized gender gap to be found in the national polling data, similar to that in the 2008 Democratic primary. If a national primary were held today, the model predicts that Clinton would narrowly win men by about 3 points, while carrying women by about 20 points. To pull even nationally, Sanders would need to win men by about 10 points and cut Clinton’s margin among women to about 7 points.

There is very little difference in candidate preference by income to be found in the national polling data. Insofar as there is a difference, it has more to do with a slightly greater proportion of low income voters being undecided — perhaps because they may tend to be paying less attention to the race at this point. This is not to say that there might not be more of a difference between voters with particularly high and particularly low incomes — but just that there is not likely to be a particularly large difference between voters with family incomes greater than or less than $50,000.

There is a strong difference between the candidate preferences of younger voters (18-44) and older voters. If a national primary were held today, the model predicts that Sanders would win younger voters by about 11 points. However, Clinton would win older voters handily by about 28 points.

There is still more “low hanging fruit” among younger voters than among older voters for Sanders (in percentage terms, which can be a bit misleading). However, if voters age 18-44 were disaggregated into voters age 18-29 and voters age 30-44, we would likely see that Sanders’ support among voters age 18-29 is high enough that more of the “low hanging fruit” (again, in percentage terms) would be likely to come from voters age 30-44 than from voters age 18-29.

So in order to draw even nationally with Clinton, the model says that Sanders would need to get to about 59% support in the polls from voters age 18-44 and 37% from voters age 45+. In this case, a hypothetical national primary might end up with Clinton winning older voters by only 15 points, while Sanders would carry younger voters 62%-38%.

By ideology, the model projects a significant difference between liberals and moderates and conservatives. If a national primary were held today, we should expect that Clinton would win liberals by a fairly small margin (about 5 points), while she would win moderates and conservatives by about 20 points.

If the race were to narrow to a dead heat, the model projects that Sanders’ “path of least resistance” would involve plucking a bit more “low hanging fruit” in the form of increased support from liberals, while also gaining among moderates and conservatives. To reach an even race, Sanders would need to win liberals by about 9 points, while losing moderates and conservatives by about 6 points.

By type of community (urban/suburban/rural), the model predicts that if a national primary were held today, there would only be a fairly small amount of polarization between urban, suburban, and rural areas. Undecideds are disproportionately concentrated in rural areas. In a hypothetical national dead heat, we should expect Sanders to narrowly win “urban” areas and to more solidly win “rural” areas, while Clinton would be expected to win suburban areas.

However, an important caveat is in order. Of all the demographic categories, the data for this one is the weakest. There are relatively few polls and pollsters that have crosstabs for urban/suburban/rural, as discussed previously. So of all the demographic categories we are looking at, this is probably the one we can be least confident in.

By Party Identification, there is a strong difference between self-identified “Democrats” and self-identified “Independents.” Currently, Sanders is already winning independents by about 11 points, while Clinton is ahead with Democrats by about 21 points. In order for Sanders to tie Clinton nationally, he would need to improve to about 41% with Democrats and 56% with independents in the polls. Then, with undecideds split evenly between the candidates, Sanders would need 46% from Democrats and 62% from independents. A greater portion of Independents are undecided than are Democrats.

Across all states in the 2008 Democratic primaries, independent voters usually made up somewhere between about 15% and 25% of the electorate on average (according to exit polls). This is the case even in states with “closed” primaries. It should be stressed that Party identification is not the same thing as party registration. Many voters may self-identify as Independents but be registered as Democrats (or vice versa). Also, many states do not have party registration in the first place.

Vote by Region

According to polls that provide regional crosstabs, significantly more voters are undecided in the South and Northeast, while significantly fewer are undecided in the West and Midwest. Perhaps that deviation is a bit less in reality, but that is what the national polls say. Clinton currently leads in all regions, but in some more than others. In a hypothetical national dead heat, the Northeast would be projected to be split fairly evenly, while Sanders would have small projected advantages in the West and Midwest.

The biggest story that emerges from the regional data, however, is that Clinton’s lead is strongly concentrated in one particular region — the South.

That concentration of her support will be relevant when we finally get down to the details of delegate allocations. What’s more, her support will tend to be particularly concentrated in a fairly small number of Congressional districts in the South — namely in African American Voting Rights Act districts (many delegates are assigned by congressional district). On the one hand, that could potentially help Clinton, by allowing her to win large delegate hauls out of those districts (which tend to have more delegates than other districts in the South). On the other hand, if Clinton fails in 2016 to receive support from African Americans by the overwhelming margins that Obama achieved in 2008, her delegate hauls may be correspondingly smaller. An additional wrinkle is that many of these districts not only have large numbers of African Americans, but are also home to a disproportionate share of the white liberals in some southern states. In Obama’s case in 2008, strong support from African Americans and from white liberals tended to work together and to reinforce each other in these districts, allowing him to win very large margins (and large delegate hauls). As one example, Obama won the Atlanta area 4th and 5th Congressional districts by 82,850 - 21,441 and 84,270 - 29,396 margins in 2008. That was enough to win overwhelming 5-1 and 5-2 delegate splits. But if Clinton’s support from African American voters in 2016 is even a bit less overwhelming than Obama’s was in 2008, and if she fares worse among white liberals, then that is a feat she will be hard pressed to replicate. The concentration of African American voters in a few districts also means that other districts will tend to have proportionally fewer African American voters. Depending on whether Sanders’ support is high enough to clear — or low enough to miss — an array of delegate hurdles, that could work out either to Clinton’s or to Sanders’ benefit in different districts.
We will look more closely at the details of how this may work out in the future.
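To see how district vote totals turn into delegate splits and where the "delegate hurdles" come from, here is a hedged Python sketch of proportional allocation. It assumes the Democratic rules as commonly described (a 15% viability threshold, proportional allocation among viable candidates, leftover delegates going to the largest fractional remainders). The district sizes of 6 and 7 delegates are inferred from the 5-1 and 5-2 splits above, and only the two vote totals quoted in the text are used, so any votes for other candidates are ignored.

```python
def allocate_delegates(votes, delegates, threshold=0.15):
    """Proportional allocation with a viability threshold (largest remainder)."""
    total = sum(votes.values())
    viable = {c: v for c, v in votes.items() if v / total >= threshold}
    viable_total = sum(viable.values())
    quotas = {c: delegates * v / viable_total for c, v in viable.items()}
    seats = {c: int(q) for c, q in quotas.items()}  # whole delegates first
    leftover = delegates - sum(seats.values())
    # Remaining delegates go to the candidates with the largest fractions.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - seats[c], reverse=True)
    for c in by_remainder[:leftover]:
        seats[c] += 1
    return seats


ga4 = allocate_delegates({"Obama": 82850, "Clinton": 21441}, delegates=6)
ga5 = allocate_delegates({"Obama": 84270, "Clinton": 29396}, delegates=7)
print(ga4, ga5)
```

Because delegates come in whole numbers, small changes in vote share near a rounding boundary can move a delegate from one candidate to the other, while larger changes elsewhere may move none at all.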

Vote by State — Methodological Notes

So how does all of this turn out on the state level? In which states should we expect Sanders to be relatively strong, and in which states should we expect Clinton to be relatively strong?

To start answering those questions, I projected all of the above down to the state level, using 2008 Democratic primary exit polls. The chart below shows what we should expect the results of a hypothetical national primary “held today” to be, if we extrapolate from the national demographic data above down to the state level, and *if* the electorate has the same demographic characteristics as in 2008, and making all of the assumptions listed above.

Of course, the electorate will not have the exact same demographic makeup as in 2008, but for the time being I thought it was better to simply use the 2008 exit polls rather than to adjust them in a way that would be arbitrary and contestable. That way you, dear reader, will know what you are looking at, and can make your own mental adjustments if you think that turnout will substantially differ from in the 2008 Democratic primary.
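The basic idea of projecting national demographic support onto a state electorate can be sketched in Python as a weighted average. This is a deliberately simplified illustration using a single demographic dimension (race), whereas the actual projections combine several categories. The support figures come from the national projections above (the White figure assumes Clinton's roughly 2 point lead carries over after undecideds are split), and the state electorate shares are HYPOTHETICAL, not taken from any exit poll.

```python
# Projected two-way Sanders support by race, in percent.
sanders_by_race = {"White": 49.0, "African American": 27.0, "Hispanic": 40.0}

# HYPOTHETICAL electorate shares for an illustrative state (must sum to 1).
# The "other" race category is omitted for simplicity.
electorate = {"White": 0.60, "African American": 0.30, "Hispanic": 0.10}


def project_state(support_by_group, electorate_shares):
    """Weight each group's support by its share of the state electorate."""
    return sum(electorate_shares[g] * support_by_group[g] for g in electorate_shares)


sanders_state = project_state(sanders_by_race, electorate)
clinton_state = 100.0 - sanders_state
print(round(sanders_state, 1), round(clinton_state, 1))
```

Swapping in a different set of electorate shares is all it takes to see how the same national numbers translate into very different results from state to state.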

Other important things to note:

Model Adjustments — I included two sorts of adjustments to the basic demographic/polling/exit-poll model. In both cases, rather than assigning a uniform swing in percentage terms, I transferred a given share of the voters who would have otherwise supported one candidate to the other candidate.
1. A home-state adjustment — For Vermont, 20% of people who would otherwise support Clinton are transferred to Sanders. For New York and Arkansas, 20% of people who would otherwise support Sanders are transferred to Clinton. From a 50-50 tie as a starting point, that amounts to a 10 point bump; in practice the bump is smaller or larger than 10 points, depending on how large the losing candidate's share was before the transfer.
2. A caucus adjustment — For caucus states, 10% of the votes that would otherwise go to Clinton are transferred to Sanders. From a 50-50 tie as a starting point, that amounts to a 5 point bump; in practice the bump is smaller or larger than 5 points, depending on how large Clinton's share was before the transfer. This is to reflect the presumption that, while the Clinton campaign is unlikely to be caught as flat-footed in caucus states as it was in 2008, holding a caucus rather than a primary will tend to shift the demographics of the electorate in favor of Sanders. No adjustment is made for Iowa or Nevada because entrance polls of those caucuses are available, and so they *already* reflect the ways in which the demographics of caucus participants may differ from what the electorate would be if the state held a primary instead of a caucus. IA and NV are double starred (**) to reflect this.
No 2008 Democratic primary exit polls for Caucus States — The media did not conduct exit polls (entrance polls) for caucus states other than Iowa and Nevada. The caucus states without entrance polls are starred (*). As a result, we cannot simply project the national demographic breakdown of support onto the state level 2008 Democratic primary exit poll demographics, since that 2008 exit poll demographic data does not exist. So as a stopgap measure, I projected onto rough estimates of what I thought the demographics would have been, using “similar” states as a basis. This is not ideal, but at least for the time being I think it is better than nothing. But view those states with a grain of salt. Perhaps later I will find a better way to solve this issue. In general, most of these are states that would be demographically favorable to Sanders even if they held primaries rather than caucuses. Specifically:
1. AK, HI, and DC — demographics of the primary electorate are completely made up by me. In particular, Hawaii should be treated with skepticism, as it is demographically so different from the rest of the U.S. as to make extrapolation from national polls almost useless. The same is true, to a lesser extent, of Alaska.
2. CO — the demographics are the average of the exit poll demographics of New Mexico and Oregon.
3. ID — the same exact demographics are used as from Montana exit polls.
4. KS — an average of South Dakota and Missouri exit polls.
5. ME — an average of Vermont and New Hampshire exit polls.
6. MN — Wisconsin exit polls are used.
7. NE — 25% Iowa exit polls, 75% South Dakota exit polls.
8. ND — South Dakota exit polls are used.
9. WA — ¼ California and ¾ Oregon exit polls are used.
10. WY — an average of Montana and Colorado exit polls.
Regions — In some cases, the assignment of states to regions is arbitrary. For example, I assigned West Virginia to the “Midwest” region; it could conceivably have been placed in the “South” instead. If it were, that would swing it towards Clinton. So if you think that one state is better placed in a different region than the one I placed it in, you can take that into account. For example, to some degree parts of Florida are similar to the Northeast; if you think this is important, that would effectively shift part of the state from the pro-Clinton South to the Northeast, which would tend to hurt Clinton and help Sanders. So judge for yourself, and make your own mental adjustments to the projections, if you think they are needed.
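
The stand-in demographics listed above for caucus states (for example, Nebraska as 25% Iowa and 75% South Dakota) amount to a weighted average of exit-poll profiles. Below is a minimal sketch of that blending. The age shares are placeholders invented for illustration, not actual 2008 exit-poll figures, and `blend_profiles` is a hypothetical helper name.

```python
def blend_profiles(weighted_profiles):
    """Weighted average of demographic profiles.

    weighted_profiles: list of (weight, {group: share of electorate}) pairs.
    Weights are normalized, so (1, 3) works the same as (0.25, 0.75).
    """
    total_weight = sum(w for w, _ in weighted_profiles)
    groups = set().union(*(profile.keys() for _, profile in weighted_profiles))
    return {
        g: sum(w * profile.get(g, 0.0) for w, profile in weighted_profiles) / total_weight
        for g in groups
    }


# HYPOTHETICAL age breakdowns (placeholders, not real exit-poll data).
iowa = {"under_45": 0.40, "45_plus": 0.60}
south_dakota = {"under_45": 0.30, "45_plus": 0.70}

# Nebraska stand-in: 25% Iowa, 75% South Dakota, as described in the list.
nebraska = blend_profiles([(0.25, iowa), (0.75, south_dakota)])
print(nebraska)
```

The blended profile can then be used wherever a real exit-poll profile would be, with the caveat already noted that these states deserve extra skepticism.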

The “Region” column lists the region to which a state is assigned. The “Adj. Type” column lists any adjustments that are made to the pure poll/demographic/exit-poll model. The “Magnitude” column lists the size of those adjustments. If you want to know what the numbers would look like without those adjustments, simply subtract the adjustment from the candidate it is given to and add it to the other candidate. For example, in Idaho, you would subtract 4.1 from Sanders’ 52.3, and add the 4.1 to Clinton’s 37.3. So without the adjustment, Idaho would be 48.2% Sanders — 41.4% Clinton.
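
The transfer-style adjustment, and the Idaho arithmetic above, can be sketched in a few lines. It assumes the adjustment is a fixed fraction of one candidate's unadjusted share, as described in the model notes; `transfer` is an illustrative name, not something from the original spreadsheet.

```python
def transfer(shares, from_cand, to_cand, fraction):
    """Move `fraction` of one candidate's share to the other.

    Returns the adjusted shares and the size of the transfer in points.
    """
    moved = shares[from_cand] * fraction
    adjusted = dict(shares)
    adjusted[from_cand] -= moved
    adjusted[to_cand] += moved
    return adjusted, moved


# Idaho without the adjustment, as worked out in the text: 48.2 to 41.4.
idaho_raw = {"Sanders": 48.2, "Clinton": 41.4}
idaho_adj, idaho_moved = transfer(idaho_raw, "Clinton", "Sanders", 0.10)
print(round(idaho_moved, 1), {c: round(v, 1) for c, v in idaho_adj.items()})

# The same 20% home-state transfer gives different bumps from different starts.
_, bump_from_tie = transfer({"Sanders": 50.0, "Clinton": 50.0}, "Clinton", "Sanders", 0.20)
_, bump_from_60 = transfer({"Sanders": 40.0, "Clinton": 60.0}, "Clinton", "Sanders", 0.20)
print(bump_from_tie, bump_from_60)
```

The last two lines show why the bump is only exactly 10 points from a 50-50 start: the transfer scales with the size of the share it is taken from.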

Next, the first three Sanders/Clinton/Undecided columns list the projection before assigning undecided voters. These reflect how we would expect the average of a large number of good-quality state polls to come out if the race were 38% Sanders to 51% Clinton nationally.

Finally, the two rightmost columns show the estimated vote shares after splitting the remaining undecided voters evenly between Sanders and Clinton.

Vote by State — Estimate with Current National Poll Average (Roughly 38% Sanders — 51% Clinton)

Above, we have our estimate of what the results of a national primary held today would be, given that national polls are averaging about 38% for Sanders and 51% for Clinton at the moment. States are sorted in order of vote shares for Sanders and Clinton, so we can see the states which are most favorable to Sanders, and the states which are most favorable to Clinton.

Encouragingly, the predictions seem to match the current results of available state polls fairly well, even though (at least for the time being) the model does not take state polls into account at all. For most states, there are no recent high quality polls; the model provides an estimate of what high quality polls *would* say the current state of the race was in all states, if all states were polled as frequently as Iowa and New Hampshire are at the moment. Note though that for caucus states, you should probably take out the caucus adjustments if you want to know what state polls would say at the moment (unless they had a good methodology, similar to Selzer in Iowa, to poll specifically for caucuses).

In early states:

In Iowa, it has Clinton ahead very narrowly, by 46.3% to 45.2%, or 50.5%-49.5% with undecideds allocated.
In New Hampshire, it pegs the race at 49.8% for Sanders to 40.1% for Clinton (54.6%-45.4% with undecideds allocated evenly).
In Nevada, the model has Clinton ahead 52.0%-39.3%, or 56.4%-43.6% with undecideds allocated.
In South Carolina, the model has Clinton ahead 57.4%-28.4%, or 64.5%-35.5% with undecideds allocated.
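
The "undecideds allocated" figures come from splitting the undecided share evenly between the two candidates. Here is a minimal sketch, assuming that everything not assigned to Sanders or Clinton counts as undecided; `allocate_undecided` is an illustrative name.

```python
def allocate_undecided(sanders, clinton):
    """Split the remaining share (100 minus both candidates) evenly.

    Assumes the entire remainder is undecided voters.
    """
    undecided = 100.0 - sanders - clinton
    return sanders + undecided / 2, clinton + undecided / 2


# Pre-allocation figures quoted in the list above.
sc_sanders, sc_clinton = allocate_undecided(28.4, 57.4)  # South Carolina
ia_sanders, ia_clinton = allocate_undecided(45.2, 46.3)  # Iowa
print(sc_sanders, sc_clinton)
print(ia_sanders, ia_clinton)
```

An even split leaves the margin in points unchanged, since both candidates gain the same amount. If you expect undecided voters to break unevenly, adjust the split accordingly.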

You can judge for yourself, but to me it seems to be about what I would expect, and to generally be in the same sort of range as we have seen from state polls.

Vote by State — What if Sanders Surges to a Dead Heat in the National Polls?

Finally, if Sanders were to surge in the national polls (perhaps following a close finish in Iowa and a win in New Hampshire), and the race were to turn into a dead heat nationally, how should we expect that to play out on the state level?

And from this, we can begin to see the outlines of the most plausible path to victory for Sanders. And we can see the minimum that Clinton needs to do to hold on and win the Democratic nomination. Generally speaking, the minimum that Clinton needs to win the Democratic nomination is to hold on to the states in which she is projected here to be above 50. And generally speaking, the minimum that Sanders needs to win the Democratic nomination is to win the states in which he is projected here to be above 50. However, it is more complicated than that — it depends on vote margins and delegates.

Roughly, the states that are close to 50-50 here should be approximate national bellwethers. Interestingly, Nevada is projected to be very close to a national bellwether, given the makeup of Clinton’s and Sanders’ coalitions so far (though that does depend on the accuracy of the Hispanic data).

You know what they say about history — it does not repeat, but it does rhyme. And our goal is to try to use the available data to pin down its meter. 2016 will not be the same as 2008.

But precisely because history rhymes rather than repeats, if Sanders does succeed in making a truly competitive national race of it, and if he manages to win the Democratic nomination, the path of least resistance is not likely to be the same as the path taken by Obama in 2008. If Sanders wins, he will do so by losing some states, and some voters, that Obama won, while winning some states and some voters (including some new voters) that Obama didn’t have in 2008. Just as important as winning some states that Obama lost, the path to victory for Sanders involves losing by smaller margins in some states that Obama lost, and winning by larger margins in some states that Obama won. The Democratic primary will be won not by one candidate “winning” a particular state, but rather by combining vote margins across a variety of states to win a majority of the democratically elected delegates. Likewise, the path to victory for Clinton now is different from what she would have needed to do in 2008. Her coalition is different than it was in 2008, and she is a different candidate, emphasizing different issues, with different appeal to different voters.

In future posts, I plan to update, improve, and expand this analysis, and to update it to incorporate new data (such as Iowa and New Hampshire results). And I intend to project down to the Congressional district level, to obtain realistic delegate projections (that is mostly done). With that, we will be able to get a better idea of what exactly it is that Bernie Sanders needs to do to win, and what Hillary Clinton needs to do (at bare minimum) to hold on to her current, somewhat perilous, national lead.