Markos Moulitsas writes the following in a recent post entitled “The Iowa electorate was overwhelmingly white but among those who weren't, Clinton won handily”:
From the Iowa entrance polls:
The non-white sample was 150 out of 1,660, so while unfortunately small, the 24-point gap is still outside the margin of error. That sample size was also too small to break out African Americans, Latinos, and Asians, so we don’t have granularity. Also unfortunately, New Hampshire won’t provide greater insight next week, so this is all we have to work with. (I included Martin O’Malley because half of his meager support came from non-whites, probably Latinos happy with his strong defense of immigrant rights.)
In fact, this is not all we have to work with.
In fact, we can say a significant amount about how non-whites voted in Iowa.
We can’t say it with complete 100% confidence, but we can say it backed by some reasonably good evidence:
- Clinton likely (though surprisingly, we can’t actually say this with as much confidence as you might think) won non-white voters in Iowa. However, Bernie Sanders made significant inroads with them.
- Different sub-groups of “non-white” voters voted differently: some voted more strongly for Clinton, others less strongly for Clinton, other sub-groups were competitive, and at least one sub-group of non-white voters voted strongly for Sanders.
- There is strong reason to believe that Bernie Sanders may actually have won the Hispanic vote in Iowa outright, possibly even by as much as 2 to 1 (though I would not bet the farm on it). If Sanders did not win Hispanics outright (which is possible), Clinton and Sanders at least ran quite competitively with Hispanics in Iowa.
- Specifically, at minimum, it is fairly unlikely that Hispanics voted by anything more than around 55-45 for Clinton, and it is more likely than not that Hispanics voted for Sanders — possibly by a strong margin.
- Clinton won strong support from African Americans, but substantially less strong than the support won by Obama in most 2008 primaries.
- Specifically, Clinton won African Americans by somewhere around 80-20 or 75-25.
- Insofar as it is measurable, Bernie Sanders did very well indeed with Native Americans — Sanders may have done as well with Native Americans as Clinton did with African Americans.
- Specifically, Sanders won Iowa's one reservation 83-17.
- Insofar as it is measurable (we can only say anything with quite low confidence in this case), the Asian American vote in Iowa was probably not lopsided in either direction.
- Specifically, it was probably within a range of 60-40 or so either way.
- The weight of the evidence, including both Iowa results and national polls, does not support the idea that non-white voters are a homogeneous group that is likely to back Clinton as a homogeneous bloc.
As for the issue of how Sanders fared with Hispanics in Iowa, here’s a key paragraph from Tuesday’s NY Times article about the reaction to IA:
What’s more, Mr. Sanders showed strength in unexpected ways that could signal trouble for Mrs. Clinton, performing surprisingly well in rural counties and small caucus precincts, and even making some gains among Hispanic Democrats, his advisers said on Tuesday morning.
Though exit polls show Hillary Clinton winning 58% of the minority vote, Ucles said he is going to further analyze county returns, because Sen. Bernie Sanders won 15 of the largest 20 counties where Latinos live, suggesting they may have played a role in the razor-thin Clinton victory, which because of delegate allocation, Sanders billed as a virtual tie.
In fact, there is much more to back this up than just a few selections from news articles. I will demonstrate all of this further below.
Iowa’s Non-White Population
Although Iowa is a predominantly white state, it does have enough minority voters so that we can begin to get some idea of how minority voters caucused, and what that may portend for future states. Iowa has significant concentrations of minority voters in particular neighborhoods of a number of cities, including Des Moines, Waterloo, Sioux City, and Davenport. It also has a Native American reservation, and there are some rural areas and small towns in Iowa that also have significant minority (mostly Hispanic) populations. How did Iowa's minority population vote?
Let’s start by just looking at the data in a straightforward, easy to understand way. Then we will gradually move to more sophisticated but harder to understand ways to see what this data can tell us. We will see that it can actually tell us a pretty good amount.
I collected results and demographic data for all of the usable precincts I could find in Iowa that had a significant minority population (greater than roughly 30% of the “Voting Age Population”, or “VAP”). Not all such precincts are usable, because you have to match between two different sets of precincts and two different data sources. I matched results from the Iowa Democratic Party for up-to-date precincts against Census data on the VAP from Dave’s Redistricting App for slightly-out-of-date precincts (local governments re-draw precinct boundaries from time to time). In 3 cases (starred), I could make a match by combining precincts from either of the two sources. There are a few other cases, which I couldn’t include, where the two sets of precinct lines are too mixed up to make a match, even by combining precincts from one source or the other.
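The matching step can be sketched roughly as follows. This is only an illustration under assumptions: the precinct names, delegate counts, and population counts are hypothetical, and the crosswalk (which result precincts cover the same area as which census precincts) is assumed to have been worked out by hand.

```python
# Hypothetical delegate counts by current precinct (party results).
results = {
    "City A 1": {"clinton": 4, "sanders": 3},
    "City A 2": {"clinton": 2, "sanders": 2},
    "City B 5": {"clinton": 5, "sanders": 1},
}

# Hypothetical census voting-age population by older precinct lines.
census_vap = {
    "City A 1-2": {"white": 900, "black": 300, "hispanic": 250, "asian": 50},
    "City B 5": {"white": 400, "black": 500, "hispanic": 80, "asian": 20},
}

# Hand-built crosswalk: result precincts paired with the census precincts
# that cover the same area. The first entry combines two result precincts.
crosswalk = [
    (["City A 1", "City A 2"], ["City A 1-2"]),
    (["City B 5"], ["City B 5"]),
]


def sum_dicts(rows):
    """Add up several dictionaries of counts key by key."""
    total = {}
    for row in rows:
        for key, value in row.items():
            total[key] = total.get(key, 0) + value
    return total


def match_precincts(results, census_vap, crosswalk):
    """Combine precincts on each side so results and VAP describe the same area."""
    matched = []
    for result_names, census_names in crosswalk:
        delegates = sum_dicts(results[name] for name in result_names)
        vap = sum_dicts(census_vap[name] for name in census_names)
        vap_total = sum(vap.values())
        matched.append({
            "label": " + ".join(result_names),
            "delegates": delegates,
            "vap_share": {group: count / vap_total for group, count in vap.items()},
        })
    return matched


matched = match_precincts(results, census_vap, crosswalk)
```

Precincts whose boundaries cannot be reconciled this way simply never appear in the crosswalk, which mirrors dropping them from the analysis.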
After collecting this data, we can in fact find results for a pretty good number of precincts in Iowa with significant minority populations. Note that I did not include any college campus precincts (which do tend to have relatively higher, though not necessarily high, non-white populations). In fact, most of these precincts are pretty far away from Iowa’s universities:
First, what can we say about these precincts in general?
OK, so a lot of that seems to pretty blatantly contradict the conventional wisdom. But it is there in the data. Just look for yourself.
Before we move on, let's look briefly at the Native American vote in Iowa. I am not including this in the rest of my analysis because there is really only a single precinct to consider.
The Native American Vote:
The Sac and Fox Meskwaki Settlement (a Native American reservation) is located in Tama County, Iowa. The "Indian Settlement" precinct in Tama County voted 83.3% for Sanders and 16.7% for Clinton. While this is only one data point, I am pretty sure this is the only concentration of Native Americans in Iowa. In 2008, Obama performed very well in Reservations across the west; this result is a preliminary sign that Native Americans may support Sanders in large numbers. That is a good sign for Sanders in many western states where Native Americans are an important part of the Democratic base, and possibly also in Oklahoma (though it is quite possible that Native Americans there, who tend to be more integrated, will vote differently).
Correlations between non-white VAP and Sanders support:
Next, let's do something slightly more sophisticated, but still fairly easy to understand. Let's put the data into a scatter plot and see visually how things are correlated:
We can see that there is a positive, but fairly weak, correlation between the White voting age population and the share of the delegates that Sanders got. If a precinct had a higher white VAP, Sanders tended to do better.
We can see that there is a negative, and somewhat stronger, correlation between the African American voting age population and the share of the delegates that Sanders got. So if a precinct had a higher African American VAP, Clinton tended to do better.
We can see that there is a positive, but fairly weak, correlation between the Hispanic voting age population and the share of the delegates that Sanders got. If a precinct had a higher Hispanic VAP, Sanders tended to do better.
We can see that there is not much of a correlation between the Asian voting age population and the share of delegates that Sanders got. The Asian population is too low to really be able to tell much clearly.
Note that I excluded the one precinct with a really large (70%) Asian population, since the fact that the Asian population is so much bigger makes it a substantial outlier. For what it's worth, that precinct split 50-50 between Sanders and Clinton. However, if you do include it you actually *do* get essentially the same line and the same correlation as if you exclude it.
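For readers who want to reproduce this kind of check, here is a minimal sketch of the correlation behind a scatter plot like these. The numbers are made up for illustration and are not the precinct data; only the calculation itself (the Pearson correlation coefficient) is standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical precincts: African American share of VAP and Sanders' delegate share.
black_vap = [0.10, 0.25, 0.40, 0.55]
sanders_share = [0.60, 0.50, 0.45, 0.30]

r = pearson_r(black_vap, sanders_share)  # negative: more Black VAP, lower Sanders share
```

A value near 0 corresponds to the "not much of a correlation" case, and the sign tells you which candidate tended to do better as the group's share rises.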
Correlations between non-white electorate and Clinton and Sanders support:
However, the voting age population is not necessarily perfectly reflective of the electorate that showed up at the caucuses, for a variety of reasons. These include:
- Not everyone in these precincts is a Democrat (and so likely to caucus in the Democratic caucus). There are some white Republicans and GOP-leaning Independents who would have caucused in the GOP caucus (if anything), and a higher proportion of whites in these precincts are Republicans than are non-whites.
- Not everyone in these precincts is a citizen eligible to vote. Hispanics and Asian Americans tend to be more likely to be non-citizens than are Whites or African Americans.
- Even controlling for citizenship, Asians and Hispanics tend to vote at significantly lower rates than Whites and African Americans.
If we take these sorts of factors into account, we can come up with a more realistic picture of the racial makeup of the Democratic electorate in these precincts. So does that change things?
Not really very much.
I’ll explain in more detail the steps I took to come up with this more realistic electorate, but for the moment just look at the charts:
Not too much change, but the correlation becomes a bit stronger.
Not too much change, but the correlation becomes a bit weaker.
Not too much change, but the correlation becomes mildly stronger.
Modelling this more formally
Next, we are going to ramp up the technical requirements and use a generalized linear model (GLM) to model this a bit more formally statistically. For readers who have some statistics background, the reason we are using this rather than ordinary least squares is that we are modeling percentages (proportions), which are bounded between 0 and 1. To simplify and facilitate analysis a bit, I am removing O'Malley, so that we are only comparing Clinton and Sanders. I split his delegates evenly between Clinton and Sanders. I also am getting rid of the “other race” category (in effect assuming that nobody of “other race” votes in each precinct, or alternatively that they vote the same as everyone else in each precinct), again to simplify.
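As a small sketch of the two simplifications just described: the first function splits O'Malley's delegates evenly and returns Sanders' two-way share, and the second shows why a model with an S-shaped link suits proportions. The logit link is an assumption made here for illustration; the exact GLM family and link used for the output below are not stated.

```python
from math import exp


def two_way_sanders_share(clinton, sanders, omalley):
    """Sanders' share of delegates after splitting O'Malley's evenly between the two."""
    clinton_adj = clinton + omalley / 2
    sanders_adj = sanders + omalley / 2
    return sanders_adj / (clinton_adj + sanders_adj)


def inv_logit(x):
    """Map any real number into (0, 1); assumed link, for illustration only."""
    return 1 / (1 + exp(-x))


# Hypothetical precinct: 4 Clinton, 3 Sanders, 1 O'Malley delegate.
share = two_way_sanders_share(4, 3, 1)  # 3.5 / 8

# A straight line can predict shares below 0 or above 1; the link cannot.
low, mid, high = inv_logit(-10), inv_logit(0), inv_logit(10)
```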
Using VAP as the electorate:
To start with, we will model what it would look like if the Democratic caucus electorate had the same racial makeup as the Voting Age Population in each precinct. Then we’ll try to get more realistic. But first, here is some regression output for the VAP as the electorate:
Now what exactly does this mean? Skip the part that is indented if you don’t care about the technicalities and want to just jump to the conclusion.
The dy/dx is showing the marginal effect of an increase in the share of the voting age population that is White, Black, or Hispanic:
- Increasing the White VAP share by 1% increases Sanders’ vote (delegate) share by about .2%.
- Increasing the Black VAP share by 1% increases Clinton’s vote (delegate) share by about .59%.
- Increasing the Hispanic VAP share by 1% increases Sanders’ vote (delegate) share by about .59%.
Note that when I say “increase the White VAP share by 1%”, I do not mean adding a percentage point (50% + 1% = 51%). Instead, I mean a relative increase (50% * 1.01 = 50.5%). This is non-linear because we are dealing with percentages/proportions, which can be confusing and can make it substantially more difficult to see intuitively what is going on.
The 95% Confidence Interval means that we can be 95% sure that the effects of an increase in the VAP of that race lie within the interval specified.
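The P>|z| column is the p-value for each effect on Sanders' (or alternatively Clinton’s, since we are excluding O’Malley and looking at a two person race) vote. Strictly, it is the probability of seeing an estimate at least this far from zero if the true effect were zero. As a loose shorthand, you can read it as the chance that an increase in the VAP does not really have the effect shown: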
- There is a 14% chance that increasing the White VAP does not have a positive effect on Sanders’ vote share.
- There is a 0% chance that increasing the Black VAP does not have a positive effect on Clinton’s vote share.
- There is a 42% chance that increasing the Hispanic VAP does not have a positive effect on Sanders' vote share.
An important note is that you need to be careful in thinking about “statistical significance” with this model. For a technical reason (multicollinearity), you have to exclude one racial group (Whites, African Americans, Hispanics, or Asians) from the model and treat it as a residual. As a result, all of the “positive effects” are really in comparison to the group that is dropped and treated as a residual, in this case Asian Americans. Depending on which one you drop, this can have an effect on the model's estimates. It is generally not too large, but it can knock an estimate either in or out of any arbitrary statistical significance cutoff. For example, if you drop African Americans instead of Asian Americans, then all the other variables become significant. You also need to pay attention to *what* exactly it is that is or is not statistically significant. What is or is not statistically significant is how African Americans, Whites, and Hispanics differ from Asians. But since Asians are pretty much, at least as far as we or the model knows, splitting their vote 50-50, you can (as some not quite strictly correct shorthand) think of the output as showing more or less how Sanders’ or Clinton's support is affected by an increase in the White, Black, or Hispanic VAP in comparison to 50-50.
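To make the "dropped group" point concrete, here is a sketch with invented coefficients (not the regression output). Because the four shares add up to 1, a model that omits Asians can be rewritten exactly as one that omits African Americans: the predictions are identical, but every coefficient is now measured against a different baseline.

```python
def predict(intercept, coefs, shares):
    """Linear predictor: intercept plus coefficient times share for each included group."""
    return intercept + sum(coefs[group] * shares[group] for group in coefs)


def rebase(intercept, coefs, old_base, new_base):
    """Re-express the same model with new_base omitted instead of old_base."""
    shift = coefs[new_base]
    new_coefs = {g: c - shift for g, c in coefs.items() if g != new_base}
    new_coefs[old_base] = -shift
    return intercept + shift, new_coefs


# Hypothetical coefficients with Asians as the omitted (baseline) group.
intercept = 0.0
coefs = {"white": 0.4, "black": -1.2, "hispanic": 1.1}

# Hypothetical precinct composition; the shares sum to 1.
shares = {"white": 0.50, "black": 0.30, "hispanic": 0.15, "asian": 0.05}

new_intercept, new_coefs = rebase(intercept, coefs, old_base="asian", new_base="black")

original = predict(intercept, coefs, shares)
rebased = predict(new_intercept, new_coefs, shares)  # same prediction, different baseline
```

In this made-up example the White coefficient moves from 0.4 (relative to Asians) to 1.6 (relative to African Americans) without the model changing at all, which is why significance tests on individual coefficients depend on the choice of baseline.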
So the conclusion of this is:
We can be very confident that African Americans voted for Clinton (though there is some uncertainty as to just how much was the margin).
We can be fairly confident that Whites in these precincts voted for Sanders. They most likely gave Sanders somewhere between a slight margin and a reasonably large margin in these precincts, though we can't rule out the possibility that they may have voted slightly for Clinton.
We cannot be overly confident that Hispanics voted for Clinton or for Sanders. On balance though, it is probably more likely that Hispanics voted for Sanders than for Clinton in Iowa, at least within these precincts.
Although the latter conclusion sounds like it would be bad (“we cannot be confident”), actually it is very good for Sanders. You can think of the “we cannot be confident” as meaning that we can’t statistically tell for sure whether Hispanics voted outright for Sanders, as opposed to splitting their vote between Sanders and Clinton.
What this does not mean is that we can't be confident that Hispanics did not vote heavily for Clinton. Phrased more clearly without the triple negative, we can be pretty confident that Hispanics did not support Clinton by a large margin, of the sort that she earned from Hispanics in the 2008 primary (when she won ). What we can say is that Hispanics in these precincts most likely either voted strongly for Sanders, weakly for Sanders, or weakly for Clinton.
Next, how well does this model fit the data?
We can see by comparing what the model predicts should be Sanders’ delegate share in each precinct with what his actual vote share is in each precinct. Doing that, we get:
Note that the R squared jumps up to .6 if we exclude the one particular outlier precinct (Des Moines Precinct 62). Hence, the model can explain a reasonable amount of the variation in the results, though not all of it.
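Here is a rough sketch of how a fit statistic like this can be computed and how a single outlier can drag it down. The numbers are invented, and the R squared is assumed to be the squared correlation between predicted and actual shares (what a scatter plot trendline typically reports).

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_p * var_a)


# Hypothetical Sanders delegate shares; the last precinct is an outlier.
predicted = [0.30, 0.40, 0.50, 0.60, 0.45]
actual = [0.32, 0.38, 0.53, 0.58, 0.90]

r2_all = r_squared(predicted, actual)
r2_without_outlier = r_squared(predicted[:-1], actual[:-1])
```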
Now, can we also say something more specific about how people voted — like an estimate of what % of the vote each racial group gave to Clinton and to Sanders? Yes, we can.
For this, I will use the technique of ecological inference. Although the technical details are complicated, this allows us to produce results that are more easily understandable in simple intuitive terms, putting a number on an estimate of how each racial group voted. If the electorate has the same racial makeup as the VAP, applying ecological inference predicts:
So as point estimates, we can estimate that if the electorate in these precincts demographically matched the voting age population perfectly, then:
- Sanders probably won White voters in these precincts by about 55-45.
- Clinton probably won African American voters in these precincts by about 81-19.
- Sanders probably won Hispanic voters by about 65-35.
- Asian voters were probably split more or less evenly between Sanders and Clinton.
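One way to sanity-check point estimates like these is the accounting identity that ecological inference has to respect: a precinct's overall Sanders share is the composition-weighted average of each group's Sanders share. The sketch below plugs in the point estimates listed above; the precinct composition is hypothetical, and this is only the identity, not the estimation procedure itself.

```python
# Sanders' estimated share within each group, from the point estimates above.
SANDERS_SUPPORT = {"white": 0.55, "black": 0.19, "hispanic": 0.65, "asian": 0.50}


def implied_sanders_share(composition, support=SANDERS_SUPPORT):
    """Overall Sanders share implied by group support rates and precinct composition."""
    total = sum(composition.values())
    return sum(composition[group] / total * support[group] for group in composition)


# Hypothetical precinct composition (shares of the electorate).
precinct = {"white": 0.50, "black": 0.30, "hispanic": 0.15, "asian": 0.05}

implied = implied_sanders_share(precinct)  # about 0.4545
```

Ecological inference works in the opposite direction: it looks for group support rates that make this identity hold as closely as possible across all precincts at once.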
However, while using the VAP is a good starting point, the VAP is not the same as the likely caucus electorate, for reasons that we explained above. If we take those reasons into account, does that change the results much? If so, how does it change them? Let's find out.
Narrowing the Electorate: STEP 1
Asian and Hispanic voters tend to turn out at lower rates than do African Americans, both for reasons of citizenship, and for other reasons. According to the Census bureau's CPS election supplement, voter turnout (as a share of the VAP, including non-citizens) in 2012 was:
- White: 63.0%
- Black: 62.0%
Applying these turnout rates, we come up with an electorate that looks like this in our precincts:
So this has the effect of increasing the White and African American vote shares and reducing the Hispanic and Asian vote shares. We need to do more to get a more realistic electorate, but first let's see what difference this one change makes.
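The reweighting in this step can be sketched as below. The White and Black turnout rates are the ones listed above; the Hispanic and Asian rates in the code are placeholders chosen purely for illustration and are not figures from the Census data, and the precinct composition is hypothetical as well.

```python
TURNOUT = {
    "white": 0.630,     # from the list above
    "black": 0.620,     # from the list above
    "hispanic": 0.30,   # PLACEHOLDER for illustration, not a sourced figure
    "asian": 0.30,      # PLACEHOLDER for illustration, not a sourced figure
}


def apply_turnout(vap_share, turnout):
    """Weight each group's VAP share by its turnout rate and renormalize to sum to 1."""
    weighted = {group: vap_share[group] * turnout[group] for group in vap_share}
    total = sum(weighted.values())
    return {group: weight / total for group, weight in weighted.items()}


# Hypothetical precinct composition by voting age population.
vap_share = {"white": 0.50, "black": 0.25, "hispanic": 0.20, "asian": 0.05}

electorate = apply_turnout(vap_share, TURNOUT)
```

Whatever the exact rates, any group that turns out below the precinct average ends up with a smaller share of the electorate than of the VAP, and any group above it ends up with a larger one.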
If we re-run our model, we now get this output:
What has changed? In English, and without explaining everything explicitly:
- We can be a bit more confident now that white voters voted for Sanders in these precincts, and they probably voted for Sanders by a bit more than just the VAP data would indicate.
- The positive effect of African Americans on Clinton’s vote share declined in magnitude. But we can still be extremely confident that Clinton won African Americans, and by a good margin.
- The positive effect of Hispanics on Sanders' vote share increased in magnitude. It is now significantly less likely that Sanders lost or more or less tied with Hispanics than was the case with just the regular VAP data not modified for turnout. So our confidence that Sanders won Hispanics outright in these precincts increases a bit after taking into account the fact that Hispanics tend to turn out at low rates compared to other voters.
How does that look on an easy-to-understand scatterplot? How well does our model fit the data now?
It's pretty much the same, overall (slightly worse, but again it improves if you take out the outlier).
Now, let's re-run our ecological inference to see how the estimated vote changes:
What happened? The estimated white vote for Sanders went up very slightly (not much), the estimated margin for Clinton among African Americans went down slightly to 80-20, and the estimated Sanders support from Hispanics went up slightly. The estimated support from Asians for Sanders went up, but we shouldn't make too much of that.
It may seem paradoxical that the Sanders support went up for all groups. How can that be? It is because taking into account turnout changed the composition of the electorate, so that there are more African Americans. Thus, in order for the model to match the actual results, it infers that Sanders did a bit better among all groups than with the VAP electorate.
Narrowing the Electorate: STEP 2
To improve the estimate further, we should take into account the fact that some of those voters are going to be Republicans and Republican-leaning independents. Not the types who will vote in a Democratic caucus. Moreover, the shares of Republicans differ by race. In general, more whites are Republicans. However, these precincts tend to be Democratic — often strongly Democratic, and whites who live in such precincts tend to be more likely to be Democrats as well. The same is also true of minority voters who live in these precincts. As a reasonable way to do this, we will assume:
- 60% of Whites in these precincts are Democrats or D leaning independents; 40% are Republicans or R leaning independents.
- 90% of African Americans are D; 10% R.
- 70% of Hispanics are D, 30% R.
- 65% of Asians are D, 35% R.
The potential caucus electorate then looks like this:
The effect of this is to bring the average racial composition of the potential Democratic caucus electorate in these precincts down to 53.7% white, up to 27.6% African American, up to 14.3% Hispanic, and down to 4.4% Asian.
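This step is the same kind of reweighting, now by the assumed Democratic-leaning share of each group. The sketch uses the assumptions listed above (60%, 90%, 70%, 65%); the input composition is hypothetical, so the output will not reproduce the averages just quoted.

```python
# Assumed share of each group that is Democratic or D-leaning (from the list above).
DEM_LEAN = {"white": 0.60, "black": 0.90, "hispanic": 0.70, "asian": 0.65}


def democratic_electorate(shares, dem_lean=DEM_LEAN):
    """Weight each group's share by its Democratic lean and renormalize to sum to 1."""
    weighted = {group: shares[group] * dem_lean[group] for group in shares}
    total = sum(weighted.values())
    return {group: weight / total for group, weight in weighted.items()}


# Hypothetical turnout-adjusted composition of one precinct.
shares = {"white": 0.58, "black": 0.28, "hispanic": 0.11, "asian": 0.03}

base = democratic_electorate(shares)

# Sensitivity check: assume whites in these precincts are 70% Democratic instead of 60%.
more_white_dems = democratic_electorate(shares, {**DEM_LEAN, "white": 0.70})
```

Changing a single assumption and re-running is a cheap way to see how much the inferred electorate, and everything downstream of it, depends on that assumption.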
If we re-run our analysis again, what do we get?:
The magnitude of the effects increases (which tends to mutually offset to some degree), and there are some further shifts, but we remain in the same ballpark. On a scatterplot, it looks like this:
Again, that is similar to what we have seen before. The R^2 for the model's predictions in relation to the actual results is actually a bit lower with what should be a more realistic electorate; again, removing the outlier improves the fit (though it is still a bit lower).
Re-doing the ecological inference, the results with this potential Democratic caucus electorate are:
We see that as we further narrow the electorate so that it is more realistic than the VAP, the support for Sanders inferred by the model again goes up among all groups. It goes up in particular among African Americans, up to 23%, while also increasing among Whites and Hispanics.
Narrowing the Electorate: STEP 3
Now, how confident should we be in this? One possibility is that maybe for some reason the model is mistaking whites for Hispanics. What if we assume that white voters in these precincts are 70% Democratic rather than 60%? I’ll just cut to the chase at this point. It doesn't change things by much:
Ecological inference estimates basically the same percentages, even if there were significantly more white Democrats. The only difference it makes is that Clinton does slightly better among African Americans.
As we have changed the composition of the electorate, the support percentages inferred by ecological inference have all increased for Sanders, but have all remained in the same ballpark. Moreover, the way in which the vote percentages inferred for Clinton and Sanders have changed as we have narrowed the electorate make sense in light of demographics. The predictions from ecological inference remain quite robust throughout. It seems that we can have a reasonable degree of confidence that the actual levels of support for Clinton and Sanders in these precincts probably lie in the same general range.
It might be the case that, to some degree, the model is mistaking whites for Hispanics. Maybe white voters in those precincts are more likely to support Sanders than white voters in other precincts. However, we don’t really have any evidence of this. And not just that, but they would need to be substantially more Democratic than we are already assuming them to be (which is pretty darn Democratic).
It is also possible, in theory, that Hispanic turnout was just so abysmally low that there were more or less zero Hispanic caucusgoers even in the most heavily Hispanic precincts. But that’s pretty unlikely as an explanation. First, we already have low Hispanic turnout relative to Whites and African Americans baked into this analysis as soon as we went past the basic VAP, and that did not change the results. Second, LULAC led a big effort to increase Hispanic/Latino turnout in the Iowa caucuses, setting a goal of getting 10,000 Latinos to vote. Although we don't know the exact number who actually caucused, they say that they got commit-to-caucus cards from 13,500 Latinos in Iowa (though at least some will have caucused in the GOP caucuses). Undoubtedly, both campaigns heavily targeted these voters as well. So if anything, maybe we should suspect that Hispanic turnout could have been relatively high. If that were the case, incidentally, it would be consistent with Sanders winning Hispanics by a closer margin than what ecological inference keeps telling us (2 to 1).
For some reason, it happens to be the case that Sanders systematically did better in precincts with more Hispanics than in precincts with fewer Hispanics. Ecological inference thinks that this reason is that Sanders won Hispanics in Iowa — and it is pretty insistent that it was by a pretty large margin. Occam's razor would lead to the basic conclusion that it is probably right that Sanders did substantially better among Hispanics in Iowa than anyone expected — even if we suspect that maybe the margin might have been less than the model is telling us (maybe it was more like 55-45, or 52-48, or even 50-50?).
Anyone who doesn't think this is true needs to look at the precinct results and explain how Sanders could have done as well as he did in these precincts without a reasonable amount of support from non-white voters - particularly from Hispanics, but also by not getting blown out with African Americans to the degree that Clinton would have liked.
So the best inference we can make from the data is that there is a strong probability that Sanders won Hispanics in Iowa (or at least in these precincts, insofar as the Hispanics there are representative of Hispanics elsewhere in Iowa). It is important to note that minority voters who live in areas which are less heavily non-white can tend to vote differently — so it could be the case that Clinton did better, or even much better, among Hispanics (and perhaps also African Americans) who live in precincts with fewer non-whites. On the other hand, it's also possible that Sanders did just as well (or even, in theory, better).
The best inference we can make about African Americans is that Hillary Clinton won African Americans in Iowa, probably by somewhere around 80-20 to 75-25. To be sure, that is good for Clinton and bad for Sanders. But it is not the sort of 90-10 margin achieved by Obama in 2008, and it is also less than the tremendous margin by which Sanders won millennials in Iowa. Sanders needs to continue to improve among African Americans; he probably needs at least 30 to 35 percent support from African Americans to win nationally. Though if he fares anywhere near as well among Hispanics as ecological inference thinks he did in Iowa, Sanders could potentially pull into a national lead even if Clinton continues to win African Americans 70-30.
Sanders’ support among non-white voters in Iowa was fairly closely in line with what we would expect based on national polls:
Sanders may have fared slightly worse than the 25.4% among African Americans that we might expect, though the 15% viability cutoff in Waterloo 4-1 makes it difficult to tell. But it seems likely that he fared better among Hispanics than the 38% we would expect. And he seems to have done quite well among Native Americans, though it is harder to tell about Asian Americans and people of more than one race.
If Sanders gets a bump in the national polls out of Iowa and New Hampshire, he could close to a national dead heat with Clinton by earning as little as 31% support from African Americans, 45% support from Hispanics, and 43.2% support from other non-whites. If he does better among Hispanics than that 45% threshold, the amount that he needs from African Americans correspondingly goes down. Sanders has made outreach to African Americans a priority, and he should continue to do so and should ramp up those efforts. But the idea that Sanders needs to win African American voters in order to win the Democratic nomination, or to win South Carolina, is simply incorrect. All he needs to do (at minimum) is to cut Clinton's margin. More of course would be better for Sanders, but it is not strictly necessary any more than it is necessary that Clinton win over millennials (instead, all she needs to do is cut Sanders’ margin). Sanders can’t win by entirely whistling past Dixie, but he can win nationally even if Clinton wins the bulk of the south. All Sanders needs to do is to do well enough to pass certain delegate thresholds in the South and on Super Tuesday, and then to win delegates elsewhere in the country: in the West, in the Midwest, and in the Northeast. He is much closer to doing that than you think.
Heading into New Hampshire, keep your eyes peeled on Green's Grant to see how it votes. It is a town in the middle of the White Mountain National Forest which has a population of 1 (a park ranger or something?) who happens to be Asian, and who voted for Obama in 2008. OK, ok… that’s a joke. Though according to Dave’s redistricting app, it’s apparently true!
This is a continuing part of an ongoing series using polling data, past exit poll data, census data, and other data sources to analyze the 2016 Democratic Primary.