I have frequently been citing an analysis by Open Left diarist fladem about the empirical effects of an Iowa bounce on New Hampshire voting results. Fladem has concluded that Iowa produces a bounce of between 11 and 19 points in the New Hampshire results, depending on the respective finishes of the candidates.
I decided to take this one step further by using some of fladem's data in a more systematic way. Specifically, I'll use a regression analysis to look at the effects of the following three things on the New Hampshire results.
- The final results from Iowa (1972-)
- New Hampshire polls immediately before Iowa (1980-)
- National polls immediately before Iowa (1972-)
Thanks to fladem, I have the national polling numbers from 1972 onward. I also have the New Hampshire polling numbers from 1980 onward (fladem has these numbers for 1984, 1988, and 2004, and I was able to find 1980 and 2000 polls on the Internet).
I will be excluding 1992 and 1996, when Tom Harkin and Bill Clinton, respectively, ran essentially uncontested in Iowa. I've also excluded certain minor, non-viable candidates from the analysis. The candidates that I included in the analysis are as follows:
1972 - Muskie, McGovern*, and Humphrey
1976 - Carter*, Bayh, Udall, Shriver and Humphrey
1980 - Carter* and Kennedy
1984 - Mondale*, Hart, Glenn and Jackson
1988 - Gephardt, Simon, Dukakis*, Gore and Jackson
1992 - none [Harkin uncontested]
1996 - none [Clinton uncontested]
2000 - Gore and Bradley
2004 - Kerry*, Edwards, Dean, Gephardt, Lieberman and Clark
A candidate who dropped out after Iowa receives a score of 0 for New Hampshire. Likewise, a candidate who did not compete in Iowa -- there are only a couple of these -- receives a 0 for that state.
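As a small illustration of that coding rule, here is a Python sketch. The rows and the `code_result` helper are hypothetical and are not the actual historical data; the only thing taken from the text is that a missing result in either state is scored as 0.

```python
from typing import Optional

def code_result(vote_share: Optional[float]) -> float:
    """Return the percentage used in the regression; None means no result."""
    return 0.0 if vote_share is None else float(vote_share)

# Hypothetical rows, for illustration only (NOT the real historical numbers).
rows = [
    {"candidate": "A", "iowa": 38.0, "new_hampshire": 35.0},
    {"candidate": "B", "iowa": 11.0, "new_hampshire": None},  # dropped out after Iowa
    {"candidate": "C", "iowa": None, "new_hampshire": 12.0},  # did not compete in Iowa
]

coded = [
    (r["candidate"], code_result(r["iowa"]), code_result(r["new_hampshire"]))
    for r in rows
]
print(coded)
```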
Model 1: National Polls and Iowa Results (1972-2004)
The first analysis I did was to look at only national polls and Iowa results in predicting the outcome of New Hampshire. This allows me to look at the data from 1972 onward.
What did the analysis find? National Polls are completely meaningless in predicting the results of New Hampshire, once we know the results from Iowa. In fact, the coefficient on the national polling data is slightly negative, although not in any statistically significant way.
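For readers who want to see the shape of such a regression, here is a minimal Python sketch of an ordinary least squares fit with two predictors. The numbers are synthetic and were deliberately built so that national polls add nothing once Iowa is known; the `ols` helper and all variable names are my own assumptions. It shows only the form of Model 1, does not reproduce the actual fit, and does no significance testing.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows without the intercept column.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * b for a, b in zip(A[idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data, NOT the historical numbers: New Hampshire is constructed as
# 5 + 0.6 * Iowa, so national polls carry no extra information by design.
national = [30, 10, 25, 5, 20, 15]
iowa = [35, 25, 10, 30, 5, 20]
new_hampshire = [5 + 0.6 * i for i in iowa]

intercept, b_national, b_iowa = ols(list(zip(national, iowa)), new_hampshire)
print(f"intercept={intercept:.2f} national={b_national:.2f} iowa={b_iowa:.2f}")
```

With real data the national coefficient would not be exactly zero; the question is whether it is distinguishable from zero, which requires standard errors that this sketch does not compute.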
This is an extremely robust result. For example, if we look only at the results from 1980 onward, or 1988 onward, or 2000 onward, we still find that the national polls have no predictive effect once we know the results of Iowa.
This does not mean that the national polls have no relationship with the New Hampshire results. The two things are somewhat correlated. However, the national polls add nothing to a prediction of New Hampshire once we know the results of Iowa. Correlation does not equal causation, in other words.
Another important finding, by the way, is that the rank in Iowa does not appear to mean anything for New Hampshire. It's the percentages that count.
Model 2: New Hampshire Polls and Iowa Results (1980-2004)
Now we'll look at pre-Iowa New Hampshire polls, as well as Iowa results, for all years from 1980 onward. Both variables are highly statistically significant.
If the only things we know are the (pre-Iowa) New Hampshire polls and the Iowa results, we would basically just want to take a weighted average of these things to predict the outcome in New Hampshire, favoring the polls over the Iowa results in roughly a 10:7 ratio.
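The weighted average described here can be sketched in a few lines of Python. This is a simplification under stated assumptions: it treats the rough 10:7 ratio as pure weights that sum to one and ignores any intercept the fitted regression may have. The function name and the example candidate are hypothetical.

```python
POLL_WEIGHT = 10.0   # relative weight on pre-Iowa New Hampshire polls
IOWA_WEIGHT = 7.0    # relative weight on Iowa results

def predict_new_hampshire(nh_poll: float, iowa_result: float) -> float:
    """Weighted average of the NH poll number and the Iowa result (percent)."""
    total = POLL_WEIGHT + IOWA_WEIGHT
    return (POLL_WEIGHT * nh_poll + IOWA_WEIGHT * iowa_result) / total

# Hypothetical candidate: 20% in pre-Iowa NH polls, 37% in Iowa.
example = predict_new_hampshire(20.0, 37.0)
print(example)  # (200 + 259) / 17 = 27.0
```

In this reading, a candidate who outperforms his New Hampshire polling in Iowa is pulled a bit less than halfway (7/17) toward his Iowa number.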
Model 3: New Hampshire Polls, National Polls and Iowa Results (1980-2004)
Just for fun, let's see if the national polls begin to mean anything if we also know the New Hampshire polls. It turns out that if we look at all three variables -- New Hampshire polls, National polls, and Iowa results -- the National polls do have a statistically significant predictive impact, but that impact is negative. In other words, holding New Hampshire polls and Iowa results constant, we would tend to lower our prediction for a candidate's results in New Hampshire for each point he has in the national polls.
How can this be? It seems to be highly counterintuitive.
Well, here's the explanation. New Hampshire polls are a leading indicator of national polls. Voters are more engaged and more informed in New Hampshire than they are nationally. As the primary season progresses, voters continue to become better engaged and informed, until the actual voting takes place, when the voters are presumably as informed as they ever will be.
In other words, if a candidate is doing better in New Hampshire polls than he is in national polls, that suggests that as voters become more informed, they will continue to slide toward that candidate. And so the candidate will do well in the voting booth, at which point all voters are highly informed (relatively speaking, at least). On the other hand, if a candidate is doing better nationally than he is in New Hampshire, that suggests that the candidate may not hold up to scrutiny, that he may be trading primarily on name recognition, etc. His support is superficial.
The litmus test of this then becomes Iowa. If a candidate is doing better in New Hampshire polls than he is in national polls, and that candidate does well in Iowa, that provides very powerful evidence that this increase in information works to the benefit of that candidate.
You might call this something like "the momentum of information". This hypothesis, by the way, has been confirmed by other researchers.
Applications for 2008
Presently in the Democratic race, we are seeing a pattern wherein Barack Obama is doing better in Iowa than he is in New Hampshire, and better in New Hampshire than he is nationally. This is a very favorable alignment of the numbers. It is so powerful, in fact, that I believe it may explain the shift in tactics from the Clinton campaign.
We can also plug in the current polling averages from Iowa, New Hampshire, and nationally to see what our models are predicting. Note that strictly speaking this is a violation of the model's conditions, because it's designed for actual Iowa caucus results, rather than pre-caucus Iowa polls. But here goes:
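| Candidate | Model 1 | Model 2 | Model 3 | Average |
|-----------|---------|---------|---------|---------|
| Obama     | 26      | 25      | 25      | 25      |
| Edwards   | 25      | 20      | 21      | 22      |
| Clinton   | 23      | 30      | 25      | 26      |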
Our most sophisticated model, Model 3, shows that Obama and Clinton are equally likely to win New Hampshire at the moment, with Edwards not too far behind.
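The Average column in the table above appears to be the simple mean of the three model predictions, rounded to the nearest whole number. That is an inference from the figures rather than something stated outright, and the short Python check below is consistent with it.

```python
# Model 1, Model 2, Model 3 predictions as given in the table above.
predictions = {
    "Obama": (26, 25, 25),
    "Edwards": (25, 20, 21),
    "Clinton": (23, 30, 25),
}

# Simple mean across the three models, rounded to a whole number.
averages = {name: round(sum(vals) / len(vals)) for name, vals in predictions.items()}
print(averages)
```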
A more proper application of the model is to do some scenario testing. Specifically, the model presently suggests the following:
- Edwards must beat Clinton by 6 points in Iowa to become a favorite over Clinton in New Hampshire
- Edwards must beat Obama by 4 points in Iowa to become a favorite over Obama in New Hampshire
- Obama must beat Clinton by 3 points in Iowa to become a favorite over Clinton in New Hampshire
Please keep in mind that these results refer only to New Hampshire, rather than what happens after New Hampshire.
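To show the kind of arithmetic behind a scenario test, here is a hedged Python sketch. It uses the simpler 10:7 weighted-average reading of Model 2, not Model 3, so it is not expected to reproduce the 6, 4 and 3 point thresholds above. Under that simplification, candidate A overtakes candidate B in the New Hampshire prediction when 10*pollA + 7*iowaA exceeds 10*pollB + 7*iowaB, which means A's Iowa margin over B must exceed 10/7 times B's lead in the New Hampshire polls. The function name and the poll numbers are hypothetical.

```python
POLL_WEIGHT = 10.0   # relative weight on pre-Iowa New Hampshire polls
IOWA_WEIGHT = 7.0    # relative weight on Iowa results

def required_iowa_margin(nh_poll_trailing: float, nh_poll_leading: float) -> float:
    """Iowa margin (in points) the trailing candidate needs to draw level
    in the predicted New Hampshire result, under the 10:7 weighting."""
    poll_gap = nh_poll_leading - nh_poll_trailing
    return POLL_WEIGHT * poll_gap / IOWA_WEIGHT

# Hypothetical: a candidate trailing 23 to 30 in New Hampshire polls.
margin = required_iowa_margin(23.0, 30.0)
print(margin)  # 10 * 7 / 7 = 10.0
```

A full scenario test with Model 3 would also include the national polling term, which this sketch leaves out.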