Predicting the House Using Historical Data

by Daniel Donner for Daily Kos Elections

Thursday, Oct. 18, 2012 at 9:00:05am PDT

It turns out there is a very strong relationship between the number of seats Democrats gain in the House and the change in the popular vote margin from the previous House election. Using a regression based on data from 1932-2010, this year's election can be predicted as shown below.

Quick instructions:
1. Find House generic ballot polling numbers here.

2. Use the generic ballot number you found to find how many seats Democrats would gain in a typical year, using the black line above or the table here.

3. Adjust for this year's circumstances, which appear to be poorer than average. Your adjustment should keep you within the bounds of the green lines, which are essentially historical limits. I would suggest subtracting 6 - add your suggestion in the comments, and we'll see what the wisdom of the crowds says.

Details

Using the graph.
First, you need to figure out what the popular vote margin for House elections will be (no easy task!), and find that number along the x-axis. (In the last two cycles, polling averages have been pretty accurate, slightly underestimating Democrats.) For example, as of this writing, the polling averages show a tie in the generic ballot. The graph above shows that, historically, Democrats should gain on average just shy of 25 seats with a tie (black line), although the 95% prediction interval ranges from 2 (lower green line) to 46 (upper green line). In years when conditions are better than average for Democrats, the results should end up above the black line; in years that are tougher than average, we'll end up below it.

A tie would put chances of taking back the House just south of 50%, if we knew nothing else about this year's elections. However, we do know plenty more information, and circumstances seem less favorable than average for Democrats, implying the final result will end up below the black line.

Adjusting for this year's circumstances.
After finding your general range, you can start to factor in other things. For example, if you think redistricting gave Democrats a 10-seat structural disadvantage this year, then simply subtract 10 from the number the black line gives you. In this case, the graph would predict (at a popular vote tie) Democratic gains of about 14 seats. If you think circumstances are, for Democrats, the worst they have been in 80 years, then you would look at or just below the lower green line; in the case of a popular vote tie, this would show essentially no change in the number of Democratic seats.

For reasons that are detailed below, I would personally guesstimate that around 5-7 seats should be subtracted from the model's estimate for this year, but I wouldn't bet much on that number. For the pessimistic amongst us, the graph shows us that under the worst circumstances we could not expect Democrats to win the House back unless the popular House vote margin ends up at D+6.
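As a rough sketch of the arithmetic above, the black line and green lines can be approximated from the numbers quoted in this post. The constants below are read off the text and are assumptions, not the fitted regression coefficients: the tie value is taken as the midpoint of the 2-to-46 interval, the slope comes from the "+5 points means +19 seats" example further down, and the interval width is held constant for simplicity (a real prediction interval widens away from the center of the data). All function names are made up for illustration.

```python
# Constants eyeballed from the article's text (assumptions, not fitted values).
GAIN_AT_TIE = 24.0       # midpoint of the 2-to-46 interval quoted for a tie
SEATS_PER_POINT = 3.8    # implied by the "+5 points -> add 19 seats" example
HALF_WIDTH = 22.0        # half of the 2-to-46 interval, held constant here


def typical_gain(margin):
    """Approximate black line: Democratic seat gain for a D-minus-R margin."""
    return GAIN_AT_TIE + SEATS_PER_POINT * margin


def historical_bounds(margin):
    """Approximate green lines: (lower, upper) seat gain for a given margin."""
    center = typical_gain(margin)
    return center - HALF_WIDTH, center + HALF_WIDTH


def adjusted_gain(margin, seats_to_subtract=0.0):
    """Black-line value minus whatever adjustment you choose for this year."""
    return typical_gain(margin) - seats_to_subtract


print(adjusted_gain(0, 10))      # 14.0, the 10-seat redistricting example
print(historical_bounds(0))      # (2.0, 46.0)
print(round(historical_bounds(6)[0], 1))  # lower green line at D+6
```

Under these assumptions the lower green line reaches roughly 25 seats at D+6, which lines up with the worst-case statement above.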

Using the chart to adjust seat-by-seat predictions.
If you have already made your own prediction of the net change in Democratic seats in the House this year, based on seat-by-seat analysis, this graph can still be useful to you. If conditions change suddenly, you can adjust your prediction without waiting for polls from individual races. For example, if you have Democrats gaining 10 seats under conditions of a generic ballot tie, and the generic ballot polls suddenly move to Democrats +5 (don't you wish?), you would add 19 seats to your prediction for a 29-seat total change (oh, I so wish!).
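The adjustment in that example amounts to multiplying the move in the generic ballot by the slope of the line. A minimal sketch, assuming the slope is the 19 seats per 5 points implied by the example (the function name is hypothetical):

```python
# Slope implied by the article's example: a 5-point move is worth 19 seats.
SEATS_PER_POINT = 19 / 5


def shift_prediction(seat_estimate, old_margin, new_margin,
                     seats_per_point=SEATS_PER_POINT):
    """Shift a seat-by-seat estimate when the generic ballot margin moves.

    Margins are Democratic minus Republican, in points.
    """
    return seat_estimate + seats_per_point * (new_margin - old_margin)


# The example from the text: +10 seats at a tie, polls move to D+5.
print(round(shift_prediction(10, 0, 5), 1))
```

The same function works in the other direction: a move toward Republicans gives a negative shift.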

Yes, Democrats could win and still lose.
It is easy to see at a glance how Democrats could win the House popular vote but still not win a majority of seats. In fact, history alone tells us that even if Democrats win the popular House vote by four points, there's still a ~10% chance they won't take the House back, and that's without factoring in conditions specific to this year.
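One way to get a feel for where a figure like ~10% comes from is to treat the scatter around the black line as roughly normal. The sketch below is an approximation under several assumptions that are not the author's stated method: a normal distribution instead of the regression's t-based prediction interval, a spread backed out of the 2-to-46 interval at a tie, a slope of 3.8 seats per point, and a 25-seat gain needed for a majority (inferred from the statement that a tie, at just under 25 seats, gives just under a 50% chance).

```python
import math

# All constants are assumptions read off the article's text.
GAIN_AT_TIE = 24.0           # midpoint of the 2-to-46 interval at a tie
SEATS_PER_POINT = 3.8        # from the "+5 points -> +19 seats" example
RESIDUAL_SD = 22.0 / 1.96    # 95% interval half-width converted to an SD
SEATS_NEEDED = 25            # assumed net gain required for a majority


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_of_majority(margin, seats_to_subtract=0.0):
    """Approximate probability that the seat gain reaches SEATS_NEEDED."""
    expected = GAIN_AT_TIE + SEATS_PER_POINT * margin - seats_to_subtract
    z = (SEATS_NEEDED - expected) / RESIDUAL_SD
    return 1.0 - normal_cdf(z)


for margin in (0, 4, 6):
    print(margin, round(chance_of_majority(margin), 2))
```

With these inputs a tie comes out a little under 50% and D+4 at roughly 90%. Passing a nonzero `seats_to_subtract` shows how quickly a this-year adjustment eats into those odds.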

Much, much more below, including how well this would have worked in 2010, how the graph was constructed, why it works better than just using popular vote alone, and how to estimate the popular vote without generic ballot polling.

Where did this graph come from?
The graph comes from the idea that the two biggest predictors of the change in House seats for Democrats are this year's popular House vote and the current distribution of House seats. Of course, the current seat distribution will be related to last election's popular House vote. So, I graphed the change in the number of Democratic seats against the change in the margin of the popular House vote. You get an excellent correlation:

Why not just use the popular House vote alone?
Two reasons: 1) because using the change in popular House vote (Change Graph, above) works better and 2) because when you use the popular House vote alone (Static Graph, below), you get different regressions for different time periods. This isn't the case for the Change Graph. Here's the Static Graph:

We can see that the regression for the Change Graph is simply better than that of the Static Graph. The Change Graph regression incorrectly calculates control of the House in 2 of 40 elections, while the Static Graph regression does so in 5 of 40. The Static Graph also has much larger prediction intervals (for a tied popular House vote, for example, its 95% prediction interval for Democratic House seat changes this year ranges from losing 25 to gaining 105). Finally, recent elections all fall at or below the regression line, while elections from the last period of House/Presidential vote alignment all fall at or above it, implying this regression changes with time.

We could just use the most recent elections for our prediction (as Sam Wang does, I believe, before adding in incumbency and redistricting effects), but then we run the risk of having too few elections to capture the variation from one election to another, and just getting a good regression by chance. For example, the regression for the elections from 2000-2010 is great - but the regression for the elections from 1994 to 2004 has a measly R² value of 0.37.
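For anyone who wants to try the window comparison themselves, here is a minimal sketch of an ordinary least-squares fit and its R² over a chosen range of years. It is a generic textbook calculation, not the exact procedure used for the graphs, and the function names are made up. The inputs would be one entry per election: the year, the change in popular vote margin, and the change in Democratic seats. No election data is included here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def window_r_squared(years, xs, ys, first_year, last_year):
    """Fit only the elections between first_year and last_year, inclusive."""
    picked = [(x, y) for yr, x, y in zip(years, xs, ys)
              if first_year <= yr <= last_year]
    wx = [x for x, _ in picked]
    wy = [y for _, y in picked]
    slope, intercept = fit_line(wx, wy)
    return r_squared(wx, wy, slope, intercept)
```

With only six elections in a window, a single unusual year can move R² a lot, which is the risk described above.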

Other Factors: Redistricting.
Redistricting is one of the factors that can shift the outcome from one side of the regression to another. What happened this time?

One way to look at this question is to use the DKE list of presidential results by district and evaluate the distribution of seats.

First, of the districts that were eliminated, 7 were Republican and 5 were Democratic. Of the districts labeled as new, 5 are Republican, 5 are rated at least leaning D, and 2 are rated tossups. This isn't too far off from 'no change', but how exactly to quantify this change is certainly arguable. I'll go for a Republican advantage of +1 personally.

The remaining districts generally shift a little towards safer districts for Republicans, less safe districts for Democrats. For example, there are 18 fewer districts labeled as Republican in the list that Obama won with 52-58%. But there are 9 more districts at Obama 50-51%, and most importantly 5 more districts at Obama >58%.

I fired up my old model to run some calculations. Now, the old model uses Bush vote from 2000, which is not the same as McCain vote from 2008, so the old model can't predict the correct number of seats Democrats or Republicans can win in 2012. But what it can do is show us what effect changes due to redistricting have. Result: Advantage Republicans. For example, under conditions in which the model predicts Republicans losing 32 seats before redistricting, now they would only lose 31 seats. Under conditions in which Democrats would have lost 20 seats before redistricting, now the model predicts 21.

Add the model numbers and the new district changes together and you get somewhere around R +3 or +4. Interestingly, this is pretty close to what Sam Wang calculated at R+6.3+/-0.6 seats. (I love it when different methods more or less agree!)

At any rate, whatever estimate you might prefer to use, simply subtract that number from what the black line gives you in the first graph above.

Other Factors: Open seats.
Another factor that needs to be considered is the distribution of open seats. Of course, incumbents can raise a ton of money, scare away strong opponents, and, on top of that, they perform a few points better on election day just by virtue of being incumbents. Open seats are therefore more likely to switch parties, so the number of retirements in each party makes a difference. We can see this in the data going back to 1980 (sorry, I just tired of counting retirements at that date), excluding redistricting years. On average there is a weak relationship between the residual for the regression and the difference in the number of open seats for each party. So, if Democrats have 10 more open seats than Republicans, you can expect Democrats to win, on average, a couple fewer seats that year than the regression predicts. If Republicans have 10 more open seats than Democrats, you can expect Democrats to win a few more seats than the regression predicts, on average. This year, the open seats are more or less balanced between the parties.

The incumbency advantage is, indeed, why the Change Graph works better than the Static Graph. If Democrats have only 150 seats and then win the next election by 10 points, they won't pick up every seat that margin ought to deliver, because they almost certainly won't have fielded strong, well-funded candidates in every district. So maybe they will win, say, 255 seats. But if they start with 330 seats and the margin decreases to 10 points in the next election, we might expect them to win a lot more - maybe 295 seats - because Republicans will be the ones fighting the incumbency funding and recruitment advantages instead. Sadly, there is no such clear-cut example in history that does not involve complicating factors such as redistricting, but one that is relatively close is 1988 (+8D popular vote margin, 260D, change from 258D, +10D popular vote margin in 1986) vs 2006 (+8D popular vote margin, 233D, change from 202D, -3D popular vote margin in 2004).
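The 1988 vs 2006 comparison can be laid out as the two "change" variables the Change Graph uses. This small sketch only restates the numbers given above; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HouseElection:
    year: int
    dem_margin: float        # D minus R popular vote margin, in points
    dem_seats: int
    prev_dem_margin: float   # same margin in the previous House election
    prev_dem_seats: int

    @property
    def margin_change(self):
        """x-axis of the Change Graph."""
        return self.dem_margin - self.prev_dem_margin

    @property
    def seat_change(self):
        """y-axis of the Change Graph."""
        return self.dem_seats - self.prev_dem_seats


# Figures as quoted in the article.
elections = [
    HouseElection(1988, 8, 260, 10, 258),
    HouseElection(2006, 8, 233, -3, 202),
]

for e in elections:
    print(e.year, "margin", e.dem_margin,
          "| margin change", e.margin_change,
          "| seat change", e.seat_change)
```

Both years sit at the same spot (+8) on the Static Graph despite a 27-seat difference in the result, while on the Change Graph they are far apart: a 2-point drop with a 2-seat gain versus an 11-point swing with a 31-seat gain.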

Other Factors: The election before last.
Up above, in explaining how the regression was constructed, I said "the current seat distribution will be related to last election's popular House vote." But if the regression works, shouldn't the current seat distribution be related to the difference between the popular vote in the previous election and the popular vote in the election before that?

As it turns out, yes, kind of. There is another weak - very weak - relationship between the residual and the difference between the current election's popular vote margin and the margin of the election four years ago (again, excluding redistricting years). This year, this might give us a slight headwind of a few seats at this point, as 2008 was a pretty good year for Democrats in the House popular vote, compared to the current generic ballot numbers.

Does this method of prediction work?
It works if the polling works.

For example, in 2010, this regression would have predicted a loss of 54 seats for Democrats compared to the 2008 results - in July. Numbers worsened from there, with a final prediction of a loss of 72 seats, and a range of 48 to 94. Actual: 63. In that year, the average margin of the final two weeks of polling was just 1.5 points off, in favor of Republicans.

On the other hand, in 2006, the regression predicted Democrats would pick up around 40-60 seats throughout the entire year, with the final two weeks polling average resulting in a prediction of a gain of 57 seats, with a range of 33 to 80. Actual: 33. The final polling average margin was off by 5.4 points in favor of the Democrats that year.

So beware.
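To see how much of each miss came from the polls rather than from the regression, you can back the polling error out of the final predictions. This is a rough sketch, assuming the same 3.8 seats-per-point slope implied by the "+5 points means +19 seats" example earlier; the function name is hypothetical and the inputs are the figures quoted above.

```python
# Slope assumed from the article's "+5 points -> +19 seats" example.
SEATS_PER_POINT = 19 / 5


def corrected_prediction(predicted_dem_change, poll_error_toward_dems):
    """Seat prediction had the polls matched the actual margin.

    poll_error_toward_dems: points by which the polling average overstated
    the Democratic margin (negative if it overstated Republicans).
    """
    return predicted_dem_change - SEATS_PER_POINT * poll_error_toward_dems


# year: (final prediction, poll error toward Democrats, actual seat change)
cases = {
    2010: (-72, -1.5, -63),
    2006: (57, 5.4, 33),
}

for year, (predicted, error, actual) in cases.items():
    fixed = corrected_prediction(predicted, error)
    print(year, "predicted", predicted,
          "| poll-corrected", round(fixed, 1), "| actual", actual)
```

Under that assumption, both corrected figures land within a few seats of the actual result, which fits the point that the method is only as good as the polling fed into it.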

Follow the President?
This year we seem to have a lack of polling for the generic ballot. There were only 12 polls in September, compared to 18 in 2008 and 31(!) in 2010. What are we to do? One interesting thing to note is that we have entered another period of alignment, where the popular House vote and popular Presidential vote are pretty close. So perhaps we can just substitute the Presidential margin for the generic ballot margin. In 2008 there was a 3-point difference, however. Still, it's an interesting idea. Here's the relationship:

This year it seems like the generic ballot numbers are lagging the presidential numbers a bit. In 2008, the generic ballot numbers correctly showed better numbers for Democrats than the presidential numbers, although both were a little too Republican. Getting good generic ballot numbers is the major challenge for this estimation method.