This is a cross-post from the Burnt Orange Report. I spent about 4 months researching and developing this, so while it certainly has methodological flaws, I figured it was a labor worth sharing.
One of the most contentious debates in the field of political science today is over the "predictability" of elections. This question necessarily raises others that grate at our moral conscience as Americans. If an election is predictable based upon economic conditions and "political time", then how much impact can one individual truly make? Are we controlled by fate or destiny?
On the other hand, one could argue that, if voters are really rational, then it's pretty simple to figure out what they are going to do given objective preconditions. Rational choice theory then encourages us to see electoral predictability as fairly flattering evidence that Americans really know what's best for them, and what could be more moral than that?
In either case, I'm not a very moral person, but I did think it would be cool to try and take a stab at divining - just about a year ahead of time - who has the inside track in the race for the White House.
(In part I did this in the hopes that I could create a reasonably realistic computer game, so I'm not without some pragmatic motive).
The Burnt Orange Political Weather Forecast
[Map: 2004 forecast]
This month's forecast for the 2004 Election -- still 54 weeks away -- suggests a somewhat competitive election in which President Bush has a slight Electoral College advantage.
The overall forecast suggests an election similar to that of 2000, with the probable battleground states being:
(Leaning Slightly to the Democrats)
Washington, Oregon, Wisconsin, West Virginia, and Pennsylvania
(Leaning Slightly to the Republicans)
New Mexico, Iowa, Missouri, and Florida
In these states, both parties have a better than one-in-three chance of winning.
Other possibly competitive states, where the chance of an upset is between one-in-ten and one-in-three, are:
(Leaning to the Democrats)
California, Illinois, Maine, Michigan, Minnesota, and Delaware
(Leaning to the Republicans)
Arizona, Colorado, Nevada, Arkansas, Louisiana, Tennessee, Ohio, and New Hampshire
Should all of these predictions come to pass, President Bush will be re-elected with 285 electoral votes, with the as-yet-unnamed Democrat receiving 253 electoral votes.
Bush has 235 EVs "solid" or "leaning", with the Democrat having 199 EVs "solid" or "leaning". Slight Lean/Tossup states comprise 104 EVs.
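The electoral-vote totals can be re-derived from the probability table below. The sketch that follows is a rough check, not the author's spreadsheet: it pairs each state with its electoral votes under the 2004 (post-2000-census) apportionment, which the post itself does not list, calls each state for whichever side the table favors, and buckets states using the one-in-three and one-in-ten upset thresholds described above.

```python
# Probability of a Democratic win (percent) from the table in this post, paired
# with each state's electoral votes under the 2004 (post-2000-census) apportionment.
STATES = {
    "AL": (3.0, 9), "AK": (0.0, 3), "AZ": (13.2, 10), "AR": (31.3, 6),
    "CA": (84.3, 55), "CO": (17.8, 9), "CT": (96.9, 7), "DE": (77.2, 3),
    "DC": (100.0, 3), "FL": (34.4, 27), "GA": (6.6, 15), "HI": (98.5, 4),
    "ID": (0.0, 4), "IL": (85.4, 21), "IN": (2.7, 11), "IA": (38.3, 7),
    "KS": (0.4, 6), "KY": (3.1, 8), "LA": (10.6, 9), "ME": (85.1, 4),
    "MD": (96.7, 10), "MA": (99.9, 12), "MI": (67.2, 17), "MN": (67.7, 10),
    "MS": (0.7, 6), "MO": (34.6, 11), "MT": (0.7, 3), "NE": (0.1, 5),
    "NV": (25.8, 5), "NH": (17.1, 4), "NJ": (91.8, 15), "NM": (42.9, 5),
    "NY": (99.6, 31), "NC": (8.3, 15), "ND": (0.8, 3), "OH": (29.9, 20),
    "OK": (0.3, 7), "OR": (57.8, 7), "PA": (65.5, 21), "RI": (99.9, 4),
    "SC": (3.6, 8), "SD": (1.1, 3), "TN": (19.9, 11), "TX": (0.5, 34),
    "UT": (0.0, 5), "VT": (94.3, 3), "VA": (9.7, 13), "WA": (63.4, 11),
    "WV": (52.5, 5), "WI": (55.4, 10), "WY": (0.0, 3),
}


def category(p_dem):
    """Bucket a state by the chance that the underdog wins instead."""
    upset = min(p_dem, 100.0 - p_dem)
    if upset > 100.0 / 3.0:
        return "slight lean"  # both sides have better than a one-in-three chance
    if upset >= 10.0:
        return "lean"         # upset chance between one-in-ten and one-in-three
    return "solid"


def tally(states):
    """Sum electoral votes by (projected winner, category)."""
    totals = {}
    for p_dem, ev in states.values():
        winner = "Dem" if p_dem > 50.0 else "Bush"
        key = (winner, category(p_dem))
        totals[key] = totals.get(key, 0) + ev
    return totals


totals = tally(STATES)
dem_total = sum(ev for (side, _), ev in totals.items() if side == "Dem")
bush_total = sum(ev for (side, _), ev in totals.items() if side == "Bush")
slight = sum(ev for (_, cat), ev in totals.items() if cat == "slight lean")
print(dem_total, bush_total, slight)
```

With these inputs it prints 253 285 104: 253 projected Democratic electoral votes, 285 for Bush, and 104 in slight-lean states, leaving 199 Democratic and 235 Republican electoral votes in the solid and leaning categories.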
This prediction assumes a presidential approval rating of 55 percent in the Gallup Poll on or about Labor Day of next year. It also assumes an approximate 3 percent increase in real disposable income in the third quarter of 2004.
It does not add in the likely impact of the Democratic candidate's home-state advantage (since we do not know who the Democratic candidate will be). Expect a bounce of about four points in the home state of whoever the candidate turns out to be.
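Here is a table with the projected Democratic share of the two-party vote (Share), the probability of a Democratic win (Prob.), and each state's rank by that probability, from most to least Democratic (Rank):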
State Share Prob. Rank
AL 41.7% 3.0% 38
AK 32.4% 0.0% 48
AZ 44.9% 13.2% 31
AR 47.9% 31.3% 25
CA 55.1% 84.3% 12
CO 45.8% 17.8% 29
CT 59.1% 96.9% 6
DE 53.9% 77.2% 13
DC 82.8% 100.0% 1
FL 48.3% 34.4% 24
GA 43.2% 6.6% 35
HI 60.4% 98.5% 5
ID 30.4% 0.0% 50
IL 55.3% 85.4% 10
IN 41.5% 2.7% 39
IA 48.9% 38.3% 22
KS 38.4% 0.4% 45
KY 41.4% 3.1% 37
LA 44.3% 10.6% 32
ME 54.8% 85.1% 11
MD 58.9% 96.7% 7
MA 64.1% 99.9% 3
MI 52.3% 67.2% 15
MN 52.3% 67.7% 14
MS 38.8% 0.7% 42
MO 48.3% 34.6% 23
MT 38.8% 0.7% 43
NE 34.9% 0.1% 47
NV 47.1% 25.8% 27
NH 45.8% 17.1% 30
NJ 56.9% 91.8% 9
NM 49.4% 42.9% 21
NY 62.6% 99.6% 4
NC 43.9% 8.3% 34
ND 39.2% 0.8% 41
OH 47.7% 29.9% 26
OK 37.3% 0.3% 46
OR 51.1% 57.8% 18
PA 52.1% 65.5% 16
RI 64.7% 99.9% 2
SC 42.2% 3.6% 36
SD 39.8% 1.1% 40
TN 46.2% 19.9% 28
TX 38.4% 0.5% 44
UT 30.4% 0.0% 51
VT 57.3% 94.3% 8
VA 44.1% 9.7% 33
WA 51.9% 63.4% 17
WV 50.3% 52.5% 20
WI 50.8% 55.4% 19
WY 31.0% 0.0% 49
The validity of this forecast
The model is based on data from 1964 through 2000. Although perfect data for 1956 or 1960 was not available (as I shall explain below), the model was able to make reasonably good guesses as to which states the Democrats would carry in those years.
The 1960 Retrocast --
[Map: 1960 retrocast]
The model achieved roughly 75 percent accuracy for this election. While it predicted a Kennedy victory over Richard Nixon (345 EVs to 186 EVs, not counting the 6 EVs from Alaska and Hawaii), it missed several important states: it mistakenly called Washington, Oregon, Montana, Oklahoma, Tennessee, Kentucky, and Florida for Kennedy, and Connecticut, New Jersey, and Nevada for Nixon. Nor could the model foresee that unpledged electors in Mississippi and Alabama would vote for the conservative Harry Byrd instead of the official Kennedy/Johnson ticket, which won the Electoral College vote 303-219-15.
Kennedy's unforeseen success in Nevada, New Mexico, Louisiana, and New Jersey may have been driven by Catholic voters (and the reverse may be true in Tennessee and Kentucky).
Alaska and Hawaii were omitted from this retrocast because 1960 was the first presidential election in which those new states voted, and the model depends heavily on past performance.
The 1956 Retrocast --
[Map: 1956 retrocast]
The model successfully predicted an overwhelming landslide for President Dwight Eisenhower over Democratic challenger Adlai Stevenson, missing only Missouri (which it called as a "solid" Eisenhower state), North Carolina, and Louisiana (which had a "slight lean" towards Stevenson/Kefauver). Overall this yielded 45 correct calls and 3 incorrect calls, a correct-call rate of 93.75 percent.
Within-sample retrocasts (as opposed to these out-of-sample retrocasts) showed a consistent error rate of between 5 and 10 percent. So it is possible that five states (or even more) in the 2004 forecast could "flip." Generally, though, this "objective" forecast is roughly in line with the widely regarded predictions made by Larry Sabato, as well as the more subjective ones at PresidentElect.org. Ron Faucheux at Campaigns & Elections, perhaps the world's foremost political oddsmaker, also gives Bush a slight advantage (54.5%) heading into next year.
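The table's own probabilities also imply how many states we should expect to flip. A minimal sketch, assuming the probabilities are well calibrated and treating each state as an independent coin flip. That is a simplification: real state outcomes move together with national swings, which leaves the expected count unchanged but widens the spread around it.

```python
import math

# Probability of a Democratic win (percent) for each state and D.C., from the table above.
P_DEM = {
    "AL": 3.0, "AK": 0.0, "AZ": 13.2, "AR": 31.3, "CA": 84.3, "CO": 17.8,
    "CT": 96.9, "DE": 77.2, "DC": 100.0, "FL": 34.4, "GA": 6.6, "HI": 98.5,
    "ID": 0.0, "IL": 85.4, "IN": 2.7, "IA": 38.3, "KS": 0.4, "KY": 3.1,
    "LA": 10.6, "ME": 85.1, "MD": 96.7, "MA": 99.9, "MI": 67.2, "MN": 67.7,
    "MS": 0.7, "MO": 34.6, "MT": 0.7, "NE": 0.1, "NV": 25.8, "NH": 17.1,
    "NJ": 91.8, "NM": 42.9, "NY": 99.6, "NC": 8.3, "ND": 0.8, "OH": 29.9,
    "OK": 0.3, "OR": 57.8, "PA": 65.5, "RI": 99.9, "SC": 3.6, "SD": 1.1,
    "TN": 19.9, "TX": 0.5, "UT": 0.0, "VT": 94.3, "VA": 9.7, "WA": 63.4,
    "WV": 52.5, "WI": 55.4, "WY": 0.0,
}


def expected_flips(p_dem):
    """Mean and standard deviation of the number of miscalled states.

    Each state is called for its favorite, so it "flips" with probability
    q = min(p, 1 - p). With calibrated, independent probabilities the count
    of flips has mean sum(q) and variance sum(q * (1 - q)).
    """
    qs = [min(p, 100.0 - p) / 100.0 for p in p_dem.values()]
    mean = sum(qs)
    sd = math.sqrt(sum(q * (1.0 - q) for q in qs))
    return mean, sd


mean, sd = expected_flips(P_DEM)
print(f"expected flips: {mean:.1f} (sd {sd:.1f})")
```

On the table's numbers this comes to about 7.2 expected miscalls, somewhat more than the roughly three to five that a 5 to 10 percent error rate would imply across 51 contests.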
Factors weighing into this forecast
This forecast was created using a pool of six models, namely --
- Two linear regression models (one with a constant term, the other with the intercept fixed at zero) estimating the Democratic share of the two-party vote;
- Two probability models gauging the probability of a Democratic win (1) or loss (0) -- one model uses LOGIT, the other uses PROBIT (rhymes with "hobbit");
- Two probability models gauging the probability of an individual voter voting Democratic (1) or not (0) -- again, using both LOGIT and PROBIT.
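To make the LOGIT/PROBIT distinction concrete: both pairs of probability models turn a weighted sum of the variables below (a linear index) into a probability, and they differ only in the S-shaped curve used for that step. The sketch is illustrative only; the indices and outcomes are made up, and it does not reproduce the spreadsheet's fit. Estimation chooses the weights that maximize the log-likelihood, the statistic reported for these models further down.

```python
import math


def logit_prob(index):
    """LOGIT: logistic CDF, P(y = 1) = 1 / (1 + exp(-index))."""
    return 1.0 / (1.0 + math.exp(-index))


def probit_prob(index):
    """PROBIT: standard normal CDF, P(y = 1) = Phi(index)."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(outcomes, probs))


# Hypothetical linear indices (weights times variables) for three state-years,
# and whether the Democrat actually won each one.
indices = [-1.2, 0.3, 2.0]
won = [0, 1, 1]
for link in (logit_prob, probit_prob):
    probs = [link(z) for z in indices]
    print(link.__name__, [round(p, 3) for p in probs],
          round(log_likelihood(won, probs), 3))
```

The two curves agree at an index of zero (probability 0.5) and differ mainly in the tails, which is why LOGIT and PROBIT usually tell a similar story.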
All six models use the following variables:
- Democratic share of the two-party vote in the last election;
- Average Democratic share of the two-party vote in elections t-2 through t-4 (that is, the three elections before the previous one; in 2004 that means 1996, 1992, and 1988);
- The ideological position of the median voter in that state, scored on a scale of -3 (most conservative) to 3 (most liberal). This is based on a moving average of the annual scores derived by Berry, Ringquist, Fording, and Hanson (1998), who use congressional voting scorecards from Americans for Democratic Action and the AFL-CIO to estimate the ideological leanings of constituents. Since the three-year moving average for 2001-2003 cannot yet be calculated (and won't be until early next year), the 2000 ideology scores (the moving average of 1997-1999) are used for now;
- A dummy variable denoting whether the incumbent president is running for re-election (1 if a Democratic president is running, -1 if a Republican president is running);
- A dummy variable denoting whether a party's presidential nominee is the incumbent vice president (1 if the Democratic nominee is, -1 if the Republican nominee is);
- A dummy variable marking the home state of the incumbent president (1 for Democratic presidents, -1 for Republican presidents);
- A dummy variable marking the home state of the Democratic presidential nominee (which in all cases is zero for this forecast, since we don't know who the candidate is yet);
- The incumbent president's job approval rating, as measured by the Gallup Organization on or about Labor Day (data for many years via David Burbach at MIT). This enters as a positive number if the incumbent is a Democrat and as a negative number if a Republican;
- and the natural logarithm of the percent change in per-capita real personal disposable income in the third quarter of the election year (what a mouthful!).
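The sign conventions above are easier to see in a single row of predictors. This is a hypothetical helper (StateInputs and build_features are names invented here, not the author's code), and the state's vote shares and ideology score are made-up placeholders. Only the 2004 settings described in the post (Bush running, 55 percent approval), plus the facts that no sitting vice president is a nominee and that Bush's home state is Texas, are real. The income term uses one literal reading of its definition, which is an assumption.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class StateInputs:
    """Hypothetical per-state inputs for one election year."""
    dem_share_last: float     # Dem two-party share (%) in the previous election (t-1)
    dem_shares_prior: tuple   # Dem two-party shares (%) in elections t-2, t-3, t-4
    ideology: float           # median-voter ideology, -3 (conservative) to 3 (liberal)
    is_president_home: bool   # this is the incumbent president's home state
    is_nominee_home: bool     # this is the Democratic nominee's home state


def party_sign(party):
    """+1 for a Democrat, -1 for a Republican, following the post's coding."""
    return 1 if party == "D" else -1


def build_features(s, president_party, president_running, vp_nominee_party,
                   approval_pct, income_term):
    """Assemble one state-year row of predictors (illustrative only).

    vp_nominee_party: "D" or "R" if that party's nominee is the sitting vice
    president, otherwise None. income_term is passed through unchanged because
    the post does not say whether it is signed by the incumbent's party.
    """
    sign = party_sign(president_party)
    return {
        "dem_last": s.dem_share_last,
        "dem_prior_avg": mean(s.dem_shares_prior),
        "ideology": s.ideology,
        "incumbent_president": sign if president_running else 0,
        "incumbent_vp": party_sign(vp_nominee_party) if vp_nominee_party else 0,
        "president_home_state": sign if s.is_president_home else 0,
        "nominee_home_state": 1 if s.is_nominee_home else 0,  # 0 everywhere in 2004
        "approval": sign * approval_pct,
        "income": income_term,
    }


# 2004 settings: Republican incumbent running, no sitting VP as a nominee, 55% approval.
# The Texas shares and ideology score below are made-up placeholders, not real data.
texas = StateInputs(40.0, (45.0, 42.0, 39.0), -1.0, True, False)
# Assumed literal reading of "natural log of the percent change" with 3% growth.
row = build_features(texas, "R", True, None, 55.0, income_term=math.log(3.0))
print(row)
```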
Two additional dummy variables account for unusually poor Democratic performance in the Deep South in 1964 and unusually good Democratic performance in the South in 1976. Generally, accounting for the whims of the Southern white bloc vote was the hardest part of producing this forecast: the Southern tide that propelled Kennedy and Carter was not present for Johnson and Clinton. Accounting for George Wallace's vote in 1968 also created headaches; eventually, I decided to count Wallace votes as Republican votes (since, presumably, the same voters who went for Wallace in 1968 had voted for Republican Goldwater in 1964 and later voted for Republican Nixon in 1972).
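Counting Wallace votes as Republican enlarges the denominator of the Democratic two-party share, which pulls the Democratic figure down in states where Wallace ran well. A minimal sketch of that rule, with hypothetical vote counts:

```python
def dem_two_party_share(dem_votes, rep_votes, wallace_votes=0):
    """Democratic share of the two-party vote, in percent.

    Follows the post's 1968 rule: Wallace votes are added to the Republican
    column. Any other minor-party votes are simply left out of the denominator.
    """
    return 100.0 * dem_votes / (dem_votes + rep_votes + wallace_votes)


# Hypothetical state: 400k Democratic, 450k Republican, 150k Wallace votes.
print(dem_two_party_share(400_000, 450_000, 150_000))  # 40.0
print(dem_two_party_share(400_000, 450_000))           # about 47.06 if Wallace were ignored
```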
Overall the models use 503 datapoints (every state and D.C. since 1976; every state in 1972; and every state except Alaska and Hawaii in 1968 and 1964). The two linear models have R-squared statistics of .86 and .84, respectively, and global F statistics of 264 and 241, with 491 and 492 residual degrees of freedom. Both voter-probability models have maximized log-likelihoods of about -336, and both state-probability models have maximized log-likelihoods of about -103.
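The reported fit statistics hang together. The degrees of freedom imply 11 slope terms (503 - 491 - 1 = 11 with a constant, 503 - 492 = 11 without), matching the nine variables listed above plus the two Southern dummies, and inverting the standard relation between F and R-squared recovers values that round to the reported .86 and .84. A sketch, assuming the usual OLS test of all slopes at once; for the zero-intercept model it presumes the uncentered R-squared, the common convention for regression through the origin but an assumption here.

```python
def f_from_r2(r2, k, df_resid):
    """Global F statistic implied by R-squared with k slope terms.

    F = (R^2 / k) / ((1 - R^2) / df_resid), where df_resid = n - k - 1 with a
    constant and n - k without one (using the uncentered R^2 in that case).
    """
    return (r2 / k) / ((1.0 - r2) / df_resid)


def r2_from_f(f, k, df_resid):
    """Invert the relation above: R^2 = F * k / (F * k + df_resid)."""
    return f * k / (f * k + df_resid)


n, k = 503, 11  # 503 datapoints; the reported degrees of freedom imply 11 slope terms
print(round(r2_from_f(264, k, n - k - 1), 3))  # model with a constant
print(round(r2_from_f(241, k, n - k), 3))      # model with the intercept fixed at zero
```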
In the future I intend to update this prediction using better data, including the "true" ideological scores for 2001-2003 and more accurate estimates of 2004 Q3 RDI growth. I also would like to experiment with using congressional support for the president as a variable (the logic being that a state congressional delegation's support of presidential initiatives is driven, in large part, by the president's popularity back home among the constituents).
The entire Excel spreadsheet will be found here. Criticisms of a strictly mathematical nature (this was the first time I have applied LOGIT/PROBIT analysis) are very welcome. Be warned: the spreadsheet is about 31 MB. A non-interactive HTML version will be found here.
Finally, I am deeply indebted to the prior works of Steven Rosenstone, Douglas Hibbs, Ray Fair, John Zaller and Larry Bartels. I am also grateful for Charles Annis's Web tutorial on implementing generalized linear models like LOGIT and PROBIT on his Web site, statisticalengineering.com. Major sources of data are Dave Leip's Election Atlas, the Bureau of Economic Analysis, and the Bureau of Labor Statistics.