Introduction:
Tammy Baldwin is a U.S. Representative from Wisconsin's Second District and the presumptive Democratic nominee for the U.S. Senate seat vacated by Herb Kohl. On DK Elections, we've had a few conversations (like this one: http://www.dailykos.com/...) about her chances in a general election relative to other possible nominees like Rep. Kind or former Rep. Kagen. Some of the points raised against her didn't strike me as very convincing. For example, I don't think it matters on its own that her district--which includes Madison--is much more liberal and Democratic than the state as a whole. Candidates can get elected statewide from unrepresentative constituencies. But one point did concern me.
In 2000, Baldwin ran for re-election for the first time, having been elected in 1998. She ultimately won by a narrow margin of just under 3 points against Republican John Sharpless (http://en.wikipedia.org/...). I can't link to them, but I looked up some news articles from the time depicting Sharpless as a strong campaigner.
Her district at the time was estimated by our own twohundertseventy to be about D+7, having been won by Clinton with a 22-point margin (see the above-linked discussion). Indeed, as we'll see, she sharply under-performed both Gore (who was easily carrying the district over Bush despite a relatively strong performance by Nader) and the state Assembly candidates running in her district's precincts.
I decided to look at this election on a precinct level, using the handy data at the WI Government Accountability Board Elections website:
http://elections.state.wi.us/...
Their spreadsheets give the Congressional, State Senate, and State Assembly districts each precinct is in, so it was pretty straightforward.
Basics:
I'll start with the basics. There were 325,099 total votes cast for President in the district. George Bush won 36.39% of the vote to Al Gore's 57.96% and Ralph Nader's 4.97%. There were 318,380 total votes cast for Congress in the district. John Sharpless won 48.57% of the vote to Tammy Baldwin's 51.36%.
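To make the gap between the two races concrete, here is a minimal sketch of the arithmetic on the figures just quoted. It is written in Python purely for illustration (the analysis in this post was done in R), and the variable and function names are my own.

```python
# Vote shares (percent) and vote totals as quoted above for the district in 2000.
gore, bush, nader = 57.96, 36.39, 4.97
baldwin, sharpless = 51.36, 48.57
pres_total, house_total = 325_099, 318_380

def margin(dem_share, rep_share):
    """Democratic margin in percentage points (positive means the Democrat led)."""
    return dem_share - rep_share

pres_margin = margin(gore, bush)            # Gore over Bush
house_margin = margin(baldwin, sharpless)   # Baldwin over Sharpless
gap = pres_margin - house_margin            # how far Baldwin ran behind Gore's margin
dropoff = pres_total - house_total          # fewer votes cast for Congress than President

print(f"Gore margin:    {pres_margin:.2f} points")
print(f"Baldwin margin: {house_margin:.2f} points")
print(f"Gap:            {gap:.2f} points")
print(f"Drop-off:       {dropoff} votes")
```

On these numbers Gore led by a bit under 22 points and Baldwin by a bit under 3, so she ran nearly 19 points behind his margin.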
Baldwin's district at the time had 405 precincts. However, some of these precincts (as we all know) are "empty" and cast no votes in any race. 15 cast no votes for President or Congress, so there were 390 "real" precincts. Of these, Gore won 290, Bush won 96, and they tied in 4. Baldwin won 119, Sharpless won 270, and they tied in 1. Baldwin's margin over Sharpless was larger than Gore's margin over Bush (positive or negative) in only 10 precincts, 7 of which were in Madison. In 2 tiny precincts, their margins were the same. Gore overperformed Baldwin in the remaining 378 precincts.
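Tallies like these can be reproduced from a precinct table with a few lines of code. The following is only a sketch in Python, not the procedure actually used for this post. The rows are invented, except that the third one uses the Bush/Gore 8-1 and Sharpless/Baldwin 7-2 split this post reports for one tiny Madison precinct. The column order and helper names are assumptions.

```python
from collections import Counter

def winner(dem_votes, rep_votes):
    """Label a precinct by which candidate led it, or 'tie'."""
    if dem_votes > rep_votes:
        return "dem"
    if rep_votes > dem_votes:
        return "rep"
    return "tie"

# Hypothetical rows: (Gore, Bush, Baldwin, Sharpless) votes per precinct.
precincts = [
    (420, 210, 380, 250),
    (150, 300, 120, 330),
    (1, 8, 2, 7),
    (0, 0, 0, 0),   # an "empty" precinct that cast no votes
]

# Drop empty precincts before counting anything.
real = [p for p in precincts if any(p)]

pres = Counter(winner(g, b) for g, b, _, _ in real)
house = Counter(winner(bal, sh) for _, _, bal, sh in real)

# Precincts where Baldwin's margin over Sharpless beat Gore's margin over Bush.
baldwin_ahead = sum(1 for g, b, bal, sh in real if (bal - sh) > (g - b))

print(len(real), dict(pres), dict(house), baldwin_ahead)
```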
Calculations:
I performed several different regressions on the precincts using R. The most remarkable result was found when I tried to predict the absolute number of votes Baldwin and Sharpless won in each precinct using the absolute number of votes Bush, Gore, and Nader won, along with a "dummy variable" indicating whether or not the precinct was in the city of Madison.
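For readers who want to see what a regression of this shape involves, here is a minimal sketch in plain Python. It is not the code used for this post: R's lm does the real work, with a more numerically careful method than the normal equations used here. The precinct rows and the "true" coefficients are invented so that the fit can be checked against a known answer.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical precincts: (Bush, Gore, Nader, InMadison), with InMadison coded 1 or 0.
rows = [
    (120, 300, 25, 1), (200, 150, 10, 0), (80, 400, 40, 1), (310, 220, 12, 0),
    (150, 160, 8, 0), (60, 350, 35, 1), (270, 130, 5, 0), (90, 280, 30, 1),
    (180, 240, 15, 0), (140, 330, 20, 1),
]

# Invented coefficients used to generate noise-free "Sharpless" counts.
true_coefs = [1.0, 1.1, 0.14, -0.1, -17.0]
y = [true_coefs[0] + sum(c * v for c, v in zip(true_coefs[1:], r)) for r in rows]

fitted = ols(rows, y)
for name, coef in zip(["(Intercept)", "Bush", "Gore", "Nader", "InMadison"], fitted):
    print(f"{name:12s}{coef:10.4f}")
```

Because the made-up data contain no noise, the fitted coefficients should match the invented ones up to rounding error. Real precinct data would leave residuals like those in the R output.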
Here is the result for Sharpless:
Call:
lm(formula = SharplessAbs ~ Bush + Gore + Nader + InMadison)
Residuals:
     Min      1Q  Median      3Q     Max
 -97.864 -11.059   0.293  13.845 118.553
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.15413    1.94121   0.595 0.552501
Bush          1.11144    0.01305  85.160  < 2e-16
Gore          0.13671    0.01096  12.476  < 2e-16
Nader        -0.10455    0.04563  -2.291 0.022500
InMadison   -17.07373    4.54295  -3.758 0.000198
---
[...]
Residual standard error: 25.67 on 385 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared: 0.996
F-statistic: 2.411e+04 on 4 and 385 DF, p-value: < 2.2e-16
and here is the result for Baldwin:
Call:
lm(formula = BaldwinAbs ~ Bush + Gore + Nader + InMadison)
Residuals:
      Min       1Q   Median       3Q      Max
 -198.471  -11.336   -0.006   11.062  102.168
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -4.60146    2.26566  -2.031  0.04295
Bush         -0.12707    0.01523  -8.342 1.32e-15
Gore          0.86487    0.01279  67.623  < 2e-16
Nader         1.00028    0.05326  18.781  < 2e-16
InMadison    15.28366    5.30224   2.882  0.00417
---
[...]
Residual standard error: 29.96 on 385 degrees of freedom
Multiple R-squared: 0.9962, Adjusted R-squared: 0.9962
F-statistic: 2.538e+04 on 4 and 385 DF, p-value: < 2.2e-16
The R-squared for each was about 99.6%! That is a pretty good fit for using only four variables to predict 390 data points, and all four variables are statistically significant in both regressions. I'm actually disappointed--I would have preferred a weaker correlation, which might have pointed to some more interesting phenomena going on.
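As a sanity check on the output above, the adjusted R-squared can be recomputed from the plain R-squared, the number of precincts, and the number of predictors. This is a small Python sketch using the standard formula. The function name is my own, and the inputs are the rounded values printed by R, so only the first few decimals are meaningful.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

n_precincts = 390   # precincts that actually cast votes
n_predictors = 4    # Bush, Gore, Nader, InMadison

for name, r2 in [("Sharpless", 0.996), ("Baldwin", 0.9962)]:
    print(name, round(adjusted_r2(r2, n_precincts, n_predictors), 4))
```

With 390 precincts and only four predictors the penalty is tiny, which is why the adjusted figures in the R output are essentially identical to the unadjusted ones.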
Put into formulas:
SharplessAbs = 1.15413+1.11144*Bush+0.13671*Gore-0.10455*Nader-17.07373*InMadison
BaldwinAbs = -4.60146-0.12707*Bush+0.86487*Gore+1.00028*Nader+15.28366*InMadison
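To show how these formulas behave, here is a short Python sketch that evaluates them for a single precinct. The coefficients are copied from the regression tables above. The function names and the example precinct (200 Bush, 300 Gore, 30 Nader, outside Madison) are made up for illustration.

```python
def predict_sharpless(bush, gore, nader, in_madison):
    """Predicted Sharpless votes; in_madison is 1 inside Madison, else 0."""
    return (1.15413 + 1.11144 * bush + 0.13671 * gore
            - 0.10455 * nader - 17.07373 * in_madison)

def predict_baldwin(bush, gore, nader, in_madison):
    """Predicted Baldwin votes; in_madison is 1 inside Madison, else 0."""
    return (-4.60146 - 0.12707 * bush + 0.86487 * gore
            + 1.00028 * nader + 15.28366 * in_madison)

# Hypothetical precinct outside Madison.
sharpless_hat = predict_sharpless(200, 300, 30, 0)
baldwin_hat = predict_baldwin(200, 300, 30, 0)

print(f"Sharpless: {sharpless_hat:.1f}")
print(f"Baldwin:   {baldwin_hat:.1f}")
```

In this made-up precinct Gore leads Bush 300 to 200, yet the formulas put Sharpless at about 261 votes and Baldwin at about 259, which illustrates how much ground she gave up relative to the top of the ticket.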
We can get a rough sense of how much each variable contributed to the remarkable 99.6% r-squared by using beta coefficients (standardized regression coefficients). For Sharpless:
Bush 0.8488265
Gore 0.1579102
Nader -0.008010274
InMadison -0.002702342
And for Baldwin:
Bush -0.06345608
Gore 0.9396389
Nader 0.1133891
InMadison 0.006650035
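For anyone unfamiliar with beta coefficients, the usual definition rescales a raw regression coefficient by the standard deviations of the predictor and the outcome, so that variables measured on different scales can be compared. The Python sketch below assumes that definition. The precinct counts are invented, so the printed number is not comparable to the values listed above.

```python
from statistics import stdev

def beta_coefficient(raw_coef, x_values, y_values):
    """Standardized coefficient: raw coefficient times sd(x) / sd(y)."""
    return raw_coef * stdev(x_values) / stdev(y_values)

# Hypothetical precinct counts, invented for illustration only.
gore_votes = [300, 150, 400, 220, 160, 350]
baldwin_votes = [270, 120, 365, 185, 130, 310]

# 0.86487 is the raw Gore coefficient from the Baldwin regression above.
beta = beta_coefficient(0.86487, gore_votes, baldwin_votes)
print(f"Standardized Gore coefficient on the made-up data: {beta:.3f}")
```

A beta coefficient says how many standard deviations the outcome moves per standard deviation of the predictor. It is a rough guide to relative importance rather than an exact split of the R-squared.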
Analysis:
Let's look at these results carefully (although take my explanations with a grain of salt, as I am no statistician). For Sharpless, the regression coefficients for Bush and Gore are both positive. That means his vote count rose with Gore's as well as with Bush's, suggesting Sharpless was able to peel off Gore voters. Bush's regression coefficient is over 1, suggesting Sharpless improved on the Republican performance, perhaps by getting extra voters in Republican areas who did not vote for President (or who voted for a conservative third party, since the total number of Congressional voters was a bit less than the total number of Presidential voters). Gore's regression coefficient is about 0.14, suggesting perhaps that about 14% of Gore voters went to Sharpless across precincts. Bush's beta coefficient is the largest, but Gore's is substantial as well.
For Baldwin, Gore is the only major-party candidate with a positive regression coefficient, and it is about 0.86, consistent with her losing roughly 14% of Democratic presidential voters. Bush has a negative regression coefficient. InMadison and Nader, as you might expect, are positive for Baldwin and negative for Sharpless. In particular, Nader's coefficient for Baldwin is about 1, perhaps suggesting that most Nader voters also voted for Baldwin. Nader's beta coefficient is fairly large here.
Statewide Scenario:
We can easily plug these formulas into every precinct in the state (the InMadison term is the only reason we can't simply plug in the statewide totals). The formulas predict that Baldwin would have received about 996,798 votes to Sharpless' 1,537,785 votes had they both run statewide. That's about a 61/39 loss--a landslide, as you'd expect, of course, from her narrow win in a D+7 district.
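The percentage split follows directly from the two predicted totals. Here is a quick Python check, with the totals rounded to whole votes and a helper name of my own choosing:

```python
def two_way_shares(votes_a, votes_b):
    """Each candidate's percentage of the combined two-candidate vote."""
    total = votes_a + votes_b
    return 100.0 * votes_a / total, 100.0 * votes_b / total

# Predicted statewide totals from the formulas, rounded to whole votes.
baldwin_votes = 996_798
sharpless_votes = 1_537_785

baldwin_share, sharpless_share = two_way_shares(baldwin_votes, sharpless_votes)
print(f"Baldwin {baldwin_share:.1f}%, Sharpless {sharpless_share:.1f}%")
```

That works out to roughly 39.3% for Baldwin and 60.7% for Sharpless, a gap of a little over 21 points.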
Wrap-Up:
This is basically a story of a null hypothesis turning out to be right. It seems like a consistent 14% or so of Gore voters voted for Sharpless instead of Baldwin, and Sharpless got a few extra Republicans who didn't vote for President as well. Baldwin seems to have done well with Nader voters. She also perhaps got a statistically significant but small boost in her home city of Madison. Other than that, the patterns hold well across almost 400 precincts, suggesting that there's not a lot of room for regional variations.
I had thought that perhaps Baldwin had some other sources of regional strength where she might have outperformed Gore, even if Gore outperformed her overall. But she didn't really outperform Gore anywhere except insofar as she did well with Nader voters. There was a precinct where she got 1 fewer vote than Gore and Sharpless got 2 fewer than Bush.
And there was one tiny precinct, Madison's Ward 83, where Bush/Gore was 8-1 and Sharpless/Baldwin was 7-2. This strongly suggests that there was one voter in Wisconsin who wanted George W. Bush to be their President and Tammy Baldwin to be their Congresswoman. But that might have been about it.
I'm not sure what conclusions, if any, we can draw from this for the 2012 Senate election. That election, of course, will coincide with the Presidential election, and as we see above Baldwin did seem to consistently lose quite a few Democratic Presidential voters. If Obama carries Wisconsin in 2012 by a weaker margin than Gore did in 2000, Baldwin will then have that much less room for error there.
Something in the data, and admittedly in the news reports, makes me think this was as much about Sharpless being a strong candidate as it was about Baldwin being a weak one. Perhaps by now she's built up regional strength and organizations in and out of her district. Hopefully, she's studied the campaign well and won't face another Sharpless. (The news articles are worth looking up, but I'm not sure how much I can quote/link to them.)
There are also some interesting (albeit slightly weaker) patterns in the percentage results by precinct, and taken together the two sets of patterns may suggest something unusual related to Madison's occasionally-commented-on combination of high turnout and partisanship. But that's for another day, I think.
I am indebted to the following explanation of (and code for) regression analysis in R, including the beta coefficients, which I hadn't heard of before:
http://www.gardenersown.co.uk/...
Percentage calculations (update):
twohundertseventy says that I should use percentages instead of absolute numbers to avoid the influence of varying precinct size. That was my initial tack, as it happens.
Here is Baldwin's percentage as explained by Gore's percentage, Nader's percentage, and the InMadison dummy variable:
Call:
lm(formula = BaldwinPer ~ GorePer + NaderPer + InMadison)
Residuals:
       Min        1Q    Median        3Q       Max
 -0.201429 -0.022574  0.000886  0.023580  0.132608
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.050262   0.009962  -5.046 6.97e-07
GorePer      0.851790   0.019588  43.485  < 2e-16
NaderPer     1.006267   0.069913  14.393  < 2e-16
InMadison    0.045233   0.005517   8.199 3.62e-15
---
[...]
Residual standard error: 0.03841 on 386 degrees of freedom
Multiple R-squared: 0.912, Adjusted R-squared: 0.9113
F-statistic: 1334 on 3 and 386 DF, p-value: < 2.2e-16
Beta coefficients:
GorePer 0.6859836
NaderPer 0.1459599
InMadison 0.08006726
So the story's not that different--Baldwin got about 85% of Gore's share of the electorate, plus basically all of Nader's share, plus a boost of about 4.5 points in Madison precincts. The fit is noticeably worse, but a 91.2% r-squared is still quite high, I think, and continues to suggest a relative lack of regional variation. (There's not much point in doing the same for Sharpless, since he and Baldwin got all the Congressional votes except for a bit of "scattering".)
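As a rough illustration of what the percentage model implies, the Python sketch below plugs the district-wide Gore and Nader shares quoted earlier (57.96% and 4.97%) into the fitted equation. Two assumptions are mine: that shares enter the model as fractions between 0 and 1, which the scale of the residuals suggests, and that it is meaningful to feed district-wide averages into a model fitted on unweighted precincts. The second is only approximately true.

```python
def predict_baldwin_share(gore_share, nader_share, in_madison):
    """Predicted Baldwin share (fraction of the vote) from the percentage model."""
    return (-0.050262 + 0.851790 * gore_share
            + 1.006267 * nader_share + 0.045233 * in_madison)

# District-wide presidential shares from earlier in the post, as fractions.
gore_share, nader_share = 0.5796, 0.0497

outside = predict_baldwin_share(gore_share, nader_share, 0)
inside = predict_baldwin_share(gore_share, nader_share, 1)

print(f"Outside Madison: {100 * outside:.1f}%")
print(f"Inside Madison:  {100 * inside:.1f}%")
```

The two predictions, about 49.3% and 53.9%, bracket Baldwin's actual 51.36%, which is at least consistent with a district made up of both Madison and non-Madison precincts.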