[catastrophic technical glitches in original diary led to rapid deletion - hoping this re-post functions properly]
This is an exit poll story, about trying to get to the bottom of why the early exit polls in the 2004 US presidential election over-estimated Kerry's share of the vote. The overestimate was especially painful because, unlike an error of comparable magnitude in 1992, when the exit polls also over-stated the Democratic vote, the difference in 2004 was the difference between defeat and victory. It's a geeky story, because it's about a tiny mathematical piece of the puzzle - but a piece that may be crucial to interpreting the data. And it's a blog story, because that tiny piece emerged from the collective networked wetware that is the blogosphere.
Read on....
In November 2004, armies of lawyers were on standby in Florida in anticipation of a repeat of the protracted vote-count and legal battle that had eventually handed the presidency to Bush in 2000. In the event, Ohio turned out to be the state under the spotlight. There, the inordinate length of the queues to vote, allegations of machine shortages in strongly Democratic precincts, and a general air of distrust of touch-screen paper-less voting machines, supplied by a manufacturer whose chief executive, Walden O'Dell, had famously stated his commitment to "helping Ohio deliver its electoral votes to the president next year", fuelled widespread speculation that the exit poll discrepancy was due not to errors in the exit polls but to corruption of the vote count.
So which was it?
Answering this question is complicated. There is no doubt in anyone's mind that the early exit poll projections of Kerry's margin differed significantly from the result, which means that those projections were well outside their "margin of error". In other words, something other than chance caused the discrepancy. Unfortunately, the sheer size of the discrepancy doesn't tell you which number is wrong.
In January 2005, Edison-Mitofsky (E-M), the pollsters responsible for the exit-polls, issued a public evaluation of their exit-polling "system". In it they concluded that:
"While we cannot measure the completion rate by Democratic and Republican voters, hypothetical completion rates of 56% among Kerry voters and 50% among Bush voters overall would account for the entire Within Precinct Error that we observed in 2004."
In support of this claim, they noted that the overestimate of the Kerry vote was greater wherever factors were present that were likely to make selection of voters for interview less than random. In other words, E-M's conclusion is that the Great 2004 Exit Poll discrepancy resulted from Bush voters being more reluctant than Kerry voters to be interviewed, exacerbated where their voter sampling protocol left greater opportunities for Bush voters to escape unpolled.
So how did E-M come to this conclusion?
Exit poll methodology in US presidential elections normally involves two levels of sampling, both of which are vulnerable to error. At one level, a selection of precincts is sampled from each state, which, it is hoped, will be representative of the totality of precincts within that state. How representative the sample actually was can easily be checked after the election, simply by comparing the estimate derived from the actual results in the sampled precincts with the actual state results. However, a second level of sampling occurs within each of the sampled precincts, the sampling of the voters themselves. As this is done at the precincts on election day, and involves human interactions, it is potentially more vulnerable to bias.
In their evaluation, Edison-Mitofsky conclude that their precinct sampling was fine, if anything slightly over-estimating Bush's share of the vote. It was at the level of voter sampling within the precincts themselves that error occurred. This source of error is referred to as the Within-Precinct Error (WPE), and represents the difference between the predicted and the actual percentage margin between the candidates for that precinct. The average WPE in 2004 overestimated Kerry's margin over Bush by a full 6.5 percentage points.
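To make the definition concrete, here is a minimal sketch of a WPE calculation for a single precinct. The function names and the example percentages are hypothetical, and the sign is chosen to match the E-M convention described in this diary (a negative WPE means the poll over-estimated the Democratic candidate); the exact formula in the E-M report may be stated differently.

```python
def margin(dem_pct: float, rep_pct: float) -> float:
    """Democratic lead over the Republican, in percentage points."""
    return dem_pct - rep_pct


def wpe(poll_dem: float, poll_rep: float,
        count_dem: float, count_rep: float) -> float:
    """Within-Precinct Error: counted margin minus polled margin.

    Negative when the exit poll over-estimated the Democratic candidate.
    """
    return margin(count_dem, count_rep) - margin(poll_dem, poll_rep)


# Hypothetical precinct: the poll says 53-47 for the Democrat,
# but the count comes out dead level at 50-50.
example = wpe(poll_dem=53.0, poll_rep=47.0, count_dem=50.0, count_rep=50.0)
print(example)  # -6.0
```

Note that an error of 3 points in each candidate's share shows up as a 6-point error in the margin, which is why a WPE figure should not be read as an error in vote share.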
Two hypotheses have been offered. One is the E-M hypothesis, which has been dubbed the "shy Bush voter" or "reluctant Bush responder" hypothesis. An alternative hypothesis is the politely termed "vote-count corruption" hypothesis, which states that a greater proportion of Kerry votes than Bush votes went uncounted, and/or that extra Bush votes were somehow added to the total. This second hypothesis is at the heart of allegations that the exit polls in 2004 are evidence of election fraud.
In March 2005, a group of academics working for the US Counts Votes (USCV) National Election Data Archive Project posted a paper entitled Analysis of the 2004 Presidential Election Exit Poll Discrepancies. A number of diaries about the paper appeared on Daily Kos, including a recommended diary by Jpol. USCV analyse the data given in the E-M evaluation, and conclude that because total response rates are not significantly lower in strongly Republican precincts than in strongly Democratic precincts, the "reluctant Bush responder" hypothesis is not supported. They then turn to what they consider a plausible alternative hypothesis entitled: "reluctant Bush responder in mixed political company". In essence, this second hypothesis is that Bush voters may have been more reluctant to respond in precincts in which voters for the Democratic candidate were in the majority.
They test this hypothesis by reference to a table in the E-M report that gives the mean and median WPEs for precincts with different degrees of partisanship as indicated by the final vote count. Five categories are given: highly Democratic precincts; moderately Democratic precincts; even precincts; moderately Republican precincts; and highly Republican precincts. The relevant table is reproduced below:
Mean and median WPE, mean absolute WPE and number of precincts (N) by precinct partisanship, from page 36 of the Edison/Mitofsky report.
In the convention used by E-M, a negative WPE indicates an over-estimate of the proportion of votes for the Democratic candidate, and a positive WPE indicates an over-estimate of the proportion of votes for the Republican candidate. From the table it can be seen that the most negative mean and median WPEs are in the "high Republican" category of precincts. Thus, contrary to the "reluctant Bush responder in mixed political company" hypothesis, the greatest overestimates of the Kerry vote appear to have occurred in highly Republican precincts.
From this table, USCV calculate the relative response rates of the two groups of voters, and conclude that in order to satisfy the "reluctant Bush responder" hypothesis, implausible patterns of non-response rates have to be inferred. The USCV authors claim that the E-M data are more consistent with a "Bush strongholds have more vote-count corruption" hypothesis.
Houston, we have a problem.
Figuring out the nature of the problem is what has been happening in the blogosphere over the past few days.
I had been aware for some time that the WPE was a "confounded variable". Suppose that, in a particular precinct, something was wrong with either the sampling of the voters or the counting of their votes, so that the proportion of Kerry voters polled was greater than the proportion of Bush voters polled (or, alternatively, the proportion of Kerry votes uncounted was greater than the proportion of Bush votes uncounted, or even the proportion of fictitious extra Kerry votes was less than the proportion of fictitious extra Bush votes). The effect on the WPE would then be greater if voters in that precinct were evenly divided than if the precinct was highly partisan. I therefore devised an algebraic formula (what RonK calls my "fancy function") that would enable the "true" sampling bias to be retrieved from the partisanship and WPE data. I posted some interim conclusions in a diary on Daily Kos.
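The confound can be sketched in a few lines, under an assumption that is mine rather than spelled out in this diary: the bias is modelled as a ratio, alpha, between the rate at which Kerry voters and Bush voters end up in the poll. If k is Kerry's share of the counted vote, the expected polled share is alpha*k / (alpha*k + 1 - k). The names poll_share, wpe and bias_index are hypothetical, and the log of the recovered ratio is just one natural function that undoes the confound; it is not necessarily the exact "fancy function".

```python
import math


def poll_share(k: float, alpha: float) -> float:
    """Expected Kerry share among respondents when Kerry voters respond
    alpha times as readily as Bush voters (assumed bias model)."""
    return alpha * k / (alpha * k + (1.0 - k))


def wpe(k: float, alpha: float) -> float:
    """WPE in percentage points: counted margin minus polled margin,
    so a negative value means the poll over-estimated Kerry."""
    k_hat = poll_share(k, alpha)
    return 100.0 * ((2 * k - 1) - (2 * k_hat - 1))


def bias_index(k: float, k_hat: float) -> float:
    """Log of the response-rate ratio implied by counted share k and
    polled share k_hat (a hypothetical stand-in for the fancy function)."""
    return math.log((k_hat / (1.0 - k_hat)) / (k / (1.0 - k)))


# Illustrative value only: the ratio of the hypothetical completion
# rates (56% and 50%) quoted from the E-M evaluation.
ALPHA = 56 / 50

for k in (0.1, 0.3, 0.5, 0.7, 0.9):
    k_hat = poll_share(k, ALPHA)
    print(f"Kerry share {k:.0%}: WPE {wpe(k, ALPHA):6.2f}, "
          f"bias index {bias_index(k, k_hat):.3f}")
```

With the same alpha in every precinct, the WPE is largest in magnitude near an even split and shrinks towards zero at the partisan extremes, while the bias index comes out the same (ln 1.12, about 0.113) at every level of partisanship.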
I was aware, but not fully aware, that my fancy function was something of a fudge. We do not have the partisanship data for each precinct sampled, nor do we have the WPEs. All we have is the state results from the count, and the mean WPEs for each state. Similarly, in the table above, we only have mean and median WPEs for precincts in broad bands of partisanship. However, using my formula on those values seemed to confirm the inference drawn by USCV, who in fact updated their report (using their own-brand fancy function), kindly acknowledging me for drawing attention to the possible confound, and reporting that after correcting for the problem the inferences remained valid.
I thought I was on the right lines, and Mark Blumenthal, who linked from a piece on his Mystery Pollster blog to my diary, thought so too. In his piece, he made a similar, though not identical, point regarding the pattern in the E-M table, which he suggested was an artefact arising from sampling error. This was very perceptive, although, as it turns out, wrong. Enter a couple more people from cyberspace, DKos's DemFromCT, and Rick Brady, from Stones Cry Out, both of whom commented on my DKos piece. Other Kossacks, including RonK, commented on both blogs. We started exchanging emails. Rick was sure that sampling error couldn't produce Mark's artefact. I was running simulations that were producing it, but I wasn't sure why, or to what extent it was related to my confound. Mark became convinced that the artefact was not due to sampling error and issued a "mea culpa" update on his blog piece, but invited people to wait for further developments.... Meanwhile the algebra was getting more and more hairy, and our kitchen table was lying inch deep in back-of-the-envelope formulae, at least half of them in my husband's spidery writing, despite his continual nagging at me to forget exit polls and get on with my dissertation.
The problem essentially was that the WPE is a function of two variables - the partisanship of the precinct and the amount of bias in the poll (or in the vote-count), and trying to visualise what happens to a third variable when you manipulate variables one and two can be tricky. So being trained as an architect, I did what architects do when they want to visualise things in three dimensions, and I made a model.
A computational model is like any other kind of model, and it lets you peer at things from unusual angles. What I wanted to know was, essentially this:
If some characteristic of a precinct means that you will tend to poll a bigger percentage of one kind of voter than another, what will that variance in sampling bias do to the WPEs for different levels of precinct partisanship? The mistake that Mark had made, and which Rick first spotted, is that this is a different question from the question "what effect will sampling error have on the WPE for precincts of different levels of partisanship?" Sampling error is what you get if you randomly sample a percentage of all voters, and sometimes, by chance, get a few extra Democrats, and sometimes, by chance, get a few extra Republicans. It is analogous to tossing a coin 100 times and getting roughly 50 heads and 50 tails each time. Sometimes you'll get a few extra heads, sometimes a few extra tails, but over time, if you've got nothing better to do, the extra heads will be balanced by the extra tails, and your average score will be 50 heads. When you didn't get 50, what you got was sampling error.
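For anyone who really does have nothing better to do, the coin-tossing picture of sampling error can be checked with a short simulation. This is only an illustrative sketch; the seed, the number of repetitions and the function name are arbitrary choices of mine.

```python
import random
import statistics


def heads_in_tosses(n_tosses: int, rng: random.Random) -> int:
    """Count heads in n_tosses tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


rng = random.Random(2004)  # fixed seed so the sketch is repeatable
runs = [heads_in_tosses(100, rng) for _ in range(10_000)]

# Individual runs scatter around 50, but the scatter has no preferred
# direction, so the long-run average settles very close to 50.
print("mean heads per 100 tosses:", statistics.mean(runs))
print("smallest and largest run:", min(runs), max(runs))
```

The spread between the smallest and largest run is the sampling error; the fact that the mean sits at about 50 is what distinguishes it from bias.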
But what I am talking about here is analogous to tossing loaded dice. Imagine you have a set of dice, and paint each of them black on some sides and white on others. You give some friends one each and get them to throw them 100 times, again several times over. Each time they have to record how many blacks they get and how many whites. If the dice are honest dice, the friends who have a die with four black faces will tend to throw a black four times out of six (67% black). Those whose dice have one black face will tend to throw a black only one time out of six (17% black). And in fact, your friends could figure out how many black faces there were on their die by noting what percentage of blacks they threw (slow, prone to sampling error, but given enough throws, it would work). However, if you had loaded the dice, you could seriously mislead them. If you had a die with three black faces and three white faces, but loaded it so that it landed showing a black more often than it showed a white, you could trick your friend into thinking that the die had more than three black faces.
Now, imagine two scenarios: in one, you were completely random in the way you loaded the dice. Some are lightly loaded, some not at all, some favour the white sides, and some favour the blacks. In a second scenario you load them so that more of the dice load for black than load for white. These are the scenarios I modelled.
Instead of dice I modelled precincts, and instead of varying the number of black faces on a die, I varied the percentage of voters in each precinct who voted for Kerry or Bush. I took a million of each kind of precinct. For scenario one, I randomly loaded the dice - some were scarcely loaded at all, some were loaded for Bush and some were loaded for Kerry. You could imagine perhaps, that, in some precincts, the interviewer appealed to Bush voters more than Kerry voters, so more Bush voters ended up being interviewed. In others, the interviewer seemed more friendly to Kerry voters. (Or to be even-handed with my assumptions, imagine that in some precincts the vote tabulators deleted Kerry votes, and in others they deleted Bush votes - the math is the same). But the point is that in this scenario the bias in each precinct could go either way.
In the second scenario, I loaded more of the dice in favour of Kerry. In some precincts, the "interviewer" was still more attractive to Bush voters, in some the interviewer had no effect. But the average "interviewer" effect was in favour of Kerry voters.
What I wanted to know was what effect this would have on precincts with different percentages of real votes for each candidate. In other words, for a given level of loading, and a given level of support for a candidate, what would be the effect on the guesstimate of the vote?
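A rough sketch of this kind of model is given below. It is not the code behind the figures: the distribution of the "loading" (the log of a Kerry-to-Bush response ratio, drawn from a normal distribution), its spread, the size of the net bias and the number of precincts per category are all assumptions of mine, and the sketch uses expected poll shares, so it deliberately contains no sampling error at all. The bias index here is the log of the response ratio recovered from the polled and counted shares, a hypothetical stand-in for the fancy function.

```python
import math
import random
import statistics


def simulate(mean_log_alpha: float, sd_log_alpha: float,
             n_precincts: int = 20_000, seed: int = 1):
    """Return {kerry_pct: (mean WPE, median WPE, mean bias index)}."""
    rng = random.Random(seed)
    results = {}
    for pct in range(10, 100, 10):          # 10%, 20%, ..., 90% Kerry
        k = pct / 100
        wpes, indices = [], []
        for _ in range(n_precincts):
            # Each precinct gets its own loading of the dice.
            alpha = math.exp(rng.gauss(mean_log_alpha, sd_log_alpha))
            k_hat = alpha * k / (alpha * k + (1 - k))   # polled Kerry share
            # WPE: counted margin minus polled margin, in points.
            wpes.append(100 * ((2 * k - 1) - (2 * k_hat - 1)))
            # Bias index: log response ratio recovered from k and k_hat.
            indices.append(math.log((k_hat / (1 - k_hat)) / (k / (1 - k))))
        results[pct] = (statistics.mean(wpes),
                        statistics.median(wpes),
                        statistics.mean(indices))
    return results


# Scenario 1: dice loaded at random, no net bias.
no_net_bias = simulate(mean_log_alpha=0.0, sd_log_alpha=0.5)
# Scenario 2: on average the loading favours Kerry voters.
net_kerry_bias = simulate(mean_log_alpha=0.15, sd_log_alpha=0.5)

for label, res in (("no net bias", no_net_bias),
                   ("net Kerry bias", net_kerry_bias)):
    print(label)
    for pct, (mean_wpe, median_wpe, index) in res.items():
        print(f"  Kerry {pct}%: mean WPE {mean_wpe:6.2f}, "
              f"median WPE {median_wpe:6.2f}, bias index {index:5.2f}")
```

Under these assumptions the bias index stays flat across the nine categories in both scenarios, while the mean and median WPE change with partisanship even though the loading is drawn from the same distribution everywhere.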
Figure 1 shows my model output for the first scenario.
Figure 1
Model output for "no net bias" conditions. Kerry's share of the vote is plotted on the x axis (with "high Republican" precincts on the left, where Kerry's vote share is small, and "high Democrat" precincts on the right, where Kerry's vote share is large). The Y axis represents the magnitude of the bias as determined by Mean WPE (white line); median WPE (red line) and bias index (green line). Negative WPEs and positive bias indices represent an over-estimate of the Kerry vote.
Along the bottom of the plot is Kerry's share of the counted vote, ranging from highly Republican precincts on the left (only 10% Kerry) to highly Democratic precincts on the right (90% Kerry). The white line represents the average WPE for each category of precinct. You can easily see that where the support for each candidate is even, the WPE is zero. Even though some of the precincts are very biased, some in Bush's favour, some in Kerry's, the average mis-estimate (WPE) is zero. However, where Bush's support is strong, the WPE goes negative, indicating an over-estimate of Kerry's vote. And where Kerry's vote is strong, the WPE goes positive, indicating an over-estimate of Bush's vote. Remember, in each of these precincts, even the extremely partisan precincts, the bias is as likely to be in Bush's favour as Kerry's - the dice are randomly loaded. But the effect on strongly Republican precincts is very different from the effect on strongly Democratic precincts.
The red line represents the median WPE for each category of precinct. The median is unaffected by the partisanship. From this, math geeks can deduce that the distribution of WPEs is skewed. In English, this means that as many precincts over-estimated a particular candidate's vote as underestimated it, but that in highly partisan precincts, the over-estimates of the minority candidate's vote tended to be much larger than the under-estimates of it.
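For the math geeks, here is one way to see why the median stays put while the mean drifts. It is a sketch under assumptions of my own choosing, not the derivation in the geeky paper: the bias in a precinct acts as a ratio alpha between the response rates of Kerry and Bush voters, and ln(alpha) is distributed symmetrically about zero across precincts. The symbols k (Kerry's counted share), k-hat (his polled share), epsilon and sigma are introduced here purely for illustration.

```latex
\begin{align*}
\hat{k} &= \frac{\alpha k}{\alpha k + (1-k)}
         = \sigma\bigl(\operatorname{logit} k + \varepsilon\bigr),
  \qquad \varepsilon = \ln\alpha, \quad
  \sigma(x) = \frac{1}{1+e^{-x}}, \quad
  \operatorname{logit} k = \ln\frac{k}{1-k} \\[4pt]
\mathrm{WPE} &= (2k-1) - (2\hat{k}-1) = 2\,(k-\hat{k}) \\[4pt]
\operatorname{median}(\hat{k})
  &= \sigma\bigl(\operatorname{logit} k + \operatorname{median}(\varepsilon)\bigr)
   = \sigma(\operatorname{logit} k) = k
  \;\Longrightarrow\; \operatorname{median}(\mathrm{WPE}) = 0 \\[4pt]
\tfrac12\bigl[\sigma(x+\varepsilon) + \sigma(x-\varepsilon)\bigr] &> \sigma(x)
  \quad \text{for } x < 0,\ \varepsilon \neq 0
  \;\Longrightarrow\; E[\hat{k}] > k
  \;\Longrightarrow\; E[\mathrm{WPE}] < 0 \quad \text{when } k < \tfrac12
\end{align*}
```

The median line follows because the polled share is a monotonic function of epsilon, so the median of one maps straight onto the median of the other. The mean line follows from pairing each loading +epsilon with its mirror image -epsilon: below an even split the upward error is always bigger than the downward one. By the mirror-image argument the mean WPE is positive when k is above one half.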
Computational modeling is fun.
The green line shows the bias estimate made by my fancy function. To my intense pleasure, it shows that the net bias is zero (which it is) and that the estimate is unaffected by precinct partisanship.
The model output for scenario 2 is illustrated below. In this scenario I jacked up the "response rate" of the "Kerry voters" relative to the "Bush voters". There were still plenty of precincts where the net bias was zero, and even plenty where the net bias was in "Bush's" favour (there are 9 million of them after all) but the average "bias" is towards Kerry voters, so that the average WPE is no longer zero, but negative, indicating a net over-estimate of the Kerry vote, as in 2004.
Figure 2
Model output for "Net Kerry bias" condition. The axes and legends are as for Figure 1.
As you can see, this plot looks very different. Again the white line represents the mean WPE for each category. This time, all but the most strongly Democratic precincts have a negative WPE, and the WPEs are most negative for precincts with 30% Democratic support. However, the red line representing the medians in this plot is also curved. It only touches the white line once, at around the 45% mark, so it is only here that the distribution of the WPEs is symmetrical, and both are strongly negative. In the highly Republican "precincts" as in the E-M table, both mean and median are negative, the median being less negative than the mean. For more Democratic precincts, the distribution is skewed in the opposite direction - the mean being closer to zero than the median.
Again the green line represents my fancy function, which for abstruse mathematical reasons has the opposite sign to the WPE - a positive value indicates an over-estimate of the Democratic vote. Once again to my intense pleasure, the green line correctly indicates an overestimate of the Kerry vote, and also correctly indicates that the bias it represents was not different in precincts of different levels of partisanship, whatever misleading tale is being told by the mean and median WPEs.
So what does all this mean?
Firstly, it means that the WPE is a terrible way to assess where either polling bias or electoral fraud occurred, as its value depends on the very variable we want to examine - the partisanship of the precinct or the state. The only way of finding out whether differential non-response rates (aka "shy Bush voters") are responsible for the WPE is to use a measure of bias that is not confounded by precinct partisanship, such as that given by my "fancy function", and to find out how far variance in that measure is correlated with the degree to which the random sampling protocol was compromised (which, fortunately for us, tends to exacerbate any "shy voter" tendencies and thus brings them statistically out of the woodwork). I hope that the people at Edison-Mitofsky will read this great Blogospheric effort, do the math, and tell us the answer. If a large proportion of variance in "bias" remains unexplained, or if there are suspicious looking outliers in the data, then maybe the "vote-corruption" hypothesis will retain some legs (though they are looking very wobbly to my eyes right now).
Secondly, the really unexpected (to me) finding from this computational experiment is that even where the net signed WPE is zero, it cannot be inferred that no bias occurred, as randomly distributed precinct-specific biases may nonetheless favour one candidate as frequently and as greatly as the other. So the mere fact that a polling firm gets the right answer in future will not tell us whether or not bias (or even fraud) occurred. However, bias (and fraud) will leave a tell-tale fingerprint when WPEs are plotted against precinct partisanship. If the mean and/or median WPEs start wandering off into that S shape, we will know that someone's cheating. Unfortunately we won't know whether the fingerprints belong to a pollster who looks more appealing to one kind of voter than the other, or to a hacker of the vote. Both look the same.
Finally, the results of this experiment suggest that the pattern of mean and median WPEs by precinct partisanship reported by E-M may not be, as USCV claim, an indication of bias concentrated in Bush strongholds, but rather the pattern that might be expected if over-polling of Kerry voters (or under-polling of Bush voters) was fairly uniform and widespread. In Figure 3, below, I've plotted the Edison-Mitofsky data from the table above. If you compare Figure 2 with Figure 3, there is a bit of a family likeness, particularly in the way the median diverges from the mean in the high Republican category.
Figure 3
EM median and mean WPE values for five categories of precinct. Trend-lines are best quadratic fit.
If so, then it would seem that the conclusion drawn in the USCV report, that the pattern observed requires "implausible" patterns of non-response, and thus leaves the "Bush strongholds have more vote-count corruption" hypothesis as "more consistent with the data", is not justified. The pattern instead is consistent with the E-M hypothesis of widespread "reluctant Bush responders" - whether or not the "political company" was "mixed".
True geeks can download a paper giving all the hairy equations by clicking here
Cross posted on New European Times
Update 27th April 2005: the link to the geeky paper now goes to an updated version; the old one is archived here