I'm not a professional statistician but I do use statistics in my research on a daily basis, and I have been looking at a recent paper that has appeared on the subject of the discrepancies between the exit polls and the counted votes.
Mystery pollster has done a far better job than I ever could at explaining issues relating to the exit poll discrepancies. However, the paper recently posted by Ron Baiman at Free Press has not yet been critiqued by Mystery Pollster, so I have had a look at it to see how it squares with the principles laid out by Mystery Pollster. Baiman kindly sent me the spreadsheet he was working from so I was able to see what he had done.
The first issue to check was the data used. There was some confusion shortly after the election about which were the relevant exit polls. As election day went on, the exit polls were adjusted in line with the real data available from precincts, making them useless for analysing discrepancies between how people thought they had voted and how they appeared to have voted (although more useful for predicting the results). Most recent analyses have used data gleaned from early screen shots from CNN. Baiman has used the screen shot data, which appears to be the best available data at present.
He shows, as other analysts have done, that the states where the exit polls deviated most strongly in Kerry's favour from the final tally were Florida, Pennsylvania, Ohio, South Carolina and New Hampshire. However, in contrast to Mystery Pollster, he claims that in Ohio (and also in South Carolina and New Hampshire) a discrepancy of this size has only a one-in-a-hundred probability of occurring by chance. Mystery Pollster claims that the Ohio discrepancy was well within the confidence limits of the exit poll.
I have therefore looked at Baiman's spreadsheet to try and find where his analysis differs from that of Mystery Pollster. It all seems to boil down to the value assumed for a phenomenon called the "design effect". The design effect is a source of error that needs to be allowed for, as the people polled in an exit poll are not randomly selected from throughout the state, but from a sample of precincts. As voters in a particular precinct are likely to share some demographic characteristics, there is likely to be less variance in the polled sample than in the state as a whole. The calculated variance therefore needs to be multiplied by some value in order to compensate for this "design effect".
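To make the mechanics concrete, here is a minimal sketch of how a design effect widens a poll's margin of error. The sample size and vote share are hypothetical placeholders, not figures from Baiman's spreadsheet, and I am assuming the design effect is quoted as a percentage increase in variance (so "30%" means multiplying the variance by 1.3):

```python
import math

def margin_of_error(p, n, deff_increase, z=1.96):
    """Approximate 95% margin of error for a polled proportion p from a
    sample of size n. The simple-random-sample variance p(1-p)/n is
    inflated by the design effect, expressed as a fractional increase
    (e.g. 0.3 for a "30%" design effect)."""
    variance = p * (1 - p) / n      # simple random sample variance
    variance *= 1 + deff_increase   # inflation from cluster sampling by precinct
    return z * math.sqrt(variance)

# Hypothetical state poll: 2,000 respondents, 52% for one candidate
for deff in (0.3, 0.6, 0.8):
    print(f"+{deff:.0%} design effect: MoE = {margin_of_error(0.52, 2000, deff):.4f}")
```

The larger the assumed design effect, the wider the confidence interval, and so the less surprising any given poll-versus-count discrepancy becomes.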
Baiman used a value of 30% for the "design effect", which he regarded as "conservative". However, Mystery Pollster cites Rick Brady, who contacted Warren Mitofsky of Mitofsky International and obtained values of between 50% and 80% for the design effect. Mystery Pollster, and also Nick Panagakis, whom he cites, have used values of either 60% or 80% for this effect in their calculations, rather than the 30% used by Baiman.
I have redone Baiman's calculations using values of both 60% and 80%, and sure enough, even with the lower of the two (60%), the probability of Ohio's results occurring by chance rises to 1 in 30, which is not improbable given the number of states.
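The sensitivity of that conclusion to the assumed design effect can be sketched with a normal approximation. The discrepancy, vote share and sample size below are illustrative placeholders, not the actual Ohio figures:

```python
import math
from statistics import NormalDist

def chance_probability(discrepancy, p, n, deff_increase):
    """Two-sided probability of a poll-vs-count discrepancy at least
    this large arising by chance, treating the polled share as
    approximately normal with design-effect-inflated variance."""
    se = math.sqrt(p * (1 - p) / n * (1 + deff_increase))
    z = abs(discrepancy) / se
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical: a 2.5-point discrepancy on a 2,000-person sample
for deff in (0.3, 0.6, 0.8):
    prob = chance_probability(0.025, 0.5, 2000, deff)
    print(f"+{deff:.0%} design effect: p = {prob:.3f} (roughly 1 in {round(1 / prob)})")
```

With these made-up numbers, moving the design effect from 30% to 60% roughly halves the apparent improbability of the discrepancy, which is the same qualitative shift that separates Baiman's conclusion from Mystery Pollster's.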
So it would appear that whether you regard Ohio, South Carolina and New Hampshire as being out of line or not is critically dependent on estimates of the "design effect", and it would seem sensible to use values advocated by the polling company.
However, what remains clear in Baiman's analysis, and also in Mystery Pollster's, is that the exit polls, country-wide, overestimated Kerry's share of the vote. Whichever way you analyse the results, this remains true, and is statistically significant.
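One simple way to see why a country-wide overestimate can be statistically significant even when no single state is badly out of line is a sign test: if the poll errors were unbiased, each state's discrepancy should favour either candidate with probability one half. The counts below are hypothetical, chosen only to illustrate the shape of the argument, not taken from either analysis:

```python
from math import comb

def sign_test_p(n_states, n_toward_kerry):
    """One-sided binomial sign test: probability of at least
    n_toward_kerry out of n_states discrepancies leaning the same
    way, if each direction were equally likely (p = 1/2)."""
    return sum(comb(n_states, k)
               for k in range(n_toward_kerry, n_states + 1)) / 2 ** n_states

# Hypothetical: 40 of 50 state exit polls deviate toward Kerry
print(f"p = {sign_test_p(50, 40):.1e}")
```

Many small, individually unremarkable deviations in the same direction add up to a result that is vanishingly unlikely under the hypothesis of unbiased polling error.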
The issue therefore becomes one of interpretation. Given that the exit polls did not predict the outcome, either the exit polls were systematically wrong, or the vote did not reflect the way people thought they had voted. Baiman makes a strong argument that the exit polls are likely to be accurate:
It is hard to imagine that a professional exit sampling firm with a decades-old reputation could make a systemic error of this magnitude. Indeed the National Election Pool and Edison/Mitofsky state that:
"The mistakes made during the 2000 elections were unusual. During the 10 years before that VNS and the poll before it made only one mistake from 1990 to 1998. Before that when the broadcast networks made their own projections there were similarly very few mistakes during the 1970s and 1980s. There were no mistakes during the limited coverage in 2002. There were no mistakes made during the 2004 Democratic primaries. Many lessons were learned from the 2000 experience and changes were made to see that mistakes like the ones in 2000 would be very unlikely to occur again."
And he concludes:
These unexplained statistical anomalies in the vote count in critical states, such as Ohio, Florida, and Pennsylvania, and in the national popular vote for the 2004 Presidential elections, indicate:
a) Implausibly erroneous exit sampling especially for the national sample and for
the most critical states where one would have expected pollsters to be most
careful, and/or
b) Election fraud and/or discriminatory voter suppression that resulted in
an election result in Ohio, Florida, and other states, and in the national
popular vote outcome, that is contrary to what would have occurred in a free
and fair election.
I conclude that, based on the best exit sample data currently available, neither the national popular vote nor many of the certified state election results are credible; they should not be regarded as a true reflection of the intent of the national electorate, or of many state voters, until a complete and thorough investigation of the possibilities a) and b) above is completed.
If Baiman is correct that the votes, not the polls, were in error, a disproportionate number of Kerry votes must have been lost, either deliberately or through some bias in the structure of electoral fallibility. Baiman makes an interesting argument that, as well as missing votes, voter suppression (i.e. votes not even cast) could lead to exit poll discrepancies, as exit poll samples are weighted according to past patterns of turnout.
Against Baiman's argument that it is the vote counts, rather than the exit polls, that are erroneous, I would argue that it is dangerous to generalize from one election to another regarding exit poll bias (or lack of it). In the UK (where I live), some years ago John Major (Tory) won an election despite exit poll predictions that he would lose, or at best fail to win an absolute majority. His party was very unpopular at the time, and was blamed for a catastrophic drop in property values. The widely held explanation for the exit poll discrepancy was what were termed "shy Tories" - people who had voted Tory out of a sense of insecurity but did not like to admit that they had voted for the party they also blamed for their misfortunes. I would argue, therefore, that it is at least possible that when people are insecure, they may be more likely to vote in a way they are not ready to admit to!
However, these are not statistical points. What the statistics appear to show is that, yes, the exit polls overstated Kerry's vote, but that no, Ohio was not particularly out of line. This suggests that either the Kerry vote was suppressed across a wide swathe of states, or that the exit polls were, for some reason yet to be addressed, wrong.