Daily Kos has seen several diaries, such as Jen Hayden's, that describe allegations of suspicious statistical patterns in election results from Kansas and elsewhere. Generally these diaries call for better election verification, and I agree with that call. Does it really matter whether the statistical arguments make sense?
In one sense, not at all. If a voting system isn't verifiable, knowing that it might be working isn't a great comfort. However, if we wonder whether there is really alarming evidence of massive fraud, then we have to weigh the evidence. The main line of argument is based on work done by Francois Choquette and James Johnson back in 2012. Various Kossacks have offered their critiques of the work in comments, but I don't think I've seen a standalone critique. So I decided to write one: short, with pictures, but I hope a fair introduction.
tl;dr version: this work hasn't caught on (except, apparently, recently in Kansas) because of its faulty assumptions and bizarre implications.
1. Choquette and Johnson's "Perfect Example of the Alleged Election Fraud"
Here is a snippet from Figure 5 in Choquette and Johnson.1 Most of their analyses look basically like this figure, so once you take some time to understand it, you're almost done.
2012 Iowa Republican caucuses
What's that? The green line represents Mitt Romney's cumulative vote share, where "cumulative" means that the votes are being added sequentially from the smallest precincts to the largest. For instance, in the 500 or so small precincts that contain 10,000 votes, Romney's vote share was about 0.18, or 18%. Eventually, as larger precincts are included in the totals, Romney's share begins to increase. By the time all the precincts are included (over 1700 precincts containing over 120,000 votes), Romney has 24.5% of the vote, practically tied with Rick Santorum (the blue line). Ron Paul, in red, starts strong in the small precincts, but fades to third.
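For readers who want to reproduce this kind of curve, here is a minimal Python sketch of a cumulative vote share calculation. It is not C&J's code: the function name `cumulative_share` and the `(total_votes, candidate_votes)` input layout are assumptions made for illustration, and the toy numbers are made up rather than taken from Iowa.

```python
from itertools import accumulate


def cumulative_share(precincts):
    """Return (cumulative_total, cumulative_share) points.

    precincts: list of (total_votes, candidate_votes) tuples (assumed layout).
    Precincts are added from fewest total votes to most, as in the figure.
    """
    ordered = sorted(precincts, key=lambda p: p[0])
    totals = list(accumulate(p[0] for p in ordered))
    cand = list(accumulate(p[1] for p in ordered))
    # Skip any leading points where no votes have been counted yet.
    return [(t, c / t) for t, c in zip(totals, cand) if t > 0]


# Made-up toy data, NOT real results.
toy = [(10, 1), (200, 50), (50, 10), (20, 3)]
curve = cumulative_share(toy)
for total, share in curve:
    print(total, round(share, 3))
```

The last point on the curve is always the candidate's overall share; every earlier point reflects only the precincts smaller than that cutoff.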
Choquette and Johnson find this trend highly suspicious. They suggest that votes have been stolen for Romney in the larger precincts. Would you expect Iowa precincts with, say, 20 Republicans, or 70, to be just like precincts with hundreds of Republicans? I wouldn't. I actually see no reason to climb aboard this thought train in the first place — no reason to expect these lines to be flat.
But if small precincts were like large precincts, as C&J assume, then the results on the left side of the graph should be representative. Then Santorum should have won Iowa, and Romney should have come in third, over 7 points behind.2
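Footnote 2 describes C&J's estimate as the median vote share between 5% and 20% of cumulative votes cast. One possible reading of that procedure is sketched below in Python. The function name `baseline_and_gain`, the input layout, and the choice to take the median over cumulative-share points (one per precinct) are all assumptions; the paper may differ in its details. The toy data are invented.

```python
from itertools import accumulate
from statistics import median


def baseline_and_gain(precincts, lo=0.05, hi=0.20):
    """Estimate a 'small precinct' baseline share and the implied vote gain.

    precincts: list of (total_votes, candidate_votes) tuples (assumed layout).
    lo, hi: window bounds as fractions of all votes cast (from footnote 2).
    """
    ordered = sorted(precincts, key=lambda p: p[0])
    totals = list(accumulate(p[0] for p in ordered))
    cand = list(accumulate(p[1] for p in ordered))
    grand_total = totals[-1]
    # Cumulative shares at points inside the 5%-20% window.
    window = [c / t for t, c in zip(totals, cand)
              if lo <= t / grand_total <= hi]
    if not window:
        raise ValueError("no precincts fall inside the window")
    baseline = median(window)
    # Votes the candidate has beyond what the baseline share would predict.
    implied_gain = cand[-1] - baseline * grand_total
    return baseline, implied_gain


# Made-up toy data, NOT real results.
toy = [(10, 2), (10, 2), (20, 4), (60, 30)]
baseline, gain = baseline_and_gain(toy)
print(baseline, gain)
```

Note what the calculation takes for granted: any difference between the final share and the small-precinct baseline is counted as votes "gained," with no allowance for small and large precincts simply being different places.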
Santorum never led Romney in any poll, much less by 7 points. But let's reserve judgment and look at these data in another way. Here we will look just at Romney's votes as a share of votes in each precinct.
Each point represents the results from one Iowa precinct. (Some points overlap.) The blue line is a loess smoother similar to the trend lines you may have seen on HuffPo Pollster. We can see that, on average, Romney's vote share increases rather steadily as the number of votes cast increases, leveling out somewhat for precincts with more than a few hundred votes.
Remember, C&J's idea is that votes may have been stolen for Romney in the larger precincts, presumably because fraud there would be harder to notice or to prove. That story might be plausible in some of the datasets C&J examined. But here it makes very little sense. The steady increase in Romney's vote share begins with even the smallest precincts: for instance, Romney got 14% of the votes in precincts with 1-5 votes, but 17% in precincts with 6-10 votes. This doesn't look like someone committed hanky-panky in the larger precincts: it looks as if the more Republicans there were, the better the establishment candidate did.
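The comparison by precinct size can be sketched in a few lines of Python. This is an illustration under assumptions: the source does not say whether the 14% and 17% figures pool votes within each size bin or average precinct-level shares, so the sketch pools votes. The function name `share_by_size_bin` and the toy data are hypothetical.

```python
from collections import defaultdict


def share_by_size_bin(precincts, bin_width=5):
    """Pool votes within precinct-size bins (1-5, 6-10, ...).

    precincts: list of (total_votes, candidate_votes) tuples (assumed layout).
    Returns {(low, high): pooled candidate share} in ascending bin order.
    """
    total = defaultdict(int)
    cand = defaultdict(int)
    for votes, cand_votes in precincts:
        if votes <= 0:
            continue  # a precinct with no votes has no share to report
        low = ((votes - 1) // bin_width) * bin_width + 1
        key = (low, low + bin_width - 1)
        total[key] += votes
        cand[key] += cand_votes
    return {k: cand[k] / total[k] for k in sorted(total)}


# Made-up toy data, NOT real results.
toy = [(3, 0), (5, 1), (4, 1), (7, 1), (10, 2), (9, 2)]
bins = share_by_size_bin(toy)
for (low, high), share in bins.items():
    print(f"{low}-{high} votes: {share:.3f}")
```

If a candidate's share rises from one bin to the next starting with the very smallest bins, the trend is not confined to large precincts.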
So, why were C&J suspicious? I'm not sure. Partly because they found broadly the same pattern in almost every state — although the explanation I just gave tends to work in every state. Partly because, they say, they generally couldn't find similar patterns in elections that didn't involve Republicans — but they must not have tried very hard, because I looked at four 2008 Democratic primaries and found strong patterns in all of them. Here is perhaps the weirdest, if one is thinking of these patterns as fraudulent:
In the Minnesota caucus, Barack Obama beat Hillary Clinton by fewer than 10 points in small precincts (depending on how one defines "small"), yet won the state by over 30 points. If C&J had noticed this result, they might have construed it as evidence of massive fraud involving precincts of almost all sizes. But the Minnesota caucus was conducted on hand-counted paper ballots. And to my knowledge, participants and observers throughout the state agreed that Obama had handily defeated Clinton. So I think this result gives a pretty good indication that C&J's method just doesn't work.
(Although C&J didn't look at this caucus at all, some of their extrapolations for the 2012 Republican primaries are similarly... surprising. Would you believe that Newt Gingrich actually won Florida by 16 points, even though Romney led in sixteen consecutive polls, often by double digits? I wouldn't.)
One more example: C&J look at general election results from Ohio in 2008. They say their analysis shows Barack Obama "losing thousands of votes to John McCain through this anomaly.... Again, we need to emphasize that there is no reasonable explanation (other than Election Fraud) for such a nearly perfect linear relationship between precinct size and candidate success."
OK, look at that graph. What are they saying?
Did they really think that Obama beat McCain by 57% to 41% or so, but McCain stole votes all over the state to make it close? Were they aware that Ohio was a battleground state in 2008? Did they care?
If we had reason to believe that large precincts were randomly scattered around Ohio like raisins in a stochastic fruitcake, then it might make sense to ignore how a landslide-plus-fraud scenario contradicts our prior knowledge. But, not so much. For instance, in Cuyahoga County in 2008, 32% of all precincts were in the city of Cleveland, but only 6% of precincts with over 600 presidential votes were in Cleveland. When the precincts with the most votes tend to be in the suburbs, it isn't surprising that Republicans tend to do better there.
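To see why geography alone can produce a sloping line, consider a purely synthetic Python example. The numbers are invented for illustration and describe no real county: three smaller city precincts where a Democrat gets 70%, and three larger suburban precincts where the same candidate gets 45%. No votes are altered anywhere.

```python
from itertools import accumulate

# Synthetic (total_votes, dem_votes) pairs, invented for illustration only.
city = [(200, 140), (250, 175), (300, 210)]    # 70% Democratic
suburb = [(600, 270), (700, 315), (800, 360)]  # 45% Democratic

# Add precincts from smallest to largest, as in a cumulative share chart.
ordered = sorted(city + suburb, key=lambda p: p[0])
totals = list(accumulate(p[0] for p in ordered))
dem = list(accumulate(p[1] for p in ordered))
shares = [d / t for d, t in zip(dem, totals)]

for t, s in zip(totals, shares):
    print(t, round(s, 3))
```

The Democrat's cumulative share holds at 70% through the city precincts and then falls steadily as the suburban precincts are added. The slope reflects only where the large precincts happen to be.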
C&J's entire enterprise makes no sense. If we want to judge whether the results in some precincts (or set of precincts) are surprising and possibly erroneous, we need to have some plausible benchmark of what to expect there. The baseless assumption that precincts with few votes should be indistinguishable from precincts with many votes doesn't meet that standard.
In my previous diary, I showed how Beth Clarkson's adaptation of C&J's methods fails when applied to Kansas. No wonder. (Also note leviabowles's diary from earlier today, and his previous work.)
1 Francois Choquette and James Johnson, "2008/2012 Election Anomalies, Results, Analysis and Concerns," Version 1.5, September 2012, archived here
2 Specifically, Choquette and Johnson estimate "the number of votes that were gained or lost for each candidate" by examining the median vote share for each candidate between 5% and 20% of cumulative votes cast.