The outcome of the November election was a surprise, and U.S. elections have lots of security problems. So it’s not surprising that, once again, some folks are marinating in election fraud conspiracism rooted in those facts and a fair amount of nonsense. There’s a market for this, and so there are willing suppliers. As someone who spends a lot of time working on election integrity issues, I take it kind of personally when folks serve up bunk on the subject. So I’m pretty annoyed at a fellow named Mike Farb who has a website and Twitter feed full of it. Mostly I try to ignore him, but his name came up in comments on a recent rec list story, so let me briefly explain what is wrong with the North Carolina ‘research’ he is promoting. (I’m trying not to promote it myself, but note the link a few sentences down.)
tl;dr: Trump often does better in larger precincts because larger precincts often are more Republican. Go figure.
The argument
Soooo, that graph above is my replication of a graph posted by Mike Farb in this Twitter thread — one of several graphs that, according to Farb, are “very suspicious.” His explanation reads, to me, a bit like a parody of an AI program generating free verse technobabble:
First lets talk about how to read our graphs
We are using CVT Analysis
I call it the law of large numbers
Its based on a very simple theory
Numbers are expected to look more consistent
when looking at larger sample sizes
Its like flipping a coin.
Flip it only 10 times and you may get 7 heads
Flip it 500 times and you would expect closer to 50% heads
The same works for Elections
You would expect to see more consistent numbers
When you look at larger precincts.
It’s not about how the vote breaks down
It is about how it compares to different precinct sizes
Also if there was a hack you would expect to see it more
At the larger precincts
that makes CVT a perfect way to look at Election Results
If that makes no sense, no worries. (1) Apparently there’s a whole website devoted to it. (2) It really doesn’t make sense, on several levels. Most superficially, it often verges on gibberish. (E.g., maybe Mike Farb really does call CVT analysis “the law of large numbers,” but even if you don’t know the terms, you’ll suspect that they aren’t synonymous.) But, hey, anything that is so hard to explain on Twitter must be important, right? Alas, no.
CVT stands for “Cumulative Vote Tally.” (I explained this approach a bit more patiently in a story last year.) The basic idea is to sort the precincts from smallest to largest (fewest to most votes cast), and then calculate cumulative vote shares for each candidate, starting in the smallest precinct and progressively adding larger precincts. This graph shows that in Union County, North Carolina, Hillary Clinton dominated in the precincts with the fewest votes cast, but Donald Trump dominated in larger precincts.
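For anyone who wants to see the mechanics, the calculation can be sketched in a few lines of Python. This is a minimal illustration, not Farb's code or the code behind my graph: the function name `cumulative_vote_shares` and the three precincts are invented, and it uses two-party votes only as a simplifying assumption.

```python
def cumulative_vote_shares(precincts):
    """Compute a Cumulative Vote Tally series.

    precincts: list of (clinton_votes, trump_votes) tuples, one per precinct.
    Returns a list of (cumulative_votes, clinton_share, trump_share) rows,
    ordered from the smallest precinct to the largest.
    """
    # Sort precincts by votes cast, fewest first.
    ordered = sorted(precincts, key=lambda p: p[0] + p[1])
    rows = []
    clinton_total = 0
    trump_total = 0
    for clinton, trump in ordered:
        clinton_total += clinton
        trump_total += trump
        total = clinton_total + trump_total
        if total == 0:
            continue  # nothing counted yet, so no share to report
        rows.append((total, clinton_total / total, trump_total / total))
    return rows


# Hypothetical precincts, invented for illustration (not Union County data).
example = [(300, 900), (120, 80), (200, 400)]
for total, clinton_share, trump_share in cumulative_vote_shares(example):
    print(f"{total:5d} votes: Clinton {clinton_share:.1%}, Trump {trump_share:.1%}")
```

Each row answers the question "counting only the precincts up to this size, who is ahead?" Nothing in the arithmetic knows or cares why small and large precincts might differ.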
You may have noticed that the vote shares do get “more consistent” when looking at the “larger sample sizes” on the right side of the graph. (Of course most people wouldn’t call these “samples” at all; they certainly aren’t random.) But presumably Farb means that there is no good reason for Trump to do better in large precincts than small precincts, so it is highly suspicious that he did — not only in Union County, but in others.
The reality check
Actually, there is a screamingly obvious reason for Trump to do better in large precincts in Union County: they contain relatively more Republicans. Here’s another graph, with the precincts sorted in the same order, but showing the Democratic and Republican shares of two-party voter registration. (Libertarian and unaffiliated voters aren’t included.)
Hey, this looks kinda like that other graph...
The smallest precincts are, in fact, far more Democratic by registration than the larger precincts. Not only is it not suspicious that the “Cumulative Vote Tally” graph doesn’t rapidly flatten out, but given these registration figures, it would be shockingly weird if it did.
Now, conspiracism always has a fallback. (The ultimate fallback, often combined with other approaches, is to say that critics are trying to suppress discussion and discourage questioning, and/or to complain about their tone, disregarding substance altogether.) Here an obvious fallback is that the vote trends are more dramatic than the registration trends. Republicans have about a 23-point edge in two-party registration overall, but Trump won the county by closer to 36 points. Is this suspicious? Not really. For many reasons, party registration doesn’t perfectly predict vote shares. But once we take registration shares into account, the apparent relationship between “size” and vote shares evaporates. (1)
If we want to judge whether some results are suspicious, we need to have some plausible benchmark of what to expect. Just assuming that small precincts should be indistinguishable from large precincts doesn’t pass the giggle test. They aren’t.
OK, so what?
As I wrote last year, if a voting system isn't verifiable, knowing that it might be working isn't a great comfort. North Carolina’s voting systems actually are more verifiable than many states’, given that all votes are cast either on paper ballots or on Direct Recording Electronic systems with voter-verifiable paper records — and after every presidential election, random samples of these votes are counted by hand to audit the original counts. But in many places — including, e.g., Georgia and most of Pennsylvania — individual votes aren’t recorded on paper at all. In others, the votes are recorded on paper, but there are poor or no procedures for auditing or recounting the paper. That’s bad, and we should work to fix it.
Does pretending to find it suspicious that Trump did better in precincts with more Republicans help to fix anything? This I doubt. Not to say that Farb is pretending: I’m inclined to think that he actually didn’t notice, didn’t wonder, didn’t think to look. But that’s alarming. When someone launches a site called Unhack The Vote and solicits donations to find evidence of hacking — and then breathlessly reports purported evidence that is embarrassingly weak — how is that in any way a good thing? I can’t see it. It worries me, and it bugs the hell out of me.
---------
(1) That is, in OLS multiple regression controlling for the Democratic share of two-party registration, the relationship between number of votes cast and Clinton vote share actually is faintly positive but indistinguishable from 0: t = 0.45. Or, controlling for Democratic and unaffiliated shares of all registration, the relationship is faintly negative and even closer to 0: t = -0.19. Of course this is a simplistic approach, but there’s no hint that higher-grade statistical pyrotechnics would yield a more interesting result.
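To make the footnote concrete, here is a rough sketch of that kind of comparison in plain Python. Everything in it is an assumption for illustration: the data are synthetic, generated so that precinct size affects the Clinton share only through registration, and the helper `ols_t_stats` is a hypothetical name, not the code I used. It does not reproduce the t-values above; it only shows how a strong apparent size effect can vanish once registration is controlled for.

```python
import random


def ols_t_stats(X, y):
    """Ordinary least squares via the normal equations.

    X: list of rows, each including a leading 1.0 for the intercept.
    y: list of outcomes. Returns (coefficients, t_statistics).
    """
    n, k = len(X), len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance with n - k degrees of freedom.
    resid = [yi - sum(c * x for c, x in zip(coefs, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [c / (sigma2 * inv[i][i]) ** 0.5 for i, c in enumerate(coefs)]
    return coefs, t_stats


# Synthetic precincts (NOT real data): bigger precincts are less Democratic
# by registration, and vote share depends on registration plus noise only.
random.seed(0)
precincts = []
for _ in range(60):
    size_k = random.uniform(0.2, 4.0)  # votes cast, in thousands
    dem_reg = 0.60 - 0.08 * size_k + random.gauss(0, 0.03)
    clinton = 0.05 + 0.80 * dem_reg + random.gauss(0, 0.03)
    precincts.append((size_k, dem_reg, clinton))

y = [p[2] for p in precincts]
_, t_simple = ols_t_stats([[1.0, p[0]] for p in precincts], y)
_, t_ctrl = ols_t_stats([[1.0, p[0], p[1]] for p in precincts], y)
print(f"size alone:                   t = {t_simple[1]:.2f}")
print(f"size, controlling for reg.:   t = {t_ctrl[1]:.2f}")
```

In the first regression, size looks like a powerful predictor because it is standing in for registration. In the second, registration is in the model, and size has nothing left to explain.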