Tuesday's primary election in New York will be the first statewide election conducted entirely on optical scanners instead of lever machines. New York is the last state to abandon lever machines (unless a last-ditch lawsuit to keep them unexpectedly succeeds).
The transition has sharpened tensions among election integrity advocates in New York: some advocates support the transition to optical scanners, while others want to keep the lever machines indefinitely. I won't say much about that debate, because it is probably moot at this point. The main question now is how to manage the transition to optically scanned paper ballots. I think Precinct-Count Optical Scan (PCOS) is a good approach overall, but there are plenty of issues to deal with. Below the fold, I try to skim the high points.
Why change?
Substantively, the drawbacks of the lever machines -- in no particular order -- are:
- They are old and no one makes them any longer.
- They often have high failure rates on election day (although they usually aren't hard to fix).
- They are inaccessible to many voters with disabilities, and many voters fail to find the ballot propositions on them (more on this below).
- They provide no record of individual votes, so if a machine is rigged, or isn't properly zeroed out, or jams but isn't taken out of service, there is no way to correct the vote count. (I don't know of any proven case of rigging in New York, but the other two scenarios have happened for sure.)
None of these arguments is necessarily a clincher. (For instance, in 2008, New York met the accessibility requirements of the Help America Vote Act (HAVA) by using two voting systems -- lever machines plus electronic ballot marking devices -- and in principle it could continue to do so.)
At any rate, New York is under two separate legal requirements to replace the lever machines. One is a state law, the Election Reform and Modernization Act of 2005 (ERMA). Nassau County has sued to strike down ERMA on the grounds that it violates the state constitution. I am not a lawyer, but I do not see how that suit can succeed. The other is a federal court order that requires the state to replace the lever machines, on the grounds that they do not comply with the requirements of HAVA. Most recently, Judge Gary Sharpe issued an injunction requiring Nassau County to replace the lever machines this year; a Second Circuit panel upheld that injunction two days ago. It is still conceivable that the November general election could be conducted on lever machines, but it is very unlikely.
The 2009 pilot
About three quarters of New York's counties participated in the November 2009 pilot of the optical scanners. The proportion of voters who participated was much smaller, because New York City and most downstate counties declined to join the pilot, and most counties used the scanners only in one or a few polling places. However, the exciting special election in New York's 23rd Congressional District (CD 23) was conducted mostly on optical scanners -- as it happened, nine of the eleven counties in the district used scanners countywide -- so to some extent the scanners got a real workout.
For the most part, the pilot seems to have gone well; voters and election workers did not report many problems. In Erie County (see pp. 13-15), a programming error caused votes for one candidate to be allocated to another candidate. This error was revealed by the pre-election Logic & Accuracy tests, but election officials failed to correct it at that time. The error was fairly obvious after the election: a vote total of zero tends to stand out! It was corrected by rescanning and hand-counting the ballots in question.
After the election, there was a small flurry of articles in the Gouverneur Times alleging suspicious or impossible results in the CD 23 contest. Simply put, none of these articles amounted to much. Later, I examined all the CD 23 returns at the Election District (precinct) level, in comparison with party enrollment (registration) statistics and presidential returns from 2008. Remarkably, I found three large (but inconsequential) errors in the presidential returns -- which, apparently, no one had ever noticed. In contrast, the CD 23 results were all facially reasonable. The 3% hand count (of which I'll say more below) also indicated that the scanners were substantially accurate, although to confirm the election outcome, the audit sample should have been larger.
One crude metric of how voting systems performed is the residual vote rate: the proportion of ballots that show an undervote or an overvote in a particular contest. In this context, an "undervote" means that the voter (apparently) didn't vote for any candidate, while an "overvote" means that the voter (apparently) voted for more than one candidate. In either case, that vote isn't counted. Some counties report "blank" votes (undervotes) and "void" votes (including but not limited to overvotes) separately, but many do not, so I looked at the combined residual vote rate. In the CD 23 contest, the residual vote rate was 4.6% in the counties that used optical scanners, 6.9% in Clinton County, 14.2% in Oneida County, and slightly over 20% in Essex County. Essex reported surprisingly high turnout, at least in part because the district attorney race was hotly contested, so some of its residual votes may have come from voters who turned out for that race and skipped the congressional contest. At any rate, it appears that voters had at least as much success on the paper ballots as on the lever machines.
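To make the metric concrete, here is a minimal Python sketch of how a combined residual vote rate can be computed from county-level returns. It assumes a vote-for-one contest, so every ballot cast either yields exactly one counted vote or is residual; for a vote-for-N contest the denominator would be ballots cast times N. The class and function names, and all of the numbers, are invented for illustration and are not actual CD 23 returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContestReturn:
    """One county's (hypothetical) return for a vote-for-one contest."""
    county: str
    ballots_cast: int
    candidate_votes: dict[str, int]

    def residual_votes(self) -> int:
        # Blank (undervote) and void (overvote, etc.) ballots combined:
        # everything cast that did not become a counted vote in this contest.
        return self.ballots_cast - sum(self.candidate_votes.values())

    def residual_rate(self) -> float:
        if self.ballots_cast == 0:
            return 0.0
        return self.residual_votes() / self.ballots_cast


def pooled_residual_rate(returns: list[ContestReturn]) -> float:
    """Combined rate across counties, weighted by ballots cast."""
    residual = sum(r.residual_votes() for r in returns)
    cast = sum(r.ballots_cast for r in returns)
    return residual / cast if cast else 0.0


# Invented example figures, for illustration only.
county_a = ContestReturn("County A", 1000, {"Candidate 1": 500, "Candidate 2": 454})
county_b = ContestReturn("County B", 3000, {"Candidate 1": 1400, "Candidate 2": 1400})

print(county_a.residual_votes())                    # 46
print(county_a.residual_rate())                     # 0.046
print(pooled_residual_rate([county_a, county_b]))   # 0.0615
```

Pooling by ballots cast, rather than averaging the county percentages, keeps a small county with an extreme rate from dominating the combined figure.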
For state propositions, the residual vote rates were much lower on the paper ballots than on the lever machines. These rates averaged around 15% in the pilot counties (higher in St. Lawrence, where apparently many pollworkers failed to tell voters to check the back of the ballots), between 40% and 60% in most lever machine counties, and around 80% in New York City. I guess it is debatable whether it is good for more voters to vote on state propositions, but assuming that it is good, the optical scanners performed much better than the lever machines.
The problems I: security and reliability
One inherent problem with optical scanners is that they are programmable -- and, broadly speaking, anything that can be programmed can be hacked or simply misprogrammed. Careful security and testing procedures can forestall many of these problems. But if you really want to find out how accurately the scanners counted the paper, it makes sense to hand-count at least some of the paper in a post-election vote tabulation audit. (In a very close election like the 2008 Franken-Coleman Senate race in Minnesota, a full hand count can alter the outcome even if the scanners performed perfectly.) It also makes sense to verify that the scanner counts are correctly reported in the final tabulation.
New York probably has the most rigorous auditing law in the country, but the law has holes. In New York, each county randomly selects 3% of its voting systems (scanners) and hand-counts every contest on every ballot counted on those scanners. If a county finds a discrepancy rate greater than 0.1%, the county must do additional random auditing in that contest -- possibly expanding to a full hand count if discrepancies persist. In all but the closest statewide contests, if a 3% audit reveals few errors, that is a pretty good basis for confidence in the results. (Of course the audit can be subverted, but not so easily.) However, in a smaller contest, a 3% sample is not very robust. I would like to see risk-limiting audits, at least in high-profile contests. Risk-limiting audits provide a pre-determined minimum chance of leading to a full hand count if that hand count would alter the outcome. In general, the smaller and the closer the contest, the larger the percentage of votes that need to be examined in a risk-limiting audit.
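The arithmetic behind "a 3% sample is not very robust" can be sketched in a few lines of Python. This is a simplified model, not New York's regulation and not a risk-limiting audit. It assumes that scanners are drawn uniformly at random without replacement, that the 3% is rounded up to a whole scanner, that a miscounting scanner is always caught if it lands in the sample, and that the 0.1% trigger means discrepancies divided by ballots hand-counted. The regulation's actual definitions may differ, and all function names here are hypothetical.

```python
from fractions import Fraction
from math import comb


def scanners_to_audit(total_scanners: int, percent: int = 3) -> int:
    """Size of a percent-based sample, rounded up to a whole scanner (assumption).

    Integer arithmetic avoids floating-point surprises in the rounding.
    """
    if total_scanners <= 0:
        raise ValueError("need at least one scanner")
    return (total_scanners * percent + 99) // 100


def exceeds_trigger(discrepancies: int, ballots_counted: int,
                    trigger: Fraction = Fraction(1, 1000)) -> bool:
    """Simplified expansion test: is the discrepancy rate greater than 0.1%?"""
    return Fraction(discrepancies, ballots_counted) > trigger


def detection_probability(total_scanners: int, bad_scanners: int,
                          sample_size: int) -> float:
    """Chance that a uniform random sample includes at least one bad scanner.

    Hypergeometric: 1 - C(N - b, n) / C(N, n). math.comb returns 0 when
    n > N - b, in which case detection is certain.
    """
    if not 0 <= bad_scanners <= total_scanners:
        raise ValueError("bad_scanners must be between 0 and total_scanners")
    if not 0 <= sample_size <= total_scanners:
        raise ValueError("sample_size must be between 0 and total_scanners")
    miss = Fraction(comb(total_scanners - bad_scanners, sample_size),
                    comb(total_scanners, sample_size))
    return float(1 - miss)


n = scanners_to_audit(40)               # 2 scanners in a 40-scanner contest
print(detection_probability(40, 2, n))  # about 0.0987
print(exceeds_trigger(2, 1000))         # True: 0.2% is greater than 0.1%
print(exceeds_trigger(1, 1000))         # False: exactly 0.1% is not "greater than"
```

In this toy contest, two miscounting scanners would slip past a 3% sample about 90% of the time. In a statewide contest with thousands of scanners, the same 3% is dozens of machines, and an outcome-changing miscount would usually have to touch many of them, which is why a 3% sample is much more reassuring there.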
Even without risk-limiting audits, it is a problem to have each county separately determine whether to expand the audit of a contest that spans several counties. Some counties might then do a full hand count in that contest while others don't, which could leave considerable doubt about the correct outcome. Such a situation might be sorted out in state court, but why wait? Also, the audit results could be much more convincing if candidates had a way to add certain scanners to the audit based on facially anomalous results. (County boards do have the authority to do additional hand counts, as happened in Erie County during the pilot.)
Of course, if detailed election results aren't available, who knows which ones are facially anomalous? When I examined the pilot results in CD 23, the county boards were all very helpful, but the results in individual Election Districts often weren't available for weeks after the election. The results ultimately came in a welter of data formats, none of them very convenient -- spreadsheets that required careful copying and pasting; PDFs that required special decoding or even retyping; even scans of handwritten ledger pages. The move to optical scanners should make faster and better reporting possible. (Minnesota has a magnificent reporting system, with downloadable detailed results from every precinct in close to real time on election night.) But so far there is no word of plans to improve reporting.
The problems II: usability
Security concerns tend to draw a lot of attention, but the new systems pose at least two usability concerns.
Lever machines, when properly configured, make it impossible to overvote (to vote for too many candidates); paper ballots do not. Optical scanners can easily be configured to detect overvotes; the question is what happens then. The scanners being used in New York have the capacity to automatically "reject" such ballots, returning them to the voter -- but this feature is not being used in New York.
I went to my county board, where a demonstration machine has been set up, and cast an overvote just to see how confusing it was. Sure enough, the machine displayed an arcane message about an overvote and invited me to push either a green button to cast my ballot, or a red button to correct it. The message didn't explain what an overvote was, or that if I cast my ballot, I would lose my vote in that contest. Thanks to a study by the Florida Fair Elections Center, we already know that this approach doesn't work very well. In the 2008 presidential election, the DS200 (one of the two systems to be used in New York) had an in-person overvote rate of 0.43%, compared to as low as 0.03% for other scanners. A lawsuit is pending to require that the scanners automatically reject overvoted ballots. (Improving the warning message will take longer.) Overvoting appears to have been rare in the pilot, probably in part because there were fewer candidates in those contests than in the presidential contest.
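The policy choice described above can be modeled in a short Python sketch. It is a hypothetical model, not the DS200's firmware or its actual configuration options: a ballot is represented as the set of marked choices in each contest, a contest counts as overvoted when it has more marks than its vote-for limit, and a single `auto_reject` flag stands in for whether the scanner automatically returns such ballots or merely warns the voter. The last function shows what happens if the voter casts the ballot anyway: the overvoted contest simply drops out.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"   # no overvotes: the ballot is cast
    WARN = "warn"       # message shown; voter chooses to cast or correct
    RETURN = "return"   # ballot automatically handed back to the voter


@dataclass(frozen=True)
class Contest:
    name: str
    vote_for: int  # maximum number of selections allowed


def overvoted_contests(marks: dict[str, set[str]],
                       contests: list[Contest]) -> list[str]:
    """Names of contests with more marks than the vote-for limit."""
    return [c.name for c in contests
            if len(marks.get(c.name, set())) > c.vote_for]


def scanner_action(marks: dict[str, set[str]], contests: list[Contest],
                   auto_reject: bool) -> Action:
    """What a scanner configured with the given (hypothetical) policy does."""
    if not overvoted_contests(marks, contests):
        return Action.ACCEPT
    return Action.RETURN if auto_reject else Action.WARN


def counted_selections(marks: dict[str, set[str]],
                       contests: list[Contest]) -> dict[str, set[str]]:
    """Selections that count if the ballot is cast as is.

    An overvoted contest contributes nothing, so the voter loses that vote.
    """
    counted: dict[str, set[str]] = {}
    for c in contests:
        chosen = marks.get(c.name, set())
        if 0 < len(chosen) <= c.vote_for:
            counted[c.name] = chosen
    return counted


contests = [Contest("President", 1), Contest("Council", 2)]
ballot = {"President": {"Candidate A", "Candidate B"}, "Council": {"Candidate X"}}

print(scanner_action(ballot, contests, auto_reject=False))  # Action.WARN
print(counted_selections(ballot, contests))  # {'Council': {'Candidate X'}}
```

Under the auto-reject policy, the voter has to deal with the overvote before the ballot can be cast; under the warn policy, pressing the green button silently accepts the loss that `counted_selections` makes explicit.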
New York ballots can be insanely hard to read -- especially in parts of the state (mostly New York City) where the ballots are multilingual. Some ballots, such as a trilingual absentee ballot from the 2008 election, use three languages; some use four. Given all the other requirements and design constraints, the type on the ballot can be very tiny, although the candidate names in English generally are fairly readable. (I don't read Chinese, but if I did, I suspect I would need a magnifying glass.) Frankly, the lever machines are no treat to read either. But now that millions of New Yorkers, instead of a relative handful of absentee voters, will depend on paper ballots, we can hope for some careful usability studies and thoughtful design improvements.
At this point I don't have detailed (or especially informed) comments about using these systems as ballot marking devices (BMD) with special accessibility features. Based on the stories I've heard, many of these systems have been long on promise and short on performance. The safest bet is that not many people will use the BMD features this year, so it will not matter very much how well or poorly the features work.
Conclusions?
I'll miss voting on lever machines, but I like voting on paper. YMMV. But if we're going to be voting on paper, like it or not, we have to make it work as well as possible.
Based on last year's pilot, I'm cautiously optimistic that this year's elections in New York will go fairly well -- and I'm equally certain that some things will go wrong on Tuesday. If we're lucky, the problems will be fairly subtle and small, as they apparently were in the 2009 pilot. If we're wise, we'll learn from experience and make improvements as quickly as we can.