Daily Kos Elections has just embarked on our quadrennial project to calculate the results of the most recent presidential election for all 435 congressional districts, an effort that began in 2008. We’ll be publishing data on a rolling basis, so you can keep up with our releases by subscribing to our newsletter, the Morning Digest, or following us on Twitter, and also by bookmarking this post, which we’ll populate with new numbers as soon as they’re available.
The most common question we get about this undertaking is why this information isn’t immediately available, which is a great opportunity to explain where the data we use comes from and how we get it into its final format for your usage and enjoyment.
States could easily provide this data themselves, and a few do, like Minnesota. (Note, though, that these are unofficial results—more on that in a bit.) Most, however, do not, even though there's no excuse for not doing so. That means we have to calculate the results ourselves, and that's a very tricky process.
For starters, election returns are most commonly provided at the county level. However, almost every state with more than one congressional district splits counties between congressional districts. That means you need precinct-level election results in order to know what part of each county belongs in which district. And that's where things get really hairy.
Some states provide precinct-level results for the entire state in a user-friendly format at a central location. As long as you know which precinct belongs to which district (more on this later, too), then calculating presidential results for each district is usually straightforward—though not always.
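To make the straightforward case concrete, here is a minimal sketch in Python of rolling precinct totals up to districts once every precinct maps cleanly to one district. It is an illustration only, not our actual spreadsheets: the function name, the precinct and district labels, the candidate names, and the vote counts are all made up.

```python
from collections import defaultdict


def aggregate_by_district(precinct_results, precinct_to_district):
    """Sum candidate votes across precincts, grouped by congressional district.

    precinct_results: {precinct_id: {candidate: votes}}
    precinct_to_district: {precinct_id: district_id}
    Assumes every precinct lies entirely within a single district.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for precinct, votes in precinct_results.items():
        # A missing mapping raises KeyError, which flags an unassigned precinct.
        district = precinct_to_district[precinct]
        for candidate, count in votes.items():
            totals[district][candidate] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {district: dict(cands) for district, cands in totals.items()}


# Hypothetical data for illustration only.
precinct_results = {
    "P1": {"Candidate A": 120, "Candidate B": 80},
    "P2": {"Candidate A": 60, "Candidate B": 140},
    "P3": {"Candidate A": 200, "Candidate B": 100},
}
precinct_to_district = {"P1": "CD-1", "P2": "CD-2", "P3": "CD-1"}

district_totals = aggregate_by_district(precinct_results, precinct_to_district)
print(district_totals)
```

The sketch deliberately fails loudly when a precinct has no district assignment, since a silently dropped precinct would skew a district's totals.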
But the states that don't offer precinct data in one central place? Those states make life hell. For them, you have to go county by county, and that's a brutal process. Sometimes counties post precinct results online, and some are in usable formats, like spreadsheet files. Some are less useful, like native electronic PDFs that can be converted with software but require a lot of effort to reformat.
The worst, however, are in scanned PDFs that are just brutal to convert—OCR typically chokes on them—and usually have to be reentered manually. And some results, believe it or not, are handwritten. The most amazing of all was an Excel file where numbers were represented by clip art images. (And yes, we actually once needed to use that file.)
And that's if they're online at all! Even in this day and age, many counties do not post results online. Some don't even have websites. For these, you have to call them up and ask for their results one at a time. If you're lucky, they'll email them to you. Sometimes they will only fax them to you. (We maintain a fax service just for this eventuality.) Sometimes they will only send them to you by mail, meaning you have to scan them in yourself and deal with yet more difficult-to-parse data.
Some jurisdictions even make you pay for the results despite the fact that all of this information is (or should be) publicly available. We've had counties try to charge as much as $100! Some also take many months to make their data available—Nassau County, New York, for instance, has been one of the worst serial offenders.
All in all, we expect to have to collect results one by one for over 200 separate counties and cities. Yet once you have the data, there's a whole host of other issues to confront. States handle absentee votes differently, for instance. Some will assign all absentees in a single district to just one precinct, so dividing them up is an art. That’ll be an even bigger issue this year given mail voting’s massive surge in popularity.
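One simple way to divide a lump of absentee votes, sketched here in Python, is to hand each district the same share of a candidate's absentees as it has of that candidate's precinct-level votes. This is only one possible approach under that proportionality assumption, not necessarily the technique we use in any given state, and the function name and all the numbers below are hypothetical.

```python
def allocate_absentees(in_person_by_district, absentee_totals):
    """Split lump-sum absentee votes across districts, candidate by candidate.

    in_person_by_district: {district_id: {candidate: votes}} from precincts
        that could be assigned to a district directly.
    absentee_totals: {candidate: votes} reported as a single lump.
    Assumption: each candidate's absentee votes are distributed across
    districts in the same proportions as that candidate's in-person votes.
    """
    allocated = {district: {} for district in in_person_by_district}
    for candidate, absentee in absentee_totals.items():
        candidate_total = sum(
            votes.get(candidate, 0) for votes in in_person_by_district.values()
        )
        for district, votes in in_person_by_district.items():
            # With no in-person votes to go on, allocate nothing rather than guess.
            share = votes.get(candidate, 0) / candidate_total if candidate_total else 0.0
            allocated[district][candidate] = absentee * share
    return allocated


# Hypothetical county split between two districts.
in_person = {
    "CD-1": {"Candidate A": 300, "Candidate B": 100},
    "CD-2": {"Candidate A": 100, "Candidate B": 300},
}
absentees = {"Candidate A": 200, "Candidate B": 100}

absentee_allocation = allocate_absentees(in_person, absentees)
print(absentee_allocation)
```

Allocating each candidate separately, rather than splitting the whole lump by total turnout, keeps a district that leans heavily one way from being handed an absentee pile that looks like the county average. The results are fractional, so any rounding would be a separate decision.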
Then there are split precincts, and these are the worst. Many states divide precincts between districts, but election boards usually only report totals at the precinct (not sub-precinct) level, so allocating split precincts is also an art. This is one of the issues we were referring to above when we talked about knowing which precinct is assigned to which district: the answer isn't always obvious. We've developed techniques to handle all of these difficulties and many more, but they all present challenges.
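As a rough illustration of the split-precinct problem, the Python sketch below divides a single precinct's reported totals between districts in proportion to a weight for each portion, such as the number of registered voters living on each side of the district line. The choice of weight is an assumption made for this example, as are the function name and the figures; it is not a description of the specific techniques we apply.

```python
def split_precinct(votes, weights):
    """Divide one precinct's votes among districts in proportion to weights.

    votes: {candidate: votes} reported for the whole precinct.
    weights: {district_id: weight}, e.g. registered voters in each portion
        (a hypothetical input; real sources for these weights vary).
    Assumption: voters in every portion of the precinct split between
    candidates in the same way as the precinct as a whole.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {
        district: {
            candidate: count * weight / total_weight
            for candidate, count in votes.items()
        }
        for district, weight in weights.items()
    }


# Hypothetical precinct with two-thirds of its voters in CD-1.
precinct_votes = {"Candidate A": 90, "Candidate B": 60}
portion_weights = {"CD-1": 200, "CD-2": 100}

split_result = split_precinct(precinct_votes, portion_weights)
print(split_result)
```

The weakness of this approach is the uniformity assumption: if one portion of the precinct votes very differently from the other, a proportional split will miss it, which is part of why this step takes judgment.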
And one more thing: Most states take a while to certify their results, so we can't even start on most of them yet. You may see some preliminary calculations floating around out there based on unofficial results, but we strongly advise against relying on these. Final, certified results will always differ from unofficial returns, sometimes materially so, which is why it’s always worth the wait.
So there's a lot of data collection, data cleanup, and raw calculation to be done. A ton, in fact. In spite of all this, we love this project, which is why we've been at it for over a decade. This data is really important to have for so many reasons, not least of which is that voter behavior at the top of the ticket still correlates closely with down-ballot preferences. It's an imperfect measurement, especially given the resurgence in ticket-splitting we saw this year, but in an era when the reliability of polling data is increasingly under question, these results give us something firm to grasp.
All of our calculations are also completely transparent. We make every data file we rely on publicly available, and we do the same with the spreadsheets that contain all of our calculations, which show every formula we use. There's no black box, no secret sauce. You can download everything and play with it yourself, and if you ever think you might have spotted an error, you can reach out to us and we’ll look into it.
Fortunately, errors are extremely rare. We take immense care to ensure our calculations are as accurate as possible, with many built-in checks to guarantee that the numbers we input and the results our spreadsheets pop out are as flawless as can be. Long ago we set out to provide the gold standard for presidential results by congressional district. Given the widespread adoption of our data by journalists, academics, election enthusiasts, and campaign professionals on both sides of the aisle, we believe we have succeeded, but we will never take our reputation for granted.
Many people over the years have helped us every step of the way throughout this process, and we know many more will once again. We are grateful to everyone who has made this project possible, and we look forward to recognizing you once more when this iteration concludes. (In case you’re curious, we finished the 2016 results at the end of Jan. 2017.)
Lastly, we have one request. We often see our data cited without attribution or a link. We are always delighted that so many people make such frequent use of our numbers, but as we say, it really is a lot of work! We'd be grateful, therefore, if you could give us a mention and a link whenever you use our results. You can always find them at this shortened link that can fit in any tweet or on any chart: http://dkel.ec/presbycd. Thanks so much!