Earlier today I forwarded a data file for upload to Dave's Redistricting App (DRA) containing 2008 and 2010 California election data by block group, as compiled by the California Statewide Database at UC Berkeley. However, as detailed below, this data set is incomplete: it assigns 97.8% of the statewide Obama/McCain vote and 99.4% of the statewide Brown/Whitman vote. I decided to forward it for upload anyway for three reasons:
1) So far as I've been able to determine, this is the most complete data set of California election results available at the block group level. To be more precise, the UC Berkeley Statewide Database translates the election data to the census block level, and I then aggregated those blocks up to the block group level that DRA uses. By comparison, the Polidata election figures widely used to calculate the partisan composition of proposed districts assign only about 95% of the 2008 presidential vote.
2) For most purposes, this data will be more than adequate to determine the partisan characteristics of a proposed district. This is particularly true of the 2010 Brown/Whitman figures, where even LA County has 98.5% of its votes assigned. The main problems will arise with the Obama/McCain figures if a district crosses into or out of LA County to a significant degree. Because the data set assigns only 94.9% of the 2008 LA County vote, the partisan figures of such districts would be skewed toward the neighboring counties, whose data sets are more complete (by contrast, the data assigns 99.6% of Kern votes, 99.2% of O.C. votes, 97.7% of Riverside votes, 98.5% of San Bernardino votes, and 99.9% of Ventura votes). San Mateo is the other significantly underweighted county, with only 93.2% of its 2008 presidential votes assigned by the Statewide Database.
3) Most importantly, the most expedient route by far for me to complete the data set is to have it available in DRA. If I had reason to believe that the missing votes were uniformly distributed within a given county, the fix would be easy: I could simply write a database query to distribute the missing votes in proportion to the assigned votes. Since that is probably not the case, I will instead need to layer shapefiles of election precincts over shapefiles of the block groups, block off each county into roughly corresponding sets of precincts and block groups, and compare the vote totals to see how many votes, if any, are missing from each set. That way, I can zero in relatively swiftly on where the missing votes should be assigned (to be sure, nothing is actually "swift" when dealing with California data at this level). Without the election data in an interactive visual interface such as DRA, this would take far longer.
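The block-to-block-group aggregation mentioned in point 1 amounts to summing every block that shares a block group code. Below is a minimal Python sketch, assuming the block-level results are keyed by the standard 15-digit census block GEOID (2-digit state, 3-digit county, 6-digit tract, 4-digit block); the function name and input layout are hypothetical and not the Statewide Database's actual file format. The block group GEOID is the first 12 digits, because the block group is the first digit of the 4-digit block number.

```python
from collections import defaultdict

def aggregate_to_block_groups(block_votes):
    """Sum block-level vote counts up to census block groups.

    Hypothetical input layout: {15-digit block GEOID: {candidate: votes}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for geoid, votes in block_votes.items():
        if len(geoid) != 15 or not geoid.isdigit():
            raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
        # State (2) + county (3) + tract (6) + first block digit (1) = 12 digits.
        block_group = geoid[:12]
        for candidate, count in votes.items():
            totals[block_group][candidate] += count
    # Convert the nested defaultdicts back to plain dicts for the caller.
    return {bg: dict(counts) for bg, counts in totals.items()}

# Hypothetical LA County (06037) blocks; the first two share a block group.
blocks = {
    "060371011101001": {"Obama": 100, "McCain": 40},
    "060371011101002": {"Obama": 50, "McCain": 10},
    "060371011102001": {"Obama": 70, "McCain": 30},
}
print(aggregate_to_block_groups(blocks))
# {'060371011101': {'Obama': 150, 'McCain': 50}, '060371011102': {'Obama': 70, 'McCain': 30}}
```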
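To see why uneven coverage skews a district that straddles counties (point 2), the Python sketch below compares a district's true two-party share with the share computed from assigned votes alone. Purely for illustration, it assumes each county's missing votes are spread uniformly (which, as point 3 notes, is probably not the case), and the vote counts are hypothetical; only the 94.9% and 99.2% coverage rates come from the figures above.

```python
def two_party_shares(parts):
    """Return (true_share, assigned_share) of the Democratic two-party vote.

    parts: list of (dem_votes, rep_votes, coverage) tuples, one per county
    portion of a district, where coverage is the fraction of that county's
    votes the data set assigns. Assumes missing votes are spread uniformly
    within each county, so assigned votes = true votes * coverage.
    """
    true_dem = sum(dem for dem, rep, cov in parts)
    true_total = sum(dem + rep for dem, rep, cov in parts)
    assigned_dem = sum(dem * cov for dem, rep, cov in parts)
    assigned_total = sum((dem + rep) * cov for dem, rep, cov in parts)
    return true_dem / true_total, assigned_dem / assigned_total

# Hypothetical district: a Democratic-leaning LA portion (94.9% assigned)
# and a Republican-leaning O.C. portion (99.2% assigned).
district = [(60000, 40000, 0.949), (40000, 60000, 0.992)]
true_share, assigned_share = two_party_shares(district)
print(f"true {true_share:.2%}, assigned {assigned_share:.2%}")
# true 50.00%, assigned 49.78%
```

Under these assumptions the LA portion is underweighted and the district looks about a fifth of a point more Republican than it really is; the gap widens as the coverage difference and the partisan contrast between the counties grow.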
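For comparison, the easy fix ruled out in point 3 (spreading a county's missing votes in proportion to its assigned votes) would look roughly like the Python sketch below if written as code rather than a database query. It assumes one candidate's block group totals for a single county plus that county's official total; the names are hypothetical. The result is only valid if the missing votes really are distributed like the assigned ones, which is exactly the assumption the precinct-overlay approach avoids.

```python
def scale_to_county_total(assigned, county_total):
    """Scale one candidate's block group votes in one county so they sum
    to the official county total, preserving each block group's share.

    assigned: {block_group: assigned votes}; county_total: official total.
    Results are generally fractional and may need rounding afterward.
    """
    assigned_sum = sum(assigned.values())
    if assigned_sum <= 0:
        raise ValueError("no assigned votes to scale against")
    if county_total < assigned_sum:
        raise ValueError("county total is smaller than the assigned votes")
    # Each block group receives missing votes in proportion to its assigned votes.
    return {bg: votes * county_total / assigned_sum for bg, votes in assigned.items()}

# Hypothetical county: 200 assigned votes against an official total of 210.
print(scale_to_county_total({"A": 100, "B": 50, "C": 50}, 210))
# {'A': 105.0, 'B': 52.5, 'C': 52.5}
```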
In the meantime, the chart below shows how many votes are missing for each candidate within each county, as well as a statewide total of unassigned votes for each election. As mentioned above, once I can work with the data in DRA, I will go county by county and begin assigning these votes. My goal is to have a complete data set within a couple of weeks, though it may well be sooner than that. The key factor will be how widely distributed the missing votes are within a given county: the fewer precincts at issue, the more swiftly I can isolate them and figure out where the votes need to go.
The presidential data for Del Norte is a special case, which is why it's highlighted in red below. For that county, the data set provides a breakdown only at a level above the block group, so as a stop-gap measure I simply redistributed the presidential votes in accordance with the partisan distribution of the gubernatorial votes. Since the whole county cast a combined 9,290 Obama/McCain votes and is very unlikely to be split, at least on a congressional map, it would take a very rare mapping scheme for this to matter anyway. That said, I will of course properly match up those votes in the course of completing the presidential data set.
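One plausible reading of the Del Norte stop-gap is sketched below in Python: a reporting unit's Obama votes are apportioned to its block groups in proportion to their Brown votes, and its McCain votes in proportion to their Whitman votes. This is an assumption about the method, not necessarily the exact procedure used, and the function names and inputs are hypothetical. Largest-remainder rounding keeps whole votes while preserving each unit's totals exactly.

```python
def apportion(total, weights):
    """Split an integer total across keys in proportion to integer weights,
    using largest-remainder rounding so the parts sum exactly to total."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    base = {k: total * w // weight_sum for k, w in weights.items()}
    remainders = {k: total * w % weight_sum for k, w in weights.items()}
    leftover = total - sum(base.values())
    # Give the leftover votes to the keys with the largest remainders.
    for k in sorted(weights, key=lambda k: remainders[k], reverse=True)[:leftover]:
        base[k] += 1
    return base

def presidential_from_gubernatorial(unit_obama, unit_mccain, gov_by_bg):
    """gov_by_bg: {block_group: (brown_votes, whitman_votes)} for the block
    groups inside one reporting unit. Returns {block_group: (obama, mccain)}."""
    obama = apportion(unit_obama, {bg: b for bg, (b, w) in gov_by_bg.items()})
    mccain = apportion(unit_mccain, {bg: w for bg, (b, w) in gov_by_bg.items()})
    return {bg: (obama[bg], mccain[bg]) for bg in gov_by_bg}

# Hypothetical unit with two block groups of opposite leanings.
print(presidential_from_gubernatorial(1000, 800, {"X": (300, 100), "Y": (100, 300)}))
# {'X': (750, 200), 'Y': (250, 600)}
```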
So, there you have it. If there are any questions or suggestions, then please post a comment! I'll be checking in regularly all evening.