Any national election involves many people at many locations, and thus is vulnerable to organized election fraud by local officials.
I'm interested in the Mexican national election system for what I can learn about paper ballot systems--their safeguards against official fraud at precincts, their possible vulnerabilities and how to limit them, and the clues and evidence they can generate to help detect official misbehavior after the fact.
I'm organizing my education about Mexican elections around publicly-available data I downloaded from IFE, the Mexican national election institute. In this first post I lay out what data are available and what kinds of analyses may be possible with them. I could use some help from fellow kossacks:
(1) Can anyone explain the "VCAP" variables, numbers 14, 16, 18, 20, and 22 in the table below, along with any others that have question marks next to them?
(2) Am I misinterpreting something?
(3) What's missing from my analysis?
(4) What news links have I missed that are germane to understanding the Mexican electoral system and how it might go wrong?
(5) Do you have links to socioeconomic data or GIS coordinates for Mexican electoral units?
I'd especially appreciate hearing from other kossacks who can tell me about relevant Mexican politics or whose command of Spanish is better than mine.
Charges of fraud in this year's election
Most of us have heard charges of rampant official fraud in the apparent narrow "victory" of rightist Calderon over center-left Obrador, the former Mayor of Mexico City. The three charges I've heard most often involve:
(1) hundreds of thousands of uncounted ballots, amounting to far more than Calderon's current margin of advantage over Obrador;
(2) widespread ballot-box stuffing, captured on video for at least one precinct ("square" is the official translation of "casilla", but remember I'm maximizing analogies to US electoral procedures); and
(3) irregularities in tally sheets arriving at district election HQs later than most. There are well over 100,000 Presidential electoral precincts. If you've read about crooked elections (as in biographies of longtime former Chicago mayor Richard J. Daley), you know why some key precincts routinely report much later than most others: Precinct captains must wait for late-night calls telling them precisely how many votes their Party needs from them. What the party needs is exactly how many votes they've got!
But I've not seen the kind of detailed analyses of these charges I'm used to seeing for recent elections here in the US. And some of the apparent safeguards against fraud built into the Mexican system have to be teased out of a detailed look at the data items precincts are supposed to submit to IFE.
What data are publicly available?
On the Mexican national election institute's website, I found a minimal "PREP" database of 117,287 precinct-level Presidential election returns along with a page and a half of documentation. The ASCII file inside the ZIP archive was time-stamped 7/10/06 at 14:26. My Spanish is very rudimentary, but with help from fellow kossack ourobouros, I believe I've deciphered all but five of 30 data items (all but the "..VCAP" variables). Am I misinterpreting something here?
Just in case the column heads are hard to read in the following table, here they are a second time:
(1) Variable number
(2) My name for the variable
(3) IFE's label
(4) Number of nonmissing values
(5) Mean
(6) Sum
(7) Largest value
(8) Smallest value
(9) Number of missing values = 117,287 minus N
# | Variable | IFE Label | N | Mean | Sum | Max | Min | Nmiss |
1 | STATENO | ESTADO (State ID #, 1-32) | 117287 | 16.43 | 1927422 | 32 | 1 | 0 |
2 | STATE | NOM_ESTADO (Name of State) | | | | | | |
3 | DISTRCT | DISTRITO (1-2 digits) | 117287 | 8.52 | 1000103 | 40 | 1 | 0 |
4 | SECTION | SECCION (up to 4 digits) | 117287 | 1575.27 | 184759804 | 6183 | 1 | 0 |
5 | PRECNCT | ID_CASILLA (1-2 digits) | 117287 | 1.28 | 151299 | 43 | 1 | 0 |
6 | PCTTYPE | TIPO_CASILLA (B C E S) | | | | | | |
7 | EXTCONT | EXT_CONTIGUA (Within-casilla ID #) | 117287 | 0.00 | 969 | 9 | 0 | 0 |
8 | ACTA | NUM_ACTA_IMPRESO (2 eq Presidente) | 117287 | 2.00 | 234574 | 2 | 2 | 0 |
9 | RECEIVD | NUM_BOLETAS_RECIBIDAS (Received) | 116534 | 572.74 | 66744615 | 8000 | 0 | 753 |
10 | LEFTOVR | NUM_BOLETAS_SOBRANTES (Leftover) | 115909 | 247.67 | 28707301 | 7715 | 0 | 1378 |
11 | NUMVOTR | TOTAL_CIUDADANOS_VOTARON | 113231 | 332.93 | 37698365 | 1806 | 0 | 4056 |
12 | CAST | NUM_BOLETAS_DEPOSITADAS | 109650 | 327.19 | 35876783 | 2280 | 0 | 7637 |
13 | PAN | PAN (Votes for Calderon) | 117287 | 119.43 | 14008198 | 608 | 0 | 0 |
14 | PANVCAP | PAN_VCAP (117,287 blanks???) | | | | | | |
15 | APM | ALIANZA_POR_MEXICO(Roberto Madrazo) | 117287 | 70.91 | 8317526 | 420 | 0 | 0 |
16 | APMCAP | APM_VCAP (117,287 blanks???) | | | | | | |
17 | PBT | POR_EL_BIEN_DE_TODOS (Obrador) | 117287 | 116.06 | 13613416 | 648 | 0 | 0 |
18 | PBTVCAP | PBT_VCAP (117,287 blanks???) | | | | | | |
19 | NA | NUEVA_ALIANZA (Roberto Campa) | 117287 | 3.27 | 384189 | 288 | 0 | 0 |
20 | NAVCAP | NA_VCAP (117,287 blanks???) | | | | | | |
21 | ASCD | ALTERNATIVA_SOCIALDEMOCRATA(Mercado) | 117287 | 9.25 | 1085079 | 218 | 0 | 0 |
22 | ASVCAP | ASDC_VCAP (117,287 blanks???) | | | | | | |
23 | NREG | NUM_VOTOS_CAN_NREG (Valid writeins) | 104290 | 2.69 | 281116 | 251 | 0 | 12997 |
24 | UNCOU1 | NUM_VOTOS_NULOS(Uncounted, w msngs) | 112806 | 7.33 | 827206 | 509 | 0 | 4481 |
25 | TOTVOTE | TOTAL_VOTOS | 117287 | 328.39 | 38516730 | 779 | 0 | 0 |
26 | CASILLA | WHETHER CASILLA IS URBAN (Y=1,N=2) | 117287 | 1.27 | 149353 | 2 | 1 | 0 |
27 | NOMINAL | LISTA_NOMINAL (Registered voters?) | 117287 | 557.62 | 65402655 | 750 | 0 | 0 |
28 | CHRTIMR | HORA_RECEPCION_CEDAT: RCVD at IFE?? | | | | | | |
29 | CHRTIMN | HORA_CAPTURA_CEDAT: (ENTERED???) | | | | | | |
30 | CHRTIMC | HORA_REGISTRO: (DATA TALLIED???) | | | | | | |
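For anyone who wants to reproduce the summary columns, here is a minimal sketch of how N, Mean, Sum, Max, Min and Nmiss could be computed for one variable. It makes two assumptions I have not verified against the real file: that the data are comma-delimited with IFE's labels as column headers, and that an unreported item shows up as a blank cell. The sample rows are made up for illustration and are not real PREP records.

```python
import csv
import io

def summarize(rows, column):
    """Return N, Mean, Sum, Max, Min and Nmiss for one numeric column.

    Blank cells are treated as missing (an assumption about how the
    PREP file encodes unreported items).
    """
    values = []
    missing = 0
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell == "":
            missing += 1
        else:
            values.append(int(cell))
    n = len(values)
    total = sum(values)
    return {
        "N": n,
        "Mean": round(total / n, 2) if n else None,
        "Sum": total,
        "Max": max(values) if values else None,
        "Min": min(values) if values else None,
        "Nmiss": missing,
    }

# Made-up sample rows, NOT real PREP data.
SAMPLE = """ESTADO,NUM_BOLETAS_RECIBIDAS,NUM_VOTOS_NULOS
1,600,7
1,,3
2,550,
2,700,12
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
stats = summarize(rows, "NUM_VOTOS_NULOS")
print(stats)
```

With the real file, N plus Nmiss should equal 117,287 for every variable, which is a quick check that the file was read correctly.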
First, notice that this datafile does not have all of the official election results:
The idea in this thread is to use available data to understand better just how the Mexican process works, not to do an official recount!
The Washington Post's excellent primer on the Mexican elections says,
"Official results gave Calderón (PAN) 15,000,284 votes, or 35.89 percent. Obrador (PRD) received 14,756,350 votes, or 35.31 percent. Roberto Madrazo (PRI) received 9,301,441 votes, or 22.26 percent. Other votes include Patricia Mercado with 1,127,963 votes, or 2.7 percent; Roberto Campa with 401,804 votes, or 0.96 percent; and write-in candidates with 297,989 votes or 0.71 percent. There were 904,604 invalid votes and 40,886,718 valid votes."
If you look at the "Sum" for TOTVOTE (Number 25) in the table above, you'll see it is more than 2 million votes short of the official total. A small part of that 2 million is absentee ballots. I'm not sure, but I'd guess that most of the rest is accounted for by the Mexican version of the Palm Beach counting room in the 2000 Florida Presidential election. Remember the news clip of a bald guy in a white shirt, holding a punch card up to his eye? That's where most of the "missing" votes are. Again, a Washington Post story has the most plausible explanation I've seen:
"... the counting underway right now is not an extraordinary procedure. It happens each election in 300 districts across the country, all overseen by the Instituto Federal Electoral (IFE).
"In the Mexican electoral process, polling station volunteers compile the ballots and put them together in an 'electoral package.' Along with the package, the station president attaches a report that includes the complete tally contained within," the Miami Herald's Mexico edition reports. "The most basic information -- the quantity of votes for each presidential candidate, for example -- is visible on the outside of the envelope so that when the entire package is delivered to regional compilation centers, polling workers can immediately enter the information into the (electoral commission's) computer system."
Those preliminary results are known as the PREP. It is not -- repeat NOT -- a complete accounting of every single vote cast. Confusion arose Monday when López Obrador began referring to 2 million to 3 million "missing" ballots. In reality, those were set aside because of "'inconsistencies' such as poor handwriting or extraneous marks on the tally sheets attached outside each ballot box," according to Luis Carlos Ugalde, president of the federal election commission."
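To put numbers on the gap between the PREP file and the official results, here is a small worked calculation. It uses only the "Sum" column from the table above and the official figures in the Washington Post quote. The variable names are mine. One thing it shows is that the party, write-in and UNCOU1 sums add up exactly to the TOTVOTE sum, which suggests (but does not prove) that TOTVOTE already includes the NUM_VOTOS_NULOS ballots.

```python
# Column sums copied from the "Sum" column of the PREP table.
prep_sums = {
    "PAN": 14008198,
    "APM": 8317526,
    "PBT": 13613416,
    "NA": 384189,
    "ASCD": 1085079,
    "NREG": 281116,
    "UNCOU1": 827206,
}
prep_totvote = 38516730

# Official figures as quoted from the Washington Post primer.
official_valid = 40886718
official_invalid = 904604

# Do the individual columns account for all of TOTVOTE?
components = sum(prep_sums.values())
totvote_includes_nulos = components == prep_totvote

# Shortfall of the PREP file against the official numbers.
gap_vs_valid = official_valid - prep_totvote
gap_vs_all = (official_valid + official_invalid) - prep_totvote

print(f"Sum of components:         {components:,}")
print(f"Matches TOTVOTE:           {totvote_includes_nulos}")
print(f"Gap vs official valid:     {gap_vs_valid:,}")
print(f"Gap vs valid plus invalid: {gap_vs_all:,}")
```

The gap works out to 2,369,988 against official valid votes, or 3,274,592 against valid plus invalid votes, which brackets the "2 million to 3 million" figure quoted above.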
Safeguards against fraud, and the "footprints" fraudsters may leave behind
The first 8 variables in the table provide identification for the reporting unit and tally sheet. Variables 9 through 12 evidently are meant to be safeguards against fraud. The central or district location apparently keeps records of how many counterfeit-resistant blank ballots were sent to each precinct (number 9). Each precinct is supposed to report how many ballots were left over (number 10) when Election Day was over. A physical headcount (number 11) would show how many people actually showed up to vote. A tally of ballots cast (number 12), including those not counted (number 24) as well as those counted (number 25) ought to be the difference between ballots received and ballots left over. But take a look at the last column. Notice that thousands of precincts failed to report some or all of these items. Among those precincts that did report them, there may be important discrepancies.
Analyses of patterns of missing data, discrepancies, and the political leanings of precincts where they occur have the potential to uncover clues to possible election fraud.
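The cross-checks described above could be sketched roughly as follows. This is an illustration under my own assumptions, not IFE's procedure: the function name and flag wording are hypothetical, missing items are represented as None, and ballots cast are compared directly with TOTVOTE because the column sums in the table add up to TOTVOTE only when UNCOU1 is included. If TOTVOTE turns out to exclude uncounted ballots, the last check should compare CAST with TOTVOTE plus UNCOU1 instead.

```python
def check_precinct(received, leftover, voters, cast, totvote):
    """Return a list of warning flags for one precinct.

    Arguments correspond to RECEIVD, LEFTOVR, NUMVOTR, CAST and TOTVOTE.
    A value of None means the precinct did not report that item.
    """
    flags = []
    fields = {
        "RECEIVD": received,
        "LEFTOVR": leftover,
        "NUMVOTR": voters,
        "CAST": cast,
        "TOTVOTE": totvote,
    }
    for name, value in fields.items():
        if value is None:
            flags.append(f"missing {name}")

    # Ballots handed out should equal ballots found in the box.
    if None not in (received, leftover, cast) and received - leftover != cast:
        flags.append("received - leftover != cast")
    # The headcount of voters should equal ballots found in the box.
    if None not in (voters, cast) and voters != cast:
        flags.append("voters != cast")
    # Ballots in the box should equal the votes tallied (assumption:
    # TOTVOTE includes the NUM_VOTOS_NULOS ballots).
    if None not in (cast, totvote) and cast != totvote:
        flags.append("cast != totvote")
    return flags

# Made-up examples, NOT real precincts.
clean = check_precinct(600, 250, 350, 350, 350)
stuffed = check_precinct(600, 250, 350, 360, 360)
partial = check_precinct(600, None, 350, 350, 350)
print(clean, stuffed, partial, sep="\n")
```

A flag is only a clue that a tally sheet deserves a closer look. Honest arithmetic slips by tired poll workers would trip the same checks.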
The rest of the data items
Variables 13 through 23 include counts of votes for President for each of the five parties on the ballot plus the total number of valid write-in votes (number 23). Since it's very unlikely that any write-in candidate would be among the top vote-getters in a Presidential election, the write-ins can be lumped together as a synthetic "sixth candidate". Each of the five parties has a pair of variables. The first is a tally of votes. I'm not sure what the other variables (...VCAP) are meant to show. I believe they have something to do with whether uncounted votes were illegible or completely blank. Does anyone know?
Variable number 24 is the number of ballots each precinct reported it did not count. Notice that 4,481 of the 117,287 precincts in the file did not provide this information. Hmmmm....
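One way to follow up on that "Hmmmm" is to compare candidate vote shares between precincts that reported variable 24 and those that left it blank. The sketch below shows the idea on made-up records. The function name is hypothetical, None stands for a missing value, and I use my own variable names from the table as dictionary keys.

```python
def shares_by_missingness(precincts):
    """Pool votes separately for precincts with and without UNCOU1.

    Returns {"reported": {...}, "missing": {...}}, where each inner dict
    holds the pooled PAN and PBT shares of TOTVOTE, or None if the group
    has no votes at all.
    """
    pools = {
        "reported": {"PAN": 0, "PBT": 0, "TOTVOTE": 0},
        "missing": {"PAN": 0, "PBT": 0, "TOTVOTE": 0},
    }
    for p in precincts:
        group = "missing" if p["UNCOU1"] is None else "reported"
        for key in ("PAN", "PBT", "TOTVOTE"):
            pools[group][key] += p[key]

    result = {}
    for group, pool in pools.items():
        total = pool["TOTVOTE"]
        result[group] = {
            "PAN": pool["PAN"] / total if total else None,
            "PBT": pool["PBT"] / total if total else None,
        }
    return result

# Made-up records, NOT real PREP data.
sample = [
    {"PAN": 120, "PBT": 100, "TOTVOTE": 300, "UNCOU1": 5},
    {"PAN": 80, "PBT": 140, "TOTVOTE": 300, "UNCOU1": 8},
    {"PAN": 200, "PBT": 50, "TOTVOTE": 400, "UNCOU1": None},
]
shares = shares_by_missingness(sample)
print(shares)
```

A large difference between the two groups would not prove anything by itself, since non-reporting precincts may differ in region or in urban versus rural location, but it would say where to look next.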
Variable 26 tells whether or not the reporting unit is urban.
If I'm not mistaken, variable number 27 is the number of registered voters, important for reporting turnout and calculating whether low turnout helped one candidate more than another.
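If NOMINAL really is the number of registered voters, turnout is just total votes divided by it. Here is a rough sketch using the two column sums from the table. The helper name is mine, and the division guard is there because the table shows a minimum of 0 for NOMINAL. Keep in mind this is turnout within the PREP file only, not the official figure.

```python
def turnout(total_votes, registered):
    """Turnout as a fraction, or None when no registered voters are listed."""
    if not registered:
        return None
    return total_votes / registered

# Column sums from the table: TOTVOTE (number 25) and NOMINAL (number 27).
prep_totvote = 38516730
prep_nominal = 65402655

overall = turnout(prep_totvote, prep_nominal)
print(f"PREP-file turnout: {overall:.1%}")

# A precinct with NOMINAL = 0 gets None rather than a crash.
print(turnout(120, 0))
```

On these sums the PREP-file turnout comes out a little under 59 percent. The same function applied precinct by precinct would give the turnout figures needed to ask whether low turnout helped one candidate more than another.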
If I'm translating correctly, the final three variables are time stamps for when the package of tally sheets from each precinct was received at a district data-entry office, when it was keyed or scanned, and when it was tallied.
Thus most of the Mexican Presidential precinct data are online and available for many possible analyses of irregularities.
My next post in this thread (maybe mañana) will include such analyses. I'll start with an analysis of the political leanings of precincts that did not report key items such as the number of uncounted votes.