By now hundreds of statistically minded people are poring over Iranian election data. Here's part 1 of my analysis, which is also cross-posted at the Princeton Election Consortium.
Three general categories of data are currently available for validating the Iranian election: (1) pre-election polls, (2) statistical methods for analyzing standalone voting data, and (3) statistical comparisons with past elections. In this diary I examine pre-election polls, which, when aggregated under normal conditions, are very good measures of sentiment.
A simple look at pre-election polls leads to the following assessment: National Iranian polls were highly variable and of suspect quality. But within Tehran, polls were more uniform and allow a comparison. Six Tehran polls gave Moussavi a median lead of 4 points. This differs notably from the official tally for the city, which had Ahmadinejad ahead by 12 points. The 16-point discrepancy suggests an anomaly in Tehran and raises the question of whether fraud occurred there and elsewhere. However, it is also important to note several caveats, including polling uncertainty and possible shifts in opinion following the Ahmadinejad-Moussavi debate on June 3rd.
Details after the jump.
In the US Presidential elections of 2000, 2004, and 2008, election-eve opinion polls were remarkably accurate predictors of the electoral outcome. In each case, aggregated, unadjusted polling data successfully identified key states (Florida, Ohio, and Pennsylvania) and came quite close to the final outcome. It's a testament to the power of polls.
The Iranian election presents a harder case. Polling is sparse, professional standards of reporting polls are absent, and respondents are potentially unwilling to answer questions or hard to reach. Still, let's look at the publicly available polls.
National polls are all over the place, even if we only take data after the Ahmadinejad-Moussavi debate on June 3rd, potentially a major decision point for Iranians. One post-June 3rd poll shows Ahmadinejad +16% (47% to 31%). The other shows Moussavi +32% (23% to 54-57%). Earlier polls range from Ahmadinejad +33% to Moussavi +30%. These data are so variable that they are unusable.
Polls within Tehran may be a better source of data. This is plausible because urban areas are easier to sample. The last three Tehran surveys (before June 8th, before June 7th, and June 3rd) show Moussavi +4%, Ahmadinejad +8%, and Moussavi +17%. Before that, three surveys showed Moussavi +12%, +4%, and +2% (May 27, May 26, May 14). The medians are:
Last 3 polls: Moussavi +4 +/- 7% (median +/- MAD-based SEM).
All 6 polls: Moussavi +4 +/- 4%.
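For readers who want to check the summary figures, here is a minimal Python sketch of the median and a MAD-based spread for the six Tehran margins. The sign convention (positive means a Moussavi lead), the function name `median_and_mad_sem`, and the choice to divide the raw MAD by the square root of the number of polls are assumptions made for illustration. Other MAD conventions (for example, scaling by 1.4826) give different spreads, so treat the printed uncertainties as rough and not as a reproduction of the figures quoted here.

```python
from math import sqrt
from statistics import median

# Tehran poll margins in points; positive = Moussavi lead (assumed sign convention).
# Ordered most recent first: the three latest surveys, then the three earlier ones.
tehran_margins = [4, -8, 17, 12, 4, 2]

def median_and_mad_sem(margins, scale=1.0):
    """Return (median, MAD-based standard error) for a list of margins.

    scale=1.0 uses the raw median absolute deviation; scale=1.4826 would
    make it comparable to a standard deviation under normality.
    Both are illustrative choices, not a confirmed procedure.
    """
    center = median(margins)
    mad = median(abs(m - center) for m in margins)
    return center, scale * mad / sqrt(len(margins))

last3 = median_and_mad_sem(tehran_margins[:3])
all6 = median_and_mad_sem(tehran_margins)
print(f"Last 3 polls: Moussavi {last3[0]:+.0f} +/- {last3[1]:.1f}")
print(f"All 6 polls:  Moussavi {all6[0]:+.0f} +/- {all6[1]:.1f}")
```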
The announced official result was Ahmadinejad +12% (51.6% to 39.4%), a discrepancy of 16 points. When all 6 polls are used, this discrepancy is highly significant (p=0.003).
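To make the significance claim concrete, the following sketch compares the poll median with the official margin using a one-sided Student's t calculation. Everything about the procedure is an assumption for illustration: the sign convention (positive means a Moussavi lead), the use of the ordinary sample standard deviation in place of a MAD-based spread, the n - 1 degrees of freedom, the one-sided test, and the helper names `t_upper_tail` and `one_sided_p`. Under those assumptions the results land in the neighborhood of the quoted p-values, but small differences are to be expected.

```python
import math
from statistics import median, stdev

# Tehran poll margins in points; positive = Moussavi lead (assumed sign convention).
tehran_margins = [4, -8, 17, 12, 4, 2]   # most recent first
official_margin = -12                    # official Tehran result: Ahmadinejad +12

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0.

    Integrates the density from 0 to t with Simpson's rule (steps must be
    even) and subtracts the result from 0.5.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def one_sided_p(margins, official):
    """Return (t statistic, one-sided p) for poll median vs. official margin."""
    n = len(margins)
    sem = stdev(margins) / math.sqrt(n)          # SD-based standard error (assumed)
    t = (median(margins) - official) / sem       # discrepancy in standard-error units
    return t, t_upper_tail(t, n - 1)

t6, p6 = one_sided_p(tehran_margins, official_margin)
t3, p3 = one_sided_p(tehran_margins[:3], official_margin)
print(f"All 6 polls: t = {t6:.2f}, one-sided p = {p6:.4f}")
print(f"Last 3 polls: t = {t3:.2f}, one-sided p = {p3:.3f}")
```

With only three polls the standard error is large and there are just two degrees of freedom, which is why the same 16-point gap looks far less decisive in the smaller sample.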
For now, my interpretation is that the official returns in Tehran are unbelievable. However, I can think of two alternative explanations.
(1) Ahmadinejad really mopped the floor with Moussavi in the debate. The experience in U.S. elections is that debates provide a side-by-side comparison that can shift opinion substantially (for a famous example see Carter-Reagan 1980). In the case of Iran 2009 there are only 2 or 3 post-debate polls. A comparison using just 3 polls does not quite reach statistical significance (p=0.07).
(2) Tehran polls have a systematic overall pro-Moussavi bias that prevents a direct comparison with vote counts. For example, as pointed out by David Shor, polls might have been restricted to the actual city of Tehran, which is not all of Tehran province.
I should emphasize that Tehran is not representative of the entire nation. It is notably more pro-Moussavi, which can account in part for the public anger there. In fact, if the 16-point discrepancy were corrected nationwide it would still not be enough to alter the overall outcome.
Iranians and other knowledgeable observers, please comment.