Stochastic Democracy explains the difference between House Effects (Bias) and Design Effects (PIE), the problems with Nate Silver's Pollster Ratings system, and a new, more accurate way to estimate both in real time.
One of the key findings is that in election horse-race polls, PIE does not differ significantly among pollsters, while House Effects (Bias) are more pronounced than previously thought.
Estimated Pollster Introduced Error vs Number of Polls in Obama Approval Database
See graphs, House-Effect Tables, and more, below the fold or at StochasticDemocracy.com
****Cross-Posted at StochasticDemocracy.com*****
Nate Silver's new Pollster Rankings, released last week, have proven influential, leading to the dismissal of R2K by DailyKos and prompting a mini-feud between Pollster.com and FiveThirtyEight.
Nate's methodology is fairly complex, but at heart he judges pollsters by how close their last poll was to the election result. This seems intuitively fair, but mathematically it poses severe problems for what he is trying to do.
Arithmetically, a poll before election day can be expressed as Poll = Opinion + Bias + Sampling Error, so its absolute error from the election result can be expressed as |Opinion + Bias + Sampling Error - Result| = |Last Minute Movement + Bias + Sampling Error|[1].
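Spelled out step by step, the identity looks like this. It is only a restatement of the line above; the one assumption is the sign convention, taking Last Minute Movement to mean Opinion on the poll date minus the final Result.

```latex
\begin{align*}
\text{Poll} &= \text{Opinion} + \text{Bias} + \text{Sampling Error} \\
|\text{Poll} - \text{Result}|
  &= |(\text{Opinion} - \text{Result}) + \text{Bias} + \text{Sampling Error}| \\
  &= |\text{Last Minute Movement} + \text{Bias} + \text{Sampling Error}|
\end{align*}
```

All three terms sit inside one absolute value, so a single end-of-cycle error cannot tell them apart.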
The problem arises from trying to study Sampling Error through a statistic that is dominated by last minute movement and Pollster Bias. Because last minute movement and Pollster Bias vary considerably between elections, adjusting for them would be very difficult. Even worse, this method throws away the data from the vast majority of polls, which come earlier in the cycle. There exists a better way!
In order to understand how to proceed, it's important to stress the difference between House Effects (Bias) and Design Effects (PIE) and their implications for forecasting. It comes down to the difference between accuracy and precision.
Consider a hypothetical pollster with an absurdly large Design Effect but a negligible House Effect. Let's call it Zogby Interactive[2].
Zogby performs just as well on average as an ideal pollster, but its sampling error is much higher than expected. By comparing the variance of the poll series with that of public opinion, we can estimate excess sampling error. Note that Zogby Interactive polls provide very little information about a race.
Now consider another hypothetical pollster with a pretty small design effect but a large Pro-Republican Bias named Rasmussen.
Rasmussen polls follow Public Opinion almost perfectly, but with a pro-Republican Bias. By measuring the difference from public opinion, we can estimate House Effects. Note that once Bias is estimated and accounted for, Rasmussen polls provide quite a bit of information about a race.
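The two hypothetical pollsters can be simulated to make the accuracy-versus-precision distinction concrete. The sketch below is illustrative only: it assumes true opinion is known and flat at 50, which is never the case in practice, and every number in it (sample size, a 4-point bias, a PIE of 3) is made up. PIE is treated here as a ratio of standard deviations, matching the PIE^2 variance term in footnote [5].

```python
import random
import statistics

def simulate_polls(opinion, bias, pie, n_respondents, rng):
    """Draw one poll per day: opinion + bias + inflated sampling noise (in points)."""
    polls = []
    for p in opinion:
        ideal_sd = (p * (100 - p) / n_respondents) ** 0.5
        polls.append(rng.gauss(p + bias, ideal_sd * pie))
    return polls

def estimate_bias_and_pie(polls, opinion, n_respondents):
    """Bias: mean residual. PIE: residual SD relative to the ideal sampling SD."""
    residuals = [y - p for y, p in zip(polls, opinion)]
    bias_hat = statistics.mean(residuals)
    ideal_sd = statistics.mean(
        (p * (100 - p) / n_respondents) ** 0.5 for p in opinion
    )
    pie_hat = statistics.stdev(residuals) / ideal_sd
    return bias_hat, pie_hat

rng = random.Random(0)
opinion = [50.0] * 2000  # assumed known and flat, purely for illustration

# "Zogby-like": no bias, very noisy. "Rasmussen-like": biased, but precise.
noisy = simulate_polls(opinion, bias=0.0, pie=3.0, n_respondents=1000, rng=rng)
biased = simulate_polls(opinion, bias=-4.0, pie=1.0, n_respondents=1000, rng=rng)

noisy_bias, noisy_pie = estimate_bias_and_pie(noisy, opinion, 1000)
biased_bias, biased_pie = estimate_bias_and_pie(biased, opinion, 1000)
print(f"noisy pollster:  bias {noisy_bias:+.2f}, PIE {noisy_pie:.2f}")
print(f"biased pollster: bias {biased_bias:+.2f}, PIE {biased_pie:.2f}")
```

Subtracting the estimated bias makes the second pollster nearly as good as an ideal one, while no comparable correction rescues the first.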
Nate Silver has noted both effects, and has tried to estimate House Effects separately in the past. But Opinion, House Effects, and Design Effects all require each other to estimate, and so any accurate model should consider them jointly.
In order to do so, we've set up a Bayesian Dynamic Linear Model and estimated it with MCMC [5]. The model is pretty simple: it is a slight generalization of the site's previous methodology, which has performed pretty well [3]:
We assume that public opinion follows a simple random walk[4]. But public opinion isn't directly available. All we have are noisy and biased polls.
A Dynamic Linear Model, where x(t) represents public opinion and y(t) represents observed polls; arrows indicate conditional probabilities. See footnote [5] for details.
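To give a feel for how such a model pulls opinion out of noisy, biased polls, here is a minimal sketch that swaps in a plain Kalman filter for the MCMC estimation described above. It is not the model used on this site: it treats each pollster's Bias and PIE as known, whereas the real model estimates them jointly with opinion. The function name `kalman_filter` and all simulated numbers are hypothetical.

```python
import random

def kalman_filter(days, q, x0=50.0, p0=100.0):
    """Filter a random-walk opinion series.

    days: one list per day of (poll, bias, variance) tuples.
    q: variance of the daily random-walk step e(t).
    Returns the filtered mean and variance of opinion for each day.
    """
    x, p = x0, p0
    means, variances = [], []
    for polls in days:
        p += q                          # predict: Opinion(t) = Opinion(t-1) + e(t)
        for y, bias, r in polls:        # update once per poll fielded that day
            gain = p / (p + r)
            x += gain * (y - bias - x)  # remove the house effect before updating
            p *= 1 - gain
        means.append(x)
        variances.append(p)
    return means, variances

# Simulated data; every number below is a hypothetical choice.
rng = random.Random(1)
q, bias, pie, n = 0.25 ** 2, 3.0, 1.25, 1000
truth, days, x = [], [], 50.0
for _ in range(200):
    x += rng.gauss(0.0, q ** 0.5)
    sd = (x * (100 - x) / n) ** 0.5 * pie
    y = rng.gauss(x + bias, sd)
    # The filter does not know x, so it plugs the poll itself into the variance.
    r = y * (100 - y) / n * pie ** 2
    truth.append(x)
    days.append([(y, bias, r)])

means, variances = kalman_filter(days, q)

def rmse(estimates):
    """Root mean squared error against the simulated true opinion."""
    return (sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)) ** 0.5

rmse_raw = rmse([polls[0][0] - bias for polls in days])
rmse_filtered = rmse(means)
print(f"bias-corrected polls RMSE: {rmse_raw:.2f}, filtered RMSE: {rmse_filtered:.2f}")
```

The filtered series tracks simulated opinion more closely than the bias-corrected polls do on their own, because each day's estimate borrows strength from the days before it.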
Actual Estimates:
One of our system's best features is that it's a fully Bayesian probability model, allowing us to derive joint posterior distributions of all of our parameters.
Because of this, we can judge the primary debate between Mark at Pollster.com and Nate at FiveThirtyEight. Do Design Effects (PIE) vary significantly among Pollsters?
Short Answer: No
In order to check this, we ran our model on this cycle's Generic Ballot Polling, setting very weakly informative priors for each Pollster's PIE, without assuming any relationship between them.
Estimated Pollster PIE vs Number of polls in the Generic Ballot database.
The issue here is that PIE is a ratio strictly greater than one, and so greater uncertainty will drive the mean estimate in only one direction. On Andrew Gelman's suggestion, I imposed a Hierarchical model on the priors in order to account for this, leading to the following estimates:
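The one-directional pull is easy to see in a toy calculation. The sketch below assumes, purely for illustration, that a pollster's PIE posterior looks like a normal distribution centered at 1.25 and cut off below 1; it then computes the posterior mean in closed form for increasing amounts of uncertainty. It illustrates the problem only and does not implement the hierarchical fix.

```python
import math

def truncated_normal_mean(mu, sd, lower=1.0):
    """Mean of a Normal(mu, sd) restricted to values >= lower."""
    a = (lower - mu) / sd
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)
    upper_tail = 0.5 * math.erfc(a / math.sqrt(2))  # P(Z >= a)
    return mu + sd * pdf / upper_tail

# Same center, growing uncertainty (as for pollsters with fewer polls).
spreads = (0.05, 0.25, 0.5, 1.0)
means = [truncated_normal_mean(1.25, sd) for sd in spreads]
for sd, m in zip(spreads, means):
    print(f"sd = {sd:.2f} -> mean PIE = {m:.3f}")
```

The center never moves, yet the mean climbs as the spread grows, which is why pollsters with few polls can look worse than they are when each PIE is estimated in isolation.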
Corrected PIE vs number of Polls in Generic Ballot
There's little reason to believe from this graph that Pollster Introduced Error in the Generic Ballot varies at all. The difference between even the best and worst pollster is not statistically distinguishable from zero.
It seems that pollsters perform about 25% worse than would be predicted by ideal random sampling, which would add less than a point of error. This is in agreement with previous research, but directly contradicts Nate's findings that polls have an added error of 3-4 points.
The discrepancy occurs because Nate's estimation method conflates House Effects with PIE. This inflates his PIE estimates beyond their actual size.
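As a quick check on the "less than a point" figure, here is the arithmetic for a hypothetical 1,000-person poll with a candidate at 50%. It assumes that "25% worse" means a standard error 1.25 times the ideal one; the sample size and the share are made-up inputs.

```python
import math

def sampling_sd(share, n):
    """Standard error, in points, of a share (0-100) under simple random sampling."""
    return math.sqrt(share * (100 - share) / n)

ideal = sampling_sd(50.0, 1000)  # hypothetical 1,000-person poll at 50%
actual = 1.25 * ideal            # 25% worse than ideal random sampling
added = actual - ideal
print(f"ideal {ideal:.2f}, actual {actual:.2f}, added {added:.2f} points")
```

Under these assumptions the extra error is well under half a point. Reading "25% worse" as a ratio of variances instead would make it smaller still.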
Preliminary House Effects by Pollster, with standard errors. Pollsters in bold have effects that are statistically significantly different from zero. [6]
The Exception that Proves the Rule:
While the graphs were from the Generic Ballot Polling of this cycle, the results hold for other national and state level races from 2008, 2009, and 2010. It seems that for election horse-race polling, accuracy does not vary significantly among Pollsters [7]. But polling for Obama Approval seems to have much higher Design Effects and a little more differentiation between Pollsters.
Estimated Pollster Introduced Error vs Number of Polls in Obama Approval Database
It's not clear why polling for Obama Approval is more unreliable than polling for elections. It could be that because there is no accountability in the form of an election result, discipline and standards are laxer than otherwise.
What Comes Next:
Preliminary Graph of Obama Job Approval with 95% confidence intervals, adjusted for House Effects and PIE.
Over the next couple of days, we plan to release an election forecasting model for House, Senate and Governor races, and to introduce a house-effect and PIE adjusted tracker for the Generic Ballot and Obama Job Approval. This is to be done in collaboration with Professor Wang at the Princeton Election Consortium and a soon-to-be-named major polling website. Expect more in the coming days.
Footnotes
[1] - 538 looks at all polls from 21 days before the election to try to correct for last minute mean reversion. Last minute movement should increase as we move further from election day, potentially creating another confounding factor.
[2] - This gives Zogby Interactive too much credit: they have a very large house effect as well as a design effect.
[3] - This is based on the previous work of Simon Jackman of Stanford and Mark Pickup at Oxford and the UK PoliticsHome Poll team, both of whom generously made code available.
[4] - This is a technical footnote, but there is a good deal of evidence that public opinion of major parties in Western Democracies does indeed follow a simple random walk, as opposed to one subject to "momentum" and "mean-reversion".
[5] - That is to say,
Opinion(t) = Opinion(t-1) + e(t), where e(t) is a random variable with mean 0.
Meanwhile, a poll with n respondents on day t is a normally distributed random variable with mean Opinion(t) + Bias and variance [Opinion(t)*(100-Opinion(t))/n]*PIE^2, where PIE denotes Pollster Introduced Error, or a Design Effect.
Under this specification, PIE can be interpreted as the ratio of the actual sampling standard deviation to the one predicted by perfect random sampling; PIE^2 is the corresponding ratio of variances.
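In compact notation, using x(t) and y(t) as in the diagram, the two equations read as follows. The pollster index j and the poll index i are added here only for clarity and are not part of the original notation.

```latex
\begin{align*}
x_t &= x_{t-1} + e_t, \qquad \mathbb{E}[e_t] = 0 \\
y_{i,t} &\sim \mathcal{N}\!\left( x_t + \text{Bias}_j,\;
  \frac{x_t \,(100 - x_t)}{n_i}\, \text{PIE}_j^{\,2} \right)
\end{align*}
```

Here n_i is the number of respondents in poll i, fielded by pollster j on day t.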
[6] - House Effects seem to vary a bit from race to race. We're planning to produce more precise House Effect estimates by pooling parameters from different races using a further multi-level model.
[7] - Except Zogby Interactive. Zogby Interactive did not show up in the graphs because they have not fielded any polls in the Generic Ballot. But their PIE in other races has been flagged as an outlier, with a much higher Design Effect than any other Pollster.