On August 4th, Nate Silver published a post at 538 using conventional protocols, i.e., he made a tiny URL and tweeted the new post.
But for some reason, the post was almost instantly retracted and replaced with a different one that didn't match the old tiny URL.
Nate said this:
"An earlier post in this space about poll oversampling was published in error and will be updated and published later this week."

But that was the 4th, THREE weeks ago.
So where is the oversampling post?
Well I managed to find a copy, because the internet is forevah.
And you can read it below the fold. But why would Nate pull such an innocuous, innocent post? Because it utterly debunks the screams of "dem oversampling" coming from conservative pundits.
FiveThirtyEight: Why Charges of Poll ‘Oversampling’ Are Usually Misguided

Why would someone want to suppress this excellent and sensible post?
08.05.12 | New York Times
It's a quadrennial tradition for partisans and poll-watchers to complain about the number of Democrats, Republicans and Independents that are included in each survey.
Recently, perhaps because Mitt Romney still narrowly trails President Obama in most state and national surveys, we have seen a bit more of this from conservatives. They will sometimes allege that these polls are oversampling Democrats, including too many of them in their surveys, and perhaps biasing their results toward Mr. Obama because of this.
There are, to be sure, elements of truth in these critiques. It is certainly the case that some polling firms consistently show more favorable results for Democrats or Republicans than the consensus of polling firms. We call these house effects, and our forecast model adjusts for them; if a polling firm is consistently 2 points more Democratic-leaning than the consensus, we strip most of that right back out.
It is easier, of course, to identify these cases after the fact. Beforehand, the best you can usually do is to acknowledge that there is some possibility of their occurrence. Even in the waning days of an election, when we have surveys from dozens of polling firms that collected tens of thousands of interviews between them, their biases will not necessarily cancel out, and the error in the surveys may considerably exceed that from sampling error alone.
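The house-effect adjustment described above can be illustrated with a toy calculation. This is NOT FiveThirtyEight's actual model; the firms, poll numbers, the 2-point offset, and the 90% discount factor are all invented for illustration. The only idea taken from the post is subtracting most of a firm's estimated lean before averaging:

```python
# Toy sketch of a house-effect adjustment (illustrative numbers only,
# not FiveThirtyEight's actual model).

# Each poll: (firm, Obama share, Romney share), in percentage points.
polls = [
    ("Firm A", 49.0, 46.0),
    ("Firm B", 51.0, 44.0),  # suppose Firm B runs 2 points more Democratic
    ("Firm C", 48.0, 47.0),
]

# Hypothetical estimated house effects (positive = Democratic lean),
# measured against the consensus of all firms.
house_effects = {"Firm A": 0.0, "Firm B": 2.0, "Firm C": 0.0}
DISCOUNT = 0.9  # strip MOST of the estimated house effect, not all of it

adjusted_margins = []
for firm, dem, rep in polls:
    margin = dem - rep                       # positive = Democratic lead
    margin -= DISCOUNT * house_effects[firm] # remove the firm's lean
    adjusted_margins.append(margin)

average = sum(adjusted_margins) / len(adjusted_margins)
print(round(average, 2))
```

Here Firm B's raw 7-point margin is pulled down to 5.2 before averaging, so one Democratic-leaning firm doesn't drag the whole average with it.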
Still, I think the charges of oversampling mostly miss the point. Let me make 13 relatively brief but interrelated points that explain my philosophy on this issue, and where I see the theoretical and empirical evidence as guiding the debate.
1. Be careful if you see the term oversampled. It is probably being used incorrectly. In blogs, the term oversampled has come to be a shorthand for a poll that includes too many Democrats or Republicans. But that's not quite the way that pollsters use the term.
Instead, an oversample is a deliberate effort to include more of a certain population in a survey to permit more robust analysis of a particular demographic subgroup.
For example, say that a polling firm wants to study the views of Latino voters in more detail at the same time that it is conducting a national survey. Its initial survey of 900 adults may include about 150 Hispanics, roughly their share of the United States population, which is not really enough to analyze with much accuracy because of the high margin of error associated with a 150-person subsample. So the polling firm would take an oversample until it got a total of 450 Hispanics on the phone, creating a respectable sample size. Then it might be able to report, say, how Hispanic voters' preferences would be affected by the presence of Senator Marco Rubio on the Republican ticket.
Knowing that it has interviewed too many Hispanics, the polling firm would then down-weight the Hispanic voters when it rolled them back into its national survey and reported the results from all United States adults. In this example, they would reduce the weight associated with each Hispanic voter by two-thirds, since they interviewed 450 when there should be 150 based on their share of the U.S. population. This technique permits the pollster something of the best of both worlds: it can have a more robust analysis of hard-to-poll demographic subgroups without skewing the overall sample.
2. Be even more careful when you see terms like skewed or biased.
3. Party identification is not a hard-and-fast characteristic, as other demographic characteristics are.
4. Partisan identification measures are affected by sampling error.
5. Partisan identification is not the same thing as partisan registration.
6. There are many different ways of measuring and asking about party identification.
7. Polls of registered voters, or all adults, typically show a more favorable party identification spread for Democrats.
8. If you are going to scrutinize polls based on their partisan identification, do so equally.
9. Weighting by party identification puts the cart before the horse.
10. There is no absolute standard to measure party identification against, only other polls.
11. Taking a poll average, especially with adjustments for house effects, is usually a more elegant solution to the problem.
12. Pay relatively more attention to party identification when you have fewer polls.
13. There has not been any long-term bias in the polling average toward Democratic or Republican candidates.
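The down-weighting arithmetic in point 1 (the 900-adult survey with a Hispanic oversample) can be sketched in a few lines. The 900, 150, and 450 figures come from the post itself; the weighting formula is just the standard population-share calculation, not any particular firm's implementation:

```python
# Sketch of the oversample down-weighting from point 1 (figures from the
# post; the weighting scheme is the textbook one, not a specific firm's).

total_sample = 900
hispanic_share = 150 / 900   # roughly their share of the U.S. population
hispanic_interviewed = 450   # after the deliberate oversample

# Expected count if Hispanics appeared at their population share.
expected_hispanic = round(hispanic_share * total_sample)  # 150

# Down-weight each oversampled respondent so the group's effective size
# matches its population share: 150 expected / 450 interviewed = 1/3,
# i.e. each Hispanic respondent's weight is reduced by two-thirds.
hispanic_weight = expected_hispanic / hispanic_interviewed
print(round(hispanic_weight, 3))

# Effective contribution of Hispanic respondents to the national topline
# is back to 150, so the overall sample is not skewed.
effective_hispanic_n = hispanic_interviewed * hispanic_weight
print(round(effective_hispanic_n))
```

This is the "best of both worlds" the post describes: the 450 interviews support detailed subgroup analysis, while the weight of 1/3 keeps the national topline honest.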
Because if there is no "dem oversampling," then Obama has even more of a lock on re-election than it appears. It also means there are simply more Democrats in all the samples, except, of course, Rasmussen's. And most important of all, it means the horserace is over before it begins.