I was listening to a(nother) talking head on NPR (or are those talking ears?), and I was struck by the thought that there may be a very serious flaw in current polling methodology. I infer from the way the polls are presented that the concentration on likely voters skews the results in favor of John McCain. I'm not saying this is any sort of conspiracy by the "liberal media" or the corporate pollsters, but I think the practice inadvertently creates faulty models. My understanding is that "likely voters" means people who have consistently voted in the past, but have not yet voted.
And there's the rub.
With the explosion of early voting, a lot of actual voters, who I expect fall into the category of likely, are not included in the data. So I did a little pseudo-statistical analysis of one contested state to see what error may have resulted. Now, this is pseudo- because I don't have all the necessary data, nor am I going to spend a couple of days searching for it. If someone at dkos has more complete info they can extend this analysis. Heck, if they want to send me the data I might give it a try.
I searched for info on swing state early voting, and North Carolina had the best info I could find with the minimum effort (the single most important analysis criterion). Don't ask for a bibliography. I didn't record my sources, but they're all from reputable groups like CNN, the NYT, large NC newspapers, etc. Nothing from any advocacy groups or anything less than well-regarded independent sources (like Pew, etc.).
Here's an approximated outline of the basics in North Carolina:
Polls are running about 52-46 for Obama
Early Voting is estimated at 30% of eligible voters
30% of early voters are African American, but 30% of NC population is too, so that doesn't skew
30% of voters have already cast their ballot
Early voters have been estimated approximately 2:1 for Obama
Here's the pseudo-analysis:
Assume the 52-46 split is extensible to all voters
Assume 30% equals 1/3 (it's just easier, and I compensate with the undecideds)
2/3 of the 1/3 who already voted went for Obama. Weighting Obama's 52 double against McCain's 48 (which gives McCain all of the undecideds), the early vote ends up 68-32 for Obama: (52*2)/(52*2+48) = 104/152 = .68
52% of the remaining 2/3 vote Obama, undecideds go for McCain. Obama 52-48.
Using this model, Obama polls 57%: (.68*1/3)+(.52*2/3) = .227+.347 = .57
So if the early voters are excluded from the polling, Obama's lead in North Carolina is not 52 to 46 but actually 57 to 43, even giving McCain all the undecideds.
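For anyone who wants to rerun the arithmetic with different numbers, here is a minimal Python sketch of the same back-of-the-envelope model. The function name and parameter names are made up for illustration, and the defaults are just the rough North Carolina figures listed above (52% for Obama, undecideds to McCain, 1/3 already voted, early voters weighted 2:1). Treat it as a restatement of the pseudo-analysis, not a real turnout model.

```python
def blended_obama_share(poll_obama=52.0, early_fraction=1/3, early_ratio=2.0):
    """Blend early-vote and remaining-vote shares for Obama.

    Assumptions (all taken from the rough figures in the post):
      - poll_obama: Obama's percentage among polled likely voters
      - all undecideds go to McCain, so McCain gets 100 - poll_obama
      - early_fraction: share of voters who have already voted
      - early_ratio: how heavily Obama's share is weighted among
        early voters relative to McCain's (2.0 means "2:1")
    """
    poll_mccain = 100.0 - poll_obama  # McCain plus every undecided

    # Early voters: weight Obama's poll number by early_ratio, then renormalize.
    weighted_obama = poll_obama * early_ratio
    early_obama = weighted_obama / (weighted_obama + poll_mccain)

    # Voters who have not voted yet: assume the poll split holds as-is.
    remaining_obama = poll_obama / 100.0

    # Weighted average of the two groups.
    overall = early_obama * early_fraction + remaining_obama * (1.0 - early_fraction)
    return early_obama, remaining_obama, overall


early, remaining, overall = blended_obama_share()
print(f"Early voters for Obama:     {early:.1%}")
print(f"Remaining voters for Obama: {remaining:.1%}")
print(f"Blended Obama share:        {overall:.1%}")
```

With the defaults this lands at roughly 68% among early voters and roughly 57% overall, matching the hand calculation. Passing early_fraction=0.30 instead of 1/3 barely moves the blended figure, which is why the 30%-equals-1/3 shortcut is harmless here.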
Remember, I call this a pseudo-analysis because I have very limited information. And I'm not a statistician, but I play one on TV. No, wait. I am a statistician, which is why this bothered me in the first place.
I wonder how this translates in the other states that have early voting?