Back in March, I wrote about how the polls this cycle are wrong, just as they were wrong in 2022. Perhaps out of some foolish hope that the polls would self-correct as we entered the middle stretch of the election season, I procrastinated on providing an update. It is now clear that the problem with the polls is systemic and will not correct itself this election cycle.
Random Error, Systemic Error, and The Failings of Data Journalism
Before we get into how we know the polls are (still) very wrong, and what we should do, I will go over the concept of random error and systemic error. If you’re familiar with these concepts already, please feel free to skip this part of the diary.
Let’s say you have a bathroom scale. As many of us know, bathroom scales aren’t terribly accurate; the weight you get on your bathroom scale will be different than the much more accurate scale in the doctor’s office, which we will assume for the purpose of this argument represents your true weight. If the bathroom scale measures a lower weight than the doctor’s office scale, it doesn’t mean that we just get to believe the bathroom scale (as tempting as it may be) — it just means that our bathroom scale has error. Bathroom scales can be pretty inexpensive devices, so this sort of error is expected. If the bathroom scale can weigh lower, or (FSM forbid) higher, than the doctor’s office scale, we say that the bathroom scale error is random.
Data journalism comes in and says: if you were to buy like 10 bathroom scales, weigh yourself on each of them, and then average the results, you could get a very accurate result equal to that of the doctor’s office. Because the error is random, the errors high and low would cancel out, leaving you with a far more accurate answer than if you had just used one scale.
But what happens if the bathroom scale manufacturer uses a component that always gives you a lower weight than your true weight? I’d say that’s nice, but this is systemic error. You can’t average out systemic error. If you had 10 bathroom scales, and they all read low, you’d have a very precise but ultimately inaccurate weight.
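To make the bathroom scale example concrete, here is a minimal simulation sketch. The true weight, the size of the noise, and the 5-pound low bias are all made-up numbers chosen for illustration, and the function name weigh is my own. It only shows the general idea: averaging shrinks random error, but a shared bias passes straight through to the average.

```python
import random
import statistics

def weigh(true_weight, bias, noise_sd, n_scales, rng):
    """Simulate n_scales readings: true weight + shared bias + independent random noise."""
    return [true_weight + bias + rng.gauss(0, noise_sd) for _ in range(n_scales)]

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_WEIGHT = 180.0      # hypothetical "doctor's office" weight
NOISE_SD = 3.0           # hypothetical random error of a cheap scale
N_SCALES = 10

# Case 1: random error only. Highs and lows tend to cancel in the average.
random_only = weigh(TRUE_WEIGHT, 0.0, NOISE_SD, N_SCALES, rng)

# Case 2: every scale shares a component that reads 5 pounds low (systemic error).
biased = weigh(TRUE_WEIGHT, -5.0, NOISE_SD, N_SCALES, rng)

print("true weight:           ", TRUE_WEIGHT)
print("average, random error: ", round(statistics.mean(random_only), 1))
print("average, systemic bias:", round(statistics.mean(biased), 1))
```

Buying more scales tightens the average in both cases, but in the second case it tightens around the wrong number.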
Now data journalism isn’t completely hapless when it comes to systemic error. For several cycles now, models like 538 have attempted to correct for systemic error through what the polling industry calls house effect. But house effect is backward-looking, meaning it has to be observed in a previous cycle before it can be applied to this cycle, never mind whether the polling firm has changed its methodology in the meantime. And data journalism also makes the very big assumption that pollsters are all operating in good faith.
Republicans have started flooding the zone with garbage pollsters to purposefully and deliberately skew the aggregators, as I noted in Part 2 of my autopsy of the 2022 midterm election, Anatomy of a Polling Failure: Part 2 (Garbage In, Garbage Out).
So, in conclusion, if pollsters were exhibiting random error, we could return to 2012 and feel data journalism was giving us a reasonably good prediction of where the race would be if the election were held today. However, systemic error cannot be averaged out.
How do we know the error is systemic?
Good question!
First, a bit of history. Very early in Biden’s presidency, a number of unusual and counterintuitive trends were observed in his approval polls. Specifically, Biden’s approval was lower, and preference for Trump was higher, among younger and nonwhite voters. This approval result is completely counter to the outcome of any recent election, including Biden’s against Trump in 2020.
Let’s look at Daily Kos’s own Civiqs as a convenient example. Approval for Joe Biden is relatively higher (and not much changed) among 65+ respondents than among the youngest age cohort of 18-34. The difference in approval is significant, and it runs opposite to the results of every recent election: in 2020, Biden won young(1) voters 62-35% but just barely lost elderly voters to Trump, 48%-51%. We see similar patterns with black and Hispanic voters, where Joe Biden pulls abysmal approval from these groups relative to their actual voting patterns.
So Biden is truly upside down in the polls, with the demographics that voted for him most heavily approving of him least.
“But approval isn’t voting”, you say!
Correct! But the same upside down pattern has been appearing in polls of Democrat versus Republican. We saw this in 2022, as I pointed out in my series of diaries discussing the failure of the red wave narrative.
And now, let’s go to the 2024 polls, in particular the swing state polls making all the noise the past couple of months. Please read the full article from Adam Carlson at Split Ticket, but basically, he analyzed all the crosstabs from national polls from the past 6 months.
Biden is basically at the same level of white support as he was in 2020, when he beat Trump by 74 electoral votes. Polls show only a 2% drop-off among white voters. But look at the discrepancy among black and Hispanic voters. Biden won black voters by 83 points in 2020, but the polling averages consistently show him winning by only 62 points. Likewise, Biden won Hispanic voters by 25 points, but polls show him winning by only 10. These gaps will have a huge impact on the polls, especially in states with higher nonwhite populations. This is how you get poll results where Biden is even in Wisconsin but behind by almost double digits in Nevada.
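A rough back-of-the-envelope sketch of why the same crosstab gaps hit some states harder than others. The per-group shifts come from the figures quoted above (I am treating all three as changes in margin, in points, which is my assumption). The two electorates are entirely hypothetical and do not represent any real state; any group not listed is assumed unchanged.

```python
# Change in Biden's margin within each group, polling average minus 2020 result,
# in percentage points, using the figures quoted in the text.
MARGIN_SHIFT = {
    "white": -2.0,            # assumption: the 2% drop-off is a drop in margin
    "black": 62.0 - 83.0,     # -21 points
    "hispanic": 10.0 - 25.0,  # -15 points
}

def topline_shift(shares):
    """Weighted sum of group shifts; shares are fractions of the electorate.

    Groups missing from `shares` are treated as having no shift.
    """
    return sum(share * MARGIN_SHIFT[group] for group, share in shares.items())

# HYPOTHETICAL electorates, for illustration only (not real state data).
mostly_white = {"white": 0.85, "black": 0.05, "hispanic": 0.05}
more_diverse = {"white": 0.65, "black": 0.10, "hispanic": 0.20}

print("mostly white electorate:", round(topline_shift(mostly_white), 1), "points")
print("more diverse electorate:", round(topline_shift(more_diverse), 1), "points")
```

With these made-up shares, the more diverse electorate shows nearly double the topline shift, even though the per-group numbers are identical. If the nonwhite crosstabs are wrong, the error is concentrated exactly where those voters are.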
An even more refined breakdown of crosstabs is provided by Adam Carlson:
These show a few (real) potentially worrying trends for Biden (i.e., independents), but absolutely astonishing, and almost certainly unreal levels of dropoff in youth, urban, black, and Hispanic support.
There are two possibilities:
- Black, Hispanic, and urban voters, especially those who are young, have abandoned Biden and the Democrats in the greatest racial realignment since the Civil Rights Era, or
- The polls are wrong.
So how do we know which is correct?
Well, if this dropoff in support were real, it would have shown up in special elections and the midterms. But it didn’t.
We just had another special election where the vaunted polling aggregators once again got it really, really wrong.
Folks, this is systemic polling error -- The polls are just wrong.
“But wait! There’s a third option: disapproval of Biden / preference for Trump is specific to Biden and Trump, meaning voters want Democrats and Trump.” 🤔
The media and data journalists really love this option, as it absolves pollsters and journalists of any malpractice. It’s also unfalsifiable until the 2024 election. And there are various flavors of this hypothesis floating around. But it is no different than Option 1. It would be the greatest realignment slash deviation from recent political trends in history … without any whiff of evidence to date.
Crosstab hating, unskewing, and what we can do
Data journalists really don’t like people picking apart the crosstabs. This is a little too much Wizard of Oz “pay no attention to the man behind the curtain.”
The theory is that since the individual error of crosstabs can be quite high, there is no point complaining that one subsample points too far one way, as it is just as statistically likely another subsample points the other way. Again, data journalists are confusing random error with systemic error. If the subsamples were randomly erroneous, then yes, they would cancel out. But systemic error cannot be averaged out. And we have enough evidence from the subsamples for the past 6 months that the error is systemic.
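For a sense of scale, here is a small sketch of the textbook margin-of-error formula applied to a single crosstab and to many crosstabs pooled together. The sample sizes are hypothetical, and the formula assumes simple random sampling with no weighting or design effects, which real polls do not satisfy exactly.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

# HYPOTHETICAL sizes: one poll's subsample of 100 respondents from a group,
# versus the same group pooled across 50 such polls.
single = margin_of_error(0.5, 100)
pooled = margin_of_error(0.5, 100 * 50)

print("single crosstab:  +/-", round(100 * single, 1), "points")
print("pooled crosstabs: +/-", round(100 * pooled, 1), "points")
```

Under these assumptions, one crosstab is noisy (roughly 10 points either way), which is the data journalists' complaint. But pooled over months of polls, the random part shrinks to a point or two. A gap much larger than that in the pooled numbers is not noise; it is either real or systemic.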
“Aren’t you unskewing the polls?”
No. Rightly pointing out systemic error is not unskewing.
Unskewing, for example, would be replacing crosstab subsamples with Biden’s actual 2020 performance among black and Hispanic voters to “correct” the polls. It might produce far more realistic results, but it would rely on the assumption that Biden will perform the same among black and Hispanic voters in 2024 as he did in 2020, which he may not. Substituting your own assumptions for the data is a huge no-no in sampling. For example, the infamous Romney unskewing involved taking 2012 polls and replacing them with 2010 turnout patterns.
So what can we do?
#1: Stop believing individual polls.
We have systemic error. There was maybe some false hope the systemic error would resolve itself as we got closer to the election, but when we have systemic error, the safe assumption is that it will be present for the whole cycle -- the polls are hopelessly erroneous and must be discarded en masse.
However, if you really are a pollercoaster enthusiast, you can look at the trend rather than absolute numbers. While the systemic error means the polls cannot tell you who is in the lead, you can probably still discern trends accurately. This is the rare presidential election between two incumbents.
If the state of the overall race does not tell you anything (since almost 20% will be allotted to either Trump or Biden by election day, meaning the above graph only gives you the candidate floors), you could also look at trends among aggregated white voter crosstabs, as for some reason, white voters do not appear to be exhibiting the same systemic error as nonwhite voters.
_____________________
(1) Please note that the age groups used to define young voters differ between exit polling and Civiqs. However, the conclusion remains unchanged.