Because I have difficulty reading endless speculation about things that might or might not happen, and the death of RBG has me studiously avoiding all the punditry about what comes next, I find myself with a need to fill up time. So, I decided to spend some time creating a real-life, real-time demonstration of how “margin of error” (or “confidence interval”) works, using the data from the USC Dornsife tracking poll.
First, let’s look at how their tracking poll is constructed, in their own words:
“On August 11, 2020, USC’s Center for Economic and Social Research (CESR) invited 8,355 eligible voters who are active members of CESR’s Understanding America Study (UAS) probability-based internet panel to participate in an ongoing election tracking survey. The baseline data and consent information was collected in UAS survey 306.
Each study member who agreed to participate was randomized to respond on a pre-assigned day of the week, distributed so that our full sample participates over a 14-day period. Respondents have until their next assigned wave day (or 14 days after their assigned date) to complete the survey. Data for the full sample is nearly complete after the first 14 days, but not final until the end of the full 28-day wave.”
So, even though their tracking poll is a seven-day average, it is constructed from the last seven days of submissions from 14 cohorts, each consisting of roughly 600 people (596.785714, to be precise, but clearly we don’t have .785714 extra people in each cohort, so I rounded up to a nice even 600 for purposes of this demonstration). While each person has up to 14 days to respond, let’s assume, to keep things sort of simple, that each person sticks to whatever response pattern they have (quick, slow, or somewhere in between).
And, according to the standard formula (which I looked up on a wiki), the 95% margin of error for a sample of 600 is plus/minus 4 points; for 4,200 (the seven-day sample size) it is plus/minus 1.51 points; and for the full 8,400-person sample it is plus/minus 1.07 points.
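(For the curious: the formula behind those numbers is the usual margin of error for a proportion, z times the square root of p(1-p)/n, evaluated at the worst case p = 0.5. Here is a little Python sketch that reproduces all three figures; the function name is just mine for illustration, not from any polling library.)

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (600, 4200, 8400):
    print(f"n = {n:5d}: +/- {margin_of_error(n) * 100:.2f} points")
# n =   600: +/- 4.00 points
# n =  4200: +/- 1.51 points
# n =  8400: +/- 1.07 points
```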
OK - now let’s look at the Dornsife data. First, here is their running graphic:
It might be kind of small for you to see, but what it shows is an up-and-down pattern, with the Biden average at 51.72, a high of 53.3, and a low of 50.13. 51.72 + 1.51 = 53.23, while 51.72 - 1.51 = 50.21. So it looks like there is a VERY SMALL amount of variation outside the confidence interval. But keep in mind what the confidence interval actually says: the reported value will land within +/- 1.51 points of the true value NINETEEN TIMES OUT OF TWENTY. And we have 35 data points, so we should expect about 1.75 of them (35 times 5%) to fall outside that range by chance alone, assuming there really hasn’t been any change in actual vote share for Biden or Trump. In fact we had ONE data point below the range and THREE data points above it. Hmmm, slightly more outside the range than we might expect, but not a whole lot more. What a discerning eye might note, though, is that the excessively high numbers came early in the sequence and the low number came toward the end. So, does that mean that perhaps there has been a downward trend, however slight?
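To put a rough number on “quite reasonable”: if each of the 35 daily readings independently had a 5% chance of landing outside the band, we would expect about 1.75 outside, and seeing 4 or more would happen roughly one time in ten. A quick sketch (treating the days as independent, which the overlapping seven-day averages are not, strictly speaking):

```python
from math import comb

n, p = 35, 0.05  # 35 daily readings, each with a 5% chance of landing outside
expected = n * p
p_four_plus = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))
print(f"expected outside: {expected:.2f}; P(4 or more outside): {p_four_plus:.3f}")
# expected outside: 1.75; P(4 or more outside): 0.096
```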
Well, what got me working on this (aside from the need for a distraction) was that I was noticing a distinct periodicity to the graph. Go back and look….
And that got me to thinking about the nature of the sample. It is the same 8,400 people, over and over again. But that group of 8,400 people is actually made up of 14 distinct subgroups, reporting more or less in sequence. Each group of 600 carries a 4-point margin of error (at the 95% confidence level). So what if we lined up two 14-day reporting periods side by side, e.g., day one from the first 14-day period against day one from the second? What might we see? That is what I did, counting backwards from the last reported day (the day I created this diary).
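For anyone who wants to replicate the bookkeeping in code rather than a spreadsheet, it amounts to something like this (a minimal sketch; the function and variable names are just for illustration, and the daily values would come from the Dornsife data download):

```python
def compare_cycles(daily_biden, cycle_len=14):
    """Line up the two most recent 14-day cycles day by day.

    daily_biden: list of daily Biden shares, oldest first.
    Returns the per-day differences (earlier cycle minus later cycle)
    and the shift between the two cycle averages.
    """
    recent = daily_biden[-2 * cycle_len:]               # last 28 daily readings
    earlier, later = recent[:cycle_len], recent[cycle_len:]
    diffs = [e - l for e, l in zip(earlier, later)]     # the "C-E" column below
    avg_shift = sum(earlier) / cycle_len - sum(later) / cycle_len
    return diffs, avg_shift
```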
So - the first 14-day period ran from August 23rd through September 5th, and the second 14-day period ran from September 6th through September 19th. When I did this exercise, this is what I got (I only did the Biden side of things for this diary, but the same can be done on the Trump side too):
The column labeled “C-E” is day “n” from the earlier 14-day cycle minus the same day “n” from the later 14-day cycle. Two things to point out:
- In the first 14-day cycle (four weeks ago to two weeks ago), the average Biden vote share was 52.29. In the second 14-day cycle (two weeks ago through the 19th), it was 51.59, a drop of 0.70 points. And if we go back to the early part of this diary, we will see that the margin of error for an 8,400-person sample is 1.07 points, so this drop is well within the expected range.
- What I found even MORE interesting, however, was the variance between each of the “equivalent” days, e.g., day 1 vs. day 1, day 2 vs. day 2, across the two 14-day cycles. The day-vs-day variance was well within the 4-point range for a 600-person sample, indicating a lot of consistency within each group. That makes sense: here you really aren’t trying to estimate the accuracy of the 600 against the general population, but rather measuring how consistent the same 600 people are with themselves. Across all 14 days, on four days the “movement” was less than 0.1 points, on four days it was between 0.1 and 1 point, and on the other six days it was slightly over 1 point, maxing out at 1.32 points. Looking at the pattern of the variance, I note that on a day-vs-day basis the first 13 days all moved, albeit very fractionally, away from Biden, while on the last day of the cycle the number moved in Biden’s favor. So, given the overwhelming directionality of the movement, you could reasonably argue that the decrease of 0.7 points is real and (since the same kind of directionality also appeared on the Trump side) that the race has in fact tightened by 1.4 points. Compare this to 538’s change from 8.8 to 6.7 (2.1 points) over the same period of time. Although not shown, the actual 7-day running average difference as of 9/19 is 8.67, with the previous 6 days ranging from a high of 9.68 to a low of 6.72; and the 14-day average is 10.1, with a range between 10.8 and 9.3.
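How unlikely is 13-of-14 moves in the same direction if the race were actually flat? A simple sign-test calculation says very unlikely, under 1 in 1,000, though the overlapping seven-day averages mean the 14 comparisons aren’t fully independent, so treat this as a rough gauge rather than a formal test:

```python
from math import comb

n, k = 14, 13  # 13 of the 14 day-vs-day comparisons moved away from Biden
p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"P(13 or more of 14 moves in one direction by chance): {p_one_sided:.4f}")
# P(13 or more of 14 moves in one direction by chance): 0.0009
```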
So, what was I hoping to show in all of this? Mainly that looking at a single poll on a single day really isn’t very revealing. Also, that even with a pollster who uses a consistent voter-selection process and a consistent voting model, you still get a lot of variation (as demonstrated by the wide differences among Dornsife’s 600-person panels). And finally, for those who follow the Dornsife tracker: be aware that their model has a built-in day-to-day “structural” variation that shows up in the undulating graph they produce, so expect the lines to go up and down, even as we get closer and closer to November 3rd, simply by the nature of their process.
FINAL NOTE: this will publish at 2 AM Hawaii time, 8 AM EDT, so hopefully I will be sound asleep when it shows up, and if anybody chooses to read and comment, I probably won’t respond until a slightly more reasonable Hawaii time.