Various writers here have recently noted a phenomenon in the COVID-19 data coming from the CDC that always makes the most recent data appear more optimistic. For some states, cases and deaths are assigned to the date when they occurred rather than the date when they were reported, so the counts for the most recent days start out low and only fill in over time. The effect is that the pandemic always looks like it is on an improving trend in the most recent days.
Given this, I thought I would share some of the plots I have been creating throughout the pandemic to help my own understanding of what is going on nationally and state-by-state. When the excellent COVID Tracking Project shut down earlier this year, I switched to pulling data from the CDC’s site, specifically the United States COVID-19 Cases and Deaths by State over Time data set. Early on in my use of this data, I noticed that past values were getting shuffled around. There is some value in getting the totals assigned to the most accurate dates, especially for anyone studying the history of the pandemic. But I’m interested in the future: in understanding what is going to happen, given the most recent data available right now.
Based on what I was seeing, I revised my approach to harvesting data. Now I gather new state-by-state totals as soon as they become available, but I don’t revise the raw totals for any dates that have already been reported. Revising past data may give a more accurate picture of what was happening in March, but it doesn’t reflect what we knew in March, as it was happening. So if we’re trying to understand what is happening right now, we have to use the data we have, and compare it to past trends based on what we knew at the time.
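One way to picture this append-only harvesting is the minimal Python sketch below. It is only an illustration under stated assumptions: the `(state, date)` keys, the `merge_snapshot` helper, and the sample numbers are all hypothetical, and the actual scripts behind these plots may work differently.

```python
def merge_snapshot(archive, snapshot):
    """Add totals for (state, date) keys not yet archived; never overwrite.

    Both arguments map (state, date) -> cumulative total. Returns the
    number of newly added entries. Hypothetical helper for illustration.
    """
    added = 0
    for key, total in snapshot.items():
        if key not in archive:
            archive[key] = total
            added += 1
    return added


# Totals as they were first reported (made-up numbers).
archive = {
    ("TX", "2021-08-01"): 100,
    ("TX", "2021-08-02"): 110,
}

# A later download: past dates have been revised, and one new date appears.
snapshot = {
    ("TX", "2021-08-01"): 104,
    ("TX", "2021-08-02"): 115,
    ("TX", "2021-08-03"): 125,
}

added = merge_snapshot(archive, snapshot)
```

After the merge, the archive still holds the originally reported 100 and 110 for the first two dates and gains 125 for the new date. The difference between consecutive archived totals then attributes every newly counted case or death to the day it was reported.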
In these plots, I also correct for various obvious reporting anomalies, such as negative changes in the totals, and days where a state’s totals do not change at all, often followed by a single day which adds in several days’ worth of data. I also use a simplistic method to extrapolate values for states that haven’t updated in the most recent day.
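The corrections described above are not spelled out in detail, so the following Python sketch shows just one plausible way to handle the two named anomalies. It assumes that a drop in a cumulative total can be absorbed by holding the previous total flat, and that a catch-up report should be spread evenly over the stalled days before it. Both choices, and the function names, are assumptions made for illustration.

```python
def monotone_totals(totals):
    """Hold the running maximum so cumulative totals never decrease."""
    out = []
    highest = None
    for value in totals:
        highest = value if highest is None else max(highest, value)
        out.append(highest)
    return out


def spread_catch_up(increments):
    """Spread each report that follows a run of zero days evenly over
    the zero days and the reporting day itself."""
    out = [float(x) for x in increments]
    run_start = None
    for i, value in enumerate(increments):
        if value == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None:
                share = value / (i - run_start + 1)
                for j in range(run_start, i + 1):
                    out[j] = share
            run_start = None
    return out


# Made-up cumulative totals: two stalled days, then a drop.
totals = [100, 110, 110, 110, 140, 135, 150]
fixed = monotone_totals(totals)
increments = [b - a for a, b in zip(fixed, fixed[1:])]
smoothed = spread_catch_up(increments)
```

The smoothed increments still sum to the overall change in the totals. Note that this sketch also smooths genuine zero-change days, and it leaves a trailing run of zeros untouched, which is where an extrapolation for states that have not yet reported would come in.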
Early in the pandemic, I developed a simple model that relates the reporting of new positive cases to the reporting of deaths. Take the 7-day average of cases, divide by 60, shift it right 25 days, and plot it alongside the 14-day average of deaths. The two curves have tracked quite well throughout, as the image above shows. Because the case curve is shifted 25 days to the right, it extends 25 days past the latest death data, which allows us to see into the future! We are now well into the Delta surge, and the trend continues. Take a look at Texas, for example:
The earlier data is somewhat noisy, with various effects from holiday reporting and the winter storm. But the current surge tracks almost perfectly. Regardless of what anyone says in a tweet, cases are rising, and deaths are following right along. It’s clearly going to get significantly worse over the next three weeks.
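The arithmetic behind the case-to-death comparison (7-day average of cases, divided by 60, shifted right 25 days) can be sketched in a few lines of Python. The use of a trailing rather than a centered average is an assumption, as are the function names and the made-up input.

```python
def trailing_mean(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def predicted_deaths(daily_cases, ratio=60, lag_days=25, window=7):
    """Scale the averaged case curve by 1/ratio and shift it right.

    Index i of the result is the predicted daily deaths for day i, so
    the result runs lag_days past the end of the input.
    """
    averaged = trailing_mean(daily_cases, window)
    scaled = [None if v is None else v / ratio for v in averaged]
    return [None] * lag_days + scaled


# Made-up input: 40 days at a steady 6,000 new cases per day.
daily_cases = [6000] * 40
prediction = predicted_deaths(daily_cases)
```

With this steady input the predicted curve settles at 100 deaths per day and covers 65 days, 25 more than the case data. Plotting it against `trailing_mean(daily_deaths, 14)`, where `daily_deaths` is the reported daily death series, reproduces the kind of comparison shown in the plots.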
We can add the revised curves directly out of the CDC data set to see how they deviate. Here is the national plot; blue and grey are cases; red and purple are deaths:
All four curves follow the same major trends, but the revised data has some differences. The overall trends are shifted left a little bit, reflecting the move of data points to the dates they happened rather than the dates they were reported. And, perhaps of most interest, the CDC values for the last few days are noticeably lower than the incremental values we get by never updating existing totals. Go back a week or two, and the curves are almost interchangeable, making this strategy a good way to predict recent values without having to wait weeks for the numbers to settle.
Now consider Florida:
The actual trend in the death rate can’t hide when we hold previous totals constant and only incorporate new days’ totals. In fact, it’s moving almost exactly the opposite of what plotting the most recent data set (purple) would suggest. And new cases do appear to be dropping, but not by as much as the state’s currently reported data (grey) would suggest.
Nationally, the death rate appears to be a little better in the current surge than it was in the winter, but it’s still terrible. New cases might be leveling off, or this might just be some wavering in the data before it resumes ramping up. But nationally, we can see that we haven’t reached the downhill side of this surge, and there are certain to be a lot more preventable deaths over the next month.