People who have been following my Pandemic Observations blog for a while have probably noted my seeming obsession with how Sweden has dealt with, or rather failed to deal with, the Covid-19 crisis now afflicting most parts of the world. Originally my interest was piqued by curiosity over just how their determination to strike out against the perceived epidemiological wisdom of most of the rest of the world might work out over time.
The answer seems pretty obvious by now — with over 70,000 confirmed Covid-19 cases reported thus far, Sweden has become the most infected country in Europe on a per capita basis, save only for tiny Luxembourg (which it will probably pass in the next few days) and the even tinier microstates of Andorra, San Marino, and Vatican City — and this with a testing program that was pretty anemic by European standards, though it does finally seem to be ramping up in recent weeks.
Likewise, with over 5,400 reported deaths, Sweden now trails only Italy, Spain, the UK, and Belgium in overall Covid-19 per capita fatality rates, and is steadily gaining on all of them (except possibly the UK). One consequence of all this is that Sweden’s neighbors, particularly Denmark, Norway, and Finland, who have all done a quite commendable job of virtually eliminating the coronavirus within their respective countries, seem even less inclined toward reopening their borders to Sweden than Canada currently is vis-à-vis the USA.
What to do? The real problem is with Sweden’s current fatality rate, which, while it has fallen from the 600+ deaths per week of April down to ~200 per week more recently, is still a lot higher than her neighbors may feel comfortable with (theirs are all now down to low single digits); and, even worse, with an infection rate that has actually doubled, from ~3,500 new cases per week in April to the current 7,000+ per week (a rate that, adjusted for population, would be equivalent to well over 200,000 per week in the USA, and which compares to Denmark’s <300 per week, Norway’s ~100 per week, and Finland’s <70 per week).
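To put those weekly case counts on a common per-capita footing, here is a minimal sketch in Python. The population figures are my own rough 2020 estimates, used purely for illustration, not taken from the post or any official dataset.

```python
# Rough 2020 population estimates (my own assumption, illustration only).
populations = {"Sweden": 10.2e6, "Denmark": 5.8e6,
               "Norway": 5.4e6, "Finland": 5.5e6}

# Approximate weekly new-case counts quoted above.
weekly_cases = {"Sweden": 7000, "Denmark": 300, "Norway": 100, "Finland": 70}

# Weekly new cases per million inhabitants for each country.
for country, cases in weekly_cases.items():
    per_million = cases / (populations[country] / 1e6)
    print(f"{country}: {per_million:.0f} weekly cases per million")
```

On these assumed figures Sweden comes out at roughly an order of magnitude (or more) above each of its three neighbors on a per-million basis, which is the gap driving the border question.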
Since the current new infection numbers are so large, there isn’t really much that can be done to disguise either the long-term trends or even the weekly stats, but apparently over the mid-Summer holiday back in June, someone in the Health Ministry hit upon a rather ingenious method of obfuscating at least the short-term day-to-day numbers. Prior to this, Sweden had followed the custom of nearly all other data reporting countries in assigning new cases to the day they were actually reported out, rather than the day the test was conducted or the day the results were first obtained — which is why we tend to see a “weekend effect” in data reporting to begin with.
However, by switching to one of these alternative day-assignment schemes (I’m guessing it would have been the day the test was administered, since that would both make the most logical sense and maintain the most flexibility in revising each report day’s data), the apparent numbers for the current day (and to a lesser degree the day or two before) could be drastically reduced. Instead of reporting out 1,000+ new cases per day, as had been all too common before the mid-Summer holiday, voilà: just a couple of hundred same-day-turnaround cases would need to go out to the data aggregators like worldometer, while the many other hundreds of non-same-day results back-filled the previous few days.
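The difference between the two conventions is easy to see with a handful of hypothetical test records (the dates and the same-day/back-filled split here are entirely made up for illustration):

```python
from collections import Counter
from datetime import date

# Hypothetical test records: (day the test was administered,
# day the result was reported out). Purely illustrative.
records = [
    (date(2020, 6, 29), date(2020, 6, 29)),  # same-day turnaround
    (date(2020, 6, 28), date(2020, 6, 29)),  # one-day lag
    (date(2020, 6, 27), date(2020, 6, 29)),  # two-day lag
    (date(2020, 6, 27), date(2020, 6, 29)),  # two-day lag
]

# Old convention: every result counts on the day it was reported out.
by_report_day = Counter(report for _test, report in records)

# New convention: each result is back-filled to the day of the test, so
# the current day shows only the same-day-turnaround results.
by_test_day = Counter(test for test, _report in records)

print("counts by report day:", dict(by_report_day))
print("counts by test day:  ", dict(by_test_day))
```

Under the old convention all four cases land on June 29; under the new one June 29 shows only a single case, with the other three quietly back-filled onto the two previous days — exactly the headline-shrinking effect described above, with no data actually lost.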
All the data was still there, even if it took a bit longer to actually show up, but to the casual observer just checking the current stats, the numbers would appear much lower than was actually the case. And of course, besides being unnecessarily confusing for researchers, it’s also a royal PITA to have to constantly revise the data already entered into a spreadsheet or database.
This is not to say that there aren’t perfectly valid reasons for needing to revise the data from time to time. Indeed, the UK just dropped a major revision of their case data on us today, when ~30,000 of their over 300,000 previously reported cases suddenly vanished from the record. It turned out that they had gone thru all their records to weed out duplicate cases caused by multiple positive test results for the same individuals, and since the operative definition of a “confirmed case” should obviously apply to the individual being tested rather than the test itself (unless perhaps it’s someone who tested positive at some point, then tested negative for some period of time before testing positive again—this really is a nasty little bug), I have no trouble at all with such a revision — particularly when they were fully transparent about their reasons and methodology.
OTOH, then there is the type of revision Kazakhstan also dropped on us today by coincidence, when their total number of reported cases nearly doubled from ~22,000 to over 42,000 without a word of explanation. However, in both the UK and Kazakh cases, working back through the time-series produced the expected results: the greatest divergence between the old data set and the newer lies at the most recent dates (whether negative in the UK case, or positive in the Kazakh case), gradually diminishing to ‘0’ the farther back in time one goes. While certainly more work for the researcher in terms of updating their database, at least this type of revision doesn’t tend to play havoc with any of the short-term or medium-term statistical trends, since the revised data tends to be well integrated over the entire time-series.
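That "well integrated" pattern can be sketched with invented cumulative totals (these are not the actual UK or Kazakh figures): the gap between the new and old series is largest on the most recent day and shrinks to zero going back.

```python
# Invented cumulative case totals, oldest to newest, before and after a
# well-integrated upward revision (not real UK or Kazakh data).
old_series = [100, 250, 450, 700, 1000]
new_series = [100, 260, 480, 760, 1100]

# Per-day gap between the revised and original totals.
gaps = [new - old for new, old in zip(new_series, old_series)]
print("gap by day (oldest first):", gaps)   # [0, 10, 30, 60, 100]

# In a well-integrated revision the gap never grows as we move backward
# in time; equivalently, read oldest-to-newest it is non-decreasing.
assert all(a <= b for a, b in zip(gaps, gaps[1:])), "not well integrated"
```

The assertion at the end is the sanity check I effectively run by hand when folding a revision into my spreadsheet.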
Then there is the type of revision more common when a particular reporting agency decides to start counting all of their “probable” Covid-19 deaths (in which someone died exhibiting at least some of the symptoms, without either an autopsy or a confirming test being administered) in with their previously confirmed deaths. While it would be nice to have this sort of revision integrated into the dataset over time by death date or some such, all too often they’re just presented as an enormous data dump for one particular day, which results in a large blip in the previous trendlines, and tends to render the weekly stats I focus on rather meaningless for a period of time. But at least it’s abundantly obvious when such a data dump occurs, even without documentation, and usually proves to be no more than a minor annoyance.
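The effect of such a one-day dump on the weekly stats is easy to see in a toy example (all numbers invented):

```python
# Invented cumulative death totals sampled once per week; a one-day dump
# of ~500 "probable" deaths lands during the fourth interval.
cumulative = [1000, 1150, 1280, 1900, 2010]

# Weekly deaths are just successive differences of the cumulative totals.
weekly = [b - a for a, b in zip(cumulative, cumulative[1:])]
print("weekly deaths:", weekly)   # [150, 130, 620, 110]
```

The dump shows up as a single ~620 blip in an otherwise stable 110-150 trend: impossible to miss, but useless for judging that week's actual mortality.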
I had thought that pretty much covered all the bases for standard legitimate data revisions, but then I encountered the first post-weekend Swedish fatality data for June 30 (which actually represents data for Monday, June 29). This was a much larger than expected bump of over 100 new fatalities (initial post-weekend days had usually been in the 40-50 range over the past few weeks), and I noticed that it bore no relation to the “New Deaths” column at worldometer (which IIRC was somewhere south of 20).
At first I thought the Swedes were simply applying to their fatality data the new reporting methodology they had already been using for their case data over the past week — which hadn’t appeared to be the case up to this point. But when I tried integrating this new fatality data into my spreadsheet, I quickly realized that the old and new data weren’t converging as I thought they should. OK, so maybe this was some long-awaited revision folding “probable” Covid-19 deaths in with their previously reported deaths — except that there were no explanatory remarks indicating such, and if it really were that type of revision, it was actually pretty small potatoes (such revisions usually amount to an additional 10-20% or more of the underlying death toll, while this was more like 2%).
Nonetheless, I persevered in integrating this new fatality data, only to find that not only weren’t the two data sets converging the farther back in time I went, they were actually growing further apart! By the time I had reached early May, the difference had widened to well over 400, before finally starting to converge as it should have done to begin with. Needless to say, I’ve done at least a couple of dozen of these data revision integrations from as many different countries now, and not one has ever behaved in such a statistically implausible way.
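The anomalous pattern looked something like the sketch below (again with invented numbers, not the actual Swedish figures): instead of shrinking monotonically as one moves back in time, the gap between the new and old cumulative death totals widens toward early May before finally closing.

```python
# Invented cumulative death totals, oldest to newest, before and after
# the revision (illustrative stand-ins for the real Swedish data).
old_deaths = [2000, 2600, 3100, 3500, 3800, 4000]
new_deaths = [2000, 3050, 3540, 3900, 4050, 4080]

# Per-day gap between the revised and original totals.
gaps = [new - old for new, old in zip(new_deaths, old_deaths)]
print("gap by day (oldest first):", gaps)   # [0, 450, 440, 400, 250, 80]

# A well-integrated revision would make this list non-decreasing when
# read oldest-to-newest; here it rises and then falls, the implausible
# hump described above.
is_monotone = all(a <= b for a, b in zip(gaps, gaps[1:]))
print("well integrated?", is_monotone)   # False
```

Reading backward from the newest date, the gap grows from 80 to over 400 before collapsing to zero — the same hump shape I found in the real data, and one that none of the couple dozen legitimate revisions I've processed has ever produced.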
The only hypothesis I’ve been able to formulate so far is that either through deliberate design or inadvertent programming error, literally hundreds of Covid-19 fatalities in Sweden that actually occurred in May or June were somehow backdated to now appear as April’s stats. So what would actually be the point of this type of data manipulation, if it were done deliberately, since the total number of deaths still went up instead of down?
Quite simply, to make the most recent weekly stats look much better than they actually were. Instead of going from 200 up to at least 250, they now purport to show weekly deaths falling below 50! Even worse, this sort of data manipulation has smoothed out the medium-term trendlines to make it appear as if Sweden has been making nearly continuous progress in lowering its Covid-19 fatality rate.
The only problem with this scheme, if it were indeed done deliberately, should be obvious. As long as deaths continue to accumulate at even a 100-per-week clip (to say nothing of 200-300), ever greater efforts would be required to keep backdating those deaths in order to maintain the appearance of a not-too-unreasonable weekly fatality number (an absolute essential if Sweden ever hopes to convince its neighbors to reopen their borders anytime soon), but eventually the whole house of cards would simply collapse.