The question of how SARS-CoV-2 got going in humans is now being discussed seriously and rather objectively by a wide range of scientists with real knowledge of the detailed evidence. I’m not one of them, but I have had a chance to talk with some. I lean somewhat toward the lab leak hypothesis, for reasons I’ll describe below. First, however, I want to make a more unusual point. Although generally knowledge is good, in this case it would be best if we never find out for sure which hypothesis is right. What justifies that peculiar claim?
We know that direct zoonosis occurs fairly commonly. It’s the origin of most pandemics. As population pressures drive more human incursions into new habitats, it’s likely to become even more common. There will be new pandemics coming directly from other species. Regardless of whether this particular pandemic happened to come directly from some other animal, we need stepped-up international surveillance of pathogens in other species, to help identify threats before they break into humans. If a lab leak is somehow demonstrated (e.g. by records from Wuhan) it will be hard to persuade the public of the importance of such programs. Furthermore, reaction against the research which might have led to such a leak would probably spill over into a reaction against the research needed to keep up with vaccine development.
We know that lab leaks are common. At least one pandemic, the 1977 H1N1 flu, came from a leak. Many other isolated cases have come from leaks, but were tracked down before they could spread. So long as research on pathogens is done without truly extraordinary precautions, there will be epidemics from lab leaks. Regardless of whether this particular pandemic happened to come from a lab leak, we need stepped-up international surveillance of all research on pathogens, to help block the sorts of practices (gain of function research, sloppy precautions) that make lab leaks likely. If direct zoonosis is somehow demonstrated (e.g. by finding close wild viral ancestors near the outbreak) it will be hard to persuade the public of the importance of such programs.
Both hypotheses represent ongoing serious dangers. We need to guard against both. It’s best if we don’t pin down which one ran into us this last time.
***
Now for some background on the plausibility of the different hypotheses. This is more important for demonstrating some methods than for the conclusion.
First, we can easily toss out the paranoid bioweapon hypothesis. A virus that mainly sickens and kills old people or other people with weak immune systems and that spreads indiscriminately would be the stupidest fucking bioweapon imaginable. Do the people who think that China is about to take over the world also think that the Chinese are complete morons? Let’s scratch that one off the list.
So now we’re left with two broad hypotheses: zoonosis (Z) and lab leak (L). Let’s try to use standard Bayesian methods to do an extremely rough estimate of the ratio of their probabilities, P(L)/P(Z).
We’ll start with a consensus view, that the prior guess would be that P(L) is much less than P(Z). Basically that just says that most pandemics happen via Z. How big is that ratio? With only one known L pandemic, let’s crudely say P(L)/P(Z) = 1/30. That corresponds to the standard idea that you would call Z the “null hypothesis”, i.e. the boring first guess. But rather than treat the null as qualitatively sacred we’ll just leave it as initially quantitatively more probable by our very crudely estimated factor.
Now we get to the simple part that hasn’t been emphasized enough. Both P(Z) and P(L) come from sums of tiny probabilities for each individual person. P(L) comes almost entirely from a sum over individuals in Wuhan. P(Z) comes from a sum over a much larger set of individuals spread over China and southeast Asia. Since we know with confidence that this pandemic started in Wuhan, restricting the sum of individual probabilities to people around Wuhan leaves P(L) almost unchanged but eliminates most of the contributions to P(Z). Wuhan has less than 1% of China’s population. That means we need to increase the P(L)/P(Z) ratio by about a factor of 100, since the denominator is reduced. Starting from our prior of 1/30, that puts the ratio at roughly 3, so it could now easily be greater than 1.
Of course there are complications. Individuals in different regions make very different contributions to P(Z). Much of China (e.g. Beijing) contributes little. Wuhan, however, is also not a hot spot. The known relevant bat viruses are concentrated far to the south. So for this informal crude estimate, let’s just stick with the plain population factor.
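For anyone who wants to see the arithmetic spelled out, here is a minimal sketch of that odds update. It assumes the two crude numbers used above (a prior ratio of 1/30 and a Wuhan population share of exactly 1%, where the text only says "less than 1%"), and it treats P(L) as entirely unchanged by the location restriction. The function and variable names are just illustrative.

```python
def update_odds(prior_odds, factor):
    """Multiply prior odds P(L)/P(Z) by a correction factor."""
    return prior_odds * factor


def odds_to_probability(odds):
    """Convert odds P(L)/P(Z) into P(L), assuming L and Z are the only options."""
    return odds / (1.0 + odds)


# Crude prior from the text: one known lab-leak pandemic out of roughly thirty.
prior_odds = 1 / 30

# Assumption: Wuhan holds 1% of the relevant population (the text says "less than 1%"),
# so restricting to Wuhan shrinks P(Z) by that fraction while leaving P(L) alone.
wuhan_population_fraction = 0.01
location_factor = 1 / wuhan_population_fraction

odds = update_odds(prior_odds, location_factor)
prob_L = odds_to_probability(odds)

print(f"P(L)/P(Z) after the location update: {odds:.2f}")
print(f"Implied P(L) if only L and Z are possible: {prob_L:.2f}")
```

The point of writing it this way is that each later consideration (secrecy, missing ancestors, and so on) would just be another multiplicative factor passed to `update_odds`, if anyone were brave enough to put a number on it.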
What about other factors that might favor either hypothesis? One that’s often mentioned is the extreme secrecy of the Chinese government, which is certainly consistent with L but also said to be not very surprising for Z. So it may increase P(L)/P(Z), but not very dramatically. The absence of known close ancestors is also completely consistent with L, given secrecy, but also not too surprising for Z. Things can be hard to find. There are arguments among experts as to whether features of the sequence look more surprising for Z than for L, but I don’t know enough to comment on those. Likewise the absence of any known human precursors with lower R0 is a bit surprising for Z, but again we’re in scientifically murky waters.
The bottom line is that these more detailed factors also seem to favor L over Z, but mostly they just add uncertainty. I find it odd that so much of the discussion has centered around stories about such factors, rather than the big simple fact that the pandemic started in Wuhan.
So we have two layers of uncertainty here. Unavoidably, even if we had well-calibrated rates of lab leaks and direct zoonosis we’d still only be left with a probabilistic estimate of which was more likely. On top of that, we have major uncertainties in our guesses about the different factors used to estimate that ratio of probabilities. Combining these uncertainties (a convolution of distributions on the log of the ratio) means that we shouldn’t take the ratio we get too seriously. Both hypotheses are comparably likely. Let’s hope it stays that way.
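To make the "convolution on the log of the ratio" remark concrete, here is a rough sketch under loudly stated assumptions. It treats each factor's log10 value as an independent normal distribution, so the convolution is again normal, with the means and variances simply adding. The spreads of 0.5 (about a factor of 3 either way) are purely hypothetical placeholders, not estimates from anything above.

```python
import math


def combine_log10_factors(factors):
    """Convolve independent normal distributions on log10(ratio).

    Each factor is (mean_log10, sd_log10). For independent normals,
    means add and variances add.
    """
    total_mean = sum(mean for mean, _ in factors)
    total_sd = math.sqrt(sum(sd ** 2 for _, sd in factors))
    return total_mean, total_sd


def prob_ratio_exceeds_one(mean_log10, sd_log10):
    """P(log10(ratio) > 0) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(mean_log10 / (sd_log10 * math.sqrt(2.0))))


# Central values come from the text; the 0.5 spreads are hypothetical.
factors = [
    (math.log10(1 / 30), 0.5),  # prior ratio P(L)/P(Z)
    (math.log10(100), 0.5),     # Wuhan location factor
]

total_mean, total_sd = combine_log10_factors(factors)
low = 10 ** (total_mean - total_sd)
high = 10 ** (total_mean + total_sd)

print(f"Central ratio: {10 ** total_mean:.2f}")
print(f"One-sigma range for the ratio: {low:.2f} to {high:.2f}")
print(f"Weight on ratio > 1: {prob_ratio_exceeds_one(total_mean, total_sd):.2f}")
```

With spreads anything like these, the one-sigma range straddles 1, which is the quantitative version of saying the two hypotheses are comparably likely. Different assumed spreads would change the numbers, and the murkier factors would only widen the range further.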