The Lancet study estimating nearly 100,000 Iraqi deaths, when it wasn't being ignored, was derided by the American press as having a flawed methodology and therefore being inaccurate. This despite the fact that when the exact same method was used in 2000 by epidemiologist Les Roberts, also the leader of the Iraq study, to estimate some 1.7 million deaths in the ongoing Congolese war, the number was given respectful treatment by CNN, ABC, and the BBC.
In fact, as journalist Andrew Cockburn found out, the charges of a flawed methodology may indeed have been true, just not in the way US corporate media outlets would ever tell you.
Writing for the print edition of Counterpunch, Cockburn reports that he sent the full data of Roberts' study to statistician Pierre Sprey.**
Sprey begins by complimenting Roberts' sampling methodology:
I have the highest respect for the rigor of the sampling method used and the meticulous and courageous collection of the data. I'm certainly not criticizing in any way Robert's [sic] data or the importance of the results.
But he thinks Roberts went wrong in interpreting the data. How? By following the "academically conventional" approach of assuming the data followed the Gaussian curve.
According to Sprey:
Slavish adherence to this formula obscures information of great value. The true shape of the data scatter almost invariably contains insights of great physical or, in this case medical importance. In particular it very frequently grossly exaggerates the true scatter of the data. Why? Simply because the mathematics of making the data fit the bell curve inexorably leads one to placing huge emphasis on isolated extreme 'outliers' of the data.
For example if the average cluster had ten deaths and most clusters had 8 to 12 deaths, but some had 0 or 20, the Gaussian math would force you to weight the importance of those rare points like 0 or 20 (i.e. 'outliers') by the square of their distance from the center, or average. So a point at 20 would have a weight of 100 (20 minus 10 squared) while a point of 11 would have a weight of 1 (11 minus 10 squared.)
This approach has inherently pernicious effects. Suppose for example one is studying survival rates of plant-destroying spider mites, and the sampled population happens to be a mix of a strain of very hardy mites and another strain that is quite vulnerable to pesticides. Fanatical Gaussians will immediately clamp the bell shaped curve onto the overall population of mites being studied, thereby wiping out any evidence that this group is in fact a mixture of two strains.
The commonsensical amateur meanwhile would look at the scatter of the data and see very quickly that instead of a single "peak" in surviving mites, which would be the result if the data were processed by traditional Gaussian rules, there are instead two obvious peaks. He would promptly discern that he has two different strains mixed together on his plants, a conclusion of overwhelming importance for pesticide application.
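Sprey's arithmetic is easy to check with a short Python sketch. The cluster counts and mite numbers below are invented to match his hypothetical examples, not data from either study:

```python
# Sprey's weighting example: Gaussian fitting weights each point by the
# square of its distance from the mean (hypothetical cluster death counts).
clusters = [8, 9, 10, 10, 11, 12, 0, 20]
mean = sum(clusters) / len(clusters)            # exactly 10 for these numbers
weights = [(x - mean) ** 2 for x in clusters]
# The outlier at 20 gets weight (20 - 10)**2 = 100; the point at 11 gets
# (11 - 10)**2 = 1, just as in his example.

# His mite example: a mixture of two strains. A single mean (or a single
# fitted bell curve) hides the two peaks that are obvious in the raw
# scatter (made-up survivor counts per plant).
hardy = [18, 19, 20, 20, 21, 22]
fragile = [1, 2, 2, 3, 3, 4]
mixed = hardy + fragile
overall_mean = sum(mixed) / len(mixed)          # 11.25: near neither strain
```

Note that the overall mean lands in the empty gap between the two strains, which is exactly the "commonsensical amateur's" objection: the one-peak summary describes no mite that actually exists.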
Cockburn informs us, "Sprey once conducted such a statistical study at Cornell - a bad day for mites."
According to Sprey, the best solution to this problem is to use a "distribution free" or "nonparametric" method. Says Sprey:
These make the obviously more reasonable assumption that one hasn't the foggiest notion of what the distribution of the data should be, especially when considering data one hasn't seen -- before one is prepared to let the data define its own distribution, whatever that unusual shape may be, rather than forcing it into the bell curve. The relatively simple computational methods used in this approach basically treat each point as if it has the same weight as any other, with the happy result that outliers don't greatly exaggerate the scatter.
The Lancet study's quoted number of 100,000 excess deaths (actually 98,000) had a 95% "confidence interval" of 8,000 to 194,000 deaths (which means that it is 95% certain that the true number lies in between those two numbers). Sprey's nonparametric method points to a 95% confidence interval of 53,000 to 279,000 excess deaths, a major upward shift. "This shift to higher excess deaths occurs because the real, as opposed to the Gaussian, distribution of the data is heavily skewed to the high side of the distribution center".
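Cockburn's article doesn't say which distribution-free procedure Sprey used, but a standard one is the bootstrap percentile interval, which lets skewed data define its own interval shape instead of forcing symmetry. A minimal sketch on made-up right-skewed data (not the Lancet clusters), contrasting it with the symmetric normal-theory interval:

```python
import random
import statistics

random.seed(0)
# Made-up right-skewed "cluster" data, standing in for the kind of
# skew Sprey describes; NOT the actual Lancet data.
data = [random.lognormvariate(2.0, 0.8) for _ in range(33)]

# Symmetric, normal-theory 95% CI for the mean: mean +/- 1.96 * stderr.
m = statistics.fmean(data)
se = statistics.stdev(data) / len(data) ** 0.5
gauss_ci = (m - 1.96 * se, m + 1.96 * se)

# Distribution-free alternative: resample with replacement, recompute
# the mean each time, and read the 95% interval straight off the
# 2.5th and 97.5th percentiles of the resampled means.
boots = sorted(
    statistics.fmean(random.choices(data, k=len(data)))
    for _ in range(5000)
)
boot_ci = (boots[int(0.025 * 5000)], boots[int(0.975 * 5000)])
# With right-skewed data the percentile interval tends to sit higher
# than the symmetric Gaussian one, the direction of shift Sprey reports.
```

The design point is the one Sprey makes: each resampled point enters with equal weight, so a handful of extreme clusters can't inflate the scatter the way squared-deviation fitting lets them.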
Furthermore, as Cockburn notes, the Lancet survey was conducted 15 months ago. Assume the same rate of excess deaths has continued since. That is a safe bet, considering the bulk of the deaths are due to decayed infrastructure and a lack of medical facilities and materials, a situation that has definitely not changed for the better in the past 15 months; it is likely even too conservative, since it doesn't account for the growth in sectarian violence and the ramping up of the US air war over these months. On that assumption, the 98,000 excess deaths of the original Lancet study become 183,000 excess Iraqi deaths resulting from the Anglo-American invasion of Iraq in 2003: a war that was started to oust a dictator who was charged with killing 300,000 Iraqis over the decades of his misrule.
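The extrapolation is simple rate arithmetic. The study window length below is an assumption on my part (roughly March 2003 to the September 2004 survey, about 17.8 months); Cockburn presumably used slightly different period lengths to reach 183,000:

```python
# Back-of-envelope version of Cockburn's extrapolation (assumed inputs).
excess_deaths = 98_000
study_months = 17.8     # ASSUMED survey window: invasion to Sept. 2004
elapsed_since = 15      # months between the survey and Cockburn's article

monthly_rate = excess_deaths / study_months           # ~5,500 per month
extrapolated = excess_deaths + monthly_rate * elapsed_since
# Comes out around 180,000, in the neighborhood of the 183,000 figure.
```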
The full story is here.
I'll let the statistics geeks argue about Gaussian versus nonparametric methodologies below without interfering. Questions on other statistical matters involving the Lancet study, ones I understand much better, I'll be perfectly willing to answer.
**Sprey, according to my googling, appears to be both this guy (a "former Pentagon analyst [who] served as special assistant to the assistant secretary of defense for systems analysis during the Johnson and Nixon administrations. He also carried on the seminal work of the late Richard Reid Hallock [Oberlin class of] '41 in founding the field of combat data/combat history-based cost effectiveness analysis for weapons.") and this guy ("Mapleshade Records has gained an excellent reputation with audiophiles and just plain music lovers for the high quality of its sound... President, engineer and general factotum for the outfit is Pierre Sprey.").