I used to work for a company that worked with drug abusers; drug testing was of obvious interest. They also did work with people with AIDS, so disease testing was of interest. For the purposes of this diary, the two are roughly equivalent. There are also profound civil liberties questions involved in these, but they aren't covered here; I'm just the stats man.
More below the fold, but first
This series is for anyone. There will be no advanced math used. Nothing beyond high school, usually not beyond grade school. But it'll go places you didn't go in elementary school or high school.
If you "hate math" please read on.
If you love math, please read on.
I welcome thoughts, ideas, or what-have-you. If anyone would like to write a diary in this series, that's cool too. Just ask me. Or if you want to co-write with me, that's fine.
The rules: Any math that is required beyond arithmetic and very elementary algebra will be explained. Anything much beyond that will be VERY CAREFULLY EXPLAINED.
Anyone can feel free to help me explain, but NO TALKING DOWN TO PEOPLE. I'll hide rate anything insulting, but I promise to be generous with the mojo otherwise.
A test for a disease or a drug (or other things) usually gives a result of POSITIVE or NEGATIVE. Sometimes the result is INCONCLUSIVE, but most tests don't do that, and when they do, the solution is usually to test again, so we won't cover those. Similarly, in reality, the person may have used the drug (or have the disease) or not. (For simplicity, I will just deal with "drugs" from now on.) There are, thus, four possibilities:
True positive: The person did the drug, and the test says he/she did.
True negative: The person did not do the drug, and the test says he/she did not.
False positive: The person did not do the drug, but the test says he/she did.
False negative: The person did the drug, but the test says he/she did not.
These are usually presented in a table:
                 | Reality: Negative | Reality: Positive
-----------------+-------------------+------------------
Test: Negative   | TN                | FN
Test: Positive   | FP                | TP
When you test a test, you typically get a group of people known to have done the drug and a group known not to have done it, give the test to them all, and record the results. Sometimes people summarize these results with two numbers called sensitivity and specificity. Sensitivity is TP/(TP + FN); specificity is TN/(TN + FP). Overall accuracy is (TP + TN)/(TP + TN + FP + FN). Suppose we have a test that is 98% accurate, both for people who did the drug and for people who did not:
                 | Reality: Negative | Reality: Positive
-----------------+-------------------+------------------
Test: Negative   | 98%               | 2%
Test: Positive   | 2%                | 98%
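For anyone who likes to see the arithmetic spelled out, here is a minimal Python sketch of the three formulas above. The counts are an assumption made up for illustration (100 known users and 100 known non-users, chosen only so that they match the 98% table); the function names are mine, not from any library.

```python
def sensitivity(tp, fn):
    """Share of people who did the drug that the test catches: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of people who did not do the drug that the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all test results that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical validation group (an assumption, not real data):
# 100 people known to have done the drug, 100 known not to have.
tp, fn = 98, 2   # users: caught / missed
tn, fp = 98, 2   # non-users: cleared / wrongly flagged

print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:   ", accuracy(tp, tn, fp, fn))
```

Note that none of these three numbers says anything yet about how common the drug is in the population you are testing.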
Now, we pull someone in off the street and give him the test. He tests positive. What are the chances he has done the drug? It depends. What were those numbers (as counts, not percentages) in the general population?
Were they
                 | Reality: Negative | Reality: Positive
-----------------+-------------------+------------------
Test: Negative   | 9800              | 2
Test: Positive   | 200               | 98
or were they
                 | Reality: Negative | Reality: Positive
-----------------+-------------------+------------------
Test: Negative   | 980               | 2
Test: Positive   | 20                | 98
or were they
                 | Reality: Negative | Reality: Positive
-----------------+-------------------+------------------
Test: Negative   | 98                | 20
Test: Positive   | 2                 | 980
The first might be for some drug done by relatively few people (like, say, injecting heroin), the second for a drug done by quite a few (like, say, nicotine), and the last for a drug done by nearly everyone (maybe caffeine).
In each case, the question is: out of everyone who tests positive (TP + FP), what fraction really did the drug (TP)? In the first case, if you give the guy the test and he comes up positive, his chance of actually having done the drug is 98/(98 + 200), or about 1 in 3. In the second case, it is 98/(98 + 20), or about 83%. In the last case, it's 980/(980 + 2), or almost certain! And if the drug were done by almost no one in the population, then the chance could be much less than 1 in 3. In fact, if no one in that population does the drug, then, by definition, the chance that this person does the drug is 0, regardless of how accurate the test is!
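The three calculations above are the same division done three times, so they are easy to check. A small Python sketch, using only the counts from the three tables (the function name and the table labels are mine):

```python
def chance_positive_is_real(tp, fp):
    """Of everyone who tests positive, the fraction who really did the drug."""
    return tp / (tp + fp)


# (true positives, false positives) read off the "Test: Positive" row
# of each of the three tables in this diary.
tables = {
    "first table (few users)": (98, 200),
    "second table (quite a few users)": (98, 20),
    "third table (nearly everyone)": (980, 2),
}

for label, (tp, fp) in tables.items():
    print(f"{label}: {chance_positive_is_real(tp, fp):.1%}")
# first table (few users): 32.9%
# second table (quite a few users): 83.1%
# third table (nearly everyone): 99.8%
```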
So, you have to know what population the person comes from, and you have to know something about that population, as well.
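If you know (or can guess) what fraction of the population does the drug, you can get the same answers without the full table. Here is a rough sketch, assuming the test is 98% right for both users and non-users, as in the percentage table above. The function name, its parameters, and the word "prevalence" for "fraction of the population that does the drug" are my own choices for illustration.

```python
def chance_given_positive(prevalence, sensitivity=0.98, specificity=0.98):
    """Chance that a person who tests positive really did the drug.

    prevalence: fraction of the population that did the drug (0 to 1).
    """
    true_pos = sensitivity * prevalence                # users who test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # non-users who test positive
    if true_pos + false_pos == 0:
        return 0.0  # nobody tests positive at all
    return true_pos / (true_pos + false_pos)


# Prevalences implied by the three tables, plus a population with no users.
for users, total in [(100, 10100), (100, 1100), (1000, 1100), (0, 1000)]:
    prevalence = users / total
    print(f"{users} users out of {total}: "
          f"{chance_given_positive(prevalence):.1%}")
```

With no users in the population the function returns 0, however good the test is, which is the point of the last paragraph.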