There's a really big problem with the general understanding, on this site and elsewhere, of how statistical sampling and analysis works. I'm posting this diary to show what that problem is so that we can move forward from this point and start addressing the real issues that certain poll results have pointed out.
I am not doing this to stir the pot. I am doing this to try to clear up a severe misconception that I've seen and heard thrown around on this site in the last few days since the election and the passage of Proposition 8. I am not doing this to make racist claims or try to support anger and hate against black people, and anyone who tries to spin it that way is wrong for doing so.
Moving on...
The claim that small samples cannot accurately predict the behavior of the population they are drawn from is incorrect. As small a sample as 30 individuals, if taken correctly and if representative of the population, can accurately predict the behavior of that population. I will elaborate after the jump.
If you're going to follow elections, you should have at least a basic knowledge of statistics. Hell, if you're going to operate in a world where claims about polling and survey and experimental results are made every day, you should have at least a basic knowledge of statistics. Being ignorant on this topic is just as bad as being ignorant of how the Electoral College works, or of the functions of the branches of the federal government, or of the checks and balances on them. If you don't understand basic statistics, you are at a severe disadvantage.
I do statistics. I'm good at it. I know how to explain it. Hell, I teach it. And I know that a lot of people can't believe that if you have a representative sample of a population, you only need 30 units (people, items, articles, advertisements, whatever) to make predictions about that population.
I know this sounds ludicrous. I've had many undergraduate students say "no way, that's impossible." I doubted it myself until I was shown the proof. The thing is, it's true. The only thing that bigger samples give you is a reduction in sampling error - which is to say, a reduction of the range of your prediction. A small sample might show that 69% of a population will vote a certain way, but with a margin of error of 20 points (meaning that the population might actually only vote that way 49% of the time, or they might vote that way 89% of the time), and a large sample might still show a result of 69%, but a margin of error of only 3 points (meaning that the range of possible behaviors in the population might be 66% to 72%).
In methodological terms, this is called reliability. Reliability is how repeatable your result is. The lower your sampling error, the more reliable and accurate your results are.
Validity, or how sure you are that you're measuring something real or seeing a real effect, is calculated with a confidence interval. That's a different thing, and we'll get to that.
If I am dealing with the reliability issue, I say that 69% of the time, population X voted this way, with a margin of error of + or - 3%. If I am dealing with the validity issue, I say that I'm 95% sure that 69% of the time, population X voted this way, with a margin of error of + or - 3%. Please note the difference in these two statements. One is a measure of how accurate the sample is. The other is a measure of how certain the sample is.
These two things are not the same. I did a survey last spring for a paper I'm writing, which collected qualitative (that is, written-out, rather than picked-from a list) responses. The survey was more valid when I used all the qualitative responses to illustrate my points. Using those responses as qualitative, written-out interview responses showed that I really was measuring what I claimed to measure. But it was more reliable when I reduced all those individual, rich, full-of-shades-of-meaning responses into just a few categories. You sacrifice reliability for validity, and vice-versa, and most researchers try to strike a balance between them.
When you're measuring a population with a sample of that population, and you're looking at something which is a very quantitative thing like voting patterns, you have reliability at the expense of validity. And if we were talking about something like feelings, or experiences, we might have a serious validity problem, because the way the researcher defines and understands a feeling may not match what the respondent thinks about that feeling. But the fact is, a vote is pretty much just that, a vote. It's not nebulous, it doesn't have more than one meaning. A vote is a yes-no, binary switch. You either voted yes, or you voted no. If someone tells you they voted for something, you have a fairly valid result right there, as long as you think they're trustworthy.
(The so-called "Bradley effect" is a validity problem. Perhaps people lied to the exit pollsters. That's a validity issue. We'll get to that.)
But using statistics, we can calculate reliability.
So the first question is, how reliable is this sample? Can we depend on getting the same responses again if we took a different sample of this population? How accurate is our result? Those are questions of reliability, not validity. And those are the accusations I'm hearing - that this wasn't accurate, that it can't be applied to the whole population, etc.
And it still sounds ridiculous that a sample size of only 30 people can give us accurate predictions, doesn't it?
Those who would like to learn how this is possible, please follow along.
There's a basic statistical calculation called "the sampling error of a proportion." I would have to go into some detail to explain it fully - basically, I'd have to teach you what I've been teaching my stats class for the last six weeks, which may be too much to deal with. But to give a quick-and-dirty version:
Part 1: The Logic of Sampling
Each time you take a sample of a population you get a slightly different distribution (think bell curve) and a slightly different mean (center point, the mathematical average of all scores). The distribution may not even look like a bell curve. It may be skewed (stretched out farther on one side of the mean than the other) or it may be kurtotic (too tall or too flat). BUT:
The more samples you take, the more closely the means of those samples, taken together, form a normal curve.
For example. Let's say that I'm a fairly tough grader. My students got an average of about 15.5/25 on their last exam. The other instructor, however, is not as tough on his students, and their average on the same exam was more like 20/25. These are two different samples of a population, and their means are about 4.5 points apart. With me so far?
If you average out their scores, the entire class forms a more-or-less normal bell curve, with an average score of about 17.75 (assuming the two sections are the same size). My sample's mean is below this; my fellow instructor's sample is above it. If we had three samples, or five, or ten, the bell curve would smooth out even more.
BUT: Even with one sample only, it is possible to predict the variation of a sample's mean. It's possible to say "Okay, this sample may be off by X much. This value would be the low-end mean if we took a bunch of samples, and this other value would be the high-end mean."
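If you'd rather see this than take my word for it, here is a rough simulation sketch in Python. Everything in it is made up for illustration: the function name, the assumed "true" proportion of .69, the sample size of 30, and the number of repeated samples. It just draws a lot of samples from a pretend population and looks at how much the sample results bounce around.

```python
import random
import statistics

def simulate_sample_proportions(true_p, n, trials, seed=0):
    """Draw `trials` random samples of size `n` from a pretend population
    in which a share `true_p` says yes; return each sample's yes-proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(trials):
        yes_count = sum(1 for _ in range(n) if rng.random() < true_p)
        proportions.append(yes_count / n)
    return proportions

# Hypothetical numbers: a population that is 69% yes, sampled 30 at a time.
props = simulate_sample_proportions(true_p=0.69, n=30, trials=10_000)
print("average of the sample proportions:", round(statistics.mean(props), 3))
print("spread (standard deviation) of them:", round(statistics.pstdev(props), 3))
```

The individual samples disagree with each other, but their average lands very close to the population value, and the spread is a measurable, predictable quantity.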
Still with me?
Part Two: The Standard Error of a Sample Proportion
The formula for calculating the error of a sample proportion is simple: D = (square root of)[.5*.5 / N]
That is, the sample error will be the square root of the following: the proportion of cases in the category we're interested in, times the proportion of cases not in that category, divided by the N (or number of cases) of the sample. This is the most conservative way of calculating it, by the way; if we are assuming there is no difference between groups - that is, that one group will be exactly the same as another group - then we assume that half the cases fall in the category and half fall outside it, thus .5 * .5.
We could also calculate this by saying "Okay, we have 60 men and 40 women in this sample of 100, so the division is .6*.4," and that does get done, but it's not as conservative. .5*.5 is the standard, and allows us to assume the maximum amount of variation in the sample. (Try it yourself. .5*.5 gives us the largest number possible when multiplying two decimals which, when added together, equal 1: 0.25. .6 *.4 gives us .24, .7 *.3 gives us .21, .99 * .01 gives us .0099, and so forth.)
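If you don't feel like trying it by hand, here's a tiny Python sketch that does the multiplication for you. The function name `variation_term` is just something I made up for this illustration.

```python
def variation_term(p):
    """The p * (1 - p) piece that sits under the square root."""
    return p * (1 - p)

# The splits mentioned in the text.
for p in (0.5, 0.6, 0.7, 0.99):
    print(p, round(variation_term(p), 4))

# Check every split from 0.00 to 1.00 in steps of 0.01.
grid = [i / 100 for i in range(101)]
print("largest at p =", max(grid, key=variation_term))
```

The 50/50 split comes out on top, which is why using it gives the most cautious error estimate.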
Now I'll demonstrate how an N of 30 is very likely to be accurate. Statisticians generally don't like to be inaccurate more than 5% of the time, so they use a 95% confidence interval, but pollsters will usually go with a 90% accuracy, so we'll go with that (although my understanding is that CNN's polling group uses a 95% confidence interval).
Let's say our N is 5.
D = (sqrt)[.5 * .5 / 5]
D = (sqrt)[.25/5]
D = (sqrt)[0.05]
D = .2236, or 22.36%
So if our sample N is only 5, we could be off by as much as 22 percent in our predictions. If we say that a population does X 69% of the time, the next sample of 5 might show that the population does X 47% of the time, or it might show that the population does X 91% of the time.
Let's try it with an N of 10.
D = (sqrt)[.5 * .5 / 10]
D = (sqrt)[.25/10]
D = (sqrt)[0.025]
D = .1581, or 15.81%
We've just reduced error by about 6.5 percentage points simply by doubling our sample size to 10. Now, when we make our predictions, we could be off by about 16 points rather than 22. Now that population doing X 69% of the time with the first sample might be doing X 53% of the time with the next sample, or it might be doing X 85% of the time. Our prediction range just got smaller, and hence, more accurate.
Now watch what happens when we double that sample size to 20.
D = (sqrt)[.5 * .5 / 20]
D = (sqrt)[.25/20]
D = (sqrt)[0.0125]
D = .1118, or 11.18%
This time we've only reduced our error by about 4.6 points. But we've still reduced it. It's just not quite as big a reduction. Now that population doing X 69% of the time might display X 58% of the time, or 80% of the time.
Now let's do this with an N of 30.
D = (sqrt)[.5 * .5 / 30]
D = (sqrt)[.25/30]
D = (sqrt)[0.0083]
D = .0913, or 9.13%
Again, we've reduced error, this time by about 2 points. Note how the rate of error reduction keeps dropping? Each time we raise the N, we shave off less error than the last time. It's a law of diminishing returns situation. And notice: we are now below the magical 10-point cutoff. With a representative sample of a population, our estimate of that population's behavior now carries an error of less than 10 points. We may be off by as much as 10 points, but we can predict it to within 10 points. So that population might be doing X only 59% of the time, or 79% of the time, but our best estimate is 69% of the time.
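The whole progression above can be reproduced in a few lines of Python. This is only a sketch of the formula as I've written it here; `standard_error` is a name I picked for the example. Because it doesn't round the intermediate steps, the last digit can differ slightly from a hand calculation.

```python
import math

def standard_error(n, p=0.5):
    """D = sqrt(p * (1 - p) / N), with the conservative p = 0.5 by default."""
    return math.sqrt(p * (1 - p) / n)

previous = None
for n in (5, 10, 20, 30):
    d = standard_error(n)
    line = f"N = {n:>2}: D = {d:.4f}"
    if previous is not None:
        # How many points of error this step shaved off.
        line += f"  (down {100 * (previous - d):.2f} points)"
    print(line)
    previous = d
```

Each step up in N buys a smaller reduction than the one before it, which is the diminishing-returns pattern described above.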
We can keep doing iterations of this, but I think I've demonstrated what I intended to demonstrate. So let's say that of our sample of 224 people, 69% (or .69) voted Yes on 8 on Tuesday.
D = (sqrt)[.5 * .5/224]
D = (sqrt)[.25/224]
D = (sqrt)[.001116]
D = .0334, or 3.34%
The accuracy of this sample, when applied to the population, would only be off by 3.3% in either direction. The population might have voted only at 65.7%, or they might have voted at 72.3%. Those are the upper and lower boundaries of the sample mean distribution. We can rely on this result.
Which means that we can infer or predict that given the opportunity, the entire population that this sample was drawn from would vote the same way the sample did. We would see the same pattern of voting: .69 one way, .31 the other. We could be off by as much as 3% in either direction. And that's with the most conservative estimate which allows us to assume maximum variation in the sample. If I did this with the proportions of the population, .69 * .31, here's what we'd see:
D = (sqrt)[.69 * .31/224]
D = (sqrt)[.2139/224]
D = (sqrt)[.000955]
D = .0309, or 3.09%
See? Even less chance that we're wrong.
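Here are both versions of the N = 224 calculation side by side, as a quick Python sketch. The function name is mine, and the only inputs are the numbers already used above (.69, .31, and 224).

```python
import math

def standard_error(p, n):
    """D = sqrt(p * (1 - p) / N)."""
    return math.sqrt(p * (1 - p) / n)

n = 224
conservative = standard_error(0.5, n)   # assumes the maximum possible variation
observed = standard_error(0.69, n)      # uses the sample's own 69/31 split

print(f"conservative: {conservative:.4f}")
print(f"observed:     {observed:.4f}")
```

The version that uses the sample's own split always comes out the same or smaller, because .5 * .5 is the largest value that product can take.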
So this shows, conclusively, that a sample of 30 can accurately predict the behavior of a population. And the math shows that the bigger the sample, the higher the accuracy of the prediction. That's all a larger sample does.
It doesn't matter what the population size is. I don't care if the population is 6.7% of the overall population of the state, or 10%. The point is that 69% of that population, plus or minus 3%, will do this. And yes, we can in fact determine this with a sample size of 30, if we're willing to be off by as much as 10%, as long as the sample is representative of the population.
If you want to argue that the sample wasn't representative, feel free. I'd love to hear arguments to that effect. If you can show me problems with the methodology, I would love to hear it. But assuming that the methodology holds up, the statistics do too.
(If you look up Edison Media Research, the company that does CNN's polling, you'll find that their methodology is sound. Of course, that's going to take a lesson in methodology to convince most people, but if you want to educate yourself on it, Earl Babbie's "The Practice of Social Research" is a good starting point.)
Part Three: Determining Validity of the Findings - the Confidence Interval
Okay, so we can predict the behavior of a group 69% of the time, and that will be accurate, give or take a couple percentage points, if we have a representative sample of 224 people. But how likely is it that this result is wrong? I mean, how valid is this result? How much can we depend on it measuring what it claims to measure?
This is where we get something called a "confidence interval." It's based on the proportions of the normal distribution/bell curve. For this to make sense, I have to give you a mini-lesson on the normal distribution.
The Normal Distribution: Variance and Standard Deviation
The normal distribution is divided into sections called "standard deviations." The way it's calculated is this:
Take each individual score.
Subtract the mean of all scores from that individual score.
Square the result, and set that number aside.
Do these steps for each individual score.
Add up all the squared differences.
Divide that number by N - 1 (the number of cases, minus 1).
This gives you a number called the "variance." The variance is simply the average of all squared differences from the mean score (with N - 1 as the divisor instead of N). Why do we square them? Because otherwise we'd have a bunch of negative numbers and a bunch of positive numbers and they'd cancel each other out. There are other reasons to square instead of taking the absolute value, but that's the one that's important here. The larger the variance, the higher the variability around the mean. The smaller the variance, the lower the variability.
If we have a variance of 10, it's going to be a narrower distribution of scores than if we have a variance of 100, or 1000. But it's hard to talk about variance in any meaningful way because it's hard to wrap our heads around the idea of "squared" differences. So we take the square root of the variance.
That value is the standard deviation. Add it to the mean, and you've gone up one standard deviation from the mean. Subtract it from the mean, and you've gone down one standard deviation.
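The steps above translate almost line for line into Python. This is a sketch with made-up quiz scores; the function name and the scores are mine, purely for illustration. The last line checks the result against Python's built-in `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics

def sample_variance(scores):
    """Follow the recipe: subtract the mean, square, add up, divide by N - 1."""
    mean = sum(scores) / len(scores)
    squared_diffs = [(score - mean) ** 2 for score in scores]
    return sum(squared_diffs) / (len(scores) - 1)

scores = [14, 17, 20, 17, 17]  # hypothetical quiz scores, mean of 17

variance = sample_variance(scores)
std_dev = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
print("matches the library:", math.isclose(std_dev, statistics.stdev(scores)))
```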
Please take me at my word that when we have a normal distribution, with the mean at the center, and we go out three standard deviations from the mean on either side, we have covered more than 99% of the scores (about 99.7%). (One deviation out at either side contains about 68% of all scores, and two deviations out contains about 95%. The mean is at the center, and divides the distribution into 50% below and 50% above, exactly in half. All this will be important in a minute.)
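If you'd rather not take me at my word, Python's standard library can do the lookup. This sketch uses `statistics.NormalDist` (available in Python 3.8 and later); the `coverage` helper is a name I made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} deviation(s): {coverage(k):.1%}")
```

The exact figures are a little less tidy than the rounded ones people memorize, but they are close enough that the rule of thumb works.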
The Z-Score: Standardizing the Scores to Compare Them, and Confidence Intervals
Ever hear of a teacher "curving" grades? What that means is that they take the scores and find their Z-score, which is a method of standardizing points along the normal distribution. If the area below the mean of a normal distribution contains 50% of the scores (which it does), then one deviation up from the mean will add 34% of the scores (half of 68% is 34%). Which means that if our mean is 17 points, and our standard deviation is 3 points, and a student has scored 20 on the assignment or quiz, they are one standard deviation up from the mean, and they have a score that is better than 84% of the students (because 50% - everything below the mean - plus 34% - the first deviation - gives us 84%).
A student scoring 23 would be up two standard deviations, which would give them a score that is better than about 97.5% of all the other students. The reason is that two deviations out on either side contains 95% of all scores, so if you take 95% and subtract 68%, you get 27%. Divide this in half and you get 13.5%, which is the share of scores between the first and second deviation on one side. Add 50% (everything below the mean) to 34% (the share between the mean and the first deviation) and then add 13.5% (the share between the first deviation and the second), and you get 97.5%.
The Z-score table is based on this logic. It's a range of scores from - 3 (the third deviation below the mean) to +3 (the third deviation above the mean) with the mean set at 0. The way you calculate a Z-score is to take the score you're interested in, subtract the mean of all scores from it, and divide the result by the standard deviation. So in this case, for example, we would take the student's score, 20, subtract the mean, 17, from it, and get 3. Then we divide that by the standard deviation, which is also 3, and we get 1. The value of that Z-score, in percentages of the distribution, is about .84 - or 84%, which corresponds to the value of one standard deviation above the mean. When we do this for the student scoring 23, we'd get 23 - 17 = 6. Divide that by the standard deviation of 3, and you get 2 - and that's that student's Z-score, corresponding to the value of two standard deviations above the mean.
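Here's the same Z-score arithmetic as a short Python sketch, using the mean of 17 and standard deviation of 3 from the example. The function names are mine. Instead of a printed Z-score table, it asks `statistics.NormalDist` (Python 3.8+) for the share of scores below a given Z, so the percentages come out slightly different from the rounded rule-of-thumb values.

```python
from statistics import NormalDist

def z_score(score, mean, std_dev):
    """How many standard deviations a score sits above or below the mean."""
    return (score - mean) / std_dev

def share_below(z):
    """Share of a normal distribution that falls below a given Z-score."""
    return NormalDist().cdf(z)

for score in (20, 23):
    z = z_score(score, mean=17, std_dev=3)
    print(f"score {score}: Z = {z:.1f}, better than {share_below(z):.1%} of scores")
```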
The way you calculate a confidence interval of a proportion, or the range the true value is likely to fall in, is to take the proportion in question (in this case, .69), and add and subtract the Z-score for your chosen confidence level (about 1.96 for 95%) multiplied by the square root of .5*.5 / N, which is the same equation that you use to find the sample error of the proportion.
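As a rough sketch of that calculation in Python: the function below is my own illustration, not anything a polling firm has published, and it assumes a simple random sample and the normal approximation. It gets the Z-score from `statistics.NormalDist.inv_cdf` (Python 3.8+) rather than from a table.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95, conservative=True):
    """Return (low, high) for a sample proportion p_hat from a sample of size n.

    conservative=True uses .5 * .5 under the square root; otherwise it uses
    the sample's own split, p_hat * (1 - p_hat).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_for_error = 0.5 if conservative else p_hat
    sample_error = math.sqrt(p_for_error * (1 - p_for_error) / n)
    margin = z * sample_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(0.69, 224)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Note that under these assumptions the 95% interval is roughly twice as wide as the plain sample error on its own, because the sample error gets multiplied by a Z-score of about 1.96.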
Now, unfortunately, there's no way for us to calculate and check the confidence interval of the CNN poll, because neither the mean nor the standard deviation of the sample was published, so far as I know (nor was the variance, which would allow us to calculate the standard deviation, as the standard deviation is the square root of the variance). But when you look at their methodology, you can see that the polling company used a 95% confidence interval. Now, you can decide they're lying if you like, but I believe them, because they could get in serious credibility problems if they lied about their confidence interval.
Conclusion: What's your point, KoSC?
My point here is that arguing that the sample is inaccurate simply because of how large or small it is is not correct. The size of the sample actually makes very little difference in how applicable the results are to the population. It simply makes the range of possible results smaller as the size of the sample increases.
Additionally, since we're talking about voting behavior, not feelings or experiences or other nebulously-defined things that are hard to measure, the main validity issue here is whether or not the people polled were telling the truth about their votes. Absent that, we have an accurate and valid result.
Which means that (and, in my opinion, unfortunately) this is a result we can depend on, and infer behavior of the entire population from. This is not palatable or comfortable, but it's still correct.
It's not scapegoating to discuss this factual information. It's not blaming to say "hey, 69% of this group voted Yes on 8." It's telling the truth. It's identifying problem areas that need to be addressed. Anyone trying to make that into a blame game, or to say that I'm racist simply because I'm discussing this factual information, has issues I can't address here. But it's not blaming or racist to identify this as the truth. I'm absolutely not trying to justify racism. But these results are real and can't be denied, and denying it is just as wrong as someone using them to be a racist ass.
One other thing: This result may very well not be valid for the entire black population, as it was a subset of the black community. But it is valid when applied to black voters, and it's still pointing out a huge problem area. Whether or not it was all blacks is irrelevant. We're talking about the population of black voters. Let's be very specific about that, okay?
On the other hand, black voters are part of black communities. And we need to recognize that, as well.
I hope this makes sense. Please comment if you have questions. I want people to understand this stuff, because damn it, it MATTERS.