2012 in review: Grading the pollsters

by Steve Singiser for Daily Kos Elections

Sunday, Dec. 30, 2012 at 7:59:05am PST

Scott Rasmussen: "Independent" pollster who errs on side of the GOP 81 percent of the time.

Now that the results are official pretty much everywhere (New York is a fairly important holdout, though with an obvious rationale for the tardy count), we can finally do a more thorough examination of how America's pollsters fared in the 2012 electoral sweepstakes.

Yes...yes, I realize that this has already been done in a variety of ways elsewhere, but I wanted to add my own spin to it. Given my background (I am a polls guy, but from the political angle, not necessarily the math angle), I opted for a very Algebra I approach to grading the pollsters.

Here's how it worked:

1. I made two lists of pollsters. The first list was every pollster that released polling in at least five separate races (not counting national polls). That wound up being a grand total of 34 different pollsters. Then I made a secondary list, the "major pollsters" list. Here, I excluded two groups: pollsters who primarily worked for campaigns, and pollsters that only polled in one or two states. This left us with a list of 17 "major" pollsters.

2. I then excluded duplicate polls. Therefore, pollsters were only assessed by their most recent poll in each race. Only polls released after October 1st were considered in the assessment process.

3. I graded each of the pollsters on three criteria:

The first criterion was a simple one: in how many contests did the pollster pick the correct winner? A forecast tie counted as half a correct pick. I then rounded to the nearest whole percent, for a score between 0 and 100.
The second criterion was a simple assessment of error. I rounded each result to the nearest whole number, did the same with the polling results, and then calculated the difference. For example, if the November 5th PPP poll out of North Carolina was 49-49, and Romney eventually won 50-48, the "simple error" would be two points.
I then gave each pollster an overall "error score" based on how little average error there was in their polling. No error at all would yield 100 points, while an average error of ten points would get you zip, zero, nada. By the way, if you think 10 points was too generous, bear this in mind: two GOP pollsters had an average error in 2012 of over ten points.

The math here was painfully simple: for every tenth of a point of average error, I deducted one point from the 100-point perfect score. For example, the clubhouse leader on this measurement (a tie between Democratic pollsters Lake Research and the DCCC's own in-house IVR polling outfit) had an average error of just 2.0 percent, which yields a score of 80.
The third measurement sought to reward those who did not show a strong partisan lean. This was called the "partisan error" score. Here, I took the error number from criterion two and added an element. The question: did the pollster overestimate the Democratic performance, or the Republican one? The points by which the margin was overestimated in each party's favor were totaled separately, and then the difference between the two totals was taken. That was then divided by the number of polls. This usually produced a number lower than the "error" score, because a good pollster won't miss in favor of just one party every single time.
Interestingly, virtually every pollster had an average error that overestimated the performance of the GOP. This echoes the national polls we saw, which tended to lowball the lead that President Obama held over Mitt Romney.

For this criterion, the 0-100 score was calculated the same way. For example, Rasmussen, on average, erred in favor of the GOP by 3.5 percent (you'd have thought it'd be higher, but they had a couple of big misses in the Democrats' favor in blowouts like the North Dakota gubernatorial election, which muted their GOP swing). Therefore, their "partisan error" score would be 65.
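The three criteria boil down to a few lines of arithmetic. Below is a minimal Python sketch of the whole pipeline, under some loudly stated assumptions: each poll is scored on the Democratic-minus-Republican margin after rounding both shares to whole numbers, halves round up, "after October 1st" means strictly after, and scores bottom out at zero (which is how a pollster ends up with a goose egg). The `Poll` class and every function name are hypothetical, invented for illustration; this is not the actual spreadsheet behind the grades.

```python
import math
from dataclasses import dataclass
from datetime import date

# Assumption: "after October 1st" is read as strictly after this date.
CUTOFF = date(2012, 10, 1)


def round_half_up(x: float) -> int:
    """Round to the nearest whole number, sending .5 upward (assumed convention)."""
    return math.floor(x + 0.5)


@dataclass
class Poll:
    # Hypothetical record for one poll in one race.
    pollster: str
    race: str
    released: date
    poll_dem: float    # Democratic share in the poll
    poll_gop: float    # Republican share in the poll
    result_dem: float  # Democratic share in the official result
    result_gop: float  # Republican share in the official result

    def poll_margin(self) -> int:
        # Shares are rounded to whole numbers before the margin is taken.
        return round_half_up(self.poll_dem) - round_half_up(self.poll_gop)

    def result_margin(self) -> int:
        return round_half_up(self.result_dem) - round_half_up(self.result_gop)


def latest_polls(polls: list[Poll]) -> list[Poll]:
    """Step 2: keep only each pollster's most recent poll in each race, after the cutoff."""
    latest: dict[tuple[str, str], Poll] = {}
    for p in polls:
        if p.released <= CUTOFF:
            continue
        key = (p.pollster, p.race)
        if key not in latest or p.released > latest[key].released:
            latest[key] = p
    return list(latest.values())


def winner_score(wins: int, losses: int, ties: int) -> int:
    """Criterion 1: percent of correct calls, with a forecast tie worth half a call."""
    return round_half_up(100 * (wins + 0.5 * ties) / (wins + losses + ties))


def to_score(avg_points: float) -> int:
    """Criteria 2 and 3: 100 minus one point per tenth of a point, floored at zero."""
    return max(0, round_half_up(100 - 10 * avg_points))


def grade(polls: list[Poll]) -> dict[str, int]:
    """Score one pollster's de-duplicated polls on all three criteria."""
    wins = losses = ties = 0
    for p in polls:
        if p.poll_margin() == 0:
            ties += 1
        elif (p.poll_margin() > 0) == (p.result_margin() > 0):
            wins += 1
        else:
            losses += 1
    # Signed miss on the margin: positive overstated the Democrat, negative the Republican.
    misses = [p.poll_margin() - p.result_margin() for p in polls]
    avg_error = sum(abs(m) for m in misses) / len(misses)
    lean = sum(misses) / len(misses)  # Democratic overstatements net of Republican ones, per poll
    scores = {
        "winners": winner_score(wins, losses, ties),
        "error": to_score(avg_error),
        "partisan": to_score(abs(lean)),
    }
    scores["total"] = sum(scores.values())
    return scores


if __name__ == "__main__":
    # The article's North Carolina illustration: a 49-49 poll vs. a 50-48 Romney win.
    nc = Poll("PPP", "NC-President", date(2012, 11, 5), 49, 49, 48, 50)
    print(nc.poll_margin() - nc.result_margin())  # 2
    print(winner_score(18, 0, 0), to_score(2.7), to_score(3.5))  # 100 73 65
```

Plugging in figures quoted in this post, `winner_score(18, 0, 0)` gives Pharos's 100, and `to_score(2.7)` and `to_score(3.5)` give the 73 and 65 cited for Pharos's error score and Rasmussen's partisan score. The small `round_half_up` helper is there because Python's built-in `round()` sends halves to the nearest even number, which is not how most of us learned rounding in Algebra I.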

So, how did the pollsters fare in 2012? The best, and worst, performances among the major pollsters might surprise you.

(UPDATE: The link to the Google Doc with the data and the "grades" for the pollsters should be fixed now. Apologies to those who tried to view it in the first hour.)


First, among the sixteen "major pollsters", here is who made the top five:

1. Pharos Research: 267 points
100 points on picking winners (Overall record: 18-0-0)
73 points on "error" score (Average error: 2.7 percent)
94 points on "partisan error" score (Average error: Democrats +0.6)

2. Ipsos/Reuters: 255 points
86 points on picking winners (Overall record: 6-1-0)
76 points on "error" score (Average error: 2.4 percent)
93 points on "partisan error" score (Average error: Republicans +0.7)

3. NBC News/Marist: 243 points
100 points on picking winners (Overall record: 14-0-0)
69 points on "error" score (Average error: 3.1 percent)
74 points on "partisan error" score (Average error: Republicans +2.6)

4. Angus Reid: 242 points
95 points on picking winners (Overall record: 9-0-1)
72 points on "error" score (Average error: 2.8 percent)
75 points on "partisan error" score (Average error: Republicans +2.5)

5. Public Policy Polling: 239 points
96 points on picking winners (Overall record: 48-1-2)
64 points on "error" score (Average error: 3.6 percent)
79 points on "partisan error" score (Average error: Republicans +2.1)

Everyone except the most devoted Polling Wrap readers might be asking the same question: who the hell is Pharos Research? The pollster caught some eyes in October, when they began a weekly series of polls in about a half dozen states. Their late start to the game aroused some skepticism. Their head honcho, Steve Leuchtman, even had some correspondence with our own David Nir, who sought to figure out who this new firm was. Nate Silver was even more skeptical, prompting Leuchtman to argue that the results would "speak for themselves."

To his credit, they did. They correctly called all 18 races they polled, including the razor-thin Florida presidential race and the North Dakota Senate race. Their average error was relatively small, and their "partisan error" was among the smallest of any polling outfit. Of course, part of the reason for this is that they were easily the most Democratic-leaning of the "major pollsters." In a Democratic year, that paid off for them.

Their biggest miss, as it happened, was one of their most high-profile polls: the Nebraska Senate race. They weren't alone on that ledge, however, as the Omaha World-Herald poll also badly overstated Democrat Bob Kerrey's chances.

Ipsos/Reuters and Angus Reid, by the way, used internet-based samples. The early failed experiment with internet sampling (the notorious "Zogby Interactive" polls) besmirched the entire genre. But their numbers this year were solid, and a third 'net-based pollster (YouGov) finished just outside of the top five.

A word about PPP. Their performance this year was, as always, awesome. What dinged their numbers a bit here was one simple fact: unlike everyone who finished ahead of them, they polled individual House districts. These races are often much more perilous to poll. Their only miss (out of 51 races!) was in a House race, where a private poll they conducted one week out from Election Day gave incumbent Republican Frank Guinta a one-point lead in New Hampshire (he wound up losing to Democrat Carol Shea-Porter by a 50-46 margin). The firm's average error in House races was 4.5 percent, considerably higher than their average error in the statewide races (3.47 percent).

They did have a couple of big misses: they gave Claire McCaskill a slight lead in a race that she eventually won in a rout, and they underestimated Obama's blowout in Massachusetts by nearly a dozen points. All in all, though, another amazing effort by the crew out of North Carolina. Here is a stat to consider: PPP got within four points of the final result in a whopping 73 percent of the races they polled.

Now, for the bottom five:

1. American Research Group: 121 points
39 points on picking winners (Overall record: 3-5-1)
41 points on "error" score (Average error: 5.9 percent)
41 points on "partisan error" score (Average error: Republicans +5.9)

2. University of New Hampshire: 168 points
71 points on picking winners (Overall record: 4-1-2)
46 points on "error" score (Average error: 5.4 percent)
51 points on "partisan error" score (Average error: Republicans +4.9)

3. Mason Dixon: 173 points
75 points on picking winners (Overall record: 15-6-1)
43 points on "error" score (Average error: 5.7 percent)
55 points on "partisan error" score (Average error: Republicans +4.5)

4. Gravis Marketing: 187 points
90 points on picking winners (Overall record: 16-1-2)
46 points on "error" score (Average error: 5.4 percent)
51 points on "partisan error" score (Average error: Republicans +4.9)

5. Rasmussen Reports: 199 points
85 points on picking winners (Overall record: 34-5-3)
49 points on "error" score (Average error: 5.1 percent)
65 points on "partisan error" score (Average error: Republicans +3.5)

Oh, mama. Man, did the "pirate pollster" have a shitty year. Remember that this was a year in which only a handful of states were decided by less than five points in the presidential race. Ergo, picking "winners" in this cycle should've been a cakewalk. ARG couldn't even bat .500, for crying out loud. They missed the presidential winner in Colorado, Florida, Iowa and Virginia. Add a big miss in the New Hampshire gubernatorial race (where Democrat Maggie Hassan eventually won by double digits), and you have an ugly cycle for the Pirate.

Mason Dixon might've only earned the bronze medal of dishonor, but they definitely deserve special mention for their crappiness in 2012. They had more "losses" than anyone in terms of picking winners: they missed six of the 22 races they polled. Remember, PPP had just one miss, despite polling more than twice as many races!

For the House of Ras, meanwhile, it could have easily been worse. They had a pair of misses where they overestimated the Democratic performance by a solid margin (the North Dakota gubernatorial election and the New Mexico Senate race). Take those two out of the mix, and their average error in favor of the GOP would have been considerably higher.

Looking at the larger list, the outcome is incredibly predictable. The Democratic private pollsters, by and large, did quite well. The Republican private pollsters took a bath. The worst two were a pair of GOP pollsters that did so badly, they actually earned goose eggs on the error and partisan error scores. In other words, their results favored the GOP by an average of more than ten percentage points.

The "winner" for worst performance was the GOP outfit OnMessage, which polled six races and correctly forecast just one winner. Their average error was an eye-popping 11.5 percent, and all six of their polls overestimated the performance of the GOP candidate.

Of course, the explanation for this is likely a simple case of selection bias. It is entirely possible that OnMessage and other GOP pollsters conducted tons of surveys that were more accurate. And it is equally possible that said surveys never saw the light of day, because campaigns aren't in the business of releasing numbers showing them getting smooshed.

Therefore, the presence of Democratic pollsters in the top ten, and the presence of GOP pollsters in the bottom ten, likely has less to do with the inherent quality of the pollster in question and more to do with the fact that this cycle, generally speaking, sucked for Republicans.

The exception, to pile on yet again, is Rasmussen. Lest we forget, their performance sucked in 2010, which was as good a year as you are likely to see for the Republican Party. So whether their preferred party is in or out of favor, the firm has laid eggs over the past two cycles.

For the entire list, and the polling data used to arrive at those figures, click here. A word of warning: while I took great pains to gather every poll I could get my hands on, on a daily basis, I freely concede that a poll or two might've slipped through the cracks. That said, the overwhelming majority of polling data for the 2012 cycle did make it into our database, which was the basis for the numbers used in this rating system.

Furthermore, this is just one way to measure pollsters. I chose three criteria, but there are certainly tons of other ways to assess the quality of the numbers guys. So, to beat a cliché into the ground, your mileage may vary. Enjoy, and argue at will in the comments.