What these guys lacked in consistency, they made up for in accuracy on Election Eve.
Of the nearly two dozen pollsters that met the parameters for our biennial study of pollster accuracy, the crown for highest score came down, oddly enough at first blush, to a tie between a partisan pollster (GOP firm Public Opinion Strategies) and the often-maligned (around these parts, at least) University of New Hampshire polling center.
However, given what we know now, it would be at least somewhat difficult to say that this distinction carries a great deal of honor with it.
For one thing, the results of this study, which covered pollsters for the 2014 cycle, unearthed (or, at a minimum, provided reams of evidence for) a series of weaknesses with the criteria that I discussed at length a week ago.
Despite those debatable components, in the final analysis I elected, for the sake of continuity with the 2012 study (and the ability to compare results), to keep that set of criteria.
Another flaw that becomes clear when one looks at winners and losers from the 2014 polling derby: because the GOP clearly overachieved relative to its polling numbers, pollsters that leaned GOP looked more accurate. It is no accident, I believe, that some of the pollsters that scored the highest on accuracy in 2014 were among those that scored the lowest in 2012.
Head below the fold for the results of our 2014 study, as well as a look at how we can actually glean some reasonable conclusions from what became an odd and problematic year for pollsters in American elections.
Before getting to the list, let's revisit the criteria, which were only changed slightly (and cosmetically, at that) for 2014:
1. The first act was to gather every pollster that released polling in at least five separate races (not counting national polls). That wound up being a grand total of 23 different pollsters, a considerably smaller number than we saw in 2012 (when 34 pollsters met that standard). Unlike in 2012, however, we did not generate the secondary, so-called "major pollsters" list. In 2012, that meant excluding two groups from the "major" pollsters: pollsters who primarily worked for campaigns or interest groups, and pollsters that only worked in one or two states. This time around, a "notes" section informs you, the reader, of those particular quirks with any given pollster.
2. Duplicate polls were then removed from the study. As a result, pollsters were only assessed by their most recent poll in each race. Furthermore, only polls whose field dates were October 1st or later were considered in the assessment process.
3. Then, each of the polling units was graded on three criteria:
The first criterion was a simple one: in how many contests did the pollster pick the correct winner? If the pollster forecasted a tie, that counted as one-half of a correct pick. The pollster's "win/loss record" was then rounded to the nearest whole percent, for a score between 0 and 100. For example, Mason Dixon (which, as luck would have it, was not contracted to poll a single difficult race this cycle) correctly forecast the winner in all five races it polled, earning a perfect 100 on this measure. Suffolk University went with seven winners and two losers, earning a score of 78.
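To make that arithmetic concrete, here's a minimal sketch of the win/loss scoring (the function name is mine, and the rounding convention is an assumption on my part):

```python
def win_loss_score(wins, losses, draws=0):
    """Correct calls as a share of races polled, with a forecast
    tie counting as half a correct pick, rounded to the nearest
    whole percent (0-100)."""
    races = wins + losses + draws
    return round((wins + 0.5 * draws) / races * 100)

# Mason Dixon went 5-for-5; Suffolk went 7 wins and 2 losses
print(win_loss_score(5, 0))   # 100
print(win_loss_score(7, 2))   # 78
```

Note that a draw counts toward the numerator at half weight but toward the denominator at full weight, which is how a 5-0-2 record lands at 86 rather than 100.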
The second criterion was a simple assessment of error. Each election result was rounded to the nearest whole number, and the same was done with the polling results. Then, all that was left was to calculate the difference.
For example, the November 2nd UNH poll on the New Hampshire Governor's race had incumbent Democratic Gov. Maggie Hassan leading by four points. She eventually won by six, so the "simple error" would be two points.
Each pollster was then credited with an overall "error score" based on how little average error there was in their polling. The math here is painfully simple: for every tenth of a point of average error, one point was deducted from a perfect score of 100. No error at all would yield 100 points, while an average error of ten points would get you zip, zero, nada. By the way, if you think 10 points was too generous a ceiling, bear this in mind: back in 2012, two GOP pollsters had an average error of over ten points.
In 2014, the leaders on this measurement (a tie between the Univ. of New Hampshire and Suffolk University) had an average error of just 3.3 percent, which yielded them a score of 67.
(As a point of comparison, the lowest average error in 2012 was 2.0 percent.)
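A quick sketch of criterion two, using the UNH example above (the function names are mine, and the zero floor for averages above ten points is my assumption):

```python
def simple_error(poll_margin, result_margin):
    """Absolute difference between the poll's margin and the actual
    result, each rounded to the nearest whole number first."""
    return abs(round(poll_margin) - round(result_margin))

def error_score(avg_error):
    """100 points minus one point per tenth of a point of average
    error (floored at zero -- my assumption for averages over ten)."""
    return max(0, round(100 - 10 * avg_error))

print(simple_error(4, 6))   # UNH's final NH-Gov poll: error of 2
print(error_score(3.3))     # UNH/Suffolk's 3.3 average -> 67
```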
The third measurement sought to reward those who did not show a strong partisan lean. This was called the "partisan error" score. Here, we took the error number from criterion two and added an element: did the pollster overestimate the Democratic performance, or the Republican one? The points on the margin by which each party was overestimated were added up, the difference between the two sums was taken, and that difference was divided by the number of polls. This led to a number that (usually) was lower than the "error" score, because a good pollster won't miss in favor of just one party every single time.
Interestingly, virtually every pollster had an average error that overestimated the performance of the Democrats. This makes sense: the general sentiment of the 2014 cycle was that the Republican over-performance was "surprising" relative to the polling beforehand.
For this criterion, the 0-100 score was calculated the same way. For example, PPP, on average, erred in favor of the Democrats by 3.7 percent. Therefore, their "partisan error" score would be 63.
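The same idea, sketched with hypothetical numbers (the sign convention here, positive for a pro-Democratic miss and negative for a pro-Republican one, is my own labeling):

```python
def partisan_error(signed_errors):
    """Net signed error per poll: positive values mean the poll
    overestimated the Democrat, negative the Republican. Misses in
    opposite directions cancel, which is why this number usually
    comes in lower than the plain average error."""
    return abs(sum(signed_errors)) / len(signed_errors)

# Hypothetical pollster: three misses toward the Democrats, one toward the GOP
lean = partisan_error([5, 4, 3, -2])
print(lean)                    # 2.5 points toward the Democrats
print(round(100 - 10 * lean))  # scored like criterion two: 75
```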
Under those criteria, here was your running order among the 23 qualifying polling groups for the 2014 cycle:
1 (tie). Public Opinion Strategies—R (246 points)
—Win/Loss (5 wins 0 losses 2 draws—86 points)
—Average Error (4.0 percent—60 points)
—Partisan Error (Republicans by 0.0 percent—100 points)
1 (tie). University of New Hampshire (246 points)
—Win/Loss (7 wins 0 losses 1 draw—94 points)
—Average Error (3.3 percent—67 points)
—Partisan Error (Democrats by 1.5 percent—85 points)
3. Suffolk University (238 points)
—Win/Loss (7 wins 2 losses—78 points)
—Average Error (3.3 percent—67 points)
—Partisan Error (Democrats by 0.7 percent—93 points)
4. Vox Populi—R (223 points)
—Win/Loss (7 wins 2 losses 1 draw—75 points)
—Average Error (3.5 percent—65 points)
—Partisan Error (Republicans by 1.7 percent—83 points)
5 (tie). Hickman Analytics (220 points)
—Win/Loss (4 wins 1 loss—80 points)
—Average Error (5.6 percent—44 points)
—Partisan Error (Democrats by 0.4 percent—96 points)
5 (tie). Landmark Communications—R (220 points)
—Win/Loss (5 wins 0 losses—100 points)
—Average Error (4.2 percent—58 points)
—Partisan Error (Democrats by 3.8 percent—62 points)
7. Ipsos (212 points)
—Win/Loss (3 wins 0 losses 2 draws—80 points)
—Average Error (4.0 percent—60 points)
—Partisan Error (Democrats by 2.8 percent—72 points)
8. Monmouth (204 points)
—Win/Loss (13 wins 2 losses 1 draw—84 points)
—Average Error (4.0 percent—60 points)
—Partisan Error (Democrats by 4.0 percent—60 points)
9. Siena College (203 points)
—Win/Loss (8 wins 0 losses—100 points)
—Average Error (5.6 percent—44 points)
—Partisan Error (Democrats by 4.1 percent—59 points)
10. PPP (197 points)
—Win/Loss (25 wins 4 losses 3 draws—83 points)
—Average Error (4.9 percent—51 points)
—Partisan Error (Democrats by 3.7 percent—63 points)
11. We Ask America—R (194 points)
—Win/Loss (6 wins 1 loss—86 points)
—Average Error (4.6 percent—54 points)
—Partisan Error (Democrats by 4.6 percent—54 points)
12. CNN/ORC (193 points)
—Win/Loss (8 wins 3 losses 2 draws—69 points)
—Average Error (4.4 percent—56 points)
—Partisan Error (Democrats by 3.2 percent—68 points)
13. Harper Polling—R (192 points)
—Win/Loss (5 wins 0 losses 0 draws—100 points)
—Average Error (5.4 percent—46 points)
—Partisan Error (Democrats by 5.4 percent—46 points)
14. Quinnipiac (184 points)
—Win/Loss (5 wins 2 losses 1 draw—69 points)
—Average Error (4.9 percent—51 points)
—Partisan Error (Democrats by 3.6 percent—64 points)
15. YouGov (181 points)
—Win/Loss (61 wins 10 losses 2 draws—85 points)
—Average Error (6.3 percent—37 points)
—Partisan Error (Democrats by 4.1 percent—59 points)
16. Mason Dixon (180 points)
—Win/Loss (5 wins 0 losses 0 draws—100 points)
—Average Error (6.2 percent—38 points)
—Partisan Error (Democrats by 5.8 percent—42 points)
17 (Tie). Rasmussen (179 points)
—Win/Loss (23 wins 5 losses 2 draws—80 points)
—Average Error (5.7 percent—43 points)
—Partisan Error (Democrats by 4.4 percent—56 points)
17 (Tie). SurveyUSA (179 points)
—Win/Loss (16 wins 5 losses 2 draws—74 points)
—Average Error (5.6 percent—44 points)
—Partisan Error (Democrats by 3.9 percent—61 points)
19. Marist (163 points)
—Win/Loss (13 wins 2 losses 1 draw—84 points)
—Average Error (6.1 percent—39 points)
—Partisan Error (Democrats by 6.0 percent—40 points)
20. Gravis Marketing—R (162 points)
—Win/Loss (8 wins 5 losses 0 draws—62 points)
—Average Error (6.0 percent—40 points)
—Partisan Error (Democrats by 4.0 percent—60 points)
21. Loras College (157 points)
—Win/Loss (5 wins 1 loss—83 points)
—Average Error (6.3 percent—37 points)
—Partisan Error (Democrats by 6.3 percent—37 points)
22. Anderson Robbins Shaw/Fox News (145 points)
—Win/Loss (8 wins 4 losses 1 draw—65 points)
—Average Error (7.2 percent—28 points)
—Partisan Error (Democrats by 4.8 percent—52 points)
23. Hendrix College (100 points)
—Win/Loss (4 wins 1 loss—80 points)
—Average Error (9.0 percent—10 points)
—Partisan Error (Democrats by 9.0 percent—10 points)
If the delineation between major pollsters and local/campaign pollsters had been kept, the seven pollsters who either polled for campaigns or polled locally (one or two states) would've been scattered across the spectrum.
A couple landed right at the top (the University of New Hampshire and the GOP outfit Public Opinion Strategies tied for the lead with 246 points), a couple languished near the bottom (Loras College and Hendrix College), and one (Illinois GOP outfit We Ask America) landed smack dab in the middle.
As stated before, the 2014 ratings underscore certain flaws with the criteria; let's examine those flaws with specific data points from 2014.
In the case of co-frontrunner UNH, the reliance only on final polling gave them a huge out, as it voided from consideration their absurdly erratic polling, graphically shown at the top of this piece.
To wit: New Hampshire's 2nd Congressional District, held by Democratic Rep. Ann Kuster. If you put stock in UNH's numbers, Kuster went, in less than a month, from trailing Republican Marilinda Garcia by four points, to leading by 23 points, to leading by just 11 points a week later. When the actual votes were counted, Kuster bested Garcia by 10 points. For our purposes, only that final poll (which was also closest to the final outcome) was counted.
Races do swing during a cycle, of course (and House races, presumably, more than others). But this kind of whiplash seems excessive, and because of the "final poll" rule, UNH gets away with it, as far as their measured performance is concerned.
The "partisan error" criterion also helped out some conservative pollsters. The bulk of polling erred on the side of the Democrats in 2014; that conclusion is absolutely inescapable. So, if a pollster was bullish on the GOP throughout, they were vindicated on Election Day by an electorate that was friendlier to the GOP than virtually anyone anticipated. Their "partisan error" scores, by extension, are fairly strong.
This should not, under any circumstances, be viewed as an endorsement of the neutrality or "fairness" of those pollsters. Quite the opposite—when I see that Vox Populi, despite the general underestimation of GOP prospects, still erred on the side of the GOP, even if by less than two points, I don't marvel at how close they came to the pin. I wonder what the heck their numbers will look like in a neutral polling climate, or one where it is the Democrats that are underestimated before the election.
Which brings us to a potentially valuable comparison point: if a pollster did well in 2014, it doesn't make them a good pollster. It makes them someone who can see a GOP wave forming. But if a pollster can manage that in 2014, and manage to also be accurate in 2012, when it was the Blue team that got underestimated? That, on the other hand, would be a decent statement of quality.
Alas, not everyone can play in this particular game. Of the pollsters on the 2014 list, a handful either did not exist, or did not poll in enough volume, during the 2012 cycle. Two immediately come to mind: GOP pollsters Vox Populi and Harper. Vox Populi wound up fourth overall in 2014, but I'd reserve judgment on their chops until we see how they perform in a non-GOP electoral environment.
All in all, 14 of our 23 pollsters had enough polling in both cycles to be included in both the 2012 and 2014 studies. Of that group, the two best from 2014 (University of New Hampshire and Public Opinion Strategies) scored the worst among the 14 polling teams in 2012. Likewise, one of the better outfits from 2012 (Marist) wound up close to the bottom of the queue in 2014.
Credit should be given, however, to three outfits in particular. Ipsos, Suffolk University, and PPP were in the enviable position of being in the upper half of the pollster ratings in both cycles. Ipsos is the leader in that regard, having averaged 236 points between the two cycles. But Suffolk should be given credit for their consistency (only a twelve-point difference between cycles), and PPP deserves loads of credit for their top-tier performance, given that their polling volume crushes everyone not named Rasmussen or YouGov.
Speaking of Rasmussen, they hold the opposite distinction: they were in the bottom half in both cycles. Their average of 189 points per cycle bested only Mason Dixon (176.5 points) and the lowest performer in the multi-cycle view, Gravis Marketing (just 174.5 points).
In conclusion, for fun I explored another potential method for calculating pollster quality—hits and misses. In short, I calculated the percentage of polls in which a pollster nailed it by getting within four points of the final margin. Conversely, I calculated the percentage of polls where they blew it by a double-digit margin.
Here, again, the logistics of the study hurt the measurement a bit. Hendrix looks terrible in this assessment (0 percent hits, 60 percent misses), but they only polled five races, and they polled Arkansas, where everyone missed this year. But, even with some inherent flaws, some conclusions can be drawn.
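The hit/miss bookkeeping is easy to sketch (thresholds per the text: a hit is within four points of the final margin, a miss is a double-digit error; the sample numbers below are hypothetical):

```python
def hit_miss_rates(errors):
    """Percent of polls within four points of the final margin
    ("hits") and percent off by double digits ("misses"), each
    rounded to the nearest whole percent."""
    n = len(errors)
    hits = sum(1 for e in errors if e <= 4)
    misses = sum(1 for e in errors if e >= 10)
    return round(hits / n * 100), round(misses / n * 100)

# Hypothetical five-race pollster with errors of 2, 3, 5, 11, and 14 points
print(hit_miss_rates([2, 3, 5, 11, 14]))   # (40, 40)
```

Note that polls missing by five to nine points count toward neither bucket, so hits and misses need not sum to 100 percent.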
Conclusion #1: Holy crap, did the resident pollsters at Fox News (Anderson Robbins/Shaw) have a bad year. Not only did they fall to the bottom three in our standard metric, but they had the highest percentage of misses of any pollster with at least 10 polls. They missed by at least 10 percentage points on a whopping 38 percent of the races they polled (five of 13). For reference, the double-digit miss average for all 23 firms and all 323 races surveyed by those groups was ... 18 percent. Which, actually, feels kind of high, but is nothing compared to missing the fairway nearly two times in five.
Conclusion #2: PPP, the runaway winners by this hit-versus-miss metric in 2012, had merely a respectable performance this time around. They were in the hit range on 44 percent of their polls this year. While solid (and above the average for the entire polling group, which logged in at 38 percent), it was down from 2012, when they hit on 73 percent of the races they polled.
Conclusion #3: The polling firm to keep an eye on in 2016, to see if they are legit or a wave-inspired mirage, is Vox Populi. The newly established GOP firm polled only 10 races, but they hit on seven of them and had no misses. Impressive, to be sure, though until they have to poll a non-GOP wave cycle, those questions will linger.
For those interested in the data, feel free to click here. For those who have other methods of measuring pollsters, the data for the final month for those more prolific firms is included as a tab on the document. Happy number crunching!