With most of the votes counted, Stochastic Democracy analyzes how its model's predictions fared against Nate Silver's FiveThirtyEight, and draws lessons for future election forecasting models.

See below the fold for details, or click through the site to see an interactive web applet.

StochasticDemocracy.Blogspot.com

**************Cross-Posted at StochasticDemocracy**********

With most of the results in, it's now possible to make a comprehensive comparison with Nate's FiveThirtyEight. (See my pre-election comparison of our methodologies here)

Summary:

Stochastic Democracy did a better job predicting the presidential race, FiveThirtyEight did a better job in the Senate, and both sides performed equally well predicting the popular vote.

Electoral College:

Comparing headline electoral-vote totals is tempting, but because of the winner-take-all system, it is not a good way to gauge accuracy. Instead, it is better to look at how we did at predicting the margins in individual states:

My model had a slightly lower mean absolute error than his, but the kurtosis of his prediction residuals is much higher than mine.

That means that most of the time, his predictions were slightly more accurate than mine. But when his predictions were off, they were really, really off.
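To make that distinction concrete, here is a minimal sketch in Python; the residuals are invented for illustration, not our actual state-by-state numbers:

```python
# A minimal sketch (hypothetical residuals) of how one model can win on
# mean absolute error while the other's residuals show much higher
# kurtosis, i.e. occasional big blow-ups.

def mean_abs_error(residuals):
    return sum(abs(r) for r in residuals) / len(residuals)

def kurtosis(residuals):
    # Pearson kurtosis: E[(x - mu)^4] / sigma^4 (normal distribution = 3)
    n = len(residuals)
    mu = sum(residuals) / n
    var = sum((r - mu) ** 2 for r in residuals) / n
    return sum((r - mu) ** 4 for r in residuals) / (n * var ** 2)

model_a = [0.5, -1.0, 0.8, -0.3, 1.2, -0.7]  # steady, modest errors
model_b = [0.2, -0.4, 0.3, -0.1, 6.0, -0.2]  # tighter on most states, one blow-up

print(mean_abs_error(model_a), kurtosis(model_a))
print(mean_abs_error(model_b), kurtosis(model_b))  # single miss inflates both
```

Model B is closer on five of the six states, but its one Indiana-style miss drives its kurtosis far above Model A's.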

If I had to pick a culprit, I would say this is a consequence of his extensive and much-celebrated use of regression. The idea behind regression here, that it's possible to predict a state's political orientation by simply applying a mathematical formula to its demographics, usually works depressingly well.

But when it fails, it fails badly (see Indiana). And while it's possible to correct for this by using fat-tailed error distributions, it doesn't seem that Nate did this.

But, everyone has their own metrics for this sort of thing, so for those who want to see the raw data, click here.

Senate:

Nate and I basically made the same senate predictions, with one exception: Georgia. I predicted that a run-off was going to be called in Georgia, while Nate predicted that the Republicans were favorites to win the seat outright. Luckily, I made the right call there.

That aside, we made the same mistakes, in Minnesota and Alaska. While neither of those races has been called yet, the Republican candidate is currently ahead in both states.

But it's uninformative to just check whether we correctly predicted the winner in each state. Looking at margins, we see that Nate's model edges out mine slightly.

It seems that when it comes to the Senate, Nate's model did a better job than mine. Why?

Nate accounted for house effects and pollster reliability, while I never had time to implement such things into my own model. While this turned out to be unimportant for the presidential race, this election has shown that as a whole, Senate polls suck. Of course, these effects can and will be implemented into my model easily, so we will see if the discrepancy continues with the next edition of my model.
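For readers wondering what a house-effect correction actually involves, a minimal sketch (with invented pollster names, leans, and margins) might look like:

```python
# Hypothetical sketch of a house-effect adjustment: subtract each
# pollster's estimated partisan lean before averaging. The pollster
# names, leans, and margins below are all invented.
house_effects = {"Pollster A": 1.5, "Pollster B": -2.0, "Pollster C": 0.0}
polls = [("Pollster A", 3.0), ("Pollster B", -1.0), ("Pollster C", 1.0)]

def naive_average(polls):
    return sum(margin for _, margin in polls) / len(polls)

def adjusted_average(polls, house_effects):
    # Remove each pollster's systematic lean, then average what remains
    adjusted = [margin - house_effects[name] for name, margin in polls]
    return sum(adjusted) / len(adjusted)

print(naive_average(polls))                    # raw average of the margins
print(adjusted_average(polls, house_effects))  # lean-corrected average
```

The correction matters most when, as with this cycle's Senate polling, a few pollsters with strong leans dominate a thinly polled race.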

On the other hand, the theoretical simplicity and resulting flexibility of my model allowed me to adapt to non-standard situations, like the Georgia run-off, better than Nate's model. I believe this flexibility will make my approach more popular in future elections.

Popular Vote:

I did a better job predicting Obama's share of the vote, while Nate did a (much) better job predicting McCain's share of the vote.

I'm going to declare this one a tie, mainly since votes are still being counted and we both did pretty well. Still, since Nate did so much better a job predicting McCain's number, this should lean toward FiveThirtyEight.

[Note, new numbers since then indicate that my prediction was unambiguously better - 12/1/08]

Conclusion:
Altogether, the comparison is a bit of a wash, and this election has provided both of us clear paths for how to improve in the future.

Still, I hope that everyone who enjoyed both our sites' coverage of the election comes back for post-election analysis. Both of us have interesting work coming up.

Update: Pollster seems to have done very well also.

Update: The popular vote numbers have changed, with the new results indicating Obama/McCain: 52.81% / 45.80%.

My prediction: Obama/McCain: 52.68 / 45.31
538's prediction: Obama/McCain: 52.4 / 46.2

While the different forecasts were very close, StochasticDemocracy unambiguously outperformed 538.



#### Comments

• ##### Tips, Comments, Recs, Love, Hate..... (23+ / 0-)

Feedback is a public good....

• ##### Blog link is wrong(0+ / 0-)

:)

Yes We Did! Yes We Did! Yes We Did! 11 PM EST, NOV 4 2008. The day the world changed.


• ##### Fixed!(0+ / 0-)

That's the one thing I'd want right...

• ##### Nice work. These latest applications of (2+ / 0-)
Recommended by:
peraspera, El Ochito

predictive mathematics to political races are really impressive!

"A single event can awaken within us a stranger totally unknown to us. To live is to be slowly born." -Antoine de Saint ExupĂ©ry


• ##### Many thanks for this data.(3+ / 0-)
Recommended by:
peraspera, AlwaysDemocrat, Jazione

Looking back is obviously very valuable...would be interested in any other lessons learned...

• ##### Agreed(5+ / 0-)

I'm still looking through everything. The only concrete thing (read: statistically significant) I've found so far is that polls were more wrong in states with lots of young people.

As far as lessons learned for poll aggregators like me, I'm working on that too. Check back soon for...

• ##### MN and AK Sen(1+ / 0-)
Recommended by:
AlwaysDemocrat

It's likely the Democrat will win in both races once all votes are counted, or recounted.

Here we are now Entertain us I feel stupid and contagious

• ##### You might be right(2+ / 0-)
Recommended by:
peraspera, El Ochito

I'm not sure if I would say "likely", but I certainly agree with you that it's possible.

I'd say our chances are fairly good in Alaska (Nate has a good post on that here). My own back-of-the-envelope regressions tell me the odds are about 50-50, slightly in Begich's favor.

But in Minnesota, I don't have any idea. I'm not an expert with the whole "Undervote/Overvote" thing.

• ##### Most of those 'undervotes' are in Obama counties(1+ / 0-)
Recommended by:
peraspera

MN-Sen

Three counties -- Hennepin, Ramsey and St. Louis -- account for 10,540 votes in the dropoff. Each saw Obama win with 63 percent or more.

Ballots that showed a presidential vote but no Senate vote are called the "undervote." Statewide, more than 18,000 of those ballots came from counties won by Obama with more than half the vote. About 6,100 were in counties won by Republican John McCain with at least 50 percent.

http://www.startribune.com/...

Here we are now Entertain us I feel stupid and contagious


• ##### Excellent job, both of you(6+ / 0-)

Amazing science. Thanks for your help throughout the election. You guys kept me from going insane with doubt, fear and worry.

• ##### Nice analysis.(2+ / 0-)
Recommended by:
Julie Gulden, dorkenergy

I wouldn't call the popular vote a tie, because Nate was closer to the number than you and will probably remain so with the count still out. It being your analysis, you should probably err on the side of the other guy, lest your humility be questioned.

I think all things considered you guys both did an excellent job and that Nate's final numbers probably suffered from his increased sensitivity in his calculations in the final days. His model is excellent and probably didn't require the increase.

I'm really grateful to you and Nate and everyone who contributed to your work in the campaign season, it's a real service you guys provide and made my campaign watching just that much more enjoyable.

"Good to be here, good to be anywhere." --Keith Richards

• ##### And oh yeah...(2+ / 0-)
Recommended by:
peraspera, Julie Gulden

....tipped and highly rec'd for all the hard work.

"Good to be here, good to be anywhere." --Keith Richards


• ##### fabulous, I've been waiting for this diary :)(1+ / 0-)
Recommended by:
peraspera

I love the idea of two models based on completely different assumptions being used to test objective reality.

The question I have been holding off on... but now can ask...

Since you BOTH did not call AK ... does that raise questions about what actually happened there?

The real question I am asking is ... how well can these tools be used to sniff out fraud?

• ##### I shouldn't say anything...(3+ / 0-)
Recommended by:
peraspera, El Ochito, Steko

But yes, it looks really really fishy.

Polls usually do a pretty good job of predicting Alaskan races, but in this election there was a wide-scale failure that spanned the House, Senate, and presidential races.

Hopefully our team has lawyers checking it out...

As for the more general question: They can be used fairly well. If the election results are way off the polls, it doesn't prove anything, but it really is a red flag.

• ##### Seems like Nate answered the first question(1+ / 0-)
Recommended by:
peraspera

I'm almost inclined to accept the issue is less fraud than circumstance.

• ##### Maybe(1+ / 0-)
Recommended by:
Steko

The main arguments I've seen were:

A) The Race For President was called before polls closed in Alaska

B) People lied about their support for Stevens

C) Democrats got complacent and didn't vote

Combined, all three of them seem fairly plausible. But I still hope a team of lawyers is checking it out.

• ##### It can't hurt.(0+ / 0-)

It just struck me because he had Begich at 98-100% for a while and it was the only call he made wrong.

The run off is really a wash here because he wasn't testing that hypothesis.

The other thing I think is cool about this is the opportunities you can scope out on the fly.  For example, WV started trending towards Obama later in October.  I think that heralded an opportunity ripe for the picking.   I think they could have won that state with a frontal assault that played to the mine workers.  But I understand why they wanted to use all resources for VA.  In essence, I think they sacrificed an opportunity to break into Appalachia, but that is a small point against the big picture.

• ##### Once again, Maybe(0+ / 0-)

The Georgia thing: Nate did state that he thought a Chambliss victory was the most likely outcome (he assigned probabilities of 50/40/10 for Chambliss/Runoff/Martin).

He never formally tested it, but he put it in his final senate sheet, so I'd say including it is fair.

As for West Virginia:

You might be right. I've actually just started working with someone on studying the effect of campaign spending on polls. (We're using the surprisingly detailed campaign financial statements). The guy I'm working with is quite a bit smarter than me, so I'm sure we'll come up with something good.

Failing that, I'm sure Nate will take a look at it too.

With that sort of model, we'll be able to look at What-Ifs more effectively.

• ##### my only nitpick(2+ / 0-)
Recommended by:
El Ochito, MemphisProfessor

when you show errors, kurtosis, etc., you really only have 2 significant figures. Clip the rest of the noise.

• ##### about that nitpick...(0+ / 0-)

... overall I would speculate that most of the differences between the sites' results are down in the noise. I am skeptical that either of these analyses could have been done very much better, given all the known unknowns and unknown unknowns.

(I apologize for repeating a war criminal's language, but that particular phrase is both memorable and, I think, relevant to the idea behind higher order moments such as kurtosis.)

Seriously, how close to the final results can one expect to get, given that opinion polls and voting results are gathered and tabulated under such different circumstances?

This year's race may have been more than typically tractable to the methods employed at both sites. Suppose, instead, one applied both systems to 2000? The details, presented in the same depth as in 2008, could be very interesting. Though I assume both systems were "back-tested" and to some extent "trained" on 2000 data, I would be surprised if the polling predictions from that unfortunate election turned out to be as accurate and credible as the ones we are discussing now.

"This document is totally non-redactable and non-segregable and cannot even be meaningfully described." *


• ##### Good Point(0+ / 0-)

I'll dispute the assertion that it's all down to noise, though a lot of it is. Both of us made mistakes (Nate's over-reliance on regression in the presidential race and my decision not to account for house effects in the Senate), and both of us can learn from them moving forward.

But as to your more general point: to some extent, yes. We can only be as accurate as the polls given to us (see Alaska).

On a more technical note, I never "backtested" my model due to time constraints (I'm a full-time student), though I do believe that Nate has done some extensive training on past elections.

But to see how well mathematical models have done in past elections, see here.

• ##### and how did a naive average of the polls, like(0+ / 0-)

RCP do? Pollster.com?

I think that 538 and such projections were close because the polls were close - not because there was some wonderful formula Nate was using.  All Nate did was average the polls with some weightings for pollsters and trends (the trend adjustment was probably his most meaningful contribution).  His regression average was pretty weak.  It was a good stop-gap for when there were few polls in some states, but with plenty of polls in every state it became unnecessary.  There is no reason to pay attention to a regression that says Indiana +5 when all the polls show a dead heat.

• ##### Fair Point(1+ / 0-)
Recommended by:
El Ochito

I think Nate's extensive use of regression was a bit ill-conceived. I might use it in the beginning as a prior (with wide margins of error), but he gave it far too much weight.

I'm also deeply skeptical of his "trend adjustment" system. It seems to be the reason why his presidential predictions were so bad.

Still, to see how RCP and Pollster did, see here.

RCP didn't make calls for swing states, while Pollster did. If you look at two-way vote share (which the other author did not), FiveThirtyEight slightly edges out Pollster, while mine slightly edges out FiveThirtyEight.

Once you look at kurtosis, though, you see that Pollster does a better job than FiveThirtyEight at keeping kurtosis down, though not as well as my method.

This shouldn't be too surprising. Local regression does a great job, and should produce essentially the same results as the Bayesian Filtering that I use.

But the advantage of Bayesian filtering is that, unlike local regression, it gives you a parametric framework for answering questions like "What is the probability that Chambliss will get more than 50% of the vote in Georgia?" Local regression doesn't offer that flexibility.
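As a rough illustration of that flexibility (the posterior mean and standard deviation here are invented, not my actual Georgia numbers), a Gaussian posterior makes such event probabilities a one-liner:

```python
# Sketch of the parametric advantage: with a Gaussian posterior over a
# candidate's vote share, event probabilities fall out in closed form.
# The posterior mean and standard deviation below are hypothetical.
import math

def prob_above(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical posterior: Chambliss share ~ Normal(49.8, 1.5)
p_outright_win = prob_above(50.0, 49.8, 1.5)
print(p_outright_win)  # just under 50%, so a run-off is slightly favored
```

With a local-regression fit you get a point estimate of the trend line, but no distribution to feed into a question like this.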

• ##### Data on 538's predictions seems to be wrong(0+ / 0-)

The link you provide shows his final prediction at 349-189, not 341-197 as you have listed above.

• ##### Typo on my part(0+ / 0-)

Sorry! This is why I always post sources...

• ##### I also don't understand(0+ / 0-)

why you say that 538 got Indiana so far wrong.

As far as I can tell, he predicted a 48.4% - 50.0% split ('Projection' line of the 'Poll Detail' listing for Indiana in the right column).

This is way, way off?

• ##### Good Question(1+ / 0-)
Recommended by:
hazzcon

His regression estimate for Indiana was quite off (McCain +4.8); this is what I was referring to....

• ##### This is confusing, David(1+ / 0-)
Recommended by:
NRG Guy
1.  Nate's predicted EV was 353 for the fixed prediction (McCain vs. Obama in each state) and 349 for the probabilistic one (which allowed a given state to count on both the McCain and Obama sides).
2.  I find it hard to see in your charts which states you got wrong or right. Nate missed Indiana by a small amount. He got 49 of 50 states right. How many did you get right?
3.  Using the "538 regression" estimate is wrong. You have to use his "projection." The "regression estimate" is just an intermediate value in his projection. To say that Nate's "regression estimate" of Indiana was far off makes no sense; that's not his projection -- not the one that he used to predict the electoral vote outcome.

Seems to me that you've cooked your comparison here to favor your own model.

"Getting elected is the only true moral imperative that politicians believe in." -- Anon

• ##### Great Questions(0+ / 0-)

I'm sorry if my writing is confusing. I'm a math major, so writing isn't really one of my strong-points. Please allow me to clarify.

1. He mentioned several numbers, but I used the one that he used as the title of his post. Earlier I used 353, and somebody sent me an email complaining.

But, the potential confusion is why I linked to both of our website's prediction pages.

2. We both predicted the exact same states. I didn't want to clutter the page, and I thought the information was conveyed by clicking through to our respective prediction pages.
3. I did not use the "538 regression" estimate when computing these figures; I'm not sure why you would think I did. I used his projection numbers, as you say I should have.

The point I was trying to make, but that I guess I did not convey well, was that Nate's extensive use of regression was the reason why my model was more accurate than his in the presidential race.

As for your last point, that I cooked my comparison to make my model look favorable:

I've tried my best to be impartial, but that's obviously impossible. That is why I've been as transparent as possible in supplying the relevant raw data.

Hope that clears things up,

David

• ##### In your post just above mine(0+ / 0-)

you refer to Nate's regression estimate being wrong for Indiana.  My point is that that is irrelevant to any assessment of the accuracy of his projection.

"Getting elected is the only true moral imperative that politicians believe in." -- Anon


• ##### You'll have to explain that one(0+ / 0-)

While it is irrelevant to any assessment of the accuracy of his projection (simply calculating the mean error does that pretty well), it is quite valid as a critique of his methodology.

He calculated his projection estimate by taking a weighted average of several "polls" (one of which was the regression estimate), and then assuming some regression to the mean.

I am making the argument that the reason his presidential forecasts displayed such high kurtosis was that he gave the regression estimates too high a weight.
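A toy numerical sketch of this argument, with all margins and weights invented rather than taken from FiveThirtyEight:

```python
# Toy sketch of the weighting critique (all margins and weights invented):
# an Indiana-style regression miss, entered as one more weighted "poll,"
# drags an otherwise tied average several points.
polls = [("poll A", 0.0, 1.0), ("poll B", 0.5, 1.0), ("poll C", -0.5, 1.0)]
regression = ("regression", 4.8, 2.0)  # badly off, and heavily weighted

def weighted_margin(inputs):
    # Each input is (name, margin, weight)
    total_weight = sum(w for _, _, w in inputs)
    return sum(margin * w for _, margin, w in inputs) / total_weight

print(weighted_margin(polls))                 # polls alone: dead heat
print(weighted_margin(polls + [regression]))  # regression pulls the estimate
</test>```

With the regression weighted down (or given a fat-tailed error distribution), the same miss would barely move the projection.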

We could discuss the assertion if you wish, but to declare that discussion of methodology is irrelevant seems strange.

• ##### Some oversights in your analysis(By a 538 junkie)(0+ / 0-)

I obviously have a bias, and am willing to spit it out, just like you have a bias towards your site and are open about it.

First, to say that you did substantially better in predicting the electoral college is off. His model projected this as being something like the 3rd most likely possibility. The number you use for comparison is an average over many possible outcomes, so there's no reason to expect it to be exact.

Also, you said that he screwed up the Senate polling of Minnesota, whereas his final prediction was Franken +0.2, which is really, really close. (Though even in his analysis he had a tendency to write off Barkley, assuming his support would largely collapse, which didn't happen because of the dynamics in the race. This is really a moot point, because all that really mattered was how Franken did relative to Coleman.)

• ##### Speaking as a 538 junkie too(0+ / 0-)
1. I realize that using explicit electoral-vote comparisons would be unfair. This is why I said, "But because of the winner-take-all system, this is not a good way to gauge accuracy."
2. I screwed up Minnesota as well, even though I was only off by a bit too.

This is precisely why I went ahead and gave you data about the margins, and actually compiled a spreadsheet of our performance for easy viewing.

But you raise fair points. It's very difficult to be fair with these things, and that's why I've tried to provide as much data as possible.

• ##### And a reply(0+ / 0-)

Cheers to you. I think on the whole you were fairly objective. I certainly had a confirmation bias looking at your comparison. I, like many, have canonized Nate as my own personal patron saint of electoral politics. If there's one thing he did do better on, though, I think it's got to be the publicity. Until reading this I had never heard of your election analysis site.

• ##### interesting analysis, but..(0+ / 0-)

mean absolute errors:  2.544495663  versus 2.249522988

I can't believe that a statistician/math major would quote so many (in)significant digits!

You know, some of the newest calculators have displays which show many more digits - perhaps getting one of these new devices would make your comparison look even more impressive (50 digits?  WOW!!!)  /snark

People do not need religion. They need effective coping mechanisms to deal with existential anxieties. Patricia Guzikowsk

• ##### Sorry About That, fixed(0+ / 0-)

I loaded this directly as an image off of excel, it's a bit of a pain to change.

But since you're the second person to complain...

• ##### MAE and Kurtosis?(0+ / 0-)

Although the MAE is certainly one decent way to compare the accuracy of two predictions, kurtosis is far down on my list. If two methods have the same MAE but one has higher kurtosis, it isn't clear which one is "best," since it may be a matter of personal preference whether you'd like to be very close very often and occasionally further off, or a little more off most of the time but never very bad.

I would rather see an indicator like median absolute error or, even better, something like percentage of states called within 0.5%, within 1%, etc... These comparisons have a more direct meaning to anyone reading it -- certainly more useful than kurtosis.
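Both suggested metrics are straightforward to compute; a minimal sketch with hypothetical residuals:

```python
# Sketch of the suggested metrics: median absolute error and the share of
# states called within a tolerance. The residuals are hypothetical.
import statistics

def median_abs_error(residuals):
    return statistics.median(abs(r) for r in residuals)

def share_within(residuals, tolerance):
    return sum(1 for r in residuals if abs(r) <= tolerance) / len(residuals)

residuals = [0.3, -1.2, 0.4, 2.5, -0.6, 0.9]  # predicted minus actual, in points

print(median_abs_error(residuals))   # middle-of-the-pack error size
print(share_within(residuals, 1.0))  # fraction of states within 1 point
```

Unlike kurtosis, both numbers have an immediate reading: "typical miss" and "how often we were close enough."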

Also, the final figures for popular vote as well as MN and AK senate may change by more than people expect so it's too early to call those...

Other than that -- interesting stuff and I'll definitely have to check out your site...

(btw, who uses Excel for statistical analysis????)

• ##### here's some of those stats(0+ / 0-)

President:

Within 1%  538: 21 states, SD: 22 states

Within 2%: 538: 38 states  SD: 35 states

So, if your goal is to be within 2% on as many states as possible, then 538 holds a 3 state edge.  If the goal is to be within 1%, then SD has a 1 state edge.

The kurtosis for 538 is mostly due to the Washington DC prediction -- where the winner was never in doubt.

Indeed, it would make the most sense to compare the predictions only for states where the margin was within perhaps 5% since the most extreme states were thinly polled anyway and no one cares much about their final margin.  Looking at the 15 states that ended up within 5%, the MAE was 0.86% for 538 and 1.08% for SD.  I'd say this comparison is perhaps the most on-point...

• ##### Fair Point(0+ / 0-)

There are a variety of different metrics, many of which show FiveThirtyEight edging my site out.

This is the principal reason I made the raw data available...