Dear fellow Kos readers,

I am undertaking a meta-analysis of state polling data to calculate a current snapshot of the probable range of election outcomes. Like most of you, I have a strong bias about how I want the presidential election to turn out. However, I wanted a measure that did not have that bias. Read my preliminary findings here.

The upshot: if the election reflected recent state polls, the probability of a Kerry win would be 98%. With 95% confidence, I predict that Kerry would receive between 270 and 322 electoral votes.

The details are mathematical, which is what it takes to analyze many polls at once. The short version is that I calculated the probability of every outcome, and from this calculated summary statistics. I have published original papers on the use of probability and statistics, so I think the results are at least worth looking at.
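For anyone curious what "the probability of every outcome" looks like in practice, here is a minimal Python sketch. It is not the MATLAB code behind the diary. It treats each swing state as an independent win/lose event with a known probability (an assumption) and builds up the distribution of electoral-vote totals one state at a time, which gives the same answer as listing every combination. The state names, EV counts and probabilities in the example are made up.

```python
from typing import List, Optional, Tuple

# Each swing state: (name, electoral votes, probability the candidate wins it).
# Independence between states is assumed; all example numbers are hypothetical.
State = Tuple[str, int, float]


def ev_distribution(base_ev: int, states: List[State]) -> List[float]:
    """Return dist, where dist[k] is the probability of finishing with exactly k EV."""
    total = base_ev + sum(ev for _, ev, _ in states)
    dist = [0.0] * (total + 1)
    dist[base_ev] = 1.0  # safe states only, before any swing state is decided
    for _, ev, p in states:
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[k + ev] += prob * p         # candidate carries this state
            new[k] += prob * (1.0 - p)      # candidate loses this state
        dist = new
    return dist


def summarize(dist: List[float], needed: int = 270,
              level: float = 0.95) -> Tuple[float, Tuple[Optional[int], Optional[int]]]:
    """Probability of reaching `needed` EV, plus a central interval holding `level` of the mass."""
    p_win = sum(dist[needed:])
    tail = (1.0 - level) / 2.0
    cum, lo, hi = 0.0, None, None
    for k, prob in enumerate(dist):
        cum += prob
        if lo is None and cum >= tail:
            lo = k
        if hi is None and cum >= 1.0 - tail:
            hi = k
    return p_win, (lo, hi)


if __name__ == "__main__":
    swing = [("A", 27, 0.9), ("B", 21, 0.8), ("C", 20, 0.3), ("D", 10, 0.6)]
    print(summarize(ev_distribution(200, swing)))
```

The win probability is just the total probability at 270 EV or more, and the interval trims 2.5% from each tail of the distribution.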

However, some disclaimers: First, this analysis is not peer-reviewed. True meta-analysis involves sifting through a lot of methods. I didn't do this. Instead, I simply counted all polls equally, relying on the owner of RealClearPolitics to be fair about reporting data. This could clearly be refined. Second, this analysis is heavily dependent on just a few states. My analysis currently indicates that to win the election, Kerry must win Florida or Ohio. Third, because it is a snapshot, the high probability should not be a cause for complacency! Your comments and feedback are welcome.

I will try to update this information about once a week. If traffic is heavy then I will transfer it to a URL that I will list in my signature. If the probability of a Kerry win pegs at 100 percent (it's nearly there now), then I may start handicapping other things, like control of the Senate. Unfortunately, those data are much sparser.

In summary, this analysis can potentially save you the trouble of fretting over any particular poll. However, if you are like me, you are addicted to reading each one as it comes out!

Originally posted to mindgeek on Mon Jul 19, 2004 at 02:37 PM PDT.

  •  I like it...
    Of course, if RealClearPolitics has a bias then you have a bias.

    Have you looked at Paul Brace's work on consistent bias in polling organizations?

    •  RealClearPolitics is sufficiently unfiltered
      You are correct. However, I have not detected major reporting biases on that site. Every poll I read about (for instance on DailyKos) appears there quite rapidly. I have also done basic things like drop crazy outliers such as Fox News and Badger. Dropping these does not affect the calculated outcome. Dropping another controversial poll, Zogby Interactive, also makes little difference.

      The underlying principle is that there are so many polls that it is very hard to game the calculation. This is the same reason that people are interested in meta-analysis.

      I have read a little about biases in polling organizations, but sorry, not the work of Brace. For now I plan to continue calculating in this blind style, mainly because I am concerned about my own biases.

      •  RCP
        is definitely a conservative site...filter through their daily links.
      •  Lots of us shoot the breeze . . .
        with our qualitative impressions of the trends we see in these same polls, writing endless commentaries and analyses based on the sniffles in our nose. You have provided a mathematical way of verifying our shaky, seat-of-the-pants impressions, and I am most grateful to you for your efforts. (Don't pay attention to any peer-review-type comments you get on Kos; this is all about energizing the masses.)
        •  Quantifying intuition
          Yes, I agree that all I have done is put a number on the sense that we get from reading polls. In general, that's the purpose of any well-crafted statistic! Thanks for making the connection.

          In regard to peer-review type comments: now that you mention it, there is a resemblance. Some of the nastier comments I've seen on blogs do resemble things I have seen in anonymous reviews. I agree that it is better for one's mental health to ignore that sort of thing! Of course, in peer review I do not have that luxury...

          •  I have sat on those panels for NIH
            Those things are two days of relentless ratcheting up the ante of "gotcha" on work that is oftentimes first class out of the gate.  Your input here is first class.  I hope you will continue to do this.
  •  Impressive Work
    Very professional, very rigorous (in my layman's opinion). I look forward to seeing your analyses.
  •  My opinion is
    that polls are (rightly) biased to account for recent voter behavior. This would make for a GOP bias in most polls. The left has been 'stay at home' and lethargic. I do not get the sense that this is the case this year.

    I don't claim to have the mathematical models to justify this, but my gut hunch is that polls undervalue Kerry by 2-3 points.

    Of course, if the left goes complacent and stays at home, the polls probably overvalue Kerry by a point.

    I guess my point here is that the polling folk are aware of the wrath of the left, but recognize we don't have a history of delivering. Let's shock the shit out of them.

    •  registered vs likely
      As far as polls are concerned, the 800-pound gorilla in the room is that it is very difficult to know who the likely voters are right now. The errors from not having a good model of who will show up are probably greater than the polls' MOE. I am optimistic that the skew will be in our favor, since many GOP-leaners will be discouraged from voting thanks to their party screwing up the country.
      •  Margin of error vs. true error
        Mathematically, your statement has to be true since the calculated margin of error is based on random sampling error alone.

        One measure of inter-poll variation, the standard error of the mean in data taken over the last 1-7 weeks, is about twice as large as expected from margin-of-error alone. This could be because of differences in voter models, as you state, or because of variation over time (for which we have strong evidence from national polls). I haven't worked on distinguishing between these two possibilities, but I am sure it is possible.
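A small Python sketch of the comparison described here, for anyone who wants to try it on their own poll list. The formulas are standard, but applying them this way is an assumption, not the diarist's code: the reported MoE covers one candidate's share at worst-case p = 0.5; the sampling SD of a Kerry-minus-Bush margin in a close two-way race is roughly 100/sqrt(n) points (assuming negligible undecideds); and turning a mean margin into a win probability with a normal curve whose SD is the SEM is one simple choice among several.

```python
import math
import statistics
from statistics import NormalDist
from typing import List


def reported_moe(n: int, z: float = 1.96) -> float:
    """Conventional 95% margin of error, in points, for one candidate's share (worst case p = 0.5)."""
    return 100.0 * z * math.sqrt(0.25 / n)


def sampling_sd_of_margin(n: int) -> float:
    """Approximate sampling SD, in points, of a (Kerry - Bush) margin in a close two-way race.

    Assumes negligible undecideds, so Var(p1 - p2) is about 1/n.
    """
    return 100.0 / math.sqrt(n)


def sem(margins: List[float]) -> float:
    """Standard error of the mean of several polls' margins, in points (needs 2+ polls)."""
    return statistics.stdev(margins) / math.sqrt(len(margins))


def win_probability(margins: List[float]) -> float:
    """P(true margin > 0), treating the mean margin as normal with SD = SEM (an assumption).

    Undefined if every poll reports the identical margin (SEM = 0).
    """
    return NormalDist().cdf(statistics.mean(margins) / sem(margins))
```

To check the "about twice as large" observation for a state, compare `sem(margins)` with `sampling_sd_of_margin(n) / math.sqrt(k)` for k polls of typical size n; the ratio depends entirely on the polls you feed in.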

  •  your analysis appears consistent
    with several other electoral vote projections, some of which are from Republican-leaning sites, and all but the last one have Kerry currently at 322 or 327:

    From another country under U.S. military occupation ... FREE HAWAII!

    by scottmaui on Mon Jul 19, 2004 at 03:05:49 PM PDT

    •  Differences with other projections
      The principal difference at present is that those sites predict Kerry wins in Missouri and Ohio. I find their Missouri prediction baffling, but note that my calculation sacrifices timeliness in order to average across more polls. I have the current probability of Kerry winning Ohio as 26%. This number is highly volatile at present, as everyone knows.
      •  MO / OH
        I agree with you about Missouri -- I don't see where there is any movement at all towards Kerry there.

        Ohio is a more confused issue.  A few weeks ago it looked as if it were budging Kerry's way, but that stopped pretty quickly.  On the other hand, it doesn't seem to want to firm up as a Bush state either.  I've got it listed as a toss-up in my own projection.

    •  Survey of e.c. projections
      For those interested, I've been periodically surveying various sites with electoral college projections (including the ones you listed here).  I posted the latest survey (18 sites plus my own projection) just a few hours ago -- it's available here.
  •  I'll bet $20
    I'll bet that if you did the same analysis using polls available 4 days before the last election, you'd have Gore as likely to win. I'll say at least 80%.

    I think Gore had as much of a lead in Florida then as Kerry does now. The stronger Nader poll numbers might have made a few other states close enough to lessen the overall likelihood a bit.

    •  Pre-election poll
      On the eve of the last election, Ryan Lizza at The New Republic had just such a calculation. At the time the prediction was that the outcome of the general election hinged on who won Florida. It was too close to call.

      For this reason I place a lot of faith in state polling data!

      By the way, I am guessing that this is why Al and Tipper Gore were at a "Florida Victory" party (for those of you who have seen Fahrenheit 9/11). They knew where it would be decided.

      •  2000
        I was using the TNR election predictions as a guide in 2000. After PA, MI, and FL were called for Gore I was out of my seat. Fox News sat me back down...

        All in all though they nailed every state (depending on how you feel about FL)

  •  Kerry can win without FL or OH
    For example, he'd win with all the official Gore 2000 states plus one of the following:
    1. Virginia
    2. North Carolina (he could then lose NM)
    3. Missouri and NH
    4. Arkansas and NH
    5. Nevada and West Virginia
    6. Arizona
    •  asdf
      Just realized that just Missouri is enough.
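The arithmetic behind these scenarios is easy to check. A minimal Python sketch, assuming the states Gore carried in 2000 are worth 260 electoral votes under the 2004 apportionment (so Kerry needs 10 more to reach 270), with 2004 EV counts for the states named above:

```python
from typing import Dict, List, Tuple

# Assumption: Gore's 2000 states are worth 260 EV under the 2004 apportionment.
GORE_2000_STATES_EV = 260
NEEDED = 270

# 2004 electoral votes for the states named in the comment.
EV_2004: Dict[str, int] = {
    "VA": 13, "NC": 15, "NM": 5, "MO": 11, "NH": 4,
    "AR": 6, "NV": 5, "WV": 5, "AZ": 10,
}

# scenario -> (states flipped to Kerry, Gore 2000 states lost)
SCENARIOS: Dict[str, Tuple[List[str], List[str]]] = {
    "Virginia": (["VA"], []),
    "North Carolina, losing NM": (["NC"], ["NM"]),
    "Missouri and NH": (["MO", "NH"], []),
    "Arkansas and NH": (["AR", "NH"], []),
    "Nevada and West Virginia": (["NV", "WV"], []),
    "Arizona": (["AZ"], []),
    "Missouri alone": (["MO"], []),
}


def kerry_total(gains: List[str], losses: List[str]) -> int:
    """Gore 2000 base plus flipped states, minus any Gore states lost."""
    return (GORE_2000_STATES_EV
            + sum(EV_2004[s] for s in gains)
            - sum(EV_2004[s] for s in losses))


if __name__ == "__main__":
    for name, (gains, losses) in SCENARIOS.items():
        total = kerry_total(gains, losses)
        verdict = "wins" if total >= NEEDED else "falls short"
        print(f"{name}: {total} EV, {verdict}")
```

Every scenario reaches at least 270; Missouri alone gives 271, and Virginia alone gives 273.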
    •  Permutations
      You are right, but your scenario is just one of thousands of possible swing-state outcomes. The calculation I have posted considers all possible outcomes in 17 swing states (131,072 possibilities) and calculates the probability of each one.

      That having been said, several of your states are not considered swing states. That would be a hard calculation: 50 states makes for 1.12 quadrillion (1,125,899,906,842,624) possibilities. Since the current likelihood of a Virginia or North Carolina win is near zero, these do not factor significantly into an unbiased calculation.
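The blow-up here comes from listing outcomes one at a time. If all you need is the probability of each EV total (which is all a win probability or confidence interval requires), a standard dynamic-programming approach adds one state at a time and tracks only the running EV total, so the work is bounded by the number of possible totals. This is a swapped-in technique shown for illustration, not a description of the diarist's MATLAB code, and it still assumes independent states.

```python
def outcome_count(n_states: int) -> int:
    """Win/lose combinations when each state is treated as a separate two-way outcome."""
    return 2 ** n_states


def ev_table_cells(n_states: int, max_ev: int = 538) -> int:
    """Upper bound on cells a dynamic program over EV totals fills:
    one row of (max_ev + 1) possible totals per state added."""
    return n_states * (max_ev + 1)


if __name__ == "__main__":
    print(outcome_count(17))    # 131072, the swing-state enumeration
    print(outcome_count(50))    # 1125899906842624 for all 50 states
    print(ev_table_cells(50))   # 26950
```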

    •  you hit the nail on the head
      not that I'm gonna tell JK/JE what to do with their campaign (I think they are doing great so far!), but they should really focus on 2 of the above combinations, and fight fight fight for them.

      I think NH should go blue.

      Get BIG DOGG in Arkansas for all of Sept. through Nov. 2

      Byrd said Kerry could win WV if he "gets some coal on his face"

      and fight fight fight for Ohio and Florida

  •  nifty analysis
    My stats skills are much rustier than yours, so I'm not in a position to comment on the methodology, but it looks decent.

    Although if I'm not mistaken, it looks like you're pulling down data manually? Hopefully that's only temporary. I'm sure that makes it tedious to update, and puts you at a risk of data-entry errors besides.

    It wouldn't be too difficult for someone with a good working knowledge of perl to automate the whole process: retrieving poll data from RCP, running it through your MATLAB script, and generating HTML. You might consider siccing a grad student on that. :-)

  •  Pretty cool stuff....
    I ran through RealClearPolitics' poll summaries on July 11th and ended up with Kerry @ 264 EVs and Bush with 209, but I thought OH, VA, FL and NV were too close to call.

    You gave them all to Kerry except VA. I still think OH is too close, but NV and FL are looking good. We could still take VA if Edwards helps us tighten up that (surprisingly close) 5-point gap.

  •  The only concern I have
    about this approach is that the biases are likely to be highly correlated. Your meta-analysis assumes that the errors in each state are independent, yet if something depresses Democratic turnout, say, that will affect all states. This does not affect our maximum likelihood estimate, but it does increase the dispersion of likely results. IOW, I think your confidence interval is likely too narrow, though we can't easily quantify by how much.
    •  Beat me to it
      Yes, an adjustment for correlated errors will take a lot of the edge off that 98%.
      •  Polling biases
        This is a reasonable concern. In the MATLAB script is a variable called "bias" that allowed me to play with this.

        For instance, pushing every poll toward Bush by one point reduces the probability of a Kerry win to 86%.

        On the other hand, the posted calculation is for the last six polls. When I use only the last three polls, the probabilities are 100% (no bias), 99.7% (1-point bias), 96% (2-point bias). Still pretty good!

        Maybe I will start posting the probability assuming some constant bias.
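Here is a rough Python sketch of the "push every poll toward Bush" experiment. It is an independent illustration, not the `bias` variable in the diarist's MATLAB script. A single shift applied to every state is the simplest form of fully correlated error. The swing-state EVs, margins, SEMs and the safe-EV base are all hypothetical, and converting a margin and SEM into a win probability with a normal curve is an assumption.

```python
from statistics import NormalDist
from typing import List, Tuple

# Hypothetical inputs: (electoral votes, mean Kerry-minus-Bush margin in points, SEM in points).
SWING: List[Tuple[int, float, float]] = [(27, 3.0, 1.5), (21, 2.0, 1.5), (20, -1.0, 1.5), (10, 1.0, 2.0)]
SAFE_KERRY_EV = 230  # hypothetical EV from states treated as certain


def state_win_prob(margin: float, sem: float, bias: float) -> float:
    """Shift the polled margin toward Bush by `bias` points, then return P(true margin > 0)."""
    return 1.0 - NormalDist(mu=margin - bias, sigma=sem).cdf(0.0)


def kerry_win_prob(bias: float, needed: int = 270) -> float:
    """P(Kerry EV >= needed) with the same bias applied to every state (fully correlated error)."""
    total = SAFE_KERRY_EV + sum(ev for ev, _, _ in SWING)
    dist = [0.0] * (total + 1)
    dist[SAFE_KERRY_EV] = 1.0
    for ev, margin, sem in SWING:
        p = state_win_prob(margin, sem, bias)
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob:
                new[k + ev] += prob * p          # Kerry carries this state
                new[k] += prob * (1.0 - p)       # Kerry loses this state
        dist = new
    return sum(dist[needed:])


if __name__ == "__main__":
    for b in (0.0, 1.0, 2.0):
        print(f"bias {b:.0f} pt toward Bush: P(Kerry win) = {kerry_win_prob(b):.3f}")
```

A fixed shift answers "what if every poll is off by b points?"; averaging the result over a range of shifts would be one way to fold the correlated-error concern into the headline probability.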

  •  Incidentally, CNN (BS*) reviewed state polls ...
    ... and found NO STATE in which Bush is doing better now than he was in 2000.

    *BS == Bill Schneider

  •  Garbage In, Garbage Out
    Before I get frank, let me first say that mindgeek's effort is no worse than many similar efforts on the web to track state polls. That said...

    There are some details about mindgeek's effort that are problematic. Zogby Interactive data is meaningless, and mindgeek's methodology relies heavily on it. And relying on the last 6 RCP polls of any one state means using some very out-of-date numbers.

    And we get some very odd results in his analysis like putting a state like Iowa into the "Certain Kerry" column despite a very minimal lead in recent polling.  It seems he comes to this conclusion because there are no outlier polls, and if so, that's just faulty methodology.

    But these are quibbles.  The real problem lies in not understanding the uses of state polling this far out from an election.

    The national horserace number is going to be quite fluid over the next 3 1/2 months.  As the horserace number moves, so will the state polls move.  The true use of the state numbers is to see where an individual state lies in relation to the national numbers.

    To take the Iowa example again, there is a decent (but nowhere near overwhelming) probability that Kerry had a tie to slight lead in Iowa last month.  Over the same period, Kerry was essentially tied in the national horserace.  So we can get a general sense that Kerry is running even to very slightly above his national number in Iowa.

    If Bush emerges from his convention with a 5% national lead, we can guess that Bush will lead in Iowa by about 3% - 5%.

    Mindgeek's effort is a nice graphical display of the limited and flawed RCP data.  However, he seems to misunderstand the meaning of that data, and thus his analytical conclusions suffer.
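The Iowa reasoning in the comment above is a uniform-swing projection: measure how far a state runs ahead of or behind the national margin, then assume that lean holds as the national number moves. A tiny Python sketch, with margins as Kerry minus Bush in points; the leans of 0 and +2 are illustrative values chosen to bracket the comment's 3-5 point range, not measured figures.

```python
# Margins are Kerry minus Bush, in points (negative means Bush leads).

def state_lean(state_margin: float, national_margin: float) -> float:
    """How far a state runs ahead of (+) or behind (-) the national margin."""
    return state_margin - national_margin


def uniform_swing_projection(lean: float, new_national_margin: float) -> float:
    """Assume the state keeps its lean while the national number moves (uniform swing)."""
    return new_national_margin + lean


if __name__ == "__main__":
    # Illustrative Iowa leans; the national margin is Bush +5 after his convention.
    for lean in (0.0, 2.0):
        print(lean, uniform_swing_projection(lean, -5.0))
```

With those leans, a 5-point national Bush lead projects to Bush +5 and Bush +3 in Iowa.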

    •  Mindgeek
      says in one of the follow-ups that he took Zogby out of a subsequent analysis and got just about the same results.  He did the same procedure with Fox and Badger and still got more or less the same results.  
      •  Re: Mindgeek
        "Mindgeek says in one of the follow-ups that he took Zogby out of a subsequent analysis and got just about the same results."

        Like I said, the problem with Zogby Interactive is just a quibble.  If he took ZI out of his sample, it wouldn't change my general problems with his analytical conclusions.

        The real problems are:

        The limited data set.  To return to Iowa as a test case, once you take Zogby Interactive out of the mix, you're left with two polls.  Both show a Kerry lead within the MOE.  To me, that means it's more likely that Kerry was leading in Iowa a few weeks ago than Bush was leading, but I don't see how you can get from there to "Certain Kerry".

        The limits of relying on state polls this far out.  To repeat myself, the overall tide - the national horserace number - is going to be fluid from here on out.  The state polls are going to move more or less in concert with that national number.  The true use of the state numbers is for getting a sense of how individual states lie in relation to that national number.  If Bush wins the popular vote by 2%, he's going to win the electoral college quite easily.


        And don't get me wrong.  I'm very optimistic about this election.  But I'm getting that optimism from very different numbers than mindgeek is using.

      •  Omitting Zogby polls
        If I omit all Zogby Interactive polls and use just the last two polls for every state (this simplifies the code-writing), the current snapshot is a 98% Kerry win, and the 95% confidence interval is 271-302 EV.

        Note that this required adding one old poll for West Virginia and relying on old data for Nevada. Neither assumption is desirable, but note that assuming Bush wins both states only reduces the probability of a Kerry win to 95%.

  •  mindgeek -
    I've been keeping track of state polls myself in Excel. Your post inspired me to see what I could do in a similar vein in about 20 minutes. I took my state average margins, assumed a common standard error (incorrect, but quick and dirty) and calculated the state probabilities that way. Then I ran a 600-repetition Monte Carlo simulation in Excel on all 50 states (instead of calculating all permutations of swing states as you did), added up the EV and sorted the results. Here it is:

    %ile    Kerry EV

    1. %:    274
    2. %:    282
    3. %:    291
    4. %:    302
    5. %:    316
    6. %:    322
    7. %:    327
    My method gives Bush a 3.7% chance vs. your 2%. I like your answer better! Still, pretty darn close to what you got, and just took a few minutes.
    •  Monte Carlo simulation
      That's interesting. Monte Carlo simulation would give nearly as good a picture as exhaustively going through the permutations. What you did is analogous to surveying part of the population, and therefore would have its own margin of error (in your case 0.8%, I think).

      The exercise made me feel a little more confident about the election, and has allowed me to start focusing on the Senate. Of course, a three-point move in national sentiment towards Bush will have me up in arms all over again...
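The 0.8% figure checks out: the standard error of a proportion p estimated from n independent simulations is sqrt(p(1-p)/n), which for p = 0.037 and n = 600 is about 0.0077. Below is a small Python sketch of both the simulation and that error bar. The function names and seed are placeholders, and independent states are assumed, as in the Excel version.

```python
import math
import random
from typing import List, Tuple


def monte_carlo_loss_rate(base_ev: int, states: List[Tuple[int, float]],
                          reps: int = 600, needed: int = 270, seed: int = 0) -> float:
    """Fraction of simulated elections in which the candidate falls short of `needed` EV.

    `states` holds (electoral votes, win probability) pairs; each simulation draws
    every state independently (an assumption). A 269-269 tie counts as falling short.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(reps):
        ev = base_ev + sum(ev_s for ev_s, p in states if rng.random() < p)
        if ev < needed:
            losses += 1
    return losses / reps


def monte_carlo_standard_error(p: float, reps: int) -> float:
    """Binomial standard error of a proportion estimated from `reps` simulations."""
    return math.sqrt(p * (1.0 - p) / reps)


if __name__ == "__main__":
    # Figures from the thread: a 3.7% Bush chance from 600 simulations.
    print(f"{monte_carlo_standard_error(0.037, 600):.4f}")  # 0.0077
```

More repetitions shrink the error bar as 1/sqrt(n); exhaustive enumeration removes it entirely.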

  •  I wish you'd show us the details
    At any rate, from the MoE and probability, it ought to be possible to figure out exact probabilities for certain ranges.

    I pledge resistance to the President, and to the Republicans, and that for which they stand: One nation, under attack, with liberty and tax cuts for some.

    by JimTXDem on Mon Jul 19, 2004 at 05:26:29 PM PDT

      •  All details are Web-posted
      There is a full verbal description at the link. At the bottom of the page is a link to the original MATLAB code. You might find it a little hard to read, but the comments may help.

      Regarding MoE: it gives a smaller error than the one I use, probably because it does not capture variation among polling methods. I calculated the SEM from the reported data in order to take a more cautious approach.

      •  Non-Garbage Data?
        In reference to the "garbage in, garbage out" concerns that Petey expressed, has he or anyone else provided you with any data source they believe to be more credible, useful, accurate, etc. than the data relied upon in your analysis?
        •  Re: Non-Garbage Data?
          "In reference to the "garbage in, garbage out" concerns that Petey expressed, has he or anyone else provided you with any data source they believe to be more credible, useful, accurate, etc. than the data relied upon in your analysis?"

          The best source for public state polling data is a paid service at

          I believe they are currently charging $99/year.

          However, even relying on this better source for public state polls wouldn't eliminate some of my other criticisms for how mindgeek is interpreting his results.

        •  Data sources
          More data would always be of interest. But the existing data are already fairly extensive. I am unaware of any other comprehensive analysis of state polls. The Kerry and Bush campaigns probably employ consultants to do what I have done, almost certainly with the same conclusions.

          The critic in question seems to be advocating a more complex analysis involving the use of both national and state polls. This is possible but would mainly add noise without altering the basic result.

          In any event, my motivation for doing the analysis is to come up with practical recommendations for the optimal use of money and time, both nationally and in the NJ/PA area. I now make these recommendations on the site.

          Additional analysis would require both time and quantitative skills. Since I am a bit short on time and the critic has not revealed strong quantitative skills, I think my analysis will have to do for now.

          •  Re: Data sources
            "The critic in question seems to be advocating a more complex analysis involving the use of both national and state polls."

            To some extent this is true.  But I'm also saying that you are drawing overly broad conclusions from limited data.

            To return once again to the Iowa example:

            I look at the two non-ZI polls showing Kerry up by less than the MOE, and I see a snapshot of a swing state barely leaning Kerry.

            You see no polls showing Bush ahead, and you see a snapshot of a 'Certain Kerry' state.

            Maybe we're arguing semantics, maybe not.

            "The Kerry and Bush campaigns probably employ consultants to do what I have done, with almost certainly the same conclusions."

            See, this is where I really disagree with you.  If you watch where the campaigns are spending their money, they have some very different ideas of which states are battlegrounds than you do.

        •  Comparing state polls with each other
          Another way around the problem of state vs. national polls is to do something more limited, namely compare state polls with each other. This would allow states to be arranged in a rank order, ranging from most Democratic-leaning to most Republican-leaning. Although this order might change a bit over time, it would give a rough rule-of-thumb about which states present the juiciest targets.

          Currently, the rank order among big swing states is

          D <-- MN - MI - FL - PA - WI - OH - MO --> R

          Since the current probabilities of a Kerry win in three prize states are FL 98%, PA 89% and OH 26%, this suggests that PA and OH are the best targets for activism such as voter registration and get-out-the-vote. This may explain why America Coming Together has been so attentive to Ohio.
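A rank order like this leads naturally to what forecasters call the "tipping-point" state: walk from the most Democratic-leaning state toward the most Republican-leaning one, adding electoral votes until the running total reaches 270. The sketch below uses the rank order above with 2004 EV counts. It is a swapped-in illustration, not part of the diarist's calculation, and the base EV (states outside this list that Kerry is assumed to carry) is hypothetical, which is why it is a parameter: the answer moves with it.

```python
from typing import List, Optional, Tuple

# Rank order from the comment, most Democratic-leaning first, with 2004 electoral votes.
RANKED: List[Tuple[str, int]] = [
    ("MN", 10), ("MI", 17), ("FL", 27), ("PA", 21), ("WI", 10), ("OH", 20), ("MO", 11),
]


def tipping_point(base_ev: int, ranked: List[Tuple[str, int]],
                  needed: int = 270) -> Optional[str]:
    """Add states in rank order to `base_ev`; return the state whose EV first reaches `needed`.

    `base_ev` stands for the states outside this list that Kerry is assumed to carry.
    Returns None if even the whole list falls short.
    """
    total = base_ev
    for name, ev in ranked:
        total += ev
        if total >= needed:
            return name
    return None


if __name__ == "__main__":
    for base in (175, 185):  # hypothetical bases
        print(base, tipping_point(base, RANKED))
```

With a hypothetical base of 175 EV the tipping point is OH; with 185 it is WI.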

          •  Re: Comparing state polls with each other
            "Another way around the problem of state vs. national polls is to do something more limited, namely compare state polls with each other."

            No argument from me on this one.  Good idea.

            And one polling note from your site:

            "Bush approval rating (<48 means he is toast, >52 means we are toast; in between is uncharted territory)"

            I think Bush is actually in a weaker position than his approval number would show.  I think the crucial number this year is his "Re-elect" number, which has been stuck around 43%.  There is an unusually large segment of the electorate that gives Bush a positive approval rating, but is going to vote for Kerry.

  •  Nice
    More good knowledge to know.

    Faux news needs faux wood blinds to shade the failures of the Bush presidency.

    by dopies on Sat Sep 04, 2004 at 02:19:17 AM PDT
