So in early 2011 I was pretty excited to notice that the Daily Kos/SEIU/PPP poll had changed their racial demographic question to include Asian as a category. This meant that eventually, after aggregating the data over a year or so, we would be able to see the opinions of a fast-growing and diverse segment of the population that nonetheless is too small to generate decent data in any individual poll. And in January of this year I set about aggregating that data for the 965 Asian respondents in 2011.

But then I noticed something odd.

Only 35% of Asians in the Daily Kos poll said they lived in the West. But the census shows that 55% of Asian registered voters live in the West. Meanwhile, the Daily Kos poll showed 17% of Asians saying they lived in the Midwest, compared to 8% in the census.

Uh oh.

Now, yesterday we saw that about 5-9% of respondents enter the wrong geographic location for various reasons, depending on region, so instead of categorizing location based on what respondents said, I based it on area code using the same regional definitions for the census and polling data... and found 37% living in the West and 15% living in the Midwest.

Perhaps it was just too few respondents? But now, in September 2012, with many more respondents, the numbers are... 37% living in the West, 16% in the Midwest (by area code; all further geographic numbers in this post are based on area code).

Meanwhile, among Hispanic registered voters, the poll shows 30% living in the West and the Census says 41%. The poll shows 17% living in the Midwest and the census says 8%. (American Indian numbers are also messed up but that will be a whole separate post. Eventually.)

What is going on?

We already know that around 5-9% of the poll respondents (after weighting by age) enter the incorrect geographic region, only a small minority of which can be attributed to accident. What if a similar share enters the wrong racial code?

Looking back at the geographic data, there is a pattern to which number is selected for wrong answers. Averaged out, about 4% chose option 1, 3% chose option 2, 2% chose option 3, and 1% chose option 4. This actually seems like what you might get if people were 'randomly' choosing a button (we're not as random as we think, assuming something akin to ballot order effects occurs during a phone poll).

The Daily Kos poll starts with about 1500 completed calls, around 1200 of which claim to be white, and then around 500 mainly older, white respondents are removed at random to produce the raw polling data which is then weighted by age. If we ignore for the moment the weighting by age, and assume a similar error rate for the race question as the geography question, that would mean about 80 respondents in each poll are labeled as minorities but are actually white - about a third of the minorities. You may notice a problem here.

We can do a simplistic little simulation by region now, assuming equal response rates among different racial minority groups and different regions (not necessarily valid assumptions!). And what we get is... 43% of Asians in the West, 14% in the Midwest; and 30% of Hispanics in the West, 18% in the Midwest. That's damn close to what we see in the polling data.
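To see why mislabeled respondents would pull a minority group's geography toward the white distribution, here is a minimal sketch of the blending arithmetic. It is not the simulation used for this post. The census shares for Asian registered voters are the ones quoted above; the white regional shares are hypothetical placeholders (not census figures), and the 50% mislabeled fraction is an assumption borrowed from the rough "about half" estimate given later in the post.

```python
def blended_share(true_share, impostor_share, impostor_fraction):
    """Regional share seen in a group when part of the group is mislabeled.

    true_share: share of genuine group members living in the region
    impostor_share: share of mislabeled respondents living in the region
    impostor_fraction: fraction of the labeled group that is mislabeled
    """
    return (1 - impostor_fraction) * true_share + impostor_fraction * impostor_share


# Census shares for Asian registered voters, as quoted in the post.
CENSUS_ASIAN = {"West": 0.55, "Midwest": 0.08}

# HYPOTHETICAL placeholder shares for white respondents; NOT census figures.
ASSUMED_WHITE = {"West": 0.20, "Midwest": 0.25}

# ASSUMPTION: about half of the respondents labeled Asian are not Asian.
IMPOSTOR_FRACTION = 0.5

blended = {
    region: blended_share(
        CENSUS_ASIAN[region], ASSUMED_WHITE[region], IMPOSTOR_FRACTION
    )
    for region in CENSUS_ASIAN
}

for region, share in blended.items():
    print(f"{region}: {share:.1%}")
```

Whatever placeholder values are used, the blended share always lands between the genuine group's share and the mislabeled respondents' share, which is the direction of the discrepancy seen in the poll.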

Here are graphs comparing the census data, the polling data, and the simulation data:

Geographic distribution of Hispanic voters in Daily Kos poll is incorrect, but can be approximated if we assume 9% of respondents enter incorrect race.
(The geographic distributions of white and African-American voters are about the same for the census, the poll, and the simulation, so I didn't bother to show them. For the record though, the polling results match the census slightly better than the simulation does for African-Americans.)

Remember, I made some big assumptions and ignored weighting by age, but even so we can conclude that the strange geographic distribution in the poll is consistent with about 9% of respondents pressing buttons 'randomly' on the race question in the same pattern as they do for the geography question. (Of course, other explanations are still possible too.)

This would imply that a substantial but uncertain proportion of the respondents recorded as minorities in PPP polling are in fact people who are just messing with the poll or pressed the wrong button by mistake. I would assume that a similar phenomenon would be seen for all automated polling firms, and perhaps for some live interviewers as well.

What proportion, exactly? The simulation gives me numbers, but with a fair amount of uncertainty because I'm not too sure about the proportion of people incorrectly pressing each number on the race question. But I can also calculate a proportion by using what I think are 'true' polling numbers (from sources listed below that I believe have good track records and/or good methods such as multilingual polling) and comparing them to what we observe in the aggregate 2012 DailyKos polling.

Despite all the different sources of error for these estimates, both methods give numbers in the same ballpark: somewhere around half of respondents choosing Hispanic and Asian, and (only!) about a fifth of those choosing African-American, are not the race indicated by their choice.

Now we can say with confidence that two independent methods of estimating the percentage of 'fake' minorities in the Daily Kos poll come up with the answer of 'lots', which, while not a very specific number, is good enough for us to know we shouldn't trust absolute values of the crosstabs for minority demographics in PPP polls and most likely all automated polls. And we have one hypothesis consistent with the data for why there are lots of incorrect responses to the race question: people are doing about the same thing they did with the geography question.

I think this would be a good time to remind the reader that PPP and other (legitimate) automated pollsters still get their final numbers pretty close, despite this. Also, this analysis is only possible because of PPP's transparency in releasing all their raw data.

But for all those people who have said to beware the crosstabs... you're absolutely right.

Data Sources:

Pew. Averaged over the summer, they have it at about 67-25 for Obama among Hispanic voters and 94-3 Obama among Black voters. They also have a lot of interesting data for different demographic groups in their Religion & Public Life series.

Latino Decisions. Latino Decisions is releasing a poll of 300 respondents every week. The error is high, but the numbers are clear: over the past four weeks, they've had it at about 66% Obama, 28% Romney.  Latino Decisions has also released some state polls.

Fox News Latino. In March they had Obama up 70-14 over Romney; this week they have it at 60-30.

USA Today/Gallup Latino Poll. From the spring, Obama was favored 72/19 among immigrants and 63/28 among US born.

Asian American Justice Center and APIAVote. A poll last spring - the first ever national poll of Asian-Americans - showed Obama ahead of Romney 59-13.

BET Poll. Frustratingly, I can't seem to find the full release of this poll anywhere, but it does say in one of the press releases that only 2% of African-Americans in battleground states support Romney.

___________
Beyond the Margin of Error is a series exploring problems in polling other than random error, which is the only type of error the margin of error deals with.

Previously:
Why Don't People Know Where They Live in the DKos Poll? A small number of respondents - around 5-9% - press the wrong button when answering the geography question on the Daily Kos poll. This is far greater than can be explained by observed rates of misunderstandings or data entry errors.
Why State Polls Look More Favorable For Obama than National Polls. In the spring and summer, lack of support in Blue States was bringing down Obama's performance in national polls, while Swing States and Red States were polling about the same as 2008.
Presidential Polls Are Almost Always Right, Even When They're Wrong.  How the presidential polls in red and blue states are off, sometimes way off, and how to predict how far off they'll be.
When Polls Fail, or Why Elizabeth Warren Will Dash GOP Hopes. Why polls for close races for Governor and Senate are sometimes way off, and how to predict how far off they will be.


  •  More misrepresentation in one direction? (1+ / 0-)
    Recommended by:
    David Nir

    It looks to me from the above that more Romney supporters than Obama supporters are mis-indicating their race as a minority, is that correct? (Although the percentages are higher in terms of "true" support by non-whites of Obama - does that suggest the reverse is true for Obama supporters? I.e., more  Obama supporters mis-identify as white? This is of course assuming, which is a big assumption, that responders are not making a critical choice about these categories and their own backgrounds - among other things.)

    I know one can't know anything about cause from this data, but, to the extent I've understood correctly, it makes me wonder if there is a deliberate effort to represent Romney's base as less white than it actually is - maybe something they know to be an issue.  Cf. the attempt to identify nonwhites at the GOP by JJP earlier this year?

    We might be heading into the McCarthy era. But I hold out hope: after the 50s came the 60s, and that's when they invented sex.

    by marynyc on Thu Sep 20, 2012 at 09:44:40 AM PDT

    •  Er, (0+ / 0-)

      Not that I endorse JJP's attempt, necessarily - for the record, I think it's problematic to try and visually i-d someone's race, on a few grounds.

    •  Oh boy (1+ / 0-)
      Recommended by:
      David Nir

      That is a very funny link.  Thank you!

      On to the meat of your comment. I cannot tell from the data if more white Romney voters than white Obama voters are choosing the wrong race. For the simulation, I assumed all respondents had an equal chance of choosing the incorrect race.

      This assumption may not be correct, but there is no way to test it with the data available.

      Personally I would suspect that white Romney supporters are more likely than white Obama supporters to provide intentionally misleading answers to the race question, cause it's kind of a jerk thing to do and being a jerk is kind of the foundation of the Republican mindset: I've got mine, screw you. But it's not just my personal bias that leads to this suspicion.

      Why? In an informal study, of the 6% of drivers who intentionally ran over a rubber animal, 89% were driving an SUV. (Ya gotta click through this link.)

      And, large SUVs were favored by 76% of Republicans compared to 5% of Democrats. Make of that what you will.

      •  What's the probability Romney supporters choosing (0+ / 0-)

        Native American as their 'origin' or ethnicity, did so because they firmly believe they indeed are native-born full citizens of the U.S. of A (versus, say, Kenyans or 'redskins' born on a reservation or illegitimate illegal immigrants)?  I'd be willing to bet that sort of view could explain a fair chunk of these.

        Might need to factor in what percent of the population may have some sort of disability like dyslexia which might lead to incorrectly linking words & meanings to spatially located check boxes or bubbles.  

        It might seem shocking but some conservatives may know identifying as a non-white race on a census form might mean more federal funding coming to their districts and they may feel that's just peachy to help that happen...it's as easy to take money from the 'liberal' socialist government as it is to take candy from a baby...and they'd justify it thinking we'd just be takin' back some of the real tax payers money and putting it to more righteous use for 'real' Americans.

        Consider as well that the labels 'White' and 'Caucasian' can be confusing.  How much White ancestry makes someone white?  If I am 'mixed' but can 'pass' for white walking down the street, isn't that sufficient?  Ask a less educated White person if they are Caucasian and you may get slapped in the face if not a look of confusion.  Ask an average White person who actually knows they are technically labeled 'Caucasian' why that name applies and you may get total bafflement.  Ask a college educated White where to find the Caucasus mountain region on the globe or to name the continent where it is found and you'll probably get more blank looks and embarrassment...or hear random mumblings about Christopher Columbus, Portugal, Spain, England, Puerto Rico, Cuba, Florida...no wait, the Mayflower...Puritan Pilgrims, Plymouth or Chrysler Rock or GM rock...  the French...Louisiana..., Jamestown, Catholics Protestants... slaves...  turkey... kool aid... oatmeal... maybe it's Quaker Caucasians?

        When life gives you wingnuts, make wingnut butter!

        by antirove on Thu Sep 20, 2012 at 01:44:02 PM PDT

        [ Parent ]

        •  That's a possibility. (2+ / 0-)
          Recommended by:
          antirove, MichaelNY

          I'll be exploring the Native American poll responses in a separate post, but you bring up one possibility for sure.

          As far as a learning disability increasing error rates on a push-button poll response, that too is certainly possible. But in the previous post in this series, I showed a maximum error rate on the gender question of about 0.2%, and this should include the sort of error you mention. So we know it will not have a huge impact on responses.

    •  Smaller subgroups get skewed more (2+ / 0-)
      Recommended by:
      tapu dali, MichaelNY

      from the randomness, as the respondents incorrectly claiming to be in the group are from larger groups, where the same rate of error produces larger raw numbers of wrong answers.

      It's the same small group/large group imbalance that makes lie detectors unreliable for distinguishing liars: since very few people taking lie detector tests are liars, the truth-tellers who are wrongly flagged add a group to the "liar" set that is as large as the set of actual liars, correctly identified.

      Let's look at an example:

      Say there are 1000 respondents in two groups, one of 900 (A) and one of 100 (B).  Say there is a 10% chance that a respondent is mislabeled into the wrong group.  Then 90 A's are labeled B's, and 10 B's are labeled A's.  So the sample would be identified as 820 A's and 180 B's.  The slide from 900 to 820 (under 10%) is minimal in comparison to the climb from 100 to 180 (nearly double).
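The arithmetic in this example can be checked with a short sketch. The function name and its return values are illustrative choices, and it assumes, as the example does, that the same error rate applies to both groups.

```python
def observed_counts(true_a, true_b, error_rate):
    """Return (labeled_a, labeled_b, wrong_share_of_b) after symmetric mislabeling.

    Assumes each respondent has the same chance of being recorded
    in the wrong group, regardless of which group they belong to.
    """
    a_to_b = round(true_a * error_rate)  # A's recorded as B's
    b_to_a = round(true_b * error_rate)  # B's recorded as A's
    labeled_a = true_a - a_to_b + b_to_a
    labeled_b = true_b - b_to_a + a_to_b
    # Share of the respondents labeled B who are really A's.
    wrong_share_of_b = a_to_b / labeled_b
    return labeled_a, labeled_b, wrong_share_of_b


labeled_a, labeled_b, wrong_share = observed_counts(900, 100, 0.10)
print(labeled_a, labeled_b, wrong_share)
```

With these inputs, half of the respondents labeled B are really A's, even though only one in ten respondents was mislabeled.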

      1. Corporations control our democracy and do not have our interests at heart;
      2. The media is not neutral -- and not blameless;
      3. Ordinary people have extraordinary power.

      by MooseHB on Thu Sep 20, 2012 at 11:02:35 AM PDT

      [ Parent ]

  •  Aha! (2+ / 0-)
    Recommended by:
    MooseHB, MichaelNY

    This was just my theory - that maybe 15% of poll respondents (especially on robo-polls?) were just fucking around, basically. Hence Romney regularly winning the Native American vote. Hence the stronger party in a given state outperforming the polls, since random responses would tend to lead to results that regress towards 50-50. And hence toplines that usually look pretty good, since the random responses would tend to cancel out overall, even as they distort minority subsamples.

  •  There is an inherent difference (1+ / 0-)
    Recommended by:
    MichaelNY

    between Census numbers, which include everyone, and respondents in a poll. The Census %s for Hispanics would be larger than the pool of possible poll respondents because the Hispanic population has a larger % of people under 18, compared to non-Hispanics.

    Also, in the Census Hispanic is an ethnicity, not a race (a Hispanic person may be of any race). So in the Census a person may be Black and Hispanic (2 different questions). I don't know how the polling organizations ask the question but a person may have to choose between answering Black or Hispanic. This would skew the percentages, if true.

    I don't know about the Asian discrepancies, though.

    You fell victim to one of the classic blunders, the most famous of which is "Never get involved in a land war in Asia".

    by yellowdog on Thu Sep 20, 2012 at 03:04:34 PM PDT
