This is the second in a series of analyses of congressional districts.

Note that one should not use these analyses to make statements about individuals. That's called the ecological fallacy, and it can lead you very far astray, very quickly.

Also, please ask questions.  Don't look at the graphs and equations and run away.....ask.  There are no dumb questions.  I will not tell you you are stupid for asking.  Statistics is confusing to lots of people, not just you!  So ASK!

Today, I started off by looking at median income and Cook PVI.  That led to other things.  More below the fold.

My suspicion, before looking at the relationship between median income and Cook PVI (Cook PVI is, essentially, a measure of how Republican or Democratic the district was in the last two presidential elections, compared to the national average) was that higher median income districts would be more Republican.  I did know that some high income districts were quite Democratic, but I thought these were exceptions.  Well, one reason to explore the data is to see whether your suspicions are correct.   Here's a graph of median income and Cook PVI across 435 districts:

My favorite professor in grad school used to say "If you're not surprised, you haven't learned anything".  I'm surprised, but what can we learn?
The very poorest districts are, indeed, very Democratic. At the extreme, the poorest district (NY16) is also the most Democratic (Cook PVI is D + 43).  But above a median income of about 30,000, there is only a modest relationship, and, what there is points to wealthier districts being more Democratic..... hmmm.

When results surprise you in this way, one thing that may be going on is that there is some third variable that is affecting the relationship.  I know that people in rural areas have different views than those in urban areas....

R, the language I used to draw these plots, offers a tool called conditioning plots that lets you look at three variables in an interesting way.  You divide the third variable into groups, and then plot the first two in each group.   Easier to show than tell:

Each panel of the graph shows congressional districts of a certain level of urban-ness.  The lower left is less than 50% urban, lower right is 50-75%, upper left is 75-90% and upper right is over 90% urban.  (Note, it is probably better to think of 'urban' as 'urban or suburban' or, perhaps, 'not rural').  This is interesting!
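For anyone who wants to see the grouping step behind a conditioning plot spelled out, here is a minimal sketch in Python (not the R I actually used). The districts are made up, the signed PVI convention (positive = Democratic) is an assumption for illustration, and so is the choice to put a district at exactly 75% or 90% urban into the 75-90% panel.

```python
from collections import defaultdict

# Hypothetical districts: (name, median income in $1000s, Cook PVI, % urban).
# PVI is signed here: positive = Democratic lean, negative = Republican lean.
districts = [
    ("A", 19.3, 43, 100.0),
    ("B", 38.0, -12, 42.0),
    ("C", 45.5, -3, 68.0),
    ("D", 52.0, 5, 81.0),
    ("E", 71.0, -8, 97.0),
]

def urban_panel(pct_urban):
    """Return the conditioning-plot panel a district falls into."""
    if pct_urban < 50:
        return "under 50% urban"
    if pct_urban < 75:
        return "50-75% urban"
    if pct_urban <= 90:
        return "75-90% urban"
    return "over 90% urban"

# Group the (income, PVI) points by panel; each group would become one
# scatterplot in the conditioning plot.
panels = defaultdict(list)
for name, med_inc, pvi, pct_urban in districts:
    panels[urban_panel(pct_urban)].append((med_inc, pvi))

for label, points in panels.items():
    print(label, points)
```

Each list of points is what one panel would draw; the plotting itself is left out to keep the sketch dependency-free.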

First thing that strikes me is that there is almost no relationship between median income and Cook PVI except in the highly urban districts, where it is strong and in the expected direction: Higher median income = more Republican.

Next, we can see that more urban districts are, generally, more Democratic: All but one of the districts with Cook PVI over D+20 are over 90% urban.

Third, all the high income districts are mostly urban.  Of districts with median income above \$60,000 or so, none were mostly rural, and most were 90%+ Urban.

Graphs are good for exploration; now let's look at a model.  Specifically, let's look at several regression models, with the dependent variable being Cook PVI and the independent variables (IVs) being different combinations of urban and median income.

First, Cook PVI as a function of median income (I measured median income in thousands of dollars):
The resulting equation is:
CookPVI = 3.69 - .051*MedInc.

What this means is that the predicted PVI for a district with a median income of 0 is about D+4, and that it declines by about .05 points for each thousand-dollar increase in median income.  This slope wasn't statistically significant, and the R^2 for this model was only 0.0001, meaning that almost none of the variation in CookPVI is accounted for by median income.
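If you are wondering where an intercept, a slope and an R^2 come from, here is a rough sketch of a one-variable least-squares fit in plain Python. The five data points are invented for illustration (they are NOT the 435 districts), and `fit_line` is a hypothetical helper name, not something from R.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (unexplained variation) / (total variation)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return intercept, slope, r_squared

# Made-up data: median income in $1000s and signed PVI (positive = Democratic).
med_inc = [20.0, 30.0, 40.0, 50.0, 60.0]
pvi = [12.0, -4.0, 3.0, -6.0, 5.0]

intercept, slope, r2 = fit_line(med_inc, pvi)
print(f"PVI = {intercept:.2f} + {slope:.3f} * MedInc, R^2 = {r2:.3f}")
```

An R^2 near 0 means the fitted line does barely better than just guessing the average PVI for every district.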

Second, Cook PVI as a function of %Urban:
This gives:
CookPVI = -29.45 + 0.39*Urban

That is, when urban = 0, the predicted CookPVI is R+29, and it gets more Democratic by 0.39 points for each percentage-point increase in Urban.  So, for a 50% urban district the predicted Cook value would be -29.45 + 50*.39 = -9.95, or about R+10, and for a district that's 100% urban, it would be about D+10.
R^2 here was 0.29, indicating that urban-ness accounted for 29% of the variation in Cook PVI.
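To make the arithmetic concrete, here is a small sketch that plugs values into the urban-only equation. The coefficients come from the equation above; the function names and the "positive = Democratic" sign convention are just illustrative choices. Note that with the unrounded intercept of -29.45, the 50% urban case comes out at -9.95, which rounds to R+10.

```python
def pvi_label(score):
    """Format a signed score as Cook-style text (positive = Democratic)."""
    points = round(abs(score))
    if points == 0:
        return "EVEN"
    return f"D+{points}" if score > 0 else f"R+{points}"

def predict_from_urban(pct_urban):
    """Predicted Cook PVI from the urban-only equation in the diary."""
    return -29.45 + 0.39 * pct_urban

for pct in (0, 50, 100):
    print(f"{pct}% urban -> {pvi_label(predict_from_urban(pct))}")
```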

Finally, a model with both urban and median income:
Cook PVI = - 18.8 - 0.41*Median Income + 0.48*Urban

that is, for a district with median income = 0 and urban = 0, the predicted Cook PVI was R + 19, and this got more Republican by 0.41 units for each thousand dollar increase in median income, but got more Democratic by .48 units for each unit increase in Urban.

Both urban and median income were very significant, and this model had R^2 of 0.38.
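Here is a quick sketch of what "holding the other variable constant" looks like with the two-variable equation. The coefficients are the ones reported above; the example districts (a \$40,000 district at 30% or 95% urban, and a \$60,000 district at 95% urban) are hypothetical.

```python
def predict_pvi(med_inc_thousands, pct_urban):
    """Predicted signed Cook PVI (positive = Democratic) from the
    two-variable equation in the diary."""
    return -18.8 - 0.41 * med_inc_thousands + 0.48 * pct_urban

# Same income, different urban-ness: the gap is 0.48 * (95 - 30) points.
rural = predict_pvi(40, 30)
urban = predict_pvi(40, 95)

# Same urban-ness, higher income: the gap is -0.41 * (60 - 40) points.
richer_urban = predict_pvi(60, 95)

print(f"$40k, 30% urban: {rural:+.1f}")
print(f"$40k, 95% urban: {urban:+.1f}")
print(f"$60k, 95% urban: {richer_urban:+.1f}")
```

This is why the income coefficient can be negative here even though the raw graph shows almost no income relationship: once urban-ness is in the model, income is being compared among districts of similar urban-ness.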

All too often statistics is presented ex nihilo, as the voice of god.  Thanks for showing the guts of the calculation - I was able to reach waaaaay back to middle school exploration of statistics and, with a little googling, make sense of it.  Great!

ask any questions.... I am sure others have similar questions.

beware the ecological fallacy.

You're extrapolating the characteristics of individual voters from aggregate data.  In this case you're using measures of central tendency of the whole (i.e. means, medians, maybe even modes?) to say something about the parts.  Equally important are measures of dispersion (i.e. standard deviation) that tell us about the distribution of cases within the aggregate.

A commonsense illustration.

I have two tubs of water to put my feet in.  The two aren't the same temperature, but I know that the mean for the 2 is 100 degrees so no big problem, right?

Potentially very wrong.  Measures of dispersion matter.  If one tub is 95 degrees and the other is 105, no big deal.  But if one tub is 40 degrees and the other is 160 degrees, the mean remains 100 degrees.  Obviously, the latter case is substantially different from the former.  Drawing a conclusion about the parts from the whole in the former case where measures of dispersion would be low is much less prone to ecological fallacy than the latter case.
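The two-tub example is easy to check with Python's standard library. This sketch uses the temperatures from the example above and the population standard deviation (`pstdev`) as the measure of dispersion; which dispersion measure to use is a choice made here for illustration.

```python
from statistics import mean, pstdev

mild = [95, 105]      # tub temperatures in degrees
extreme = [40, 160]

for label, tubs in (("mild", mild), ("extreme", extreme)):
    # Same mean in both cases, very different spread.
    print(f"{label}: mean = {mean(tubs)}, std dev = {pstdev(tubs)}")
```

Both pairs average 100 degrees, but the spread is 5 degrees in one case and 60 in the other.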

ecological fallacy in the diary.

You're quite right that it's a problem, but this diary is about districts, not individuals.

this, however I think that it's something that needs to be made explicit when dealing with persons who don't have the statistics background to take this as a given.

• ##### You're right

it's one of the fallacies that is easy to fall into, and hard to see the logic of.

I would really like to understand it but I've been up since 6am yesterday.  We need more congressional analysis around here, not just the senate.  Because it is important to understand what makes up a district in order to win it.

With that said- my prediction is dems +40, about the same as when I got lambasted early this summer (seriously check my diaries).

You must not lose faith in humanity. Humanity is an ocean; if a few drops are dirty, the ocean does not become dirty. - Mahatma Gandhi

anything.  I love that expression.

In my travels these days I have often been surprised that my prejudices about income and political leanings often do not hold true.  And now you show the data.  Many thanks, plf.  This is fine stuff.

Now back to pondering the full impact of what you presented.  Peace.

Jerry Northington for Congress in '08, DE-AL Elect a real Progressive Democrat in '08.

New York state Congressional districts?

I think Sen. Clinton would make a very good president.

all 435 districts.

I did learn something -- I am very surprised to see that Serrano's district is the poorest in the country.  I know that that part of the Bronx has a lot of economic problems, but I thought that incomes would be in some part offset by the general way that wages are higher in New York than elsewhere.  Wow.


• ##### NY16 is interesting

one of the reasons it's the poorest CD (and by a lot) is that it is one of the few districts that is all in a poor area of an inner-city.

The south Bronx is probably no worse off than, say, the south side of Chicago, or south-central in LA.  But there's no district in Chicago that's just 'south side'.

NY16 is the smallest district in the USA (12 sq. miles).  In comparison, the south side of Chicago is divided mostly among IL01 and 02, each of which also includes suburbs (IL01 is 99 sq. miles, IL02 is 192 sq. miles, and IL03 is 123 sq. miles).  Median incomes there are \$37,000 and \$41,000.

The City's prison, Rikers Island, is in this District. ~10,000 count for census purposes, but show no income.

Also many undocumented workers, paid under the table.

Democratic Candidate for US Senate (Wisconsin 2012)
Court certified Marijuana Expert

I know there are districts that have a foot in the city but stretch way out across the country.
I was surprised once when I lived in a town of 2100 in the middle of nowhere that I was considered an urbanite.

Politics (a great resource!) they got their data on this from the census bureau.  I don't know the details, but it has to do with population density and integration with a city and so on.

and if you view that as a dig at some applications, say in education, I can't imagine why you think that  :-)

peace

Those who can, do. Those who can do more, TEACH! If impeachment is off the table, so is democracy

hehehe

Who says sarcasm can't work on the web?

a definition and link in the text would be nice. :)

CHRISTIAN, n. One who believes that the New Testament is a divinely inspired book admirably suited to the spiritual needs of his neighbor. A. Bierce

you're right!

I will fix

of jargon that may be common knowledge in your own field.


• ##### Gotta go for a bit

but I will be back

I also take requests for further analysis

with an interaction term (i.e. Income X Urban)?  Based on your plots it looks like the effects of income and urban are not independent of one another.  As you noted income seems to only have an effect in the more urban districts.  It would be interesting to see if  the results were qualitatively different with the interaction included.

Great diary - very nicely explained.

I did try one with an interaction; the interaction term was fairly small and not statistically significant, and I didn't want to get into explaining interactions, on top of all the technical stuff that was in the diary already.

FWIW, the model with the interaction was

CookPVI = - 2.2 - .87 * Median Income + .30*Urban + 0.005*medinc*urban

and this model had R^2 = .38, just slightly better than the model without the interaction
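For readers who haven't met interaction terms: with the product term in the model, the income slope is no longer one number but depends on how urban the district is. Here is a sketch using the coefficients from the equation above; the function names are hypothetical. Since the interaction term was not statistically significant, these implied slopes shouldn't be read too closely.

```python
def predict_with_interaction(med_inc_thousands, pct_urban):
    """Predicted signed Cook PVI (positive = Democratic) from the
    interaction model quoted in the comment above."""
    return (-2.2
            - 0.87 * med_inc_thousands
            + 0.30 * pct_urban
            + 0.005 * med_inc_thousands * pct_urban)

def income_slope(pct_urban):
    """Change in predicted PVI per extra $1000 of median income,
    at a given level of urban-ness."""
    return -0.87 + 0.005 * pct_urban

for pct in (0, 50, 100):
    print(f"{pct}% urban: income slope = {income_slope(pct):+.3f}")
```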

I think that you've pointed out something empirically that we know anecdotally: Democratic vote is related negatively to income in urban areas, but the coefficient (thinking in regression terms here) is much lower in rural areas.

The question is why?

Does this mean that we should abandon rural areas, or that there's a tremendous opportunity to take a whole swath of seats from Republicans?

Put another way if you created a list of predicted values for rural districts using the income coefficient for urban areas (i.e. saying that income would have the same effect on vote on rural areas as in urban areas) what seats does that bring into play?

Polling in 2006 showed that there's a lot of tumult in the rural vote, and that if given a message that focuses on economic inequality rather than running a campaign based on race, gender, and the culture war, Democrats take seats.  One interpretation might be that the incidence of culture war issues in the news media acts as an intervening variable lessening the impact of income on vote in rural areas.  What happens if you change that?

I think that you'd be surprised if you run the predicted values using the urban income coefficient in rural districts.

So to be clear.

1.  Regress Dem vote (the DV) on income (the IV) in only Urban districts.
2.  Apply the constant and coefficient from (1) to rural districts, and see what the predicted vote total is.

The regression equation for only districts with more than 95% urban is

CookPVI = 43.17 - 0.61 * Median Income.

that part's easy.

Now, if we apply that to all the 435 districts, the Cook PVI becomes, on average, 15 points more Democratic, and 423 districts would vote Democratic.

Among the districts with less than 50% urban, the vote would be 27% more Democratic.

That is, if every district worked the way urban districts worked, nearly every district would be Democratic.
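One way to see why nearly every district flips under this what-if: the urban-only equation predicts a Democratic lean whenever median income is below the point where the line crosses zero. A small sketch, using the coefficients from the equation above (the function name and the sample incomes are illustrative):

```python
def urban_style_pvi(med_inc_thousands):
    """Predicted signed Cook PVI (positive = Democratic) if a district
    behaved like the 95%+ urban districts."""
    return 43.17 - 0.61 * med_inc_thousands

# Median income (in $1000s) at which the predicted PVI is exactly zero.
break_even_income = 43.17 / 0.61

print(f"break-even median income: about ${break_even_income:.1f}k")
for med_inc in (25.0, 41.1, 60.0, 80.4):
    print(f"${med_inc}k -> {urban_style_pvi(med_inc):+.1f}")
```

The break-even works out to a bit under \$71,000, so only the handful of districts with median income above that would still be predicted Republican.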

How to make that happen?   Good question.

urban threshold to 65% you'd get a lot more meaningful results.  At 95% you're probably getting a lot of multicollinearity with race.

• ##### About 3/4 of all districts

are over 65% Urban

It's an odd definition, but it's what the census uses.

The correlation between Urban and Black is only .17; of districts with 90% or more urban, the range of percent Black is from 0.5% to 65.2% (the one with very few is UT-03).

• ##### Interesting

From the first graph, it looks like PVI bottoms out for Dems around 40k, and slowly upticks from there.

But the reason for the uptick is mostly due to higher degree urbanism of the wealthier districts.

To continue with mole's question above: while % Urban and Income don't appear to be independent, that may just be an artifact of the fact that in a large urban area, the population is split up among many CDs, leading to creation of both very wealthy and very poor districts.

I would think you would find a positive correlation between %URBAN and Absolute Income Index, defined like this:

Income Index = difference between CD's income and the median income of all CDs

Absolute Income Index = absolute value of the Income Index

(In other words, urban CDs are more likely to be either very rich or very poor.)
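The two definitions above can be sketched in a few lines of Python. The five districts below are invented for illustration, and comparing the average index between the over-90%-urban group and the rest is an assumption about how one might summarize it.

```python
from statistics import mean, median

# Hypothetical districts: (name, median income in $1000s, % urban).
districts = [
    ("A", 19.3, 100.0),
    ("B", 80.4, 98.0),
    ("C", 41.1, 60.0),
    ("D", 36.0, 40.0),
    ("E", 47.0, 70.0),
]

# Median income of all CDs (the "median of medians").
overall_median = median([inc for _, inc, _ in districts])

def absolute_income_index(med_inc):
    """Absolute value of (CD's income minus the median across all CDs)."""
    return abs(med_inc - overall_median)

very_urban = [absolute_income_index(inc) for _, inc, urb in districts if urb > 90]
less_urban = [absolute_income_index(inc) for _, inc, urb in districts if urb <= 90]

print(f"median of medians: {overall_median}")
print(f"avg index, over 90% urban: {mean(very_urban):.2f}")
print(f"avg index, less urban:     {mean(less_urban):.2f}")
```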

let's see...
The median of the district median incomes is \$41,100.
The absolute income index (in thousands of dollars) for districts with over 90% urban is 11.16; for districts that are less urban, it's 6.33.  So, your idea is correct!

For more urban districts, the range of Med.Inc was \$19,300 to \$80,400; for less urban districts it was \$21,900 to \$73,400, which is a cruder way of confirming the same hypothesis.

Very Interesting.

This is somewhat related to the Great Democratic Demographic Paradox.

For example, Democrats do very well among African Americans and poor voters (overwhelmingly so).  But we fare very poorly in states with large numbers of African American voters (MS, LA, SC) and very well in states with high incomes (NJ, CT, MD).

• ##### If the Black population in MS

ever creeps back over about 45%, the Republicans are going to start having a really hard time there.

• ##### Depends how many vote

and in what districts they are.

Both low turnout and gerrymandering can hurt things.

find a congressional district with a black population greater than 45% that elects a Republican representative.

I'm not aware of one.

• ##### There isn't one

The highest percent Black of a district that elects a Republican rep is 33.7%

There are 7 districts that elect Republicans and have more than 30% Black:
AL03 VA04 LA04 LA06 GA08 LA05 MS03,
the highest is in LA05

I think the reason is that blacks are overwhelmingly Democrats everywhere. Whites are not overwhelmingly Republican everywhere. Even if Republicans get 70% of the white vote, they've got a problem if Democrats are getting 90% of the black vote.
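The 70%/90% arithmetic can be worked through directly. This is a deliberately crude sketch: it assumes a district with only two groups of voters, equal turnout in both, and a pure two-party vote, none of which holds exactly in real districts. The function names are made up for illustration.

```python
def dem_share(black_share, dem_among_black=0.90, dem_among_white=0.30):
    """Democratic two-party vote share in a district where a fraction
    `black_share` of voters are Black and the rest are white."""
    return dem_among_black * black_share + dem_among_white * (1 - black_share)

def break_even_black_share(dem_among_black=0.90, dem_among_white=0.30):
    """Black share of voters at which the Democratic share is exactly 50%."""
    return (0.5 - dem_among_white) / (dem_among_black - dem_among_white)

print(f"break-even Black share: {break_even_black_share():.1%}")
for share in (0.20, 0.30, 0.45):
    print(f"{share:.0%} Black -> Democratic share {dem_share(share):.1%}")
```

Under those assumptions the break-even point is one third of voters, which is at least in the same neighborhood as the 33.7% ceiling mentioned above for districts that elect Republicans.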

• ##### There's a great article

by Andrew Gelman: Rich state, poor state, red state, blue state

There's a reference elsewhere on this thread, I also wrote a diary about it here

the basic idea is that income is related to vote only in poor states

logarithmic trend line?

"I count him braver who overcomes his desires than him who conquers his enemies; for the hardest victory is over self." --Aristotle

do you mean some analysis of several election cycles?

The big handicap to that is data entry.... I had to do these 435 by hand

thought it might reveal some interesting stats like what direction the trends are moving and possibly point to some events that were major drivers of the dynamics. Maybe something like the purchase of rural AM radio stations yielding spikes in the republican trends. These are the type of statistics that can reveal where efforts need to be focussed to bring about real changes.


at the University of Michigan has ecological datasets that link election returns to demographic characteristics.  You should be able to download them no problem, but they will probably be in SPSS or Stata format.

This having been said, setting up a google docs account that you could use to share spreadsheets might not be a bad idea.  A lot of people have created these datasets for diaries, but they've never been collected in one place. Doing that would make it much easier to do this sort of analysis by preventing duplication  of efforts.

I used to write diaries in this area, and have a lot of old Excel spreadsheets, but I've been so busy with grad school that this fell by the wayside. And I've decided that I'm a qualitative researcher, because I fucking hate trying to get data where operationalization matches conceptualization.

• ##### I have a Yahoo group

called stats_geeks_of_daily_Kos

several people urged me to switch to Google groups, but I'm having some technical problems doing so

• ##### I would check this by region
I assume it would be.

• ##### Good idea
I assume you are familiar with the work by Andrew Gelman looking at income and voting patterns that reconciles how richer states tend to vote D yet the likelihood of voting D goes down as income goes up.  There's a lot on it in his blog; for example, here.  He's also done some work looking at church attendance as an additional predictor.

Here's part of the abstract from his paper "Rich state, poor state, red state, blue state: What's the matter with Connecticut?" on this topic (I had trouble getting a direct link to work, but you can find it on-line easily)

We find that income matters more in "red America" than in "blue America." In poor states, rich people are much more likely than poor people to vote for the Republican presidential candidate, but in rich states (such as Connecticut), income has a very low correlation with vote preference. In addition to finding this pattern and studying its changes over time, we use the concepts of typicality and availability from cognitive psychology to explain how these patterns can be commonly misunderstood. Our results can be viewed either as a debunking of the journalistic image of rich "latte" Democrats and poor "Nascar" Republicans, or as support for the journalistic images of political and cultural differences between red and blue states— differences which are not explained by differences in individuals’ incomes.
For decades, the Democrats have been viewed as the party of the poor, with the Republicans representing the rich. Recent presidential elections, however, have shown a reverse pattern, with Democrats performing well in the richer "blue" states in the northeast and west coast, and Republicans dominating in the "red" states in the middle of the country. Through multilevel modeling of individual-level survey data and county- and state-level demographic and electoral data, we reconcile these patterns.

• ##### Fascinating article

I wrote a diary about it!

Gelman does a lot of good interesting work.  His blog has a lot of it (it's on my blogroll)

or what the graphs are showing, but the rise in republican-ness as a function of the rise in median income resonates.

However the definition of republican-ness may be oversimplified. There are two republican parties: the Christian, flag-waving, values based party and the business republicans, who care for nothing but money in their pockets.

The democratic party has two similar parts: what remains of labor in the democratic party, and the progressive, well-read, blogging, party core. Historically, this is expressed in the 1980 election, when the intellectual wing of the democratic party rose to prominence and the Teamsters (and PATCO - bad move, huh?) wound up supporting Reagan.

The Political Compass breaks out the political spectrum along 2 scales: x = rich v. poor and y = authoritarian v. libertarian. (I land somewhere around the Dalai Lama. Weird.) I haven't a clue how you'd express a fourfold party breakout statistically, or even how to identify where voters land on this graph.

One other concern. Beware of the Cook PVI. I know that in the MA-5 (D+11) it's way off base. I'd rate it D+3 or D+4 and getting worse. The district has been gentrifying in the last 20 years, and lots of the labor Democrats have gone over to the right.

"Freedom is a choice you have to make everyday."

being way off base - it's based on the last 2 presidential elections

for the MA-05 is way off. It's the most conservative district in Massachusetts.

Good diary, you're subscribed, rec'd and tipped.


and Gore a 19% win.

I don't know much about the district, but it clearly votes very Democratic in presidential elections

is more liberal than the most liberal district in Utah. Are you surprised?

As long as the other Massachusetts districts correctly show a higher +D PVI than the 5th, I'd say there's no mistake.

"...And I woulda got away with it, if it hadn't been for that meddling Kos!" ---attributed to Tom DeLay

Massachusetts

1st D + 15
2nd D + 13
3rd D + 13
4th D + 19
5th D + 11
6th D + 11
7th D + 19
8th D + 33
9th D + 15
10th D + 9

The 5th is NOT the most conservative district in Massachusetts. The Cape Cod 10th--which has more old-money people--is.

And even that one's at D+9. I'd take that for my state's worst district in a heartbeat. Go Massachusetts!


What Districts should we hold that we don't if you apply these stats to all districts?

Wow how much work went into this one?

which districts are in the 'wrong' camp, based on their demographics.  But this diary is about Cook PVI, which (as I know you know) doesn't answer....it's about past presidential vote.

I do have data on which districts have Republican or Democratic reps...... I'll try a logistic regression equation to predict that....

I already have a diary coming today (not election related), so maybe Tuesday or Wednesday, not sure what time.  I'm also working on one that will take your great info on who's running where and look at it in a little more detail, mostly who is running and not running in competitive districts.   I'll probably put that one up on Tuesday, and the logistic reg. on Wednesday.   I also found a fascinating article by Gelman and his associates (in press) that I want to diary about....that may be Thursday

As to how much work.....well, first I had to do data entry (ick).  I think each district took between 30 seconds and a minute, so the total was about 5 hours. But that will support several diaries.  After that, the work on this diary itself was about 2 or 3 hours.

You couldn't find a place to download the demographics?  I thought census.gov would have this info...  I know they have both short and long form data tables by CD available via ftp.

that would help a lot (although I doubt they have things like Cook PVI). Do you have details?

Almanac of American Politics is online, as well, but not in a way that is easy to import.

No PVI in the census data, of course, but lots of other things are out there...

The 2000 census is getting old at this point but has tremendous detail.  You can download all of the long form 5% sample data by CD within a separate file for each state at Congressional District ftp data.  Dealing with that massive amount of data is kind of a project (but I could probably write a program in Stata that automatically downloads and imports all of it -- that's a little challenge...)

The census bureau's American FactFinder lets you build your own tables and they let you select CD as a geographic unit.

The ACS data are more recent (2006) but you will probably need to poke around to find what you want.  They have very (too) detailed xls tables for each CD that you can find at ACS ftp site.  You may want to explore up and down the directory tree at that ftp site to see if you can find some tables that cover all CDs in one.

It might be easier to get basic census info by CD using the Kids Count web site, where you can select CD (108th) as a breakout and they give you 5 tab-delimited files that cover more than 1000 demographic variables (including some from 1990 and 2000).

Good luck...

And I thought I could catch you out!!!!!! Looking forward to the other diaries. For my part I might do a Call to arms for texas races/candidates later in the week.