Handicapping the upcoming Pennsylvania primary, many pundits are saying in effect that as Ohio went on March 4, so Pennsylvania will go on April 22, citing "similar demographics."
So it's useful to compare the demographics of the two states as a kind of bottom-up approach to predicting the PA outcome. I did a statistical cluster analysis on Ohio's 88 counties, classifying them according to five variables: population density, black population, household income, high school graduation rate and college graduation rate. The analysis (k-means clustering) was set to report 6 different clusters. Here's a map of the result:
The six clusters are defined roughly as follows:
Hillary-billies
Low income, low education, low population density, 99% white; supports Hillary at 80%.
Ed-necks
Like the Hillary-billies, just a little less so (slightly more income, education, etc.), 97% white; supports Hillary at 70%.
Proletariat
Medium income, medium education, exurban, 94% white; supports Hillary at 61%.
Hoosierdom
Medium income, medium education, rural, 98% white; supports Hillary at 61%.
Crunchy-cons
High income, very high education, medium population density, 97% white; supports Hillary at 55%.
Obama-crats
Medium income, high population density, high education, urban, 79% white; supports Obama at 54%.
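For readers who want to try the clustering step themselves, here is a minimal sketch of k-means on the five county variables. It is not the code behind the analysis above: the county names and numbers are made-up placeholders, the toy run uses 2 clusters rather than 6, and the starting centers are picked with a simple deterministic farthest-first rule (an assumption made here so the result is repeatable; many k-means implementations use random starts instead). The variables are standardized first so that income, measured in dollars, doesn't swamp the percentages.

```python
import math

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def farthest_first(points, k):
    """Deterministic starting centers: the first point, then repeatedly
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: assign each point to its nearest center, move each
    center to the mean of its members, repeat until labels stop changing."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# HYPOTHETICAL counties, not real census figures. Columns:
# density (per sq mi), % black, household income ($), % HS grads, % college grads
counties = {
    "Rural A": [40, 1.0, 32000, 75, 9],
    "Rural B": [55, 2.0, 34000, 78, 11],
    "Rural C": [60, 1.5, 33000, 77, 10],
    "Urban A": [2500, 28.0, 45000, 85, 28],
    "Urban B": [2100, 22.0, 47000, 86, 30],
    "Urban C": [3000, 25.0, 44000, 84, 27],
}

labels = kmeans(standardize(list(counties.values())), k=2)
for name, label in zip(counties, labels):
    print(name, "-> cluster", label)
```

With real data you would load all 88 counties and set k=6. The cluster numbers that come out are arbitrary; names like "Ed-necks" are attached afterward by looking at each cluster's averages.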
So while the clustering was done on non-political factors like education, income and race, there are distinct political differences in Hillary support, reaching a remarkable 80% in deep Appalachia, and 70% among Appalachian "Ed-necks"--a pun on PA Gov. Ed Rendell, who famously noted that some whites are not ready to vote for a black candidate:
Rendell pegs the white racist effect at a "tiny percentage" of the electorate, but notes that small percentages can determine elections.
Here is how these clusters break down as shares of the total Democratic electorate in Ohio that turned out on March 4:
Certainly the Hillary-billies are a tiny slice of the Democratic pie, being thinly populated counties that have a fair number of Republicans (though generally going for Bill Clinton in 1992). The Ed-necks, supporting Hillary at 70%, are more significant. The two groups together represent 12% of the Democratic turnout in Ohio.
The only group that Obama prevailed with is the Obama-crats, i.e., the populous urban counties with significant black minorities. This is the largest cluster, almost half the Democratic turnout, but even there his share of the vote was not large (54%).
Among the largely white counties, Obama comes closest with the Crunchy-cons, traditionally Republican suburban counties that may have had some progressive encroachment of late. He actually did squeak by in Delaware County, just north of Columbus (Franklin County). That was the only non-urban county he took in Ohio, and that only barely. In fact, he came out on top in only 5 counties:
The strongest factor was the percent black population in the county. Here's a scatter plot of Hillary% vs. black% in each county:
Overall, Hillary won 55%-45% in Ohio.
So that's the uphill battle Obama faces in Pennsylvania. The question remains exactly how similar the demographics of the two states are. Here's the result of a similar cluster analysis of Pennsylvania's counties:
So the hill is even steeper for Obama in Pennsylvania. Only one county, Philadelphia County (synonymous with the city limits), is "Obama-crat" territory, having an urban population that is 45% black. Allegheny County (Pittsburgh) is only 13% black. In comparison, Cuyahoga County in Ohio (Cleveland) is 29% black and Obama was only able to eke out a 53-46 victory there. It would seem that the odds are against him winning Allegheny County. In fact, there's a possibility he will only take Philadelphia County and perhaps Centre County, the latter being the home of Penn State. The suburban "Crunchy-con" counties west of Philadelphia might offer some chance, but they'd be roughly toss-ups.
Here is a comparison of the six groups in both Ohio and Pennsylvania, in terms of share of the Democratic electorate (projected in the case of PA):
So the Hillary-billy/Ed-neck fraction is much greater in Pennsylvania (24% combined compared to 12% for Ohio). The Obama-crat fraction is greatly reduced, comprising only Philadelphia, and "Hoosierdom," where Hillary can generally expect in excess of 60% of the vote, is also increased.
It's possible to simply take the Hillary preference in each of the six groups in Ohio and apply it to the same groups in PA to get a projected outcome, but some minor tweaking is in order, primarily for differences in racial composition in the analogous groups, which has been so determinative of the vote. For example, although the "Obama-crat" county group is much reduced in PA, it has 45% blacks compared to 21% in Ohio.
Here's the demographic comparison of the six groups in each state:
For the projected PA primary results, the percent turnout of each group is projected to match that of the analogous group in Ohio. The projected Hillary percent starts from the value in Ohio and is adjusted to account for differences in education, black population, etc.--obviously not an exact science. For example, the Proletariat group in PA has more college graduates than in Ohio (25% vs. 19%). Since college graduates tend to break something like 60/40 for Obama, this difference would give him an additional point or two.
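The college-graduate adjustment for the Proletariat group can be worked through as a rough check. This is only a sketch of the arithmetic, not the spreadsheet's actual formula, and it leans on three assumptions that are not in the data: Obama's Ohio share in the group is taken as 39% (100 minus Hillary's 61%, ignoring other candidates), the share of college graduates among voters is taken to equal the county-level graduation figure, and non-graduates are assumed to vote the same way in both states. The function name and parameters are made up for illustration.

```python
def shift_from_college_share(obama_overall, grads_from, grads_to,
                             obama_among_grads=0.60):
    """Estimate the change in Obama's share when the college-graduate
    fraction moves from grads_from to grads_to.

    Assumes graduates support Obama at obama_among_grads and that
    non-graduate support is the same before and after the change.
    """
    # Back out the non-graduate support implied by the overall figure.
    obama_non_grads = ((obama_overall - grads_from * obama_among_grads)
                       / (1 - grads_from))
    # Re-weight the two groups using the new graduate fraction.
    adjusted = (grads_to * obama_among_grads
                + (1 - grads_to) * obama_non_grads)
    return adjusted - obama_overall

# Proletariat group: 19% college graduates in Ohio vs. 25% in PA.
shift = shift_from_college_share(obama_overall=0.39,
                                 grads_from=0.19, grads_to=0.25)
print(f"Estimated shift toward Obama: {shift * 100:.1f} points")
```

Under these assumptions the shift comes out between one and two points, in line with the rough figure above.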
Given the projected numbers in the table, the projected outcome is a 57%-43% victory for Hillary. If anyone would like to try different projected numbers, here's the spreadsheet:
OH/PA Cluster Analysis (xls)
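If you'd rather not open the spreadsheet, the statewide projection is just a weighted average: each group's share of the electorate times Hillary's percent in that group, summed over the six groups. Here is a minimal sketch. The Hillary percentages are the unadjusted Ohio values quoted above (with the Obama-crats taken as 46%, i.e. 100 minus Obama's 54%, ignoring other candidates). The electorate shares are placeholders invented for illustration, apart from the Hillary-billy and Ed-neck shares being chosen to add up to the 24% mentioned above, so the result is not meant to reproduce the 57%-43% figure.

```python
def project_statewide(shares, hillary_pct):
    """Weighted average of group-level Hillary percent, weighted by each
    group's share of the electorate. Shares must sum to 1."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return sum(shares[group] * hillary_pct[group] for group in shares)

# Unadjusted Hillary percent by group, from the Ohio results above.
hillary_pct = {
    "Hillary-billies": 80,
    "Ed-necks": 70,
    "Proletariat": 61,
    "Hoosierdom": 61,
    "Crunchy-cons": 55,
    "Obama-crats": 46,  # assumption: 100 minus Obama's 54%
}

# PLACEHOLDER shares of the PA Democratic electorate (not from the spreadsheet).
pa_shares = {
    "Hillary-billies": 0.06,
    "Ed-necks": 0.18,
    "Proletariat": 0.25,
    "Hoosierdom": 0.20,
    "Crunchy-cons": 0.16,
    "Obama-crats": 0.15,
}

projected = project_statewide(pa_shares, hillary_pct)
print(f"Projected Hillary share: {projected:.1f}%")
```

To try your own scenario, change the numbers in the two dictionaries. Lowering a group's Hillary percent to reflect a larger black population or more college graduates is the "tweaking" step described above.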
An additional wrinkle in Pennsylvania is that the primary is closed to all but registered Democrats: no crossover of Republicans and independents is allowed, in contrast to Ohio, where the Democratic primary was open to all registered voters. In Ohio, there was significant crossover for Obama, as well as for Clinton. By some conventional wisdom, Hillary does better among mainstream Democrats, so this could help her in PA. However, a case could be made that it might help Obama instead. For instance, if there is a significant racial-prejudice component to the voting, that effect may be stronger among Republican and independent crossovers than within the Democratic Party, so excluding the former might help him. It should be very interesting to see how the voting patterns compare to Ohio's open primary.