With Iowa having pretty much immolated its long-standing first-in-the-nation status—not just through its incompetent administration of this year’s Democratic caucus, but also the deeply problematic nature of having two of the nation’s whitest states occupying pole position in the Democratic primary calendar—a new question has arisen: Which state should get that role instead?
There are a lot of different options, many of which don’t involve any particular state going first at all, such as one national primary day, or rotating regional “super primaries.” But if we’re going to stick at least partly with tradition and let one or several states go first, then you have to ask which states are most representative of the nation as a whole. We have some answers.
One state that recently volunteered as tribute is Illinois, with Democratic Gov. J.B. Pritzker suggesting last week that the Land of Lincoln ought to go first. Pritzker’s argument was that Illinois is the most representative state in the country; in other words, it’s the most demographically average state, or the one that’s most similar to the nation as a whole.
It’s actually not that controversial an argument: Most demographers agree that Illinois is the most representative state. You’ve probably heard the phrase “Will it play in Peoria?” There’s a historical reason for the cliché: It dates back to vaudeville theater, and in the decades since then, Peoria, Illinois, has often been used as a test market for new products or media offerings. That’s largely because of Peoria’s high level of demographic averageness, which extends to the rest of the state as well.
You might be thinking, though, that this claim sounds like a lot of subjective mumbo-jumbo; how would you ever know for sure what place is most representative? It turns out there’s a statistical way to do that: It’s called “nearest neighbor analysis,” and you can use it for any sort of question when you have a lot of descriptive data about things you’re trying to compare. If you’re interested in states, the Census Bureau has nearly endless amounts of information that you can use to make those comparisons. The only hard part, really, is deciding which criteria you should narrow it down to.
I’ve settled on a broad cross-section of 28 different variables, which you can find here. Looking at this collection of data, Illinois is indeed the state that’s most like the United States as a whole. It has, for instance, similar proportions of whites, blacks, and Hispanics compared to the nation at large. It’s also demographically similar in terms of median age, median household income, education levels, manufacturing jobs, its share of renters, typical home values, and more.
There are certainly other states that might be closer to the national average in one or more of those categories, but when you add up all the differences across all categories, Illinois is cumulatively the least divergent. Below are the total scores for the top five states that are closest to the entire U.S.:
- ILLINOIS: 14.6
- ARIZONA: 22.5
- VIRGINIA: 25.9
- CONNECTICUT: 25.9
- NORTH CAROLINA: 26.5
Illinois has the lowest score, meaning it’s the most similar state to the country as a whole; it’s followed by Arizona (which holds its spot by virtue of most residents having moved there from somewhere else), Virginia, Connecticut, and North Carolina.
With the formula that I’m using, the score starts at 0 and goes up from there as states become less similar. Hypothetically it could be infinite, but practically speaking, it’s going to max out around 100. We calculate the total score by looking at each state’s stats in each of our 28 variables and subtracting them from the figure that describes the U.S. We then tote up the absolute value of those differences.
Fourteen percent of the population in Illinois, for instance, identifies as black, while 12.3% of the country does so. That gives Illinois, on this metric, 1.7 points. We then multiply each of these individual scores by a weighting that corresponds to how well each demographic correlates with voting behavior. While theoretically, these multipliers could range from 0 (no correlation) to 1 (perfect correlation), in practice, ours fall between 0.1 and 0.7.
In our analysis, black population has a 0.5 correlation with overall voting behavior, which is fairly high. We therefore multiply Illinois’ 1.7 black population differential by 0.5, giving us 0.85 points. We repeat this process for the other 27 variables (each of which has a different multiplier) and tote it all up, giving us Illinois’ final score of 14.6. (If you’re curious, the variable with the highest multiplier, 0.7, is a state’s percentage of women who have never married.)
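The steps above amount to a weighted sum of absolute differences, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `similarity_score` and the dictionary layout are hypothetical, and only the black-population figures (14% versus 12.3%, with a 0.5 multiplier) come from the text. The other 27 variables are not reproduced here.

```python
def similarity_score(place, baseline, weights):
    """Sum of weighted absolute differences across every weighted variable."""
    return sum(
        weights[name] * abs(place[name] - baseline[name])
        for name in weights
    )

# Only this one variable is taken from the article; the other 27 are omitted.
illinois = {"pct_black": 14.0}
united_states = {"pct_black": 12.3}
weights = {"pct_black": 0.5}

contribution = similarity_score(illinois, united_states, weights)
print(round(contribution, 2))  # 0.85
```

With all 28 variables and their multipliers filled in, the same function would return a state’s total score.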
Interestingly, Connecticut is another state that you sometimes see mentioned as a potential first-in-the-nation state, not just because of its overall similarity, but also because, unlike Illinois, it’s small enough, both in terms of population and geographic area, to allow for some of the emphasis on retail campaigning that’s such a big part of the Iowa and New Hampshire traditions. (Unlike Iowa, though, Connecticut doesn’t have cheap media markets.)
This methodology can also be used to compare one state to another, and that’s where it’s most valuable. Nate Silver popularized a similar approach over a decade ago as part of his predictive models forecasting the 2008 elections. In his case, he used it to extrapolate what polling might look like in a state that hasn’t been polled much, based in part on the polling data from similar states that have been polled more often.
And that’s really the main reason that I’ve put together my own state similarity index. While I won’t be doing a strictly quantitative predictive model for the Democratic primaries, I’ve been publishing detailed previews. Now, we’re about to move from states where we had a ton of polling data (Iowa and New Hampshire) to others where the polling is scant (Nevada and South Carolina), and from there to others where polling is so far largely nonexistent (most of the Super Tuesday states beyond California).
So while I won't attempt a level of precision like FiveThirtyEight’s in my analyses, it’s important to have some sense of the web of similarities among states that have already voted and states where we’re flying blind—and I’d rather experiment with my own home brew than just steal from the ostensible competition. As you’ll notice, if you click through, a lot of the results that I've generated are quite similar to Silver’s work, even though some of the variables that I’ve chosen are different, and they may be weighted differently. That, in fact, gives me some confidence that what I’ve done passes the smell test.
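To make the extrapolation idea concrete, here is one rough way polling from similar states could be blended into an estimate for a state with little polling. This is not Silver’s model and not a method used in this piece: the inverse-score weighting and the function name `estimate_from_neighbors` are assumptions for illustration, and the poll numbers are placeholders rather than real polling data. The three similarity scores are Arizona’s from the chart in this article.

```python
def estimate_from_neighbors(neighbor_scores, neighbor_polls):
    """Average neighbors' poll numbers, weighting each by 1 / similarity score.

    A lower similarity score means a closer match, so it gets a larger weight.
    """
    weights = {
        state: 1.0 / score
        for state, score in neighbor_scores.items()
        if state in neighbor_polls
    }
    total = sum(weights.values())
    return sum(weights[s] * neighbor_polls[s] for s in weights) / total

# Similarity scores for Arizona's three nearest neighbors, from the chart.
arizona_neighbors = {"Florida": 28.7, "Texas": 28.9, "Nevada": 29.1}

# Placeholder poll shares (percent) for one candidate; NOT real polling data.
placeholder_polls = {"Florida": 30.0, "Texas": 20.0, "Nevada": 25.0}

estimate = estimate_from_neighbors(arizona_neighbors, placeholder_polls)
print(round(estimate, 1))
```

Because Arizona’s three neighbors have nearly identical scores, the result lands close to a plain average; the weighting matters more when one neighbor is much closer than the rest.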
The actual score itself is important only to the extent that it tells you how similar different pairs of states are on a relative basis. For instance, if you look below at the full chart showing the three most similar siblings for every state (plus the District of Columbia), you’ll see that Missouri and Ohio are the most-alike pair of states in the nation, with a difference of 9.1 points.
That edges out even North and South Carolina, which were the closest pair in Silver’s 2008 analysis. This also points up an interesting phenomenon: State A might be most similar to State B, but the reverse isn’t necessarily true. While North Carolina’s nearest neighbor is indeed South Carolina, South Carolina’s nearest is Alabama. In fact, just 20 states are paired the way Missouri and Ohio are; the remainder are not.
Some states, meanwhile, really aren’t very much like any other state. Hawaii, California, and especially the District of Columbia fall into this bucket. Of course, each of them still has a “nearest” neighbor—it just might not be all that close by.
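The one-way nature of “nearest neighbor” is easy to check mechanically. The sketch below uses a handful of nearest-neighbor entries copied from the chart in this article and picks out the pairs that point at each other; the function name `mutual_pairs` is hypothetical, and the subset of states is chosen only for illustration.

```python
# Nearest neighbor for a few states, copied from the chart in this article.
nearest = {
    "Missouri": "Ohio",
    "Ohio": "Missouri",
    "North Carolina": "South Carolina",
    "South Carolina": "Alabama",
    "Alabama": "South Carolina",
    "Indiana": "Missouri",
}

def mutual_pairs(nearest):
    """Return the pairs (a, b) where a's nearest is b and b's nearest is a."""
    pairs = set()
    for a, b in nearest.items():
        if nearest.get(b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)

print(mutual_pairs(nearest))
# [('Alabama', 'South Carolina'), ('Missouri', 'Ohio')]
```

North Carolina and Indiana drop out because the states they point to are closer to someone else.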
| STATE | MOST SIMILAR | SCORE | 2ND MOST | SCORE | 3RD MOST | SCORE |
| --- | --- | --- | --- | --- | --- | --- |
| ALABAMA | South Carolina | 13.5 | Tennessee | 21.4 | North Carolina | 21.5 |
| ALASKA | Washington | 31.9 | Arizona | 32.8 | Oregon | 34.6 |
| ARIZONA | Florida | 28.7 | Texas | 28.9 | Nevada | 29.1 |
| ARKANSAS | Tennessee | 14.8 | Oklahoma | 18.8 | Kentucky | 22.3 |
| CALIFORNIA | Nevada | 48.9 | New Jersey | 51.4 | Texas | 51.9 |
| COLORADO | Washington | 15.2 | Oregon | 23.9 | Connecticut | 31.6 |
| CONNECTICUT | Illinois | 24.7 | Virginia | 24.7 | Rhode Island | 25.8 |
| DELAWARE | North Carolina | 22.5 | Virginia | 24.7 | South Carolina | 27.4 |
| DIST. OF COLUMBIA | New York | 125.9 | Maryland | 138.0 | California | 146.3 |
| FLORIDA | Arizona | 28.7 | Nevada | 33.9 | Illinois | 36.2 |
| GEORGIA | North Carolina | 23.2 | Louisiana | 26.0 | South Carolina | 27.9 |
| HAWAII | California | 56.0 | Nevada | 76.3 | New Jersey | 77.3 |
| IDAHO | Wyoming | 19.7 | Montana | 23.3 | Indiana | 25.6 |
| ILLINOIS | Connecticut | 24.7 | Pennsylvania | 30.5 | Virginia | 30.6 |
| INDIANA | Missouri | 10.4 | Ohio | 15.0 | Michigan | 17.9 |
| IOWA | Nebraska | 16.6 | South Dakota | 19.3 | Wisconsin | 21.7 |
| KANSAS | Nebraska | 12.2 | Missouri | 18.6 | Indiana | 19.7 |
| KENTUCKY | Indiana | 21.3 | Missouri | 22.1 | Arkansas | 22.3 |
| LOUISIANA | Mississippi | 20.4 | South Carolina | 24.4 | Alabama | 25.3 |
| MAINE | Vermont | 17.9 | Montana | 24.5 | West Virginia | 27.0 |
| MARYLAND | Virginia | 30.6 | New Jersey | 30.6 | Connecticut | 36.9 |
| MASSACHUSETTS | Rhode Island | 27.8 | Connecticut | 29.0 | New Jersey | 33.8 |
| MICHIGAN | Ohio | 15.8 | Missouri | 16.9 | Indiana | 17.9 |
| MINNESOTA | Wisconsin | 21.2 | Nebraska | 24.8 | Iowa | 30.5 |
| MISSISSIPPI | Louisiana | 20.4 | Alabama | 25.5 | South Carolina | 30.3 |
| MISSOURI | Ohio | 9.1 | Indiana | 10.4 | Michigan | 16.9 |
| MONTANA | Wyoming | 16.8 | South Dakota | 23.1 | Idaho | 23.3 |
| NEBRASKA | Kansas | 12.2 | Iowa | 16.6 | Wisconsin | 17.2 |
| NEVADA | Arizona | 29.1 | Texas | 30.1 | Florida | 33.9 |
| NEW HAMPSHIRE | Vermont | 18.2 | Maine | 28.0 | Utah | 30.9 |
| NEW JERSEY | Connecticut | 30.6 | Maryland | 30.6 | Illinois | 32.3 |
| NEW MEXICO | Texas | 31.9 | Arizona | 32.2 | Nevada | 48.0 |
| NEW YORK | New Jersey | 40.8 | Illinois | 55.3 | Florida | 56.8 |
| NORTH CAROLINA | South Carolina | 15.6 | Tennessee | 19.4 | Alabama | 21.5 |
| NORTH DAKOTA | South Dakota | 24.3 | Wisconsin | 27.0 | Nebraska | 30.6 |
| OHIO | Missouri | 9.1 | Indiana | 15.0 | Michigan | 15.8 |
| OKLAHOMA | Arkansas | 18.8 | Indiana | 23.2 | Kansas | 23.5 |
| OREGON | Washington | 18.9 | Colorado | 23.9 | Kansas | 27.5 |
| PENNSYLVANIA | Michigan | 22.8 | Ohio | 23.1 | Wisconsin | 25.6 |
| RHODE ISLAND | Connecticut | 25.8 | Massachusetts | 27.8 | Oregon | 32.3 |
| SOUTH CAROLINA | Alabama | 13.5 | North Carolina | 15.6 | Tennessee | 22.1 |
| SOUTH DAKOTA | Wisconsin | 18.4 | Iowa | 19.3 | Nebraska | 19.9 |
| TENNESSEE | Arkansas | 14.8 | Missouri | 18.2 | North Carolina | 19.4 |
| TEXAS | Arizona | 28.9 | Nevada | 30.1 | New Mexico | 31.9 |
| UTAH | Oregon | 28.0 | New Hampshire | 30.9 | Idaho | 33.3 |
| VERMONT | Maine | 17.9 | New Hampshire | 18.2 | Montana | 30.4 |
| VIRGINIA | Connecticut | 24.7 | Delaware | 24.7 | North Carolina | 28.8 |
| WASHINGTON | Colorado | 15.2 | Oregon | 18.9 | Connecticut | 30.7 |
| WEST VIRGINIA | Kentucky | 24.5 | Maine | 27.0 | Montana | 31.0 |
| WISCONSIN | Nebraska | 17.2 | South Dakota | 18.4 | Ohio | 20.5 |
| WYOMING | Montana | 16.8 | Idaho | 19.7 | South Dakota | 23.3 |
If you would like to see our full spreadsheet, showing each of the variables used, and showing each state’s similarity score relative to all other states instead of just the closest three, click through to the Google Docs version.
Please keep in mind that what I’ve proposed here is by no means the only right answer. I would have gotten slightly different results if I’d chosen more, fewer, or different variables, or weighted them differently. But most of what you see makes intuitive sense at first glance, and even the pairings that seem odd at first might make more sense once you think about the individual categories.
For instance, is Utah really similar to New Hampshire? Sure, if you see that they have similarly white populations, similar urban-suburban mixes, and similar education and income levels. And beyond that, many Utahns are directly descended from the earliest residents of New England hundreds of years ago.
One other thing you might be wondering is whether this formula is applicable only to states. No, not at all! You could apply the same formula to any level of geography for which the Census Bureau regularly publishes data. You could find the nearest neighbors for counties or for cities or congressional districts … or even for ZIP codes or census tracts, if you wanted to get impossibly granular.
For instance, one question that pundits have been wondering out loud about recently, in the context of Pete Buttigieg’s rise to prominence, is whether South Bend, Indiana, is a college town or a struggling Rust Belt city. One way to approach that question is by calculating which other cities are South Bend’s nearest neighbors, demographically speaking.
It turns out that South Bend really isn’t one or the other. For instance, it isn’t quite like Youngstown or Flint, and it isn’t quite like Ann Arbor or Champaign. Its closest analogues are, in fact, other medium-sized Midwestern cities with a declining industrial base that nevertheless still benefit from the presence of a nearby large university. Using my formula, the most similar cities to South Bend are Toledo, Ohio (home to the University of Toledo); Rockford, Illinois; and Lansing, Michigan (home to Michigan State University).
One final observation: Even when measured quantitatively, similarity is still a fuzzy tool. While polls of, say, Texas might offer some hints about the political landscape in Arizona, they’re no substitute for the real thing. But with similarity scores, we can at least know which neighbors are worth checking in on.