Originally posted at Overdetermined.net
Consumer data is one of the most frustrating areas of voter file design, especially if you're interested in transparency or an open-source methodology. Large parts of the voter file process use publicly available data, and can be duplicated by any group with relatively low levels of equipment and technical skill (for example, witness my tutorial effort, which I absolutely promise to get back to after the election). There's no such ability for data appends. After the jump I'll explain why.
To get a sense of what exactly we're talking about, I think it will help to take a close look at a firm that does this for a living. In this example, we're going to use InfoUSA (IUSA for short). As we explain on our resources page: "If you ever need any kind of commercial data for any of your projects, you're most likely going to get it from InfoUSA. It is almost frightening how much consumer data they have, but if you're looking to poll targeted groups, like, say, people who frequent motels six or more times a year, you're going to get that sample from InfoUSA." What exactly goes into that data?
To see what's on offer, let's head over to the InfoUSA website. The first thing to note is that their political offerings are not highlighted at all. In fact, the most prominent things on the front page are offers for lists of businesses and executives. Not very useful for political purposes (except for fundraising), but just the sort of thing that would appeal to someone selling business-to-business services or luxury products. This makes sense--although it receives more than its share of hype, political use is far down the list of profit centers for companies like InfoUSA. According to Newsweek, for instance, IUSA's total 2006 revenue was $700 million; meanwhile, the firm's founder, Vinod Gupta, estimates that a partnership with Bill Clinton brought in "over the last seven years, easily over $40 million", presumably mostly political business. Taken at face value, that works out to a bit under $6 million a year, or less than 1 percent of the 2006 revenue figure. Political use is not the primary driver of consumer data collection; instead, it's a side market that happens to be profitable. The main driver is advertising, which is simply a much larger market.
Nevertheless, what's there can be useful to political groups. For a few examples, let's take a look at the IUSA catalog. Their consumer list (PDF), for example, offers verified ages, incomes and even home values (age and income can be especially useful because they are often collected as crosstabs on any polls you might conduct). Even more intriguing is their list of lifestyle and hobby categories (PDF), which could help guide a targeting effort or pinpoint your supporters (i.e. microtargeting). You can purchase lists of people interested in politics; people interested in religion; people who ID as conservative or liberal. There's a ton of information that could be useful here.
So can you duplicate it yourself? Not just no, but hell no. There are multiple barriers to entry here. First of all, unlike state voter files, which come from the Secretary of State's office, the information contained in a consumer data file comes from tons of different sources. To quote IUSA themselves: "Data specialists [enhance each consumer's record] with buying habit and lifestyle information from real estate transactions, product registrations, magazine subscriptions, and survey responses." Even if we allow for a certain amount of self-inflation here, it's still clear that this information does not come from only one source--there's no one government agency that spits out the collected information on the cheap, like there is for voter files. People pay InfoUSA because IUSA goes to the trouble of collating all this information and combining it into one record with a trackable identity. In fact, the only groups that know as much about you as IUSA or Acxiom are...IUSA and Acxiom.
Aside from gathering all this information into one place, these companies also have the expertise to match it to any data you might already possess (you do not have this expertise). Matching records is tough--people move, get married, die, name their children after themselves, and generally make life hard. Gathering, collating and coordinating all this information is a massive enterprise--IUSA employs 600 people and spends millions every year. For all practical purposes, it is impossible to use consumer data without purchasing it from a large, for-profit vendor.
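To give a feel for why matching is hard, here is a toy sketch in Python. To be clear, this is not how IUSA or any other vendor actually does it (they don't publish that). The field names, the weights and the 0.8 threshold are all made up for illustration; the only real machinery is the standard library's difflib.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so 'John Smith Jr.' compares as 'john smith jr'."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()


def similarity(a, b):
    """Rough string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(voter, consumer):
    """Blend name, address and birth year into one score.

    The 0.5 / 0.3 / 0.2 weights are hypothetical, picked only for this example.
    """
    name = similarity(voter["name"], consumer["name"])
    address = similarity(voter["address"], consumer["address"])
    birth = 0.2 if voter["birth_year"] == consumer["birth_year"] else 0.0
    return 0.5 * name + 0.3 * address + birth


def best_match(voter, consumers, threshold=0.8):
    """Return the highest-scoring consumer record, or None if nothing clears the bar."""
    if not consumers:
        return None
    scored = [(match_score(voter, c), c) for c in consumers]
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None


# A father and son with the same name at the same address.
consumers = [
    {"id": "A", "name": "John Smith Jr.", "address": "12 Oak St", "birth_year": 1978},
    {"id": "B", "name": "John Smith", "address": "12 Oak Street", "birth_year": 1950},
]
father = {"name": "John Smith", "address": "12 Oak St", "birth_year": 1950}

# Someone who married and moved since the consumer record was collected.
mary_then = [
    {"id": "C", "name": "Mary Jones", "address": "5 Elm Ave", "birth_year": 1980},
]
mary_now = {"name": "Mary Smith", "address": "99 Pine Rd", "birth_year": 1980}

father_match = best_match(father, consumers)
mary_match = best_match(mary_now, mary_then)
```

In this toy data the father lands on record B rather than his son's record A, but only because the birth year happens to be on file. Mary, who changed both her name and her address, matches nothing at all. Now multiply that by a couple hundred million records and you start to see what the 600 employees are for.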
Which brings us back to our initial frustration. Obviously, these companies have no motivation to disclose any more about their methods than they have to. If you're a client, you may be able to learn more about their methodology than a random person on the street. But even then it's a relatively closed process. Consumer data is the least transparent and least duplicable part of the voter file process, and this is a shame, since there's so much potential in the field.