I posted this as a response to the Barrons thread... but after I posted it I realized it was a little tangential, so I thought I would create a diary for anyone who wants to discuss data mining technology.
I write data warehouse software for a living... the very kind of thing that is used to do data mining. I write for different vertical markets, mostly real estate, but the principles are the same.
The power of data mining is not just in the queries that you KNOW you want to ask ("where are the calls placed from with certain keywords in them?"). The real power of a well-designed data warehouse is in the queries you didn't even know you COULD ask (for example, "what are the keywords used in communications in districts that voted against the GOP?").
They are very powerful tools and they often provide insights that were previously unquantifiable.
The keys to creating a good data warehouse model are rich attributes (that is, EVERYTHING about the "subject" of the query) and conformity of the dimensions you are querying on (keep all the facts about, say, "the customer" in one place).
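To make the "conformed dimension" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are all made up for illustration; the point is only that two different fact tables share one customer dimension, so the same attribute (here, `segment`) can slice both.

```python
import sqlite3

# In-memory database; every table, column, and row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT,
    segment TEXT
);
CREATE TABLE fact_orders (customer_key INTEGER, amount REAL);
CREATE TABLE fact_support_calls (customer_key INTEGER, minutes INTEGER);

INSERT INTO dim_customer VALUES
    (1, 'Ann', 'Austin', 'retail'),
    (2, 'Bob', 'Boston', 'wholesale');
INSERT INTO fact_orders VALUES (1, 100.0), (1, 50.0), (2, 900.0);
INSERT INTO fact_support_calls VALUES (1, 30), (2, 5), (2, 10);
""")

def total_by_segment(fact_table, measure):
    """Sum one measure of a fact table, grouped by a customer attribute.

    fact_table and measure are interpolated into the SQL, so this sketch
    assumes they come from a fixed, trusted set of names.
    """
    sql = (
        f"SELECT d.segment, SUM(f.{measure}) "
        f"FROM {fact_table} AS f "
        "JOIN dim_customer AS d ON d.customer_key = f.customer_key "
        "GROUP BY d.segment ORDER BY d.segment"
    )
    return conn.execute(sql).fetchall()

# Both facts are sliced by the same dimension, so the results line up.
orders_by_segment = total_by_segment("fact_orders", "amount")
calls_by_segment = total_by_segment("fact_support_calls", "minutes")
```

Because everything about "the customer" lives in one place, adding a new attribute to `dim_customer` makes it available to every fact table at once, which is where the questions you didn't know you could ask come from.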
But the aspect of this I can't even fathom is how they feed this data warehouse. The process is called ETL (Extraction, Transformation and Loading). It means that you have access to all of your source data (databases, flat files, etc.) and you have systems powerful (and large) enough to process this data into the conformed, rich dimensions that you need in order to do adequate data mining. This requires tremendous horsepower AND it introduces a concept called the "Grain" of the data. Grain is the minimum level of aggregation at which you are going to store and view the data. An example of grain in a retail environment might be Transactions Per Day Per Store. The limiting factor on the grain of your data is the time it takes to run your ETL process. If it takes more than 24 hours to run your ETL, you are always going to have a latency of at least one day in your data warehouse queries.
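Here is a toy sketch of ETL at the Transactions Per Day Per Store grain, using only the Python standard library. The source rows, store names, and the in-memory "warehouse" dictionary are assumptions for illustration; a real pipeline would read from databases or flat files and write to warehouse tables.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw source rows: (timestamp, store_id, amount).
RAW_SALES = [
    ("2005-12-20 09:15:00", "store_1", 19.99),
    ("2005-12-20 17:40:00", "store_1", 5.00),
    ("2005-12-20 11:05:00", "store_2", 42.50),
    ("2005-12-21 10:30:00", "store_1", 7.25),
]

def extract(rows):
    """Extract: yield raw rows from the source (here, an in-memory list)."""
    yield from rows

def transform(rows):
    """Transform: roll each transaction up to the (day, store) grain."""
    facts = defaultdict(lambda: {"transactions": 0, "revenue": 0.0})
    for timestamp, store_id, amount in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        cell = facts[(day, store_id)]
        cell["transactions"] += 1
        cell["revenue"] += amount
    return facts

def load(facts, warehouse):
    """Load: write the aggregated facts into the warehouse (here, a dict)."""
    for key, measures in facts.items():
        warehouse[key] = measures
    return warehouse

warehouse = load(transform(extract(RAW_SALES)), {})
```

Once the data is loaded at this grain, the time of day of each sale is gone: you can ask about a store's day, but not about its busiest hour. Choosing a finer grain keeps more detail but means more rows to process on every ETL run.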
So if the government is saying that they are using "data mining" to identify areas of interest, then that database is at least constrained by the time it takes to gather and transform that data into useful schemas.
There are such things as "near real time" data warehouses; I built one for a role-playing game company a while back. We were trying to determine things about player behavior WITHIN the game by analyzing things like player proximity, motion, inventory, attributes, etc.
I was discussing this project with my father, who did radar analysis for defense contractors, and he pointed out that the data analysis was remarkably similar to what is called ELINT (Electronic Intelligence) that is used in military intelligence. The similarities were so strong that we are working on a white paper about this very subject. I'll post a link to it when we get it in shape.
One of the well-known gurus in the Data Warehouse world, Bill Inmon, has done a lot of work with government data mining models.
Government Data Warehouse
I live and breathe this technology, so if anyone has any specific questions, I'd be happy to try and answer them for you.
Oh and Happy Holidays to all!
Frank