Russell Tice, former NSA analyst, is talking to Keith Olbermann as I type this, providing some stunning revelations about the extent and nature of the Bush administration's domestic spying program. I'm going to need a transcript to get into all the details, but I wanted to write this in order to try to clarify at least one or two technical details he brought up.
Those points aside, though, there are two big takeaways from this:
- Domestic spying was pervasive -- far, far larger than anything we knew.
- Domestic spying targeted specific known non-terrorist groups: e.g., journalists.
Let's try the technical stuff first.
Mr. Tice talked at some length about the difference between large-scale technical surveillance and more focused directed surveillance. If I've understood him correctly, then I think I can explain what he was talking about by using email as an example.
If you were interested in screening huge amounts of email, but didn't have the capacity to capture or store it all, you might decide to just content yourself with the metadata. Metadata is just "data about data". For instance, in the case of email, some interesting metadata might be:
- what language it's in
- the sender's address
- the recipient's address
- the length in bytes
- the length in lines
- what kinds of attachments, if any
- what mail program was used to compose it
- what the Subject line was
...and so on.
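To make that concrete, here's a rough sketch in Python of what extracting those fields from a raw email might look like, using the standard library's `email` package. The field names here are just the examples above, purely illustrative:

```python
import email
from email import policy

def extract_metadata(raw_message: bytes) -> dict:
    """Pull a handful of illustrative metadata fields from a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        # Mail clients often identify themselves in one of these headers.
        "mailer": str(msg.get("X-Mailer") or msg.get("User-Agent") or ""),
        "size_bytes": len(raw_message),
        "line_count": raw_message.count(b"\n"),
        "attachment_types": [part.get_content_type()
                             for part in msg.iter_attachments()],
    }
```

A few dozen bytes of code, and each message is boiled down to a record a fraction of its original size.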
This sort of metadata is relatively easy to extract and takes up a lot less room than the actual data: the metadata for an email message with 2M of photos attached might fit in 1K. (And this is the point where it should dawn on you that similar metadata exists for faxes, phone calls, and every other electronic form of communication.)
Metadata can be useful. Suppose you know that The Bad Guy always uses Eudora 1.1 to compose mail messages and always attaches photos that are 772x448 pixels in JPG format. If you've extracted the right metadata from billions of messages, you might be able to figure out that 99.999% of them aren't what you're looking for by using that as a filter. If you're lucky, the only messages left will be the ones you want -- or the number will be small enough that brute force or maybe a simple search will get you what you want.
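That kind of filter is trivially cheap once the metadata exists. Here's a toy sketch, using made-up record fields matching the hypothetical profile above:

```python
def matches_profile(md: dict) -> bool:
    """Hypothetical filter: a specific mail client plus a 772x448 JPEG
    attachment, per the example profile above."""
    return (md.get("mailer") == "Eudora 1.1" and
            any(ctype == "image/jpeg" and dims == (772, 448)
                for ctype, dims in md.get("attachments", [])))

# Toy collection of metadata records; in practice, billions of them.
records = [
    {"mailer": "Eudora 1.1", "attachments": [("image/jpeg", (772, 448))]},
    {"mailer": "Outlook",    "attachments": [("image/jpeg", (772, 448))]},
    {"mailer": "Eudora 1.1", "attachments": [("image/png",  (640, 480))]},
]
suspects = [md for md in records if matches_profile(md)]
```

Only the first record survives; the other two are filtered out by mailer or by attachment type.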
But metadata can be abused. It's possible to use the same collection to reconstruct the salient details of every message sent by A. Or from A to B. Or which has a "Subject:" line containing the string "protest". And so on. It enables ad hoc fishing expeditions that are limited only by the scope of the collection, the kind of metadata extracted -- and the restraint of those conducting them, which I think we can safely characterize as "nonexistent".
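Those fishing expeditions are just queries. A sketch, again over made-up metadata records:

```python
def fish(records, sender=None, recipient=None, subject_contains=None):
    """An ad hoc fishing expedition over collected metadata: any
    criterion left as None simply isn't filtered on."""
    hits = []
    for md in records:
        if sender is not None and md["from"] != sender:
            continue
        if recipient is not None and md["to"] != recipient:
            continue
        if subject_contains is not None and subject_contains not in md["subject"]:
            continue
        hits.append(md)
    return hits

records = [
    {"from": "a@example.com", "to": "b@example.com", "subject": "protest plans"},
    {"from": "a@example.com", "to": "c@example.com", "subject": "lunch"},
    {"from": "d@example.com", "to": "b@example.com", "subject": "protest flyer"},
]
from_a  = fish(records, sender="a@example.com")                          # everything A sent
a_to_b  = fish(records, sender="a@example.com", recipient="b@example.com")
protest = fish(records, subject_contains="protest")
```

Every message sent by A, everything from A to B, every Subject line mentioning "protest" -- each is one line of code against the collection.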
If I understood Mr. Tice correctly, metadata collection was untargeted and pervasive. They went for everything they could get. Which means if you sent a message to Aunt Mary with a photo of the dog on July 17, 2004, they acquired -- or at least tried to acquire -- the metadata for it.
Mr. Tice's further point was that high-level technical analysis like this was used to select specific targets for detailed analysis -- and in that detailed analysis, EVERYTHING was collected. Not just metadata: everything. Every phone call, every fax, every email, every instant message, everything. All captured and stored in a database...somewhere.
This is so much worse than what we knew at this time last year that I hardly know where to begin. Let me just recapitulate part of what Mr. Tice said: he stated that he'd been asked to identify particular groups so that they could be excluded from surveillance...but eventually he realized that this was an internal NSA cover story, and that those were precisely the groups being targeted. It took him a while to get around to naming one of those groups, but when he did...journalists. Reporters. The news media.
So not only did our own government spy on ordinary citizens -- which is bad enough -- it spied on the people most likely to be contacted by whistleblowers passing along the news that the government was spying on ordinary citizens.
All of which leads us to a number of very disturbing questions:
- Who authorized this?
- Who knew about this?
- How was this done? Who, among telcos and ISPs, collaborated? Were they served National Security Letters to silence them?
- Is it still going on?
- What are all the people and groups who were targeted?
- Where's the data?
- Who has had/now has access to the data?
- What purposes has this data been used for?
I'm sure there are more -- we have, I'm afraid, only begun to hear the smallest portion of this and there is likely much more to come.
Update 1: It's been pointed out to me that I didn't mention one of the most powerful methods of metadata analysis -- one that quite likely was used on this data. (Is being used?) It's possible to use the metadata to ask questions like "Who does A talk to?" and "Who do they talk to?" and "Who do they talk to?" and "Does this look like a close-knit group?" and "How often do they talk to each other?" and "What are the characteristics of those communications?" There are a bunch of terms for this; "network analysis" is the one I know. It's a way of discerning a connected group in an ocean of data. Like many tools, it can be used for good, evil and neutral purposes. But what makes it so powerful is that very few people are aware of its existence, and thus they unknowingly collaborate in creating this metadata and making it useful.
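A bare-bones sketch of the idea: build a who-talks-to-whom graph from the metadata alone, then walk outward from a person of interest. (The records and names here are invented for illustration.)

```python
from collections import defaultdict

def contact_graph(records):
    """Build an undirected who-talks-to-whom graph from metadata alone."""
    graph = defaultdict(set)
    for md in records:
        graph[md["from"]].add(md["to"])
        graph[md["to"]].add(md["from"])
    return graph

def two_hop_circle(graph, person):
    """Everyone within two hops of `person`: their contacts, plus the
    contacts of those contacts -- "Who does A talk to? Who do THEY
    talk to?"."""
    first = set(graph[person])
    second = set()
    for contact in first:
        second |= graph[contact]
    return (first | second) - {person}

records = [
    {"from": "a", "to": "b"}, {"from": "b", "to": "c"},
    {"from": "c", "to": "a"}, {"from": "c", "to": "d"},
]
graph = contact_graph(records)
circle = two_hop_circle(graph, "a")  # {"b", "c", "d"}
```

No message content is involved anywhere -- the shape of the group falls out of the metadata by itself.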
Update 2: Here's a link to the MSNBC site with the interview. (I botched the attempt to embed it, so I've pulled that -- just open the link in another browser tab.)
Update 3: KateCrashes points out in the comments that there is a Wikipedia entry on Russell Tice; thanks! A quick scan of that article reveals a number of links to previous news stories about him as well as considerable backstory about his history as a whistleblower.
Update 4: Thank you for the recommendation; very kind of you. One more clarification based on a re-listen to the interview. Mr. Tice was very careful to say, with emphasis, "all computer communications". That's a massive amount of data/metadata: it could (and probably did) include not just email and IM and blog postings and so on, but every click (HTTP request) sent out, every initiated login session (with HTTPS to a web site, let's say, or SSH to a server), every file retrieved with FTP or HTTP, every file served or retrieved with P2P protocols...and I'm presuming, timestamps on all that to facilitate later sequencing of events. I've used this kind of tap in the course of my work (network security) and the amount of data it provides on just a single computer (and its users) is amazing; applied on a very large scale, it would put the Panopticon to shame. If he does come back tomorrow to Countdown, as KO asked, it will be interesting to hear how he augments and clarifies his statement.
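To see why the timestamps matter: even if no payload at all is stored, per-connection records with timestamps are enough to reconstruct the sequence of a user's activity. A toy sketch, with entirely invented records:

```python
# Hypothetical per-connection records: (timestamp, protocol, destination).
# No content is captured, yet sorting by timestamp reconstructs a
# minute-by-minute picture of what the user did.
flows = [
    ("2004-07-17T09:16:11Z", "SSH",   "server.example.org"),
    ("2004-07-17T09:14:30Z", "HTTPS", "webmail.example.com"),
    ("2004-07-17T09:15:02Z", "HTTP",  "news.example.net"),
]

# ISO 8601 timestamps sort correctly as plain strings.
timeline = sorted(flows)
```

From three content-free records you can already say: this person checked webmail, read the news, then logged into a server -- in that order.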
Update 5: The morning after, I've had a chance to slowly and carefully read the transcript. Something else that I think may be important jumped out at me. Now it's entirely possible I'm way off-base here, but let me quote the relevant text and try to explain.
TICE: Well, as I was going for support for this particular organization, it sort of was dropped to me that, you know, this is 24/7. Because I was saying, you know, I need collection at this time, at this point for, you know, for a window of time. And I would say, will we have the capability at this particular point? And positioning assets, and I was ultimately told we don't have to worry about that, because we've got it covered all the time. And that's when it clicked in my head, this is not something that's being done on a onesy basis, onesy-twosie. This is something that's happening all the time.
If your goal is to tap all email (again, as above, just using that as an example), then where you put the tap(s) depends on the extent of what you're doing. If, for example, you just wanted everything in and out of Yahoo, then you serve Yahoo with an NSL, and place the taps directly on their mail server infrastructure. But if the next day you want everything in and out of AOL, you have to repeat the process there. This is, I think, what he means when he uses the phrase "positioning assets"; I've seen it used in the past to describe the process of figuring out where taps need to be to monitor a target.
But "positioning assets" is tedious and inefficient if your goal is to tap everything. It requires the collaboration of an ever-increasing number of people, and as that number grows, so does the probability that one or more will deliberately or accidentally leak what's going on. So rather than doing this the right way -- the way it's intended, with court approval for limited, targeted taps whose purpose is clear and whose extent is curtailed to avoid collecting any more data than necessary -- it sounds to me, from Mr. Tice's description, that precisely the opposite approach was chosen.
That is: "assets" were placed so that they were minimal in number but maximal in scope. This probably means that they were placed at network exchange points -- which to a first approximation, you can think of as the backbone of the Internet. That's why assets didn't need to be repositioned if the target changed: they were already positioned someplace where they could see everything.
I can't imagine a FISA court approving this. Perhaps that's a failure on my part, and I'm simply not thinking darkly enough, but this seems so completely over-the-top that I can't imagine it.