NSA whistleblower William Binney says that the NSA is recording 3 billion calls per day.
http://www.masslive.com/...
One of the diaries on the NSA spying program devolved into a pie fight over whether the NSA could actually handle and store all of the phone conversations that occur in the US, with some arguing that 3 billion calls a day is simply too much volume. Some pretty bad assertions were made, based on a lack of knowledge about audio compression. It was also asserted that compression uses too much computing power; however, all cell providers already use advanced compression at the source.
There are a lot of unknowns, true, but the technology is quite possible.
So a few quick facts:
The assumption people have made seems to be that bit rates need to be similar to what is used for compressed internet / music radio. That is not the case.
1. Speech encoding is nowhere near as data-intensive as music encoding, because the human voice has a very limited range. "CD quality" uses 16-bit samples at 44,100 samples per second, and pro audio uses much higher rates (24-bit / 192K samples per second is now common), but speech encoding works without trouble at 8 bits and 8K samples per second, and perhaps much less. Even 7K samples per second puts the Nyquist frequency (half the sample rate, 3,500 Hz) above the very highest speech frequencies. (Telephone speech occupies roughly 300 to 3,400 Hz; you need to sample at more than double the highest frequency, and / or filter out the high end, to avoid 'foldover', i.e. aliasing.)
So 1 hour of audio would be around 28.8 megabytes of storage (8,000 one-byte samples per second × 3,600 seconds), maybe much less.
This is not compression; it's the full version. And it's one reason why music on hold sounds so crappy today: the system is not optimized for music frequencies.
2. All voice calls already use compression.
It was the driving force behind cell providers changing from analog to digital in the mid-1990s, because even the early digital systems allowed many more users per tower. Since then, they have fit more and more calls onto each cell tower, both by increasing data rates and by compressing better and better; with 4G, compression ratios are now very high.
4G already uses an algorithm based on a Fast Fourier Transform (which breaks audio down into its harmonics), but takes even that to another level: 20x data compression at the source is common.
Which brings us to about 1.44 megabytes per hour of storage before any special NSA compression.
(0.00000144 terabytes)
A 3 terabyte hard drive starts at $56 retail.
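For anyone who wants to check the arithmetic in items 1 and 2, here is a small Python sketch that works it out from first principles: the Nyquist check, uncompressed 8-bit / 8 kHz storage per hour, the effect of the 20x compression figure quoted above, and what that means for a $56, 3 TB drive. The constants are the figures from this post; treating "3 terabytes" as decimal (3 × 10^12 bytes) is an assumption.

```python
# Back-of-envelope storage figures for one hour of telephone speech.
# Constants come from the post; decimal terabytes are an assumption.

SAMPLE_RATE_HZ = 8_000       # telephone-band sampling rate
BITS_PER_SAMPLE = 8          # 8-bit samples
HIGHEST_SPEECH_HZ = 3_400    # top of the telephone speech band
COMPRESSION_RATIO = 20       # the "20x data compression at the source" figure
SECONDS_PER_HOUR = 3_600
DRIVE_BYTES = 3 * 10**12     # one 3 TB drive, decimal terabytes (assumption)
DRIVE_PRICE_USD = 56         # retail price quoted in the post


def nyquist_ok(sample_rate_hz: float, highest_hz: float) -> bool:
    """Sampling must exceed twice the highest frequency to avoid aliasing ("foldover")."""
    return sample_rate_hz > 2 * highest_hz


def raw_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed PCM: samples/second * bytes/sample * seconds/hour."""
    return sample_rate_hz * bits_per_sample // 8 * SECONDS_PER_HOUR


raw = raw_bytes_per_hour(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
compressed = raw / COMPRESSION_RATIO
hours_per_drive = DRIVE_BYTES / compressed

print(f"Nyquist satisfied at 8 kHz: {nyquist_ok(SAMPLE_RATE_HZ, HIGHEST_SPEECH_HZ)}")
print(f"Nyquist satisfied at 7 kHz: {nyquist_ok(7_000, HIGHEST_SPEECH_HZ)}")
print(f"Raw audio:            {raw / 1e6:.1f} MB per hour")
print(f"After 20x:            {compressed / 1e6:.2f} MB per hour")
print(f"Hours per 3 TB drive: {hours_per_drive:,.0f}")
print(f"Drive cost per hour:  ${DRIVE_PRICE_USD / hours_per_drive:.6f}")
```

At these figures one $56 drive holds a little over two million hours of compressed speech.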
3. Any pauses, hold time, and blank spaces in conversation, I imagine, use little or no space at all in an FFT-based compression system, because only the actual harmonics need to be stored, nothing more. It's possible that low voices could use less data than the higher voices of children or women.
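To illustrate the idea in item 3, here is a toy transform coder in Python: frames whose average energy falls below a threshold are stored as nothing at all, and every other frame keeps only its few strongest frequency components. It uses a naive DFT (an FFT computes the same values faster), and the frame size, threshold, and number of kept coefficients are arbitrary choices for illustration. It is a sketch of the principle only, not what phone networks run: standard cellular voice codecs such as AMR are built on linear prediction (ACELP), and they handle pauses with voice activity detection and discontinuous transmission.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; an FFT gives the same result faster."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse DFT, keeping the real part of each reconstructed sample."""
    n = len(coeffs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def encode_frame(frame, keep, silence_energy):
    """Return None for a near-silent frame, else the `keep` strongest (bin, coefficient) pairs."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < silence_energy:
        return None  # silent frame: nothing is stored
    coeffs = dft(frame)
    strongest = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    return [(k, coeffs[k]) for k in strongest]


def decode_frame(encoded, n):
    """Rebuild a frame of length n from stored coefficients; silence decodes to zeros."""
    if encoded is None:
        return [0.0] * n
    coeffs = [0j] * n
    for k, c in encoded:
        coeffs[k] = c
    return idft(coeffs)


# Usage: one frame containing a pure tone, one frame of silence (hypothetical inputs).
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 5 * t / FRAME_SIZE) for t in range(FRAME_SIZE)]
silence = [0.0] * FRAME_SIZE

for name, frame in [("tone", tone), ("silence", silence)]:
    enc = encode_frame(frame, keep=2, silence_energy=1e-4)
    stored = 0 if enc is None else len(enc)
    err = max(abs(a - b) for a, b in zip(frame, decode_frame(enc, FRAME_SIZE)))
    print(f"{name}: {stored} coefficients stored, max reconstruction error {err:.1e}")
```

For a real-valued signal the upper half of the spectrum mirrors the lower half, so a practical coder would store only one side; the toy keeps both for simplicity.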
4. The length of the average telephone call is now less than half what it once was: it's now 1.40 minutes, down from almost 4 minutes in 2003.
5. Calls to customer service, 800 numbers, and corporate data centers do not need to be stored; they're probably already recorded and stored by the corporation anyway.
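Excluding toll-free calls, as item 5 suggests, is a simple filter. A minimal sketch, assuming North American numbers and the NANP toll-free area codes (800, 833, 844, 855, 866, 877, 888); the function name and the sample numbers are hypothetical:

```python
# NANP toll-free area codes.
TOLL_FREE_AREA_CODES = {"800", "833", "844", "855", "866", "877", "888"}


def is_toll_free(number: str) -> bool:
    """True if a North American number dials a toll-free area code."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    return len(digits) == 10 and digits[:3] in TOLL_FREE_AREA_CODES


# Hypothetical call records: only the non-toll-free call would be kept.
calls = ["1-800-555-0199", "(413) 555-0142", "+1 888 555 0100"]
to_store = [c for c in calls if not is_toll_free(c)]
print(to_store)
```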
What this all means in terms of actual storage needs I don't know, first because I don't know how much space is saved by #3, and second because the NSA would presumably be using much more advanced compression than the cell providers need. But overall, it's a lot less raw data than the numbers some people were throwing around yesterday.
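Even without knowing the effect of #3, a rough estimate is easy to sketch. The Python below assumes Binney's 3 billion calls per day, reads "1.40 minutes" as 1.4 minutes (84 seconds; use 100 if it means 1 minute 40 seconds), applies the 20x figure to 8-bit / 8 kHz audio, and counts only bare 3 TB drives at $56. It ignores silence savings, the exclusions in item 5, redundancy, servers, and power, so treat it as a back-of-envelope sketch, not a description of any real system.

```python
import math

# Inputs from the post; the interpretations noted here are assumptions.
CALLS_PER_DAY = 3_000_000_000   # Binney's figure
AVG_CALL_SECONDS = 84           # "1.40 minutes" read as 1.4 min (use 100 for 1:40)
RAW_BYTES_PER_SECOND = 8_000    # 8-bit samples at 8,000 samples per second
COMPRESSION_RATIO = 20          # the 20x figure for codecs at the source
DRIVE_BYTES = 3 * 10**12        # one 3 TB drive, decimal terabytes
DRIVE_PRICE_USD = 56            # quoted retail price


def daily_storage_bytes(calls, seconds_per_call, bytes_per_second):
    """Total bytes needed to keep one day of calls at a given data rate."""
    return calls * seconds_per_call * bytes_per_second


compressed_rate = RAW_BYTES_PER_SECOND // COMPRESSION_RATIO  # bytes per second
raw_per_day = daily_storage_bytes(CALLS_PER_DAY, AVG_CALL_SECONDS, RAW_BYTES_PER_SECOND)
per_day = daily_storage_bytes(CALLS_PER_DAY, AVG_CALL_SECONDS, compressed_rate)
drives_per_day = math.ceil(per_day / DRIVE_BYTES)

print(f"Uncompressed audio per day: {raw_per_day / 1e12:,.1f} TB")
print(f"Compressed audio per day:   {per_day / 1e12:,.1f} TB")
print(f"3 TB drives per day:        {drives_per_day}")
print(f"Drive cost per day:         ${drives_per_day * DRIVE_PRICE_USD:,}")
print(f"Compressed audio per year:  {per_day * 365 / 1e15:,.1f} PB")
```

Under these assumptions it comes to about 100.8 TB of compressed audio per day, or 34 drives costing $1,904 at the quoted price, versus about 2,016 TB per day uncompressed.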
With special techniques, the NSA could get this data to be much smaller.
Here's another possibility: the data may not be stored as audio at all, but run through Siri-like speech recognition software, with the audio discarded and only the text stored, either selectively based on keywords or in full. That certainly seems like the most usable way to keep and sort through this data.
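To get a rough sense of why text is attractive, assume a conversational speaking rate of about 150 words per minute and about 6 bytes per word (both assumptions, not measurements); a transcript is then far smaller than even the compressed audio. The sketch also shows the "selectively based on keywords" option as the simplest possible word match. The keyword list and transcripts are hypothetical.

```python
# Hypothetical illustration: transcript size versus compressed audio,
# plus keyword-based selective retention of transcripts.

WORDS_PER_MINUTE = 150         # assumed conversational speaking rate
BYTES_PER_WORD = 6             # assumed average, including the trailing space
AUDIO_BYTES_PER_SECOND = 400   # 8-bit / 8 kHz audio after the 20x compression figure


def transcript_bytes(seconds: int) -> int:
    """Approximate size of a plain-text transcript of a call."""
    return seconds * WORDS_PER_MINUTE * BYTES_PER_WORD // 60


def audio_bytes(seconds: int) -> int:
    """Size of the compressed audio for the same call."""
    return seconds * AUDIO_BYTES_PER_SECOND


def keep_transcript(text: str, keywords: set) -> bool:
    """Keep a transcript only if any of its words is on the keyword list."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return not words.isdisjoint(keywords)


call_seconds = 84  # a 1.4-minute call
print(f"Audio: {audio_bytes(call_seconds):,} bytes, "
      f"transcript: {transcript_bytes(call_seconds):,} bytes")

keywords = {"shipment", "transfer"}  # hypothetical watch list
transcripts = [
    "Can you pick up milk on the way home?",
    "The transfer goes through on Friday.",
]
kept = [t for t in transcripts if keep_transcript(t, keywords)]
print(kept)
```

Under these assumptions an 84-second call is 33,600 bytes of compressed audio but only about 1,260 bytes of text.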