
New Guardian Report: NSA Can Use XKeyscore to Track All of Your Internet Activity (278 comments)


  •  20 TB is a LOT of data (7+ / 0-)

    It's not just storing it; it's being able to address and access it.  You'd get I/O-bound before you'd run out of storage.

    I'm chuckling and remembering the days of the 640K memory limit, when drives had to be partitioned and 20 MEGAbytes was a big drive.

    I'll believe corporations are people when one comes home from Afghanistan in a body bag.

    by mojo11 on Wed Jul 31, 2013 at 12:23:58 PM PDT


    •  Things have changed (7+ / 0-)

      A lot.  Witness Google Maps, for example, and algorithms that exist and are still being improved in ways you probably can't imagine.

      They have the horsepower now, both the hardware and the advanced software.

      Republicans: Taking the country back ... to the 19th century

      by yet another liberal on Wed Jul 31, 2013 at 12:26:44 PM PDT


      •  Storage was the last hurdle, IMO. (3+ / 0-)
        Recommended by:
        gulfgal98, nchristine, Dumbo

        And that's a done deal.  Like I said, I've got 10 TB on my desk right now, and the whole stack is about 8" x 6" x 6".

        Are we really supposed to believe that the NSA is collecting only 2x the amount of data that I have sitting in a breadbox on my desk?

        What a friggin' joke.

        Democracy - 1 person 1 vote. Free Markets - More dollars more power.

        by k9disc on Wed Jul 31, 2013 at 03:52:10 PM PDT


    •  Not really a problem if they've got their data (8+ / 0-)

      set up the right way. SQL databases push terabytes around before breakfast just to show off.

      The trick is parallelizing the data streams. If you try to scan the data sequentially, sure, you'd choke and die. But nobody in "Big Data" does that anymore.
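
      A rough sketch of that parallel-scan idea, purely for illustration: split the captured data into shards and hand one shard to each worker process, so no single reader ever walks the whole data set sequentially.  The shard directory, file layout, and search term below are invented for the example, not anything from the report.

        from multiprocessing import Pool
        from pathlib import Path

        NEEDLE = b"example"   # placeholder selector; real search terms are unknown

        def scan_shard(path: Path) -> int:
            """Count matching lines in one shard; each worker owns one slice of the data."""
            hits = 0
            with path.open("rb") as f:
                for line in f:
                    if NEEDLE in line:
                        hits += 1
            return hits

        if __name__ == "__main__":
            # Hypothetical layout: the day's capture split into many shard files.
            shards = sorted(Path("shards").glob("*.dat"))
            with Pool() as pool:                          # one worker per CPU core by default
                print("total hits:", sum(pool.map(scan_shard, shards)))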

      Spite is the ranch dressing Republicans slather on their salad of racism

      by ontheleftcoast on Wed Jul 31, 2013 at 12:27:18 PM PDT


      •  I was just thinking of addressing (2+ / 0-)
        Recommended by:
        Hey338Too, duhban

        the storage media itself.  And we're not merely talking about terabytes here.  He's talking about adding 20 terabytes a DAY.  No matter how advanced or expensive, sooner or later something's gotta give.

        I'll believe corporations are people when one comes home from Afghanistan in a body bag.

        by mojo11 on Wed Jul 31, 2013 at 12:40:20 PM PDT


        •  See the comment upstream about the data storage (2+ / 0-)
          Recommended by:
          JesseCW, nchristine

          they're planning in Utah.  The 20 TB wouldn't be a problem there, and the local office could easily pack up a couple of 10 TB devices every day and send them to Utah.  It's hilarious, but the highest-bandwidth channel we have for transferring large amounts of data is still a cardboard box in the belly of a 747.

          In 1990, yes, 20 TB was a mind-boggling amount of data.  In 2013 it's barely a blip.  Moore's Law didn't just apply to CPU performance but to lots of other things as well.
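
          A back-of-the-envelope check of the cardboard-box claim; the drive count, capacity, and shipping time below are made-up assumptions, not figures from the article:

            # Effective bandwidth of shipping a box of drives vs. a network link.
            # All three numbers are illustrative assumptions.
            drives = 10            # drives packed in the box
            capacity_tb = 4        # terabytes per drive
            transit_hours = 24     # door-to-door shipping time

            payload_bits = drives * capacity_tb * 1e12 * 8
            effective_gbps = payload_bits / (transit_hours * 3600) / 1e9

            print(f"Shipped: {drives * capacity_tb} TB")
            print(f"Effective bandwidth: {effective_gbps:.1f} Gbit/s")   # ~3.7 Gbit/s

          Roughly 3.7 Gbit/s sustained, which is why shipping drives can still out-run a typical wide-area link.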

          Spite is the ranch dressing Republicans slather on their salad of racism

          by ontheleftcoast on Wed Jul 31, 2013 at 12:48:28 PM PDT


        •  Here's another way to look at it. (4+ / 0-)

          There are 86,400 seconds in a day.  Assuming that 20 TB is correct (and it appears to be a high number), that comes to about 230 MB of data collection per second.

          Now, a PC with an SSD can easily push 500 MB/sec (I know, I've got 3 of them at home), and internally the system bus can handle 10-20 GB/sec.

          So the processing power required to handle their data, even at the upper end, is well within the capabilities of a standard laptop computer.

          Split the data across a dozen or so devices, let them run some simple analysis to flag potentially interesting items, and ship the indexed files to Utah by swapping out the hard drives once a week.
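
          A minimal sketch of that scheme: check the per-second rate, then hash each record to one of a dozen boxes for a cheap first-pass filter.  The worker count and the "interesting" filter are assumptions made up for the illustration.

            import hashlib

            # Rate check using the figures from the comment above.
            TB_PER_DAY = 20
            SECONDS_PER_DAY = 86_400
            rate_mb_s = TB_PER_DAY * 1e12 / SECONDS_PER_DAY / 1e6
            print(f"Ingest rate: {rate_mb_s:.0f} MB/s")    # ~231 MB/s

            NUM_WORKERS = 12            # "a dozen or so devices" -- an assumption
            KEYWORDS = (b"example",)    # placeholder filter; real selectors unknown

            def shard_for(record_key: bytes) -> int:
                """Deterministically route a record to one of the worker devices."""
                return int.from_bytes(hashlib.sha1(record_key).digest()[:4], "big") % NUM_WORKERS

            def first_pass(record: bytes) -> bool:
                """Cheap filter each worker runs to flag potentially interesting records."""
                return any(k in record for k in KEYWORDS)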

          Of course the NSA will probably cough up $20 million to some private contracting firm to handle a problem that could be solved for far, far less.  We all know the real winners in this harvesting of data -- big corporations with friends (read: puppets) in Congress.

          Spite is the ranch dressing Republicans slather on their salad of racism

          by ontheleftcoast on Wed Jul 31, 2013 at 01:06:39 PM PDT


          •  Most likely they are still using mechanical (2+ / 0-)
            Recommended by:
            patbahn, duhban

            hard drives and not SSDs, as commercially available SSDs generally top out at under 200 GB while you can easily get a 4 TB conventional drive.  On my system, even with a 7200 RPM WD Black drive, it's limited to around 35 MB/sec (megabytes, not megabits), and that is just a straight file copy.

            You have watched Faux News, now lose 2d10 SAN.

            by Throw The Bums Out on Wed Jul 31, 2013 at 01:46:35 PM PDT


          •  So Markos should just host... (2+ / 0-)
            Recommended by:
            duhban, Reggid

            ... Daily Kos on his MacBook Pro and not worry so much about subscribers or advertisers.  Right?  Who needs server farms and load balancing?

            That 230 MB/second has to be processed - this isn't like a vacuum cleaner where all of the stuff goes into the same bag.  According to the charts there are a number of "plugins" that are run to segregate the data.  Then it has to be checked for errors (a performance hit), parsed (a performance hit), added to some database (a performance hit), and indexed (a performance hit).  On top of that, while new data is being added, the old data is being queried (or purged, or backed up, or mirrored - more performance hits).  It wouldn't surprise me if the data collected at the stroke of midnight isn't available to analysts until many hours after it's been collected.
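
            For illustration only, here is a toy version of that staged ingest; the stage names and record format are guesses for the sake of the sketch, not the actual plugin chain, but each stage is a separate pass over the data and a separate performance hit:

              def validate(record: bytes) -> bytes:
                  """Error check: reject obviously corrupt records (one pass over the data)."""
                  if not record or b"|" not in record:
                      raise ValueError("corrupt record")
                  return record

              def parse(record: bytes) -> dict:
                  """Parse: split the record into fields (another pass)."""
                  return {"raw": record, "fields": record.split(b"|")}

              def ingest(records, database: list, index: dict) -> None:
                  """Insert and index: two more hits, and queries contend with this work."""
                  for rec in records:
                      doc = parse(validate(rec))
                      database.append(doc)
                      for field in doc["fields"]:
                          index.setdefault(field, []).append(len(database) - 1)

              db, idx = [], {}
              ingest([b"alice|example.com|443", b"bob|example.org|80"], db, idx)
              print(len(db), "records,", len(idx), "index terms")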

            Looking through the bent backed tulips, To see how the other half lives, Looking through a glass onion - John Lennon and Paul McCartney

            by Hey338Too on Wed Jul 31, 2013 at 02:18:17 PM PDT


            •  No, the point I'm making is 20 trillion isn't that (3+ / 0-)
              Recommended by:
              patbahn, Hey338Too, nchristine

              big of a number in computing.  Not anymore.  Billions are the work of the moment; a trillion just takes a thousand moments.  And splitting the data up between inexpensive, relatively powerful computers would be a very effective way to do the first pass on the data analysis.  Many of these types of data search algorithms also lend themselves to GPGPU solutions.  I'm not saying they get the data processed in real time; they don't need to.  But the sheer volume of data isn't a daunting task to process, and with some rather simple and inexpensive setups it could easily be handled, initially processed, then shipped off to Utah (or wherever) for further analysis and possibly permanent storage.
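
              A small, hedged illustration of why this kind of first-pass search maps well onto data-parallel hardware; the records and selector below are invented, and numpy on a CPU is standing in for what a GPU would do across millions of records at once:

                import numpy as np

                # Hash every record once, then compare all of them against the selector
                # set in a single branch-free, element-wise pass -- the shape of work
                # that ports naturally to GPGPU.  Records and selectors are made up.
                records = ["alice@example.com", "bob@example.org", "carol@example.net"]
                selectors = ["bob@example.org"]

                record_hashes = np.array([hash(r) & 0xFFFFFFFF for r in records], dtype=np.uint64)
                selector_hashes = np.array([hash(s) & 0xFFFFFFFF for s in selectors], dtype=np.uint64)

                interesting = np.isin(record_hashes, selector_hashes)
                print(interesting)    # [False  True False]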

              Spite is the ranch dressing Republicans slather on their salad of racism

              by ontheleftcoast on Wed Jul 31, 2013 at 03:00:45 PM PDT


              •  I have to admit that I laughed... (3+ / 0-)
                Recommended by:
                nchristine, Reggid, ontheleftcoast

                ... to myself about the thought of the stuff getting "shipped to Utah" on the internet and having our government's computers "recapturing" the data and trying to process it again.

                Anyway, no need to worry about using GPUs and the like for this kind of stuff.  I read somewhere that the government has a couple of Crays for this process.  Knowing the company that I used to work for, Watson is probably on the case as well (hopefully it doesn't think it's still playing Jeopardy).

                Looking through the bent backed tulips, To see how the other half lives, Looking through a glass onion - John Lennon and Paul McCartney

                by Hey338Too on Wed Jul 31, 2013 at 05:33:10 PM PDT


          •  Given these calculations ... (0+ / 0-)

            the 20 TB number is obviously wrong (unless it's 20 TB a second).

    •  Petaflop computing... (2+ / 0-)
      Recommended by:
      Claudius Bombarnac, nchristine

      I've got 10 TB sitting on my desk right now.  I can find a file in 5 seconds.  I can find a word in 15.

      20 TB a day is a grain of sand on a California beach to billion-dollar budgets.

      Yottabytes.

      Democracy - 1 person 1 vote. Free Markets - More dollars more power.

      by k9disc on Wed Jul 31, 2013 at 03:49:13 PM PDT


