The article quoted in that reclisted diary makes it sound so terrible:
Unfortunately, the code that powers Facebook still knows what you typed – even if you decide not to publish it. It turns out the things you explicitly choose not to share aren't entirely private.
Facebook calls these unposted thoughts "self-censorship", and insights into how it collects these non-posts can be found in a recent paper written by two Facebookers. Sauvik Das, a PhD student at Carnegie Mellon and summer software engineer intern at Facebook, and Adam Kramer, a Facebook data scientist, have put online an article presenting their study of the self-censorship behaviour collected from 5 million English-speaking Facebook users. It reveals a lot about how Facebook monitors our unshared thoughts and what it thinks about them...
Facebook is keeping track of the things you don't even publish, those dark thoughts from deep inside the human id that you regret so much that you never even finished them before consigning them to what you thought was digital oblivion. Facebook keeps track of every last character of it for God only knows how long, promiscuously handing it over to academics to practice repeated and wanton analysis on. It's all just so dystopic!
Except it isn't, because none of it is true. None of it. Every word is false.
This "story" originated with an academic paper written by Sauvik Das of Carnegie Mellon University and Adam Kramer of Facebook and published at the International Conference on Weblogs and Social Media. After tangling with people for a few hours over in that other diary, I had the silly idea of actually going and reading the paper for myself. Now I wish I'd read it a lot earlier.
Contrary to the hysteria about it, it's actually a pretty interesting paper about self-censorship behavior--when and how often people discard half-written posts, and how that behavior relates to demographics and the social graph. Like all good research papers, it includes some useful information about the study's methodology, and it is here that we shall begin.
This research was conducted at Facebook by Facebook researchers. We collected self-censorship data from a random sample of approximately 5 million English-speaking Facebook users who lived in the U.S. or U.K. over the course of 17 days (July 6-22, 2012).
Five million users, my goodness! Isn't that awful? Well, read on. The researchers explain that they considered two user interface elements on the Facebook website, the composer (used for entering status updates and sharing links, pictures, and similar things) and the comment window (used for responding to other content). These two UI elements should be familiar to anyone who's used Facebook. The researchers then explain how they went about collecting data on abandoned messages:
To mitigate noise in our data, content was tracked only if at least five characters were entered into the composer or comment box. Content was then marked as “censored” if it was not shared within the subsequent ten minutes; using this threshold allowed us to record only the presence or absence of text entered, not the keystrokes or content. If content entered were to go unposted for ten minutes and then be posted, we argue that it was indeed censored (albeit temporarily).
Well now, that sure does sound pretty bad, what with Facebook paying attention to me writing stuff before I even send it anywhere, right? But now we get to the most important part:
These analyses were conducted in an anonymous manner, so researchers were not privy to any specific user’s activity. Furthermore, all instrumentation was done on the client side. In other words, the content of self-censored posts and comments was not sent back to Facebook’s servers: Only a binary value that content was entered at all.
(Emphasis mine.) So now we get to the true nature of what's actually happened here. And what's happened is that that reclisted diary and the alarmist article behind it just plain got everything about this story wrong.
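For the code-minded, here is roughly what client-side instrumentation of that kind could look like. This is only a sketch in TypeScript based on the paper's description, not Facebook's actual code. The names ComposerTracker, CensorshipReport, MIN_CHARS and WINDOW_MS are made up for illustration, and starting the ten-minute clock at the moment the fifth character is entered is my assumption about what "subsequent" means.

```typescript
// Hypothetical sketch: none of these names come from Facebook or from the paper.
const MIN_CHARS = 5;              // "at least five characters were entered"
const WINDOW_MS = 10 * 60 * 1000; // "within the subsequent ten minutes"

// The only thing that would ever leave the browser: a single yes/no flag.
interface CensorshipReport {
  censored: boolean;
}

class ComposerTracker {
  private trackingStartedAt: number | null = null;
  private postedAt: number | null = null;

  // Called on input events. Takes only the LENGTH of the text, never the text,
  // so the tracker has no access to what was typed.
  onInput(textLength: number, nowMs: number): void {
    if (this.trackingStartedAt === null && textLength >= MIN_CHARS) {
      this.trackingStartedAt = nowMs; // assumption: the clock starts here
    }
  }

  // Called when the user actually shares the post or comment.
  onPost(nowMs: number): void {
    if (this.postedAt === null) {
      this.postedAt = nowMs;
    }
  }

  // Returns null while there is nothing to report yet.
  report(nowMs: number): CensorshipReport | null {
    if (this.trackingStartedAt === null) {
      return null; // fewer than five characters: never tracked
    }
    const deadline = this.trackingStartedAt + WINDOW_MS;
    if (this.postedAt !== null && this.postedAt <= deadline) {
      return { censored: false }; // shared within the window
    }
    if (nowMs < deadline) {
      return null; // window still open, outcome unknown
    }
    return { censored: true }; // unposted for ten minutes, even if posted later
  }
}
```

Notice that the text itself never enters the picture. The tracker sees a character count and a clock, and what comes out is one boolean, which matches the "binary value" the paper describes.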
[begin boring background information - you may skip this if you like]
If you've used Facebook much, you know that entering certain things in the composer or comment box can cause other things to happen, even while you're still typing your message. For example, if you enter a URL, the page will send a request behind the scenes to get a summary of the linked page and display it below your message as you're typing it. Pretty cool, and utterly uncontroversial. Using a common web page debugging tool called Firebug, I can eavesdrop on this and any other traffic that gets sent back and forth between this page and any web servers as I type, or click around the page, or just sit and do nothing. As it turns out, a lot goes on behind the scenes when you use Facebook, with the browser making several requests to a number of different servers.
To a layperson this might look a little scary, but experienced web developers should recognize this as being entirely normal and expected behavior. While you sit and chuckle at the latest cat and baby pictures, the page is fetching advertisements, updating your user account data, downloading cookies, updating your newsfeed with new stories and friend activity, and performing any number of other boring maintenance tasks. In the traffic I captured, the requests to akamaihd.net all pertain to advertising. The request that includes "scrape_url" appeared after I typed a URL in the composer; it's the thing that fetches a summary from the URL to display below my message. You may well find a lot of what Facebook does to be bothersome and invasive of privacy, and that's fine. The point is that this is how those things are done.
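To make the URL example concrete, here is a minimal TypeScript sketch of how a page might spot a link in the composer and build the preview request. Only the parameter name "scrape_url" comes from the traffic described above. The endpoint address, the function names and the URL-matching pattern are all assumptions for illustration, and Facebook's real code surely differs.

```typescript
// Hypothetical endpoint. Only the "scrape_url" parameter name was actually observed.
const PREVIEW_ENDPOINT = "https://example.com/composer/preview";

// Returns the first http(s) URL found in the composer text, or null if there is none.
function firstUrl(text: string): string | null {
  const match = text.match(/https?:\/\/[^\s]+/);
  return match ? match[0] : null;
}

// Builds the address of the preview request for the given composer text.
// Only the detected URL goes into the request, not the rest of the message.
function buildPreviewRequest(text: string): string | null {
  const url = firstUrl(text);
  if (url === null) {
    return null; // no link typed, so nothing to fetch
  }
  return `${PREVIEW_ENDPOINT}?scrape_url=${encodeURIComponent(url)}`;
}
```

In this sketch the request is triggered by the link alone. Whatever else is sitting in the composer stays in the browser.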
[end boring background information]
Reading the story, which originally appeared in Slate, I'm astonished at how wrong author Jennifer Golbeck gets this. Facebook does not "monitor our unshared thoughts." It does not "call these unposted thoughts 'self-censorship'" (that's a term used by the researchers for the purposes of their paper). It does not "collect the text you type" or "automatically analyze" your "unposted thoughts." There is no rational way to "connect this to all the recent news about NSA surveillance." God, I could keep going here but this diary is already way too long. There is not a single significant word or phrase in this story that is supported by the information provided. It is completely, categorically, profoundly, utterly wrong.
I assign a lot less blame to the Kossack who posted that other diary than I do to the author of the Slate story. It is unfortunate that our Kossack friend did not look into this more closely, but it's hardly fair to blame someone for taking an article from a reputable publication at face value. The lion's share of the blame goes to Jennifer Golbeck, for taking an innocent and entirely beneficial research paper and twisting it into something it is not.