I think I’ll let others tell the story for me …
September 25:
In collaboration with Harvard sociology graduate students Kevin Lewis and Marco Gonzalez, and with UCLA professor Andreas Wimmer and Harvard professor Nicholas Christakis, Berkman Fellow Jason Kaufman has made available a first wave of Facebook.com data through the Dataverse Network Project.
The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 FB profiles by an entire cohort of students at an anonymous, northeastern American university.
— Tastes, Ties, and Time: Facebook data release, Berkman Center for Internet and Society, Harvard University
September 29:
The “non-identifiability” of such a dataset is up for debate…. According to the authors, the collection of the dataset was approved by the IRB, Facebook and the individual college. The dissemination of the dataset appears to be approved by the IRB.
— Facebook Datasets and Private Chrome, Fred Stutzman, Unit Structures
September 30:
Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen.
— On the “Anonymity” of the Facebook Dataset, Michael Zimmer, michaelzimmer.org
October 2:
I think it’s hard to imagine that some of this anonymity wouldn’t be breached with some of the participants in the sample. For one thing, some nationalities are only represented by one person.
— Eszter Hargittai, in a comment on Unit Structures
We did not consult w/ privacy experts on how to do this, but we did think long and hard about what and how this should be done.
— Jason Kaufman, as a comment on michaelzimmer.org
OK, OK, I’ve held my tongue long enough. The arrogant attitude of “we’re smart and we thought about it so we didn’t bother to ask the experts” is a well-known recipe for disaster in privacy (or security or software engineering or …). People like Cynthia Dwork of Microsoft Research and Latanya Sweeney of Carnegie Mellon University have been studying data anonymization and reidentification for years; this stuff is hard. How can the Berkman Center not know that? And how can Facebook and Harvard be so cavalier as to share data with a research team with an attitude like this?
October 3:
Well, I’m pretty sure this “anonymous, northeastern American university” is Harvard College. And I didn’t even have to download the dataset to figure it out. Here’s how.
— More On the “Anonymity” of the Facebook Dataset – It’s Harvard College, Michael Zimmer, michaelzimmer.org
See, I told you this stuff is hard.
October 7:
In the comments, Jason Kaufman implies that the data really isn’t that private, asking what could go wrong, and why would someone post it to Facebook expecting it to remain private.
I have just one question on all of this. If the data isn’t private, why did they attempt to anonymize it?
I believe they attempted to anonymize it because it’s fairly obvious that the data is private, and releasing it with names obviously attached would be pretty shocking.
— Researchers Two-Faced on the Facebook Data Release, Adam Shostack, Emergent Chaos
Yeah, really.
The original research mission (to collect and analyze a set with proper safeguards) was within bounds; the follow-up distribution is the element that clearly poses risk.
— Facebook Dataset Identified, Fred Stutzman, Unit Structures
Well, except it turns out that the original research mission also clearly posed risk: for example, the proper safeguard might not be in place. Did the IRB (Institutional Review Board) look at this? Did Facebook and Harvard?
Fred goes on to make the excellent point that the researchers should have convened a panel to discuss before releasing the information, and suggests as a potential takeaway “Research that pushes the boundaries of technology and privacy provide IRB’s with unique challenges.” True enough, and his post and the comments — along with all the other ones I’ve linked to — are well worth reading.
But it seems to me that this is letting the Berkman Center, Facebook, and Harvard off the hook a little too easily. They just put information about 1700 students, at least some of whom (and probably most) are likely to be identifiable, up on the internet … without even asking their permission.
It’s late at night and so maybe I’m feeling irritable but I find myself asking questions like: In what universe is this supposed to be okay?
The Berkman Center’s mission is to explore and understand cyberspace; to study its development, dynamics, norms, and standards; and to assess the need or lack thereof for laws and sanctions.
The Berkman Center recently hosted a conference and gala on The Future of the Internet. People look to them as authorities. Is this the future they want to create?
As far as I know, none of the Berkman Center faculty have weighed in on this yet. It’ll be interesting to hear what Yochai Benkler, William Fisher, Charles Nesson, John Palfrey, Jonathan Zittrain, John Deighton, Jack Goldsmith, Alexander Keyssar, Charles Ogletree and Stuart Shieber have to say about what this episode says about the “need or lack thereof for laws and sanctions.”
And in terms of understanding, given the potential for gender-, race- and culture-based differences in attitudes towards privacy, I’m also looking forward to what they — and others — think about how events might have been influenced by the Berkman Center’s, and the research team’s, diversity.  Or lack thereof.
jon
Facebook graphic from AJC1’s flickr site, licensed under Creative Commons
Michael Zimmer | 09-Oct-08 at 9:39 am
To be fair, while the Berkman Center appears to be the current institutional “home” for this research project, my understanding is that they had little (if anything) to do with the research design or the structure of the data release. As recently as June 2008, the PI for the project (Jason Kaufman) gave a presentation about the research at Berkman, apparently hoping that Berkman might agree to take him into their fold.
Yet, as you suggest, the folks at Berkman, assuming they were properly informed, should have seen some red flags pop up.
I know many at the Berkman Center well, I have the utmost respect for that institution, and I’m confident they are trying to find ways to deal with the issues we have raised. Perhaps we’ll hear something from them soon….
jon | 12-Jul-11 at 5:31 pm
The Chronicle of Higher Education looks at the incident in Harvard Researchers Accused of Breaching Students’ Privacy, and has some great quotes from Jason Kaufman, including
Kaufman, by the way, is still a Berkman Fellow. Looks like his hopes that this research would take him into their fold were realized. And the Berkman Center faculty is still all-male.