Berkman Center researcher publishes 1700 students’ Facebook data: “We did not consult w/ privacy experts on how to do this, but we did think long and hard ….”

I think I’ll let others tell the story for me …

September 25:

In collaboration with Harvard sociology graduate students Kevin Lewis and Marco Gonzalez, and with UCLA professor Andreas Wimmer and Harvard professor Nicholas Christakis, Berkman Fellow Jason Kaufman has made available a first wave of data through the Dataverse Network Project.

The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 FB profiles by an entire cohort of students at an anonymous, northeastern American university.

Tastes, Ties, and Time: Facebook data release, Berkman Center for Internet and Society, Harvard University

September 29:

The “non-identifiability” of such a dataset is up for debate….  According to the authors, the collection of the dataset was approved by the IRB, Facebook and the individual college.  The dissemination of the dataset appears to be approved by the IRB.

Facebook Datasets and Private Chrome, Fred Stutzman, Unit Structures

September 30:

Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen.

On the “Anonymity” of the Facebook Dataset, Michael Zimmer

October 2:

I think it’s hard to imagine that some of this anonymity wouldn’t be breached with some of the participants in the sample. For one thing, some nationalities are only represented by one person.

— Eszter Hargittai, in a comment on Unit Structures

We did not consult w/ privacy experts on how to do this, but we did think long and hard about what and how this should be done.

— Jason Kaufman, in a comment

OK, OK, I’ve held my tongue long enough. The arrogant attitude of “we’re smart and we thought about it so we didn’t bother to ask the experts” is a well-known recipe for disaster in privacy (or security or software engineering or …). People like Cynthia Dwork of Microsoft Research and Latanya Sweeney of Carnegie Mellon University have been studying data anonymization and reidentification for years; this stuff is hard. How can the Berkman Center not know that? And how can Facebook and Harvard be so cavalier as to share data with a research team with an attitude like this?
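To make Eszter Hargittai’s point about one-person nationalities concrete, here is a minimal sketch of the kind of uniqueness check that reidentification researchers run as a first step. Everything in it is an assumption for illustration: the field names, the toy rows, and the helper functions are invented and have nothing to do with the actual dataset’s schema or contents. It simply counts how many records share each combination of “quasi-identifier” values; any record in a group of size one is a candidate for reidentification even with the name removed.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group; k == 1 means at least one person stands alone."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears exactly once."""
    sizes = class_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Hypothetical toy rows, NOT taken from the released dataset.
students = [
    {"id": "a1", "nationality": "US", "gender": "F", "major": "Economics"},
    {"id": "a2", "nationality": "US", "gender": "F", "major": "Economics"},
    {"id": "a3", "nationality": "US", "gender": "M", "major": "History"},
    {"id": "a4", "nationality": "US", "gender": "M", "major": "History"},
    {"id": "a5", "nationality": "Iceland", "gender": "F", "major": "Economics"},
]

# Nationality alone already isolates one student in this toy data.
exposed = unique_records(students, ["nationality"])
print(k_anonymity(students, ["nationality"]), [r["id"] for r in exposed])
```

Passing a check like this is nowhere near a guarantee of anonymity; failing it, as the toy data does, is a sign that stripping names was not enough.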

October 3:

Well, I’m pretty sure this “anonymous, northeastern American university” is Harvard College. And I didn’t even have to download the dataset to figure it out. Here’s how.

More On the “Anonymity” of the Facebook Dataset – It’s Harvard College, Michael Zimmer

See, I told you this stuff is hard.

October 7:

In the comments, Jason Kaufman implies that the data really isn’t that private, asking what could go wrong, and why would someone post it to Facebook expecting it to remain private.

I have just one question on all of this. If the data isn’t private, why did they attempt to anonymize it?

I believe they attempted to anonymize it because it’s fairly obvious that the data is private, and releasing it with names obviously attached would be pretty shocking.

Researchers Two-Faced on the Facebook Data Release, Adam Shostack, Emergent Chaos

Yeah, really.

The original research mission (to collect and analyze a set with proper safeguards) was within bounds; the follow-up distribution is the element that clearly poses risk.

Facebook Dataset Identified, Fred Stutzman, Unit Structures

Well, except it turns out that the original research mission also clearly posed risk: for example, the proper safeguards might not have been in place. Did the IRB (Institutional Review Board) look at this? Did Facebook and Harvard?

Fred goes on to make the excellent point that the researchers should have convened a panel to discuss the risks before releasing the information, and suggests as a potential takeaway “Research that pushes the boundaries of technology and privacy provide IRB’s with unique challenges.” True enough, and his post and the comments — along with all the other ones I’ve linked to — are well worth reading.

But it seems to me that this is letting the Berkman Center, Facebook, and Harvard off the hook a little too easily.  They just put information about 1700 students, at least some of whom (and probably most) are likely to be identifiable, up on the internet … without even asking their permission.

It’s late at night, so maybe I’m feeling irritable, but I find myself asking questions like: In what universe is this supposed to be okay?

The Berkman Center’s mission is to explore and understand cyberspace; to study its development, dynamics, norms, and standards; and to assess the need or lack thereof for laws and sanctions.

— the Berkman Center’s mission statement

The Berkman Center recently hosted a conference and gala on The Future of the Internet.  People look to them as authorities.  Is this the future they want to create?

As far as I know, none of the Berkman Center faculty have weighed in on this yet. It’ll be interesting to hear what Yochai Benkler, William Fisher, Charles Nesson, John Palfrey, Jonathan Zittrain, John Deighton, Jack Goldsmith, Alexander Keyssar, Charles Ogletree and Stuart Shieber make of what this episode says about the “need or lack thereof for laws and sanctions.”

And in terms of understanding, given the potential for gender-, race- and culture-based differences in attitudes towards privacy, I’m also looking forward to what they — and others — think about how events might have been influenced by the Berkman Center’s, and the research team’s, diversity. Or lack thereof.


Facebook graphic from AJC1’s flickr site, licensed under Creative Commons