De-anonymizing Social Networks

March 19, 2009 at 11:09 am 18 comments

Our social networks paper is finally officially out! It will be appearing at this year’s IEEE S&P (Oakland).

Download: PDF | PS | HTML

Please read the FAQ about the paper.

Abstract:

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.

The HTML version was produced using my Project Luther software, which, in my opinion, produces much prettier output than any other tool I know of, especially for math formulas. Another big benefit is its handling of citations: it automatically searches various bibliographic databases, adds abstract/BibTeX/download links, and even finds author homepages and links to them in the bib entries.

I have never formally announced or released Luther; it needs more work before it can be generally usable, and my time is limited. Drop me a line if you’re interested in using it.


18 Comments

  • 1. Researchers can ID anonymous Twitterers  |  March 27, 2009 at 11:21 am

    [...] look at the way anonymous data can be analyzed and have come to some troubling conclusions. In a paper set to be delivered at an upcoming security conference, they showed how they were able to map out [...]

  • [...] their paper “De-Anonymising Social Networks”, Arvind Narayanan and Dr Vitaly Shmatikov from the University Of Texas at Austin present a [...]

  • 3. riot  |  March 27, 2009 at 2:24 pm

    Somehow I’m not that impressed. I could do that with multiple Jabber accounts 3 years ago.

    Well, you didn’t patent it, so I still have to say: nice work ;)

    • 4. Arvind  |  March 27, 2009 at 2:28 pm

      I suspect you haven’t read the paper :-)

  • 5. riot  |  March 27, 2009 at 2:42 pm

    Wee, I came here from /., so what did you expect? ;)

    Nah, I read parts of it and it seems quite similar. Yet you’re right, it does something different, but imho the basic concepts are really quite similar. Gonna read the full paper when I have time this evening.
    Are you going to give a talk at a European conference, like 26C3 or HAR2009?

  • 6. Arvind  |  March 27, 2009 at 2:50 pm

    The underlying concept is relatively simple, but the hard part was to pull it off at scale, in a fully automated way, with very noisy data.

    Re. conferences, that all depends on visas and funding :-)

  • [...] a real person. It looks like there’s now some research to support that. Steven Hoy points us to a new paper where some researchers wrote an algorithm that takes anonymized data from social networks and [...]

  • 11. staycek  |  March 30, 2009 at 12:47 pm

    I enjoyed your paper, I’m grateful for your research, and I think it is important to raise awareness that anonymity /= privacy; however, I personally do not find this shocking, nor do I perceive any personal threat.

    I realize the threats you cited are all plausible scenarios for large-scale attacks, but I fail to see the personal threat. If my birthday, gender or relationship status were accidentally shared with strangers, I would not care in the slightest. If it was truly personal, I would never have published it on Facebook, even if I trusted their privacy policy.

  • [...] They’re actually looking at something that is an issue that is a lot more delicate: are anonymised data from social networks truly anonymous? Operators of online social networks are increasingly sharing potentially sensitive information [...]

  • [...] Narayanan and Dr Vitaly Shmatikov (University of Texas at Austin) have a fascinating new paper on the impact of social networks on the anonymisation of personal data (thanks, Mo!): Operators of [...]

  • 14. Social Networks in the News  |  March 30, 2009 at 3:56 pm

    [...] De-anonymizing Social Networks – Arvind Narayanan & Vitaly Shmatikov (http://33bits.org/2009/03/19/de-anonymizing-social-networks/) [...]

  • [...] trail of the user/vehicle unknown even to the service provider — unlike in the context of social networks, people often don’t even trust the service provider. There are several papers on anonymizing [...]

  • 16. Graduation and plans « 33 Bits of Entropy  |  May 20, 2009 at 6:36 am

    [...] presented the social network de-anonymization paper at the S&P conference today at Oakland. Email me for the [...]

  • [...] on my work on de-anonymizing social networks with Shmatikov, and other research such as Bonneau & Preibusch’s survey of the dismal [...]

  • 18. Anonymous Data? « Virtual Shadows  |  January 25, 2010 at 9:22 am

    [...] logs of over a half million of their users (here) and in 2009 by researchers in social networks (here). Stripping personal identifiable information such as usernames from data sets is an insufficient [...]



About 33bits.org

I'm an assistant professor of computer science at Princeton. I research (and teach) information privacy and security, and moonlight in technology policy.

This is a blog about my research on breaking data anonymization, and more broadly about information privacy, law and policy.

For an explanation of the blog title and more info, see the About page.
