Article about Netflix paper in law journal
David Molnar pointed me to an article in the Shidler Journal of Law that prominently cites the Netflix dataset de-anonymization paper. I’m very happy to see this; when we wrote our paper, we were hoping to see the legal community analyze the implications of our work for privacy laws. As the article notes:
Re-identification of anonymized data with individual consumers may expose companies to increased liability. If data is re-identified, this may be due to the failure of companies to take reasonable precautions to protect consumer data. In addition, companies may violate their own privacy policies by releasing anonymous information to third parties that can be easily re-identified with individual users.
New lines will need to be drawn defining what is acceptable data-release policy, and in a way that takes into account the actual re-identification risk instead of relying on syntactic crutches such as removing “personally identifiable” information. Perhaps there will need to be a constant process of evaluating and responding to continuing improvements in re-identification algorithms.
Perhaps the ability of third parties to discover information about an individual’s movie rankings is not too disturbing, as movie rankings are not generally considered to be sensitive information. But because these same techniques can lead to the re-identification of data, far greater privacy concerns are implicated.
Indeed, since we wrote our paper, there have been several high-profile cases in the news or in the courts where our re-identification techniques could be used to cause much more sensitive privacy breaches, including the Google-Viacom lawsuit involving YouTube viewer logs and the targeted advertising companies Phorm and NebuAd. While the lessons of our paper have begun to propagate “downstream” to the realms of law, advocacy and policy, they have come too late to make a difference in the above examples.
Part of the reason I started this blog is the hope of accelerating this process by reaching out to people outside the computer science community. While our papers might be couched in technical language, the results of our research are general enough to be accessible to a broad audience, and I hope that this blog will become a central point for disseminating them more widely.