An open letter to Netflix from the authors of the de-anonymization paper
Today is a sad day. It is also a day of hope.
It is a sad day because the second Netflix challenge had to be cancelled. We never thought it would come to this. One of us has publicly referred to the dampening of research as the “worst possible outcome” of privacy studies. As researchers, we are true believers in the power of data to benefit mankind.
We published the initial draft of our de-anonymization study just two weeks after the dataset for the first Netflix Prize became public. Since we had the math to back up our claims, we assumed that lessons would be learned, and that if there were to be a second data release, it would either involve only customers who opted in, or a privacy-preserving data analysis mechanism. That was three and a half years ago.
Instead, you brushed off our claims, calling them “absolutely without merit,” among other things. It has taken negative publicity and an FTC investigation to stop things from getting worse. Some may make the argument that even if the privacy of some of your customers is violated, the benefit to mankind outweighs it, but the “greater good” argument is a very dangerous one. And so here we are.
We were pleasantly surprised to read the plain, unobfuscated language in the blog post announcing the cancellation of the second contest. We hope that this signals a change in your outlook with respect to privacy. We are happy to see that you plan to “continue to explore ways to collaborate with the research community.”
Running something like the Netflix Prize competition without compromising privacy is a hard problem, and you need the help of privacy researchers to do it right. Fortunately, there has been a great deal of research on “differential privacy,” some of it specific to recommender systems. But there are practical challenges, and overcoming them will likely require setting up an online system for data analysis rather than an “anonymize and release” approach.
Data privacy researchers will be happy to work with you rather than against you. We believe that this can be a mutually beneficial collaboration. We need someone with actual data and an actual data-mining goal in order to validate our ideas. You will be able to move forward with the next competition, and just as importantly, it will enable you to become a leader in privacy-preserving data analysis. One potential outcome could be an enterprise-ready system which would be useful to any company or organization that outsources analysis of sensitive customer data.
It’s not often that a moral imperative aligns with business incentives. We hope that you will take advantage of this opportunity.
Arvind Narayanan and Vitaly Shmatikov
For background, see our paper and FAQ.