Posts Tagged ethics

Is Making Public Data “More Public” a Privacy Violation?

What on earth does more public mean? Technologists draw a simple distinction between data that is public and data that is not. Under this view, the notion of making data more public is meaningless. But common sense tells us otherwise: it’s hard to explain the opposition to public surveillance if you assume that it’s OK to collect, store and use “public” information indiscriminately.

There are entire philosophical theories devoted to understanding what one can and cannot do with public data in different contexts. Recently, danah boyd argued in her SXSW keynote in support of “privacy through obscurity” and how technology is destroying this comfort. According to boyd, most public data is “quasi-public” and technologists don’t have the right to “publicize” it.

Some examples. One can debate the point in the abstract, but there is no question that companies and individuals have repeatedly been bitten when applying the “it’s already public” rule. Let’s look at some examples (the list and the discussion is largely concerned with data on the web).

  1. The availability of the California Birth Index on the web caused considerable consternation about a decade ago, despite the fact that birth records in the state are public and anyone’s birth record can be obtained through official channels albeit in a cumbersome manner.
  2. IRSeek planned to launch a search engine for IRC in 2007 by monitoring and indexing public channels (chatrooms). There was a predictable privacy outcry and they were forced to shut down.
  3. The Infochimps guys crawled the Twitter graph back in 2008 and posted it on their site. Twitter forced them to take the dataset down.
  4. The story was repeated with Pete Warden and Facebook; this time it was nastier and involved the threat of a lawsuit.
  5. MySpace recently started selling user data in bulk on Infochimps. As MySpace has pointed out, the data is already public, but privacy concerns have nevertheless been raised.
  6. One reason for the backlash against Google Buzz was auto-connect: it connected your activity on Google Reader and other services and streamed it to your friends. Your Google Reader activities were already public, but Buzz took it further by broadcasting it.
  7. Spokeo is facing similar criticism. As Snopes explains, “Spokeo displays listings that sometimes contain more personal information than many people are comfortable having made publicly accessible through a single, easy-to-use search site.”

The latter four examples are all from the last couple of months. For some reason the issue has suddenly started cropping up all the time. The current situation is bad for everyone: data trustees and data analysts have no clear guidelines in place, and users/consumers are in a position of constantly having to fight back against a loss of privacy. We need to figure out some ground rules to decide what uses of public data on the web are acceptable.

Why not “none?” I don’t agree with a blanket argument against using data for purposes other than originally intended, for many reasons. The first is that users’ privacy expectations, when they go beyond the public/private dichotomy, are generally poorly articulated, frequently unreasonable and occasionally self-contradictory. (An unfortunate but inevitable consequence of the complexity of technology.) The second reason is that these complex privacy rules, even if they can be figured out, often need to be communicated to the machine.

The third reason is the “greater good.” I’ve opposed that line of reasoning when used to justify reneging on an explicit privacy promise. But when it comes to a promise that was never actually made but merely intuitively understood (or mis-understood) by users, I think the question is different, and my stance is softer. Privacy needs to be weighed against the benefit to society from “publicizing” data — disseminating, aggregating and analyzing it.

In the next article of this series, I will give a rigorous technical characterization of what constitutes publicizing data. My hope is that this will go a long way towards determining what is and is not a violation of privacy. In the meanwhile, I look forward to hearing different opinions.

Thanks to Pete Warden and Vimal Jeyakumar for comments on a draft.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

13 comments April 5, 2010

Is Anonymity Research Ethical?

A researcher who is working on writing style analysis (“stylometry”), after reading my post on related de-anonymization techniques, wonders what the positive impact of such research could be, given my statement that the malicious uses of the technology are far greater than the beneficial ones. He says:

Sometimes when I’m thinking of an interesting research topic it’s hard to forget the Patton Oswalt line “Hey, we made cancer airborne and contagious! You’re welcome! We’re science: we’re all about coulda, not shoulda.”

This was my answer:

To me, generic research on algorithms always has a positive impact (if you’re breaking a specific website or system, that’s a different story; a bioweapon is a whole different category.) I do not recognize a moral question here, and therefore it does not affect what I choose to work on.

My belief that the research will have a positive impact is not at odds with my belief that the uses of the technology are predominantly evil.  In fact, the two are positively correlated. If we’re talking about web search technology, if academics don’t invent it, then (benevolent) companies will. But if we’re talking about de-anonymization technology, if we don’t do it, then malevolent entities will invent it (if they haven’t already), and of course, keep it to themselves. It comes down to a choice between a world where everyone has access to de-anonymization techniques, and hopefully defenses against it, versus one in which only the bad guys do. I think it’s pretty clear which world most people will choose to live in.

I realize I lean toward the “coulda” side of the question of whether Science is—or should be—amoral. Someone like Prof. Benjamin Kuipers here at UT seems to be close to the other end of the spectrum: he won’t take any DARPA money.

Part of the problem with allowing morality to affect the direction of science is that it is often arbitrary. The Patton Oswalt quote above is a perfect example: he apparently said that in response to news of science enabling a 63 year old woman to give birth. The notion that something is wrong simply because it is not “natural” is one that I find most repugnant. If the freedom of a 63 year old woman to give birth is not an important issue to you, let me note that more serious issues such as stem cell research, that could save lives, fall under the same category.

Going back to anonymity, it is interesting that tools like Tor face much criticism, but for enabling the anonymity of “bad” people rather than breaking the anonymity of “good” people. Who is to be the arbiter of the line between good and bad? I share the opinion of most techies that Tor is a wonderful thing for the world to have.

There are many sides to this issue and many possible views. I’d love to hear your thoughts.

8 comments April 9, 2009


Me, elsewhere

Get notified

Be notified when there's a new post — subscribe to the feed, follow me on twitter or use the email subscription box below.

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Recent comments

Tags

a-rod academia aggregation algorithm algorithms anonymity author recognition censorship conference data de-anonymization DNA DNA profiling eccentricity entropy ethics facebook forensics free speech FTC genome Google google buzz Google docs graph isomorphism history stealing Internet k-anonymity law lending club livejournal location meta netflix privacy privacy by design privacy policy re-identification social network analysis social networks stylometry theory ubercookies web browsers web security