Posts tagged ‘Internet’

The Many Ways in Which the Internet Has Given Us More Privacy

There are many, many things that digital technology allows us to do more privately today than we ever could. Consider:

The ability of marginalized or oppressed individuals to leverage the privacy of online communication tools to unite in support of a cause, or simply to find each other, has been earth-shattering.

  • It has played a key role in the ongoing Middle East uprisings. The Internet helps primarily by enabling rapid communication and coordination, but being able to do it covertlyclumsy governmental hacking attempts notwithstanding—is an equally important aspect.
  • Clay Shirky tells the story of how some of’s most popular groups were (ir)religious communities that don’t find support in broader U.S. culture — Pagans, ex-Jehovah’s witnesses, atheists, etc.
  • STD-positive individuals can use online dating sites targeted at their group. Can you imagine the Sisyphean frustration of trying to date offline and find a compatible partner if you have an STD?

In the political realm, the anonymity afforded by Wikileaks is leading to a challenge to the legitimacy of high-level government actors, if not entire governments. Bitcoin is another anonymity technology that shows the potential to have serious political effects. [1]

Most of us benefit at an everyday level from improved privacy. When we read, search, or buy online, people around us don’t find out about it. This is vastly more private than checking out a book from a library or buying something at a store. [2]

We’ve benefited not only in our mundane activities, but our kinky ones as well. We take and exchange naked pictures all the time, never having been able to do so back when it involved getting it developed at the store. And slightly over half of us have taken advantage of the fact that “hiding one’s porn” is trivial today compared to the bad old days of magazines.

I could go on—I haven’t even mentioned the uses of Tor or encryption, freely available to anyone willing to invest a little effort—but I’ve made my point. Of course, I’ve only presented one half of the story. The other half, that technology is also allowing us to expose ourselves in ways never before, has been told so many times by so many people, and so loudly, that it is drowning out meaningful conversation about privacy.

Having presented the above evidence, I posit that technology by itself is actually largely neutral with respect to privacy, in that it enhances the privacy of some types of actions and encumbers that of others. Which direction society takes is up to us. In other words, I’m asserting the negation of technological determinism, applied to privacy.

While I do believe that privacy-infringing technologies have been adopted more pervasively than privacy-enhancing ones, I would say that the disparity is far smaller than it is generally thought to be. Why the mismatch in perception? A curious collective cognitive bias. Observe that almost every one of the examples above is generally seen as a new kind of activity enabled by technology whereas they are really examples of technology allowing us to do a familiar activity, but with more privacy (among other benefits).

Another reason for the cognitive bias is our tendency to focus on the dangers and the negatives of technology. Let’s go back do the nude pictures example: just about everyone does it, but only a small number—perhaps 1%?—suffer some harm from it. Like Schneier says, if it’s in the news, don’t worry about it.

To the extent that privacy-infringing technologies have been more successful, it’s a choice we’ve collectively made. Demand for social networking has been so strong that the sector has somehow invented a halfway workable business model, even though it took several tries to get there. But demand for encryption has been so weak that the market never matured enough to make it usable to the general public.

The disparity could be because we don’t know what’s good for us—volumes have been written about this—but it could also be partly because there are costs and benefits to giving up our privacy, and the benefits, in proportion to the costs, are rather higher than is generally made out to be.

Those are all questions worth pondering, but I hope I have convinced you of this: the idea that information technology inherently invades privacy is oversimplified and misleading. If we’re giving up privacy, we have only ourselves to blame.

[1] Many privacy-enhancing technologies are morally ambiguous. I’m merely listing the ways in which people benefit from privacy, regardless of whether they’re using it for good or evil.

[2] It is probably true that the Internet has made it easier for government, advertisers etc. to track your activities. But it doesn’t change the fact that there’s a privacy benefit to regular people in an everyday context, who are far more concerned about keeping secrets from their family, friends and neighbors than about abstract threats.

[ETA] This essay examines the role of consumers in shaping the direction of technology, whereas the next one looks at the role of creators.

Thanks to Ann Kilzer for comments on a draft.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

June 8, 2011 at 6:52 pm Leave a comment

The Master Switch and the Centralization of the Internet

One of the most important trends in the recent evolution of the Internet has been the move towards centralization and closed platforms. I’m interested in this question in the context of social networks—analyzing why no decentralized social network has yet taken off, whether one ever will, and whether a decentralized social network is important for society and freedom. With this in mind, I read Tim Wu’s ‘The Master Switch: The Rise and Fall of Information Empires,’ a powerful book that will influence policy debates for some time to come. My review follows.

‘The Master Switch’ has two parts. The former discusses the history of communications media through the twentieth century and shows evidence for “The Cycle” of open innovation → closed monopoly → disruption. The latter, shorter part is more speculative and argues that the same fate will befall the Internet, absent aggressive intervention.

The first part of the book is unequivocally excellent. There are so many grand as well as little historical facts buried in there. Wu makes his case well for the claim that radio, telephony, film and television have all taken much the same path.

A point that Wu drives home repeatedly is that while free speech in law is always spoken of in the context of Governmental controls, the private entities that own or control the medium of speech play a far bigger role in practice in determining how much freedom of speech society has. In the U.S., we are used to regulating Governmental barriers to speech but not private ones, and a lot of the book is about exposing the problems with this approach.

An interesting angle the author takes is to look at the motives of the key men that shaped the “information industries” of the past. This is apposite given the enormous impact on history that each of these few has had, and I felt it added a layer of understanding compared to a purely factual account.

But let’s cut to the chase—the argument about the future of the Internet. I wasn’t sure whether I agreed or disagreed until I realized Wu is making two different claims, a weak one and a strong one, and does not separate them clearly.

The weak claim is simply that an open Internet is better for society in the long run than a closed one. Open and closed here are best understood via the exemplars of Google and Apple. Wu argues this reasonably well, and in any case not much argument is needed—most of us would consider it obvious on the face of it.

The strong claim, and the one that is used to justify intervention, is that a closed Internet will have such crippling effects on innovation and such chilling effects on free speech that it is our collective duty to learn from history and do something before the dystopian future materializes. This is where I think Wu’s argument falls short.

To begin with, Wu doesn’t have a clear reason why the Internet will follow the previous technologies, except, almost literally, “we can’t be sure it won’t.” He overstates the similarities and downplays the differences.

Second, I believe Wu doesn’t fully understand technology and the Internet in some key ways. Bizarrely, he appears to believe that the Internet’s predilection for decentralization is due to our cultural values rather than technological and business realities prevalent when these systems were designed.

Finally, Wu has a tendency to see things in black and white, in terms of good and evil, which I find annoying, and more importantly, oversimplified. He quotes this sentence approvingly: “Once we replace the personal computer with a closed-platform device such as the iPad, we replace freedom, choice and the free market with oppression, censorship and monopoly.” He also says that “no one denies that the future will be decided by one of two visions,” in the context of iOS and Android. It isn’t clear why he thinks they can’t coexist the way the Mac and PC have.

Regardless of whether one buys his dystopian prognostications, Wu’s paradigm of the “separations principle” is to be taken seriously. It is far broader than even net neutrality. There appear to be two key pillars: a separation of platforms and content, and limits on corporate structures to faciliate this—mainly vertical, but also horizontal, such as in the case of media conglomerates.

Interestingly, Wu wants the separations principle to be more of a societal-corporate norm than Governmental regulation. That said, he does call for more powers to the FCC, which is odd given that he is clear on the role that State actors have played in the past in enabling and condoning monopoly abuse:

Again and again in the histories I have recounted, the state has shown itself an inferior arbiter of what is good for the information industries. The federal government’s role in radio and television from the 1920s to the 1960s, for instance, was nothing short of a disgrace. In the service of chain broadcasting, it wrecked a vibrant, decentralized AM marketplace. At the behest of the ascendant radio industry, it blocked the arrival and prospects of FM radio, and then it put the brakes on television, reserving it for the NBC-CBS duopoly. Finally, from the 1950s through the 1960s, it did everything in its power to prevent cable television from challenging the primacy of the networks.

To his credit, Wu does seem to be aware of the contradiction, and appears to argue that the Government agencies can learn and change. It does seem like a stretch, however.

In summary, Wu deserves major kudos both for the historical treatment and for some very astute insights about the Internet. For example, in the last 2-3 years, Apple, Facebook, and Twitter have all made dramatic moves toward centralization, control and closed platforms. Wu seems to have foreseen this general trend more clearly than most techies did.[1] The book does have drawbacks, and I don’t agree that the Internet will go the way of past monopolies without intervention. It should be very interesting to see what moves Wu will make now that he will be advising the FTC.

[1] While the book was published in late 2010, I assume that Wu’s ideas are much older.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

March 23, 2011 at 7:51 pm Leave a comment

Women in Tech: How Anonymity Contributes to the Problem

Like Michael Arrington, I too have sat on the sidelines of the debate on women in tech. Unlike Michael Arrington, I did so because nobody asked for my opinion. There is, however, one aspect of the debate that I’m qualified to comment on.

The central issue seems to be whether the low participation rate of women in technology is due to a hostile environment in the tech industry (e.g., sexism, overt or covert) or due to external factors, whether genetic or social, that influence women to pick career paths other than technology without even giving it a shot.

Arrington thinks it’s the latter, and makes a strong case for his position. In response, many have pointed out various behaviors common in the tech industry that make it unappealing to women. Jessica B. Hamrick talks about rampant elitism which affects women disproportionately. What I’m more interested in today is Michelle Greer’s account of being viciously attacked for a relatively innocuous comment on Arrington’s post.

Let me come right out and say it: while I am a defender of the right to anonymous speech, I believe it has no place whatsoever in the vast majority of discussion forums. The reason is simple: there is something about anonymity that completely dismantles our evolved social norms and civility and makes us behave like apes. Not all of us, to be sure, but it only takes a few to ruin it for everyone. Or to put it in plainer terms:

There is no doubt that sexist comments online — the vast majority of them anonymous — contribute hugely to the problem of tech being a hostile environment for women. While there are rude comments directed at everyone, just look around if you need convincing that the ones that attack someone specifically for being female tend to be much more depraved. It is also true that rude behavior online is not limited to tech fields, but it creates more of a barrier there because online participation is essential for being relevant.

Here’s my suggestion to everyone who’d like to do something to make tech less hostile to women: perhaps the best return on your time that you can get is by making anonymous, unmoderated comments a thing of the past. Abolish it on your own sites, and write to other site admins and educate them about the importance of this issue. And when you see an uncivil comment, either educate or ignore the person, but try not to get enraged — you’d be feeding the troll.

Thanks to Ann Kilzer for reviewing a draft.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

August 30, 2010 at 10:37 pm 9 comments

De-anonymizing the Internet

I’ve been thinking about this problem for quite a while: is it possible to de-anonymize text that is posted anonymously on the Internet by matching the writing style with other Web pages/posts where the authorship is known? I’ve discussed this with many privacy researchers but until recently never written anything down. When someone asked essentially the same question on Hacker News, I barfed up a stream of thought on the subject :-) Here it is, lightly edited.

Each one of us has a writing style that is idiosyncratic enough to have a unique “fingerprint”. However, it is an open question whether it can be efficiently extracted.

The basic idea for constructing a fingerprint is this. Consider two words that are nearly interchangeable, say ‘since’ and ‘because’. Different people use the two words in a differing proportion. By comparing the relative frequency of the two words, you get a little bit of information about a person, typically under 1 bit. But by putting together enough of these ‘markers’, you can construct a profile.

The beginning of modern, rigorous research in this field was by Mosteller and Wallace in 1964: they identified the author of the disputed Federalist papers, almost 200 years after they were written (note that there were only three possible candidates!). They got on the cover of TIME, apparently. Other “coups” for writing-style de-anonymization are the identification of the author of Primary Colors, as well as the unabomber (his brother recognized his style, it wasn’t done by statistical/computational means).

The current state of the art is summarized in this bibliography. Now, that list stops at 2005, but I’m assuming there haven’t been earth-shattering changes since then. I’m familiar with the results from those papers; the curious thing is that they stop at corpuses of a couple hundred authors or so — i.e, identifying one anonymous poster out of say 200, rather than a million. This is probably because they had different applications in mind, such as identification within a company, instead of Internet-scale de-anonymization. Note that the amount of information you need is always logarithmic in the potential number of authors, and so if you can do 200 authors you can almost definitely push it to a few tens of thousands of authors.

The other interesting thing is that the papers are fixated with ‘topic-free’ identification, where the texts aren’t about a particular topic, making the problem harder. The good news is that when you’re doing this Internet-scale, nobody is stopping you from using topic information, making it a lot easier.

So my educated guess is that Internet-scale writing style de-anonymization is possible. However, you’d need fairly long texts, perhaps a page or two. It’s doubtful that anything can be done with a single average-length email.

Another potential de-anonymization strategy is to use typing pattern fingerprinting (keystroke dynamics), i.e, analyzing the timing between our keystrokes (yes, this works even for non-touch typists.) This is already used in commercial products as an additional factor in password authentication. However, the implications for de-anonymization have not been explored, and I think it’s very, very feasible. i.e, if google were to insert javascript into gmail to fingerprint you when you were logged in, they could use the same javascript to identify you on any web page where you type in text even if you don’t identify yourself. Now think about the de-anonymization possibilities you can get by combining analysis of writing style and keystroke dynamics…

By the way, make no mistake: the malicious uses of this far overwhelm the benevolent uses. Once this technology becomes available, it will be very hard to post anonymously at all. Think of the consequences for political dissent or whistleblowers. The great firewall of China could simply insert a piece of javascript into every web page, and poof, there goes the anonymity of everyone in China.

It think it’s likely that one can build a tool to protect anonymity by taking a chunk of writing and removing your fingerprint from it, but it will need a lot of work, and will probably lead to a cat-and-mouse game between improved de-anonymization and obfuscation techniques. Note the caveats, however: most ordinary people will not have the foreknowledge to find and use such a tool. Second, think of all the compromising posts — rants about employers, accounts from cheating spouses, political dissent, etc. — that have already been written. The day will come when some kid will download a script, let a crawler loose on the web, and post the de-anonymized results for all to see. There will be interesting consequences.

If you’re interested in working on this problem–either writing style analysis for breaking anonymity or obfuscation techniques for protecting anonymity–drop me a line.

January 15, 2009 at 3:16 am 21 comments


I'm an assistant professor of computer science at Princeton. I research (and teach) information privacy and security, and moonlight in technology policy.

This is a blog about my research on breaking data anonymization, and more broadly about information privacy, law and policy.

For an explanation of the blog title and more info, see the About page.


Be notified when there's a new post — subscribe to the feed, follow me on Google+ or twitter or use the email subscription box below.

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 245 other followers