Posts tagged ‘anonymity’
Like Michael Arrington, I too have sat on the sidelines of the debate on women in tech. Unlike Michael Arrington, I did so because nobody asked for my opinion. There is, however, one aspect of the debate that I’m qualified to comment on.
The central issue seems to be whether the low participation rate of women in technology is due to a hostile environment in the tech industry (e.g., sexism, overt or covert) or due to external factors, whether genetic or social, that influence women to pick career paths other than technology without even giving it a shot.
Arrington thinks it’s the latter, and makes a strong case for his position. In response, many have pointed out various behaviors common in the tech industry that make it unappealing to women. Jessica B. Hamrick talks about rampant elitism which affects women disproportionately. What I’m more interested in today is Michelle Greer’s account of being viciously attacked for a relatively innocuous comment on Arrington’s post.
Let me come right out and say it: while I am a defender of the right to anonymous speech, I believe it has no place whatsoever in the vast majority of discussion forums. The reason is simple: there is something about anonymity that completely dismantles our evolved social norms and civility and makes us behave like apes. Not all of us, to be sure, but it only takes a few to ruin it for everyone. Or to put it in plainer terms:
There is no doubt that sexist comments online — the vast majority of them anonymous — contribute hugely to the problem of tech being a hostile environment for women. While there are rude comments directed at everyone, just look around if you need convincing that the ones that attack someone specifically for being female tend to be much more depraved. It is also true that rude behavior online is not limited to tech fields, but it creates more of a barrier there because online participation is essential for being relevant.
Here’s my suggestion to everyone who’d like to do something to make tech less hostile to women: perhaps the best return on your time that you can get is by making anonymous, unmoderated comments a thing of the past. Abolish it on your own sites, and write to other site admins and educate them about the importance of this issue. And when you see an uncivil comment, either educate or ignore the person, but try not to get enraged — you’d be feeding the troll.
Thanks to Ann Kilzer for reviewing a draft.
Some people claim that re-identification attacks don’t matter, the reasoning being: “I’m not important enough for anyone to want to invest time on learning private facts about me.” At first sight that seems like a reasonable argument, at least in the context of the re-identification algorithms I have worked on, which require considerable human and machine effort to implement.
The argument is nonetheless fallacious, because re-identification typically doesn’t happen at the level of the individual. Rather, the investment of effort yields results over the entire database of millions of people (hence the emphasis on “large-scale” or “en masse”.) On the other hand, the harm that occurs from re-identification affects individuals. This asymmetry exists because the party interested in re-identifying you and the party carrying out the re-identification are not the same.
In today’s world, the entities most interested in acquiring and de-anonymizing large databases might be data aggregation companies like ChoicePoint that sell intelligence on individuals, whereas the party interested in using the re-identified information about you would be their clients/customers: law enforcement, an employer, an insurance company, or even a former friend out to slander you.
Data passes through multiple companies or entities before reaching its destination, making it hard to prove or even detect that it originated from a de-anonymized database. There are lots of companies known to sell “anonymized” customer data: for example Practice Fusion “subsidizes its free EMRs by selling de-identified data to insurance groups, clinical researchers and pharmaceutical companies.” On the other hand, companies carrying out data aggregation/de-anonymization are a lot more secretive about it.
Another piece of the puzzle is what happens when a company goes bankrupt. Decode genetics recently did, which is particularly interesting because they are sitting on a ton of genetic data. There are privacy assurances in place in their original Terms of Service with their customers, but will that bind the new owner of the assets? These are legal gray areas, and are frequently exploited by companies looking to acquire data.
At the recent FTC privacy roundtable, Scott Taylor of Hewlett Packard said his company regularly had the problem of not being able to determine where data is being shared downstream after the first point of contact. I’m sure the same is true of other companies as well. (How then could we possibly expect third-party oversight of this process?) Since data fuels the modern Web economy, I suspect that the process of moving data around will continue to become more common as well as more complex, with more steps in the chain. We could use a good name for it — “data laundering,” perhaps?
It is futile to try to stay anonymous by getting your name or data purged from the Internet, once it is already out there. Attempts at such censorship have backfired repeatedly and spectacularly, giving rise to the term Streisand effect. A recent lawsuit provides the latest demonstration: two convicted German killers (who have completed their prison sentences) are attempting to prevent Wikipedia from identifying them.
The law in Germany tries to “protect the name and likenesses of private persons from unwanted publicity.” Of course, the Wikimedia foundation is based in the United States, and this attempt runs head-on into the First Amendment, the right to Free Speech. European countries have a variety of restrictions on speech—Holocaust denial is illegal, for instance. But there is little doubt about how U.S. courts will see the issue; Jennifer Granick of the EFF has a nice write-up.
The aspect that interests me is that even if there weren’t a Free Speech issue, it would be utterly impossible for the court system to keep the names of these men from the Internet. I wonder if the German judge who awarded a judgment against the Wikimedia foundation was aware that it would achieve exactly the “unwanted publicity” that the law was intended to avoid. He would probably have ruled as he did in any case, but it is interesting to speculate.
Legislators, on the other hand, would do well to be aware of the limitations of censorship, and the need to update laws to reflect the rules of the information age. There are always alternatives, although they usually involve trade-offs. In this instance, perhaps one option is a state-supplied alternate identity, analogous to the Witness Protection Program?
Returning to the issue of enforceability, the European doctrine apparently falls under “rights of the personality,” specifically the “right to be forgotten,” according to this paper that discusses the trans-atlantic clash. I find the very name rather absurd; it reminds me of attempting not to think of an elephant (try it!)
The above paper, written from the European perspective, laments the irreconcilable differences between the two viewpoints on the issue of Free Speech vs. Privacy. However, there is no discussion of enforceability. The author does suspect, in the final paragraph, that the European doctrine will become rather meaningless due to the Internet, but he believes this to be purely a consequence of the fact that the U.S. courts have put Free Speech first.
I don’t buy it—even if the U.S. courts joined Europe in recognizing a “right to be forgotten,” it would still be essentially unenforceable. Copyright-based rather than privacy-based censorship attempts offer us a lesson here. Copyright law has international scope, due to being standardized by the WIPO, and yet the attempt to take down the AACS encryption key was pitifully unsuccessful.
Taking down a repeat offender (such as a torrent tracker) or a large file (the Windows 2000 source code leak) might be easier. But if we’re talking about a small piece of data, the only factor that seems to matter is the level of public interest in the sensitive information. The only times when censorship of individual facts has been (somewhat) successful in the face of public sentiment is within oppressive regimes with centralized Internet filters.
There are many laws, particularly privacy laws, that need to be revamped for the digital age. What might appear obvious to technologists might be much less apparent to law scholars, lawmakers and the courts. I’ve said it before on this blog, but it bears repeating: there is an acute need for greater interdisciplinary collaboration between technology and the law.
The State of Oklahoma just passed legislation requiring that detailed information about every abortion performed in the state be submitted to the State Department of Health. Reports based on this data are to be made publicly available. The controversy around the law gained steam rapidly after bloggers revealed that even though names and addresses of mothers obtaining abortions were not collected, the women could nevertheless be re-identified from the published data based on a variety of other required attributes such as the date of abortion, age and race, county, etc.
As a computer scientist studying re-identification, this was brought to my attention. I was as indignant on hearing about it as the next smug Californian, and I promptly wrote up a blog post analyzing the serious risk of re-identification based on the answers to the 37 questions that each mother must anonymously report. Just before posting it, however, I decided to give the text of the law a more careful reading, and realized that the bloggers have been misinterpreting the law all along.
While it is true that the law requires submitting a detailed form to the Department of Health, the only information that is made public are annual reports with statistical tallies of the number of abortions performed under very broad categories, which presents a negligible to non-existent re-identification risk.
I’m not defending the law; that is outside my sphere of competence. There do appear to be other serious problems with it, outlined in a lawsuit aimed at stopping the law from going into effect. The text of this complaint, as Paul Ohm notes, does not raise the “public posting” claim. Besides, the wording of the law is very ambiguous, and I can certainly see why it might have been misinterpreted.
But I do want to lament the fact that bloggers and special interest groups can start a controversy based on a careless (or less often, deliberate) misunderstanding, and have it amplified by an emerging category of news outlets like the Huffington post, which have the credibility of blogs but a readership approaching traditional media. At this point the outrage becomes self-sustaining, and the factual inaccuracies become impossible to combat. I’m reminded of the affair of the gay sheep.