Last October I gave a talk titled “What Happened to the Crypto Dream?” where I looked at why crypto seems to have done little for personal privacy. The reaction from the audience (physical and online) was quite encouraging: not everyone agreed, but they seemed to find it thought-provoking, and several people asked me if I’d turn it into a paper. So when Prof. Alessandro Acquisti invited me to contribute an essay to the “On the Horizon” column in IEEE S&P magazine, I jumped at the chance, and suggested this topic.
While I’m not saying anything earth-shaking, I do make a somewhat nuanced argument: I distinguish between “crypto for security” and “crypto for privacy,” and further subdivide the latter into a spectrum between what I call “Cypherpunk Crypto” and “Pragmatic Crypto.” I identify different practical impediments that apply to those two flavors (in the latter case, a complex of related factors), and lay out a few avenues for action that can help privacy-enhancing crypto move in a direction more relevant to practice.
I’m aware that this is a contentious topic, especially since some people feel that the time is ripe for a resurgence of the cypherpunk vision. I’m happy to hear your reactions.
Last semester I taught a course on privacy technologies here at Princeton. Earlier I discussed how I refuted privacy myths that students brought into class. In this post I’d like to discuss the contents of the course. I hope it will be useful to other instructors who are interested in teaching this topic as well as for students undertaking self-study of privacy technologies. Beware: this post is quite long.
What should be taught in a class on privacy technologies? Before we answer that, let’s take a step back and ask, how does one go about figuring out what should be taught in any class?
I’ve seen two approaches. The traditional, default, overwhelmingly common approach is to think of it in terms of “covering content” without much consideration to what students are getting out of it. The content that’s deemed relevant is often determined by what the fashionable research areas happen to be, or historical accident, or some combination thereof.
A contrasting approach, promoted by authors like Bain, applies a laser focus on skills that students will acquire and how they will apply them later in life. On teaching orientation day at Princeton, our instructor, who clearly subscribed to this approach, had each professor describe what students would do in the class they are teaching, then wrote down only the verbs from these descriptions. The point was that our thinking had to be centered around skills that students would take home.
I prefer a middle ground. It should be apparent from my description of the traditional approach above that I’m not a fan. On the other hand, I have to wonder what skills our teaching coach would have suggested for a course on cosmology — avoiding falling into black holes? Alright, I’m exaggerating to make a point. The verbs in question are words like “synthesize” and “evaluate,” so there would be no particular difficulty in applying them to cosmology. But my point is that in a cosmology course, I’m not sure the instructor should start from these verbs.
Sometimes we want students to be exposed to knowledge primarily because it is beautiful, and being able to perceive that beauty inspires us, instills us with a love of further learning, and I dare say satisfies a fundamental need. To me a lot of the crypto “magic” that goes into privacy technologies falls into that category (not that it doesn’t have practical applications).
With that caveat, however, I agree with the emphasis on skills and life impact. I thought of my students primarily as developers of privacy technologies (and more generally, of technological systems that incorporate privacy considerations), but also as users and scholars of privacy technologies.
I organized the course into sections: a short introductory section followed by five sections that alternated in the level of math/technical depth. Every time we studied a technology, we also discussed its social/economic/political aspects. I had a great deal of discretion in guiding where the conversation around the papers went by giving students questions/prompts on the class Wiki. Let us now jump in. The italicized text is from the course page, the rest is my annotation.
Goals of this section: Why are we here? Who cares about privacy? What might the future look like?
- Dan Solove. Why Privacy Matters Even if You Have ‘Nothing to Hide’ (Chronicle)
- David Brin. The Transparent Society (WIRED, circa 1996, later expanded into a book)
In addition to helping flesh out the foundational assumptions of this course that I discussed in the previous post, pairing these opposing views with each other helped make the point that there are few absolutes in this class, that privacy scholars may disagree with each other, and that the instructor doesn’t necessarily agree with the viewpoints in the assigned reading, much less expects students to.
1. Cryptography: power and limitations
Goals. Travel back in time to the 80s and early 90s, understand the often-euphoric vision that many crypto pioneers and hobbyists had for the impact it would have. Understand how cryptographic building blocks were thought to be able to support this restructuring of society. Reason about why it didn’t happen.
Understand the motivations and mathematical underpinnings of the modern research on privacy-preserving computations. Experiment with various encryption tools, discover usability problems and other limitations of crypto.
- David Chaum. Security without Identification: Card Computers to make Big Brother Obsolete (1985)
- Steven Levy. Crypto Rebels (WIRED, 1993; later a 2001 book)
- Eric Hughes. A cypherpunk’s manifesto. (short essay, 1993.)
I think the Chaum paper is a phenomenal and underutilized resource for teaching. My goal was to really immerse students in an alternate reality where the legal underpinnings of commerce were replaced by cryptography, much as Chaum envisioned (and even going beyond that). I created a couple of e-commerce scenarios for Wiki discussion and had them reason about how various functions would be accomplished.
My own views on this topic are set forth in this talk (now a paper; coming soon). In general I aimed to shield students from my viewpoints, and saw my role as helping them discover (and be able to defend) their own. At least in this instance I succeeded. Some students took the position that the cypherpunk dream is just around the corner.
- The ‘Garbled Circuit Protocol’ (Yao’s theorem on secure two-party computation) and its implications (lecture)
This is one of the topics that sadly suffers from a lack of good expository material, so I instead lectured on it.
- Alma Whitten and Doug Tygar. Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0
- Nikita Borisov, Ian Goldberg, Eric Brewer. Off-the-Record Communication, or, Why Not To Use PGP
One of the exercises here was to install and use various crypto tools and rediscover the usability problems. The difficulties were even worse than I’d anticipated.
2. Data collection and data mining, economics of personal data, behavioral economics of privacy
Goals. Jump forward in time to the present day and immerse ourselves in the world of ubiquitous data collection and surveillance. Discover what kinds of data collection and data mining are going on, and why. Discuss how and why the conversation has shifted from Government surveillance to data collection by private companies in the last 20 years.
Theme: first-party data collection.
- New York Times. How Companies Learn Your Secrets
- Andrew Odlyzko. Privacy, Economics, and Price Discrimination on the Internet
Theme: third-party data collection.
- Julia Angwin. The Web’s New Gold Mine: Your Secrets (First in the Wall Street Journal’s What They Know series)
- Jonathan R. Mayer and John C. Mitchell. Third-Party Web Tracking: Policy and Technology
Theme: why companies act the way they do.
- Joseph Bonneau and Sören Preibusch. The Privacy Jungle: On the Market for Data Protection in Social Networks
- Bruce Schneier. How Security Companies Sucker Us With Lemons (WIRED)
Theme: why people act the way they do.
- Alessandro Acquisti and Jens Grossklags. What Can Behavioral Economics Teach Us About Privacy?
- Alessandro Acquisti. Privacy in Electronic Commerce and the Economics of Immediate Gratification
This section is rather self-explanatory. After the math-y flavor of the first section, this one has a good amount of economics, behavioral economics, and policy. One of the thought exercises was to project current trends into the future and imagine what ubiquitous tracking might lead to in five or ten years.
3. Anonymity and De-anonymization
Important note: communications anonymity (e.g., Tor) and data anonymity/de-anonymization (e.g., identifying people in digital databases) are technically very different, but we will discuss them together because they raise some of the same ethical questions. Also, Bitcoin lies somewhere in between the two.
- Roger Dingledine, Nick Mathewson, Paul Syverson. Tor: The Second-Generation Onion Router
- Satoshi Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System
Tor and Bitcoin (especially the latter) were the hardest but also the most rewarding parts of the class, both for the students and for me. Together they took up 4 classes. Bitcoin is extremely challenging to teach because it is technically intricate, the ecosystem is rapidly changing, and a lot of the information is in random blog/forum posts.
In a way, I was betting on Bitcoin by deciding to teach it: if it had died with a whimper, the students’ knowledge of it would be much less relevant. In general I think instructors should choose to make such bets more often; most curricula are very conservative. I’m glad I did.
- Nils Homer et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
- [Optional] Arvind Narayanan, Elaine Shi, Benjamin I. P. Rubinstein. Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
It was a challenge to figure out which deanonymization paper to assign. I went with the DNA one because I wanted students to see that deanonymization isn’t a fact about data, but a fact about the world. Another thing I liked about this paper is that students would have to extract the not-too-complex statistical methodology from the bioinformatics discussion in which it is embedded. This didn’t go as well as I’d hoped.
I’ve co-authored a few deanonymization papers, but they’re not very well written and/or are poorly suited for pedagogical purposes. The Kaggle paper is one exception, which I made optional.
- Paul Ohm. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization
- [Optional] Jane Yakowitz Bambauer. Tragedy of the Data Commons
This is another pair of papers with opposing views. Since the latter paper is optional, knowing that most of them wouldn’t have read it, I used the Wiki prompts to raise many of the issues that the author raises.
4. Lightweight Privacy Technologies and New Approaches to Information Privacy
While cryptography is the mechanism of choice for cypherpunk privacy and anonymity tools like Tor, it is too heavy a weapon in other contexts like social networking. In the latter context, it’s not so much users deploying privacy tools to protect themselves against all-powerful adversaries but rather a service provider attempting to cater to a more nuanced understanding of privacy that users bring to the system. The goal of this section is to consider a diverse spectrum of ideas applicable to this latter scenario that have been proposed in recent years in the fields of CS, HCI, law, and more. The technologies here are “lightweight” in comparison to cryptographic tools like Tor.
- Scott Lederer, Jason Hong et al. Personal Privacy through Understanding and Action: Five Pitfalls for Designers
- Franziska Roesner et al. User-Driven Access Control: Rethinking Permission Granting in Modern Operating Systems
- Fred Stutzman and Woodrow Hartzog. Obscurity by Design: An Approach to Building Privacy into Social Media
- Woodrow Hartzog and Fred Stutzman. The Case for Online Obscurity
- Jerry Kang et al. Self-surveillance Privacy
- [Optional] Ryan Calo. Against Notice Skepticism In Privacy (And Elsewhere)
- Helen Nissenbaum. A Contextual Approach to Privacy Online
5. Purely technological approaches revisited
This final section doesn’t have a coherent theme (and I admitted as much in class). My goal with the first two papers was to contrast a privacy problem which seems amenable to a purely or primarily technological formulation and solution (statistical queries over databases of sensitive personal information) with one where such attempts have been less successful (the decentralized, own-your-data approach to social networking and e-commerce).
- Differential Privacy. (Lecture)
- Cynthia Dwork. Differential Privacy.
Differential privacy is another topic that is sorely lacking in expository material, especially from the point of view of students who’ve never done crypto before. So this was again a lecture.
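For readers who have never seen it, here is a minimal sketch of the flavor of the idea, assuming the textbook Laplace mechanism applied to a counting query. This is an illustration only, not material from the course or from Dwork's paper as assigned; the function names (`laplace_noise`, `private_count`) and the toy records are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with the
    same mean `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Answer 'how many records satisfy predicate?' with added noise.

    Adding or removing one person's record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and a stronger privacy guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so that runs are repeatable
    # Hypothetical toy database: one record per person.
    people = [{"age": age} for age in range(100)]
    noisy = private_count(people, lambda p: p["age"] >= 50, 0.5, rng)
    print(f"noisy count of people aged 50+: {noisy:.2f}")
```

The point of the sketch is that the analyst only ever sees the noisy answer, so the presence or absence of any single record has a bounded effect on what is released. A real deployment would also have to track the total epsilon spent across queries, which this sketch ignores.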
- Arvind Narayanan et al. A Critical Look at Decentralized Personal Data Architectures
- John Perry Barlow. A Declaration of the Independence of Cyberspace (short essay, 1996)
- James Grimmelmann. Sealand, HavenCo, and the Rule of Law
These two essays aren’t directly related to privacy. One of the recurring threads in this course is the debate between purely technological and legal or other approaches to privacy; the theme here is to generalize it to a context broader than privacy. The Barlow essay asserts the exceptionalism of Cyberspace as an unregulable medium, whereas the Grimmelmann paper provides a much more nuanced view of the relationship between the law and new technological frontiers.
I’m making available the entire set of Wiki discussion prompts for the class (HTML/PDF). I consider this integral to the syllabus, for it shapes the discussion very significantly. I really hope other instructors and students find this useful as a teaching/study guide. For reference, each set of prompts (one set per class) took me about three hours to write on average.
There are many more things I want to share about this class: the major take-home ideas, the rationale for the Wiki discussion format, the feedback I got from students, a description of a couple of student projects, some thoughts on the sociology of different communities studying privacy and how that impacted the class, and finally, links to similar courses that are being taught elsewhere. I’ll probably close this series with a round-up post including as many of the above topics as I can.
Last semester I taught a course on privacy technologies. Since it was a seminar, the class was a small, self-selected group of very motivated students. Based on the feedback, it seems to have been a success; it was certainly quite personally gratifying for me. This is the first in a series of posts on what I learnt from teaching this course. In this post I will discuss some major misconceptions about privacy, how to refute them, and why it is important to do this right at the beginning of the course.
Privacy’s primary pitfalls
Instructors are often confronted with breaking down faulty mental models that students bring into class before actual learning can happen. This is especially true of the topic at hand. Luckily, misconceptions about privacy are so pervasive in the media and among the general public that it wasn’t too hard to identify the most common ones before the start of the course. And it didn’t take much class discussion to confirm that my students weren’t somehow exempt from these beliefs.
One cluster of myths is about the supposed lack of importance of privacy.
1. “There is no privacy in the digital age.” This is the most common and perhaps the most grotesquely fallacious of the misconceptions; more on this below.
2. “No one cares about privacy any more” (variant: young people don’t care about privacy.)
3. “If you haven’t done anything wrong you have nothing to hide.”
A second cluster of fallacious beliefs is very common among computer scientists and comes from the tendency to reduce everything to a black-and-white technical problem. In this view, privacy maps directly to access control and cryptography is the main technical mechanism for achieving privacy. It’s a view in which the world is full of adversaries and there is no room for obscurity or nontechnical ways of improving privacy.
The first step in learning is to unlearn
Why is it important to spend time confronting faulty mental models? Why not simply teach the “right” ones? In my case, there was a particularly acute reason — to the extent that students believe that privacy is dead and that learning about privacy technologies is unimportant, they are not going to be invested in the class, which would be really bad. But even in the case of misconceptions that don’t lead to students doubting the fundamental premise of the class, there is a surprising reason why unlearning is important.
A famous experiment in the ’80s (I really really recommend reading the linked text) demonstrated what we now know about the ineffectiveness of the “information transmission” model of teaching. The researchers interviewed students after any of four introductory physics courses, and determined that they hadn’t actually learned what had been taught, such as Newton’s laws of motion; instead they just learned to pass the tests. When the researchers sat down with students to find out why, here’s what they found:
What they heard astonished them: many of the students still refused to give up their mistaken ideas about motion. Instead, they argued that the experiment they had just witnessed did not exactly apply to the law of motion in question; it was a special case, or it didn’t quite fit the mistaken theory or law that they held as true.
A special case! Ha. What’s going on here? Well, learning new facts is easy. On the other hand, updating mental models is so cognitively expensive that we go to absurd lengths to avoid doing so. The societal-scale analog of this extreme reluctance is well-illustrated by the history of science — we patched the Ptolemaic model of the Universe, with the Earth at the center, for over a millennium before we were forced to accept that the Copernican system fit observations better.
The instructor’s arsenal
The good news is that the instructor can utilize many effective strategies that fall under the umbrella of active learning. Ken Bain’s excellent book (which the preceding text describing the experiment is from) lays out a pattern in which the instructor creates an expectation failure, a situation in which existing mental models of reality will lead to faulty expectations. One of the prerequisites for this to work, according to the book, is to get students to care.
Bain argues that expectation failure, done right, can be so powerful that students might need emotional support to cope. Fortunately, this wasn’t necessary in my class, but I have no doubt of it based on my personal experiences. For instance, back when I was in high school, learning how the Internet actually worked and realizing that my intuitions about the network had to be discarded entirely was such a disturbing experience that I remember my feelings to this day.
Let’s look at an example of expectation failure in my privacy class. To refute the “privacy is dying” myth, I found it useful to talk about Fifty Shades of Grey — specifically, why it succeeded even though publishers initially passed on it. One answer seems to be that since it was first self-published as an e-book, it allowed readers to be discreet and avoid the stigma associated with the genre. (But following its runaway success in that form, the stigma disappeared, and it was released in paper form and flew off the shelves.)
The relative privacy of e-books from prying strangers is one of the many ways in which digital technology affords more privacy for specific activities. Confronting students with an observed phenomenon whose explanation involves a fact that seems starkly contrary to the popular narrative creates an expectation failure. Telling personal stories about how technology has either improved or eroded privacy, and eliciting such stories from students, gets them to care. Once this has been accomplished, it’s productive to get into a nuanced discussion of how to reconcile the two views with each other, different meanings of privacy (e.g., tracking of reading habits), how the Internet has affected each, and how society is adjusting to the changing technological landscape.
I’m quite new to teaching — this is only my second semester at Princeton — but it’s been exciting to internalize the fact that learning is something that can be studied scientifically and teaching is an activity that can vary dramatically in effectiveness. I’m looking forward to getting better at it and experimenting with different methods. In the next post I will share some thoughts on the content of my course and what I tried to get students to take home from it.
Thanks to Josh Hug for reviewing a draft.
This is the second in a series of posts with advice for computer science academic job candidates.
One shot, one opportunity
The philosopher Marshall Mathers once asked rhetorically, “Look, if you had one shot, or one opportunity / To seize everything you ever wanted in one moment / Would you capture it or just let it slip?”
He added, “Yo.” 
I don’t mean to imply that an academic position is everything you ever wanted, but it’s a pretty good life (although not for these reasons). Like it or not, it’s set up so that your career up until this point comes down to one moment. After years of hard work, your ability as a researcher will be judged primarily based on how you sell yourself in the fleeting span of an hour. Of course, you’ll (hopefully) give your talk at many places, but it’s going to be the same talk!
There’s a reason I’m saying this, and it’s not to stress you out even more. Rather, if at any point the level of preparedness that I suggest seems excessive or disproportionate, remember the wise words quoted above.
Public speaking is a performance
My first piece of advice is to read the book Confessions of a Public Speaker. As in, don’t even think about giving your job talk without having read it. You can read it in a sitting; putting it into practice will of course take longer. I cannot overstate the impact this book had on my talk (and my public speaking in general). There are probably other books that capture much of the wisdom in Confessions, and I’d love to hear other recommendations, but if you’re going to read one book it would have to be this one.
There are numerous very useful little details in the book, but it has one central idea that can be boiled down to the phrase “public speaking is a performance.” Job talks are even more of a performance than public speaking in general, since the audience is specifically there to judge you. This is a generative metaphor: it allows you to predict things about your job talk based on what you know about performing. Fully appreciating the metaphor will require reading the book, but here are two such predictions that might otherwise be surprising.
Your first priority is to entertain
Certainly you must both entertain and inform, but the point is that you don’t really have a shot at the latter if you fail at the former. Sitting in a lecture, as everyone who remembers their student days is surely aware, can be excruciating; it’s an extremely unnatural situation from an evolutionary perspective (again, read Confessions to appreciate why). The chart below from the book What’s the Use of Lectures? shows students’ heart rate over time as they sat in a lecture. It’s only a drop of a few beats per minute, but it translates to an enormous difference in alertness.
If you don’t do anything different in your talk and simply present your material, your audience’s attention level will be greatly diminished by the half-hour mark, and by the end of your talk people will basically be comatose. Anything you can do to break the routine, linear, hyper-boring pattern of a lecture will help jolt the audience out of their stupor. (That includes asking questions; I usually asked two or three in my talk.) Otherwise they won’t be excited about you or remember much after the talk.
Rehearse, rehearse, rehearse
You may have heard “practice, practice, practice.” I’d rather cast it in the language of performance, as there are some subtle differences. For example, when people tell you to practice, they tell you not to overdo it because you’d lose your spontaneity. I disagree. In a rehearsal, everything is practiced down to the last detail. In fact, the apparently spontaneous things that I said in my talk were the most well-rehearsed parts.
Rehearsal should include videotaping yourself and watching it. Yes, it’s painful and majorly cringe-inducing, but it’s absolutely, absolutely essential. In addition to all the obvious facets of good presentation style that I won’t repeat, one of the subtle but important things you should watch for is nervous tics or other repetitive behaviors — almost everyone has one or more of those, and they can almost derail your talk by distracting your audience.
The reason rehearsal makes such a huge difference is that when you’re delivering a rehearsed talk, your every word and gesture is subconscious, freeing up your mental bandwidth for observing and reacting in real-time to the facial expressions of your audience. There are never more than 40-50 people in these talks, a small enough number that you can instantly notice if someone looks confused, skeptical, or bored. But this won’t be possible if you have to think through your slides instead. The reduction in cognitive load also minimizes the chance of “hitting the wall,” a phenomenon of sudden mental fatigue that’s a serious danger in long-ish talks and can leave you helpless.
Let me close with an example of a little theatrical thing I did that shows the value of rehearsal and the performance metaphor. One of the goals in my location privacy project is to minimize smartphone power consumption. When I got to that part, I’d say, “those of you with Android phones know how bad the battery life is. In fact I usually carry a spare battery around… actually, I think I have it on me.” Then I’d pull a smartphone battery out of my jacket pocket with a bit of a dramatic touch. Somehow the use of a physical prop seemed to reframe their thinking from “yet another academic paper” to “solving a real problem.” It would also usually elicit a laugh and elevate their attention level.
There is so much more to say about job talks, not to mention other aspects of the job interview. I might do follow-up posts on a mathematical model of audience behavior and/or an explanation of why slide transitions are (by far) the most important part of your slides.
 This post was written while listening to Lose Yourself in a loop.
 If it needs to be said, I have no stake in the book, financial or otherwise.
 I hasten to add that teaching is very different from public speaking and is emphatically not a performance.
In my previous article I pointed out that online price discrimination is suspiciously absent in directly observable form, even though covert price discrimination is everywhere. Now let’s talk about why that might be.
By “covert” I don’t mean that the firm is trying to keep price discrimination a secret. Rather, I mean that the differential treatment isn’t made explicit — e.g., by not basing it directly on a customer attribute — and thereby avoiding triggering the perception of unfairness or discrimination. A common example is selective distribution of coupons instead of listing different prices. Such discounting may be publicized, but it is still covert.
The perception of fairness
The perception of fairness or unfairness, then, is at the heart of what’s going on. Going back to the WSJ piece, I found it interesting to see the reaction of the customer to whom Staples quoted $1.50 more for a stapler based on her ZIP code: “How can they get away with that?” she asks. To which my initial reaction was, “Get away with what, exactly? Supply and demand? Econ 101?”
Even though some of us might not feel the same outrage, I think all of us share at least a vague sense of unease about overt price discrimination. So I decided to dig deeper into the literature in psychology, marketing, and behavioral economics on the topic of price fairness and understand where this perception comes from. What I found surprised me.
First, the fairness heuristic is quite elaborate and complex. In a vast literature spanning several decades, early work such as the “principle of dual entitlement” by Kahneman and coauthors established some basics. Quoting Anderson and Simester: “This theory argues that customers have perceived fairness levels for both firm profits and retail prices. Although firms are entitled to earn a fair profit, customers are also entitled to a fair price. Deviations from a fair price can be justified only by the firm’s need to maintain a fair profit. According to this argument, it is fair for retailers to raise the price of snow shovels if the wholesale price increases, but it is not fair to do so if a snowstorm leads to excess demand.”
Much later work has added to and refined that model. A particularly impressive and highly cited 2004 paper reviews the literature and proposes an elaborate framework with four different classes of inputs to explain how people decide if pricing is fair or unfair in various situations. Some of the findings are quite surprising. For example: in case of differential pricing to the buyer’s disadvantage, “trust in the seller has a U-shaped effect on price fairness perceptions.”
The illusion of fairness
Sounds like we have a well-honed and sophisticated decision procedure, then? Quite the opposite, actually. The fairness heuristic seems to be rather fragile, even if complex.
Let’s start with an example. Andrew Odlyzko, in his brilliant essay on price discrimination (all the more so for having been published back in 2003), has this to say about Coca Cola’s ill-fated plans for price-adjusting vending machines: “In retrospect, Coca Cola’s main problem was that news coverage always referred to its work as leading to vending machines that would raise prices in warm weather. Had it managed to control publicity and present its work as leading to machines that would lower prices in cold weather, it might have avoided the entire controversy.”
We know how to explain the public’s reaction to the Coca Cola announcement using behavioral economics: the way it was presented (or framed), customers take the lower price as the “reference price,” and the price increase seems unfair, whereas Odlyzko’s suggested framing would anchor the higher price as the reference price. Of course, just because we can explain how the fairness heuristic works doesn’t make it logical or consistent, let alone properly grounded in social justice.
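To make the framing effect concrete, here is a toy sketch. It is a deliberate simplification, not a model from the fairness literature: the prices are hypothetical (neither Odlyzko's essay as quoted here nor the news coverage gives figures), and `describe` is a made-up helper. The same price schedule is described twice, changing only which price is treated as the reference.

```python
# Hypothetical vending machine prices in cents; the figures are invented
# for illustration and do not come from any source.
PRICES = {"cold": 100, "warm": 125}


def describe(prices, reference_weather):
    """Label each price relative to the chosen reference price."""
    reference = prices[reference_weather]
    labels = {}
    for weather, price in prices.items():
        delta = price - reference
        if delta > 0:
            labels[weather] = f"surcharge of {delta}"
        elif delta < 0:
            labels[weather] = f"discount of {-delta}"
        else:
            labels[weather] = "reference price"
    return labels


# Framing in the news coverage: the cold-weather price is the reference,
# so warm weather looks like a price increase.
press_framing = describe(PRICES, "cold")

# Odlyzko's suggested framing: the warm-weather price is the reference,
# so cold weather looks like a price cut.
odlyzko_framing = describe(PRICES, "warm")

if __name__ == "__main__":
    print(press_framing)
    print(odlyzko_framing)
```

Nothing about what a customer actually pays differs between the two descriptions. Only the labels change, and that is exactly the part the fairness heuristic responds to.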
More generally, every aspect of our mental price fairness assessment heuristic seems similarly vulnerable to hijacking by tweaking the presentation of the transaction without changing the essence of price discrimination. Companies have of course gotten wise to this; there’s even academic literature on it. One of the techniques proposed in this paper is “reference group signaling” — getting a customer to change the set of other customers to whom they mentally compare themselves. 
The perception of fairness, then, can be more properly called the illusion of fairness.
The fragility of the fairness heuristic becomes less surprising considering that we apparently share it with other primates. This hilarious clip from a TED talk shows a capuchin monkey reacting poorly, to put it mildly, to differential treatment in a monkey-commerce setting (although the jury may still be out on the significance of this experiment). If our reaction to pricing schemes is partly or largely due to brain circuitry that evolved millions of years ago, we shouldn’t expect it to fare well when faced with the complexities of modern business.
Given that the prime impediment to pervasive online price discrimination is a moral principle that is fickle and easily circumventable, one can expect companies to do exactly that, since they can reap most of the benefits of price discrimination without the negative PR. Indeed, it is my belief that more covert price discrimination is going on than is generally recognized, and that it is accelerating due to some technological developments.
This is a problem because price discrimination does raise ethical concerns, and these concerns are every bit as significant when it is covert.  However, since it is much less transparent, there’s less of an opportunity for public debate.
There are two directions in which I want to take this series of articles from this point: first a look at how new technology is enabling powerful forms of tailoring and covert price discrimination, and second, a discussion of what can be done to make price discrimination more transparent and how to have an informed policy discussion about its benefits and dangers.
 I had the pleasure of sitting next to Professor Odlyzko at a conference dinner once, and I expressed my admiration of the prescience of his article. He replied that he’d worked it all out in his head circa 1996 but took a few years to put it down on paper. I could only stare at him wordlessly.
 I’m struck by the similarities between price fairness perceptions and privacy perceptions. The aforementioned 2004 price fairness framework can be seen as serving a roughly analogous function to contextual integrity, which is (in part) a theory of consumer privacy expectations. Both these theories are the result of “reverse engineering,” if you will, of the complex mental models in their respective domains using empirical behavioral evidence. Continuing the analogy, privacy expectations are also fragile, highly susceptible to framing, and liable to be exploited by companies. Acquisti and Grossklags, among others, have done some excellent empirical work on this.
 In fact, crude ways of making customers reveal their price sensitivity lead to a much higher social cost than overt price discrimination. I will take this up in more detail in a future post.
The mystery about online price discrimination is why so little of it seems to be happening.
Consumer advocates and journalists among others have been trying to find smoking gun evidence of price discrimination — the overt kind where different customers are charged different prices for identical products based on how much they are willing to pay. (By contrast, examples of covert or concealed price discrimination abound; see, for example, my 2011 article.) Back in 2000 Amazon tried a short-lived experiment where prices of DVDs for new and for regular users were different. But that remains essentially the only example.
This should be surprising. Tailoring prices to individuals is far more technically feasible online than offline, since shoppers are either identified or at least have loads of behavioral data associated with their pseudonymous cookies. The online advertising industry claims that this is highly effective for targeting ads; estimating consumers’ willingness to pay shouldn’t be much harder. Clearly, price discrimination has benefits to firms engaging in it by allowing them to capture more of the “consumer surplus.” (Whether or not it is beneficial to consumers is a more controversial question that I will defer to a future post.) In fact, based on technical feasibility and economic benefits, one might expect the practice to be pervasive.
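To make the “consumer surplus” point concrete, here is a small worked sketch. The willingness-to-pay figures are entirely hypothetical, and the sketch assumes zero marginal cost and one unit per customer; it is an illustration of the textbook idea, not a model of any real retailer.

```python
def best_uniform_price(wtp):
    """Return (price, revenue) for the single price that maximizes revenue.

    wtp is a list of each customer's willingness to pay (hypothetical values).
    A customer buys one unit if the price does not exceed their willingness to pay.
    """
    best = (0, 0)
    for price in sorted(set(wtp)):
        buyers = sum(1 for w in wtp if w >= price)
        revenue = price * buyers
        if revenue > best[1]:
            best = (price, revenue)
    return best


def consumer_surplus(wtp, price):
    """Total surplus kept by customers who buy at a single posted price."""
    return sum(w - price for w in wtp if w >= price)


def perfect_discrimination_revenue(wtp):
    """Revenue if every customer is charged exactly their willingness to pay."""
    return sum(wtp)


# Hypothetical customers, for illustration only.
customers = [10, 8, 6, 4, 2]
price, revenue = best_uniform_price(customers)
print(price, revenue)                             # 6 18
print(consumer_surplus(customers, price))         # 6
print(perfect_discrimination_revenue(customers))  # 30
```

In this toy market the best single price earns 18 and leaves customers a surplus of 6. Perfectly tailored prices would earn 30: the firm captures that surplus of 6 and also sells to the two customers who were previously priced out.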
The evidence (or lack thereof)
A study out of Spain last year took a comprehensive look at online merchants, by far the most thorough analysis of its kind. They created two “personas” with different browsing histories — one of which visited discount sites and the other visited sites for luxury products. Each persona then browsed 200 e-commerce sites as well as search engines to see if they were treated differently. Here’s what the authors found:
- There is evidence for search discrimination or steering where the high- and low-income personas are shown ads for high-end and low-end products respectively. In my opinion, the line between this practice and plain old behavioral advertising is very, very slim. 
- There is no evidence for price discrimination based on personas/browsing histories.
- Three of the 200 retailers including Staples varied prices based on the user’s location, but not necessarily in a way that can’t be explained by costs of doing business.
- Visitors coming from one particular deals site (nextag.com) saw lower prices at various retailers. (Discounting and “deals” are very common forms of concealed price discrimination.)
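The core comparison in a persona-based study like this can be sketched in a few lines. This is not the authors’ code: the function name, the data layout, the persona labels, and the 1% tolerance are all assumptions made for illustration, and a real measurement would also need to control for A/B testing, price changes over time, and location.

```python
def find_price_differences(observations, tolerance=0.01):
    """Flag products whose price differs across personas.

    observations maps (retailer, product) to a dict of persona -> observed price.
    A product is flagged when the highest price exceeds the lowest by more than
    `tolerance` (as a fraction of the lowest price). All names are hypothetical.
    """
    flagged = {}
    for key, prices in observations.items():
        lowest = min(prices.values())
        highest = max(prices.values())
        if lowest > 0 and (highest - lowest) / lowest > tolerance:
            flagged[key] = (lowest, highest)
    return flagged


# Made-up observations for two personas, for illustration only.
observations = {
    ("shop-a", "stapler"): {"budget": 10.0, "affluent": 10.0},
    ("shop-b", "stapler"): {"budget": 10.0, "affluent": 11.5},
}
print(find_price_differences(observations))
```

A flagged product is only a lead. As the location finding above shows, a price difference still has to be distinguished from ordinary cost differences before it counts as price discrimination.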
A new investigation by the Wall Street Journal analyzes Staples in more detail. While the Spain study found geographic variation in prices, the WSJ study goes further and shows a strong correlation between lower prices and consumers’ ability to drive to competitors’ stores, which is an indicator of willingness to pay. I’m not 100% convinced that they’ve ruled out alternative hypotheses, but it does seem plausible that Staples’ behavior constitutes actual price discrimination, even though geography is a far cry from utilizing behavioral data about individuals.
Other findings in the WSJ piece include websites that offer discounts for mobile users, as well as location-dependent pricing on Lowe’s and Home Depot’s websites, though with little evidence that the latter is based on anything but costs of doing business.
So there we have it. Both studies are very thorough, and I commend the authors, but I consider their results to be mostly negative — very few companies are varying prices at all and none are utilizing anywhere near the full extent of data available about users. Other price discrimination controversies include steering by Orbitz and a hastily-retracted announcement by Coca Cola for vending machines that would tailor prices to demand. Neither company charged or planned to charge different prices for the same product based on who the consumer was.
In short, despite all the hubbub, I find overt price discrimination conspicuous by its absence. In a follow-up post I will propose an explanation for the mystery and see what we can learn from it.
 This is an automatic consequence of collaborative recommendation that suggests products to users based on what similar users have clicked on/purchased in the past. It does not require that any explicit inference of the consumer’s level of affluence be made by the system. In other words, steering, bubbling etc. are inherent features of collaborative filtering algorithms which drive personalization, recommendation and information retrieval on the Internet. This fact greatly complicates attempts to define, detect or regulate unfair discrimination online.
Thanks to Aleecia McDonald for reviewing a draft.