Posts tagged ‘academia’
As an academic who’s spent time in the startup world, I see strong similarities between the nature of a scientific research project and the nature of a startup. This boils down the fact that most research projects fail (in a sense that I’ll describe), and even among the successful projects the variance is extremely high — most of the impact is concentrated in a few big winners.
Of course, research projects are clearly unlike startups in some important ways: in research you don’t get to capture the economic benefit of your work; your personal gain from success is not money but academic reputation (unless you commercialize your research and start an actual startup, but that’s not what this post is about at all.) The potential personal downside is also lower for various reasons. But while the differences are obvious, the similarities call for some analysis.
I hope this post is useful to grad students in particular in acquiring a long-term vision for how to approach their research and how to maximize the odds of success. But perhaps others including non-researchers will also find something useful here. There are many aspects of research that may appear confusing or pathological, and at least some of them can be better understood by focusing on the high variance in research impact.
1. Most research projects fail.
To me, publication alone does not constitute success; rather, the goal of a research project is to impact the world, either directly or by influencing future research. Under this definition, the vast majority of research ideas, even if published, are forgotten in a few years. Citation counts estimate impact more accurately , but I think they still significantly underestimate the skew.
The fact that most research projects don’t make a meaningful lasting impact is OK — just as the fact that most startups fail is not an indictment of entrepreneurship.
A researcher might choose to take a self-interested view and not care about impact, but even in this view, merely aiming to get papers published is not a good long-term strategy. For example, during my recent interview tour, I got a glimpse into how candidates are evaluated, and I don’t think someone with a slew of meaningless publications would have gotten very far. 
2. Grad students: diversify your portfolio!
Given that failure is likely (and for reasons you can’t necessarily control), spending your whole Ph.D. trying to crack one hard problem is a highly risky strategy. Instead, you should work on multiple projects during your Ph.D., at least at the beginning. This can be either sequential or parallel; the former is more similar to the startup paradigm (“fail-fast”).
I achieved diversity by accident. Halfway through my Ph.D. there were at least half a dozen disparate research topics where I’d made some headway (some publications, some works in progress, some promising ideas). Although I felt I was directionless, this turned out to be the right approach in retrospect. I caught a lucky break on one of them — anonymity in sanitized databases — because of the Netflix Prize dataset, and from then on I doubled down to focus on deanonymization. This breadth-then-depth approach paid off.
3. Go for the big hits.
Paul Graham’s fascinating essay Black Swan Farming is about how skewed the returns are in early-stage startup investing. Just two of the several hundred companies that YCombinator has funded are responsible for 75% of the returns, and in each batch one company outshines all the rest.
The returns from research aren’t quite as skewed, but they’re skewed enough to be highly counterintuitive. This means researchers must explicitly account for the skew in selecting problems to work on. Following one’s intuition and/or the crowd is likely to lead to a mediocre career filled with incremental, marginally publishable results. The goal is to do something that’s not just new and interesting, but which people will remember in ten years, and the latter can’t necessarily be predicted based on the amount of buzz a problem is generating in the community right now. Breakthroughs often come from unsexy problems (more on that below).
There’s a bit of a tension between going for the hits and diversifying your portfolio. If you work on too few projects, you incur the risk that none of them will pan out. If you work on too many, you spread yourself too thin, the quality of each one suffers, and lowers the chance that at least one of them will be a big hit. Everyone must find their own sweet spot. One piece of advice given to junior professors is to “learn to say no.”
4. Find good ideas that look like bad ideas.
How do you predict if an idea you have is likely to lead to success, especially a big one? Again let’s turn to Paul Graham in Black Swan Farming:
“the best startup ideas seem at first like bad ideas. … if a good idea were obviously good, someone else would already have done it. So the most successful founders tend to work on ideas that few beside them realize are good.”
Something very similar is true in research. There are some problems that everyone realizes are important. If you want to solve such a problem, you have to be smarter than most others working on it and be at least a little bit lucky. Craig Gentry, for example, invented Fully Homomorphic Encryption mostly by being very, very smart.
Then there are research problems that are analogous to Graham’s good ideas that initially look bad. These fall into two categories: 1. research problems that no one has realized are important 2. problems that everyone considers prohibitively difficult but which turn out to have a back door.
If you feel you are in a position to take on obviously important problems, more power to you. I try to work on problems that everyone seems to think are bad ideas (either unimportant or too difficult), but where I have some “unfair advantage” that leads me to think otherwise. Of course, a lot of the time they are right, but sometimes they are not. Let me give two examples.
I consider Adnostic (online behavioral advertising without tracking) to be moderately successful: it has had an impact on other research in the area, as well as in policy circles as an existence proof of behavioral-advertising-with-privacy. Now, my coauthors started working on it before I joined them, so I can take none of the credit for problem selection. But it’s a good illustration of the principle. The main reason they decided this problem was important was that privacy advocates were up in arms about online tracking. Almost no one in the computer science community was studying the topic, because they felt that simply blocking trackers was an adequate solution. So this was a case of picking a problem that people didn’t realize was important. Three years later it’s become a very crowded research space.
Another example is my work with Shmatikov on deanonymizing social networks by being able to find a matching between the nodes of two social graphs. Most people I talked to at the time thought this was impossible — after all, it’s a much harder version of graph isomorphism, and we’re talking about graphs with millions of nodes. Here’s the catch: people intuitively think graph isomorphism is “hard,” but it is in fact not NP-complete and on real-world graphs it embarrassingly easy. We knew this, and even though the social network matching problem is harder than graph isomorphism, we thought it was still doable. In the end it took months of work, but fortunately it was just within the realm of possibility.
5. Most researchers are known for only one or two things.
Let me end with an interesting side effect of the high-skew theory: a successful researcher may have worked on many successful projects during their career, but the top one or two of those will likely be far better known than the rest. This seems to be borne out empirically, and a source of much annoyance for many researchers to be pigeonholed as “the person who did X.” Let’s take Ron Rivest who’s been prolific for several decades not just in cryptography but also in algorithms and lately in voting. Most computer scientists will recall that he’s the R in RSA, but knowledge of his work drops off sharply after that. This is also reflected in the citation counts (the first entry is a textbook, not a research paper). 
In summary, if you’re a researcher, think carefully about which projects to work on and what the individual and overall chances of success are. And if you’re someone who’s skeptical about academia because your friend who dropped out of a Ph.D. after their project failed convinced you that all research is useless, I hope this post got you to think twice.
I may do a follow-up post examining whether ideas are as valuable as they are held to be in the research community, or whether research ideas are more similar to startup ideas in that it’s really execution and selling that lead to success.
 For example, a quarter of my papers are responsible for over 80% of my citations.
 That said, I will get a much better idea in the next few months from the other side of the table :)
 Specifically, it undermines the “we can’t stop tracking because it would kill our business model” argument that companies love to make when faced with pressure from privacy advocates and regulators.
 To be clear, my point is that Rivest’s citation counts drop off relative to his most well-known works.
Thanks to Joe Bonneau for comments on a draft.
I’ve had a wonderful time at Stanford these last couple of years, but it’s time to move on. I’m currently in the middle of my job search, looking for faculty and other research positions. In the next month or two I will be interviewing at several places. It’s been an interesting journey.
My Ph.D. years in Austin were productive and blissful. When I finished and came West, I knew I enjoyed research tremendously, but there were many aspects of research culture that made me worry if I’d fit in. I hoped my postdoc would give me some clarity.
Happily, that’s exactly what happened, especially after I started being an active participant in program committees and other community activities. It’s been an enlightening and humbling experience. I’ve come to realize that in many cases, there are perfectly good reasons why frequently-criticized aspects of the culture are just the way they are. Certainly there are still facets that are far from ideal, but my overall view of the culture of scientific research and the value of research to society is dramatically more positive than it was when I graduated.
Let me illustrate. One of my major complaints when I was in grad school was that almost nobody does interdisciplinary research (which is true — the percentage of research papers that span different disciplines is tiny). Then I actually tried doing it, and came to the obvious-in-retrospect realization that collaborating with people who don’t speak your language is hard.
Make no mistake, I’m as committed to cross-disciplinary research as I ever was (I just finished writing a grant proposal with Prof’s Helen Nissenbaum and Deirdre Mulligan). I’ve gradually been getting better at it and I expect to do a lot of it in my career. But if a researcher makes a decision to stick to their sub-discipline, I can’t really fault them for that.
As another example, consider the lack of a “publish-then-filter” model for research papers, a whole two decades after the Web made it technologically straightforward. Many people find this incomprehensibly backward and inefficient. Academia.edu founder Richard Price wrote an article two days ago arguing that the future of peer review will look like a mix of Pagerank and Twitter. Three years ago, that could have been me talking. Today my view is very different.
Science is not a popularity contest; Pagerank is irrelevant as a peer-review mechanism. Basically, scientific peer review is the only process that exists for systematically separating truths from untruths. Like democracy, it has its problems, but at least it works. Social media is probably the worst analogy — it seems to be better at amplifying falsehoods than facts. Wikipedia-style crowdsourcing has its strengths, but it can hit-or-miss.
To be clear, I think peer review is probably going to change; I would like it to be done in public, for one. But even this simple change is fraught with difficulty — how would you ensure that reviewers aren’t influenced by each others’ reviews? This is an important factor in the current system. During my program committee meetings, I came to realize just how many of these little procedures for minimizing bias are built into the system and how seriously people take the spirit of this process. Revamping peer review while keeping what works is going to be slow and challenging.
Moving on, some of my other concerns have been disappearing due to recent events. Restrictive publisher copyrights are a perfect example. I have more of a problem with this than most researchers do — I did my Master’s in India, which means I’ve been on the other side of the paywall. But it looks like that pot may finally have boiled over. I think it’s only a matter of time now before open access becomes the norm in all disciplines.
There are certainly areas where the status quo is not great and not getting any better. Today if a researcher makes a discovery that’s not significant enough to write a paper about, they choose not to share that discovery at all. Unfortunately, this is the rational behavior for a self-interested researcher, because there is no way to get credit for anything other than published papers. Michael Neilsen’s excellent book exploring the future of networked science gives me some hope that change may be on the horizon.
I hope this post has given you a more nuanced appreciation of the nature of scientific research. Misconceptions about research and especially about academia seem to be widespread among the people I talk to both online and offline; I harbored a few myself during my Ph.D., as I said earlier. So I’m thinking of doing posts like this one on a semi-regular basis on this blog or on Google+. But that will probably have to wait until after my job search is done.
I was on a panel at the second FTC privacy roundtable in Berkeley on Thursday. Meeting a new community of people is always a fascinating experience. As a computer scientist, I’m used to showing up to conferences in jeans and a T-shirt; instead I found myself dressing formally and saying things like “oh, not at all, the honor is all mine!”
This post will also be the start of a new direction for this blog. So far, I’ve mostly confined myself to “doing the math” and limiting myself to factual exposition. That’s going to change, for two reasons:
- The central theme of this blog and of my Ph.D dissertation — the failure of data anonymization — now seems to be widely accepted in policy circles. This is due in large part to Paul Ohm’s excellent paper, which is a must-read for anyone interested in this topic. I no longer have to worry about the acceptance of the technical idea being “tainted” by my opinions.
- I’ve been learning about the various facets of privacy — legal, economic, etc. — for long enough to feel confident in my views. I have something to contribute to the larger discussion of where technological society is heading with respect to privacy.
Underrepresentation of scientists
As it turned out, I was the only academic computer scientist among the 35 panelists. I found this very surprising. The underrepresentation is not because computer scientists have nothing to contribute — after all, there were other CS Ph.Ds from industry groups like Mozilla. Rather, I believe it is a consequence of the general attitude of academic scientists towards policy issues. Most researchers consider it not worth their time, and a few actively disdain it.
The problem is even deeper: academics have the same disdainful attitude towards the popular exposition of science. The underlying reason is that the goal in academia is to impress one’s peers; making the world better is merely a side-effect, albeit a common one. The incentive structure in academia needs to change. I will pick up this topic in future posts.
The FTC has an admirable approach to regulation
As I found out in the course of the day’s panels, the FTC is not about prescribing or mandating what to do. Pushing a specific privacy-enhancing technology isn’t the kind of thing they are interested in doing at all. Rather, they see their role as getting the market to function better and the industry to self-regulate. The need to avoid harming innovation was repeatedly emphasized, and there was a lot of talk about not throwing the baby out with the bathwater.
The following were the potential (non baby hurting) initiatives that were most talked about:
- Market transparency. Markets can only work well when there is full information, and when it comes to privacy the market has failed horribly. Users have no idea what happens to their data once it’s collected, and no one reads privacy policies. Regulation that promotes transparency can help the market fix itself.
- Consumer education. This is a counterpart to the previous point. Education about privacy dangers as well as privacy technologies can help.
- Enforcement. A few bad apples have been responsible for the most egregious privacy SNAFUs. The larger players are by and large self-regulating. The FTC needs to work with law enforcement to punish the offenders.
- Carrots and sticks. Even the specter of regulation, corporate representatives said, is enough to get the industry to self-regulate. Many would disagree, but I think a carrots-and-sticks approach can be made to work.
- Incentivizing adoption of PETs (privacy enhancing technologies) in general. The question of how the FTC can spur the adoption of PETs was brought up on almost every panel, but I don’t think there were any halfway convincing answers. Someone mentioned that the government in general could go into the market for PETs, which seems reasonable.
As a libertarian, I think the overall non-interventionist approach here is exactly right. I’m told that the FTC is rather unusual among US regulatory agencies in this regard (which makes sense, considering that the FCC, for example, spends its time protecting children from breasts when it is not making up lists of words.)
Facebook’s two faces
Facebook public policy director Tim Sparapani, who was previously with the ACLU, made a variety of comments on the second panel that were bizarre, to put it mildly. Take a look (my comments are in sub-bullets):
- “We absolutely compete on privacy.”
- That’s a weird definition of “compete.” Facebook has a history of rolling out privacy-infringing updates, such as Beacon, the ToS changes, and the recent update that made the graph public. Then they wait to see if there’s an outcry and roll back some of the changes. It is hard to think of another company has had such a cavalier approach.
- “There are absolutely no barriers to entry to create a new social network.”
- Except for that little thing called the network effect, which is the mother of all barriers to entry. In a later post I will analyze why Facebook has reached a critical level of penetration in most markets which makes it nearly unassailable as a general-purpose social network.
- “Our users have learned to trust us.”
- I don’t even know what to say about this one.
- “We are a walled garden.”
- Sparapani is confusing two different senses of “walled garden” here. This was said in response to a statement by the Google rep about Google’s features to let users migrate their data to other services (which I find very commendable). In this sense, Facebook is indeed a walled garden, and doesn’t allow migration, which is a bad thing. But Sparapani said he meant it in the sense that Facebook doesn’t sell user data wholesale to other companies. That sounds like good news, except that third party app developers end up sharing user data with other entities, because enforcement of the application developer Terms of Service is virtually non-existent.
- “If you delete the data it’s gone.” (in the context of deleting your account)
- That might be true in a strict sense, but it is misleading. Deleting all your data is actually impossible to achieve because most pieces of data belong to more than one user. Each of your messages will live on in the other person’s inbox (and it would be improper to delete it from theirs). Similarly, photos in which you appear, which you would probably like gone when you delete your account, still live on in the album of whoever took the picture. The same goes for your pokes, likes and other multi-user interactions. These are the very things that make a social network social.
- “We now have controls on privacy at the moment you share data. This is an extraordinary innovation and our engineers are really proud of it.”
- The first part of that statement is true: you can now change the privacy controls on each of your Facebook status messages independently. The second part is downright absurd. It is completely trivial to implement from an engineering perspective (and LiveJournal for instance has had it for a decade).
There were more absurd statements, but you get the picture. It’s not just the fact that Sparapani’s comments were unhinged from reality that bothers me — the general tone was belligerent and disturbing. I missed a few minutes of the panel, during which he apparently he responded to a criticism from Chris Conley of the ACLU by saying “I was at the ACLU longer than you’ve been there.” This is unprofessional, undignified and a non-answer. Amusingly, he claimed that Facebook was “very proud” of various aspects of their privacy track record at least half a dozen times in the course of the panel.
Contrast all this with Mark Zuckerberg’s comments in an interview with Michael Arrington, which can be summed up as “the age of privacy is over.” That article goes on to say that Facebook’s actions caused the shift in social norms (to the extent that they have shifted at all) rather than merely responding to them. Either way, it is unquestionable that Facebook’s true behavior at the present time pays lip service to privacy, and Zuckerberg’s statement is a more-or-less honest reflection of that. On the other hand, as I have shown, the company sings a completely different tune when the FTC is listening.
Engaging privacy skeptics
Aside from Facebook’s shenanigans, I feel that that there are two groups in the privacy debate who are talking past each other. One side is represented by consumer advocates, and is largely echoed by the official position of the FTC. The other side’s position can be summed up as “yeah, whatever.” When expressed coherently, there are three tenets of this position (with the caveats that not all privacy skeptics adhere to all three):
- Users don’t care about privacy any more
- Even if they do, privacy is impossible to achieve in the digital age, so get over it
- There are no real harms arising from privacy breaches.
To the right is an illustrative example of a mainstream-media representative who was at the workshop covering it on Twitter through the lens of his preconceived prejudices.
Privacy scholars never engage with the skeptics because the skeptical viewpoint appears obviously false to anyone who has done some serious thinking about privacy. However, it is crucial to engage the opponents, because 1. the skeptical view is extremely common 2. many of the startups coming out of the valley fall into this group, and they are are going to have control over increasing amounts of user data in the years to come.
The “privacy is dead” view was most famously voiced by Scott McNealy. In its extreme form it is easy to argue against: “start streaming yourself live on the Internet 24/7, and then we’ll talk.” (To be sure, a few people did this 10 years ago as a publicity stunt, but it is obvious that the vast majority of people aren’t ready for this level of invasiveness of monitoring/data collection.) But engaging with skeptics isn’t about refutation, it’s about dealing with a different way of thinking and getting the message across to the other side. Unfortunately real engagement hasn’t really been happening.
I have a double life in academia and the startup world, and I think this puts me in a somewhat unusual position of being able to appreciate both sides of the argument. My own viewpoint is somewhere in the middle; I will expand on this theme in future blog posts.