The Unsung Success of CAN-SPAM
In today’s debate around Do Not Track, detractors frequently make a comparison to the CAN-SPAM Act and how it failed to stop spam. Indeed, in 2010 an average of 183 billion spam emails were sent per day, so clearly the law was thoroughly ineffective.
Or was it?
Decrying the effect of CAN-SPAM by looking at the total number, or even the percentage, of spam emails betrays a lack of understanding of what the Act was intended to do and how laws operate in general. Clearly, the Act does nothing to deter spammers in Ukraine or China; it’s not like the legislators were oblivious to this. To understand the positive effects that CAN-SPAM has had, it is necessary to go back to 2003 and see why spam filters weren’t working very well back then.
To a first approximation, a spam filter, like all machine learning-based “classifiers,” works by representing an email as a point in a multi-dimensional space and looking at which side of a surface (such as a “hyperplane”) it falls on. The hyperplane is “learned” by looking at already-classified emails. When you click the “report spam” button, you’re “training” this classifier, and it tweaks the hyperplane to become slightly more accurate in the future.
For emails that look obviously like spam, the classifier will essentially never make a mistake, no matter how many millions of them it sees. The emails it has trouble with are those that have some properties of legitimate email (“ham”) and some properties of spam: those close to the boundary.
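To make the hyperplane picture concrete, here is a minimal sketch, assuming a perceptron (one of the simplest ways to learn a separating hyperplane; real filters use a variety of other learners) and a bag-of-words representation in which each distinct word is one dimension. The four training emails and the function names `features`, `score` and `train` are hypothetical. The sign of `score` says which side of the hyperplane an email falls on, and its magnitude (proportional to the distance from the hyperplane) says how clear-cut the call is.

```python
from collections import Counter


def features(text):
    """Bag of words: each distinct word is a dimension, its count the coordinate."""
    return Counter(text.lower().split())


def score(weights, bias, x):
    """Signed score; > 0 means the email is on the spam side of the hyperplane."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in x.items())


def train(examples, epochs=10, lr=1.0):
    """Perceptron: whenever a labeled email is on the wrong side, shift the hyperplane.

    Each (text, is_spam) pair plays the role of one "report spam"/"not spam" click.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, is_spam in examples:
            x = features(text)
            y = 1 if is_spam else -1
            if y * score(weights, bias, x) <= 0:  # misclassified (or on the boundary)
                for w, c in x.items():
                    weights[w] = weights.get(w, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias


# Hypothetical toy training set.
EXAMPLES = [
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("meeting notes attached", False),
    ("quarterly newsletter attached", False),
]

if __name__ == "__main__":
    w, b = train(EXAMPLES)
    for text in ["buy cheap pills now", "meeting notes attached", "newsletter attached buy now"]:
        print(text, "->", score(w, b, features(text)))
    # buy cheap pills now -> 4.0          (clearly spam)
    # meeting notes attached -> -3.0      (clearly ham)
    # newsletter attached buy now -> 1.0  (close to the boundary)
```

The obvious spam lands far on the spam side and the meeting email far on the ham side, while a newsletter that borrows spammy words (“buy now”) lands just barely on the spam side, which is exactly the borderline zone described above.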
It is difficult for spammers to make their emails look legitimate, because ultimately they need to sell you a penis-enlargement product or whatever other scam they’re peddling. Back in the good old days when spam filters were hand-coded, spammers would evade them with tricks like replacing the word Viagra with Vi@gra. But the magic of machine learning means that modern filters update themselves automatically, and very quickly.
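That automatic updating can be sketched with a different, equally simple learner: naive-Bayes-style token statistics, which in log-odds form also amount to a linear (hyperplane) classifier over word counts. In this hypothetical sketch, each “report spam” click adds the email’s words to the spam counts, so a freshly invented spelling like `vi@gra` starts out neutral and becomes a spam signal after a couple of reports. The class name `TokenFilter`, the add-one smoothing and the equal class priors are illustrative assumptions, not a description of any particular filter.

```python
import math
from collections import Counter


class TokenFilter:
    """Hypothetical minimal naive-Bayes-style filter that learns from user reports."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.vocab = set()

    def report(self, text, is_spam):
        """Record a user's "report spam" (is_spam=True) or "not spam" click."""
        tokens = text.lower().split()
        (self.spam_counts if is_spam else self.ham_counts).update(tokens)
        self.vocab.update(tokens)

    def log_odds(self, text):
        """Sum of per-token log-likelihood ratios (add-one smoothing, equal priors assumed).

        Positive means spam is more likely; 0.0 means no evidence either way.
        """
        n_spam = sum(self.spam_counts.values())
        n_ham = sum(self.ham_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        total = 0.0
        for tok in text.lower().split():
            p_spam = (self.spam_counts[tok] + 1) / (n_spam + v)
            p_ham = (self.ham_counts[tok] + 1) / (n_ham + v)
            total += math.log(p_spam / p_ham)
        return total


if __name__ == "__main__":
    f = TokenFilter()
    f.report("buy viagra now", is_spam=True)
    f.report("lunch meeting now", is_spam=False)
    print(round(f.log_odds("vi@gra"), 3))  # 0.0: never seen before, so neutral
    for _ in range(2):  # two more "report spam" clicks on the obfuscated spelling
        f.report("vi@gra deal", is_spam=True)
    print(round(f.log_odds("vi@gra"), 3))  # 0.788: now evidence of spam
```

No one had to write a rule for `vi@gra`. The filter picked it up from ordinary user reports, which is why character-substitution tricks stopped paying off.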
Ham that looks like spam is much more of a problem. Email marketing is a grey area, and marketers will do anything they can to entice you to open their messages. Why honestly title your email “October widget industry newsletter” when you can instead title it “You gotta check this out!!” Compounding this problem is the fact that people get much more upset by false positives (legitimate messages getting lost) than false negatives (spam getting through to the inbox).
It now becomes obvious how CAN-SPAM made honest people honest (and the bad guys easier to prosecute), and how that changed the game. The rules basically say, “don’t lie.” If you look at a corpus of email today, you’ll find that the spectrum that used to exist is gone: there’s obviously legitimate email (that intends to comply) and obviously illegitimate email (that doesn’t care). The blue dots in the picture, the legitimate messages that used to sit near the boundary, have been forced to migrate up or risk being in violation. As you can imagine, spam filters have a field day in this type of situation.
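The combined effect of asymmetric costs and the disappearing middle ground can be sketched in one dimension, by projecting each email onto the direction perpendicular to the hyperplane to get a single “spamminess” score. The hypothetical function `best_threshold` picks the cutoff that minimizes total cost, assuming a false positive is ten times as bad as a false negative (the 10:1 ratio is an illustrative assumption). The score lists are made-up toy values, not measurements: in `HAM_BEFORE`, some legitimate marketing email scores as spammy as real spam; in `HAM_AFTER`, it has moved away from the boundary.

```python
def best_threshold(ham_scores, spam_scores, fp_cost=10.0, fn_cost=1.0):
    """Return (t, cost, false_positives, false_negatives) for the cheapest cutoff t.

    An email is flagged as spam iff its score is strictly greater than t.
    """
    candidates = [float("-inf")] + sorted(set(ham_scores) | set(spam_scores))
    best = None
    for t in candidates:
        fp = sum(1 for s in ham_scores if s > t)    # legitimate mail lost
        fn = sum(1 for s in spam_scores if s <= t)  # spam let into the inbox
        cost = fp_cost * fp + fn_cost * fn
        if best is None or cost < best[1]:
            best = (t, cost, fp, fn)
    return best


# Toy spamminess scores (hypothetical, for illustration only).
SPAM = [0.5, 0.65, 0.75, 0.85, 0.9, 0.95]
HAM_BEFORE = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]    # some marketing ham looks spammy
HAM_AFTER = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35]  # ham has moved away from the boundary

if __name__ == "__main__":
    print(best_threshold(HAM_BEFORE, SPAM))  # (0.8, 3.0, 0, 3): half the spam gets through
    print(best_threshold(HAM_AFTER, SPAM))   # (0.35, 0.0, 0, 0): nothing misclassified
```

With overlapping classes, the cost-minimizing filter has to be conservative and lets half of the toy spam through to protect the ham; once the ham moves away from the boundary, a single cutoff catches everything.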
And I can prove it. Instead of looking at how much spam is sent, let’s look at how much spam is getting through. Obviously this is harder to measure, but there is a simple proxy: search volume. The logic is straightforward: people who have a spam problem will search for it, in the hope of doing something about it.
Aha! A fivefold decrease in search volume since CAN-SPAM was passed. That doesn’t prove that the decrease is necessarily due to the Act, but it does prove that those who claim spam is still a major problem have no clue what they’re talking about.
Some unsolicited email is legitimate under CAN-SPAM, yet most people would consider it spam as well. Here’s where another provision of the Act comes in: a mandatory, working unsubscribe (opt-out) mechanism. Michael Dayah reports on an experiment showing that for this type of spam, unsubscribing is almost completely effective.
Incidentally, his view of CAN-SPAM concurs with mine:
The CAN-SPAM act then strongly bifurcated spammers. Some came into the light and followed the rules, using relevant subjects, no open relays, understandable language, and an unsubscribe link that supposedly functioned. Others went underground, doing their best to skirt the content filtering with nonsense text and day-old Chinese landing domains.
I would go so far as to say that the Act is a good model for the interplay between law and technology in solving a difficult problem. I’m not sure to what extent the lawmakers anticipated the developments that followed its passage, but CAN-SPAM is completely undeserving of the negative, even derisive, reputation that it has acquired.