
Blog spam (2)

We hates it. And we got more of it overnight than in the preceding month.


(a) All comments have the NOFOLLOW attribute enabled. If you're a spammer, and you successfully spam here, you won't get any joy from Google or the other search engines. You're wasting your time.

(b) I'm adding a three-minute timeout between postings.

(c) I'm automatically filtering postings with too many URLs.
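
For anyone wanting to try (b) and (c) on their own blog, here is a minimal sketch of both checks. It is not the code this blog runs. The class name, the choice to track the timeout per poster (keyed on something like an IP address), and the cutoff of 7 URLs are all assumptions made for illustration.

```python
import re
import time

URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)
MIN_INTERVAL_SECONDS = 3 * 60  # the three-minute timeout from (b)
MAX_URLS = 7                   # assumed cutoff; the post does not give a number


class CommentGate:
    """Hypothetical pre-screen run before a comment is stored."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS, max_urls=MAX_URLS):
        self.min_interval = min_interval
        self.max_urls = max_urls
        self.last_post = {}  # poster key -> time of last accepted comment

    def check(self, poster, body, now=None):
        now = time.time() if now is None else now
        # (c) reject comments carrying more links than the cutoff
        if len(URL_PATTERN.findall(body)) > self.max_urls:
            return "too many URLs"
        # (b) reject a second comment from the same poster inside the timeout
        last = self.last_post.get(poster)
        if last is not None and now - last < self.min_interval:
            return "posting too fast"
        self.last_post[poster] = now
        return "ok"
```

A rejected comment does not reset the timer, so a real commenter who trips the URL filter can trim the links and resubmit straight away.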

It's a nuisance, but I don't run this blog for the benefit of spammers.

In more serious spam-related news, my email inbox is now receiving between 400 and 500 spams per day. I have software to filter the crap out before I have to read it, but the sheer volume of spam prevents me from effectively overseeing the filters to ensure they don't accidentally junk mail from real humans who want to talk to me. (I used to skim the headers looking for signs that SpamAssassin was being over-enthusiastic, but if I'm away from the keyboard for just 24 hours it builds up to the point of no return.)

So if you send me email that you expect me to answer and don't hear from me, try using the comment form (see sidebar on this page).

When I get some free time I'll look into extra stuff like configuring greylisting on the mail server — but remember: time spent fighting spam is still time wasted by spammers.

Off with their heads!



How many URLs is "too many"? The Nielsen Haydens have a cutoff at >7.


I'd just like to take this opportunity to mention that you may be able to obtain a US Green Card! No, wait, I mean a low-rate mortgage, er, cheap cigarettes?

Wait, wait, I had it right here...


It's not spam - it's Unsolicited Commercial Email, and it is a shining tribute to the joys of free speech and the vigour of the capitalist system.

You commie.


How about CAPTCHAs?

Remove 'canspam' from email.


Adding Greylisting to your mail server is definitely worth the time spent. I added milter-greylist to the mail servers I run and it's stopping around 85% of the spam by itself.


Greylisting isn't a bad idea for a personal box, but it's a really, really bad idea for heavily used multiuser systems.

That all aside, spammers are created by stupid people who will buy things from them. Deal with the root cause.



CAPTCHAs are probably dead. All it needs is a cheap place to live with a cheap internet connection, and someone will do the work for peanuts.

I'm not sure how many peanuts you can buy for 60 cents, but for the UK equivalent of 35p I can get 50g, or about 2 oz.


"That all aside, spammers are created by stupid people who will buy things from them. Deal with the root cause."

How? By selling cyanide packaged to look like cut-price Viagra?

Oooh... Now THERE'S an idea for a short story...


Purdue University, where I work, deploys a PureMessage spam filter. It misses upwards of 10% of the spam, but almost never classifies a legitimate message as spam. (I've looked over the headers of thousands of messages it classified as spam and have yet to find an error.) Then client-side I just use a standard Mozilla spam filter to toss suspect emails into the "Junk" folder. I can look through those once a day with no difficulty. My spam-load is probably an order of magnitude less than Charlie's, but I think this two-tiered method might scale well. Of course I don't know how much Purdue pays for PureMessage...


Graylisting for email is AFAIK used by a number of very large sites and mail providers. The main issue for big sites is that you need to use a (largish) shared database for handling their volume of mail, but that has much to do with the extra problems that result from a good server hitting a different load-balanced server or backup MX when retrying. (If you're running a really big site, you're already running numerous databases; one more isn't a big deal.)

If you use a long timeout on the "seen-it" database entries, after you've run it for a little bit there are no delays for mailservers that regularly send you good mail. If you want to be extra-cautious, you run it for a while accumulating database entries of valid server-recipient tuples before you start using delays for senders not in the database. I never did implement graylisting, but most mail admins who do have loved it.
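
To make the idea concrete, here is a rough in-memory sketch of the "seen-it" database, including the learn-first mode described above. It is a toy, not any particular product's implementation. The class name, the (client IP, sender, recipient) key, the five-minute retry delay and the 36-day entry lifetime are all assumptions; a real deployment would keep this in a shared database rather than a dict.

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, min_delay=300, entry_ttl=36 * 24 * 3600, learning=False):
        self.min_delay = min_delay  # seconds a new tuple must wait before retrying
        self.entry_ttl = entry_ttl  # the long timeout on "seen-it" entries
        self.learning = learning    # True: record tuples but never delay mail
        self.seen = {}              # key -> (first_seen, last_seen)

    def check(self, client_ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        entry = self.seen.get(key)
        if entry is not None and now - entry[1] > self.entry_ttl:
            entry = None  # not seen for too long: treat as a stranger again
        if entry is None:
            self.seen[key] = (now, now)
            return "accept" if self.learning else "tempfail"
        first_seen, _ = entry
        if self.learning or now - first_seen >= self.min_delay:
            self.seen[key] = (first_seen, now)  # refresh so regulars stay known
            return "accept"
        return "tempfail"  # retried too quickly; a proper server will try again
```

A "tempfail" here stands for a 4xx temporary rejection at SMTP time. Because every accepted message refreshes the entry, servers that mail you regularly never see a delay after their first message.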

Another technique that works quite well is delaying before sending the mail server banner, especially if you use a longish delay for IPs which look "dynamic" - cable and DSL pools, etc. For further fun, randomize the delay within a range. As with graylisting, an RFC-compliant mailserver will have no problem with this, but spamware botnets don't want to wait around. This I did implement and it worked quite nicely.
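
A sketch of how the delay might be chosen, for illustration only. The hint strings used to guess that a reverse DNS name is "dynamic" and the two delay ranges are invented for this example; they are not the values from the system described above.

```python
import random

# Assumed substrings that suggest a cable/DSL/dial-up pool in a reverse DNS name.
DYNAMIC_HINTS = ("dsl", "cable", "dyn", "dhcp", "pool", "ppp")


def looks_dynamic(reverse_dns_name):
    """Crude guess at whether a client sits in a dynamic address pool."""
    name = (reverse_dns_name or "").lower()
    return any(hint in name for hint in DYNAMIC_HINTS)


def banner_delay(reverse_dns_name, rng=random):
    """Seconds to wait after the TCP connect before sending the 220 banner."""
    if looks_dynamic(reverse_dns_name):
        return rng.uniform(20.0, 30.0)  # longish delay for dynamic-looking IPs
    return rng.uniform(2.0, 6.0)        # short randomized delay for everyone else
```

The server would sleep for the returned number of seconds and then greet the client as normal.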

One technique you really want to avoid (one of many) is "sender verification". That results in what looks like a dictionary attack on other servers if the spammers use made-up names, while not making a great dent in the spam because so much of it uses valid victim addresses as the sender.

On the server side, SpamAssassin is kinda OK but only if you use it with the optional network tests and the Bayesian DB enabled (yes, another SQL DB) and train it enough for the Bayesian filters to really kick in. Without that its discrimination is only so-so and it has an unacceptable FP rate, IME. Also, it's a resource pig so you want to pre-screen as much as you can before getting to the possibly-OK mail you run through SA. The more accurate systems I know of are proprietary.

I was doing anti-spam server s/w as my main job for several years, so these are based on professional experience.

Blog spam poses a whole new set of problems, though. Have you tried running URLs in posts against SURBL, URIBL, etc. as part of the pre-screen? These are essentially DNSBLs of hostnames appearing in email spam bodies; a lot of the same sites show up in email spam and blog spam. Anything hitting those lists you needn't bother to look at.
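
For the curious, a lookup against these lists is just a DNS query for the hostname with the list's zone appended. The sketch below assumes the combined zones multi.surbl.org and multi.uribl.com, queries the full hostname as written (the lists generally expect the registered domain, so a real check would reduce it first), and assumes that an answer of 127.0.0.1 signals a refused query rather than a listing. Check each list's documentation and usage policy before relying on any of this.

```python
import re
import socket

HOST_RE = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)
BLOCKLIST_ZONES = ("multi.surbl.org", "multi.uribl.com")  # assumed zone names


def hostnames_in(text):
    """Collect the hostnames of every http(s) URL in a comment body."""
    hosts = {match.lower().strip(".") for match in HOST_RE.findall(text)}
    hosts.discard("")
    return hosts


def is_listed(host, zone, resolve=socket.gethostbyname):
    """True if <host>.<zone> resolves to a 127.x answer other than 127.0.0.1."""
    try:
        answer = resolve(f"{host}.{zone}")
    except OSError:  # no such name: the host is not on this list
        return False
    # Assumption: 127.0.0.1 means "query refused", not "listed".
    return answer.startswith("127.") and answer != "127.0.0.1"


def comment_hits_blocklist(text, zones=BLOCKLIST_ZONES, resolve=socket.gethostbyname):
    """True if any linked hostname appears on any of the configured lists."""
    return any(
        is_listed(host, zone, resolve)
        for host in sorted(hostnames_in(text))
        for zone in zones
    )
```

The resolve parameter is there so the check can be tried with a stand-in resolver instead of hitting the live lists.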

I seem to vaguely remember some post about blog anti-spam measures where one guy found the best method for him was to move the post CGI to a new URI, and add another fake "submit" button on the page which looked to spambots - but not to people - like the one to use. Messages posted with the wrong button or with the standard CGI URI went nowhere. No URL ref to give you, sorry.
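
From that description, the server-side half of the trick might look something like this. Every name here (the script path and both button names) is made up for illustration; the original post is not to hand, so this is a guess at the shape of it, not a reconstruction.

```python
REAL_POST_PATH = "/cgi-bin/comments-v2.cgi"  # hypothetical relocated script
REAL_BUTTON = "post_comment"                 # hypothetical button humans click
DECOY_BUTTON = "submit"                      # hypothetical bait for spambots


def should_accept(path, form_fields):
    """Accept only posts sent to the new URI with the real button pressed."""
    if path != REAL_POST_PATH:
        return False  # posted to the old, well-known script URI
    if DECOY_BUTTON in form_fields:
        return False  # pressed the fake button that people never see
    return REAL_BUTTON in form_fields
```

Anything that fails the check is dropped silently, so the spambot gets no hint that it picked the wrong button.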


That is one of the huge benefits of WordPress: getting to use Akismet for blocking spam. "Akismet has caught 1,647 spam for you since you first installed it." And only one that I know of was a false positive.


Christian: there's an Akismet plugin for Movable Type. Guess what I'm using? (Along with other stuff.)



About this Entry

This page contains a single entry by Charlie Stross published on September 6, 2006 2:30 PM.

Who's been listening on *my* line? was the previous entry in this blog.

Blair: The Final Countdown is the next entry in this blog.
