
Spam awareness 102

The spam has gone exponential.

I have just shut down comments on all but the last five blog entries. And I am switching off the ability to post comments globally for 24 hours. Normal service will be resumed tomorrow, i.e. on Saturday 17th around 4pm GMT (lunchtime if you're in New York). I am crossing my fingers and hoping that the shutdown will cause the spambots to give up and go away. It's a forlorn hope, I know, but it's worth a try.

If you're wondering what the 500 Server Errors are about, or why comments are currently closed, it's because the spam was coming in so thick and fast that the blog software couldn't classify it and bin it fast enough. None of it is getting through (it's really simple-minded stuff and easy to auto-detect and shitcan) but it clogs up the server's CPU and disks. In particular, this blog runs on Movable Type and MySQL, and has a dual-core 2.4GHz Athlon box to run on, about half the power of my last laptop. And it is being hammered. The trash can auto-empties after 30 days (yes, Alan, I tweaked the threshold) and currently holds a whisker under 59,000 spams; however, all but about 1,000-2,000 of them have come in the past two weeks.

(If you want to contact me wrt. spam filtering, try tweeting to @cstross. Note, however, that I've been running this system for more than six years, I have expert help, and I think we've explored most of the options for securing a Movable Type installation.)



Testing ... (ignore me)


Tentatively re-enabling comments to test a hypothesis.

Comments may go away suddenly if I'm wrong ...


OK, comments are back again now.

Let's hope the spam storm has subsided somewhat ...


... Aaaaand the spam has dropped off to about one every 4-5 minutes, which the server can handle. I'm pretty sure it's the same pestilential botnet that's been plaguing me for the past month.

(What made me reach for the big red switch was that I think a second botnet had joined in, and was hammering the server every 10-15 seconds. That one appears to be badly written because it (a) was sending HTTP POST messages direct to a hard-wired URL, and (b) was hammering so hard that the requests didn't have time to complete, which attracted the sysadmin's attention. But switching off the comments overnight and renaming the comment script seems to have gotten rid of it ... I hope.)


Hooray! This certainly seems to be fully responsive again.


I suspect moving the comment script may turn out to be a regular chore if the spammers are that dumb ...


Why are they doing this? And who the hell are they, anyway?


Spam may involve a surprisingly small number of people, with surprisingly little knowledge, and an unsurprisingly small number of operating system suppliers.


They're spammers. Why do you need to know more?

Email spam is declining because most folks don't rely on small mail servers any more -- they use Gmail or Hotmail or some other fiercely-policed system which can use statistical methods to block spam. (For example: if the same message body goes to a bazillion accounts, none of whom have received input from the sender before, and if the sender isn't a mailing list they've subscribed to, then it's probably junk.) So most spam (over 90% of email traffic) is never seen by the targeted recipients.
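To make the parenthetical rule above concrete, here is a toy sketch in Python. It is not how Gmail or Hotmail actually work; the `Delivery` record, the `probably_junk` function and the threshold value are all invented for illustration, and a real system would use far more signals than this.

```python
from collections import defaultdict
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Delivery:
    """One message landing in one mailbox (hypothetical record type)."""
    sender: str
    recipient: str
    body: str


def probably_junk(deliveries, known_contacts, subscriptions, threshold=1000):
    """Return senders whose identical message body reached many strangers.

    known_contacts: set of (recipient, sender) pairs with prior correspondence.
    subscriptions:  set of (recipient, sender) pairs for mailing lists the
                    recipient signed up to.
    threshold:      assumed stand-in for "a bazillion accounts".
    """
    # (sender, hash of body) -> recipients who have no prior link to the sender
    strangers = defaultdict(set)
    for d in deliveries:
        key = (d.sender, hashlib.sha256(d.body.encode("utf-8")).hexdigest())
        pair = (d.recipient, d.sender)
        if pair not in known_contacts and pair not in subscriptions:
            strangers[key].add(d.recipient)
    return {sender for (sender, _), rcpts in strangers.items()
            if len(rcpts) >= threshold}
```

The point of the sketch is that the rule only works when one operator can see a very large number of mailboxes at once, which is exactly what a small mail server cannot do.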

But the decline in email spam doesn't mean demand is declining. Spam is lucrative; it's so cheap to send that even a 0.0001% conversion rate (one in a million) is highly profitable.

So spammers are turning to other communication channels. Facebook is policed reasonably well, but spam on Twitter is rising ... and meanwhile, blogs hosted on small independent sites that host discussion threads are a promising target. Moreover, blogs like this one that have a high-ish Google pagerank (pagerank is a log function of how authoritative you are, on a scale of 1-10; this blog is usually around a 5 or 6, giving it tens of thousands of times more mojo than your typical random no-name blog) become desirable targets.


Incidentally, the spam appears to be word salad generated by a Markov-chain process fed with comments from other blogs, into which URLs are dropped at random.
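For readers who haven't met the term, a Markov-chain text generator can be sketched in a few lines of Python. This is a toy illustration of the general technique, not the spammers' actual code (which nobody here has seen); the function names and the `link_rate` parameter are made up for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the source text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, length, rng):
    """Walk the chain for `length` words (length >= 1, chain non-empty)."""
    starts = list(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (word never had a successor in the source): restart at random.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return out


def sprinkle_links(words, links, link_rate, rng):
    """Drop a randomly chosen link after a word with probability link_rate."""
    out = []
    for word in words:
        out.append(word)
        if links and rng.random() < link_rate:
            out.append(rng.choice(links))
    return " ".join(out)
```

Feed `build_chain` a pile of scraped comments and the output is locally plausible (every pair of adjacent words really did occur somewhere) but globally meaningless, which is what "word salad" means here.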

Here's a sample of the typical word salad (spammy links removed by yr. moderator):

Changing diapers is every parent’s job,abercrombie, and most of us do so happily with tender loving care,burberry, however when it comes to cleaning old fashioned cloth diapers,adidas f50, there is a problem. After complete rinsing of the contents,abercrombie france, throwing the soiled diapers into the family’s washing machine is repugnant to many. Those who do so though understand that they need to kill the bacteria contained in those diapers and thus have to resort to adding chlorine (bleach) to the wash. Unfortunately although the diapers come out white and stain free,nike tn pas cher, the bodies of their babies are subject to the harsh realities of chlorine-laden diapers.
Thus,louboutin, parents have switched to disposable diapers,air jordan pas cher, but upon discovering that even disposable diapers have been created using chlorine to give the paper products that ultra-white color that parents prefer,chaussures louboutin pas cher, parents have diligently searched for alternatives. “Natural” paper products are tan or brownish in color,borse burberry, and many objected to the color. Thus manufacturers of disposable diapers began making chlorine-free diapers that were white in color.This was no easily attained feat,ralph lauren, as white paper is bleached to attain that pristine white quality. However modern ingenuity found alternatives that are not harmful to the environment as chlorine bleached paper is, and babies are no longer trapped in products that contain harmful toxins against their tender skin—toxins,nike tn, in the form of dioxins that have been directly linked to producing cancer.

This shit (actually a lot more than this two-paragraph extract) is pounding the server every couple of minutes, day and night, larded with very dodgy-looking URLs leading to malware sites or other spam-infested blogs or web storefronts who have paid the spammers to optimize their search engine rankings.


Note that my iron rule of links -- any blog comment with >1 link in it gets held for moderation; anything with >4 links is thrown in the spam bin -- kills this stuff dead, because each paragraph of word salad typically contains 2-4 links.
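The rule itself is simple enough to write down. The following Python is only a sketch of the thresholds stated above, not the blog's actual Movable Type configuration; counting links with a regular expression over `http://` and `https://` URLs is an assumption made for the example.

```python
import re

# Assumption: a "link" is anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def classify_comment(text, hold_above=1, junk_above=4):
    """Apply the link rule: >1 link is held for moderation, >4 is binned."""
    links = len(URL_PATTERN.findall(text))
    if links > junk_above:
        return "spam"
    if links > hold_above:
        return "moderate"
    return "publish"
```

A word-salad paragraph carrying 2-4 links lands in "moderate" at best, and several such paragraphs in one comment go straight to "spam".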


I used to think that it would be possible to choke off some forms of comment spam. The ones with identifiable shop front URLs in them would seem to make it easy: you just target the shop owners with some sort of legal threat.
Of course they could be the victim of a competitor doing it to make them look bad.
Plus a lot of these websites look like shop fronts but don't seem to have any more reason to exist than to attract clickthroughs which enable them to claim advertising money from people who pay per click. They just get in the way of actually finding what you want to buy. Which then drives people further into the arms of the already known sources of online goods such as Amazon.


It's not going to stop until there is consistent international enforcement and spammers get serious jail time. Even politicians are now becoming aware that spam is not just some "schoolboy prank" that annoys their secretaries. That it is actually one prong of a national security nightmare. If and when there is a successful attack on our infrastructure that results in deaths, spam and botnets are probably going to be implicated.


It's not going to stop until there is consistent international enforcement and spammers get serious jail time.

Not gonna happen.

Spam isn't a serious political problem like, oh, censoring alternative political points of view or suppressing file sharing that chews into the MPAA's wallet (which in turn pays the politicians).

Spam is tending to be more and more of an organized crime activity these days, though, which suggests it's coming up on law enforcement radar. (This is because spammers these days rely on rented botnets, which in turn rely on computers compromised by malware, which is a law enforcement issue.)

Unfortunately, the poster child for a law enforcement solution to a nasty marketing activity is the War on Drugs.


Spam is good honest capitalism, and should of course be called 'Direct Marketing'. File sharing is communism, and costs billions of jobs every week.

And if you commies don't like that then build your own internet. Without good old American private enterprise, we'd still be using Telex.

(disclaimer- this post not fact-checked)


That's why it needs a trigger event like an attack on infrastructure that costs millions and kills people.


Hmmmm. I keep thinking about how cyberwar relies in part on things like botnets. Considering how many TLAs are bickering over some bit of the cyberwar military-industrial pie, I don't see anyone taking a coherent approach to spam any time soon. The interesting part is whether cyberwar even is war in the classical sense (e.g.

As a separate note, I like pointing out that most of the biological species in the world are parasites of some sort. Based on what's happened in biology, I doubt that getting rid of parasitic systems is possible. Living with parasites is doable, so that should probably be our goal online or offline.


Possibly the best method of reducing automated spam I've seen for blogs and forums is called "hashing and honeypots". Basically, you have one or more fields that are hidden (perhaps with CSS, perhaps with type="hidden", etc.) that have names like "email" and "url", and then you have the real email and url fields with random names (changing with each request). You might throw in a couple of extra randomly named fields as well.

Then, if any of the hidden fields are filled in, you just discard the message. (The labels for the fields would have things like: "please leave this field blank", for people without CSS.)
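A minimal sketch of the idea in Python, under some loud assumptions: `new_form` and `accept_submission` are hypothetical names, the submitted form is modelled as a plain dict, and the server is assumed to remember each request's field map (for example in a session) so it can decode the random names when the form comes back.

```python
import secrets

# Decoy fields with tempting names; hidden from humans, visible to bots.
HONEYPOT_NAMES = ("email", "url")


def new_form():
    """Build a fresh mapping of random field name -> real meaning per request."""
    return {secrets.token_hex(8): meaning
            for meaning in ("email", "url", "comment")}


def accept_submission(form, field_map):
    """Return the decoded fields, or None if any decoy field was filled in."""
    if any(form.get(name, "").strip() for name in HONEYPOT_NAMES):
        return None  # only a bot would fill in a field it was told to leave blank
    return {meaning: form.get(name, "")
            for name, meaning in field_map.items()}
```

A bot that blindly fills in anything called "email" or "url" trips the decoys, and one that replays a recorded POST finds that the real field names have changed since it last looked.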

If you read that entire post, it's actually a bit more in-depth, but the basic points are given above.

Talking of statistical analysis, isn't that what Akismet and co. are meant to do?

(Sorry for the possibly patronizing comment. However, even if you already know about the above option, I like to let others know as well. I hope you don't mind me using your soapbox as my soapbox for a little while.)


Part idle curiosity and part preparing to run my own servers, but do you have any idea of the geographical origin of the spam? Over the years I've seen various suggestions that if you firewall out IP addresses assigned to the .ru and .cn domains you reduce your problems massively. Of course, this isn't an option if you want a global reach.


No idea where it comes from, but checking IPs is probably useless -- the spammers themselves don't post it, they post via zombie machines that have been compromised by malware, and they could be anywhere.


There was a time when blocking China's /8's would help a lot -- not just from infected machines there, but spammers were renting lots of systems there.

Last numbers I saw, however, had the US back with the most infected machines. Russia is in bad shape as well.

Oh, and IPv6 is going to make it ever so much more fun.


That particular spam didn't look like the product of a Markov Chain. It looked like a properly written article that has had some (commercial?) irrelevant words inserted every now and again. /pedant


And we've seen just how effectively some governments have reacted to such events in the activities of the TSA and New York Police in the US and the Metropolitan Police in London (especially "Operation Kratos"); </snark>

Going after spammers isn't a high priority for law enforcement because of the transnational nature of the operation and the small number of people they can take on the perp walk at the end. Going after pedophiles is so much more productive: most people hate them or are afraid to say otherwise, and they make great television when paraded into jail.


Hi Charlie,

A little math tells me that your server is choking on an average of 3 spams a minute. Obviously the spam does not arrive at regular intervals, but this still implies some pretty fundamental problems with memory, CPU, etc. I'd guess that the server gets a big load of spam then starts using swap and bogs down.
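The working isn't shown, but the figure presumably comes from the numbers in the post: just under 59,000 spams in the bin, all but 1,000 to 2,000 of them from the past two weeks. A quick check in Python (the function name is made up for the example):

```python
MINUTES_IN_TWO_WEEKS = 14 * 24 * 60  # 20,160 minutes


def spams_per_minute(total_in_bin, older_than_two_weeks):
    """Average arrival rate over the last two weeks."""
    return (total_in_bin - older_than_two_weeks) / MINUTES_IN_TWO_WEEKS


low = spams_per_minute(59_000, 2_000)   # roughly 2.83 per minute
high = spams_per_minute(59_000, 1_000)  # roughly 2.88 per minute
print(f"{low:.2f} to {high:.2f} spams per minute")
```

This is an average over the whole fortnight; it says nothing about the size of the bursts.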

Hardware is cheap.


You missed the change in style/tone when the original nappy-cleaning article ended and whatever-followed-it-before-I-hit-delete carried on.


Alex, it's more like five-plus spams a minute. And what cripples the server is that it's highly tuned for handing out static HTML pages, because 99% of the time when someone loads a page on this site they're not issuing a POST request to submit a comment but sending a GET request to reload and read a page. Posting a comment triggers a heavyweight process that invokes a bunch of plugins to pre-filter stuff, then triggers multiple socket connections to other servers (specifically, spam blacklist services), and then a whole bunch of MySQL activity including INSERTs. Finally, if the comment isn't flagged as spam, it triggers a mammoth SELECT operation to rebuild the entry (with updated comments) and then write the new HTML out to disk.

You seem to have forgotten that I run this as a hobby. I could commercialize, in which case the extra servers would come out of the budget, but then I'd have to get sponsors/advertisers, keep accounts for HMRC, and lots of boring and annoying shit: I do not want to run a commercial website any more than I want to be a commercial publisher. As it is, I can justify spending £1000 a year and a chunk of time (probably 50 working days a year) on the blog as a marketing exercise with no direct payback. What you're proposing, the next level up, would not only cost money (a couple of grand a year extra for beefier hardware) but also time. Or more money to pay sysadmins to do the work for me, an advertising sales manager to get in the ads, a bookkeeper to do the accounts ... all of which requires management input from me, i.e. time, only this time spent managing other people rather than un-bunging the toilet u-bend myself.

Finally, did you notice my comment about the spam load going up exponentially? Twelve months ago it was about five spams per day. Three months ago it hit five spams per hour. Throwing money at extra hardware (or MySQL performance tuning) or a cache for the static HTML pages (irrelevant as they're not where the bottleneck lies) will not solve the problem in the long term.
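To put rough numbers on "exponentially", here is a back-of-envelope sketch using only the three rates quoted in this comment (five a day twelve months ago, five an hour three months ago, five-plus a minute now). It assumes smooth exponential growth between each pair of data points, which is a simplification, and `doubling_time` is a name invented for the example.

```python
import math


def doubling_time(rate_start, rate_end, months):
    """Months per doubling, assuming steady exponential growth in between."""
    return months * math.log(2) / math.log(rate_end / rate_start)


per_day_12_months_ago = 5
per_day_3_months_ago = 5 * 24        # five an hour
per_day_now = 5 * 24 * 60            # five a minute (a lower bound)

early = doubling_time(per_day_12_months_ago, per_day_3_months_ago, 9)
late = doubling_time(per_day_3_months_ago, per_day_now, 3)
print(f"doubling every {early:.2f} months, then every {late:.2f} months")
# -> doubling every 1.96 months, then every 0.51 months
```

On these figures the doubling time has itself been shrinking, so any fixed increase in server capacity would be used up within weeks.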


Why did I suggest new hardware? Because I expect your software bottlenecks will be solved by other people while you spend your time on something more pleasant/productive. I believe that the time vs. money equation will come out heavily in your favor.* If I had a six-year-old server that was behaving like yours I'd fire it out of a cannon. And I wouldn't stick around to watch it land.

"I can't handle 5 spams a minute" points to a whole host of really obvious issues,** but for you to diagnose and code/configure around those issues is a gigantic waste of your time. Get some new hardware, let someone else solve the problem(s), and enjoy life.

I'm definitely not suggesting that you turn your blog into a business. You'd write less and that would be bad.

*My hobby involves Ruby, Postgres, Apache, Linux, and (unfortunately) JavaScript, but you'll notice that I gave no advice on that basis.

**Which issue it might be I have no idea and couldn't care less. My suggestion is based on the idea that you shouldn't bang your head against that particular wall.


Are other open-comment sites with similar pageranks being similarly hit?


As most readers are aware, it's no great trick to hire more code monkeys. I don't know in detail what the IT scene is like where Charlie lives, but it's obvious that plenty of folks in his social circle (including him) have some skill in computers.

What cannot be outsourced is the ability to write cool Charles Stross stories; therefore the optimal strategy is to get someone or something else to wield the spam hammer.

Me, I don't care who keeps the spam out of the blog; I do look forward to buying more Stross novels. I suspect this is a very common attitude here.




But...the blog helps Charlie to find other people to buy from his backlist.

This is marketing, and it does work: I live in a world where there are no bookstores that stock Charlie's work, so until I stumbled on the blog, I had read none of it.

Now, I've read all of it, bought extra copies, recommended it to friends.

It's an impossible balance: do the things that people pay you to do, or do the things that get more people. I would much rather have more novels, and would never encourage Charlie to spend more of his time on running the blog, rather than writing it. He's right: the constraints are on time, not money.

I teach conversational English to children, and listen every week to people who want to talk to me about how to run my business better. I don't WANT to run my business: it exists solely to make it legal for me to sit at a table talking. There is an illogical trade-off: I can worry about getting paid, or I can worry about teaching, but I don't really have time to worry about both.

Luckily for me, the number of people who don't pay is small enough to justify my not spending time on it. Charlie's not that lucky: spam evolves.

Thanks, Charlie.


This may come as a surprise to you, but I do NOT expect other folks to solve the software problem here, because it's an arms race and the adversary has more money and resources than any individual blogger or small open source blogging software company. A fix to the immediate problem will simply buy a breathing space until the hacking group within a corporation making multiple millions a year on blackhat SEO finds a way around it.

Been there, done that, with email: for years I ran my own mail server (and it's still there), but when the spam inbound for my mailbox every 24 hours hit an average of 20,000 messages -- spiking to 60,000 a day when bounces from Joe Jobs came in -- I had to give up and switch to using Gmail as a spam-washing service. SpamAssassin was killing over 99% of the spam, but it wasn't enough, and furthermore, the CPU load from scanning that much spam and rejecting it was crippling the server.

I see this blog spam problem going the same way: ultimately it'll be self-defeating, but before it gets that bad it will kill the small independent comment sites.

I can keep blogging without comments, but something tells me you don't want that. Right?

Nor am I terribly happy about paying a sysadmin or programmer to try and fix the problem for me because such professionals earn within the same order of magnitude as I do; I'd expect to pay multiple hundreds per day of work for a contractor, and as I've noted above it's not a wave-magic-wand problem-fixed-within-four-hours job.

If I really wanted the problem solved I'd go back to basics, sit down to make a plan, then form a set up to write and market a secure, spam-proof web discussion forum. I'm pretty sure I could do it, too, but I'd have to say goodbye to around 5-10 years of writing time in the name of founding and operating a start-up ...


You could just require registration and approve new users' first posts before allowing them to comment freely. This will make spamming your blog too much work.


Yes well, the last MT security update broke sign-in. And I can't track down why (despite some skull sweat on the problem). And it's a mandatory security update to avoid some rather nasty exploits ...


Or a simple answer-the-question box, which must be answered correctly, otherwise the thing dumps?

Such as: "What's Two plus Two? - And answer with a numeric please!"

Any reply that does not contain "4" in the box gets dumped ...
Is that too simple, or is implementation too difficult, or what?
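For what it's worth, the check itself is only a few lines. This Python sketch is an illustration of the suggestion above, not anything the blog actually runs; `passes_challenge` is a made-up name, and treating "contains 4" as "contains the standalone number 4" (so that "14" fails) is an assumption.

```python
import re


def passes_challenge(answer, expected="4"):
    """True if the reply contains the expected number as a whole number."""
    numbers = re.findall(r"\d+", answer)
    return expected in numbers
```

Whether it would stop a bot for long is a separate question: a single fixed question can presumably be hard-wired into a script as easily as a hard-wired URL.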


Well, soon enough spam bots will start to pass the Turing test, and we'll find out if they are true AI...


I think this is the appropriate reply at this point.


We can but hope.


The future looks like this: "If we bring your mother back now it will only be for one day. And you will never be able to see her again. But wait. There's more!"


What worries me is that I've received 3 text messages in the past month which are spam. Which doesn't sound like much, until you cast your mind back to the days when a dozen spammers were terrorizing the internet, sending *thousands* of e-mails a week.

Does anybody know of text-blocking apps for the iPhone? (asks the lazy guy)



About this Entry

This page contains a single entry by Charlie Stross published on March 16, 2012 4:14 PM.

A creativity game was the previous entry in this blog.

Gadget patrol: the new iPad is the next entry in this blog.
