Admin note

We've picked up another spammer botnet. Blog spam is now coming in at around one every ten seconds. I am therefore switching off comments on all threads for a while until the scumbags stop tormenting my server and go away.

(The spam is being detected and binned. Unfortunately the process of doing so is machine-intensive: it involves running a CGI script that polls the Akismet servers over the internet, does regular expression matches for spam keywords, executes a SQL database insert, and a bunch of other stuff. There's no way around this bottleneck: it's a side-effect of the blog being designed to support many orders of magnitude more people reading it than actively posting comments, and it works okay until a botnet with thousands of hosts starts firing spam at it every couple of seconds.)

Based on past experience the spammers were attracted by my essay on what Amazon's ebook strategy means, and they'll go away after a few hours if I block all inbound comments.

Normal service will be resumed presently ... meanwhile I can be found at @cstross on Twitter.

Update: re-enabling comments. Will switch them off again if the spammers are still hammering on the front door.

31 Comments

1:

Since you've seen a few of these come and go now, do you have any sense of what the trigger mechanism might be? Volume, keywords, ???.

2:

Makes me wonder about a pseudorandom human-detection front end to the submit, before spam detect, during high load (likely spam flood) conditions. Not captchas but, say, a 'click on the image that says "human"' prompt... Just a little algorithmic path change here and there...

3:

I suspect the trigger event is a blog entry that rips around the internet and brings the blog to the attention of a bunch of spammers.

Right now, we get hit by bot-spam once every 2-10 minutes. But -- and this has happened a couple of times now -- if I have a major viral post, the frequency ramps up to one every 5-10 seconds (at which point I run into trouble). Switching off comments globally for 6 hours and closing old threads seems to get rid of the newcomers.

My posting about Amazon's business strategy went crazy (linked to by Bloomberg, ZDNet, Forbes, Publishers Weekly, a bunch of libertarian watering holes, and so on) and got nearly 150,000 page views that I know of -- it's also been translated into Portuguese and Japanese to my certain knowledge (and maybe other languages by folks who weren't as polite). That's guaranteed to make it an attractive target for spammers, and a certain proportion fail to notice the "rel=nofollow" appended to all the links in posted comments.

It's not that spammers are stupid; it's just that spamming is dirt cheap and the ROI -- idiots who click through the links and buy stuff -- is high enough to encourage them to keep on spamming.

4:

I think I said something similar last time, but why not add some extra ('honey-pot') fields and randomly rename the fields you do have?

It must surely be less machine intensive to block machine spam at the door because they entered something in a field that should have been left blank, than to send it on a round trip around the Internet.
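A minimal sketch of the honeypot check described above, in Python rather than MT's Perl. The field name "website_url" and the function name are hypothetical; the assumption is that the decoy field is hidden from human visitors with CSS, so only a bot filling in every input would send a value for it.

```python
# Hypothetical decoy field names: real visitors never see these inputs,
# so a non-empty value suggests an automated submission.
HONEYPOT_FIELDS = ("website_url",)


def is_probably_bot(form):
    """Return True if any decoy field in the submitted form is non-empty.

    `form` is assumed to be a dict mapping field names to string values.
    """
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)


if __name__ == "__main__":
    human = {"author": "Alice", "text": "Nice post", "website_url": ""}
    bot = {"author": "x", "text": "buy pills", "website_url": "http://spam.example"}
    print(is_probably_bot(human), is_probably_bot(bot))
```

A check like this needs no network round trip or database access, so it can run before any of the expensive spam filtering. It does not cover the random field renaming also suggested here.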

I also wonder, were you requiring JS before this event? (I can't recall, as I rarely comment.) If so, any ideas why it didn't do anything? (The spammers were executing and running the JS?)

5:

Ever tried hacking on the Movable Type code base?

(Every time I look at it my brain tries to melt and run out through my nose. CGI.pm it ain't.)

6:

Ah, I thought you said you had an expert for that sort of stuff. I may be misremembering.

(And no, I've never even looked at MT. Mainly because I like to hand-roll half-baked software for my own use, and suggest WordPress for everyone else. I'm more than happy to knock up a WordPress plugin that'll do what's needed though. Also, isn't MT Perl? I've never tried to learn Perl. Too easily scared by the "write only language" propaganda ^_^.)

7:

TurdPress is implemented in PHP. I refuse to let PHP onto my server.

(Perl I used to do for a living. It's not write-only any more than C or C++ are write-only -- you have to enforce decent coding style, but that's all.)

8:

Kinda off topic question, but belongs here better than in your regular blog posts:

Why doesn't antipope.org have a DNS entry that points here? I'm of the dying breed that types addresses in the address bar and having to type the useless www. in the beginning becomes mildly annoying after a few days of checking your blog :)

9:

Because I can't be arsed adding one?

(Some time I'll try and add a CNAME or a default for the domain. Then I'll have to go wrestle with Apache, though, which is configured to handle multiple domains. Which will annoy me, because I've let those skills atrophy because they are orthogonal to what I do for a living these days.)

10:

Mrrr apache and multiple domains... okay, i'll just try and remember to type the 'www'. Maybe in the 202x decade i'll even start using that nifty bookmark feature...

11:

If you add an A record for antipope.org pointing to the same place as www.antipope.org it's reasonably easy (

12:

(Oh dear oh dear oh dear. I didn't escape my "<" in the previous comment.)

...it's reasonably easy (less than 10 lines) to configure Apache to return a permanent redirect from antipope.org/whatever to www.antipope.org/whatever

13:

Hmm I posted in the Single Point of Failure article but my post hasn't appeared. It was waiting approval because it had a link in it but the spam attack hit before it could be, any chance it got lost in the filter?

14:

I just type the letter A in Chrome's address field and it takes me directly to Charlie's blog at http://www.accelerando.org

I prefer using that instead of the antipope URL, though when I feel nostalgic (from before the accelerando domain addition), I may bow to the antipope.

15:

i'll just try and remember to type the 'www'.

All I have to type is 'ant' and the summary page becomes the first choice. Down arrow then enter.

16:

I quite liked those JVP comments. Not quite off/on topic. Most interesting.

17:

www.antipope.org reports itself as running a Debian+Apache 2.2 combination. Charlie has talked about his server specs in the past, and that it is not a very big piece of iron. The site runs on Movable Type, which is written in Perl.

Apache 2.2 is slow. Perl, alas, is not the fastest interpreter. Charlie's server is plenty of iron to run this site for the normal visitors, but it is also not spec'ed to run the site during heavy traffic spikes.

The one spam comment every 5-10 seconds may become manageable with honey pot fields, or it may not. One every 5-10 seconds certainly doesn't sound like as fast as a bot can go (which makes me wonder if it is as fast as the server can go, and the honey pot would just mean rejecting 5 every second and still being overloaded...).

I think the "block the comment script for six hours" is probably the best way to manage this right now.

(Ok, this sysadmin is going back into Lurk Mode. Safe travels this week, Mr. Stross.)

18:

Jeff, Charlie wrote: "Blog spam is now coming in at around one every ten seconds."

I suspect that this is an average, and single spams 10 seconds apart should not be a problem for Charlie's system. The problem is that the spams don't arrive evenly spaced: with an average of one every ten seconds, several can still come in within a couple of seconds. Since Perl is an interpreted language, that means several interpreters running at once, and it gets worse if one of the anti-spam servers is running a little slow for some reason, because all those Perl interpreters are just waiting around until their programs get an answer. Pretty soon the system is running in swap and Apache is throwing out 500 errors...
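To put a rough number on that, here is a back-of-envelope sketch in Python. It assumes, purely for illustration, that spam arrivals follow a Poisson process at the quoted average of one every ten seconds. Real botnet traffic is likely burstier, so the result is probably optimistic. The function name is made up for this example.

```python
import math


def prob_at_least(k, rate_per_second, window_seconds):
    """P(at least k arrivals in the window) for a Poisson arrival process."""
    lam = rate_per_second * window_seconds  # expected arrivals in the window
    # Sum the probabilities of 0 .. k-1 arrivals, then take the complement.
    p_fewer = sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # One spam per ten seconds on average, looking at a two-second window.
    p = prob_at_least(2, rate_per_second=0.1, window_seconds=2.0)
    print(f"P(2 or more spams within 2 s) = {p:.4f}")
```

Under those assumptions roughly 1.75% of two-second windows see two or more submissions, which is around thirty such windows every hour, each one starting overlapping interpreters.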

The real problem is that MT doesn't have a queuing system.
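A minimal sketch of what such a queue could look like, written in Python rather than MT's Perl. Nothing here is Movable Type code: the function names and the max_pending limit are hypothetical. The idea is that the web front end only appends to a bounded queue, while a single worker runs the slow spam checks one at a time.

```python
import queue


def make_comment_queue(max_pending):
    """Create a bounded queue; max_pending caps how many comments may wait."""
    return queue.Queue(maxsize=max_pending)


def enqueue_comment(q, comment):
    """Try to queue a comment without blocking.

    Returns False when the queue is full, so the front end can refuse the
    submission cheaply instead of starting another slow spam check.
    """
    try:
        q.put_nowait(comment)
        return True
    except queue.Full:
        return False


def drain(q, check_and_store):
    """Process queued comments strictly one at a time.

    check_and_store stands in for the slow work (remote spam lookup,
    database insert, page rebuild). Returns the number of comments handled.
    """
    handled = 0
    while True:
        try:
            comment = q.get_nowait()
        except queue.Empty:
            return handled
        check_and_store(comment)
        handled += 1


if __name__ == "__main__":
    q = make_comment_queue(max_pending=2)
    for text in ("first", "second", "third"):
        print(text, "accepted" if enqueue_comment(q, text) else "rejected")
    print("handled", drain(q, lambda c: None))
```

With this shape, a burst costs at most max_pending queued items plus one running check, rather than one interpreter per inbound comment.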

19:

Quick aside. Charlie, just wondered, what's your beef with PHP?

Just asking because I'm no programmer; I've dabbled with PHP a bit. I'm mainly focussed on networking, as far as IT goes. But as I might have a lot more free time soon (upcoming redundancy), I fancy getting back into programming more seriously, mainly in terms of web dev, plus Perl for the network admin scripting stuff. FWIW.

20:

I can't speak for Charlie here, but this recent article is typical of many people's views on PHP.

21:

Actually, my site's bulletproofed to handle very heavy traffic spikes ... as long as the traffic is read-only. It routinely gets 150,000-300,000 hits a day, and it's survived one bombardment that hit 300,000 hits per hour -- although I had to switch off comments and unpublish all comments on the thread in question to reduce the page download size.

That's because the blog itself is just static HTML plus a bit of JavaScript, some style sheets, and a couple of graphics. It's small enough that the main content probably sits in the box's buffer cache the whole time, unless it's taking such a hammering that it goes CPU-bound and runs a couple of hundred instances of everything. (Highest load average I've actually witnessed was north of 70, back during the really heavy traffic spike last year. That'll nail ya, every time.)

What cripples it is when the comment system gets hammered -- either by spammers or, rarely, by lots of real people trying to comment. The comment system is dynamic, has to wake up a MySQL server (running on the same machine) to do various SELECT and INSERT operations, then re-builds the entire fricking page (if the inbound comment is actually publishable and not spam). Along the way it also polls several blacklist servers, which entails yet more traffic.

Perl is actually very fast, as interpreters go. The trouble is that the MT comment system isn't optimized for handling high volumes and doesn't seem to have any fast-track way of rejecting obvious junk. When I get a couple of days free I'll see about adding a honeypot field, but the MT manuals are opaque -- they don't go out of their way to make it easy to modify -- and my Perl is rusty (I used to be highly proficient, but "use it or lose it" takes its toll over a decade).

22:

I think that #7, and the fact that most people I know refer to that software by that name or things like "Wordpretzel", "Wierdpress"... should say it all.

23:

(Thanks to whatever mod found and approved my comment)

Off topic but I'm not sure where else to post it; for a few weeks now I haven't been able to see people's avatars. Older posts still have Charlie's picture next to his posts but newer ones don't. Anyone know how to fix this, or is it something with the site and not my browser?

24:

I think it's site/set-up related. Bellinghman has recently reacquired his avatar, if you look at posts on the last 3 or 4 threads and from the last 3 or 4 days.

25:

It's the site, not you. When Charlie did an emergency update of the MT software, the sign-in code broke. The result is that the bit of script that ought to be doing the sign-in from a comment page is broken, and is trying to execute something which just isn't code. (If you're running Opera and watching the error console, you can see that. My current suspicion is it's trying to execute a 404 page.)

When you see my avatar or Sean Eric Fagan's on newer posts, that's not because we've managed to sign in through that portal. It's because a handful of people have another way to comment which is not, and will not be, generally available.

26:

"I suspect the trigger event is a blog entry that rips around the internet and brings the blog to the attention of a bunch of spammers."

'and some stats'

It's probably a bunch of software, running off a delta-volume trigger. The math doesn't really work for meatware even if we double the hit count, and hold time to, say, 24 hours. Which implies that the economics may be even further against you than you may have thought.

Bad news, but useful for capacity planning. Or might there be a simple way to lie to their scraper? Most are fragile.

27:

Regarding PHP, I have no complaints with it. Well, no major ones. Then again, the two times I've tried to learn C... I, well, haven't. I used to be quite proficient at Java though...

Also, I don't really understand the WordPress hate. It works. It works well if you keep it updated and fed and watered well. And, this is what I love, it's really easy to change things by adding a plugin! Is it just the PHP?

I can certainly understand the analogy given by the linked article above though... (I didn't read all of it, too long. But I do agree with many of the issues I did read.)

Ah heck, I don't want to defend PHP, except so far as to say that I use it, it works for me, and there is a very large amount of infrastructure that works with it. If you don't want to use it, I'm not about to complain.

28:

Really, as a n00B with coding (aside from dabbling and a bit of BBC Basic back in the day), I take on board the criticisms I've now read of PHP. It still looks like one I'll pursue for now. Its sheer ubiquity and relatively easy learning curve appeal. Yeah, I know this probably makes seasoned developers competent in other languages seethe, but as I'm unlikely to be writing my own CMS from the ground up or mission-critical apps any time soon, it's something I can use fairly quickly once I get back into it. Though I've got my eye on trying Perl too.

29:

On a somewhat related note: has anyone ever seen a legit email from a .info domain? Says the email admin that tends to just block them out of hand.

30:

Hmm. I might, but don't hold me to it.

However, I do buy cookware from a site in the .biz domain, which I'd normally consider even more dubious.

31:

Considering that my email is at a .info ... Well, actually, I'm not sure whether or not I've actually received a 'legit email' from a .info domain (that wasn't from me), but I've certainly sent them...

Straylight: I wouldn't be so shy about writing a CMS from the ground up. It isn't that hard if you are willing to have something that may not be 100% secure.

Also, if you want another language, I suggest Python. I've not played with it for, ooh, more than 10 years now, but I have fond memories of trying to make a 'Robots' game. (I never did finish it though.)

About this Entry

This page contains a single entry by Charlie Stross published on April 21, 2012 10:41 AM.
