Back to: Please try to remember this | Forward to: Buy the book. Feed the cat.

Spam storm rising

A spam botnet is torture-testing the server again, so I'm shutting down comments until it goes away.

Update: Comments may be up or down, depending on how irritating the spammers get. I've just re-enabled them. (Load average on the server is currently 0.01. When I disabled comments, the load average was spiking towards 10. It's a roughly logarithmic indicator of how heavily the machine is loaded ...)

In other news: Over on SFSignal, here's a round table discussion on literary and speculative fiction by me, Elizabeth Bear, Bradley Beaulieu, and Rob Ziegler. Lots of chewy opinions here. (And yes, I had a brain fart and confused "Stand on Zanzibar" with "The Shockwave Rider", and nobody caught it before publication.)

39 Comments

1:

Testing ...

If you can see this, commenting works and the server isn't overloaded.

2:

Load average is logarithmic? I'm pretty sure it's a linear measure of the average length of the process run queue. Smoothed (with three different cutoff frequencies for maximum confusion), but definitely linear.
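For the curious, here is a rough Python sketch of how that smoothing works, assuming the Linux scheme: the kernel samples the number of runnable (and uninterruptible) tasks roughly every five seconds and folds each sample into three exponentially damped averages with 1-, 5- and 15-minute time constants. The kernel does this in fixed-point arithmetic; this sketch uses floats, and the function names are made up for illustration.

```python
import math

# Assumed sampling interval in seconds (Linux samples roughly every 5 s).
SAMPLE_INTERVAL = 5.0
# Time constants in seconds for the 1-, 5- and 15-minute figures.
WINDOWS = (60.0, 300.0, 900.0)


def update_load(loads, runnable):
    """Fold one run-queue sample into each exponentially damped average."""
    new = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        # Old value decays; the new sample contributes the remainder.
        new.append(load * decay + runnable * (1.0 - decay))
    return tuple(new)


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Run a sequence of run-queue lengths through the three averages."""
    loads = start
    for n in samples:
        loads = update_load(loads, n)
    return loads


if __name__ == "__main__":
    # One busy process for one minute (12 samples), starting from idle.
    print(simulate([1] * 12))
```

Starting from an idle machine, one busy process for a minute pushes the 1-minute figure to about 0.63 (that is, 1 - 1/e) and the 15-minute figure to only about 0.06. Doubling the queue length doubles every figure, which is the sense in which the measure itself is linear; how sluggish the box feels at a given number is a separate question.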

3:

What triggers the spam (if anything)?

4:

IMHO, the problem isn't spam (as such). I suspect that one of the anti-spam servers Charlie uses sometimes experiences some latency, which causes an anti-spam program to stop running until it gets a reply. If this happens a couple of times in a row, a whole bunch of processes stop because they're waiting for data, and pretty soon his server runs out of memory, at which point Apache has a little tantrum...

Perl tends to be a memory hog, and each process probably starts a separate Perl interpreter. If I recall correctly, a modern Perl interpreter uses about 100M of memory, so things would pile up pretty quickly.

Movable Type needs a queuing system for its spam.

5:

It's a count of the processes waiting for CPU time according to the scheduler. First column is instantaneous, second is averaged over a minute, third is averaged over (3, 5, 10 -- I forget) multiple minutes.

The thing is, as the load average rises the scheduler spends more and more of its time swapping processes in and out. A load avg. of 2 is significantly more worrying than a load avg. of 1. And going from 0.1 to 1.0 and then 10.0 is roughly equivalent in terms of how much more congested your machine is becoming. By 10.0, there's noticeable latency in doing stuff like executing scripts or typing on the console. By 100.0 ... you don't want to ever go there. Trust me on this.

6:

No idea. I suspect a bunch of spam marketers compile lists of IP addresses of known hosts running particular types of blogging software and sell them, along with access to an (illegal) botnet running software that can pound on CGI scripts repeatedly if given a bunch of advertisements and targets. It's possible (based on past experience) that they're using hardwired IP addresses, so changing to a new server might help ... or not.

7:

So it's not correlated with words like that big S American river?

8:

I don't think they're that sophisticated.

The aim of blog spam is to generate inbound links from high-PageRank sites that point to the spammer's sales pitch, thus driving up its Google PageRank (i.e. how high it ranks in Google searches).

This is actually futile: all comment links on this blog have the "nofollow" attribute automatically added, which tells Google's spider to ignore them. So spam links on this blog don't benefit the spammers.

But the cost of spamming is so freaking low that it's still a cost-effective form of generating sales leads and there's no point in pre-filtering the targets or tailoring the spam to blend in with the comments by being on-topic.
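For anyone wondering what the "nofollow" tagging looks like in the markup, here is a minimal Python illustration. It is an assumption-laden regex sketch, not how Movable Type (which is written in Perl) actually does it; `add_nofollow` is a made-up name, and a real comment system should do this kind of link rewriting with a proper HTML parser rather than a regular expression.

```python
import re

# Matches the start of an opening <a ...> tag that has no rel= attribute yet.
# "<a" must be followed by whitespace, so <abbr> and </a> are left alone.
_ANCHOR_WITHOUT_REL = re.compile(r"<a\s+(?![^>]*\brel\s*=)", re.IGNORECASE)


def add_nofollow(comment_html):
    """Insert rel="nofollow" into every link in a comment that lacks a rel attribute."""
    return _ANCHOR_WITHOUT_REL.sub('<a rel="nofollow" ', comment_html)


if __name__ == "__main__":
    print(add_nofollow('Buy now: <a href="http://spam.example/">cheap pills</a>'))
```

Search engines that honour the attribute simply don't credit the target page for that link, which is why the spam buys the spammer nothing here.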

9:

That's a great conversation, Charlie. Thanks for linking to the transcript.

10:

I remember, back when I was at university, having access to a machine named (appropriately) Yoyo. This was a general access Unix system: any student at the university could get an account if they wanted one.

It was a DECstation 2100, if I remember rightly. One (possibly apocryphal) story said that the sysadmins called up DEC, and said, "We're running this system with 3000 users, and we want to increase the number of users we can get on it. Can you help?" DEC came back with, "You're running that system with HOW many users? Mind telling US how you did that?"

They eventually persuaded the computer centre to get them a DEC Alpha, which was far more capable at handling the load (that would have been around 1996, I think.) I believe that they're now running a FreeBSD host.

But yeah - I distinctly remember the load average on one memorable day being around 300. That was very definitely a "give up and go home, you're not getting anything done on this box" moment.

A lot depends on the system design, though; I could see a fully specced p795 coping very well indeed with a 200+ load average. Granted, that would be massive overkill for a simple web server ...

11:

There's no craftsmanship in spam these days - the efforts that appear in my spam folder are really lazy...

12:

You must not be filling in enough consumer information surveys; if you fill in enough of those, some small fraction of the spam you get will actually relate to your specified interests... or at least mention those key terms in the email body. :)

13:

True enough. My old work account gets a metric shedload of scientific spam, so I suppose some of it is targeted. Still poorly made, though. That said, some academic spam has been rather successful, especially the fake conferences.

14:

A few years ago I read that most of the spam was from one ISP. It was cut off from the net and spam dropped 80%. HUMMMM.

15:

Could you connect the load average to the comment blocker with a script, and use that as a spam filter?
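One way that could be wired up, sketched in Python under stated assumptions: the thresholds (stop accepting comments above a 1-minute load of 8, resume below 2) are hypothetical, and `CommentGate` is an invented name, not part of Movable Type. The gap between the two thresholds is deliberate hysteresis, so the gate doesn't flap open and shut while the load hovers around a single cutoff. Note that this throttles on load rather than on content: it automates switching comments off by hand rather than filtering spam, so legitimate commenters are turned away during a spike too.

```python
import os


class CommentGate:
    """Open/closed switch for the comment handler, driven by load average.

    Thresholds are illustrative assumptions; tune them to the machine.
    """

    def __init__(self, disable_above=8.0, enable_below=2.0):
        self.disable_above = disable_above
        self.enable_below = enable_below
        self.open = True

    def update(self, load=None):
        """Re-check the load and return True if comments may be accepted."""
        if load is None:
            # os.getloadavg() returns the 1-, 5- and 15-minute averages (Unix only).
            load = os.getloadavg()[0]
        if self.open and load > self.disable_above:
            self.open = False
        elif not self.open and load < self.enable_below:
            self.open = True
        return self.open
```

The comment handler would call `gate.update()` before doing any expensive spam checks and refuse the submission early when it returns False.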

16:

(Sorry about the double post, I got a comment error message first time)

17:

Thanks for linking to the round-table, Charlie. I found myself wanting to take your 'props department' essay and slap every other living SF&F writer upside the head with it, repeatedly.

Not at all a non sequitur: the other day I caught myself wondering how Liz Kavanagh was getting on... first time ever for a fictional character.

18:

From my previous experience of letting people comment, I suspect that either there are search engines dedicated to finding suitable pages, or spammers just use Google or another search engine to find pages with the right keywords and pass them into a script.

At one time I had a comment feature on my site and I renamed the comment script every week or so. That kept comment spam down to a dribble. If I left it much longer, I got a peak, which continued with (failing) attempts at the old script for several days after I did rename it.

And yes, the relationship between number of processes running and system responsiveness is /very/ non-linear.

19:

On the role of modern speculative fiction, I remember a comment in one of your recent threads, where the commenter said that the people he knew who understood quantum cryptography didn't read SF, and the people he knew who read SF didn't understand quantum cryptography. Similarly, I wonder whether people who want to understand the future really read SF, and whether people who read SF really want to understand the future. If these groups are all disparate, then an author trying to use fiction to examine the effect of today's cutting-edge science on the future is facing a challenge in finding an audience (unfortunately).

To take a particular example, in all the stories about uploading personalities into computronium, how many have actually looked at the world-wide web today and applied its lessons? If a personality can be rendered digitally in the future, then it will become analogous to an MP3 file now: one of many digital objects that can be copied, indexed, downloaded, etc. So where is the Google search engine, the all-pervasive advertising, the porn, the breach of privacy, the piracy? Most of this fiction just seems to assume an idealised playground of wishful thinking.

As an exception to the rule, the opening chapters of Tony Ballantyne's "Capacity" include piracy, porn and privacy invasion of a digital personality. It's a playground, but not a pleasant one. But are many people actually interested in extrapolation of this sort?

Can you, Charlie, or other commenters, recommend books that are doing this hard-nosed extrapolation? (As well as OGH's own works, of course: I enjoyed reading "Rule 34" recently).

20:

To take a particular example, in all the stories about uploading personalities into computronium, how many have actually looked at the world-wide web today and applied its lessons?

For a novel on this subject, you might want to have a look at "The Rapture of the Nerds" (by one Cory Doctorow and Charles Stross -- due out September 4th). Oh, and Hannu Rajaniemi's books get it, too ("The Quantum Thief" and "The Fractal Prince" (due out soon)).

See also "The Life Cycle of Software Objects" by Ted Chiang.

21:

Thanks, Charlie. "The Quantum Thief" is in my pile of books to read, so I should get to that soon. I'll look forward to "The Rapture of the Nerds".

22:

If a personality can be rendered digitally in the future, then it will become analogous to an MP3 file now: one of many digital objects that can be copied, indexed, downloaded, etc. So where is the Google search engine, the all-pervasive advertising, the porn, the breach of privacy, the piracy?

The earliest example I can think of is the background event of "The Phantom of Kansas" by John Varley (1976). I expect there are others.

The privacy concerns that a personality decompiler would present are obvious, starting with one's passwords and credit card numbers and only getting worse from there. I hope someone can come up with a defense before such a thing is developed. Running multiple copies seems hard to prevent; various obvious safeguards come to mind, most of which could be hacked around fairly easily if you've got access to the source code - just like DRM today.

We may need a Spammer Jihad before then, although I don't see it coming very soon. They're still at the level of damn nuisances but haven't yet incited mob violence. Then again, no politician seems to have noticed them as a bandwagon target yet either...

23:

Why not just write a script that changes the name of the comment script when the spam level gets too high?
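A rough sketch of that idea in Python, assuming the comment handler is a single CGI file whose name can be changed freely. The threshold, directory layout and `comments-<random>.cgi` naming scheme are all hypothetical, and after a rename the blog's pages still have to be rebuilt so their comment forms point at the new URL.

```python
import os
import secrets
import tempfile  # only used by the demo at the bottom

# Hypothetical trigger: rejected comments per hour that prompts a rename.
SPAM_THRESHOLD = 50


def should_rotate(rejected_last_hour, threshold=SPAM_THRESHOLD):
    """Decide whether spam volume is high enough to move the comment script."""
    return rejected_last_hour >= threshold


def rotate_comment_script(cgi_dir, current_name, prefix="comments-"):
    """Rename the comment handler to a random name and return the new name.

    Bots that hard-code the old URL now hit a missing script; the blog's
    templates must be rebuilt so that the comment forms post to the new name.
    """
    new_name = f"{prefix}{secrets.token_hex(4)}.cgi"
    os.rename(os.path.join(cgi_dir, current_name),
              os.path.join(cgi_dir, new_name))
    return new_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cgi_dir:
        open(os.path.join(cgi_dir, "comments.cgi"), "w").close()
        if should_rotate(120):
            print("comment script is now",
                  rotate_comment_script(cgi_dir, "comments.cgi"))
```

Going by the experience in comment 18, expect a tail of failed hits on the old name for a few days after each rotation.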

24:

I liked Rob Ziegler's idea of "get-off-my-lawn-punk." I instantly decided that it was science fiction told from the POV of the terminally future-shocked.

25:

I can quite imagine a Spammer Jihad when some spammer's botnet gets to spread a repurposed Stuxnet whose effects involve killing people in hospital ICUs, shutting down traffic control systems in cities, causing power outages or crashing the odd airliner etc.

26:

It's been asked before, but would a fourth box in the comments section, containing a very simple question as a required field, cut this down or not?

And if it would, is the reason OGH hasn't implemented it that it would screw with the format/presentation of the blog?

27:

Wouldn't work. The spambots scrape the blog pages and submit requests to whatever URL is indicated in the form. To shake them off would entail a full-scale rewrite of the Movable Type comment system to generate one-time field names or something. And the MT code base is so baroque I can't get my teeth into it.
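For illustration only, here is roughly what "one-time field names" might look like, sketched in Python rather than Movable Type's Perl. The secret, the one-hour lifetime and the `c_<time>_<nonce>_<signature>` naming format are assumptions. Each rendered form gets a field name signed with an HMAC; the server accepts a submission only if the signature checks out, the name is recent, and it hasn't been used before. Caveat: this only defeats bots that cache or replay a form they scraped earlier; a bot that fetches a fresh page before every post would receive a valid name.

```python
import hashlib
import hmac
import secrets
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
MAX_AGE = 3600                          # seconds a rendered form stays valid


def _sign(issued, nonce, secret):
    """HMAC over the issue time and nonce, truncated to keep names short."""
    msg = f"{issued}:{nonce}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:20]


def make_field_name(now=None, secret=SECRET):
    """Generate a signed, single-use name for the comment text field."""
    issued = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)
    return f"c_{issued}_{nonce}_{_sign(issued, nonce, secret)}"


def accept_field_name(name, used, now=None, secret=SECRET, max_age=MAX_AGE):
    """Return True once for a valid, fresh name; False for anything else."""
    try:
        prefix, issued_s, nonce, sig = name.split("_")
        issued = int(issued_s)
    except ValueError:
        return False
    now = time.time() if now is None else now
    if prefix != "c" or not 0 <= now - issued <= max_age:
        return False
    # Compare as bytes so odd client input can't raise inside compare_digest.
    if not hmac.compare_digest(sig.encode(), _sign(issued, nonce, secret).encode()):
        return False
    if name in used:
        return False
    used.add(name)
    return True
```

In a real deployment the `used` set would live in a database or cache shared across processes, and entries older than `MAX_AGE` could be pruned.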

28:

Maybe someone who does know the codebase would like to earn a "bug bounty?"

I'm in for $10.00.

29:

ObSheesh: Furrfu!

30:

It's not a cost-effective form of generating leads. The problem is that stopping isn't cost-effective. Stopping botnets that are only weakly controlled involves work.

Agree with someone downthread that spam isn't getting smarter, rather cheaper. I personally suspect it's evolving away from fraud against the recipient to fraud against the advertiser.

31:

Charlie @ 27: And you haven't got the TIME to do a complete (non-MT) rewrite, nor the (effective) money, so you're stuck, for the time being at least? Unpleasant. My sympathies.

32:

See also "The Life Cycle of Software Objects" by Ted Chiang

I wonder how many times this has been ordered by someone who didn't pay close attention to the description and was looking at adding a software development book to their shelf.

33:

For those who follow the link to Moorcock's "Starship Stormtroopers," a footnote: while I agree with much of his characterization of Astounding and SF's implicit ideologies in the 1950s, "Starship Troopers" was not serialized there as he asserts, but in Fantasy & Science Fiction, which had less of a penchant for Earthman's burden and slipstick uber alles.

34:

"I can quite imagine a Spammer Jihad when some spammers botnet gets to spread a repurposed stuxnet whose effects involve killing people in hospital ICUs, shutting down traffic control systems in cities, causing power outages or crashing the odd airliner etc."

Given the likely sources of Stuxnet, I can see the governments involved looking for decoys if that happened.

35:

"The Life Cycle of Software Objects" is a book about designing and building software. Granted, it's an extended metaphor, but it should sit on a bookshelf right next to the O'Reilly's.

36:

I was in the middle of 'The Shockwave Rider' when I saw this article. You can add epigenetics and laser printers to his list of technologies/science. I don't think either was around in the real world in 1975.

37:

@36: In 1966-1969, C. H. Waddington led Rockefeller conferences on theoretical biology at Bellagio, on Lake Como in Italy. Wolpert, Pattee, Delbrück, Thom et al. on morphogenesis, differentiation, pattern formation etc. They were short on mechanisms, of course -- and far from any technology -- but they had robust beginnings of much of epigenetics.

38:

The roundtable was brilliant. I know this isn't a democracy but I vote for more of those regardless.

39:

It may not be a democracy, but it's still not a bad idea to let people know that you appreciate such things.


About this Entry

This page contains a single entry by Charlie Stross published on June 9, 2012 12:55 PM.
