Too many spammers

I have never seen so many blog spammers show up in a seething, pullulating mass as happened within half an hour of the previous post showing up on Hacker News. Debate seems to have run its course, so without further ado I'm closing comments there. If you need to talk among yourselves, feel free to continue here.

(In other news: exhausted, so taking the weekend off. Then I've got a short story to finish and a chunk of novel-outline to write and some more research for $SEKRIT_MEDIA_PROJECT to do.)

63 Comments

1:

Slashdot is worse, and it was so bad that I wrote a longish post on the subject:

Part of the reason the comments are so bad is the hit and run nature of comments, especially on larger sites. If you have something substantial to say, and particularly if you regularly have something substantial to say, you tend to get a blog of your own.

This is doubly true if you're linked to from a very large site.

2:

Jake, I'm not talking about hit-and-run commenters, I'm talking about vacuous one-liners posted by folks with names like "spyware detection" and "cheap sms". They're up to one an hour, and while the spam filter's getting most of them, an irritating handful have been getting through. Worse, the spamtards have been sticking around and looking for archives to piss in, forcing me to scurry around closing old threads.

Can we just castrate them with a blunt flint hand-axe and be done with the problem?

3:

There are websites out there that advertise "freelance IT work" and "data entry" jobs; when you read the job descriptions, they're clearly looking for people to post hit'n'run spam comments on forums and blogs (finding said forums and blogs by keyword searches, obviously). The going rate seems to be $10 per hundred and, judging from the feedback published on one of these sites, there are entire companies of Indian spammers competing for the work. So I'm afraid spamming is now a fact of life. Sorry, I've no suggestions for dealing with it.

4:

You might look into Disqus. Not sure they'd be any better than the spam filters you already have set up, but it's notable that Disqus is in the business of serving comments, and nothing else. Here you go: http://disqus.com/

5:

Castrating spammers with a flint hand-axe doesn't scale.

On the other hand, raising the stakes on the same outsourcing job boards to a princely $12 per 100 spammers castrated with a tool of one's choice might scale nicely. Surely we would get some of those Third World companies to take up that offer and set to work on eliminating their competition?

I suppose there might be some general policy problems with this approach, more's the pity.

6:

Most of my comment spam these days points to legit sites -- but unobtrusive corners of legit sites, such as user profile pages. I'd like to start sharing reports of spam URLs in a way that makes it easy for the owners of exploited sites to clean up bad profile pages and other spam dens. Basically, you start reading one RSS feed, and when other webmasters get spam pointing to a page on your site, you see a notification in your feed and can clean it up.
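
The shared-report idea above could be sketched roughly as follows. This is only an illustration, not an existing service: the function name `group_reports_by_host` and the example URLs are made up, and a real feed would still need publishing, authentication, and abuse handling on top.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_reports_by_host(reported_urls):
    """Map each hostname to the sorted, de-duplicated spam URLs reported against it.

    Each host's list is what that site's owner would see in their feed.
    """
    by_host = defaultdict(set)
    for url in reported_urls:
        host = urlparse(url).hostname
        if host:  # skip anything that does not parse as an absolute URL
            by_host[host].add(url)
    return {host: sorted(urls) for host, urls in by_host.items()}


# Hypothetical reports collected from several blogs' spam buckets.
reports = [
    "http://forum.example.org/profile/123",
    "http://forum.example.org/profile/456",
    "http://wiki.example.net/User:Spam",
    "http://forum.example.org/profile/123",  # duplicate report
]
feeds = group_reports_by_host(reports)
```

The owner of forum.example.org would subscribe to the entry for their own hostname and clean up the listed profile pages.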

7:

Seriously, Clifton has a good idea. Outsource spam deletion.

8:

I have decent spam filtering; it's just that even a 1% failure rate is too high, if I'm asleep when it comes in. (I HATE spam.) The real annoyance is riding herd on the 500-odd older posts with discussion threads on this site and closing them down when some enterprising slime mold finds them and starts posting ...

9:

You think it's bad now, wait till we start getting AIs to do it. Seriously, every communication channel will need a gatekeeper. Every archive will be presumptively corrupt. We'll be like the Laputans in Gulliver's Travels, and we will only know things that can pass our spam filters.

10:

Another reason why you should sell links... Google is the one who has created the situation in which the spammers think a link from your comments is valuable, as a way to game search. So why not profit from it, since you have to work to prevent it?

Disclaimer: I don't work for a link selling company but I have used them on my own sites. In a way it's the perfect advertising, since a hardlink on a keyword bought months after the post was made is almost totally invisible to the regular human readers.

11:

I find that once any comment thread gets longer than the average screenful, the rate of repeated comments gets higher and higher (albeit slower in smarter forums). If people are interacting with the forum via email, they don't get to see that the argument they're posting has already been posted 5 times. Youtube only showing the last 10 comments makes its threads a repeating hell. I suspect 1280x240 ultra-mega-super-duper-widescreen displays are exacerbating this problem :)

12:

Actually, thinking about it, the problem is not the Indian spammers, but the hirers. I'll pay a small bounty to have a crew find the hirers and prosecute them. The small payment represents less of my own resource, which is time. If you'll pay too, that might scale.

13:

monetization... if Dick's estate can sue Google for the use of "Nexus", how about Charlie suing the US Govt?

"The amount of drone footage is poised to grow exponentially. In the next year, the Air Force will outfit 10 Reaper drones with “Gorgon Stare” sensors. It’s a package of high-powered cameras that can film an area, two-and-a-half miles around, from 12 different angles. Eventually, the military hopes to equip drones with 92-camera arrays."

from http://www.wired.com/dangerroom/2010/01/madden-nfl-for-militarys-drone-operators/

14:

Charlie,

I have a couple thoughts from reading the previous thread to run by you. I apologize if anyone else already covered these, but I tend to just scan your comment threads by reading your comments and not everyone's. I figure if anyone writes something worthwhile you'll comment on it. So I guess thank you for being my own personal spam filter?

First, one of the points you seemed to make several times was that you are a writer, and therefore have neither the time nor inclination to self-publish via either POD or e-books (self-publishing being inclusive of editing, marketing, etc.). Do you think this role is not adequately being provided by your existing publishers? Or, to put it another way, is there space in the market for a pure ebook or ebook/POD publisher to come in and steal talent like yourself away from the traditional publishing industry?

Second, you mentioned your inability to navigate a foreign legal system (the US) in order to enforce your IP. Is there a place for a company to license the rights to such IP and enforce it on your behalf, or through some profit-sharing mechanism? Maybe a venture legal fund similar to those in Cory Doctorow's book Makers?

Love the new blog format by the way, kudos to the designer.

15:

"even a 1% failure rate is too high, if I'm asleep when it comes in."

That's easy to solve. Delegate the moderation task to a few trustworthy people in distant timezones.

16:

If the Nielsen Haydens don't know, nobody knows. Why not just ask them?

17:

Alex@14: the independent epublishers exist, and they do world rights. Without DRM, even. They're small presses. Right now there isn't the market penetration on ebooks for them to be anything else, although there is much debate about whether the Kindle and/or Apple's rumoured Kindle-killer will change that.

There are some advantages to working with them, but believe me, if I could sell to Charlie's publishers, I would. And my current publishers would be very happy for me if I did.

18:

@ 13 (how appropriate!) Seen THIS? ... and how scary is it, really?

19:

Greg: SPS has been around as a pipe dream for decades -- it's only now that it seems to be (literally) on the verge of getting off the ground. The upside is that you get about five times the energy per square metre of solar panel, and by picking the right orbit you can get solar power 24 hours a day. The downside is that to get it back to earth you need to beam it (lossy) at receivers (what if the beam wanders or is hijacked?). It also requires extremely durable, high efficiency, ultra-lightweight panels that are cheap enough to mass produce. This latter requirement is finally on the edge of being met.

So on the plus side it's a genuine base-load supplying solar power option, but on the minus side it's a death ray.

20:

Alex: it's not that I can't navigate a foreign legal system, it's that I can't do it cost-effectively. I am not a large corporation, and I can't afford to throw money at international law firms to act on my behalf against minor commercial pirates who might individually be costing me on the order of £100-2000 each. It doesn't make economic sense. (The fear is that they might well multiply and, much as was the case with rogue printers pre-1830s in England, end up eating every author's lunch after the first official printing was available.)

I'm actually pretty happy with my dead-tree publishers' activities, in the dead tree department. I certainly couldn't do better if I went out to do it myself. I'm less happy about their work in the ebook field, but they're still in the institutional learning stage. And ebooks account for under 5% of the revenue from books right now -- they'll still be under 25% for a long time to come (years, definitely: decades, possibly).

21:

Actually, it's getting truly uncanny how good the comments spam bots are at finding a topical chunk of text to include.

On the other hand, there's a serious flaw in the idea of a distributed linkspam filter; you could spoof it to eliminate major content sources you don't like from the blogosphere. You'd implement this by generating flagrant spam comments but including links to - say - RealClimate, Sourcewatch, Kos, my.barackobama.com or whoever in the spam payload. People would press the "report spam" button and send the comment to your bitbucket; quite possibly, and worse, automated filters would do the same.

Ideally, you'd want the attack traffic to be so spammy that it was shot down by the Bayesian filter before getting to a human reviewer, as they might detect the spoof. However, it's possible that including genuine links in the spam would tend to get it through the spam filters, which would be a bonus in the sense that you could benefit from the spamming, but might cause it to be discovered.

22:

Jake, I'm not talking about hit-and-run commenters, I'm talking about vacuous one-liners posted by folks with names like "spyware detection" and "cheap sms".

Ah. In that case, I have no idea where they came from, or why they struck. WordPress's Akismet seems to kill almost all of them incoming to my site, FWIW, but it requires a wordpress.com account and integration with your site (and I'm not sure what this actually runs on).

23:

I'm using MT-Akismet, which extends Akismet to Movable Type. It kills most of the spam dead. Unfortunately, when a whole shit-storm lands on you, one or two fecal fragments make it through :-(

24:

Alex@21:

Ideally, you'd want the attack traffic to be so spammy that it was shot down by the Bayesian filter before getting to a human reviewer, as they might detect the spoof. However, it's possible that including genuine links in the spam would tend to get it through the spam filters, which would be a bonus in the sense that you could benefit from the spamming, but might cause it to be discovered.[1]

Maybe a silly question, but not a facetious one: how well do human reviewers detect spam vs the robotic kind on a typical sample? Does it come down to a je ne sais quoi that is human-centric? Or is it mostly a matter of a much larger DB in the human head?

There's been a lot of story mileage wrung out of self-evolving spam as a pathway to so-called "genuine" AI, but I'm rather skeptical. Spam tends to encode on the order of megabytes of information, while spam blockers encode on the order of giga- or terabytes of information. Hard to go up against that sort of gradient.

Who was it - maybe Sagan - that said that the complex instinctual behaviours of everything from bees to sheep only look like the work of a designer because you never see the hundreds of millions of dead sheep or the trillions of bees.

[1] Major plot point of Watts's "Maelstrom"

25:

Charlie@19:

It also requires extremely durable, high efficiency, ultra-lightweight panels that are cheap enough to mass produce. This latter requirement is finally on the edge of being met.

Do you have any links for this? It's been my experience that cheap solar cells have been coming Real Soon Now since at least the late '70s. If commercial fusion is fifty years away, can we say that cheap solar panels are thirty years away :-)

Plus, there's also the fact that Earth orbit tends to be a much harsher environment than what you find at sea level. Maintenance will always be a factor in the cost of solar power, just as it is in every other form of power, but I get the distinct impression that orbital costs in this department are going to be not one or two, but several orders of magnitude more expensive.

26:

Hi Charlie,

So as part of the anti-spam effort, you're probably going to have to close off every thread after it's run a set time, say one month (or after it's three or four down in the queue, or whatever). As for the rest...I'd use a blood pressure cuff, and so long as your systolic pressure doesn't go up more than, say, 5 points while you're cleaning out the crap, it's a minor nuisance. Perfection's probably unattainable.

As for the flint axe...that's sooooo English. Send them to the southern US. We'll stake 'em out over a fire ant's nest and smear honey on their genitals. US$100 per 10 staked. The rate's high because we'll need to pay workmen's comp for the people who get stung while staking the spammers out.

27:

Spam without links is not profitable. Obfuscated links aren't any good either.

So: Disallow posting of comments with links.

There are valid reasons to let people post links, so maybe introduce registered accounts for those that want more freedom. Confirmed opt-in gets rid of the spammers who would otherwise bother to register.

28:

Despite the moderation policy, I have received permission to state this website. It is a technological review of some Sony demon generator which is nothing but a streaming concatenation of Anglo-Saxon words. Not work safe, unless you work for the OED. It is a comment on Sony and perhaps some people who fill the Aching Void with gadgets. http://www.youtube.com/watch?v=8AyVh1_vWYQ

29:

Some blogs are having decent success with a simple question on the comment form -- you have to get the answer right to have your comment show up. "What British agency does Bob Howard work for?" might work here.
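
A minimal sketch of such a question check, assuming the accepted answers are a small set chosen by the blog owner (the set `ACCEPTED_ANSWERS` and the function name are hypothetical, and this has nothing to do with how any particular blogging platform implements it):

```python
# Answers the blog owner is willing to accept, stored lowercased (an assumption).
ACCEPTED_ANSWERS = {"the laundry", "laundry"}


def answer_is_correct(submitted, accepted=ACCEPTED_ANSWERS):
    """Compare a submitted answer case-insensitively, ignoring extra whitespace."""
    normalized = " ".join(submitted.lower().split())
    return normalized in accepted
```

A comment would only be shown if `answer_is_correct` returns True for the form field. As later comments in this thread point out, this stops automated posting but not people who are paid to post by hand.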

30:

24: I don't have any data to hand on how well Bayesian filter applications do compared to human reviewers. All the ones I've used typically give a high weighting to links that have been previously posted, but if you put in enough spammy content you'll eventually outweigh that, and get the legit link into the filter's own list of links found in spam.

A human might suspect that legitimate links might be added to spam either to poison the spam filter, or to carry out a backscatter attack like I described.
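
The poisoning effect can be shown with a toy token scorer. This is a deliberately simplified sketch (add-one smoothing, no class prior, made-up tokens such as `legit.example`); it is not how Akismet or any particular filter is implemented.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-level naive Bayes scorer (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        self.counts[label].update(tokens)
        self.totals[label] += len(tokens)

    def spam_log_odds(self, tokens):
        """Sum of per-token log(P(token|spam) / P(token|ham)); above 0 leans spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = 0.0
        for token in tokens:
            p_spam = (self.counts["spam"][token] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][token] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


LEGIT_LINK = "legit.example"  # stands in for a genuine, previously posted link
spam_filter = TinyBayes()
for _ in range(3):
    spam_filter.train([LEGIT_LINK, "interesting", "data"], "ham")
    spam_filter.train(["cheap", "sms", "pills"], "spam")
before = spam_filter.spam_log_odds([LEGIT_LINK])

# The attack: flagrant spam that carries the legitimate link and gets reported.
for _ in range(30):
    spam_filter.train(["cheap", "sms", LEGIT_LINK], "spam")
after = spam_filter.spam_log_odds([LEGIT_LINK])
```

Here `before` is negative (the link counts in a comment's favour) and `after` is positive: enough reported spam carrying the link makes the link itself look spammy.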

31:

Posts with links are already held to be read by the moderator. Non-spammers may need to link to give more information.

32:

Coincidentally topical:
http://www.newscientist.com/article/mg20527446.000-to-beat-spam-turn-its-own-weapons-against-it.html

Bayesian filters are pretty good, but this new technique above automatically reverse-engineers the code a spambot uses to generate varied messages. As a result, it can discriminate 100% of all spam from that attack, so the instant the first few unsolicited emails trickle out it is automatically trained to block them.

33:

I don't know what to do about journalists, I don't know how they'll manage to catch a ride on the shockwave, but if you're asking what to do for yourself... get with a publisher that lets you publish non-DRM eBooks.

Reading a book on a decent handheld, even my seven year old Clie, is pretty much like reading a paperback for me. There's no spam blogs on my handheld. There's Accelerando, which I got free, but I bought Lobsters from Fictionwise as well, and I'd have bought everything you wrote in eBook format if I could... but I can't.

I won't buy DRM-locked ebooks, because I've already had some of those eaten by bitrot, so right now your stuff is dead trees or nothing.

I bought Glasshouse on dead trees, and Iron Sunrise, and Singularity Sky, but buying dead trees from Amazon can't compete with the instant gratification of ebook downloads or picking up a book in a bookstore, so I haven't kept up with your books as well as I would have when I was younger and had more time to spend haunting bookstores... and when I lived somewhere there were good bookstores in walking distance.

34:

Oooh, very interesting. I see you did have some stuff at Baen's Webscriptions, but it says:

Published 5/1/2006 SKU: 0765309300 Ebook Not Currently Available

... and now I see the explanation in your FAQ.

sigh

35:

Peter, I can go with publishers who permit non-DRM ebooks, or I can go with big publishers who pay book advances I can live on.

As ebooks constitute at best 1% of my income stream right now, and might generate as much as 10% of revenue if I switched to a publisher who permits non-DRM ebooks, I'd be cutting my own throat commercially for a matter of principle.

So for now I'm sticking with the big publishers, but nagging them continuously for a change of policy. And this strategy is subject to continuous review.

36:

mr o@27: that's like preventing theft by chopping off everyone's hands. We'd lose interesting comments like Greg Tingey's up @18.

And this one...

37:

How to make money as a writer, without learning a huge new skill, somewhat insulated from ebook fallout, without impacting time spent writing what we want to read?

Write a (young) children's book.

Because:

Enough of your readers must have (or know people who have) children of the age when you only need a few words.

Lots of kids' books suck, some are good, some are a genuine pleasure to read (daily for weeks at a time if you're really lucky) and for those you buy more by the same person. How hard can it be? You already write short fiction, this is just VERY short, might even be an interesting challenge.

No way are babies getting ebook readers anytime soon.

If it works you can pull JKR's trick and grow the stuff up with the kids, creating a whole new set of future SF fans (and lots of money!)

Laundry universe might be interesting, but possibly the easy option - may be good to start with, but a successful hard SF Thomas the Tank Engine/Harry Potter would rebalance the SF/fantasy genre readership nicely.

Cue someone saying 'the illustrator makes the cash' or 'the author gets less from a kids book'.

38:

Re: SPS Death Rays

I thought that the latest thinking on SPS was that it'd cause serious, but not scifi-like incinerating, burns if passed over civilian areas, rolling brownouts if misaligned, and a tendency to cause heat bloom as the atmosphere absorbs most of the energy, thus causing every climatologist and environmentalist on the surface of the globe to sweat very, very profusely for reasons related to heat?

Death ray suggests something a bit more immediately zappy.

39:

It's not that you should disallow links in comments altogether, but the set of valid comments containing links seems to be pretty small.... I would assume that the set of spam comments containing links is ~100%. So it might be reasonable to make it extra-tricky to post a comment containing a link and get rid of almost all spam off the top. I actually have no practical experience in running a blog, (I don't know how customized versus off-the-shelf this site really is) so it isn't clear to me how much flexibility is available to configure filtering.
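
One way to make link-bearing comments "extra-tricky" is to hold them for a moderator instead of rejecting them. A rough sketch, with a deliberately crude pattern and made-up names (`LINK_PATTERN`, `route_comment`); obfuscated links such as "example dot com" would slip past it:

```python
import re

# Crude link detector: explicit http(s) URLs or anything starting with "www.".
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def route_comment(body):
    """Return 'hold_for_moderation' if the comment contains a link, else 'publish'."""
    if LINK_PATTERN.search(body):
        return "hold_for_moderation"
    return "publish"
```

Legitimate commenters can still post links this way; they just wait for a human to approve them.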

40:

Graham @37: children's and young adult fiction is a highly competitive field, and it's a lot harder to get a toe-hold than you seem to imagine. I don't have kids of my own, or any to practice on; nor am I particularly interested in writing fiction aimed at that readership. (Hint: conceptual density is my shtick.)

41:

Sigh - the 'Snappy sounds of Happy farm' it is and will have to remain then. And no world domination for Hard SF! I fully appreciate the conceptual density thing, it's why I read the proper books, but I still think it rates up with action figures in the spin-off merchandising stakes (although Terry Pratchett probably sells a lot more models of giant turtles than copies of 'Where's My Cow?').

42:

Hmmm, somewhat funny, in the tragic sense, that your blog post about monetization of content was monetized by the internet equivalent of antibiotic-resistant airborne gonorrhea. Also, about combating spam, I wonder if a bit of JavaScript which repeatedly hammered all the links in a blog's spam bucket would make any difference? Could you imagine the hilarity of those little scripts responding incessantly to every spam message whenever a user is using your site, or if Gmail, Yahoo, and Hotmail also did this! It would be an interesting academic problem to find out what the magical profit-to-noise number is for most spammers.

43:

One simple way to stop spammers from hitting old threads is close them after two weeks. That's what many other blogs do.
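
In code, the rule is a one-line date comparison. A minimal sketch, assuming each post records its publication time (the function name and the two-week default are illustrative, not a setting of any particular blogging platform):

```python
from datetime import datetime, timedelta


def comments_open(published_at, now, max_age=timedelta(days=14)):
    """Return True while the post is still inside its commenting window."""
    return now - published_at <= max_age


# Example using this entry's publication time.
published = datetime(2010, 1, 23, 19, 12)
open_after_a_week = comments_open(published, datetime(2010, 1, 30, 12, 0))
open_after_a_month = comments_open(published, datetime(2010, 3, 1, 12, 0))
```

The check would run when a comment is submitted, so old threads close themselves without anyone having to visit them.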

44:

Allynh: I tend to work by lazy caching -- that is, I leave comments open until a piece of spam shows up in the logs, then I nuke the spam (if it didn't get trapped -- 98% of them don't get through) and close comments.

Experiments with fun MT plugins to auto-close threads led to near disaster, so I stopped doing that.

The annoying thing about the recent influx is that the spammers, after getting nowhere with the current threads, went hunting for ancient ones. Thus throwing up a lot of cache hits.

45:

Hi Charlie, There're probably a hundred good reasons not to but why not plug a manual authorise link into a submission? I'm thinking of those vote pages you get when you do the TopWebComic thing. Enforces a choice via manual click-thru. I think they were designed to defeat spambots. As for the deliberate spammers-for-fun-and-profit, Mandatory Cactus Fisting is the way of the future. Enjoying the new look immensely too.

46:

Charlie @0:  While the idea has great merit, I'm afraid that there is not enough flint available to keep them from reproducing faster than one can sterilise them.

Whibley @45:  Might work. But pray tell, what has any cactus done to deserve such fate?

47:

Laundry universe might be interesting, but possibly the easy option

As the proud parent of a toddler the concept of 'Baby's First Cthulhu', particularly if done as one of those 'lift the flap feel the texture' books that she loves so much, is quite terrifying.

H.R.Giger doesn't do baby books either.

Charlie @ 40 > nor am I particularly interested in writing fiction aimed at that readership. (Hint: conceptual density is my shtick)

I think you're dismissing young adults a bit readily there. Most of your readers were probably reading mainstream scifi when they were thirteen years old. I know I was and I bet you were as well.

48:

Robin @47 - It might be terrifying for you, but would you buy it? As for young adults - the problem is it's still proper length books, so similar work and a distraction from the ones we want to read...

I guess my point was that it's adults who buy and read books for toddlers. Keeping the kids happy is totally simple - have nice colours, text that has a bit of suspense and some loud noise and it's done. It's having a book that ALSO appeals to adults that's the hard part and what makes them sell (unless you lose concentration in the bookshop and let the child decide without reading the book first, which is where the happy sounds of snappy farm came from). It's unlikely that any book of this type would be mainstream successful, but it might sell to the current readership in limited quantities (self publish, sell on website, nice chunk of cash on the side for minimal work).

49:

@ 47 48 I STARTED on SF with my father's (pre-war, penguin/pelican) copy of "Last & First Men" at age 9. Then the Sayers' translation of Dante - since THAT'S SF as well .......

51:

When amazon knocks £2.53 off the cover price of The Jennifer Morgue paperback, what happens to your and your publisher's cuts?

52:

Young adults probably ARE reading Charlie's stuff.

As a dad, I've noticed that you can put conceptual depth into children's fiction, but it's strictly on a show-don't-tell basis.

I can imagine some titles designed to help children grow up in a Strossverse, e.g. "Mummy and Daddy Uploaded".

53:

thomas @51: When amazon knocks £2.53 off the cover price of The Jennifer Morgue paperback, what happens to your and your publisher's cuts?

We get less money, of course.

(What happens then depends on the contractual small print, which defines whether I'm getting my royalty percentage out of the SRP, or the publisher's net receipts for the book, depending on the sales channel in question, number of copies previously sold, phase of the moon, and so on. But the short answer is, my publisher and I have less money to divvy up between us. This is usually framed in terms of the publisher having given Amazon a greater percentage discount off SRP, for royalty accounting purposes.)

54:

Uh, a dumb question for someone who actually knew a bit about computing and worked in IT . . . around 1993 when the operating system du jour was DOS and we were preoccupied with this thing called "high memory area" to conserve address space: why not simply require that everyone have a Livejournal or Typepad or somesuch account before they can post?

Or would that be unfair to those who don't have one?

55:

SoV: you nailed it in your last sentence. More to the point: it'd deter legitimate commenters while not doing anything to prevent spammers (hint: LJ accounts are available free -- a sufficiently determined blog spammer could simply code up a tool to semi-automate grabbing new handles on demand).

56:

As a followup to my 54, how hard is it to put in one of those character recognition routines where you have to manually type in a series of letters and numbers that aren't easily machine-recognizable? Wouldn't that knock out most spam? Or has it gotten sophisticated enough to pass this sort of barrier?

57:

Charlie@55: there are already spam accounts on LJ, spamming communities.

scentofviolets@56: they're no longer a barrier to anyone except people with disabilities.

58:

@45 - on the suggestion that you add some sort of "test for humanity" on posting, I had this on a guestbook page for quite a while. It presented you with a picture and a list of descriptions, and you had to select the right one. That hardly reduced the spam at all - I can only conclude, as someone said in one of the very earliest comments, that people are being paid to do this.

On making money - it's always seemed pretty obvious to me that if Charlie was after becoming seriously rich, rather than making a living, he'd diversify into the slightly-techy airport bookrack thriller. There's plenty of evidence he can write gripping action sequences, and string a plot together. Luckily, he seems to want to write what he wants to.

59:

@ 45 / 56 erm, err ... What you REALLY need is a test for INTELLIGENCE. "Humanity" has nothing to do with it ....

60:

Surely you want a test for decency more than anything else? Voight-Kampff rather than Turing...

61:

A couple thoughts occurred to me by the time I gave up reading the huge volume of comments on your monetization paradox thread:

One issue has to do with the Wall Street Journal's online publishing policies and book publisher policies: Both have a "premium price for early access" model. In the case of book publishers, hardbacks are typically released months before trade or regular paperbacks.

Another issue has to do with the very spam burst which caused you to shut off comments on that blog: In an advertising context, that kind of meltdown can be highly profitable.

I am not ready to draw conclusions yet, but I am also interested in these kinds of issues and maybe someone smarter than I can help us connect some of the dots.

62:

Yeh, I get it, I'll try and keep up with your stuff on paper.

63:

I concur with some of the previous comments about not allowing links. At most, people could list the url in text but not clickable; and I do hope that all links in comments are already automatically appended with nofollow.

I am somewhat surprised that there's no ReCaptcha in here yet (though maybe with it being owned by the google now, it's not quite so attractive if you are opposed to their orphan works agreement.) While it won't eliminate the spam, it should at least reduce the automated ones.
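
Both options mentioned above (plain-text URLs, or clickable links marked nofollow) can be sketched in a few lines. This is an illustration with a simplistic URL pattern and a hypothetical `render_comment` function, not the blog software's actual rendering code:

```python
import html
import re

# Simplistic matcher for explicit http(s) URLs; real-world linkifiers are more careful.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def render_comment(body, clickable=True):
    """Escape a comment for HTML output and render any URLs it contains.

    With clickable=True each URL becomes a link carrying rel="nofollow";
    with clickable=False the URL is left as plain, non-clickable text.
    """
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(body):
        parts.append(html.escape(body[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        if clickable:
            parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        else:
            parts.append(url)
        last = match.end()
    parts.append(html.escape(body[last:]))
    return "".join(parts)
```

Either way a spammer gains no search ranking from the link, which removes most of the incentive without stopping readers from sharing references.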

About this Entry

This page contains a single entry by Charlie Stross published on January 23, 2010 7:12 PM.

The monetization paradox (or why Google is not my friend) was the previous entry in this blog.

I have no life, so I must blog is the next entry in this blog.
