
PDF: Satanic horror from the abyss, or merely evil?

Ahem. It has just been drawn to my attention that the lovingly-compiled packet of Hugo nominated works, made available to the eligible voters by this year's Worldcon committee, contains nothing but PDF (Acrobat) files. What can I say? I've been overseas, then I've been busy.

This is probably not the right place for my rant about why PDF is not an ebook file format (hint), but PDF is not an ebook file format. It's a page layout format, complete with margins, gutters, page numbers, and loads of white space. It is not designed to be reflowable either, meaning that if you try to view it on a screen smaller than the page it was designed to be printed onto, you either (a) get a shrunken-to-the-point-of-illegible miniature, or (b) a letterbox-sized window revealing part of the page. (Yes, some ereaders try to reflow PDFs — with erratic results — but often it ends in tears.)

PDF is a BAD format for people who read a lot of fiction online. About the only way to do worse would have been to have converted all the novels, novellas, and stories into a series of JPEG image files of each page.

Anyway, if you're a voter and you want to find copies of my nominated files in a sane format, follow this link. "Palimpsest" and "Overtime" are both available as HTML, and in the case of "Palimpsest" as a single file that you can download and convert into Mobipocket or ePub. (If you want a direct link to one of those formats because you can't convert the file yourself, bug me in the comments here.)



I disagree ;-) - I really like to read properly formatted fiction PDFs with a nice layout on my netbook, turned 90°, so they fit the screen almost perfectly.


Satanic horror from the abyss, certainly. If you are so weak the lack of visible tentacles makes you doubt, then that awful piece of bloatware called Acrobat Reader should be evidence enough to convince you.

And besides, that lurking abomination gains ground every day and is already on our very doorstep. I kid you not, I asked one of my brothers, sadly ignorant in IT, for a photo and received it... in a PDF file!


Till, you just happen to be interested in books that by sheer chance fit your screen perfectly. But if you became interested in large-format books (architecture, for example) or bought a standard e-book reader, then you would suddenly understand, like Lovecraft's characters, the unspeakable horror that lurks behind the beautiful mask.


Please be aware that mention of text or line graphics represented using JPEG can be triggery for some readers. The shakes have stopped now.


Coming from an academic background, I'd say that PDF is an abyssal horror. The number of publications, findings, and other information that is only available in beautifully laid out PDF format is staggering - but in the end useless, as it's almost impossible to refer back to.


So, I understand from Wikipedia that ePub doesn't specify how to handle links within a book. It follows that there are issues with dealing with footnotes. Have any of the proprietary examples of e-reading software handled this in a good way? Or do they all put it in the 'too hard' basket?


If you are so weak the lack of visible tentacles makes you doubt, then that awful piece of bloatware called Acrobat Reader should be evidence enough to convince you.

There are alternatives. If you have a Mac, Preview reads PDF files quite nicely (unless someone got cute with scripts and forms…). I suspect there's something similar for Windows.

I confess I quite like PDF, as a way of distributing documents to colleagues who use many different computer systems, with who-knows-what fonts installed, in a situation where page layout and diagrams are important.


Errolwi, all the readers I've tried handle them pretty well. I have a Sony 505 and that's fine. EPub is just HTML in a box so the technology is pretty solid. It's only really annoying when there's no backlink so I have to press more than one button to go back and continue reading.

I was going to sign up to the con to get the ebooks, so you've saved me a few bucks on that front. And yes, PDF is Satan's little helper. I find it annoyingly unreadable even on a desktop computer with a decent monitor. The lack of proper flow, reflow and resize drives me nuts. Not to mention the gigantic helping of malware that Acrobat Reader is. But at least PDF is very secure... well, it'll take only minutes and minutes for that pack to make it to the darknet, but the goal of preventing rightful users from accessing it has been achieved. It reminds me of the "how do you make MS-Windows secure" jokes... exactly the same approach (ahem. "make it completely inaccessible" sums up the punchline).


There's a comic book format that sticks every page as .jpeg into a .rar file. Then the readers open double pages at once and you can page up/page down through them.

It's quite nice actually.


Dear lazyweb, I would kill for my ability to highlight a section of text on a HTML page; hit "Bookmark", and have my browser actually bookmark it.

It would only take a bit of magic with window.getSelection(); and inserting a span with an ID in it - then bookmarking a link like foo.html#generatedIDHere; plus some way to make storage persist (ie: a cached version of the HTML page).
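For what it's worth, the same idea can be sketched outside the browser. This is only a rough illustration in Python working on a saved copy of the page, not the `window.getSelection()` version described above: the function name `bookmark_selection`, the `bm-` ID prefix, and the hash-based ID are all made up for the example, and the naive string search would trip over selections that span tags.

```python
import hashlib


def bookmark_selection(html: str, selected: str, page_url: str) -> tuple[str, str]:
    """Wrap the first occurrence of `selected` in a span with a generated ID.

    Returns the modified HTML (which would need to be stored somewhere
    persistent, e.g. a cached copy of the page) and a bookmarkable URL
    of the form page.html#generatedID.
    """
    start = html.find(selected)
    if start == -1:
        raise ValueError("selected text not found in page")
    end = start + len(selected)

    # Hypothetical ID scheme: a short hash of the selected text, so the
    # same selection always produces the same anchor.
    anchor_id = "bm-" + hashlib.sha1(selected.encode("utf-8")).hexdigest()[:8]

    marked = f'{html[:start]}<span id="{anchor_id}">{selected}</span>{html[end:]}'
    return marked, f"{page_url}#{anchor_id}"


page = "<p>It was a dark and stormy night.</p>"
new_page, link = bookmark_selection(page, "dark and stormy", "foo.html")
print(link)
print(new_page)
```

The hard part remains the one already mentioned: the modified copy has to persist somewhere, or the anchor vanishes the next time the page is fetched.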

As trivial as it all is; I still don't have it - so it's a killer for reading novel length HTML documents in firefox.


Daniel, I use hibernate to get that effect when converting the page to a usable format isn't an option. It's vaguely annoying, but it works. Much better to use Calibre to turn it into an epub and read it on a purpose-built device though.


As you know, Bob, I generally agree with your commentary on the ongoing saga of e-books and other fronts in the ongoing encounter between digital technology and the publishing industry. In this case, however, I think you verge on the sin of Complaining About The Free Ice Cream.

What's striking about this year's Hugo packet isn't that so much of it consists of PDFs, it's that practically everything on the ballot is present--and present in DRM-free form--thanks to the cooperation of every single publisher. This would have been unimaginable as recently as a couple of years ago. As recently as a couple years before that, it was far from established that publishers and authors would be generally willing to distribute free digital copies of their nominated work to the entire Hugo electorate in any form. This time, everyone involved was willing to let thousands of people download completely unencrypted copies of their books and stories, despite knowing full well that a few of them will inevitably leak onto pirate sites. This represents a hell of a lot of progress, and deserves better than to be characterized as "evil."

The reason almost everything is in PDF form is simple: it's fast and Good Enough. PDF was how to quickly oblige Aussiecon's request with a resource that's immediately to hand and which is usable on every desktop platform--rather than trying to get some other part of the organization, some department that doesn't report to editorial, to do extra work on short notice. As the ebook market matures, people in all departments of publishing will get more familiar with these issues and more facile with the relevant file formats and tools.

I know you're simply indulging in good-humored Strossean hyperbole, but there are quite a few people out there in ebook land who appear to think that the way to get ebooks in their preferred formats, on their preferred schedules, and at their preferred price points is to assault publishers--specifically, the people inside publishing organizations who are most sympathetic with their desires--with streams of rhetorical high-pressure liquid pigshit. Encouraging this kind of online behavior does not a better world make.


At the risk of re-triggering Del's shakes, some of us have seen more than one Word document containing only an embedded JPEG which is a screen shot of a command line.


Thanks for the info moz. I think the PDFs are used because they are easily available, and the lack of security is a requirement of how they are to be used.


Patrick, did you note that Charlie provided an epub, which was converted (which is understandable), but was not itself included in the packet? So a deliberate decision was made to not include an additional file format that many of the recipients would find useful. I understand why publishers provided PDFs, and I for one am certainly not complaining about that.


PNH: ...who appear to think that the way to get ebooks in their preferred formats, on their preferred schedules, and at their preferred price points is to assault publishers--specifically, the people inside publishing organizations who are most sympathetic with their desires--with streams of rhetorical high-pressure liquid pigshit.

Whoops, sorry. (Yes: I, too, get a serving of that slurry in my email inbox from readers who can't tell the difference between authors, publishers, and the moon on a stick: I feel a vestigial echo of your pain.)


errolwi: no, I provided an RTF file.

If I was going to compile such a package, FWIW, ideally I'd go with: RTF, ePub, and Mobipocket in the core package, and an optional -- because it's w-a-y bigger -- PDF package. Art books and comics probably to be packaged only in PDF (depending on layout complexity). The problem here is collating what's provided. Publishers often don't have the final copy in anything an end-user would regard as usable: they've got a PDF, prepared by an external typesetting bureau, from Adobe InDesign or Quark Publishing System. However, the request for my short works came to me, directly, as the author -- it really depends on who's sourcing the material.


Daniel, have you tried out the Dog Ears extension for Firefox? I've used it a few times and found it pretty handy. It allows you to bookmark a specific part of the page and jump back easily. I've not used it on the latest FF but it worked a treat last time I used it.


@7 Robert, I know. At home I use Foxit, but at work it's Acrobat Reader. Corporate policy, etc... in other words, "I'm just following orders".

@9 I guess you mean "Comic Book Archive file" for CDisplay, the format behind those ubiquitous .cbr and .cbz files. Not my piece of cake, but certainly many people seem to like it.


Sorry Charlie, but that's what I understood from your comment on CMAP #9. At least I learnt stuff while experimenting with conversion methods in the weekend. Goodnight from NZ, and I hope you can still make it here in August.


I'm with Patrick on this one, but then I'm quite happy with Adobe, which keeps responding to criticism and making improvements. (The size of the app is sad, yes, but if we're disqualifying all apps that qualify as bloatware, there will be very little use of computers by non technogeeks.)

Adobe Digital Editions are fine on a screen. The problem with PDF may not be so much that it's not made for screens as that there is enough freeware out there (e.g., CutePDF) that you don't need to learn the entire Adobe suite to produce a PDF--which then can be set so that it can be searched but not edited, ensuring a trail of ownership.

Which is not something to minimize in discussions of eBook formats.


IMO people who think PDF is good because it is non-editable are 1) rather annoying* and 2) at best kidding themselves. There are a number of converters which allow such editing, and the spec of the file format is available too. As a last resort, it's possible to feed a PDF to an OCR program and act on the output accordingly.

*Incidentally that's the kind of BS which tends to interfere with a11y, and as a blind user I care about that more than most.


You don't really want art books as PDFs: unless they are line art created in, say, Adobe Illustrator, the PDF is only really linking to a set of embedded JPEG images alongside the overhead of PostScript.

.cbr and .cbz exist to solve the same problem (a large number of embedded JPEG images) with a more elegant solution.


The original purpose behind PDF was to create a format that would result in the same appearance, everywhere.

The result is a layout format, not something that is designed to be viewed on devices of different sizes and proportions, or to have the font size changed to make it easier to view. (You can do those things, but not terribly well, and it gets down to what PDF is really for.)

But I'll also side with PNH -- the fact that we've reached the point that it's expected that the nominees will be available, DRM-free, is fantabulous. Hats off to everyone who helped make that happen.


There's nothing wrong with PDF files that a cyborg, a time machine and 'detailed files' on the founders of Adobe couldn't cure.


Ugh. PDF is abysmal on eReaders. They tend to have illegibly small text in their normal size, and are a pain to resize. Generally a page doesn't fit, so you end up with crap where each "page" takes 2 1/8 pages on the device, leaving you with page headers at random places and 1/3 pages having only two sentences on it.

(In general, this is partly because publishers tend to just put whatever PDFs they used in the publishing process up in these Hugo packets. The PDF's would be easier to deal with if they stripped the page headers/footers etc.)


Just read "Overtime" and enjoyed it a lot. Your Laundry world again somehow reminds me of the old Soviet SF story "Monday Begins on Saturday" by the Strugatsky brothers, the first half of "Overtime" especially. Of course, Laundry is much darker, more "adult" and also very enjoyable. I really hope "Overtime" gets the award.


I asked the Aussiecon committee whether the preponderance of PDFs was the result of an oversight on their part, or whether they'd converted the files sent to them in the mistaken belief that PDFs would be more useful, and whether there was any way to get "epub or some other practical format" included in the packet. I got a reply in the name "Aussiecon 4 Hugo Staff" implying that the files in the packet were what they'd been sent.

("As you've seen many of the documents were provided as PDF's. No ePubs were included.

"There will be updates to the package and there may be different formats included throughout the voting period. I'll bring up your concerns.")

Given that you say you sent them an RTF file that they didn't include in the packet, I'd say that their response amounts to deliberate deception. I am thoroughly pissed off.


Incidentally that's the kind of BS which tends to interfere with a11y

For anyone who didn't get it, "a11y" is "accessibility." Just as "i18n" is "internationalization," because people get really tired of typing long words out all the time, when they're the ones doing the standards.


Steven: I have just heard from the organizers that they want other file formats as well, and will be updating the packages in the next few days/weeks.

So, no deception.


I represent a coalition of Satanic horrors from the Abyss - and associated eldritch things from the outer dark, all of whom are mortally offended by the comparison to the utter evil that is PDF.

I think the real problem with PDF is that for too long people have been trying to use it for something radically different from what it was originally intended for and for which it is remarkably ill-suited.

I also think part of the problem is that because PDF is a self-described "document format," people think it's designed to be read. It's a great format for making sure everyone's printout on standard sized paper looks the same - not so much for all of those reading on non-standard screen sizes.


That's excellent news. I should go give them money now. 8-)


Sean: a-one-one-y is the kind of contraction that Does Not Work if you read it in a sans serif font.

(As I did ...)

Accessibility is good, though. (For a long time, my goal with my blog's design was to make sure that it rendered okay on an ancient Palm Pilot or Windows Mobile PDA's web browser. WebKit's spread has made it easier to splurge a bit, and move the trailing edge forward from mid-90s to early-00s tech, but accessibility alone should be sufficient justification for banning Flash from web pages.)


I so wanted to get the packet (and cast my vote), but the PDF format turned me off.

Steven, I hope Aussiecon do update the packet with at least RTF's of the novels. Thanks for contacting them!


Yes, I read it as "ally" at first, and it didn't make much sense, so I looked closer at it. Amusing, given the earlier comments about how a sans serif font is so much better :). (I prefer serif fonts. My calligraphing fiancee agrees with me.)


Despite the fact that "PDF is not an ebook format," it's probably the most ubiquitous format in Overdrive libraries.

As a result, I have read many, many, many PDFs on an e-reader (nook), because, hey! library e-books! The original size is generally too tiny to read; the reflowed versions range from nearly-perfect (very rare) to a bit annoying to read (common) to unreadable (very rare). The bit-annoying problems tend to be wonky chapter headers, page breaks in the middle of a page, dropped hyphens in compound words, and loss of paragraph breaks in back-and-forth dialogue.

The very-rare unreadability errors generally consist of the retention of line-break hyphens in the middle of words that are no longer at the end of a line, and the omission of hyphens that are supposed to lie between words.
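To show why both hyphen errors turn up, here is a minimal sketch of the kind of rule a reflow engine might apply when rejoining hard-wrapped lines. It is an illustration only, not the logic of the nook or any real reader; the function name `reflow` and the "next line starts lowercase" heuristic are assumptions made for the example.

```python
def reflow(lines: list[str]) -> str:
    """Join hard-wrapped lines into one paragraph.

    Heuristic (an assumption for this sketch): a line ending in "-"
    followed by a line starting with a lowercase letter is treated as
    one word split across a line break, so the hyphen is dropped.
    """
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if out.endswith("-") and line[0].islower():
            out = out[:-1] + line  # rejoin the split word
        elif out:
            out += " " + line
        else:
            out = line
    return out


# A genuine line-break hyphen is removed correctly...
print(reflow(["Some types of format-", "ting matter more."]))
# ...but a compound word that happened to break at its hyphen loses it.
print(reflow(["that well-", "known haven"]))
```

A reader that skips this step keeps line-break hyphens in the middle of words; one that applies it blindly eats hyphens that belonged between words. Telling the two cases apart needs a dictionary or the original source.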

Formatting matters. But some types of formatting matter more than others.

I just want to reach into the e-reader and tweak the reflow logic a bit more and more and more...


As someone who reads almost exclusively ebooks, I find that PDF is by far my preferred format. Of course, I read on a tablet with a 12.1" screen or on an actual desktop monitor, so my preference may be unusual. I suppose with Baen I always get HTML files, which would be my second choice.


But I read some books in which the layout is something that the author worked carefully on - a re-flowable ebook can completely destroy the structure of the book. "Just give me the text" is not always the right answer. It's why, in fact, I don't bother with ebooks at all.

PDF is a horror for plenty of reasons (I work in computer security and have seen PDF as a looming pit of potential malware coming from a long way off) but I am equally sick of seeing horrible bletcherously formatted html crap with rags all over the place. And, while css might have been useful for bringing some of that back into control, it has mostly resulted in semi-transparent gif backgrounds to further reduce readability.

For those of us who still want books to be beautiful, and who want to read books because they sometimes are beautiful, please don't encourage substitutes for PDF that are aesthetically just as bad for other reasons. Ebook formats are like having to choose between being boiled alive or flogged to death: there still is no alternative that is superior to ink on paper. (and before someone complains about the mass of paper books: yes, I'm the guy who travels with one bag of luggage that's just books for reading on the plane.)


I have a great deal of sympathy for Patrick's post @12, and I am extremely grateful for the publishers coughing up with the files. Grateful enough that I did in fact take out a supporting membership for 2009 and 2008 in order to get the voting packet, with the intention of using it to, well, vote.

But I still haven't read some of the material in those packets, because they were pdf only, which meant that I couldn't read them on my Cybook because they were formatted to a page size that's not readable on a handbag-sized screen. :-/ I'll get to them eventually, but only when I feel up to reading an entire story/novella/novel on my desktop, which will be far too late to use them for the intended purpose. So I think I'll pass on this year's packet.

(Oh, and I am exceedingly grateful that when was giving away ebooks as a promotional tool, there was a good selection of file formats on offer. I know how much work is involved in that.)


Usually the layout doesn't matter so much to the story. Usually it's enough to make sure the fonts are "right" (including a bold and/or italic equivalent, as necessary), that paragraphs are properly separated, etc.

But, to counter that: Terry Pratchett has described the efforts he went to to ensure that Azrael's response in Reaper Man showed up on the left hand side of the page, and took up as much space as it did. And the ebook version of it... didn't.


Those who have iPads and the 99-cent GoodReader app will find most PDFs to be remarkably readable. I agree that they're a pain for just about everybody else, though.


Well, excuse me, but you need your putative brain re-fucking.

"For anyone who didn't get it, "a11y" is "accessibility." Just as "i18n" is "internationalization," because people get really tired of typing long words out all the time, when they're the ones doing the standards."

Sorry, again, but a11y I read as "an-11-year old" (presumably computer-literate ....) as for i18n, why that should read interNATIONALisation is beyond me.

Now either piss off, or use English, or at the very least well-recognised contractions in initial-slang

Grrrrrrr .....

On another subject, from the previous thread... What is Calibre? And how does one use it to untangle .pdf files? Pretty please?


Calibre is a commonly used ebook library programme. It has built-in conversion routines - load the PDF (or mobi, or whatever), it sets it up as a book in its library, you can then convert to other formats (rtf, epub, etc). It can also act as a server that eg Stanza on iPhone can download epubs from. Written and regularly updated by an individual, whose name I'm sure you can find.
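For anyone who would rather script it than click through the library window, Calibre also ships a command-line tool, `ebook-convert`, which takes an input file and an output file and picks the formats from the extensions. The sketch below wraps that in Python; the helper names `convert_command` and `convert` are invented for the example, and it assumes Calibre is installed with `ebook-convert` on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(source: str, target_format: str = "epub") -> list[str]:
    """Build the ebook-convert command line for one file.

    The output file sits next to the input, with the extension swapped.
    """
    target = str(Path(source).with_suffix("." + target_format))
    return ["ebook-convert", source, target]


def convert(source: str, target_format: str = "epub") -> str:
    """Run the conversion and return the output path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    cmd = convert_command(source, target_format)
    subprocess.run(cmd, check=True)  # raises if the conversion fails
    return cmd[2]


print(convert_command("novel.pdf"))
```

How readable the result is depends entirely on how the PDF was built, so treat the output as a starting point rather than a finished ebook.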


Um... you what? PDFs only contain links to embedded JPEGs if they were built in a spectacularly broken fashion. Proper PDFs are (basically, sure someone will correct the details) compressed PostScript. Remember PostScript? The file format specifically created for printing?

If you're really unhappy about things being in pdf format, you're complaining about the horse rather than the jockey. PDF is a perfectly good file format for a workflow based on commercial printing, and the fact that it doesn't suit your ebook reader just reflects the fact that publishers don't make ebooks part of their workflow (yet).

For similar reasons: editing pdfs, wtf? Edit the source file and generate a new pdf, that's how it's designed to work. All-in-one formatted files like Word uses are a vile time-wasting hard-drive-bloating excrescence that would crash before they finished loading the tentacles, assuming you had the right software to open them in the first place.


I am going to have to go ahead and agree with Chris up there. Ipad+goodreader has single handedly turned pdf from a business only document sharing thing into something functional and, well, pretty darn usable as a pdf ebook reader. I dread the thought of having to use anything else, though.


It's been a while since I messed around with PostScript, but I'm pretty sure that the way to include bitmapped images is, well, to just include the bitmapped images as they are, possibly with some simple format conversion. From what I know about PDF, I would bet it's similar. Line art is a different story, as Ben noted.

So it doesn't seem unreasonable at all that if you feed your PDF-converting program a JPEG image, it'd just end up as a direct binary copy inside the file, seeing as how JPEG (de)compression is a pretty normal thing to have nowadays. Uncompressing would be wasteful and pretty pointless, and uncompressing followed by recompressing will lose quality.

Of course you could convert bitmapped images to line-based drawing primitives, but that would just be a very difficult way to end up with something that in the end just takes up more space (and most likely looks worse as well).

(But I don't see how CBR/CBZ is "better" either; it's really just the same thing in principle, only now you don't have the page abstraction, and you can't add text. There's no "PostScript overhead" in PDF either, but of course you may get Adobe bloat instead...)

As for editing PDFs, I think the point was whether mere users would be able to edit them at all, rather than talking about the normal way to go about things as the creator. (You know, as in viewing sections that have been blacked out, etc.)

Now either piss off, or use English, or at the very least well-recognised contractions in initial-slang

Right, as if anyone didn't already know what i18n means. (Good one though! Next I suppose you'll say you never heard of m17n or l10n either.)

"a11y" was new to me though, I figured that "Ally" had to be some kind of screenreader or something.

I guess one shouldn't be that surprised that Wikipedia has an article on the subject. Apparently E15 is short for "The Icelandic volcano" -- that one might come in handy.


magetoo @47: "Right, as if anyone didn't already know what i18n means. (Good one though! Next I suppose you'll say you never heard of m17n or l10n either.)"

Correct, on all counts. Never heard of ANY of these terms. And I've been using computers since . . . oh, 1967 or so. Including having my own home systems since . . . er, 1980. None of which makes me ipso facto an expert on software, hardware, programming, or anything else in the field, actually. Merely suggests that microprocessors do not, by their mere proximity, automatically exude some esoteric jargon-recognition field that bestows the gift of script kiddies' tongue upon the fortunate user. Or is some more specific benediction required?


A short Google search found 2-3 free PDF-to-LIT converters, and LIT is considered to be quite good IIRC - it is one of Calibre's preferred formats.


"quite good" for PDF-anything useful is a very variable term. PDF is actually a very flexible thing, and can enclose all sorts of random junk. Like Flash, for instance. So what you get out of your PDF->something converter can be anything from a neat text extraction to a really bad OCR to huge amounts of blank page. Which makes sense when you consider that you might have anything from Postscript "print this text using whatever you feel like" to a random admixture of images, text outlines, embedded videos and audio. Meat puppets struggle to make sense of that stuff, so expecting some random conversion program to cope is optimistic.

The objection to PDF comes from that area - some pdf ebooks are nicely built with a clean break between presentation and content, but the other 99% are almost entirely presentation (often vector outlines of the shapes on the page... hence OCR).


Numeronyms - yet another April Fool's joke inserted into Wikipedia and taken as gospel by the rest of humanity.

Honestly though I'd never heard of them before and I'm a long term programmer too. I think it's one of those things used by a small clique who assume everyone else is in the know.

Like knowing what Informed Consent means.


@43 THANKS - I'll go and look at it!

@ 47 & 48 FYI, and just for laughs (not) .... E15 has ONE MEANING ONLY in England: The London Postal District of Stratford, where, incidentally, they are holding the (Ugh, shudder, unspeakable-tentacles-rising-from-the-river-mud....) "XXXth Olympiad"


Sorry, again, but a11y I read as "an-11-year old" (presumably computer-literate ....) as for i18n, why that should read interNATIONALisation is beyond me.

Greg, you're out of order. These acronyms have been around for rather a long time, albeit not in broad public usage. For the utility of a11y, consider for a moment the requirements of the Disability Discrimination Act ... and the large number of big corporate websites that don't provide non-popup-menu-driven navigation for the visually disabled. Or the tendency of restaurants to go for FLASH-only websites. And so on.

Finally, consider it a case of linguistic drift in action; and would you really rather leave it to text-speak instead of tech-speak?


I certainly get why PDF is not particularly suitable for hand-held reading. And that it's not particularly necessary for books that are more-or-less just a huge hunk o' text. And that Adobe Reader is a huge mass of eldritch cr*p. (Though as others have said, Preview is quite nice as a PDF reader. Do NOT speak of the evil that is Adobe Digital Editions.) But may I speak to situations where it's perhaps not so bad?

For books that are more than just a big hunk o' text - technical books with lots of tables, diagrams and text boxes, the aforementioned RPG manuals, and so forth - PDF is frankly an improvement over the ePub-formatted equivalents I've read, on a standard desktop/laptop screen. (This is a direct comparison, by the by, between the same book in PDF and ePub from O'Reilly, who hopefully knows something about ebook formatting.)

Also, those of us with longer memories might still regard PDF as a godsend. Back in the mid-to-late 90's, I was (among other things) the digital librarian for a wholesale distributor. We had to keep reference Material Safety Data Sheets on hand for all of the products we carried - and make copies on demand for our customers. PDF was just starting to really penetrate the computing world, and it was like manna from heaven compared to the alternatives at the time. A clean, electronic copy direct from the manufacturer? That would actually print out reliably, instead of Word files (or worse, manually-formatted text files with hard line breaks) that would barf all over the page because the creator used a different font and/or wider margins than your printer supported? Which were still often an improvement over the multi-generation photocopies skewed on the page that we got from some companies? Oh, how wonderful that was... (And back then, Acrobat Reader was still a relatively fast and lean program, instead of the later mess it became.) Those positive impressions can last for a long time.


Greg Tingey @ 52:

"@ 47 & 48 FYI, and just for laughs (not) .... E15 has ONE MEANING ONLY in England: The London Postal District of Stratford, where, incidentally, they are holding the (Ugh, shudder, unspeakable-tentacles-rising-from-the-river-mud....) "XXXth Olympiad""

Well, as I heard it, London and Paris fought it out for the Olympics, and London lost :)

Just plan on renting out your London flat in 2012, and taking a long August vacation in Paris. It'll be quiet (except for the crowd of Londoners in exile), and good for catching up on your reading.


@ 55 & 53 "out of Order" (As in a notice on a bog-door, presumably!)

Well, sorry, but the use of not-appropriate numerals, as others have said, DOES take some deciphering, to the point where it is a waste of, rather than a shortening of time......

As for the olympics, they appear to be a corporate joint wet-dream of the politicians and the journalists. I have yet to meet an actual person who is really in favour of the olympics - maybe because I live 4km directly North of the main stadium, and we are getting NOTHING AT ALL in the way of improved services or facilities either during or after the events. And we, as London residents, are PAYING (extra) for this .... The Paris RATP workers very carefully went on strike the day the IOC visited - they knew what they didn't want! Then there's the "security" - I think I preferred the site when it was the largest steam-locomotive shed in England, grotty, smelly, smoky and FUN - provided you were careful. I've still got pictures taken from that time, when I was at school here, and visited the sheds at weekends.


Greg@42: "as for i18n, why that should read interNATIONALisation is beyond me"

Why you put spurious capitalisation in there is beyond me. But why you also thought it appropriate to swear at other posters on someone else's blog, and then wonder at being called out of order, is also beyond me.

The reason for the abbreviation is that the whole word - internationalisation - is an unwieldy 20 characters long. The abbreviation is the initial 'i', then '18' indicating the 18 missing letters, and then the terminal 'n'. It's the accepted abbreviation for people who have to deal with multicultural and/or multilingual stuff, even if you are ignorant of it.
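For the curious, the rule is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function name `numeronym` is made up here, and it assumes a single plain word with no spaces or hyphens.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of omitted letters + last letter."""
    # Words of three letters or fewer have nothing worth abbreviating.
    if len(word) <= 3:
        return word
    # Everything between the first and last letter is replaced by its length.
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalisation", "localisation", "accessibility"):
    print(term, "->", numeronym(term))
```

Run as-is, the loop prints i18n, l10n and a11y for the three terms.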

The Wikipedia article that explains some of this is


Ah, the London Olympics. Yes, I've already vented about them. No, I'm not a fan. I think Cory Doctorow has the right idea: he said he's planning on moving for the year from London to that well-known haven of easy-going liberalism, Singapore. (Because in 2012 Singapore will be a haven of easy-going liberalism in comparison to Fortress London.)


He put the spurious capitalization in because I'd made a typo in explaining what "i18n" was, and he ran with it in the goal of insulting me. Not sure why, but it's what he set out to do.

They are obscure; that's why I expanded a11y. I only knew about it because I'd been dragged into some internationalization stuff, a long time ago, and had "i18n" thrown in my face -- and $DAYJOB tends to care a lot about accessibility, so I pieced it together from context.


Ah, right. I'd missed that.

As for the abbreviations, I'd prefer the practice of introducing the term in full, and then using the abbreviation thereafter, unless all those reading would already know it. It's especially a pain when one encounters an abbreviation that has one possibility for its meaning - not that that's likely for i18n.


Sorry: 'more than one', not 'one'.


I haven't read through all the replies so my apologies if I'm repeating something mentioned earlier.

PDF can be reflowed!

Yes, the files you get can't be reflowed - but the problem is not with the format itself, it's with the people who save the file or the applications they use.

PDF allows the source application to tag the text so it retains the document structure. I don't have MS Office 2007 at home so I can't tell you if it can tag PDF files - I assume it can - but if you have Open Office, when you export to PDF, just check the "tagged PDF" box and all will be well. If your word processor or virtual printer doesn't have such an option look for a "PDF/A" option.

You can then take the PDF file and either read it on a device - it should reflow perfectly - or copy the contents to an application of your choice and save in some other format.


I was given to understand that the shorthand forms i18n and l10n were originally coined to avoid the eternal controversy over whether they should be spelled with an ess or with a zed.

(The torch has been lit; let the divided-by-a-common-language flames commence! :-)


The point of language is to communicate - those acronyms fail at that task. As for the complaint that it's hard/boring/annoying to type out the actual word, a) it's a few more letters and b) you can set up Word and most other word-processors to replace A with B, so someone can type ally, er excuse me, a11y and have it replaced automatically by accessibility, and i18n by internationalization.
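Outside a word processor, the same replace-A-with-B idea can be sketched in Python. The table and the function name `expand` are hypothetical, the table holds only the terms mentioned in this thread, and the match is case-sensitive.

```python
import re

# Hypothetical lookup table; extend it with whatever jargon you use.
EXPANSIONS = {
    "a11y": "accessibility",
    "i18n": "internationalization",
    "l10n": "localization",
}

# Match any listed abbreviation, but only as a whole word.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, EXPANSIONS)) + r")\b")


def expand(text: str) -> str:
    """Replace each known abbreviation in text with its full form."""
    return _PATTERN.sub(lambda m: EXPANSIONS[m.group(1)], text)


print(expand("We care about a11y and i18n."))
```

Because the pattern matches whole words only, an ordinary word such as "ally" is left alone.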

Most people who use these don't do it to further communication but to feel special. I've little sympathy for that.


64 has it spot-on.

The so-called abbreviations mentioned failed, spectacularly, to communicate ANYTHING, other than that the writer thought he was overly clever.

I agree, I lost my rag at that point, and went over the top, but are you surprised? This is SUPPOSED (Charlie will hopefully confirm that I am correct) to be a highly intellectual and educated and coherent discussion board, hosted by his gracious self. Then, a poster puts up a message that is, to many of us (who are educated and literate, and science/engineering trained) completely incomprehensible. Communication FAIL, I'm afraid!


The abbreviations in question are the equivalent of jargon -- used by people working in a specialty, and who use the terms a lot. Most people don't compose their email in Word, and don't run search & replace on them before sending them out. I use a couple dozen acronyms every day, and, you know what, sometimes I forget and use them around people who don't know what they are.

Someone who uses the term a lot used it by habit; someone else explained it. It is not a big deal.


rick, Greg:

Of course you can type out "internationalisation", "localisation" and "multilingualisation" if you want. But if you work with the damn unwieldy terms all day, contractions are bound to happen, as they do everywhere else. Why are "i18n" and "m17n" always wrong when similar things like "PDA" and "OCR" are acceptable?

(I'd even argue that terms like "m17n", just like other jargon, can have taken on narrower and more specific technical meanings than the words they are based on, and that expanding them would be the more confusing option. It's not txt speak at all -- but that's another discussion I guess.)

Failures in communication... When communication fails like it did here, it is because sender and receiver have different ideas of what is commonly understood and can be taken for granted. It is not because the fault is solely on either side, and taking a strict position that "expression X is wrong" doesn't do much to clarify things.

So do we want to help communication along, or do we want to "feel special" about our own superior word choice / style / accent (and therefore assume that the fault is on the other side)?


I agree with your general sentiment regarding PDFs, but here are some tips for computer-literate folk who have received one and don't know what to do with it.

PDFs are not quite as read-only as they seem to people who have always just used Adobe to read them. There is a free, open-source package called Ghostview (you'll also need Ghostscript -- has them) which can convert PDFs to PostScript source code. PostScript is a text format, so you can open it in any text editor and all the contents should be there.

(This doesn't always work for PDFs created with password protection, but it's amazing what you can decipher with another Gnu program, "rot13".)


The ghostscript and ghostview packages also contain a little program called ps2ascii that works directly on pdf files. Guess what it does?


'it is because sender and receiver have different ideas'

That's a fair point, but I think the bulk of the problem lies with the sender. If a sender is using terms or jargon that could be obscure then it is up to them to ensure that their message is understood by others.

Assuming everyone knows what your jargon terms are is a bit egotistical and won't help you get your message across.

Although flying off the handle when someone uses a jargon term is a bit over the top too.


I have to say, the idea of a standard called EPub sounded good when I first heard about it. I was rather less impressed when I found that they were using CSS for formatting, and less impressed yet after reading a few dozen books in that format on my Sony and on my PC.

I don't know how much of this is due to poor production and/or poor rendering, but the damn things simply don't look very good, especially when you need anything more than the most bog-simple formatting. It's hard to take a "publishing" standard seriously when it can't even handle footnotes well.

PDF, being essentially an extension of PostScript, is pretty good, if rather CPU-heavy, as a page description language. It's no surprise that, since it's meant to describe a formatted page, it doesn't work well on devices that can't display a full page of whatever size the PDF describes. But things formatted for 15x12 pages look pretty darn good on my Sony Reader.


Homo sum humani a me nihil alienum puto, not with Google to look up weird obscure or new stuff like a11y and i18n. Use the blessed search engine and don't whinge about people doing their best to communicate, please. Graviora manent. Anybody else finds a new word please let me know, I love them. Work that language hard!


The problem with PDF is not that it can enclose postscript, but that it can also enclose other things. The list of things that can be embedded in PDF is long and scary. And just because you have stuff that looks like text inside a PDF doesn't mean it's literal text with a postscript header. Many applications "print to PDF" meaning they use PDF-wrapped-PostScript to draw on the page... your "PostScript text" is a conversion program's random collection of lines. That's much easier than embedding a font in your PostScript.

"PDF, being essentially an extension of PostScript, is pretty good [...]"
"The problem with PDF is not that it can enclose postscript, but that it can also enclose other things."

A small nitpick, or perhaps interesting factoid: PDF is a bit more different from PostScript than that. PostScript is a full programming language - one that happens to have commands for text and graphics - but PDF only implements the PostScript drawing operators (and adds compression as standard). So the core of a PDF document is really just a list of move / draw / text commands.
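To give a feel for that "list of move / draw / text commands", here is a rough Python sketch that builds a tiny page content stream. It is only the fragment that would sit inside a page's stream object, not a complete PDF file (no header, objects or cross-reference table). The function name `content_stream` and the coordinates are invented for the example, and `/F1` is an assumed font resource name that a real file would have to define in the page's resources.

```python
def content_stream(text: str, x: int = 72, y: int = 720) -> str:
    """Build a tiny PDF page content stream: one line of text with a rule under it."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
    ops = [
        "BT",                    # begin a text object
        "/F1 12 Tf",             # select font resource /F1 at 12 points (assumed name)
        f"{x} {y} Td",           # move the text position
        f"({escaped}) Tj",       # show the string
        "ET",                    # end the text object
        f"{x} {y - 4} m",        # moveto: start a path just under the text
        f"{x + 200} {y - 4} l",  # lineto: extend the path to the right
        "S",                     # stroke the path
    ]
    return "\n".join(ops)


print(content_stream("Hello, page"))
```

There are no loops, variables or procedures in the output: just operands followed by operators, one step after another.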

PostScript is pretty neat and interesting, even if it's not very relevant today. Multi-byte character support is done by subclassing (kind of) the "show text" operator, you can write functions to draw things like graphs based on given parameters, and it has apparently even been repurposed for weird things like controlling homebrew CNC machines.

I suppose PDF on the other hand would be "merely evil". (rambling geek mode off, and back on topic!)


@magetoo: I wouldn't call your point quite as small as a "nitpick;" PDF does have some serious security issues if you're not careful (read: paranoid, and willing to disable functionality) on the display side of things.

However, it's perfectly reasonable in some cases to describe a page of text as a vector drawing instead of embedding a font. Perhaps you don't want others to get hold of the font, or perhaps it's a few characters of Japanese or Arabic for which it's not worth embedding a font. It's a page description language, not just a text container, after all.

If the complaint is that some people unwittingly use vectors when they should be using text, blithely unaware that they're disabling searching and cut and paste, well, I don't think it would be a good idea to redesign a PDL (rather than, say, the software the user is using) to try to avoid such operator error. That's too likely to cause harm to the usefulness of the PDL.

Oh, and just to remind people here: PDF is now an open standard: ISO/IEC 32000-1:2008.

So, Charlie, after all of this, do you still consider the PDF format itself to be the spawn of the devil, or do you think it has its place in the world, albeit not generally when schlepping novels and short stories around?


Actually, I like reading PDFs on my Treo (or the Centro) - Adobe's Reader for Palm generally does a pretty good job of converting it for comfortable reading.



Thank you for doing this; I bought Palimpsest previously and was delighted to be able to get an ebook version of it. Overtime is neat too.



About this Entry

This page contains a single entry by Charlie Stross published on May 10, 2010 9:52 AM.
