
Lost in Translation

Automated translation is one of those ideas that collectively constitute the philosopher's stone of procedural artificial intelligence; we can translate our own formal constructed programming languages mechanistically (the tools to do so are called compilers) so why can't we translate human languages automatically?

Well, we can, after a fashion. But translation turns out to be very hard, even for humans, and the consequences of a mistake can range from hilarious to horrifying ...

First, machine translation. The following message was posted to a discussion group for users of Shibboleth, a federated internet single-sign-on system. As you may be able to puzzle out, the perplexed (Japanese) user assumed Google Translate would help their technical support request:

This is question, engish is faulty therefore the right excused is requested. Thank google to translate to help. SORRY!!!!!

At often, the goat-time install a error is vomit. To how many times like the wind, a pole, and the dragon? Install 2,3 repeat, spank, vomit blows

14:14:01.869 - INFO [edu.internet2.middleware.shibboleth.common.config.profile.JSPErrorHandlerBeanDefinitionParser:45] - Parsing configuration for JSP error handler.

Not precise the vomit but with aspect similar, is vomited concealed in fold of goat-time lumber? goat-time see like the wind, pole, and dragon? This insult to father's stones? JSP error handler with wind, pole, dragon with intercourse to goat-time? Or chance lack of skill with a goat-time?

Please apologize for your stupidity. There are a many thank you


I'm pretty sure that what's going on here is that, as with English, Japanese computer specialists have re-purposed the meaning of some older words; after all, if virtual, disk, memory, and printer all mean something very different in 21st century English from their long-established meanings in the late 19th century, why shouldn't goat-time, wind, pole and dragon have mutated in contemporary technical Japanese? Unfortunately Google Translate is still living in the past.


But it's not just in technical sub-dialects that problems arise.

Here's a menu from a Portuguese restaurant. It's bilingual, translated into something resembling English, either by an even stupider machine translator or by a non-English-speaker with an English/Portuguese dictionary. Foodstuffs commonly have idiosyncratic local names, and not just in Portuguese (I'm thinking of having Toad in the Hole for lunch, followed by Spotted Dick). So a simple dictionary-driven word substitution approach to translation will have unfortunate consequences.
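For the curious, here is a minimal sketch of what dictionary-driven word substitution amounts to. Everything in it is an assumption made for illustration: the three-entry glossary is a toy, the function name is hypothetical, and the guess that "Black paw ham" began life as "presunto pata negra" is mine rather than anything the menu states.

```python
# Toy glossary: hypothetical entries chosen for illustration only.
GLOSSARY = {
    "presunto": "ham",
    "pata": "paw",
    "negra": "black",
}


def substitute_words(phrase: str, glossary: dict[str, str]) -> str:
    """Replace each word on its own, ignoring idiom, word order and context.

    Words missing from the glossary are passed through unchanged.
    """
    return " ".join(glossary.get(word.lower(), word) for word in phrase.split())


print(substitute_words("presunto pata negra", GLOSSARY))  # ham paw black
```

Each word is looked up in isolation, so a name that only makes sense as a unit comes out as a pile of individually correct words.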

The menu is unfortunately a JPEG of a photograph, and I can't be bothered to copy type the whole lot, but I suggest you click through and read it while I wait for my "Nose broth with ass entries" followed by "Bastard's lamprey rice done in the manner of between rivers" with "Rotten cake made out of wood" for dessert. (I think that last one is meant to be Black Forest Gateau.)

After all, who can resist a menu with starters like:

Black paw ham
Little cheeses from Big Olive Oil Land
Clams in the manner of the ever-fighting duck
Little bitches from the Algarve
Do you understand?

And finally, even humans get it wrong, with chilling consequences. From Clive Stafford Smith's book about Guantanamo and the detainees, Bad Men: Guantanamo Bay and the Secret Prisons (extract shamelessly nicked from another source on the internet):

Yusuf was technically a citizen of Chad, but he was born in Medina, in Saudi Arabia, and had lived his whole life there. The US military could have got his birth certificate from their Saudi allies with a telephone call. It was a little more difficult for me, but I got it and it showed he was born in November 1986. After four years of intensive interrogation, far from securing a rich harvest of "enormously valuable intelligence" (General Geoffrey Miller's words), the military could not even work out Yusuf's age, just as they could not spell Binyam Mohamed's name after years of torture in Morocco and Afghanistan.


[...] Yusuf explained how his [initial] interrogation quickly descended into farce. Early in his captivity the US agents questioned him with the assistance of a translator who used a dialect of Arabic in which the word zalat means money; in Yusuf's Saudi dialect it means salad, or tomato. Yusuf reconstructed the interrogation as best he could remember it.

"When you left Saudi Arabia for Pakistan, what zalat did you take with you?" demanded the translator, suspecting that the money must have come from al-Qaeda sources.

"What? I didn't have any zalat when I went to Pakistan." The 14 year-old was confused. He had been through a difficult time since his seizure by the Pakistanis. He was prepared for any trick the Americans might spring on him, but all this talk about tomatoes was beyond him.

"Of course you had zalat. What do you take me for? An idiot!" The translator flared into hostility.

"I didn't! Why would I?"

"Of course you did. Now tell me, where did you get the zalat you took with you?"

"I didn't take any zalat with me. I didn't!"

"Aha! So you got zalat in Pakistan when you arrived?"

"Well, yes, what zalat I wanted, I could get there. That's natural." Yusuf was trying to be conciliatory, though the conversation continued along this strange line.

The translator seemed suddenly excited. "Where could you get zalat in Pakistan, then? I want a list of places. Details. Descriptions, places. Details."

Yusuf wanted to keep him in a good humour. Trying to remember Karachi, he began to discuss places in the market where one might buy salad. With each description of a market stall the translator turned to the American interrogator, who took careful notes.

That evening Yusuf was returned to the cage where he was being held. He was a very muddled adolescent. He talked through his bizarre interrogation with other prisoners, turning over each of his recollections.

Finally one of the older prisoners solved the puzzle: "You were talking about tomatoes. They were talking about money. That's what it must have been."


Arabic isn't a monolithic language; while its written form is somewhat standardized, it is split into many regional dialects, some of which are mutually unintelligible, effectively different languages.

The unfortunate 14-year-old Yusuf got himself sold to the US intelligence services by a Pakistani entrepreneur who saw the "reward paid for Al Qaida members" ads, read them as MAKE MONEY FAST, and grabbed the first passing stranger. And the US military, being extremely short on skilled Arabic translators, assigned a translator who spoke the wrong dialect and didn't know enough about Arabic to know how wrong they were.

(Cue visions of drones carrying out missile attacks on market stalls unfortunate enough to have sold Yusuf his tomatoes. Complete with lethal collateral damage ...)

152 Comments

1:

There are a many thank you

This appears to be an over-literal translation of doumo arigato gozaimasu, which is such a common stock phrase (and a very polite one) that it's remarkable that Google Translate doesn't know it. Also, as Japanese people tend to avoid personal pronouns (it's impolite), I assume that the sentence before that is the writer apologising for their own stupidity.

2:

The Universal Translator of Trek infamy is still a long way away, if not impossible.

I have a standard thought-experiment on the supposedly less complex issue of speech-to-text transcription (which, judging from the amount of work being put into hand-held translators for US military use in the Middle East, would be an important first step)...

...Vox-to-text won't be effective until it can distinguish between "right you are, so" (Irish, confirmation) and "right, you arsehole" (English, aggression).

3:

Just a quick heads up from a Portuguese reader. The menu you point out is a joke, as no computer would make such great bilingual puns ;)

4:

I hope so. But ... are you sure? (Consider the other two examples before you answer!)

5:

Mmm, drowned baby, and a cherry bombe.

6:

I was wrong about the Black Forest Gateau - the parallel Portuguese would appear to indicate it's a Madeira cake.

7:

The menu reminds me of the English menu of a restaurant here in Germany. They are serving "Leberkäse", a very typical and tasty German type of meat loaf. The English menu was selling it as "Liver cheese" ...

8:

I'd like to see any computerised system tell the difference between "That ship's a tanker" and "That ship's at anchor".

Maybe if we all switched to speaking Esperanto?

9:

Maybe if we all switched to speaking Esperanto?

Bonan vesperon!

(that's about the extent of my knowledge of Esperanto)

10:

#1 - I strongly suspect that your (and Charlie's) Japanese is way better than mine, but I would concur fully with your interpretation of what the writer intended to do. It's what I thought they were trying to do before reading the comments, and it's not just the Japanese who do it. I also post on a car forum under my real name, and it's quite common for non-native English speakers to preface or suffix a query with an apology for their English.

11:

Charlie, I'm with Bruno, the Portuguese menu was done as a joke (I had seen it before), you can clearly see that some effort was put into making the funniest blind translations possible.

12:

This is a classic example of why you should always get a human to check the result of an automatic translator:

http://adweek.blogs.com/adfreak/2008/07/then-well-grab.html

Although even humans get muddled; in Cyprus I once saw a sign saying 'ΝΟ ΠΑΡKING'...

13:

Almost simultaneously, Mark Liberman posted about the Shibboleth email over at Language Log. In the comments they are discussing whether it's a parody or real bad translation.

14:

Since I currently make my living as a translator, it makes me happy to see that machine translation is, at best, a very, very distant prospect, and not simply because those dang machines can't unravel the twisty little passages of our intended vs. expressed meaning, our metaphors and similes, various accents and speech impediments, etc.

The thing is, most *people* doing translation seriously suck at it, despite the fact that they have a brain wired for language, a college education that was aimed squarely at making them good translators, and that they have been doing it professionally, often for years (and in one terribly sucky case I had the misfortune of cleaning up after, decades). It is even a popular local sport among the snarky, nerdy and linguistically inclined crowd here to make fun of unsuccessful movie titles and subtitles, with whole websites, Facebook groups and suchlike devoted to the pastime... and I'm not talking about errors in unraveling an obscure American pop-culture reference or untangling the exact dialect to work with, I'm talking about stupid mistakes made by students of a language who, for example, in a scene where the characters are all very obviously going to bed, still somehow manage to translate "nighty-night" as "99".

So yeah, I'm guessing the 'puters are pretty screwed with that one. But if they ever do figure it out, I, for one, will welcome our new Google Translate overlords. I could use some time off. :-)

15:

#8 Para 1 - I'd agree; aside from the position of the spacebar, they use the same phonemes in my dictionary (Collins, if anyone would like to check Chambers, the OED, or Webster's [or Australian, Canadian or NZ equivalents if anyone has them]).

16:

I wouldn't be so sure it's a joke - I do remember seeing a lot of really hilarious menus in Lisbon last year. Another problem is that it's possible to have formally correct but practically unhelpful translation. ObRestaurant: years ago, I saw a menu in northern Spain that translated some ingredient or other as "Fresh Squill". It's possible that there really is a valid entry in some corpus of English vocabulary for this, but it meant nothing to us or to anyone I've ever met.

@8: Doesn't everybody speak
In fixed length groups?

17:

Of course, wikipedia knows squill. It's the family of plants including hyacinths, from which cough medicine and a rat poison are extracted. Perhaps I was well advised not to order it.

18:

I don't care if the menu is a joke - it is still one of the funniest things I have seen in a long time and provided me with a much needed laugh - currently in the throes of organising an office move and have spent all morning dealing with BT.

19:

Charlie has been on the receiving end of my translations of German menu items: "half a pig's arse", "a pig's foot", "every sort of sausage with Ireland's quota of potatoes, and rotten cabbage", "pickled lump of cow". All perfectly accurate, of course.

20:

Perhaps the menu is an homage to English As She Is Spoke.

21:

Having worked with Shibboleth a bit, I'd say that goat-time and vomit are actually exactly what he meant, since goats and vomiting figure heavily into Shibboleth configuration.

22:

I remember once having a wonderful book about Japanese, one of whose main pleasures was its many literal translations of Japanese idioms. I think there was one (very old now) that was something like "my stomach was about to boil tea."

One of the most famous examples of this kind is, of course, the contrived "The vodka is good but the meat is rotten" for "the spirit is willing but the flesh is weak." Alas, there's no evidence that this was ever more than a thought experiment by a linguist.

23:

Very easy, actually, since the words are actually different. I ran both through a part-of-speech (POS) tagger:

1. That/DT ship/NN 's/VBZ a/DT tanker/NN
2. That/DT ship/NN 's/POS at/IN anchor/NN

In speech to text, this would only be marginally more difficult, since the current crop of systems do very well on simple ambiguities like this where the pattern of stresses will produce different phonemes.
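To make the written-text point concrete, here is a small sketch that takes the two tagged lines exactly as quoted and lists where they diverge. It does not run a tagger; it only parses the word/TAG output, and the helper name is hypothetical.

```python
def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


tanker = parse_tagged("That/DT ship/NN 's/VBZ a/DT tanker/NN")
anchor = parse_tagged("That/DT ship/NN 's/POS at/IN anchor/NN")

# Both sentences have five tokens, so a position-by-position comparison is enough.
differences = [(a, b) for a, b in zip(tanker, anchor) if a != b]

for a, b in differences:
    print(a, "vs", b)
```

Three of the five positions differ once the text is tokenised, which is why the written form is the easy case; the hard part is getting from sound to those tokens in the first place.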

24:

As someone who makes a living from Japanese to English translation, and is quite often asked to "proofread" text that's been machine translated (something I won't do, on principle), my gut reaction to the ostensibly Japanese message is that this is probably also a joke. It doesn't make any of the common errors seen in J2E MT, such as multiple sentences starting with "It" as the subject, and phrases like "Please apologize for your stupidity" have no relation to Japanese grammatical structures. Not to mention that I can't think of any conceivable Japanese equivalents of "goat-time," "pole," "vomit," "dragon," etc. that might have anything at all to do with computing. I'm ready to be proven wrong, but it's extremely hard to believe it's genuine.

25:

In IT, the Japanese often use words that *sound* the same as the English term, i.e. phonetic representation in Kanji of an English word.

This can get very amusing in an email discussion about the contents of an Excel Spreadsheet - the automatic translators will insist that this "shite" contains the elements that are to be examined!

26:

@18 - funny: we don't document the goats and vomiting on the federation website. Of course, this could be what we're doing wrong.

@21 - we've made a little headway (we've been assured offlist it's deadly serious, by the way) and it'd appear that vomit in this context is to produce an error message, lumber is logs, of course, and some of the other words are translations of the syllables that some terms were chopped into.

Goat-time has us all stuck, though....

27:

Google translation seems to do better with engineering terms in my limited experience

28:

Written down, the characters used are, of course, different. But I assume you want to talk about the spoken form.

In my particular dialect (a form of RPE), the sounds and stresses are different. The 'a' in 'a tanker' is, for me, a schwa, whereas in 'at', the vowel is more stressed. Also, the rhythm is slightly different.

I have heard that some American viewers of LotR had difficulty with Merry's name, thinking he was called Mary (a different sound to the English).

29:

I can't remember which novel it was in, but there was a joke about the problems of a Japanese translator attempting to translate an English book that was described as "a gritty Northern drama". His misunderstanding of the line "Ee mother, I could murder a faggot" raised a smile.

30:

23: Purely guessing, but could goat = goto?

31:

The problem in general with machine translation is that translation needs deep context, i.e. an understanding of what is being said by the text.

Compilers don't have that problem, because programmers will hammer out all the ambiguity of the source code until it works.
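A quick illustration of the compiler side of that, sketched with Python's standard `ast` module (the expression is just an example I picked): the grammar leaves exactly one way to read the source, so there is nothing for the machine to guess.

```python
import ast

# Parse an expression that a human might group in two different ways.
tree = ast.parse("1 - 2 - 3", mode="eval")
top = tree.body

# The grammar fixes a single reading: (1 - 2) - 3, never 1 - (2 - 3).
left_is_subtraction = isinstance(top, ast.BinOp) and isinstance(top.left, ast.BinOp)
right_is_constant = isinstance(top.right, ast.Constant)

value = eval(compile(tree, "<example>", "eval"))
print(left_is_subtraction, right_is_constant, value)  # True True -4
```

No such rulebook settles whether a stretch of speech was "a tanker" or "at anchor"; that takes knowledge of what the speaker is talking about.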

32:

@26: It's by David Lodge, if that helps narrow it down....

33:

I would agree with Claire.
Japanese has 3 different writing systems that are used in conjunction: kanji (ideograms), hiragana (syllabary) and katakana (syllabary). The last one is used mostly for foreign words, especially those that were not in the original language. Most computer terms fall into this category. Computer in Japanese is "konpyutaa", printer is "purintaa". The translation of these words is straightforward, so I don't think Google Translate would mix them up with something else, and it is probably a joke.
There is another possibility: that it could actually be Chinese and not Japanese. I don't know Chinese, but I think they do use kanji (ideograms) to represent new words, and they probably have to adapt old kanji to new meanings.

34:

Machine translation can usually get the basic concepts over to the recipient if grammatical niceties are not required. It helps if the original text is written with this in mind. A statement like "The cat sat on the mat" will usually be comprehensible after machine-mangling.

http://i231.photobucket.com/albums/ee12/nojay_photo/Odds%20and%20Sods/Homecentre.jpg

shows the results of J2E machine translation I encountered in Japan a few years back; it made me smile at the time I first saw it but even so it is clear what the hoardings were advertising, despite the odd wording and phrasing in the English version.

The one that made me laugh when I was trying to hack-translate a bit of manga was "oshiriago"; figuring out this meant "cleft chin" took a while since oshiri means buttocks.

35:

26, 29:
It's in "Small World."

36:

I'm almost sure, because as @11 says, the translations are done in a way to end up quite funny, and not just nonsense.

Although, as @14 witnessed, there are several cases of mangled English in Lisbon :P

37:

There's a joke Mike Harding tells about being on tour in the mid-Western US. His long-suffering road crew from Bradford were sick to the teeth with American fast-food and in a burger bar on a Native American reservation one of them, pining for his home town's cuisine announced in a loud voice "God I could murder an Indian".

38:

My hovercraft is full of eels. Do you want to come back to my place, bouncy bouncy?

Sorry, but 32 comments into a thread about bad translation without a Hungarian Phrasebook reference violates my sense of internet propriety.

Translation is definitely a "Turing hard" problem, on a par with generating original fiction. *Harder* than merely conducting a conversation in a known language. You've got to project an assemblage of concepts onto a different map, in a way that makes sense to the reader and still represents what the originator meant.

Even if Chomsky isn't full of shit with his "Universal Grammar" (and personally I think he is, it seems based more on his ideological need for everyone to be fundamentally the same mentally than any real evidence), it's not the kind of problem that current architectures are suited to. Perhaps hardware-based neural nets, or quantum computing.... But no "search and replace" based system, no matter how iterated, will ever be mistaken for a native speaker.

--Dave

39:

@33 For sure, but those unintentional manglings are never as funny as the example Charlie provided. :-) (I live in Lisbon, btw)

40:

There are several books about the Napoleonic Wars which mention that Blucher, the Prussian commander at Waterloo, was completely insane - citing as evidence the fact that he apparently believed himself, during the Hundred Days, to be pregnant with a baby elephant.

There's only one I've found so far that notes, in passing, that "pregnant by an elephant" is a perfectly normal German expression meaning "extremely impatient".

The zalat story is great.

41:

I've dealt with computer-translated Japanese often enough that I can often translate from googleEnglish to Japanese and back to English. In this particular case, part of the problem is that Google is terrible at noticing the semantic meaning of words in katakana - e.g. the computer-jargon word "clean" is transliterated as "kirin", which also happens to be pronounced the same way as the word "dragon". (Of course, it doesn't help that "error" was translated as "vomit", either.)

Where you're much more likely to encounter repurposed words is either Chinese or Icelandic; Japanese, for the most part, uses English for most of its computer technical vocabulary in common parlance.

42:

I would say that goat-time is run-time and I suppose vomit could be some kind of Java stacktrace which actually looks like ASCII vomit.

43:

BTW, "Black paw ham" is a good translation. Black leg would be better. It's ham from the Iberian breed of pig, which is dark-haired, and it is considered to be much, much tastier than "normal" pig ham.

So much that nowadays the amount of supposedly "pata negra" ham on the market may only be possible with eight-legged pigs, if you know what I mean :-P

44:

I think you could sell t-shirts that said "Please apologize for your stupidity." for $100 each.

45:

@ 16:

Charlie has been on the receiving end of my translations of German menu items: "half a pig's arse", "a pig's foot", "every sort of sausage with Ireland's quota of potatoes, and rotten cabbage", "pickled lump of cow". All perfectly accurate, of course.

Pepe Le Pew calls the object of his affections "my little cabbage", which is what the traditional endearment "mon petit chou" means in literal translation (and which a younger me used more than once, sigh.) And "Out of sight, out of mind" was famously (machine) translated out of English and back to become "invisible idiot". Or what about Kennedy's famous pronouncement that he was a jelly doughnut? Idiom and erratically irregular syntax really are funny and I have no reason to believe on that basis alone that the menu is a joke.

And of course, this:

"Of course you had zalat. What do you take me for? An idiot!" The translator flared into hostility.

"I didn't! Why would I?"

"Of course you did. Now tell me, where did you get the zalat you took with you?"

"I didn't take any zalat with me. I didn't!"

"Aha! So you got zalat in Pakistan when you arrived?"

"Well, yes, what zalat I wanted, I could get there. That's natural." Yusuf was trying to be conciliatory, though the conversation continued along this strange line.

is just Kafkaesque, and very, very believable. People tend to forget that about Kafka, giant insects notwithstanding.

46:

I'll post my two favorite translation efforts in separate posts.

The first one came from the minister of my childhood church (liberal protestant). He was a wonderful preacher, with a DD even, and he had learned Aramaic, the language of the New Testament.

He was fond of pointing out how the Bible had been warped through translation.

First in Aramaic, there are three genders: masculine, feminine, and neuter, as in Latin. He loved to point out that, in Aramaic, God is a neuter word.

Additionally and more importantly, faith is a verb in Aramaic. It's not something you have, it's something you do, and there's a different Aramaic word for belief.

In other words, the central tenets of the original Christians were different than what people preach now. Atheists reading this might want to stop and consider what it is they don't believe in.

47:

Another mistranslation anecdote:

I was involved in producing a bilingual English/Spanish gardening book. Where I live, the gardeners are often Latin American, while their employers speak English. The idea was that the book was laid out in parallel, English and Spanish sections mirroring each other on the same page or opposite pages.

However, finding a proper horticultural Spanish translator proved...difficult.

For instance, did you know that mulch and fertilizer can be the same word in Spanish? With that mistake, you might ask your gardener to add four inches of mulch for weed control, only to see him add four inches of fertilizer. That would be great for weeds, but not so good for the garden plantings.

Worse, the editor initially decided to hire a grad student as the translator. He looked good on paper, as he was studying botany and was a Mexican national. He got the translation done quickly, too.

What he did was to split the project with three other Spanish speaking friends. He divided the manuscript, apparently assigning pages at random to the four of them. They all translated their pages without talking with each other about little things like word choice and accuracy. Then they stapled the whole mess together and sent it back to the project's editor.

The editor didn't speak much Spanish, but she noticed the problem when she compiled the English and Spanish indexes. The same statements on different pages were translated in very different ways. The fertilizer and mulch problem above was one of the simpler errors to fix.

Two translators later, they had something that was at least consistent. The published version still has a lot of translation errors, which at least gives them a way to sell a second edition, revised and updated...

48:

Atheists reading this might want to stop and consider what it is they don't believe in.

The fact that the Bible has been mistranslated, both by accident and for political reasons, isn't exactly a deep secret. It also has no bearing on the existence of the deity described therein.

Similarly, the fact that some people understand Old Norse, and can point to differences between Nordic mythology and various translations, has no bearing whatsoever on whether Odin and his mates exist.

49:

Atheists don't believe in a god regardless of what religious people do believe.

I've always thought that a machine translator requires an enormous number of faculties that seem to be forgotten by a lot of fiction. If I said to someone "I think you're safe" the machine could translate that to mean "I think you are in no danger of harm". I may have meant that, or I could be wearing a Burberry cap and could have meant "I think you are a nice guy". Thing is, even with a full understanding of different dialects and double meanings of words, how could a machine differentiate which one I meant? There's nothing in the context to tell it. The only resolution could be if the machine asks for clarity, but that just smacks of annoying Microsoft-paperclip-like translators that ask "did you mean this?" at every utterance!

50:

In other words, the central tenets of the original Christians were different than what people preach now. Atheists reading this might want to stop and consider what it is they don't believe in.

Why do you assume that atheists are reacting specifically to Christianity? This one isn't ...

51:

And then there are the on-purpose translation errors made for diplomatic purposes... such as George H.W. Bush's purposeful (that is, carefully coached) mispronunciation of the then-President of Iraq's name during the run-up to the Baghdad Air Show (aka the air-warfare phase of Desert Storm). Let's just say that (rough approximations only!) "suhdahm" has a much more salutary meaning than does "sa'duhm," particularly in north-Gulf Arabic (as opposed to MSA). Now throw in the cultural significance of essentially accusing a national leader of not being "properly" named within the tenets of Islam... all based on the legendary American difficulty with languages as diplomatic cover.

What amazes me most of all, as a non-native English speaker, is that reliable communication is ever achieved, let alone how often; many of these examples are amusing precisely because they're relatively rare. But then, it doesn't necessarily take more than a few hundred miles in the same putative language to make communication improbable — just look at the difference in what you'll get if you order a "coke" at the Billy Goat Cafe in Chicago ("Pepsi, no Coke") or at a similar dive in Atlanta ("What kind of Coke? We're out of white soda.").

52:

"I will lift up mine eyes unto the hills, from whence cometh my help" != "Ich hebe meine Augen hinauf zu den Bergen. Woher wird meine Hilfe kommen?"

Apparently the second, with two sentences rather than a relative clause, is faithful to the Hebrew.

53:

Or what about Kennedy's famous pronouncement that he was a jelly doughnut?

Kennedy called himself a pfannkuchen?

54:

But do we even need to consider translation between languages, wouldn't the first step be to hope for automated extraction of meaning inside a single language ?

Until we get computers to smile wryly at stuff like "Is a lady barrister without her briefs a solicitor ?" there is absolutely no hope of them being able to translate it, or as a first step: not attempt to translate it, into another language.

Poul-Henning

55:

Distinction without difference?

     How do you say "have a nice day" in New York?
     Fuck you.

     How do you say "fuck you" in Los Angeles?
     Have a nice day.


56:

A German friend told me once that "Ich bin ein Berliner" is a perfectly sensible thing to say - it's obvious from context which of the two possible translations applies. She's from Hamburg :-)

57:

Exactly. To a German, it appears that there was no ambiguity at all - Kennedy declared himself spiritually one with the inhabitants of Berlin, but not an actual resident of the city. The possibility of ambiguity appears to come from English speakers who think of the foodstuff first (e.g. your Hamburger friend, or a Frankfurter) and of the place second. In Kennedy's case, the idea won't occur to your average Joe, since he won't have heard of the Berliner Pfannkuchen in the first place. It's the person with enough knowledge to have encountered the pastry who can be tricked, since the urban legend is trading on their feeling of superiority.

58:

Well, I don't think we spent enough time on the Arabic here. I am one of nine remaining students in seventh semester Arabic at the U of I (although I must admit that right now this feels more like "You're the best we could come up with" than "You're extremely good at Arabic," so professionals please go easy on me here), and I HATE Google Translate.

One of the errors I can describe most easily is that it doesn't take pronouns into account. This can really mess up translation pretty easily. For example:

أدرس "I Study"
تدرس "You (m.) Study"
تدرسين "You (f.) Study"
يدرس "He Studies"
تدرس "She Studies" (note that it is visually identical to the "You (m.) Study" form and relies on context)
تدرسون "You (pl.m. or pl.mixed group) Study"
يدرسون "They (m. or mixed group) Study"

All are translated as "Study" by Google Translate. Google Translate also does not indicate that these are all in the present tense.
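As a rough sketch of the information that gets thrown away, here is the list above as a Python lookup table that groups the glosses by written form. The variable names are hypothetical and the glosses are simply the ones from my list, lowercased; this is an illustration, not how any real translation system stores verbs.

```python
from collections import defaultdict

# One shared constant so the two readings of this form use the identical string.
TADRUS = "تدرس"

# (written form, gloss) pairs taken from the list above.
FORMS = [
    ("أدرس", "I study"),
    (TADRUS, "you (m.) study"),
    ("تدرسين", "you (f.) study"),
    ("يدرس", "he studies"),
    (TADRUS, "she studies"),
    ("تدرسون", "you (pl.) study"),
    ("يدرسون", "they study"),
]

readings = defaultdict(list)
for form, gloss in FORMS:
    readings[form].append(gloss)

# Forms whose person and gender can only be settled by context.
ambiguous = {form: glosses for form, glosses in readings.items() if len(glosses) > 1}

print(len(readings), "distinct written forms")
print(ambiguous[TADRUS])
```

Seven meanings map onto six written forms, and a translator that flattens all of them to "Study" has discarded the subject entirely, including the one case where even a human needs the surrounding sentence.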

While this is good news for me as a future translator إن شاء الله I really hate actually having to use translation programs of any kind.

Also, whoever decided to use someone not fluent in MSA at Gitmo was an idiot.

59:

Could it be that "vomit" in our J2E translation is a mistranslation of a phrase similar to "bring up", as in "he brought up his dinner outside the pub"?

"This brings up an error" -> "This vomits an error"

60:

To be fair, the translation says "a error is vomit"; presumably an idiomatic, polite-company translation would be "throws up an error"...

Why in the world GT doesn't handle the a/an distinction is of course the more perplexing part.

61:

Snap!

62:

"we can translate our own formal constructed programming languages mechanistically (the tools to do so are called compilers) so why can't we translate human languages automatically?"

Heh. Programming languages have the problem in reverse.

Computers are great at reading them, but even the tiny minority of unusually intelligent, highly trained and experienced humans struggle to write in them.

63:

I'm guessing "goat-time" is "run-time", though why Google is translating "run" into "goat" is beyond me.

There's more on this at LanguageLog, http://languagelog.ldc.upenn.edu/nll/?p=2651. Someone ran the text through the Japanese-English translator a few times, and ended up with sentences like "The wind dragon and goat sex" - showing that on the Internet everything eventually does devolve to goatsex.

64:

It still has no bearing whatsoever on the actual existence or otherwise of any deity. Especially to someone who sees the book as nothing but a collection of Middle Eastern myths and traditions that some people take far too seriously.

65:

Again, it depends on what the god (or God) you don't believe in is.

Personally, my favorite religious scripture is the Tao Te Ching. The first line is "Tao k'o tao fei ch'ang tao." This literally means something like "the tao that can be tao'ed is not the eternal/absolute tao."

Tao can mean Tao, way, path, that, absolute, talked about, spoken of, walked, trodden, etc.

Gotta love it when a scripture starts off with a pun.

I have several English translations, but my favorite is by Jonathan Star, because he lays out the Chinese text character by character in huge tables, and provides the possible English meanings of each character next to it. This gives you a better idea of just how much fun the original is.

And no, I don't talk about what I may believe about Tao (noun or verb). God may be a partial synonym of Tao, but whatever.

66:

When feeling particularly snarky, I've sometimes argued that 'Chomsky was wrong about linguistics too'. The regularities across languages which he sees as evidence of universal grammar are I think more easily explained as emergent properties of any communication system which 'sticks' over time.

A toy example: A communication system which requires a person to learn different irregular forms of every verb is going to be a lot more effort to learn than one which has a single suffix which one can add to a verb to denote e.g. the past tense. And so the latter one is more likely to be learned by new speakers where the two find themselves in competition. And following to its logical conclusion, one would also expect very commonly used verbs to stand more chance of retaining their irregular forms go/went, than less commonly used verbs - smirk/smirked. All without any need for a universal grammar. Grammar itself is of course a means of enabling information to be conveyed more quickly and simply - use of e.g. word order to convey information that might otherwise require many more words.

And the absence of a universal grammar - some deep underlying structure for translating between languages - does make the whole business of machine translation (and quite possibly, the learning of a second language) rather more difficult than it would otherwise be.

This guy (who taught me computational linguistics about ten years back) has some more coherent material on the subject: http://www.ling.ed.ac.uk/~simon/

67:

If you listen to the tape of Kennedy's speech in Berlin, it seems obvious that the wild cheering that erupts when he says that famous phrase means that the crowd understood perfectly well what he meant.

It's hard to believe that the crowd would erupt in enthusiastic cheering if he had called himself a sugar bun. Not Germans, anyway....

68:

It was always ridiculous. The story appears to have got more common with time. A key point is that actually, the cake isn't that common, not like frankfurters or hamburgers. It is almost certain that JFK's speechwriter chose to put it that way for euphony, knowing that it's not actually a slip in German. Ich..bin...Berliner! would work well if he was speaking German, but he wasn't, and the extra syllable pads it nicely into English prosody.

69:

The Russian-speaking blogosphere has just recovered from the uproar caused by the discovery that the Russian Academy of Sciences, on its official website, translated 'Institute of Protein' as 'Squirrel Institute'. And that wasn't the only bizarre translation either.

The website has been updated but here is the original screenshot:
http://www.ljplus.ru/img4/f/f/fftu/squirrel-institute.jpg

70:

Ferdinand de Saussure said something like "you can't translate an idea from one language to another without losing some of the meaning" - but whatever he actually said, he said in French.

Is Goat RAM?

71:

With regard to Japanese, I'd also mention that one of the things that gives foreigners (and computers) headaches is the fact that the language is often written without any spaces between the words, so sometimes just figuring out where a word begins and ends is a chore. There are rules, but as always, understanding the context is necessary to figure out what the hell a sentence means.
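A rough sketch of why this is hard for a machine, in Python. To keep it readable it uses English with the spaces stripped out as a stand-in for unspaced Japanese text; the tiny `LEXICON` and the function names are invented for illustration, and real segmenters use large dictionaries plus statistics rather than anything this naive. The first function is the simple "always grab the longest dictionary word" rule; the second lists every split the dictionary allows.

```python
# Hypothetical toy dictionary; English stands in for unspaced Japanese.
LEXICON = {"god", "is", "no", "now", "here", "where", "nowhere"}


def greedy_segment(text, lexicon):
    """Greedy longest-match: at each position take the longest known word.

    Falls back to a single character when nothing in the lexicon matches.
    """
    max_len = max(len(word) for word in lexicon)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon or size == 1:
                words.append(candidate)
                i += size
                break
    return words


def all_segmentations(text, lexicon):
    """Return every way of splitting text into words from the lexicon."""
    if not text:
        return [[]]
    results = []
    for size in range(1, len(text) + 1):
        head = text[:size]
        if head in lexicon:
            for rest in all_segmentations(text[size:], lexicon):
                results.append([head] + rest)
    return results


GREEDY = greedy_segment("godisnowhere", LEXICON)
EVERY = all_segmentations("godisnowhere", LEXICON)
```

The greedy rule settles on "god is nowhere", but "god is now here" and "god is no where" are equally valid splits as far as the dictionary is concerned. Picking between them takes exactly the context the machine doesn't have.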

72:

Re Aramaic and the New Testament; something has got garbled here. Aramaic has only 2 genders, masculine and feminine, and the most obvious word for God is masculine. (If you want a fun contrast, you can find "spirit" as feminine in Hebrew and Aramaic, neuter in Greek and masculine in Latin, but there's no evidence this had much effect on beliefs).
(Incidentally, the New Testament is in Greek except for a handful of quoted Aramaic words, though many sections are presumably based on Aramaic oral sources. It was of course translated into Aramaic (Syriac variety) later on).
While many Bible translations leave a lot to be desired on points of detail, it's very rare that any substantive argument involving Judaism or Christianity is affected by these. (I'm not linguistically competent to address how other major religious texts have been translated).

73:

Err what do you mean which god you don't believe in? Atheism is the lack of belief in any god or gods. If I do not believe there is a teapot orbiting the moon, my belief in its existence is not changed by someone stating the different type of teapot they think it is.

74:

The nice thing about communicating in Esperanto is that you've already self-selected for people who understand that not everybody speaks YOURNATIVELANGUAGEHERE if you talk VERY LOUD AND SLOW.

We watch out for mistranslations and "false friends" (pseudo-parallels), and enjoy telling stories about them among ourselves.

75:

I would suspect that, if real, the Portuguese menu was translated by a hungry traveller in exchange for a meal or two. Knowing they would not be back they were as inventive as possible.

Could vomit be output?

76:

Another one from Turkey: A traditional symposium for handicrafts?

77:

We were promised babelfish!

I think it's a common phrase 'to throw up an error' in some parts.

PS - OT - Is f'book dead for everyone here, or is it just me?

78:

@74: Facebook is down for me too.

79:

It's challenging enough with straightforward text, just try it with different accents & regional idiom. At a recent panel, we experienced the painful yet amusing experience of the Australian panelist having to be translated by a fellow panelist: the Americans in the audience were not coping with an Australian accent.

Also, some words, especially neologisms or brand names tend to be incorporated as transliterations. For example, try doing a Google translate of "Coca Cola" into Chinese, listen to the translation, then examine the meanings of the individual characters.

As pronounced, the Chinese version sounds like the English version, and taken as the full phrase, means the same thing, but individual characters in the phrase... They do not mean what you might expect.

80:

Do you think "Globish" might be the de facto replacement for Esperanto?

81:

And aside from actual words and their multiple, idiomatic, or metaphorical meanings, one also has to take into account general context and rhetorical strategy.

I read an article in high school composition that talked about this, no memory of the title, but this page at Dartmouth http://www.dartmouth.edu/~writing/materials/tutor/problems/esl.shtml is talking about some of the same ideas, although something about its general attitude toward "non-Western" rhetorical structures put my teeth on edge for reasons I can't quite pin down.

And of course none of this accounts for wordplay, as one might see in Élisabeth Vonarburg's "Chroniques du Pays des Mères" (translated as "In the Mothers' Land") where many words are altered to reflect the world of the story. (Good description in English here: http://www.tor.com/blogs/2009/07/history-language-identity-elisabeth-vonarburgs-the-maerlande-chronicles)

82:

Facebook appear to have had their DNS records hijacked.

Whoops ...

83:

Facebook is up for me as of this moment.

84:

When I was three to six, I thought we lived in O'Karber.

85:

Charlie: That Portuguese menu was written as a joke, but the pork prick here is real:

http://picasaweb.google.com/goblindegook/FRumFantStico2007#5131398175377770562

86:

>Unfortunately Google Translate is still living in the past.

For all their bluster, apparently that is not the only thing living in the past for Google. I find Google maps horribly dated as well (Simple test, google any new construction in your area). For me, I just google my home address. Yahoo maps shows my house, Google maps shows woods...

Seriously, are they trying to run the company on the cheap? Or is it just laziness on their part?

87:

At a recent panel, we experienced the painful yet amusing experience of the Australian panelist having to be translated by a fellow panelist: the Americans in the audience were not coping with an Australian accent.

The accent in question can be heard discussing the incident here.

88:

For those who would like to read SF with stories on language and translation, try Suzette Haden Elgin.

89:

Traduttore, traditore is an Italian saying that fortunately doesn't need translation (into English at least). If we're talking about jokes and puns, then you're out of luck, it's just not going to carry.

Occasionally the translators prove resourceful and come up with a new joke or pun to slot in the same context, the meaning is not conserved but the intention is. It's always a pleasure to encounter this when it's done properly.

For example in "A fish called Wanda" Jamie Lee Curtis berates Otto's ignorance saying "The London underground is not a political movement", while in the Spanish version she says it's not a unit of measurement (As the word for meter and underground is the same in Spanish).

Asterix comics in English also are quite well translated, for example Obelix's dog "Idéfix" is gloriously translated into "Dogmatix". The translator who came up with that one deserves a medal.

90:

The work of translating the Asterix comics is probably easier because the languages come from the same family. I wonder how translators cope when translating works from Romance Languages into something completely different, like Swahili or Japanese?

I have nothing but admiration for people who can translate & do it well. I think it requires a certain twisty mindset; they're probably exceptional at cryptic crosswords too, the bastards!

91:

Nor is Spanish a monolithic language. A sometime director of UC Press spoke of being in Chile with a Mexican friend who broke out laughing on seeing the newspaper headline "Cojete Norteamericano Se Chinga". In Spanish as used in Chile, the rocket blew itself up; in Spanish as used in Mexico, well....

92:

I work with a lot of Japanese developers, and have had a year or so of instruction in the language myself, and from what I've seen, technical words are almost entirely borrowed from English, far beyond the extent that would even be necessary. (I.e., they'll take a term based on an English word rather than use the Japanese equivalent.)

So, for instance, "email" is "me-ru" despite Japanese having perfectly serviceable words for "electric" and "mail".

Google translate is absolutely abysmal for Japanese. Some of it is likely just the differences in the languages. All languages have homonyms and when translating words from one European language to another those homonyms are likely to occur in similar places. In Japanese, though, they are apt to be utterly different from English. This makes it far easier for a machine to get an utterly wrong definition.

After a year in the language, I am also struck by how the difference in the two languages makes a one-to-one translation of words impossible. If you do a word-for-word translation from Japanese to English you get something that just sounds utterly bizarre to English ears. You really have to understand the original sentence and then build the English sentence up from scratch based on the meaning. Machines can't do that yet.

93:

Re: "pregnant with a baby elephant"...

Huh. I've seen someone claim to have done this as a joke.

He says he's pregnant with a baby elephant -- then turns his pockets inside-out, and unzips his fly.

I pondered the absurdity of the phrase having anything to do with being "extremely impatient" -- I mean, why?

But then it occurred to me, given the jocular equation of the penis with a baby elephant's trunk in the joke, that "extremely impatient" might be a euphemism for "needs to urinate very badly".

Perhaps Blücher suffered from some sort of urinary tract problem?

94:

To be honest, a Japanese-English translation isn't being quite fair to Google. The two languages are completely alien to each other, and Japanese relies *heavily* on context--for sentence subjects, meanings for words written in kana or using multiple kanji, etc. (Similar problems, I think, exist with English-Arabic.) Still poorly done, yes, but understandably so. Google still works fairly well for languages that are close to each other.

95:

It's a problem for English speakers too of course. Some years ago at a backyard BBQ I was introduced by the hostess to a friend of hers from London. It transpired that he'd in fact landed in Sydney only a few hours earlier and had barely had time to shower and shave before being expected to drink copious quantities of cold beer and devour various grilled meats.

Wanting to express my sincere and heartfelt sympathy for the man I said "You poor bastard, you must be absolutely buggered."

The gentleman looked at me a little oddly and replied that I was not the first Australian he'd met so far that seemed under a severe misapprehension about the activities permitted on British Airways flights. And as for my assumption about his parents' marital status…

96:

Let's not forget the fun of language changing over time.

Another of my favorites comes from Grove and Rackham's Nature of Mediterranean Europe. It's a book about the historical ecology of Mediterranean Europe. In the first chapter, the authors explain why they did most of their translations of primary sources themselves, rather than trusting others' work.

They had a lot of fun with the way words change over time, and one of their favorite examples was the word "desert."

Back a few hundred years ago, desert meant "uninhabited," and we still have that meaning in English as "deserted." Our use of "desert" to mean an area with <250 mm of rain per year is relatively modern. According to the authors, this changing definition of desert also occurs in other languages.

The problem, they note, is that historical ecologists aren't terribly careful about translating "desert" properly. As a result, researchers have used historical records of "desert land" to talk about how various areas became drier over time, especially after wars and plagues...

97:

Of course some argue that their Peshitta (Aramaic New Testament) is the original and the Greek the translation, and put forth evidence such as "camel" and "rope" being the same word in Aramaic - "it is easier for a /rope/ to pass through the eye of a needle than for a rich man to enter heaven" makes more sense than /camel/. Another: "why hast thou forsaken me?" becomes "why hast thou spared me?" (after several hours of crucifixion). There are dozens and dozens of similar cases, many very persuasive of mistranslation from Aramaic, some outright howlers, like "eunuch" for "believer" and "salted with fire" for "destroyed by fire" but everyone agrees that Jesus (presuming he actually existed) would have spoken Aramaic, and there was a period of oral transmission before it was written down, in whatever language.

The Peshitta originalists claim that Paul's works were also in his and his audience's predominant native language, Aramaic. Another case, but not quoting Jesus, rather from Romans: "For scarcely for a /righteous/ man will one die: yet peradventure for a good man some would even dare to die" (which makes no sense) but "wicked" and "innocent" differ in Aramaic only by a slight length of stroke that distinguishes ayin from nun, so that becomes: "For scarcely for a wicked man will one die: yet peradventure for a good man some would even dare to die".

Anyway, it is sadly amusing that so many people earnestly try to believe in such non sequiturs when they are likely just botched translation.

98:

Yup, I was here to say the same, but you got it in first. Shibboleth is...well. Let's say I worked with a website where there was a shibboleth sign on feature for an API. I don't know if I met anyone who could successfully integrate it without a good few hours of our programmer and their programmer on the phone.

99:

Consider the linguistic connection between "pregnant" and "expectant". "I am pregnant for/with [problem]" was a pretty widespread idiom across Europe at the time meaning "I want to get it over with" (or "I want it to happen", for a positive issue); "I am pregnant by an elephant" seems a perfectly reasonable jokey way of emphasizing the urgency of lightening your load.

For a really good discussion of translation issues, I recommend seeking out Umberto Eco's Mouse or Rat? : Translation as negotiation. As an author translated into several languages, a translator into his native Italian, and a professor of semiotics, he has an unusually thorough grasp of the problems presented.

100:

This has really been bugging me, because I can't see how even Google Translate could come up with some of these terms. For example, plugging in "rogu" in katakana (the correct term) into Google Translate, the Japanese Yahoo translation tool, and Babelfish just gives back "log(s)" in all three, nothing approaching the inspired idiocy of "lumber." And "eraa wo modosu" in Japanese, which can mean "return an error" but could conceivably have led to "vomit" because "modosu" also means "throw up," just gives the correct "return error" in Google Translate, "I return an error" in the Japanese Yahoo translation tool, and only the somewhat less accurate "error is reset" in Babelfish. Something weird is going on.

101:

Someone DID mention the Babelfish, eventually!

@ 83 Don't!
Using Google, or "Multimap" or ANYTHING at all around the vile "XXXth Olympiad" area (4km South of here) gets you complete nonsense ......

102:

Ref the RAM => goat thing, it sounds plausible to me.

I remembered hearing about a technical paper which was translated from Russian to English, and contained several references to a "water sheep". Eventually someone looked back at the original, and discovered that the term used in Russian was "hydraulic ram".

103:

One translation error that I've found often in older Norwegian translations of English mysteries and thrillers is that "black tie" and "white tie" are not taken as specific sets of clothes/dress codes, but read by the translator as black/white + tie, that is, as referring to a necktie that is of that colour. The translation then gives the impression of the character going to his tie rack to choose and put on a black necktie, worn with a regular suit, say light grey, before going down to join the rest of the party...

104:

David S@95: "You poor bastard, you must be absolutely buggered."

That would mean the same in working class English English too - not sure so much about elsewhere in the UK.

105:

Mary, marry, merry; in the Midwest many say and hear them as the same. I’ve lived there for years and still have a difficult time understanding how this could happen to an entire region.

106:

As it happens, my sister's middle name is Marie.

No, not the French pronunciation, but a Yorkshire variant which is (to her and me) somewhere between 'Marry' and 'Māori' by dint of stressing the first syllable rather than the second.

We blame our mother.

107:

Hi, another interesting thing that was sort of lost in translation was soon-to-be-ex chief of Nokia smart phones Anssi Vanjoki saying that switching phone OS's would be like "peeing in your pants to keep warm".

It's an often used idiom in Finnish, meaning (obviously) that you shouldn't make shortsighted decisions since if you're sitting outside in the cold the pee will at first make you warm but then start to freeze.

Well it's not an idiom in English and probably just sounds like a weird fetish to anyone not from a cold country.

A link to an article mentioning his quote: http://www.antipope.org/charlie/blog-static/2010/09/lost-in-translation.html

108:

"peeing in your pants to keep warm"

SCUBA divers and surfers and sailors and all sorts who wear wetsuits regularly do this!

109:

Strangely enough you'd think Spanish translations of French would be closer, but I wasn't aware of the puns in Asterix at all from reading the Spanish version, they were completely dropped. The English language just is handier for puns, probably. It also depends on the translator, of course.

All those amateur fan-translated manga suffer horribly from over-literal, stilted language attempting to preserve all the Japanese honorifics and circumlocutions.

The Irish author Flann O'Brien (or Brian O'Nolan) was adept at multilingual puns and playing with accents. I recall one of his short stories featuring an Englishman in Ireland who responds to everything with "I pick up Auden" (Say it out loud)

111:

Translation (through various means) can be approached as a final object to be praised or mocked in comparison to an imagined ideal. (The usual translations of Lem and, say, Hard-Boiled Wonderland and the End of the World usually get "I can't believe somebody dared and mostly succeeded" reactions.) But translation can also be viewed through a pragmatic lens, and then possibly as a step in an iterative process. The process of communication through human language, especially when participants are prepared for it, is tolerant of noise in the channel. "What?" happens when error correction doesn't come up with a good result. "English, do you speak it?"

I don't know about you folks, but my senses of sight and hearing are pretty prone to dropouts, with usually unconscious but sometimes conscious fill-in-the-blank. We hear and see what we expect. Iterative translation for conversation works amazingly well with patient and experienced conversation partners, and false fluency in translation is one of the biggest problems.

112:

". . . showing that on the Internet everything eventually does devolve to goatsex." Goatwin's Law!

113:

Actually, Kennedy called himself "ein Berliner". Apparently the full name of the jelly doughnut is "Berliner Pfannkuchen", and most people shorten it to Berliner. However, it's not much worse than calling oneself a "Hamburger" in Hamburg.

From what I've read, everyone in Berlin at the time knew precisely what he meant, and nobody snickered.

114:

Hm.

Consider the linguistic connection between "pregnant" and "expectant".

Checking the OED, I see that there are two separate etymologies for two different adjectives, both spelled "pregnant", with notes saying that the two were and are often confused.


_________________________________________________

1: "[ < Middle French pregnant with child, pregnant (especially of an animal) (13th cent. in Old French; for earlier forms see note below; French prégnant; now arch.), (of a word) full of meaning (a1585) and its etymon classical Latin praegnant-, praegnāns with child, pregnant, swollen, (as noun) pregnant woman, in post-classical Latin also imaginative, inventive (6th cent.), compelling, cogent (1267, c1380 in British sources), variant (after the ending of present participles; compare -ANT suffix1) of praegnāt-, praegnās (Plautus), probably < prae- PRE- prefix + the stem of gnāscī (past participle gnātus) to be born, although the exact nature of the formation is not clear. Compare Italian pregnante with child (a1320), having an implicit meaning (a1606). Attested earliest in figurative use in sense A. 1, which is apparently not attested in other languages so early as in English; compare also PREGNANT adj.2, with which there is frequently semantic overlap in this sense."

2: [Apparently < Middle French, French pregnant, †preignant compelling, pressing (although this is apparently first attested considerably later: 1572), use as adjective of present participle of †prembre to press (end of the 11th cent. in a gloss in Rashi; also priembre (c1150)) < classical Latin premere to press (see PRESS v.1), unless originally a spec. sense development of PREGNANT adj.1 (probably subsequently reinforced by association with the French word).
The word has apparently been associated in sense with PREGNANT adj.1 from an early date, and the two words are freq. difficult to distinguish; compare the following examples, and also e.g. quot. 1582, or quots. c1680, 1753 at PREGNANT adj.1 1a:

_________________________________________________

Of course, that's in English, derived from French.

I wonder what, exactly, did Blücher say, and did he say it in Prussian German or in French?

Speaking of Blücher, I note that I first read of his alleged elephantine pregnancy in Jonathan Strange and Mr. Norrell, and was unsure (as with so much in that book) whether it was a genuine historical detail, or something Ms. Clarke added to a very odd depiction of the Napoleonic Wars.

Of course, no discussion of this matter would be complete here without noting that Charles Stross himself mined this same anecdote in Singularity Sky, giving Admiral Kurtz the recurrent delusion that he was in fact pregnant -- in the sense of literally with child -- with a baby elephant.

So ajay's note above (and linguistic clarification) was very useful.

I wonder if Kurtz's other recurrent delusion -- that his legs were made of glass -- also derives from Blücher, or from some other historical figure, and whether that, too, may have been based on a misunderstood idiom?

115:

Another fascinating book about translation (human and machine) and language is Douglas Hofstadter's "Le Ton Beau De Marot"

116:

being german i am, of course, finding this quite hilarious. thanks for the article btw ;)

117:

"Marilee" is a sufficiently unknown name that almost everybody gets it wrong the first few times. Most commonly is "Marilyn" or "Mary Lee," but also "Irene," Nadine," and "Natalie."

I'm having some dental work done and had two evals last week. I was pretty sure the women took my name down properly (and they did, as I looked at it later), but the other two women who called me to remind me of the appointments didn't pronounce it properly even while looking at my name in print. One said "Marilyn" and the other "Mariel."

118:

That URL only links here.

119:

Hah! For some New Zealanders, "beer", "bear" & "bare" are homophones. Ditto "steer", "stair" & "stare".

120:

Tellingly, German Wikipedia knows nothing of the elephant.

However, Der Spiegel mentions it in 1976 - the key detail, though, is that they were quoting Roger Parkinson, a British historian. Parkinson apparently thought Blücher had this delusion after the end of the war, and imagined the elephant was a punishment for his sins. Interestingly, he also thought it was connected with another detail of Blücher's character - like Winston Churchill, he suffered from depression, and risk-taking seems to have relieved it (he also, like WSC, did a *lot* of gambling).

The story is interesting as a document of the time before 1914 when Britain and Germany were as close as any nations ever are, too.

By 1995 the same journal thought he also believed that the elephant had been fathered on him by a French soldier. They were quoting a British book, again.

Also, I've found Germans saying that it happened in 1807, 1809, 1813, and 1815, after his horse was killed and fell on him, after he was run down by his own cavalry at Ligny, and after he had a major attack of trachoma and nearly went blind.

However, I've not been able to find what the exact quote is or indeed what the Redewendung with the elephant is meant to be.

121:

I'm impressed that machine translation works as well as it does. I can read articles from Al Jazeera (the Arabic site, not the English edition) through Google Translate and the English emitted generally appears to tell logical, comprehensible stories. I originally tried this to see if the Arabic edition was more inflammatory than the (IMO) quite even handed English edition. The answer appeared to be "no." I am especially impressed at Google's translation between a pair of languages without shared ancestry.

Usually when I use machine translation I'm trying to extract information about chemistry or allied disciplines from older journal articles. Somewhat surprisingly to me, machine translation appears weaker here than in general news, perhaps because of the more specialized vocabulary and smaller available corpora. Beforehand I would have thought technical translation to be a strength of machines because of the greater profit opportunities that drove historical investment in MT for professional/technical use. I can sometimes comprehend passages better with multiple machine translations using different tools, and I have enough domain knowledge that I can often work with a mediocre translation even when I don't know the source language at all (e.g. Russian). The slowest part right now is correcting OCR errors in old source documents so that the translation program has a clean original.

When AltaVista first offered free machine translation to the masses, last millennium, it was rare that I could get anything even roughly intelligible out of it for any real-world need. So MT is still a gamble, but it seems to be getting better. As Peter Norvig points out in his famous AI textbook, mediocre machine learning on a data set of 100 million elements usually beats a "smarter" algorithm trained with a data set of only 1 million elements. Even if the core algorithms applied to statistically-informed translation don't see radical improvements, the increasing availability of training data from the Web and digitized books and news archives may drive continued improvement in MT quality.

122:

I searched books.google for [Blücher elephant], and found an interesting variation on the anecdote.


Recollections of society in France and England, Volume 1
By Lady Lucy Clementina Drummond Davies, 1872

It was at a soirée at the Palais Royal, given by the Duc and Duchesse d'Orléans, that I first saw Blücher, and I gazed with more than ordinary interest at his ungainly person. Many years afterwards I met a General von Gräven, who had fought at Waterloo under him, and who told me the following anecdote concerning him:--

 When the special messengers arrived to inform Blücher that Napoleon had escaped from Elba, and that his services would be immediately required in the field, they were astonished to find him literally running round and round a large room, the floor of which was covered with sawdust, and in which he had immured himself, under the delusion that he was an elephant. For the time it was feared that Blücher was hopelessly insane, or that he was so far suffering from delirium tremens that his active co-operation in the anticipated campaign would be impossible; but when the urgent news was brought him, he at once recovered himself, and proceeded to give his advice in a perfectly sound state of mind, the tone of which was thus, as by a sudden shock, restored to him.

Thinking that one is an elephant is rather different than saying, possibly jocularly, that one is pregnant by one! And this does not appear to have been a misunderstanding of an idiomatic expression, at all. But perhaps the Lady Davies misunderstood? Or perhaps General von Gräven was an unreliable narrator? Note the anecdote that follows.

123:

Then again, here's another anecdote from books.google:


Notes of conversations with the Duke of Wellington, 1831-1851
By Philip Henry Stanhope Stanhope (Earl), Arthur Wellesley Wellington (Duke of) 1888

The date on this page is 1839, I note.

 The Duke of Cambridge told me that when Blücher* died he was under the delusion of his being pregnant with an elephant—exactly the delusion under which the Duke (as he has formerly mentioned to me) saw him labouring previously at Paris.

The asterisk in the paragraph is a footnote pointing to pg 119-120 in the same volume, which states:

 Poor Blücher went mad † for some time. He had shown off before some of our ladies at Paris, and got a fall from his horse and a blow on his head. This gave him all sorts of strange fancies. When I went to take leave of him, he positively told me that he was pregnant!

 And what do you think he said he was pregnant of ?— An elephant! And who do you think he said had produced it ?—A French soldier ! That is the human mind.

 " It was the last time I ever saw him. I went to him; he could hardly speak French, but he said (striking his side), Je sens un éléphant là ! "

 And what could you say to him ? "I could only say, Je vous assure que vous vous méprenez! and that he would soon get better. But he continued to express his surprise at there being a Frenchman in the case. Imaginez que moimoimoi ! un soldat Français ! I suppose he had dreamt it the night before.

 " He was a very fine fellow, and whenever there was any question of fighting, always ready and eager—if anything too eager."

The dagger in the paragraph is a footnote that points ahead to pg 176, linked to, and cited in the blockquote, above. These pages are dated 1838.

So... it was in French. But was it an idiom, or a joke, or a genuine delusion? Clearly it was understood, rightly or wrongly, as an actual delusion. Yet note that Blücher was speaking a language he did not know well, and the paragraph about being eager to fight...

*shrugs*

124:

Running around trumpeting and rolling in sawdust sounds like an excellent way to relieve tension and get the blood flowing. Though it might be a bit difficult to explain it to your staff when they catch you at it.

125:

As somebody who deals with the Shibboleth software, it seems worth pointing out the boring explanation that Shibboleth does, in fact, occasionally vomit a dragon at you.

Well, more of a griffin...

You configure your nice error page that matches the rest of your sites, but sometimes it ignores that and gives you the standard error template page, which has an orange griffin on it that I've heard described as a dragon as well as a few other animals...

126:

In my case, 'air' and 'ear', discovered when telling my English landlady that I was going to an airshow.

127:

How should one write English if one's goal is to avoid mistranslation by simple-minded translation software?

Perhaps there are some simple rules that eliminate most problems.

128:

...some simple rules that eliminate most problems?

I guess you could pare it down to the most basic of vocabularies, using the simplest words, e.g. like "Globish" does.

You'd have to sacrifice the beauty of the carefully chosen word to convey the exact shade of meaning, basically removing one of the strengths of language.

129:

An actor whose name escapes my cursory googling suggested that everyone should learn Pidgin (or perhaps Pijin blong Solomon, Tok Pisin or Norfuk might be good) as a lingua franca. Honed to perfection by many years of use.

Probably the easiest to learn and very expressive with a small vocabulary. How could you not love a language that uses 'bagarap' (bugger up) for 'broken' and 'bulsit' for 'nonsense'?

And now our Economics Correspondent:

"Bulsit blo' waitman go bagarap. Hu nao bae save helpim mifala weitim diswan rabis? Nem blo' mi Robert Peston. Lukim iu!"

130:

the "goat-time" would be more easily parsed if the hardware still used SCSI device chains...

131:

Over here in Korea, machine translation is used a lot, but mostly by students, lazy office workers, and so on. The menus I've seen have been less hilarious than the joke menu above, but pretty funny all the same, with "Jack and Cock" being a ubiquitous cocktail menu offering a few years ago. Obviously, that's an artifact of human error, not machine.

But the best story I have about machine translation is about this lecherous old German prof we had at a small uni where I was working. He received an email from a student (in English) in which every sentence ended with the phrase, "The bedspread." She was, according to him, a very pretty young lady, so he was excited when he read stuff like,

Dear Professor,

I would like to talk to you the bedspread. I am hard studying the bedspread. Maybe you can promise me for studying more the bedspread...

He was convinced that she was coming on to him, and quite delighted. Oh, how it took the wind from his sails when one of us (I think it was me, but I can't recall) caught on and pointed out that "the bedspread" is "yo" in Korean, which is a homophone for the little tag-word that gets tacked onto the ends of sentences to make them polite/moderately formal, ie. appropriate for addressing to commonplace authority figures, older people who aren't yet grannies and grandpas, or to people you don't know well. (Like, "Annyeong hasaeyo.")

I can't remember if it was Google or Babelfish, but whichever it was, I was surprised that they'd not yet caught on to the fact that "yo" was usually just a grammatical/politeness marker. Then again, how do you translate that? You'd need a whole matrix of somewhat-more-polite phrases that can substitute for less-formal equivalents, since we simply encode politeness or formality differently in English.

One thing that's neat and annoying at the same time about Korean, and makes it a pain to translate, is how often subject and/or object are left out of sentences. The rule seems to be, when you can imply, do so, and this isn't as big a problem as you might imagine. When they're included, they're often conveniently tagged with a subject or object marker, but when they're left out, there's a lot of implication or guesswork. One should not necessarily marvel at this: while a lot of conversations are suitably coherent, you still hear a lot of people on the phone here shouting, "Eh?!?!?!?" or other equivalents of "Say what?" three or four times in a row, and it's not all because of phone reception.

The official explanation I've read from linguists and writing teachers is that in English, the burden of fundamental listener/reader comprehension is on the speaker/writer, but in Korean, it's on the listener/reader. I imagine the former is going to become more machine translatable sooner than the latter sort of language.

132:

Ahhh, thank you!

I know a number of Korean-Americans, and on occasion, I tease them about using vague pronouns rather than subjects and/or objects in their sentences.

Good to know this issue has linguistic roots, and isn't them just being lazy. "Could you get the thing for me?" just took on a whole new meaning.

133:

Another anecdote from Korea:

Machine translation from English to Korean, or from Korean to English, is a madness, but machine translation between Japanese and Korean works very well.

It works so well that there is a bulletin board where both Korean and Japanese can post, and postings are machine translated to the reader's language. I have seen some high quality discussion done this way.

134:

Yeah, peeing yourself to keep warm works when you're already wet and don't have to worry about the subsequent evaporation cooling.

135:

The Wikipedia page on Kennedy's famous phrase ("Ich bin ein Berliner") is pretty good, explaining that while a genuine resident of Berlin would say, "Ich bin Berliner" (a plain statement of fact), someone wanting to figuratively identify themselves as a resident of Berlin (Kennedy's intent) would have to say, "Ich bin ein Berliner." (As bellinghman pointed out earlier.)

The pastry in question is sometimes called a "Berliner" in other parts of Germany, but not in Berlin itself or the area around it, so there wouldn't have been an ambiguity in any case.

136:
How should one write English if one's goal is to avoid mistranslation by simple-minded translation software?

Very carefully. This is actually an important, and under-recognised*, task of copy-editors working on scientific and medical journals: the removal of all ambiguity to help avoid unnecessary mistakes resulting from machine translation. [Q:"Why did you change sentence x to sentence y, which means exactly the same thing?" A:"Because if you translate x sloppily you stand a good chance of getting z, and the patient gets a double dose of $DRUG."]

*including under-recognition of its importance by many of the people who ought to be doing it, alas, as well as those who ought to be paying for it.

137:

Mary/Merry/Marry -- Some Americans pronounce all three alike, some all differently. And some pronounce two of them alike.

Some Americans and Canadians pronounce "Don" and "Dawn" alike.

My Unhelpful Guide to US Pronunciation:
a as in pan; aa as in Aaron; au as in caught. e as in pen. i as in pin. l as in calm. o as in cot; ou as in route. r as in car. s as in greasy.

138:

Salvadoran eatery in Minneapolis with a bilingual menu: "carne asada" "beef with stake." I believe a literal translation would be "roasted meat," so it's unlikely to be a machine translation.

A supermarket near me has aisle signs in English and Spanish. On one of the Spanish signs: "habas de refried."

139:

No, it's even more complicated. The jelly donut is called "Berliner" only outside of Berlin. In Berlin the same thing is called "Pfannkuchen". "Pfannkuchen" on the other hand means pancake everywhere in Germany except for Berlin. This is called "Eierkuchen" there. So all the Berlin people cheering at Kennedy understood perfectly well what he meant.

140:

Ken Campbell, that'd have been. His Wol Wantok (one world language) project got as far as a pidgin production of Macbeth....

141:

@135:

How should one write English if one's goal is to avoid mistranslation by simple-minded translation software?

Very carefully. This is actually an important, and under-recognised*, task of copy-editors working on scientific and medical journals: the removal of all ambiguity to help avoid unnecessary mistakes resulting from machine translation.

That's hard enough to do with human interpreters. I had a counting problem on one of my quizzes that asked how many three-letter, four-digit passwords there are if all three-letter possibilities are allowed except for double-letters. I got a number of answers depending on whether or not 'bab' counted as a double-letter but not 'bba', or whether or not 'bbb' counted as a double-letter. I imagine anyone converting customer specs into software has encountered the same problems, doubled and cubed :-)
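
For the curious, here is a rough sketch of how far apart the readings land. It assumes lowercase a-z for the letters and four unrestricted decimal digits, neither of which the quiz wording above actually pins down, and the three rule functions are just my own hypothetical names for the interpretations students seemed to pick:

```python
from itertools import product
from string import ascii_lowercase

# Assumption: the four-digit part is unrestricted, giving 10^4 possibilities.
DIGIT_PARTS = 10 ** 4


def has_adjacent_repeat(s):
    """Forbids 'bba', 'abb' and 'bbb', but allows 'bab'."""
    return s[0] == s[1] or s[1] == s[2]


def has_any_repeat(s):
    """Forbids any letter occurring more than once, so 'bab' is out too."""
    return len(set(s)) < 3


def has_adjacent_pair_only(s):
    """Forbids 'bba' and 'abb' only; 'bbb' is read as a triple, not a double."""
    return has_adjacent_repeat(s) and len(set(s)) == 2


def count_passwords(is_forbidden):
    """Brute-force count of allowed letter parts, times the digit parts."""
    letters = sum(
        1
        for combo in product(ascii_lowercase, repeat=3)
        if not is_forbidden("".join(combo))
    )
    return letters * DIGIT_PARTS


if __name__ == "__main__":
    readings = {
        "adjacent repeats forbidden": has_adjacent_repeat,
        "any repeat forbidden": has_any_repeat,
        "pairs forbidden, triples allowed": has_adjacent_pair_only,
    }
    for name, rule in readings.items():
        print(f"{name}: {count_passwords(rule):,}")
```

Under those assumptions the three readings give 162,500,000, 156,000,000 and 162,760,000 passwords respectively, so one ambiguous phrase is worth several million passwords either way.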

142:

Thank you, how could I have forgotten such a genius? I was very sad to find out he had died 2 years ago, I must have missed the News for a while.

143:

I think it is clear to most who use MT that it is not equivalent to human translation because, yes, a computer is merely producing a data transformation. So it should be used with care.

However, it is going to be increasingly useful in high volume, repetitive kinds of projects and when human translation is just not an option.

Here are some links that go into this further:

http://kv-emptypages.blogspot.com/2010/01/why-machine-translation-matters.html

http://kv-emptypages.blogspot.com/2010/01/why-machine-translation-matters-for.html

144:

Another case of funny mistranslation, the kind that happens when you look up a word in a dictionary and don't read the whole entry:

Camping on the Tuscan coast, some years ago: placard in English in front of the sinks outside the restrooms, the ones to be used only for washing oneself, shaving and such, stating "Washing dishes and alimonies here is forbidden". Being a native Italian speaker, it was easy to understand how the mistake took place: the Italian word "alimenti" means both "foodstuff" (like vegetables to be washed) and "alimony" (the money paid to a former spouse after divorce.)

A worse case happened in a supermarket near my home in Rome, where pineapples were labeled "Gold RAPE" instead of "RIPE"... and it's a place where often you hear customers speaking English. I still wonder what an English-speaking female customer thought if she read that label...

145:
"Traduttore, traitore is an Italian saying that fortunately doesn't need translation (Into English at least)."

It shouldn't, yet it does: the original Italian is "traduttore, traditore", and "traduttore, traitore" is some weird mashup of the original with its word-for-word English translation (translator, traitor).

"If we're talking about jokes and puns, then you're out of luck, it's just not going to carry."

Which is why the best translators deserve recognition as creative writers, as opposed to the data-conversion automatons they're often mistaken for.

146:

#121 - Ref English Al-jazeera, I'd agree that they almost seem to take pains to report the facts, and avoid inflammatory language.

Of course, this can be a positive thing. I remember one story about an Airbus (A320 I think) crashing on an internal Iranian flight. AJ just reported the facts (airline, type, route, estimated death toll), whilst the BBC ran almost word for word the same factual paragraphs, and then went off on a rant about the poor safety record of Ilyushin and Tupolev airliners! Er, why? It's scarcely relevant to the story!!

147:

It's not just the words, it's the concepts. I've been to a couple of lectures by the late John Erickson; brilliant guy, expert on Soviet strategic thought, and organiser of one of the Cold War back-channels called the "Edinburgh Conversations" (now there's a basis for a story by Charlie). His point was that equivalent words carried different implications.

Excuse spelling and a fading memory... His first example was the Russian "Bezopasnost" - the B in KGB - and its root which was "the absence of threat". There was no real equivalent to the word "security" in Russian.

His second example was "violence", as used in Serbo-Croat. In the UK, you could use the word to describe an unarmed fight between a few young males outside a nightclub; in the Balkans, the word had rather more severe implications (think Former Yugoslavia in the 1990s; you get the idea).

The point was that Yugoslavia lost ten per cent of its entire population in WW2 (civilian, not just military); as did the Soviet Union over a slightly wider period. Both countries were invaded multiple times over the last two centuries. That has a cultural impact.

I'm also surprised that no-one has mentioned Khrushchev's famous "we will bury you" quote; apparently a better translation is "we will attend your funeral", i.e. that the Marxist-Leninists would outlast Capitalism, not that they would murder it.

148:

My understanding is that the form of words in Farsi that is usually translated into English as "death to ..." has a much more casual, general meaning, and is used where English-speakers would use "down with ..."

<sarcasm>I have no idea why western media outlets might pick "death to America!" as a translation instead of "down with America!" -- the two are, after all, utterly indistinguishable sentiments.</sarcasm>

149:

My first visit, and this post leaves me feeling utterly inadequate. I've just erased my entire blogsite and left a "So sorry" sign hanging there in its place. Damn. Great site in general. Cheers

150:

An example from Russian:
There isn't any word for "gun". There are words for rifle, pistol, machine gun etc, but not for gun. The closest you can come is ognestrelnoye oruzheye which means "flame firing weapon" but it doesn't quite have the same ring.

So where in the US someone would shout "He's got a gun!", in Russia they would shout "He's armed!".

151:

I'd have thought it's also worth adding that the religion of Jesus was Judaism, so even where the language in use was Aramaic the more relevant tongue for the theology would be Hebrew. And linguistically at least that God is male (though if I understand correctly the question of God's actual gender has been a matter of some considerable - though inevitably empty - debate). But anyway...

152:

Among the dozens of ways to refer to god in Hebrew many are feminine, Yah & Shekhinah for example (most words ending in -ah and -ot for plural are feminine), though the more widely known, Adonai & Eloheinu are male. So, the presumed deity of Judaism doesn't necessarily have a gender, except for the ultra-orthodox. Many modern rabbis would point to Genesis' creating man in our image as a sign of genderlessness, though the orthodox would say it refers to angels.

Meanwhile a new survey shows that American Christians are pretty ignorant of their own religion as well as others, and that atheists and agnostics know more. Two things I've always thought. After all why do you think they became atheists?

About this Entry

This page contains a single entry by Charlie Stross published on September 23, 2010 1:31 AM.