Automated translation is one of those ideas that collectively constitute the philosopher's stone of procedural artificial intelligence. We can translate our own formal constructed programming languages mechanically (the tools to do so are called compilers), so why can't we translate human languages automatically?
Well, we can, after a fashion. But translation turns out to be very hard, even for humans, and the consequences of a mistake can range from hilarious to horrifying ...
First, machine translation. The following message was posted to a discussion group for users of Shibboleth, a federated internet single-sign-on system. As you may be able to puzzle out, the perplexed (Japanese) user assumed Google Translate would help their technical support request:
This is question, engish is faulty therefore the right excused is requested. Thank google to translate to help. SORRY!!!!!
At often, the goat-time install a error is vomit. To how many times like the wind, a pole, and the dragon? Install 2,3 repeat, spank, vomit blows
14:14:01.869 - INFO [edu.internet2.middleware.shibboleth.common.config.profile.JSPErrorHandlerBeanDefinitionParser:45] - Parsing configuration for JSP error handler.
Not precise the vomit but with aspect similar, is vomited concealed in fold of goat-time lumber? goat-time see like the wind, pole, and dragon? This insult to father's stones? JSP error handler with wind, pole, dragon with intercourse to goat-time? Or chance lack of skill with a goat-time?
Please apologize for your stupidity. There are a many thank you
I'm pretty sure that what's going on here is that Japanese computer specialists, just like English-speaking ones, have repurposed the meaning of some older words. After all, if virtual, disk, memory, and printer all mean something very different in 21st century English from their long-established meanings in the late 19th century, why shouldn't goat-time, wind, pole and dragon have mutated in contemporary technical Japanese? Unfortunately Google Translate is still living in the past.
But it's not just in technical sub-dialects that problems arise.
Here's a menu from a Portuguese restaurant. It's bilingual, translated into something resembling English, either by an even stupider machine translator or by a non-English-speaker with an English/Portuguese dictionary. Foodstuffs commonly have idiosyncratic local names, and not just in Portuguese (I'm thinking of having Toad in the Hole for lunch, followed by Spotted Dick). So a simple dictionary-driven word substitution approach to translation will have unfortunate consequences.
The menu is unfortunately a JPEG of a photograph, and I can't be bothered to copy-type the whole lot, but I suggest you click through and read it while I wait for my "Nose broth with ass entries" followed by "Bastard's lamprey rice done in the manner of between rivers" with "Rotten cake made out of wood" for dessert. (I think that last one is meant to be Black Forest Gateau.)
After all, who can resist a menu with starters like:
Black paw ham
Little cheeses from Big Olive Oil Land
Clams in the manner of the ever-fighting duck
Little bitches from the Algarve

And finally, even humans get it wrong, with chilling consequences. From Clive Stafford Smith's book about Guantanamo and its detainees, Bad Men: Guantanamo Bay and the Secret Prisons (extract shamelessly nicked from another source on the internet):
Do you understand?
Yusuf was technically a citizen of Chad, but he was born in Medina, in Saudi Arabia, and had lived his whole life there. The US military could have got his birth certificate from their Saudi allies with a telephone call. It was a little more difficult for me, but I got it and it showed he was born in November 1986. After four years of intensive interrogation, far from securing a rich harvest of "enormously valuable intelligence" (General Geoffrey Miller's words), the military could not even work out Yusuf's age, just as they could not spell Binyam Mohamed's name after years of torture in Morocco and Afghanistan.
[...] Yusuf explained how his [initial] interrogation quickly descended into farce. Early in his captivity the US agents questioned him with the assistance of a translator who used a dialect of Arabic in which the word zalat means money; in Yusuf's Saudi dialect it means salad, or tomato. Yusuf reconstructed the interrogation as best he could remember it.
"When you left Saudi Arabia for Pakistan, what zalat did you take with you?" demanded the translator, suspecting that the money must have come from al-Qaeda sources.
"What? I didn't have any zalat when I went to Pakistan." The 14 year-old was confused. He had been through a difficult time since his seizure by the Pakistanis. He was prepared for any trick the Americans might spring on him, but all this talk about tomatoes was beyond him.
"Of course you had zalat. What do you take me for? An idiot!" The translator flared into hostility.
"I didn't! Why would I?"
"Of course you did. Now tell me, where did you get the zalat you took with you?"
"I didn't take any zalat with me. I didn't!"
"Aha! So you got zalat in Pakistan when you arrived?"
"Well, yes, what zalat I wanted, I could get there. That's natural." Yusuf was trying to be conciliatory, though the conversation continued along this strange line.
The translator seemed suddenly excited. "Where could you get zalat in Pakistan, then? I want a list of places. Details. Descriptions, places. Details."
Yusuf wanted to keep him in a good humour. Trying to remember Karachi, he began to discuss places in the market where one might buy salad. With each description of a market stall the translator turned to the American interrogator, who took careful notes.
That evening Yusuf was returned to the cage where he was being held. He was a very muddled adolescent. He talked through his bizarre interrogation with other prisoners, turning over each of his recollections.
Finally one of the older prisoners solved the puzzle: "You were talking about tomatoes. They were talking about money. That's what it must have been."
Arabic isn't a monolithic language; while its written form is somewhat standardized, it is split into many regional dialects, some of which are mutually unintelligible, effectively different languages.
The unfortunate 14-year-old Yusuf got himself sold to the US intelligence services by a Pakistani entrepreneur who saw the "reward paid for Al Qaida members" ads, read them as MAKE MONEY FAST, and grabbed the first passing stranger. And the US military, being extremely short on skilled Arabic translators, assigned a translator who spoke the wrong dialect, alongside an interrogator who didn't know enough about Arabic to realize how wrong the translation was.
(Cue visions of drones carrying out missile attacks on market stalls unfortunate enough to have sold Yusuf his tomatoes. Complete with lethal collateral damage ...)