One of the most potent stories about language is the Tower of Babel, which occurs in Genesis 11. In the account, all humanity began by speaking a single language, even after the exile from the Garden of Eden, and the tower is not exactly a mark of pride, nor is the curse of multiple languages exactly a scourge. Instead, the Tower is the natural consequence of a migrating people settling into a city. Then "the Lord said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do" (Gen. 11:6). The confusion of languages is meant to keep humans frustrated and wandering, to keep them from achieving their goal of making "a name for ourselves."
Pieter Bruegel the Elder's "Little Tower of Babel"
Babel came to be seen as a second Fall, or at least a second exile, when man's fundamental punishment -- being denied communion with God -- was amplified by being denied communion in society. However, until the twentieth century, linguists (excepting people like Giambattista Vico) believed that words derived from things, or that things were manifestations of ideas (and so were words), and so the division of languages and the loss of the original language meant a loss of magical and spiritual power.
I recommend Umberto Eco's The Search for the Perfect Language for an account of how medieval through Enlightenment Europe invented linguistics as a side effect of the quest for the pre-Babel tongue.
For every person mounting Pegasus to overtop Babel, there have been five standing on the plains below having a giggle at our human noise and ten, at least, offering to translate for a fee. John Gay has a very funny letter to Mrs. Howard complaining about how he only knew French poetry; thus, to say that they went hunting, he would have to write that they had declared war against the feathered inhabitants of the air.
Americans have been accused of monoglot ignorance for a very long time, but American humorists have avenged themselves by making fun of the affectation of other languages. Even better, American authors have wondered at translations of idiomatic Americanisms. The new nation generated new environments, and the adaptations immigrants made to each other and the land made for colorful habits of voice.
Mark Twain's "The Celebrated Jumping Frog of Calaveras County" first appeared in 1865 and was collected under that title in 1867, and it was a hit. In 1875, Twain published "The Notorious Jumping Frog of Calaveras County," which is one of the funnier things you can read. It is an account of Twain reading his own story translated into French and then translating the French back into English.
"'Rev. Leonidas W. H'm, Reverend Le--well, there was a feller here, once by the name of Jim Smiley, in the winter of '49--or maybe it was the spring of '50--I don't recollect exactly, somehow, though what makes me think it was one or the other is because I remember the big flume warn't finished when he first come to the camp; but anyway, he was the curiousest man about always betting on anything that turned up you ever see, if he could get anybody to bet on the other side; and if he couldn't he'd change sides.'"
becomes
"It there was one time here an individual known under the name of Jim Smiley; it was in the winter of '49, possibly well at the spring of '50, I no me recollect not exactly. This which me makes to believe that it was the one or the other, it is that I shall remember that the grand flume is not achieved when he arrives at the camp for the first time, but of all sides he was the man the most fond of to bet which one have seen, betting upon all that which is presented, when he could find an adversary; and when he not of it could not, he passed to the side opposed."
In the early years of the World Wide Web, people played a "Babelfish" game, inspired by Twain's retranslation of a translation, to get comic results. ("Babelfish" comes from the universal translating parasite in The Hitchhiker's Guide to the Galaxy. The Infocom game of the same name was legendary for the absurd lengths one had to go to in order to get the Babelfish.) I remember seeing the game played in the early aughts, but it doesn't work anymore.
There are reliable ways of "breaking" machine translation, but Google Translate has killed the Babelfish game. . . sort of. Follow me, below, and I will show you two attempts to break Google Translate, the Googlefish experiment, and then an example of Google Translate failing on something easy -- along with a potential explanation for why.
The reason Twain's story ended up in questionable French and worse English lies in the split between denotation ("dictionary" meaning) and connotation ("associated" meaning, or meaning drawn from symbol and cultural convention), and in the proliferation of syntactic units unique to particular speech communities, known as idioms. When Gregor Samsa is mortally wounded by an apple lodged in his carapace in Franz Kafka's The Metamorphosis, young students will say that the apple "is a symbol." Strictly speaking, this is not correct. "Apple" was a symbol once, but by Kafka's day it had been a symbol for so long that "apple" carried symbolic associations without any symbolic discourse. Thus, "fruit of sin" and "knowledge of good and evil" belong to the associations of the apple, and Kafka does not have to make the apple into a symbol -- it already has the symbolism in it as a set of connotations.
One reason poetry is very difficult for second-language readers is that it traffics in connotation. Modernist poetry, in particular, adds syntactic displacement to foreground terms.
In Ezra Pound's "Middle Aged," we get:
"'Tis but a vague, invarious delight.
As gold that rains about some buried king.
As the fine flakes,
When tourists frolicking
Stamp on his roof or in the glazing light
Try photographs, wolf down their ale and cakes
And start to inspect some further pyramid;
As the fine dust, in the hid cell beneath
Their transitory step and merriment,
Drifts through the air, and the sarcophagus
Gains yet another crust
Of useless riches for the occupant,"
where the "As" recurs in an unusual place to remind us that we are inside one very, very long comparison and still have no object. English delivers "as, so" in pairs, and we can set up compound objects on either side of the analogy, but doing so feels unnatural, suspenseful, or confusing. This "as" does not meet its "so" until more than halfway through the poem, and the "so" reveals that the poet is now admired and unable to love; the suspension of the syntax mirrors the message of power without potency. It demonstrates, for English speakers, that "middle age" is "waiting."
Another element of language that isn't in the dictionary is idiom. Native speakers are generally unaware of their idioms. George Carlin was good at pointing out the irrationality of English idioms. Why do we get "on" a bus, he wondered, but "in" a car? Shouldn't we get "in" an airplane, rather than "on" one? An obvious case of idiom is the use of "the." There is some logic for "the" over "a," but when to require an article ultimately comes down to speech community. On the other hand (and that is an idiom), French uses "faire" (to make) in a number of idiomatic constructions. A literal translation, "We will make a party," sounds silly because the idiom is not present in English; by the same token, a literal translation of "I just saw the doctor for my condition" into another language could come out as "I went to view the doctor as part of my habit of being."
1. Babelfishing Key 1: One way to break a machine translation is to use idioms. In English, this means prepositional phrases.
2. Babelfishing Key 2: Stretch syntax beyond an algorithm's capacity to resolve the object/referent relationship.
3. Babelfishing Key 3: Push every button on the elevator in the Tower of Babel: Translate from one language family to another to another before returning.
The last of these keys is obvious. The more one understands of language families, the easier it is to make a translation difficult. The reason is not that the languages are inherently more or less alien to one another; all language is human, after all. Rather, languages in different families are historically remote from one another. The farther apart two languages are in "family," the longer ago they were a shared tongue, and the longer ago their speakers were one community. Therefore, all else being equal, separate language families will reflect as much divergence between material cultures as is reasonably possible, and this means that the expectations and habits embedded in the languages will be distant, too.
English is an analytical language; Latin and ancient Greek are synthetic languages. In English, word order and placement, rather than word endings, determine meaning, and that means that we multiply prepositional phrases and pronouns. Hence it is that when we construct very elaborate sentences that employ many relative clauses without the use of simple compounds, or embed negatives and qualifiers, such as I am attempting to do at present, stringing out the meaning and asking the reader to hold onto the primary object while admitting qualifications, then a computer program which is looking at 'verb, helping verb, main verb' can get tangled, and this is particularly true if one merely implies a subject or switches between various voices and moods of verb by, should such an occasion arise, the use of a subjunctive or vocative.
Therefore, I set out, after many years of giving it a rest, to break Google Translate.
I had two pieces of objectionable English. The first used metaphor and idiom (but not slang -- slang is easy to machine translate), and the second used as arachnid a sentence structure as I thought might be found in real life. I then took the passages from English to French, French to German, German to Russian, Russian to Spanish, and Spanish to English. My goal was to move within Indo-European languages, but to hop back and forth between branches and limbs, between more and less conservative languages.
What was surprising was that . . . well, see for yourself.
Original text: English, spider sentences:
"Since any victory in Afghanistan, even a putative one, is out of reach for U.S. and NATO troops that have spent more than a decade pursuing counter-insurgency tactics instead of a program oriented toward taking and holding territory, the real question for the publics of the nations whose troops are represented must be not whether or not there has been a "win," but what sort of political, military, and cultural entity they will be willing to endure in the political boundaries of Afghanistan. The answer to this question, in turn, will undoubtedly hinge upon what the men with guns and dollars want to do with, and about, Hamid Karzai. If the military powers seek only stability -- a government under any head, under any ideology, under any method -- that can speak with one voice and enact what it says in all areas of the territory, then Taliban is better than Karzai. If these powers seek human rights reflective of the charter of the United Nations, then nothing in place now resembles what they seek. If the powers want a political state -- meaning a place whose people think of themselves as being Afghanistani, who think of duty to the government of the nation immediately after duty to God and family, a place where people think that any person born in the nation is as good as any other person born in any other place -- then the tribal government of Karzai is the opposite of satisfactory. Only by knowing the answer to "what kind of place does Afghanistan need to be so that we never feel the need to interfere with it again" can we fix a goal and work backward to methods by which we might achieve that goal. If the United States and other members of NATO have made such a policy decision, they seem to have kept it secret."
Resulted in:
"Since the victory in Afghanistan also alleged that the scope of U.S. and NATO troops, who have spent more than ten years pursued counterinsurgency tactics and decision-oriented program is not holding territory, if the real question for the countries represented audience whose troops, not whether it is "victory" but what is the political, military and cultural community will be willing to bear the political boundaries of Afghanistan. The answer to this question is no doubt that the weapons and dollars are based men to make and Hamid Karzai. If only military powers seek stability of government within the same chapter , under the ideology of a process - to speak with one voice and take what he says can be used in all areas of the country, while the Taliban better than Karzai. When these forces that seek to reflect the Charter of Human Rights , there is nothing of Nations in place now seem to want . If the authorities want the political state - the only place it as the Afghan people who think that immediately after the service of God and family , a place where people believe that the government means that the nation was born everyone in the nation , as well as any other person born elsewhere - while the tribal government Karzai is the opposite of good. This is just to know the answer to the question "What is this place in Afghanistan, did not feel the need to intervene again and again, " We can set a goal and work backwards to the methods to achieve this goal. If the United States and other NATO members have taken a decision in this type of policy, seems to have kept secret."
Funny? I don't think so. In fact, more lacunae than laugh lines seem to be coming out of the cracks in the translation. After that many passes, the information should have vanished altogether. The end result doesn't make sense, but then the original wasn't all that coherent to begin with. Google Translate should have died on that passage, and it merely caught a cold.
Let's try the other extreme -- something from the realm of catty Hollywood reporting.
Original text: English, idiomatic:
"Ever since bomb that was the re-cut "Star Wars," fans have quaked as studios baked, and stinker after stinker has come out in a gold plated coffin. There are "reboots," and there are all-new 3-D versions that contain no new footage. Moviegoers get R-rated cynicism long before they can see an R-rating in the theater. What sort of innocence would it take to go see a new "Wizard from Oz" that can't include any material from the L. Frank Baum books? How old and young is the audience to remember the black and white "Lone Ranger" show with its single big rock and want to see it as a movie with Johnny Depp? What Adderal junkie is going to find drama in a movie about a Hasbro toy jive talking? If Hollywood were anything but an MBA, if it were a place or a mindset for film making, then it would have committed suicide shooting itself with an unloaded gun."
Result:
Since the pump recut fans of "Star Wars" shook the study of the bakery, and after turnip turnips came in a golden coffin. There is a "reset", and there are all new version in 3 - D, containing new images. Viewers cynicism R-rated, long before they see the R-rated drama. What innocence to go see the new "Wizard of Oz" can not contain any material from the books of L. Frank Baum? As young and old is a black & white audience, "The Lone Ranger" will remember the show with her a large stone and want to look like a movie with Johnny Depp? What Adderall addict find in a movie theater on a Hasbro toy Jive Talk? If Hollywood was not an MBA if it was a place or state of mind to make movies, it would be suicide to shoot themselves to be his weapon.
OK, I admit that that's pretty good. I don't know where the "pump" came from. "Depuis bombe," "Da die Bombe," "Poskol'ku bomba," "Dado que la bomba," and then "Since the pump" shows up out of Google Translate's magical fairy dust. I promise that I did not study the bakery, either.
Still and all, what's amazing is how not entirely awful the translations are. The gist of the second one is lost amid a flurry of word play (which was cheating on my part), but Google Translate is actually performing Atlean tasks (there's a word for you, folks).
Is this success due to the genius of the program and the assured superiority of Google? Well, no. These translations succeed because the machines are using human beings, and that shows how bad machine translation on its own really is. Want me to prove it?
The man you know as Flann O'Brien you might also know as Brian O'Nolan, although he was also Brian Nolan. He was also Myles na gCopaleen, and he was a genius. An Irish writer in the generation after Joyce, he had little choice but to grapple with the great man's shadow. He got into Trinity claiming to have done an interview with James Joyce's father, but Richard Ellmann told a class I had with him that there was no evidence that John Joyce ever gave an interview to anybody. (Opinion now favors the idea that Nolan did get the interview, but that's opinion for you.)
For decades, O'Nolan, as he called himself when he was home, wrote a column for The Irish Times as Myles na gCopaleen, and the collections of the columns are essentials for any civilized house. The writings would cover any topic, and they would veer into Latin without warning. In The Best of Myles, we get a passage called "The District Courts," where "The Da" is arrested and proves a difficult defendant, not least for speaking in Latin.
"The Sergeant said that it was now necessary to charge defendant with loitering, trespass and burglary.
Defendant: Loquitur Agamemnon.
Continuing, the Sergeant said that defendant had been treated with great latitude by the court the previous day and allowed out on bail. When released from Mountjoy, however, he refused to quit the premises and had to be ejected.
Justice: How?
Sergeant: With a hose.
1 Defendant: Pro di immortales! Quid?
The Sergeant asked the justice to hear a warder as to defendant's condition. In the court of his evidence, the warder said: "This man is infested be hoppers."
Justice: Hoppers? What is a hopper?
Defendant: (excitedly to Justice): 2 Habeo igitur quod ex eum quaesisti, quod esset 'hopper.' Non est hopper, non sunt hoppers, ut dixit; quod ego verbum agnovi. 3 Sunt fratres minimi mei (de quibus lex non curat), numerus quorum generis late et varie diffusus est. 4 Sunt amici, sunt fideles milites nostri qui neque nocentes sunt nec natura improbi nec furiosi nec malis domesticis impedi! (Pointing toward warder.) 5 Nunquam putavi -- vera dicam! -- tantum esse in homine sceleris, audaciae, crudelitatis!
The Sergeant mentioned that defendant was continually conversing in this strain, leaving all as wise as if he were speaking double dutch.
That is one of the funnier things I've read. If, however, you need help, Latin is one of the languages that Google Translate can handle. Latin is a synthetic language, in which endings convey syntactic position, so word order has almost no influence on meaning, and the language is almost as easy to decode as a computer program. Heck, if there were a language designed for a machine to translate, it would be Latin. All the idioms are known, there isn't slang, the dictionaries are voluminous, and the rules are well etched.
Here is what Google Translate did with it, though:
"The Sergeant said that it was now necessary to charge defendant with loitering, trespass and burglary.
Defendant: Speaking of Agamemnon.
Continuing, the Sergeant said that defendant had been treated with great latitude by the court the previous day and allowed out on bail. When released from Mountjoy, however, he refused to quit the premises and had to be ejected.
Justice: How?
Sergeant: With a hose.
1 Defendant: Immortal gods! What?
The Sergeant asked the justice to hear a warder as to defendant's condition. In the court of his evidence, the warder said: "This man is infested be hoppers."
Justice: Hoppers? What is a hopper?
Defendant: (excitedly to Justice): 2 I have therefore that it is from him, beseech thee, that it would be 'Hopper.' It is not Hopper, they are not Hooper, As she said this, which I have the word I recognized him. There are the least of my brothers (of which the law does not care), the number of which kind of wide, and is scattered abroad in various ways. There are friends, are the faithful our soldiers who have neither criminals nor madmen, nor men that are not wicked by nature, domestic baggage! (Pointing toward Ward.) I never thought - I'll tell the truth! - Only be in a man of the crime, of daring, of cruelty!
The Sergeant mentioned that defendant was continually conversing in this strain, leaving all as wise as if he were speaking double dutch.
Excuse me? If you didn't get the jokes in Latin, you sure won't get them in this translation. "Only be in a man of the crime?" Would a seventh-grade student make such a mistake? Google Translate, which had been capable of juggling four modern languages at the same time, seems to have had a slight malfunction. It doesn't recognize the pun on "lex non curat," of course, but it also breaks its noggin on the implied pronouns of Latin, doesn't recognize allusions, and makes a fool of itself in general.
Why would Google Translate be so good that it could survive a torture test with modern languages and so bad that it falls apart on elementary Latin? The answer is that Google scours human translations. Its algorithm learns from the human translations it finds on the web, and there aren't many of those for Latin.
Oh, and here is my very doubtful translation of the Latin:
1: "Out, by the gods! Where?" ("Quo" can mean "where" in poetry.) Or: "Immortal gods! What?"
2: "I have it, and therefore it came from him, I beseech you, 'it' being the 'hopper.' It is not a hopper; they are not 'hoppers.' Thus he calls them, but this is not the word by which I know them."
3: They are the least of my brothers (for which the law offers no remedy), the number of which is scattered abroad in various ways and of a wide race.
4: They are my friends, they are my faithful, who comprise neither criminals nor madmen, nor men that are wicked by nature, you domestic baggage! (Pointing toward warder.)
5: I never thought - I will speak the truth! - that there could be so much crime, daring, and cruelty in a man!
The old man calling his fleas his "fratres minimi mei" cracks me up, and getting in the "de minimis lex non curat" (there are things too minor for the law to remedy) is brilliant.
So, what does this all mean? We end where we began, with Babel. The sin of Babel, as tradition has it, was pride. Google Translate has gotten extremely good by silently scraping the work of humans and blending it with the speed of artificial intelligence. It has gotten so good, in fact, that we can be tempted to see translation as an invisible act, as something automatic and, most dangerously of all, lossless. We can forget the opacity of language, forget that in language the container is the content. Therefore, when a banner at the top of Google Translate invites you to try Chrome with automatic web page translation turned on, it is offering to sell you a penthouse suite at the top of Babel.
When we push the button for the basement and make Translate show us its own work, without cribbing from humans, we can hear the familiar strain of strain again, see the familiar need for human agents, and note the role of the translator as a person who speaks multiple cultures that just happen to use different words.