Editorial Snapshot: To err is computers: Why machine translation cannot catch up to human translators

- K.N., Senior Translator

While machine translation may sometimes provide a gist translation, it almost invariably produces low-quality output with major flaws in accuracy and readability. Why is it that, in an age when cars can drive themselves, robots can perform neurosurgery, and computer software can beat shogi masters, computers still cannot capably render text from one language into another, even after many decades of research and development?

The first and most commonly cited reason is that words can have multiple meanings, and computers cannot choose the appropriate one based on the context. Many words have several meanings and usages, in some cases as many as 179. The word “pitch”, for example, has different meanings in music, baseball, mountaineering, and business, as well as additional meanings in British English. Even if computers could identify the field, they would still be powerless to select the correct term for homographs used within the same field, such as 生物 (which may mean “organism”, “uncooked food”, or “biology”) and バリウム (“barium” or “Valium”). The same applies to phrases and sentences: “chemistry test” means entirely different things in a classroom and in the film industry, and the meaning of “He turned two.” depends on whether the subject is a toddler or an infielder. It is thus necessary to judge the context not only at the sentence level but also on a much broader scale, far beyond the reach of machine translation. Moreover, many terms diverge into multiple concepts in other languages. In Spanish, for example, “fish” splits into “pez” (a live fish) and “pescado” (fish caught as food), and “understand” translates as either “entender” or “comprender” depending on the situation. Still other expressions, such as やるせない and ~してしまった, have no true equivalents, and translating them requires human ingenuity. These are just a few of countless examples in which no one-to-one correspondence exists between languages. Because the meaning of words and phrases always depends on the context, and machine translation cannot accurately judge context, it inevitably produces erroneous translations.

Second, in addition to translating the words that appear in the text, a translator must identify and convey information that is not stated explicitly. For instance, since subjects are often omitted in Japanese sentences but are required in English, a Japanese-to-English translator must deduce the subject from the context. Similarly, because Japanese lacks articles and generally does not distinguish between singular and plural, these details must also be inferred from the context and supplied in the English translation. Word choice in the translation also depends on implicit elements such as flow, tone, emphasis, subtle nuances, and the interplay between words, as well as on background information such as stylistic requirements, purpose, and target readership. Taking all of this into account, which is essential for fully understanding and translating the meaning of the text, entails reading between the lines and may also require research skills, specialist knowledge, and cultural fluency. These are uniquely human skills that computers do not possess. If the meaning of a text were the proverbial iceberg, the individual words would be merely its visible tip; much of the meaning lies beneath the surface, and it takes human brainpower to understand and express that meaning in its entirety.

Third, because machine translation relies too heavily on dictionary definitions and produces overly literal translations, it struggles with figurative language, euphemisms, false friends, jargon, colloquialisms, neologisms, and various other facets of language.

Fourth, because machine translation has neither a sound command of grammar nor the ability to rearrange text properly, it often produces sentences so ungrammatical and unnatural as to be incomprehensible. This weakness is amplified for language pairs with significant grammatical differences, such as Japanese and English.

Finally, what the source text says may not match what its author means. The source text may contain typographical errors and mistakes in punctuation, grammar, and logic. Whereas a trained eye can pick up these issues, computers cannot detect, much less correct, such errors, and thus carry them over into the translation. The text may also contain ambiguities, particularly in long, convoluted sentences that are difficult to parse (which are common in Japanese writing). Untangling such sentences and determining their meaning takes an intimate knowledge of both the source language and the subject matter.

Therefore, translation cannot be performed simply by substituting each word or phrase with its equivalent in the target language. Translation is not a process of decoding words but one of rendering meaning, both explicit and implicit, based on the context. It thus requires the deep insight, linguistic sensibility, and flexible thinking of an informed human mind, and it is far too intricate to be performed by a simplistic, probability- or prediction-based, by-the-book approach such as machine translation. No amount of technological advancement or artificial intelligence, not even the arrival of the singularity, can change this fact. Unlike cars, surgery, and shogi, translation is not a technology, a science, or a game; it is an art that requires the human touch. Consequently, computers remain inept and unreliable translators and will never be able to catch up to human translators.
