Machine translation (MT) is a hot topic these days, and computers and algorithms are constantly evolving to produce better translation results than ever before. Scroll through the news, and you may see articles with headlines like “Will GPT replace your human translator?”
With all the hype, you may be wondering: what do advances in machine translation mean for my business and our localization needs?
In this article, we will look at developments in machine translation before examining where the technology still falls short. Then, we will discuss what role, if any, machine translation should play in your translation workflow.
Advances in Automated Translation Technology
In 2016, Google Translate shifted to neural machine translation (NMT). NMT is generally superior to the statistical machine translation (SMT) that the tool had relied on since it launched in 2006. Neural machine translation models learn from data, which means they can be retrained or fine-tuned to produce better output over time.
Google Translate and other tools have also benefited from advances in natural language processing (NLP), which improves how computers understand human communication. Simply put, humans do not speak in code or queries. NLP models aim to bridge this gap by digesting large quantities of spoken and written language, then making statistical predictions about which word should appear next in a sentence based on context. The GPT family of language models behind ChatGPT is perhaps the best-known example.
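To make the idea of "predicting the next word" concrete, here is a deliberately tiny sketch: a bigram model that counts which word most often follows another in a small sample corpus. The function names and the three-sentence corpus are hypothetical. Real systems such as the GPT models use neural networks trained on vastly more text and look at far longer context; this only illustrates the statistical principle.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word is immediately followed by each other word."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # cat
print(predict_next(model, "sat"))  # on
```

The model has no notion of meaning: it picks "cat" after "the" purely because that pairing occurred most often in its training text.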
As a result of NMT, NLP, and other technology advances, Google Translate and a handful of other translation platforms, such as DeepL, have become useful in producing fit-for-purpose translations for a variety of industries.
Learn more about the latest translation technologies and trends.
The Pitfalls of Using Machine Translation
Many media articles would have you believe that you can now produce perfect translations from one language to another at the push of a button—but the reality is quite different. Machine translation still has several prominent shortcomings that affect the quality of the translations it produces, especially when compared to human translations.
Loss of Brand Voice and Style
The technology behind machine translation is predictive in nature: it generates the most likely output based on patterns learned from the texts it was trained on.
This means that machine translation tools do not consider brand voice in their output. Your source-language content may have a clear, well-defined brand voice, but machine translation may flatten it so that you sound like anyone else in your industry.
Consistency may also be an issue. Nothing prevents the tool from translating a word one way in one sentence and using a synonym elsewhere.
Confidentiality and Data Security Issues
When you enter text into a publicly available machine translation engine, such as Google Translate, your content may be stored and used to train future iterations of the tool. In other words, any sensitive data you enter could be retained, which may violate your company’s data security policies (or those of a partner organization) and put ISO certifications at risk.
Paid professional versions of tools, such as DeepL Pro, often have different terms and conditions from the free versions: they typically encrypt data and guarantee that it is not stored or used to train the tool.
Lack of Cultural Context
One of the major areas where machine translation tools fall short is assessing the cultural context of the source text and producing a translation that works for the target audience.
For example, a Brazilian refugee used an AI-based translation app to guide him as he applied for asylum in the United States. The tool was unable to recognize the words Belo Horizonte as the name of a Brazilian city and translated it literally instead—a mistake that a Portuguese-speaking translator would be highly unlikely to make in context. (Mexico’s official tourism website featured this same mistake in 2020, with the city of Puerto Escondido translated as “Hidden Port,” among other errors.)
The same literal translations often occur with personal names and idiomatic phrases. For example, take the sentence “I was left at the altar on my wedding day.” Both DeepL and Google Translate render it in Spanish as Me dejaron en el altar el día de mi boda, which translates back as “They left me at the altar on my wedding day.” A human who speaks English inherently understands that the speaker is one half of a couple and is referring to a single person: their partner. Machine translation does not always understand societal norms.
Another issue is that differentiation between language variants is limited at best. Many engines have a single Spanish output, even though Spanish is spoken across 20+ countries on three continents. Vocabulary differs significantly between Argentina, Mexico, and Spain, and some aspects of grammar also differ, especially in less formal texts. Machine translation often cannot take this into account, which makes it ill-suited for creative and marketing translations.
Legal translation is another area where machine translation may struggle, because each country uses its own legal terminology. What is understandable for a lawyer in Montevideo might be confusing to one in Madrid. Worse still, machine translation may freely mix and match terminology from multiple countries.
Learn more about Spanish translations into different dialects.
Gender and Cultural Bias
Another common problem with machine translation is bias, particularly with regard to gender.
For example, the English noun “doctor” does not specify gender. But Spanish, among other languages, requires the noun (and its article) to be gendered. A machine translation engine will choose whichever form is statistically more likely based on its training data, which can lead to the wrong gender being assigned.
That means that if 49.9% of the texts used to train the engine use la doctora and 50.1% use el doctor, the output will default to the latter.
Machine translation also often struggles with the singular ‘they’ used in English, producing garbled output or assigning a gender where none was implied.
Lastly, because engines such as Google Translate and DeepL are trained on material from across the web, their translations can reproduce the racial and cultural biases in their training data. Likewise, inaccurate information in that data can be absorbed and reproduced by an MT engine.
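The arithmetic in this example can be sketched as a greedy choice. The snippet below is a simplified model, not how any particular engine works internally (production NMT systems score whole sentences, often with beam search); the dictionary simply reuses the hypothetical 49.9%/50.1% split above, and `pick_translation` is an illustrative name.

```python
# Hypothetical share of training sentences that use each Spanish rendering
# of the gender-neutral English noun "doctor" (figures from the example above).
training_share = {
    "la doctora": 0.499,
    "el doctor": 0.501,
}


def pick_translation(candidates: dict) -> str:
    """Greedy choice: return the single most frequent candidate."""
    return max(candidates, key=candidates.get)


print(pick_translation(training_share))  # el doctor
```

In this greedy model, a 0.2-point margin is enough to make el doctor the output every time; the minority form never appears at all.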
Limited Accuracy
Since machine translation evolves quickly, it is difficult to determine how accurate it currently is. The most commonly cited study dates from 2019. It found that machine translation was 94% accurate when translating medical discharge instructions from English to Spanish. From English to Chinese, it was 82% accurate, and from English to Armenian, a language with less training data, it was only 55% accurate.
In particular, machine translation struggles with out-of-context words or phrases, such as those that appear as strings when translating a user interface or as menu items in a website’s dropdown.
For example, the word “Home” in a department store’s navigation bar could refer to the homepage or to the home goods section of the site. Similarly, a page titled Relojes in Spanish could be selling clocks or wristwatches.
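Localization tools address this by attaching context to each string; GNU gettext, for example, supports a message context (msgctxt) so that the same source word can carry different translations. The sketch below imitates that idea with a plain Python dictionary keyed by context and source string. The context labels and the Spanish entries (Inicio, Hogar) are illustrative assumptions, not a real catalog.

```python
# Hypothetical Spanish string catalog keyed by (context, source string).
CATALOG = {
    ("nav.homepage", "Home"): "Inicio",    # link to the site's homepage
    ("nav.department", "Home"): "Hogar",   # home goods department
}


def translate(context: str, source: str) -> str:
    """Look up a translation for `source` in the given context.

    Falls back to the untranslated source string if no entry exists.
    """
    return CATALOG.get((context, source), source)


print(translate("nav.homepage", "Home"))    # Inicio
print(translate("nav.department", "Home"))  # Hogar
```

An engine that sees only the bare string “Home” has none of this context to work with.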
Likewise, when translating content with variables, machine translation is unlikely to be able to come up with solutions that can accommodate gender and number, as is necessary in languages like German and Russian. (For examples, see our article on the challenges of German translation.)
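To illustrate the number problem, consider an English UI string such as “{n} files”. Russian needs three different noun forms depending on the count. The sketch below implements the Unicode CLDR plural rule for non-negative whole numbers in Russian and selects a template accordingly. The function names and templates are hypothetical, and grammatical gender (the other half of the problem) is not handled here.

```python
def russian_plural_category(n: int) -> str:
    """CLDR plural category for a non-negative integer in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"      # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"      # 2-4, 22-24, ... but not 12-14
    return "many"         # 0, 5-20, 25-30, ...


# One template per plural category for the English source "{n} files".
FILES_TEMPLATES = {
    "one": "{n} файл",
    "few": "{n} файла",
    "many": "{n} файлов",
}


def files_message(n: int) -> str:
    """Fill in the Russian template that matches the count."""
    return FILES_TEMPLATES[russian_plural_category(n)].format(n=n)


for n in (1, 3, 5, 11, 21, 22, 112):
    print(files_message(n))
```

A single translated template such as “{n} файлов” would read correctly for 5 or 11 but would be ungrammatical for 1, 21, or 22.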
In general, accuracy is lower in languages with complex grammar and syntax; think Japanese or Russian. If a word has more than one meaning or changes its meaning depending on the syntax, machine translation cannot match a trained human translator.
Similarly, less commonly spoken languages receive lower accuracy scores because there is a smaller corpus of texts to train on. You can expect better translations from English to Spanish than to Bulgarian, Finnish, or Quechua, for example.
Is there ever a good time to use machine translation?
Under specific circumstances, it may be beneficial from a cost and/or time perspective to leverage machine translation tools.
At Art One, our project managers assess the viability of machine translation based on the type of content, the language pairs, and the corpus that is available for translation.
We have had the best results with machine translation when the content is general and not industry-specific. Texts with layers of meaning, including both marketing copy and legal documents, are not well suited for machine translation. Nor are documents that contain bilingual elements, such as help documents that refer to buttons by their original name with a translated gloss in brackets.
An example of a document that is well suited to machine translation is an automotive manual that undergoes annual updates. Using a machine translation engine trained on previous versions of the manual, we can accelerate the translation process.
The other important factor is using a custom machine translation engine that has already been trained on a corpus of high-quality translations specific to the client. It is important to remember that the quality of neural machine translation depends fully on the quality of the translations used to train it. If there are errors or the corpus is not sufficiently large, the output will not be high-quality.
Even in situations where machine translation is beneficial, we only ever use it as a starting point. The translation then goes through thorough editing (checking accuracy against the source text) and proofreading by professional human translators who specialize in the relevant subject matter.
Learn more about our QA Process and procedures.
If you have an upcoming localization project and you would like to see if machine translation would be beneficial, contact Art One Translations. We are happy to explore whether machine translation would be a valuable addition to your translation workflow. We can also discuss our editing and proofing process for machine-translated content and offer alternative options if MT is not suitable for your business.