Deccan Chronicle

Machine translations lose human touch

Software giants are now enlisting people’s help to improve results

- NAVEENA GHANATE | DC

Machine translation sounds easier than it is. It is often thought of as merely a matter of replacing each word in one language with the corresponding word in another. But it is much more complicated than that, because the same thing can be said in many ways and the meaning of a phrase often depends on its context. This is evident from the gaffes committed by Google Translate.

The popular American TV show hosted by Jimmy Fallon recently ran a segment mocking Google Translate. The lyrics of an English-language song are translated into another language, and the output is then translated back into English. For example, translating the line “We will, we will rock you” from the famous song into Telugu yields “Memu ninnu raksistamu”, which means “We will save you”.

Machine translation systems like Google Translate have come a long way since IBM first translated 60 sentences from Russian into English in 1954, and Microsoft recently reported achieving human parity in Chinese-to-English translation.

“We have a model for four Indic languages: Hindi, Bangla, Tamil and Urdu. We have gained at least 20 per cent improvement compared to the previously deployed models. This is significant in terms of end-user experience,” a Microsoft spokesperson told this newspaper.

Companies previously used Statistical Machine Translation, a technique that struggled to make sense of words in their local context and their relationships with other words. More recently, giants like Google, Microsoft and Facebook have started relying on Neural Machine Translation, in which a large artificial neural network, loosely inspired by the neurons in the brain, is built and trained on example translations.

Neural Machine Translation has given companies better translations, but the technology is still in its early days. Plain text is translated easily, but idioms and jokes lose the human touch. “Deep neural networks have large parameter spaces and need ample amounts of data in order to generalise adequately,” the Microsoft spokesperson said.

Machines are still struggling to make sense of words in their local context, particularly in Indian languages. “This is because while Indian languages are widely spoken (in terms of native speakers), most of these languages have very little or no parallel resources available to build a general-domain Machine Translation system. In the absence of readily available parallel corpora, comparable resources are often used to extract good quality parallel data from the web,” said Microsoft. Parallel corpora are collections of sentences paired with their human translations, while comparable resources are texts on similar topics in different languages that are not direct translations of each other.

Meanwhile, companies are enlisting people to correct machine translations. The idiom “Call it a day”, which means “stop working on something”, is translated into Hindi as “Ant karana” (to end).

This rendering was suggested by Google Translate users, as indicated by a shield symbol that appears next to such translations. Microsoft Bing correctly translates “Speak of the devil...” as “Hum abhi isake bare men hee baat kar rahe the” (we were just talking about it).
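For readers curious about why phrase-level corrections help, the toy Python sketch below contrasts the two approaches. It assumes a hypothetical four-word English-to-Hindi glossary with illustrative romanised glosses; it is not how Google Translate or Bing actually work. Replacing each word in isolation turns “Call it a day” into “bulao ise ek din”, roughly “call it one day”, while a multi-word entry of the kind users suggest is matched first and returns “ant karana”.

```python
# A toy illustration only, NOT any real translation system.
# Both glossaries are hypothetical: the single-word Hindi glosses are
# romanised and illustrative; "ant karana" is the rendering quoted in the article.

WORD_GLOSSARY = {
    "call": "bulao",
    "it": "ise",
    "a": "ek",
    "day": "din",
}

# Multi-word entries, e.g. an idiom correction suggested by users.
PHRASE_GLOSSARY = {
    ("call", "it", "a", "day"): "ant karana",
}


def word_for_word(sentence: str) -> str:
    """Replace each word on its own; unknown words are passed through."""
    words = sentence.lower().split()
    return " ".join(WORD_GLOSSARY.get(w, w) for w in words)


def phrase_first(sentence: str) -> str:
    """Greedy longest match: try the longest phrase entry at each position,
    falling back to the single-word glossary when no phrase matches."""
    words = sentence.lower().split()
    longest = max(len(p) for p in PHRASE_GLOSSARY)
    out = []
    i = 0
    while i < len(words):
        # Try spans from the longest possible down to two words.
        for n in range(min(longest, len(words) - i), 1, -1):
            span = tuple(words[i:i + n])
            if span in PHRASE_GLOSSARY:
                out.append(PHRASE_GLOSSARY[span])
                i += n
                break
        else:
            # No phrase matched at this position: translate a single word.
            out.append(WORD_GLOSSARY.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(word_for_word("Call it a day"))  # bulao ise ek din
print(phrase_first("Call it a day"))   # ant karana
```

The only design choice here is to check longer spans before single words, so an idiom entry overrides the literal reading whenever the whole phrase appears.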

The idiom “Cutting corners”, which means “doing something poorly in order to save time or money”, is translated into Telugu as “Mulalanu kattadam” (building corners). Translation also varies with the context in which words are used. “Cakewalk” is translated into Telugu as “labhalalo vaata” (share in profits), while “It is not a cakewalk” becomes “idi oka kyarekki kadu” (this is not a caricature).

“Due to the use of multiple fonts in Indian languages, a significant portion of web data is not usable to extract useful parallel content.” - MICROSOFT

Pench Tiger Reserve, a forest in Madhya Pradesh, was translated from Hindi into English as “Screw Tiger Reserve”!
