The Philippine Star

Facebook researchers use math for better translations


Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.

Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue – even if a universal communicator a la Star Trek remains a distant dream.

Powerful automatic translation is a big priority for internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.

Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu and others are constantly seeking to improve their translation tools.

Facebook has artificial intelligence experts on the job at one of its research labs in Paris.

Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.

Automatic translation is currently based on having large databases of identical texts in both languages to work from. But for many language pairs there just aren’t enough such parallel texts.

That’s why researchers have been looking for another method, like the system developed by Facebook which creates a mathematical representation for words.

Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.

“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.

“If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
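
To make the idea of closeness in a vector space concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration (the real system reportedly uses several hundred dimensions), and cosine similarity is assumed here as the measure of closeness; the article does not say which measure Facebook uses.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for this example.
# Real word vectors have several hundred dimensions.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.8],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is most similar to the given word's."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(nearest("cat"))
print(nearest("Paris"))
```

With these made-up numbers, “cat” lands nearest to “dog” and “Paris” nearest to “London,” mirroring the examples Lample gives.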

These language maps can then be linked to one another using algorithms – at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
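
The article does not name the algorithms involved. As a rough illustration only, one common way to link two vector spaces is to find the rotation that best lines up one map with the other. The sketch below does this in two dimensions; the word vectors, the Spanish words, and the small seed list of known pairs are all assumptions made for the example, not details of Facebook’s system.

```python
import math

# Hypothetical 2-D "language maps". The target map is the source map
# rotated by 90 degrees, so a single rotation links them exactly.
source = {"cat": (1.0, 0.0), "dog": (0.9, 0.2), "house": (0.0, 1.0)}
target = {"gato": (0.0, 1.0), "perro": (-0.2, 0.9), "casa": (-1.0, 0.0)}

# Assumed seed pairs used to fit the rotation (an assumption of this sketch).
seed_pairs = [("cat", "gato"), ("house", "casa")]


def fit_rotation(pairs):
    """Return the least-squares rotation angle taking source anchors onto target anchors."""
    dot = sum(source[s][0] * target[t][0] + source[s][1] * target[t][1] for s, t in pairs)
    cross = sum(source[s][0] * target[t][1] - source[s][1] * target[t][0] for s, t in pairs)
    return math.atan2(cross, dot)


def rotate(v, angle):
    """Rotate a 2-D vector counterclockwise by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def translate(word, angle):
    """Map a source word into the target space and return the closest target word."""
    mapped = rotate(source[word], angle)
    return min(target, key=lambda t: math.dist(mapped, target[t]))


angle = fit_rotation(seed_pairs)
print(translate("dog", angle))
```

Here “dog” was not among the seed pairs, yet it lands on “perro” once the maps are aligned. Real language maps are far noisier and have hundreds of dimensions, so the fit is only approximate.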

Lample said results are already promising.

For the language pair of English-Romanian, Facebook’s current machine translation system is “equal or maybe a bit worse” than the word vector system, said Lample.

But for the rarer language pair of English-Urdu, where Facebook’s traditional system doesn’t have many bilingual texts to reference, the word vector system is already superior, he said.

But could the method allow translation from, say, Basque into the language of an Amazonian tribe?

In theory, yes, said Lample, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.

“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.

Experts at France’s CNRS national scientific center said the approach Lample has taken for Facebook could produce useful results, even if it doesn’t result in perfect translations.

Thierry Poibeau of CNRS’s Lattice laboratory, which also does research into machine translation, called the word vector approach “a conceptual revolution.”
