The Philippine Star

Facebook researchers use math for better translations


Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.

Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue – even if a universal communicator à la Star Trek remains a distant dream.

Powerful automatic translation is a big priority for internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.

Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu and others are constantly seeking to improve their translation tools.

Facebook has artificial intelligence experts on the job at one of its research labs in Paris.

Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.

Automatic translation is currently based on having large databases of identical texts in both languages to work from. But for many language pairs there just aren’t enough such parallel texts.

That’s why researchers have been looking for another method, like the system developed by Facebook which creates a mathematical representation for words.

Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.

“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.

“If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
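
To make the idea of "closeness" concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration (real systems learn vectors with several hundred dimensions from text), and cosine similarity is assumed as the measure of closeness; the article does not say which measure Facebook uses.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
# Real word vectors are learned from text and have hundreds of dimensions.
vectors = {
    "cat":    [0.90, 0.80, 0.10],
    "dog":    [0.85, 0.75, 0.20],
    "Paris":  [0.10, 0.20, 0.90],
    "Madrid": [0.15, 0.10, 0.95],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(nearest("cat"))    # dog
print(nearest("Paris"))  # Madrid
```

With these made-up numbers, the animals sit near each other and the capitals sit near each other, which is the pattern Lample describes.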

These language maps can then be linked to one another using algorithms – at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
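
The article does not name the algorithm used to link the maps. As a rough illustration only, the sketch below assumes the simplest possible case: two toy two-dimensional "maps" that differ by a rotation, which is estimated from a couple of seed word pairs and then used to translate a word that was not among the seeds. All vectors, the 40-degree rotation and the function names are hypothetical.

```python
import math

# Hypothetical 2-D English word vectors, invented for illustration.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.2), "city": (0.0, 1.0)}

def rotate(v, theta):
    """Rotate the 2-D vector v counterclockwise by theta radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Assumption: the Spanish map has the same shape, turned by 40 degrees.
TRUE_ANGLE = math.radians(40)
spanish = {
    "gato": rotate(english["cat"], TRUE_ANGLE),
    "perro": rotate(english["dog"], TRUE_ANGLE),
    "ciudad": rotate(english["city"], TRUE_ANGLE),
}

def fit_rotation(pairs):
    """Least-squares rotation angle that maps English seed vectors onto Spanish ones."""
    dot = sum(english[e][0] * spanish[s][0] + english[e][1] * spanish[s][1]
              for e, s in pairs)
    cross = sum(english[e][0] * spanish[s][1] - english[e][1] * spanish[s][0]
                for e, s in pairs)
    return math.atan2(cross, dot)

def translate(word, theta):
    """Map an English word into the Spanish space and return the nearest Spanish word."""
    mapped = rotate(english[word], theta)
    return min(spanish, key=lambda s: math.dist(mapped, spanish[s]))

# "dog" is deliberately left out of the seed pairs.
theta = fit_rotation([("cat", "gato"), ("city", "ciudad")])
print(translate("dog", theta))  # perro
```

Real language maps are not exact rotations of one another, which is one reason the matching starts out rough and has to be refined.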

Lample said results are already promising.

For the language pair of English-Romanian, Facebook’s current machine translation system is “equal or maybe a bit worse” than the word vector system, said Lample.

But for the rarer language pair of English-Urdu, where Facebook’s traditional system doesn’t have many bilingual texts to reference, the word vector system is already superior, he said.

But could the method allow translation from, say, Basque into the language of an Amazonian tribe?

In theory, yes, said Lample, but in practice a large body of written text is needed to map a language, something Amazonian tribal languages lack.

“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.

Experts at France’s CNRS national scientific center said the approach Lample has taken for Facebook could produce useful results, even if it doesn’t result in perfect translations.

Thierry Poibeau of CNRS’s Lattice laboratory, which also does research into machine translation, called the word vector approach “a conceptual revolution.”
