Microsoft says speech recognition technology reaches "human parity"


Artificial intelligence just keeps getting smarter and smarter. Now Microsoft researchers say they've developed speech recognition technology that can grasp a human conversation as well as people do. The work out of the Microsoft Artificial Intelligence and Research department was published in a scientific paper this week. It shows that when the speech recognition software "listened" to people talking, it was able to transcribe the conversation with the same or fewer errors than professional human transcriptionists. The technology delivered a word error rate (WER) of 5.9 percent, which is roughly the same as that of people who were asked to transcribe the same conversation. "We've reached human parity," Xuedong Huang, Microsoft's chief speech scientist, said in a press release. "This is an historic achievement." The achievement is no small feat. The company says this is the first time that a computer has been shown to equal humans in the ability to recognize words.
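For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The short Python sketch below illustrates that standard definition only; it is not Microsoft's evaluation code, and the example sentences are made up.

```python
# Illustrative sketch of word error rate (WER) as commonly defined:
# (substitutions + deletions + insertions) / number of reference words.
# Not Microsoft's evaluation code; the example transcripts are hypothetical.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("have" heard as "is") in a five-word reference
# gives a 20 percent WER; Microsoft's reported figure was 5.9 percent.
print(word_error_rate("i have a red car", "i is a red car"))  # 0.2
```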

"Even five years ago, I wouldn't have thought we could have achieved this. I just wouldn't have thought it would be possible," Harry Shum, executive vice president in charge of Microsoft's Artificial Intelligence and Research group, said in the release. Just as IBM's Watson cognitive computing system and personal smartphone assistants like Apple's Siri have been making waves and becoming more useful and versatile, this technology has the potential to have a significant impact. Microsoft could use it to make its own mobile assistant, Cortana, more effective, and to boost the voice command capabilities of its Xbox gaming systems.

Of course, even at the level of "human parity," the technology isn't foolproof. The computer did not recognize every word accurately. The researchers found that the computer's rate of mishearing "have" as "is," for example, was about the same as you'd expect from a person in a normal conversation.

One of the main ways the team achieved its progress was by utilizing neural network technology, in which huge chunks of data were used as training sets. Words were represented as "continuous vectors in space," placed close together to teach the computer to recognize patterns common in actual human speech. Geoffrey Zweig, manager of the company's Speech and Dialog research group, said that "this lets the models generalize very well from word to word." The system used to reach the milestone was the company's Computational Network Toolkit (CNTK), which processed deep learning algorithms across multiple computers using specialized chips that improved the speed.

The next phase of the research involves improving speech recognition in real-life settings, like locations with heavy background noise. The team also hopes that the computer could eventually give names to individual speakers to distinguish when certain people are talking. For those technology alarmists who worry that this could lead to sentient machines, a la "Terminator," the research team offered some reassurance. While computers are learning to process language better than ever, true comprehension is still a long way off.
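To make the idea of "continuous vectors in space" concrete, the toy Python sketch below shows made-up word vectors and a cosine-similarity comparison; it is not the CNTK models described in the paper, and the numbers are invented purely to show how words that behave similarly end up close together.

```python
# Toy illustration of words as continuous vectors -- not Microsoft's models.
# The embedding values below are invented solely for demonstration.
import math

embeddings = {
    "have": [0.82, 0.10, 0.31],
    "has":  [0.80, 0.12, 0.29],   # close to "have": similar usage in speech
    "is":   [0.75, 0.20, 0.35],
    "car":  [0.05, 0.90, 0.11],   # far from the verbs above
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Words that occur in similar contexts get nearby vectors, which is what
# lets the models "generalize very well from word to word".
print(cosine(embeddings["have"], embeddings["has"]))  # ~0.999, very similar
print(cosine(embeddings["have"], embeddings["car"]))  # ~0.2, dissimilar
```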

"The next frontier is to move from recognition to understanding," Zweig said.

The Microsoft Artificial Intelligence and Research department's work shows that when the speech recognition software "listened" to people, it was able to transcribe the conversation with the same or fewer errors than professional human transcriptionists.

Microsoft's chief speech scientist Xuedong Huang.
