Manila Bulletin

Creating a Computer Voice That People Like

- (John Markoff, NYT)

When computers speak, how human should they sound? This was a question that a team of six IBM linguists, engineers and marketers faced in 2009, when they began designing a function that turned text into speech for Watson, the company’s “Jeopardy!”-playing artificial intelligence program.

Eighteen months later, a carefully crafted voice — sounding not quite human but also not quite like HAL 9000 from the movie “2001: A Space Odyssey” — expressed Watson’s synthetic character in a highly publicized match in which the program defeated two of the best human “Jeopardy!” players.

The challenge of creating a computer “personality” is now one that a growing number of software designers are grappling with as computers become portable and users with busy hands and eyes increasingly use voice interaction.

Machines are listening, understanding and speaking, and not just computers and smartphones. Voices have been added to a wide range of everyday objects like cars and toys, as well as household information “appliances” like the home-companion robots Pepper and Jibo, and Alexa, the voice of the Amazon Echo speaker device.

A new design science is emerging in the pursuit of building what are called “conversational agents,” software programs that understand natural language and speech and can respond to human voice commands.

However, the creation of such systems, led by researchers in a field known as human-computer interaction design, is still as much an art as it is a science.

It is not yet possible to create a com- ‘Say this with feeling,’ ” he said.

For those like the developers at ToyTalk who design entertainment characters, errors may not be fatal, since the goal is to entertain or even to make their audience laugh. However, for programs that are intended to collaborate with humans in commercial situations or to become companions, the challenges are more subtle.

These designers often say they do not want to try to fool the humans that the machines are communicating with, but they still want to create a humanlike relationship between the user and the machine.

“Jeopardy!” was a particularly challenging speech synthesis problem for IBM’s researchers because although the answers were short, there were a vast number of possible mispronunciation pitfalls.

“The error rate, in just correctly pronouncing a word, was our biggest problem,” said Andy Aaron, a researcher in the Cognitive Environments Laboratory at IBM Research.

Several members of the team spent more than a year creating a giant database of correct pronunciations to cut the errors to as close to zero as possible. Phrases like brut Champagne, carpe diem and sotto voce presented potential minefields of errors, making it impossible to follow pronunciation guidelines blindly.

The researchers interviewed 25 voice actors, looking for a particular human sound from which to build the Watson voice. Narrowing it down to the voice they liked best, they then played with it in various ways, at one point even frequency-shifting it so that it sounded like a child.
