Adding a voice to the future of computing is trickier than it sounds
There’s a reason why Siri is a young woman with no accent
Jason Mars is an African-American professor of computer science who also runs a tech start-up. But when his company’s artificially intelligent smartphone app talks, it sounds “like a helpful, young Caucasian female.”
“There’s a kind of pressure to conform to the prejudices of the world” when you are trying to make a consumer hit, he said. “It would be interesting to have a black guy talk, but we don’t want to create friction, either. First, we need to sell products.”
Mars’ start-up is part of a growing high-tech field called conversational computing. This technology is being popularized by programs like the Siri system in Apple’s iPhone and Alexa, which is built into Echo, Amazon’s artificially intelligent home computing device.
Conversational computing is holding a mirror to many of society’s biggest preconceptions around race and gender. Listening and talking are the new input and output devices of computers. But they have social and emotional dimensions never seen with keyboards and screens.
Do we, for example, associate the stereotypical voice of an English butler — think of Jarvis the computer in Iron Man — with a helpful and intelligent person? And why do so many people want to hear a voice that sounds like it came from a younger woman with no accent?
Choosing a voice has implications for design, branding and how we interact with machines. A voice can change or harden how we see each other. Where commerce is concerned, that creates a problem: Is it better to succeed by complying with a stereotype, or risk failure by going against type?
For many, the answer is initially clear. Microsoft’s artificially intelligent voice system is Cortana, for example, and it was originally the voice of a female character in the video game Halo.
“In our research for Cortana, both men and women prefer a woman, younger, for their personal assistant, by a country mile,” said Derek Connell, a senior vice-president at Microsoft. In other words, a secretary — a job that is traditionally seen as female.
Earlier this month, Google introduced several voice-based products, including Google Home, its version of Echo. All use Google Assistant, which also speaks in tones associated with a young, educated woman.
Google Assistant “is a millennial librarian who understands cultural cues and can wink at things,” said Ryan Germick, who leads the personality efforts in building the assistant. “Products aren’t about rational design decisions. They are about psychology and how people feel.”
But sometimes, if you want people to figure out quickly that they are talking to a machine, it can be better to have a man’s voice.
For example, IBM’s Watson, when it talks to Bob Dylan in TV commercials, has a male voice. When Ashok Goel, a professor at the Georgia Institute of Technology, adapted Watson to have a female voice as an informal experiment in how people relate to machines, his students couldn’t tell it was a computer.
But Watson’s maleness is the exception. Amazon’s A.I. technology is another in the comforting female voice camp. “Alexa was always an assistant, and female,” said Peng Shao, who worked at Amazon on the Echo and is now at a Seattle start-up.
Gender is just the starting point. Can your A.I. technology understand accents? And can it respond in a way that feels less robotic and at least mimics human empathy?
“You need a persona,” Shao said. “It’s a very emotional thing — people would get red, even get violent, if it didn’t understand them. When it did understand them, it felt like magic. They sleep next to them. This is heading for hospitals, senior care, a lot of sensitive places.”
Capital One developed a banking app on Alexa and found it had to dial down the formality to make people comfortable talking about their finances with a computer.
“Money is inextricably linked to emotion, enabling and preventing things in your life,” said Stephanie Hay, the head of content strategy, culture and A.I. design at Capital One. At first, the app said, “Hello,” but that seemed too tense. “‘Hi, there’ worked better,” she said. “She’s my friend, hanging out with me in the kitchen. I need her to be reliable and approachable, but not invasive.”
And, of course, there are regional issues to consider when creating a robotic voice. For Cortana, Microsoft has had to tweak accents and languages, as well as the jokes Cortana tells, for different countries.
Local accents can be found in various versions of Siri. It’s possible to localize the accent on an iPhone for the United States (“Samantha,” in the phone’s settings), Australia (“Karen”), Ireland (“Moira”), South Africa (“Tessa”) and Britain (“Daniel”). Apple could not say whether the English tradition of male butlers influenced its British choice.