San Francisco Chronicle

Silent partner: Don’t expect a smart speaker to chat

By Keith Collins and Cade Metz

Digital assistants like Amazon’s Echo can listen to you. And they can talk back. But that doesn’t mean they can carry on a good conversation.

As the devices that run these assistants become more commonplace — 39 million Americans now own one, according to a recent study — Amazon and competitors like Apple and Google foresee a day when you can chat with their assistants as you would with a friend.

We explored the conversational limits of the leading assistants. After consulting with the companies involved and a few artificial intelligence experts, we created tests that show what they can and can’t handle. Don’t expect the assistants to replace conversations with friends anytime soon. But the experiments — even in the moments when the assistants screwed up — showed what these devices could one day become in our lives.

Let’s make a list: Because these devices are not yet nimble enough to understand and respond to anything you might say, Amazon, Apple and Google must hone them for particular tasks, like playing a song or building a shopping list. If you stick to these tasks, the devices are pretty good at zeroing in on what you want. But even inside these small domains, they can trip up. Here’s how.

Add these things to my shopping list

Amazon: What can I add for you?

Apple: I added “these things” to the shopping list.

Google: What do you want to add?

Right away, the Apple HomePod showed how difficult it can be for machines to understand natural language. The device recognized what we were doing, but misunderstood how we wanted to do it, and added an item called “these things” to a shopping list.

Guacamole, chips, tortillas.

Amazon: “Guacamole chips tortillas” added to your shopping list.

Apple: (No longer listening.)

Google: OK, I’ve added “guacamole” and “chips tortillas” to your shopping list.

If your roommate asked what you wanted from the market and you said “guacamole, chips, tortillas,” he would understand that as three separate things. But the Google and Amazon devices didn’t quite recognize the pause between items as a verbal comma. Although these devices are pretty good at recognizing what you say, they may not completely understand it.
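To see why an unpunctuated transcript is hard to split, consider the minimal sketch below. The item vocabulary and the greedy matcher are hypothetical; real assistants use statistical models rather than a hand-built list.

```python
# A toy illustration of the "verbal comma" problem: the recognizer delivers
# an unpunctuated transcript, and the assistant must guess where one item
# ends and the next begins. KNOWN_ITEMS is a hypothetical stand-in for a
# real product catalog.
KNOWN_ITEMS = {"guacamole", "chips", "tortillas", "tortilla chips", "olive oil"}

def segment_items(transcript: str) -> list[str]:
    """Greedily match the longest known item at each position."""
    words = transcript.lower().split()
    items, i = [], 0
    while i < len(words):
        # Try the longest phrase first, so "olive oil" beats "olive".
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in KNOWN_ITEMS:
                items.append(phrase)
                i += length
                break
        else:
            items.append(words[i])  # Unknown word: treat it as its own item.
            i += 1
    return items

print(segment_items("guacamole chips tortillas"))
# ['guacamole', 'chips', 'tortillas'] -- Google, lacking such a lookup for
# this phrase, heard "chips tortillas" as a single item.
```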

Also add olive oil.

Amazon: I’ve put “olive oil” on your shopping list.

Apple: OK, I added “olive oil” to the Reminders.

Google: Sorry, I’m not sure how to help.

Alexa grasped that we were continuing to add to the shopping list, even though we weren’t explicit about it. It held onto the context of the exchange and mapped our next request back to our list.
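Here is a minimal sketch of that kind of context carry-over. The intent names and the DialogueContext class are invented for illustration; no vendor’s actual API looks like this.

```python
# A minimal sketch of context carry-over: the dialogue manager remembers
# which task is active, so a follow-up like "also add olive oil" is routed
# back to the shopping list without the user naming it.
class DialogueContext:
    def __init__(self):
        self.active_task = None  # e.g. "shopping_list"
        self.shopping_list = []

    def handle(self, intent: str, item: str = "") -> str:
        if intent == "add_to_shopping_list":
            self.active_task = "shopping_list"  # remember what we're doing
            self.shopping_list.append(item)
            return f'I\'ve put "{item}" on your shopping list.'
        if intent == "add_more" and self.active_task == "shopping_list":
            # No explicit target: fall back to the remembered task.
            self.shopping_list.append(item)
            return f'I\'ve put "{item}" on your shopping list.'
        return "Sorry, I'm not sure how to help."  # context was lost

ctx = DialogueContext()
print(ctx.handle("add_to_shopping_list", "guacamole"))
print(ctx.handle("add_more", "olive oil"))  # succeeds only via stored context
```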

A digital assistant relies on many different technology systems, all working together on the device and inside a network of computer data centers that connect to the assistant over the internet.

When you say something, one system tries to recognize each word and convert it to text. Another system tries to understand the meaning of each word and how it relates to the others. A third system spits out new text that responds to what you’ve said. A fourth converts this response to digital speech. Other systems may also weigh in, but you get the point.
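As a rough sketch of that pipeline, the toy Python below wires four stand-in functions together. Every return value is faked; the point is only the shape of the system.

```python
# A rough sketch of the four-stage pipeline described above. Each function
# stands in for a large system that, in a real assistant, runs partly on
# the device and partly in a data center.
def speech_to_text(audio: bytes) -> str:
    # System 1: recognize each word and convert the audio to text.
    return "who won the giants game last night"

def understand(text: str) -> dict:
    # System 2: work out what the words mean and how they relate.
    return {"intent": "sports_score", "team": "Giants"} if "giants" in text else {}

def generate_response(meaning: dict) -> str:
    # System 3: produce new text that answers the request.
    if meaning.get("intent") == "sports_score":
        return f"Here is the latest {meaning['team']} score."
    return "Sorry, I'm not sure how to help."

def text_to_speech(text: str) -> bytes:
    # System 4: convert the response text to digital speech (faked here).
    return text.encode("utf-8")

def assistant(audio: bytes) -> bytes:
    # The systems run in sequence, so an error in any one stage
    # cascades into everything downstream.
    return text_to_speech(generate_response(understand(speech_to_text(audio))))

print(assistant(b"...raw microphone audio..."))
```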

Engineers used to build speech recognition systems by writing one small computing rule at a time — a painstaking process. But neural networks are now replacing those handwritten rules, accelerating the progress of speech recognition. Neural networks are complex mathematical systems that can learn particular tasks by pinpointing patterns in large amounts of data. They can learn to recognize spoken words, for instance, by analyzing recordings of old customer support calls.
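The rules-to-learning shift can be shown in miniature. The sketch below trains a small neural network (via the scikit-learn library, which is assumed to be installed) on a handful of made-up labeled text examples rather than audio, since a real speech model needs vast amounts of data; the principle is the same.

```python
# A handwritten rule versus a learned model. The network infers the pattern
# from labeled data rather than having a programmer spell it out.
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# The old way: one handwritten rule at a time, each of them brittle.
def rule_based_intent(text: str) -> str:
    if "shopping list" in text:
        return "shopping"
    return "unknown"  # "also add tortillas" falls through the cracks

# The new way: learn the mapping from labeled examples.
utterances = ["add milk to my shopping list", "put eggs on the list",
              "add olive oil too", "who won the giants game",
              "what was the score last night", "did the giants win"]
labels = ["shopping", "shopping", "shopping", "sports", "sports", "sports"]

vectorizer = CountVectorizer()
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(vectorizer.fit_transform(utterances), labels)

print(rule_based_intent("also add tortillas"))                      # unknown
print(model.predict(vectorizer.transform(["also add tortillas"])))  # likely ['shopping']
```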

In some cases, the assistants perform as well as human listeners. But listening is only part of what they do.

Resolving ambiguity: Even if you make a rather specific request, it can have multiple meanings. And that can be tough for an assistant to deal with.

Who won the Giants game last night?

Amazon: On December 31st, the Giants beat the Redskins 18 to 10. They’ll play on August 9th at 7 p.m. at home against the Browns.

Apple: Which Giants? New York Giants or San Francisco Giants?

Google: Yesterday, the Giants played the Padres. The final score was 3 to 2, Giants, in zero innings.

If you’re a sports fan, you know there are two professional teams that call themselves the Giants. If someone asks about a Giants game in the summer, you know they’re talking about the San Francisco Giants, who play baseball.

When we asked on Aug. 2, before preseason football started, Google Home came to the same conclusion. And the HomePod asked us to identify the team we were talking about. Google and Apple have likely written an explicit rule for this particular situation.

Alexa guessed that we meant the New York Giants football team, which hadn’t played a game in seven months, but this may have been because we were asking the question in New York. Amazon sometimes generates responses based on your location. In any case, Alexa figured out which team we meant after the next question.

Who do they play next?

Amazon: First pitch of the NL West matchup between the Giants and the Diamondbacks is scheduled for tomorrow night at 9:40 p.m. This season, the Giants have bested the Diamondbacks in seven of the 12 games they’ve played.

Apple: The Giants battle the Diamondbacks tomorrow at 9:40 p.m.

Google: Their next game is tomorrow at 9:40 p.m., when they will play the Diamondbacks.

When you chat with a friend, you use pronouns. To keep up, a digital assistant must recognize pronouns for what they are and map them back to words in a previous exchange. Algorithms are learning to identify pronouns by analyzing hundreds of sentences, detecting patterns in the syntax.
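A toy version of that pronoun mapping might keep a short memory of recently mentioned entities. The PronounResolver class below is purely illustrative; production systems learn this behavior statistically.

```python
# Map pronouns back to the most recently mentioned entity in the exchange.
class PronounResolver:
    PRONOUNS = {"they", "them", "their", "it", "he", "she"}

    def __init__(self):
        self.last_entity = None

    def resolve(self, utterance: str, entities: list[str]) -> str:
        # 'entities' would come from an upstream named-entity recognizer.
        if entities:
            self.last_entity = entities[-1]
        words = [self.last_entity if w in self.PRONOUNS and self.last_entity
                 else w for w in utterance.lower().split()]
        return " ".join(words)

r = PronounResolver()
print(r.resolve("who won the giants game last night", ["Giants"]))
print(r.resolve("who do they play next", []))
# -> "who do Giants play next": the pronoun now points back at the team.
```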

After recognizing what you say, these devices must then understand what you want and generate a response. For that part, companies like Amazon and Google have also begun to rely on neural networks to learn from what people have asked these digital assistants in the past.

Engineers might train one neural network to identify what you are trying to do (build a shopping list, get the score of a baseball game) and train another to parse a sentence and identify the nouns, pronouns, verbs and other parts of speech. It’s like elementary school grammar classes for computers.
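To make the grammar-class analogy concrete, the sketch below runs a part-of-speech tagger from the open-source spaCy library over one of our test questions. spaCy stands in here for the proprietary parsers these companies use, and it is assumed to be installed along with its small English model.

```python
# "Grammar class for computers" in practice: tag the parts of speech in a
# test question. Setup assumed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who do they play next?")

for token in doc:
    print(f"{token.text:>6}  {token.pos_}")
# Expected output, roughly:
#    Who  PRON
#     do  AUX
#   they  PRON
#   play  VERB
#   next  ADV
#      ?  PUNCT
```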

All of the machine learning algorithms these devices use are liable to make mistakes, though, and the system can be brittle because so many different algorithms are trying to work in concert. A digital assistant may screw up if you make a small change to a request that has worked countless times in the past. It may also screw up if you don’t change anything.

Machine learning algorithms can help these devices deal with turn-by-turn exchanges. But each verbal exchange is limited to a simple, three- or four-turn conversation. In a perfect world, engineers would build one giant neural network that learns to do everything. For now, these devices move toward better conversations in small steps.

“If you keep your language short and simple, you can maybe go as deep as three steps,” said Nancy Fulda, a researcher at Brigham Young University who specializes in conversational systems. “The technology operates by different rules than people do.”

Understanding subtext: In the previous example, the devices understood what we were asking because they mapped a pronoun back to the subject of our previous request. But what if we don’t use a pronoun?

Where is the nearest coffee shop?

Amazon: Here are a few popular ones: Ramini Espresso Bar & Cafe, Best Bagel and Coffee, Paris Baguette and Cafe Grumpy. Check out the Alexa app for contact info and hours.

Apple: The nearest one I found is Bluestone Lane, located quite close to you on Times Square, which averages four stars according to Yelp. It’s open today, from 6:30 a.m. to 8 p.m.

Google: The address for Culture Espresso is 307 West 38th Street, New York, New York, 10018.

The devices correctly identified nearby shops, but they were all different. That’s likely because each device is using a different “knowledge graph,” a vast database of facts and other information you may ask for. Google, for example, may use data gathered from Google Maps, whereas the other companies may pull from sources like Yelp.
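A knowledge graph can be imagined, very loosely, as a nested lookup table. The sketch below fakes one with a Python dictionary; the entries echo the answers above, and the structure is invented for illustration.

```python
# Why the three devices named three different coffee shops: each consults
# its own data source. This dictionary is a toy stand-in for a real graph.
KNOWLEDGE_GRAPH = {
    "coffee shop": {
        "google_maps": [{"name": "Culture Espresso",
                         "address": "307 West 38th Street, New York, NY 10018"}],
        "yelp": [{"name": "Bluestone Lane",
                  "location": "Times Square", "rating": 4.0}],
    }
}

def nearest(category: str, source: str) -> dict:
    # A real assistant would rank candidates by the user's location; this
    # sketch just takes the first entry from whichever source a vendor uses.
    return KNOWLEDGE_GRAPH[category][source][0]

print(nearest("coffee shop", "google_maps")["name"])  # Culture Espresso
print(nearest("coffee shop", "yelp")["name"])         # Bluestone Lane
```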

Give me directions.

Amazon: At the moment, traffic to Ramini Espresso Bar & Cafe at 265 W. 37th Street looks a little sluggish. The fastest route takes about six minutes via W. 42nd Street and 7th Avenue.

Apple: I can’t help with directions here. Sorry about that.

Google: The best way to get to Culture Espresso by car is via West 39th Street, and will take about nine minutes in heavy traffic.

Amazon trains neural networks to recognize that this kind of pronoun-free request refers back to the previous exchange. The algorithms analyze hundreds, perhaps thousands, of requests like this and learn to identify them on their own.

When generating responses, these assistants plug particular information into an existing template. Sometimes, engineers will build this template to sound more conversational. That seemed to be the case here, when Alexa said that traffic was “sluggish.” A minimal version of the idea appears in the sketch below.
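The two templates below paraphrase the direction answers above; random.choice and the field names are invented stand-ins for whatever logic a real assistant uses to pick a phrasing.

```python
# A hypothetical version of template filling: plug live values into a
# canned sentence. The traffic data here is invented.
import random

TEMPLATES = [
    "At the moment, traffic to {place} looks a little {speed}. "
    "The fastest route takes about {minutes} minutes via {route}.",
    "The best way to get to {place} is via {route}, and will take "
    "about {minutes} minutes in {speed} traffic.",
]

def directions_response(place: str, route: str, minutes: int, speed: str) -> str:
    # Choose one of several phrasings so the assistant sounds less canned.
    return random.choice(TEMPLATES).format(
        place=place, route=route, minutes=minutes, speed=speed)

print(directions_response("Ramini Espresso Bar & Cafe",
                          "W. 42nd Street and 7th Avenue", 6, "sluggish"))
```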

Tackling opinions: Don’t expect these devices to tell you what they think — or even what other people think. It’s easier to build technology that sticks to the facts. After asking these devices to identify photographer Robert Mapplethorpe, which they all did correctly, we asked for their opinion of his work. None of the machines seemed to grasp the concept of this question, and Google eventually lost the context altogether.

What do you think of his work?

Amazon: I don’t have an opinion on that.

Apple: I really couldn’t say.

Google: That seems like an interesting topic. What do you want to know about Robert Mapplethorpe?

What do other people think?

Amazon: Hmm, I don’t know that one.

Apple: (No response)

Google: According to Wikipedia, “What Do You Care What Other People Think? Further Adventures of a Curious Character” is the second of two books consisting of transcribed and edited oral reminiscences from American physicist Richard Feynman.

Engineers hope machine learning will continue to replace handwritten rules and expand what these devices can do. But conversation is such a complex task that even machine learning approaches will need years — or perhaps decades — to solve the problem completely.


Jens Mortensen / New York Times Amazon’s Echo is a smart speaker, but no one would call it a genius.

Jens Mortensen / New York Times Apple’s HomePod speaker (left), Amazon’s Echo speaker and Google’s Home smart speaker aren’t exactly sparkling conversationalists.
