Voice assistants smart enough to keep up

Google, Microsoft both showing off progress

- Bob O’Donnell

FOSTER CITY, Calif. – The original promise was bold, but the reality has been far from it. Perhaps until now.

Personal digital assistants – the voice-based interfaces that started with Apple’s Siri – were supposed to enable entirely new and significantly more intuitive ways of interacting with our devices, but they ended up being so frustrating that many people gave up on them.

Thankfully, we’re starting to see some significant advances in phone-based digital assistants, and recent developments from both Microsoft’s Build and Google’s I/O (and likely from Apple’s upcoming WWDC) developer conferences all highlight the important progress that has been made.

These conferences, primarily intended for software developers to learn about new advances in the host company’s technology platforms, also serve as a great guidepost for consumers to understand how technology is evolving. What became clear at both Build and I/O is that digital assistants finally are getting smart enough to be able to interpret what we really mean when we speak to them.

The truth is, the human brain is a remarkably flexible computing engine that immediately recognizes the context of a conversation and can interpret, for example, how the third comment you make in a conversation relates to the first one, or foreshadows the fourth one.

Computing devices can’t easily do that, even if they can accurately translate each individual word you say. The problem is that as soon as we started talking to machines, we quickly assumed they could easily understand the context of something like “what are my options for booking a flight to Chicago next Friday?” Unfortunately, they couldn’t.

Finally, however, we’re starting to see assistant-based technology that can properly decipher a phrase like this and then actually take the action of looking up flights on your preferred airlines, reading them aloud, and even booking a reservation in your name once you’ve selected one.

Even better, it can do it in reaction to receiving an email or text requesting a meeting that merely implies the need to, say, fly to Chicago. Underneath the covers of these extended, multi-turn and multi-context conversations are a great many artificial intelligence-based software algorithms that have “learned” what the words you say mean, what your own preferences are, and then what discrete actions need to be taken toward an expected outcome.
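For readers curious what a “multi-turn” conversation looks like under the hood, here is a deliberately simplified Python sketch of the common slot-filling idea: the assistant keeps a running record of what you’ve said so a follow-up remark still makes sense. The intent and slot names here are hypothetical illustrations, not Microsoft’s or Google’s actual code.

```python
# Illustrative sketch only: a tiny dialog-state tracker that carries context
# (intent and "slots" such as destination and date) across turns.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DialogState:
    intent: str | None = None                              # e.g. "book_flight"
    slots: dict[str, str] = field(default_factory=dict)    # e.g. {"destination": "Chicago"}


def update_state(state: DialogState, parsed_utterance: dict) -> DialogState:
    """Merge a newly parsed utterance into the running conversation state."""
    if parsed_utterance.get("intent"):
        state.intent = parsed_utterance["intent"]
    # Only overwrite the details the user mentioned this turn;
    # everything else carries over from earlier turns.
    state.slots.update(parsed_utterance.get("slots", {}))
    return state


# Turn 1: "What are my options for booking a flight to Chicago next Friday?"
state = DialogState()
state = update_state(state, {"intent": "book_flight",
                             "slots": {"destination": "Chicago", "date": "next Friday"}})

# Turn 2: "Actually, make it Saturday." No intent or destination is repeated,
# yet the assistant still knows what the request is about.
state = update_state(state, {"slots": {"date": "next Saturday"}})

print(state.intent, state.slots)
# book_flight {'destination': 'Chicago', 'date': 'next Saturday'}
```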

Microsoft calls this work conversational AI and aims to bring it not only to its Cortana personal assistant for PCs but also to other companies that want to integrate conversational AI into their own devices. At Build, for example, Microsoft discussed a partnership with BMW, whose latest X7 SUV will incorporate this technology, though it will all be branded and displayed as BMW’s own.

In Google’s case, the company also took a huge step forward in making Google Assistant work while disconnected from any network connection. While that might sound like a minor technical detail, it’s profoundly important because it means the AI engines required to run Google Assistant can run directly on a smartphone.

As Google points out, this enables faster reaction times (up to 10 times faster, the company says) and, most importantly, means your assistant can work without having to send your personal data to the cloud. Given all the very legitimate privacy concerns that have been raised about smart speakers and digital assistants, this is an extremely important step.

To be clear, Google can and does anonymize some of the data it collects and send it to the cloud, but it uses a technique called federated learning to keep that data from being associated specifically with you. Essentially, federated learning allows Google to make its AI algorithms smarter by incorporating data from a wide variety of people, and then it sends the updated algorithms back to your phone, where, over time, they provide even more accurate answers to your own questions. It’s a clever approach that I expect to see other vendors adopt in one form or another.
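For technically minded readers, here is a deliberately simplified Python sketch of the federated-averaging idea behind that approach: each phone computes a small numeric update to a shared model from its own data, and only that update, never the raw data, is sent back to be averaged into the global model. This is an assumption-laden illustration, not Google’s actual implementation, which layers on additional privacy protections.

```python
# Illustrative sketch of federated averaging with a toy linear model (y = w*x + b).
# Raw data stays on each "phone"; only the updated weights are shared.

def local_update(global_weights: list[float],
                 local_data: list[tuple[float, float]],
                 lr: float = 0.01) -> list[float]:
    """One step of gradient descent computed on-device."""
    w, b = global_weights
    grad_w = grad_b = 0.0
    for x, y in local_data:
        err = (w * x + b) - y
        grad_w += err * x
        grad_b += err
    n = len(local_data)
    return [w - lr * grad_w / n, b - lr * grad_b / n]


def federated_average(updates: list[list[float]]) -> list[float]:
    """Server averages the per-device weight updates into a new global model."""
    return [sum(vals) / len(vals) for vals in zip(*updates)]


global_model = [0.0, 0.0]
phone_a_data = [(1.0, 2.1), (2.0, 4.0)]   # never leaves phone A
phone_b_data = [(3.0, 6.2), (4.0, 7.9)]   # never leaves phone B

updates = [local_update(global_model, phone_a_data),
           local_update(global_model, phone_b_data)]
global_model = federated_average(updates)  # only weights were transmitted
print(global_model)
```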

Another important announcement Google made about its forthcoming Google Assistant 2.0 (expected to be available with the next release of Android, codenamed Q, this fall) is the integration of the Google Lens camera-based functions into the Assistant.

Google Lens can do such things as translate text into your native language when you point your phone’s camera at it, read text aloud for people who have difficulty seeing or reading, recognize locations or objects to help you make sense of your physical surroundings, and more.

In essence, the Google Lens functions give “eyes” to your digital assistant, which can then leverage that visual data to provide more accurate responses to your requests.

We’re clearly not completely past all the frustrations that plagued the earliest versions of voice-based digital assistants, but we are finally starting to see some of the capabilities that many people hoped would be in this technology.

As they start to become more widely available, these voice-based assistants really will be able to transform our interactions with not just our digital devices, but the world around us.

USA TODAY columnist Bob O’Donnell is the president and chief analyst of TECHnalysis Research, a market research and consulting firm.


Google’s Alexander Hunter demonstrates the Nest Hub Max at the Google I/O conference in Mountain View, Calif., on May 7. JEFF CHIU/AP
