Hey, AI, let’s talk: on conversational skills in chatbots
A pair of glasses from Meta shoots a picture when you say, “Hey, Meta, take a photo.” A miniature computer that clips to your shirt, the Ai Pin, translates foreign languages into your native tongue. An artificially intelligent screen features a virtual assistant that you talk to through a microphone.
Last year, OpenAI updated its ChatGPT chatbot to respond with spoken words, and recently, Google introduced Gemini, a replacement for the voice assistant on its Android phones. Tech companies are betting on a renaissance for voice assistants, many years after most people decided that talking to computers was uncool. Will it work this time? Maybe, but it could take a while.

Large swaths of people have still never used voice assistants like Amazon’s Alexa, Apple’s Siri and Google’s Assistant, and the overwhelming majority of those who do use them say they never want to be seen talking to them in public, according to studies done in the last decade. I, too, seldom use voice assistants, and in my recent experiment with Meta’s glasses, which include a camera and speakers to provide information about your surroundings, I concluded that talking to a computer in front of parents and their children at a zoo was still staggeringly awkward.
It made me wonder if this would ever feel normal. Not long ago, talking on the phone with Bluetooth headsets made people look batty, but now everyone does it. Will we ever see lots of people walking around and talking to their computers as in sci-fi movies?
I posed this question to design experts and researchers, and the consensus was clear: Because new AI systems improve voice assistants’ ability to understand what we are saying and actually help us, we’re likely to speak to devices more often in the near future — but we’re still many years away from doing this in public.

New voice assistants are powered by generative artificial intelligence, which uses statistics and complex algorithms to guess which words belong together, similar to the autocomplete feature on your phone. That makes them more capable of using context to understand requests and follow-up questions than virtual assistants like Siri and Alexa, which could respond only to a finite list of questions. For example, if you say to ChatGPT, “What are some flights from San Francisco to New York next week?” — and follow up with “What’s the weather there?” and “What should I pack?” — the chatbot can answer those questions because it makes connections between words to understand the context of the conversation. (The New York Times sued OpenAI and its partner, Microsoft, last year for using copyrighted news articles without permission to train chatbots.)
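To make the autocomplete analogy concrete, here is a minimal sketch: a toy bigram model in Python that counts which word tends to follow which in a tiny, made-up corpus and then guesses the most likely next word. The corpus and function names are purely illustrative — real chatbots use vastly larger neural models — but the underlying idea of predicting words from statistics is similar.

```python
from collections import Counter, defaultdict

# A tiny, made-up corpus of phrases (illustrative only).
corpus = (
    "what is the weather in new york "
    "what is the weather in san francisco "
    "what should i pack for new york"
).split()

# Count how often each word follows each other word (bigram counts).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # most likely word after "the" in this corpus
```

With this corpus, `predict_next("the")` returns `"weather"` because that is the only word the model has ever seen follow “the” — a statistical guess about which words belong together, scaled down to a few lines.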
An older voice assistant like Siri, which reacts to a database of commands and questions that it was programmed to understand, would fail unless you used specific wording, such as “What’s the weather in New York?” or “What should I pack for a trip to New York?”
The former conversation sounds more fluid, like the way people talk to each other. A major reason people gave up on voice assistants like Siri and Alexa was that the computers couldn’t understand so much of what they were asked — and it was difficult to learn which questions worked.
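The brittleness of the older, command-list approach can be sketched in a few lines of Python. The commands and canned replies below are hypothetical — real assistants matched against much larger intent databases — but the failure mode is the same: anything outside the programmed list, including a follow-up like “What’s the weather there?”, gets the fallback response.

```python
# Hypothetical command list for an old-style assistant (illustrative only).
COMMANDS = {
    "what's the weather in new york?": "It is 65 degrees and sunny in New York.",
    "what should i pack for a trip to new york?": "Pack a light jacket.",
}

def old_assistant(utterance):
    # Exact-match lookup: no memory of earlier turns, no paraphrase handling.
    return COMMANDS.get(utterance.lower().strip(), "Sorry, I didn't get that.")

print(old_assistant("What's the weather in New York?"))  # matches
print(old_assistant("What's the weather there?"))        # falls through
```

The exact phrasing succeeds, but the natural follow-up fails, because nothing links “there” back to New York — which is precisely the context-tracking gap that generative models are meant to close.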