SPEAKING TO COMPUTERS
BILL BENNETT TRACES THE EVOLUTION OF COMPUTER-BASED SPEECH RECOGNITION AND DISCOVERS THAT, MORE THAN EVER, THE TECHNOLOGY'S MAKING LIFE EASIER.
People have always talked to computers. But it wasn’t always rational. The practice stopped being a sign of eccentricity four decades ago when computers started shipping with microphones. From then on they started to make more sense of what they were being told. At first computers could only recognise a handful of words. The first specialist speech recognition computer built in the 1970s managed just ten.
Thankfully things have improved since those days. For the past 20 years or so we’ve been able to dictate documents and even control basic computer functions.
While the technology wasn’t reliable, it was a boon to anyone who couldn’t, or preferred not to, use a keyboard.
As computers got more powerful, their ability to recognise speech improved.
More recently computers have learnt to do more with speech input and often they talk back with realistic voices.
Most of the hottest action is in the consumer space. Technology giants compete with products like smart speakers and wireless earbuds that respond to speech. There’s less fuss about it, but speech recognition has also found its way into the business world.
Early PC-based software would only pick up about 90 percent of speech. That sounds OK in theory, but in practice it meant the technology wasn’t up to serious work. At that level of performance it often takes more time and effort to correct a dictated document than to tap out the words on the keyboard.
Over time computers have become more powerful again. They now have more storage and connect to the Internet. This means they can throw more resources at interpreting speech. The Internet means some of the hard work can take place in huge computers elsewhere.
BUILT-IN SPEECH SYSTEMS
Today tech firms bake basic speech recognition into computers, phones and tablets. It is often part of the operating system. These products pick up between 95 and 97 percent of spoken words. That makes them far more practical than their predecessors.
Most speech systems can improve performance with training, although this is personal. You can train a system to learn your voice, but that won’t help it understand your colleagues or friends.
Microsoft Windows 10 has speech recognition software in Cortana, the virtual assistant software. You can ask it questions or get it to search for information online using the Bing search engine. Apple has something similar with Siri. Google Assistant is an alternative.
Windows 10 includes Windows Speech Recognition. It can handle dictation, but it is old software that has been around for a decade now. It’s the least reliable speech software you might find, but if you have Windows on your computer it is free.
Apple’s macOS has a dictation feature you can switch on from the settings screen. The basic software only works with an Internet connection. An enhanced version of dictation works when you are offline.
You may need to set up your computer’s microphone to get any of these programs to work. But once that’s done the software is always there and waiting for you to talk.
Phone users can use Siri on an iPhone and Google Assistant with Android. Both also have dictation tools, which only work with an Internet connection.
You may need to investigate your phone’s online support to find out how to use these. Neither of them offers fabulous performance, but they are both more than good enough to dictate short lists and reminder notes.
IT’S PERSONAL
As always with speech, your experience here may differ from mine. Speech recognition is far more personal than other digital technologies. What works great for one person may not work at all for someone else, and there are anomalies. I have a hybrid New Zealand-UK accent, but on the iPhone, the Australian settings work better than either of those two.
Built-in speech recognition tools are OK; they work up to a point. But if you are serious about voice recognition, it pays to invest more in specialist software.
The best-known voice recognition product range is Dragon from Nuance. Dragon technology often turns up embedded in products, but you can also buy it as a separate package. It will definitely improve your computer’s speech recognition.
There are many Dragon options, including specialist versions for legal and medical users. I recently tested Dragon Professional version 6. It costs US$300, which may seem expensive. Yet if you need voice recognition, you can expect to be more productive. Most buyers get a quick return on the investment.
Nuance says Dragon Professional has 99 percent voice recognition accuracy. My testing suggests that’s about right. It performs better than Apple’s built-in voice recognition and much better than Microsoft’s.
In practice, the difference between, say, 95 and 99 percent accuracy is huge. At 95 percent you need to correct about one in every 20 words; at 99 percent it is around one in 100. Apple is about 95 percent accurate, while Windows is around 93 percent.
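To put those percentages in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses only the accuracy figures quoted in this article; the 1,000-word document length and the function name are assumptions chosen for illustration, and real-world error rates will vary with the speaker and the recording conditions.

```python
def corrections_needed(word_count: int, accuracy_percent: int) -> int:
    """Estimate how many words need fixing at a given recognition accuracy."""
    # Words the software gets wrong = total words x (100 - accuracy) / 100.
    return round(word_count * (100 - accuracy_percent) / 100)


# Accuracy figures as quoted in the article; 1,000 words is an assumed length.
quoted_accuracy = [
    ("Early PC software", 90),
    ("Windows", 93),
    ("Apple", 95),
    ("Dragon Professional", 99),
]

for name, accuracy in quoted_accuracy:
    fixes = corrections_needed(1000, accuracy)
    print(f"{name}: about {fixes} corrections per 1,000 words")
```

On these numbers, moving from 95 to 99 percent cuts the correction work on a 1,000-word document from roughly 50 words to roughly 10.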
As with the other systems, Dragon Professional gets better with training. In my tests it managed to get almost everything I said when using it to dictate. There’s a secondary function where it can transcribe recorded speech. Because it usually means dealing with more than one voice it can be tricky. But the results are impressive. I turned an hour-long recorded interview into a text document. It was readable, although with some hilarious errors.
Dragon Professional is particularly good at controlling a computer. If you have a disability, or need to work hands-free, perhaps in a dirty workplace, this is invaluable.
SMART SPEAKERS
The other place where speech recognition shines is when it’s applied to smart (Internet-connected) speakers. They can play music, but can also understand your commands and talk back to you.
Smart speakers can tell you the weather forecast, read news bulletins and even buy things for you online. Some people find them creepy because they listen to everything you say.
Smart speakers can be purchased from Amazon, Google and Apple, among others.
For now they have limited functionality and they aren’t suitable for business users, so you might struggle to get the IRD to accept them as tax deductible. But they are a sign that there is going to be a lot more speech recognition in our lives in the future.