NZ Business + Management

SPEAKING TO COMPUTERS

BILL BENNETT TRACES THE EVOLUTION OF COMPUTER-BASED SPEECH RECOGNITION AND DISCOVERS THAT, MORE THAN EVER, THE TECHNOLOGY'S MAKING LIFE EASIER.

- BILL BENNETT IS AN AUCKLAND-BASED BUSINESS IT WRITER AND COMMENTATOR. EMAIL BILL@BILLBENNETT.CO.NZ

People have always talked to computers. But it wasn’t always rational. The practice stopped being a sign of eccentricity four decades ago when computers started shipping with microphones. From then on they started to make more sense of what they were being told. At first computers could only recognise a handful of words. The first specialist speech recognition computer, built in the 1970s, managed just ten.

Thankfully things have improved since those days. For the past 20 years or so we’ve been able to dictate documents and even control basic computer functions.

While the technology wasn’t reliable, it was a boon to anyone who couldn’t, or preferred not to, use a keyboard.

As computers have become more powerful, their ability to recognise speech has improved.

More recently computers have learnt to do more with speech input and often they talk back with realistic voices.

Most of the hottest action is in the consumer space. Technology giants compete with products like smart speakers and wireless earbuds that respond to speech. There’s less fuss about it, but speech recognition has also found its way into the business world.

Early PC-based software would only pick up about 90 percent of speech. That sounds OK in theory, but in practice it meant the technology wasn’t up to serious work. At that level of performance it often takes more time and effort to correct a dictated document than to tap out the words on the keyboard.

Over time computers have become more powerful again. They now have more storage and connect to the Internet. This means they can throw more resources at interpreting speech. The Internet means some of the hard work can take place in huge computers elsewhere.

BUILT-IN SPEECH SYSTEMS

Today tech firms bake basic speech recognition into computers, phones and tablets. It is often part of the operating system. These products pick up between 95 and 97 percent of spoken words. That makes them far more practical than their predecessors.

Most speech systems can improve performance with training, although this is personal. You can train a system to learn your voice, but that won’t help it understand your colleagues or friends.

Microsoft Windows 10 has speech recognition software in Cortana, the virtual assistant software. You can ask it questions or get it to search for information online using the Bing search engine. Apple has something similar with Siri. Google Assistant is an alternative.

Windows 10 includes Windows Speech Recognition. It can handle dictation, but it is old software that has been around for a decade now. It’s the least reliable speech software you might find, but if you have Windows on your computer it is free.

Apple’s macOS has a dictation feature you can switch on from the settings screen. The basic software only works with an Internet connection. An enhanced version of dictation works when you are offline.

You may need to set up your computer’s microphone to get any of these programs to work. But once that’s done the software is always there and waiting for you to talk.

Phone users can use Siri on an iPhone and Google Assistant with Android. Both also have dictation tools, which only work with an Internet connection.

You may need to investigate your phone’s online support to find out how to use these. Neither of them offers fabulous performance, but they are both more than good enough to dictate short lists and reminder notes.

IT’S PERSONAL

As always with speech, your experience here may differ from mine. Speech recognition is far more personal than other digital technologies. What works great for one person may not work at all for someone else, and there are anomalies. I have a hybrid New Zealand-UK accent, but on the iPhone, the Australian settings work better than either of those two.

Built-in speech recognition tools are OK; they work up to a point. But if you are serious about voice recognition, it pays to invest more in specialist software.

The best known voice recognition product range is Dragon from Nuance. Dragon technology often turns up embedded in products, but you can also buy it as a separate package. It will definitely improve your computer’s speech recognition.

There are many Dragon options, including specialist versions for legal and medical users. I recently tested Dragon Professional version 6. It costs US$300, which may seem expensive. Yet if you need voice recognition, you can expect to be more productive. Most buyers get a quick return on the investment.

Nuance says Dragon Professional has 99 percent voice recognition accuracy. My testing suggests that’s about right. It performs better than Apple’s built-in voice recognition and much better than Microsoft’s.

In practice, the difference between, say, 95 and 99 percent accuracy is huge. In one case you need to correct about one in every 20 words; in the other it is around one in 100. Apple is about 95, while Windows is around 93 percent accurate.
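To put rough numbers on that gap, here is a minimal Python sketch of the arithmetic. The accuracy figures are the approximate ones quoted above; the 1,000-word document length and the function name are assumptions chosen purely for illustration, and real error rates will vary with the speaker and the material.

```python
def expected_corrections(word_count: int, accuracy_percent: float) -> float:
    """Average number of misrecognised words in a dictated document.

    Assumes every word is equally likely to be wrong, which is a
    simplification of how real speech recognition errors behave.
    """
    return word_count * (100 - accuracy_percent) / 100


# Hypothetical 1,000-word dictated document (an assumption for illustration).
WORDS = 1000

# Approximate accuracy figures quoted in the article.
systems = [
    ("Windows (about 93 percent)", 93),
    ("Apple (about 95 percent)", 95),
    ("Dragon Professional (99 percent, per Nuance)", 99),
]

for name, accuracy in systems:
    fixes = expected_corrections(WORDS, accuracy)
    print(f"{name}: roughly {fixes:.0f} corrections per {WORDS} words")
```

On those assumptions a four-point improvement in accuracy cuts the correction workload to a fifth, which is why the step from 95 to 99 percent matters more than it sounds.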

As with the other systems, Dragon Professional gets better with training. In my tests it managed to get almost everything I said when using it to dictate. There’s a secondary function where it can transcribe recorded speech. That can be tricky because it usually means dealing with more than one voice. But the results are impressive. I turned an hour-long recorded interview into a text document. It was readable, although with some hilarious errors.

Dragon Professional is particularly good at controlling a computer. If you have a disability, or need to work hands-free, perhaps in a dirty workplace, this is invaluable.

SMART SPEAKERS

The other place where speech recognition shines is when it’s applied to smart (Internet-connected) speakers. They can play music, but can also understand your commands and talk back to you.

Smart speakers can tell you the weather forecast, read news bulletins and even buy things for you online. Some people find them creepy because they listen to everything you say.

Smart speakers can be purchased from Amazon, Google and Apple, among others.

For now they have limited functionality and they aren’t suitable for business users – so you might struggle to get the IRD to accept them as tax deductible. But they are a sign that there is going to be a lot more speech recognition in our lives in the future.
