Khaleej Times

How Big Tech is making machines think and talk like humans

Most of us still prefer to interact with real people rather than automated machines, but conversational artificial intelligence could change that

- Shalini Verma, CEO of PIVOT Technologies

The first time I was introduced to a computer, my teacher asked me to type a bunch of obscure commands. At the end of the two-minute exercise, the screen cheerfully flashed Hello Shalini. Humans started to interact with computers through a command-line interface that performed tasks at the behest of engineers who typed rows of commands at the prompt. This required users to learn a plethora of commands that computers would obey. In the mid-80s, the graphical user interface ushered in an era of personal computers, widening the pool of users. The graphics got more intuitive, and the screens became more tactile. Yet we could not entirely eliminate the learning curve for the less technically inclined.

In 2011, Apple launched Siri, which, despite being less than perfect, demonstrated the liberating possibilities of having a conversation with a machine. Conversation, whether chatting or talking, is easily the most intuitive and natural form of communication for humans. Increasingly, the industry momentum is in favour of conversation, or natural language, as the human-machine interface of choice. Conversation will be the primary way we search for information, consume digital services, buy things, and get any kind of assistance online. Fundamentally, conversations are dialogues wherein both parties should more or less understand each other. Dig a little deeper, and you will find a trove of technologies powering human-machine dialogues: natural language processing, speech recognition, and speech synthesis (text to speech), to name a few.

Researchers took the conventional route by trying to teach computers the rules of language. But how does one teach machines complex languages that are riddled with as many exceptions as rules? It was like boiling the ocean, so funding dried up for such seemingly impossible projects. After decades of trial and error, mathematics came to the rescue of our messy languages. Researchers used sophisticated statistical methods to train software to scan large amounts of text already broken down into grammatical components, also called parsed language. The computer recognises patterns in the text and uses them to understand new content. Siri, Alexa and Google Home started to get smarter with each release, slowly becoming a part of our lives. IBM, Microsoft and Google offer conversational AI services for other businesses to build their own apps and services.
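To make the statistical approach concrete, here is a minimal sketch in Python. It is nothing like the production systems behind Siri or Alexa, and the three-sentence "parsed" corpus is invented for illustration: the software counts which grammatical patterns occur in tagged text, then judges new sentences by how familiar their patterns are.

```python
from collections import Counter

# A tiny hand-parsed corpus: each word paired with its grammatical
# role. Real systems train on millions of such sentences.
tagged_corpus = [
    [("time", "NOUN"), ("passes", "VERB"), ("quickly", "ADV")],
    [("birds", "NOUN"), ("fly", "VERB"), ("south", "ADV")],
    [("life", "NOUN"), ("moves", "VERB"), ("fast", "ADV")],
]

# Learn which grammatical tag tends to follow which.
tag_bigrams = Counter()
for sentence in tagged_corpus:
    tags = [tag for _, tag in sentence]
    tag_bigrams.update(zip(tags, tags[1:]))

def pattern_score(tags):
    """Score a tag sequence by how often its transitions were seen."""
    return sum(tag_bigrams[pair] for pair in zip(tags, tags[1:]))

print(pattern_score(["NOUN", "VERB", "ADV"]))  # 6: a familiar pattern
print(pattern_score(["ADV", "NOUN", "VERB"]))  # 3: a less familiar one
```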

When you interact with a digital assistant or any conversational AI, it first tries to recognise what you are saying or typing, and then to understand what you are trying to say, based on the probability that it understood the words correctly. How should the computer understand “time flies when we are on leave”? Should it take the words literally, or should it make sense of the phrase? Should it treat time as a bird, or should it understand time in the context of leave, by calculating the odds in favour of ‘time passing quickly’? The computer uses probability rather than real-world colloquialism and common sense to understand what we are saying. It gets more complex when you throw tone of voice into the mix.
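One toy way to picture that calculation: treat each reading as a hypothesis and let the machine pick whichever one the statistics favour. The probabilities below are invented stand-ins for what a trained model would estimate from corpus data.

```python
# Two competing readings of "time flies when we are on leave".
# The numbers are illustrative stand-ins for corpus-derived estimates.
readings = {
    "literal: 'time' is a creature that flies": 0.02,
    "idiomatic: time passes quickly": 0.98,
}

# No common sense involved: the machine simply picks the likeliest reading.
best = max(readings, key=readings.get)
print(f"chosen reading: {best} (p = {readings[best]:.2f})")
```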

Teaching software to understand and respond to our typed chitchat is tough enough. Teaching it to speak to us is an entirely different ball game. It has been a journey of sorts, from ‘concatenative’ models that strung together recorded voice nuggets to ‘parametric’ models that let the software produce its own raw audio. The statistical models were manually fine-tuned to calculate the probability of a combination of words occurring in a phrase.
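The ‘concatenative’ idea is easy to sketch: look up pre-recorded snippets and splice them end to end. Everything below is hypothetical (the voice bank, the sample rate, whole words as units); real systems stitched together far smaller sound units and smoothed the joins.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed for illustration

# Hypothetical bank of pre-recorded word waveforms (silence here, studio
# recordings in reality; real systems used sub-word units, not whole words).
voice_bank = {
    word: np.zeros(int(SAMPLE_RATE * 0.4), dtype=np.float32)
    for word in ["hello", "shalini"]
}

def concatenative_tts(text):
    """String recorded voice nuggets together with short gaps between."""
    gap = np.zeros(int(SAMPLE_RATE * 0.05), dtype=np.float32)
    pieces = []
    for word in text.lower().split():
        pieces.append(voice_bank[word])  # fails on any unrecorded word:
        pieces.append(gap)               # the approach's classic weakness
    return np.concatenate(pieces)

audio = concatenative_tts("Hello Shalini")
print(f"{len(audio) / SAMPLE_RATE:.2f} seconds of audio")  # 0.90 seconds
```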

The error rates were high, and the models generated machine-like responses that needed a human makeover. Researchers did not give up.

Luckily, Google found a clever way to synthesise voice using deep neural networks that learn from voice recordings of people talking, and from text that matches what they are saying. Deep neural networks, modelled loosely on the human brain, allow for fast training using databases of human speech, converting waveforms or spectrograms into characters. The fidelity of the sound generated from scratch is striking. China’s biggest search engine, Baidu, has trained text-to-speech synthesis systems to clone a voice after listening to a short audio file.
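The waveform-to-spectrogram step mentioned above is the standard front end for such networks. A minimal sketch, with frame sizes picked arbitrarily for illustration: slice the waveform into overlapping frames and take each frame’s magnitude spectrum, producing the time-frequency picture a network learns to map to and from characters.

```python
import numpy as np

def spectrogram(waveform, frame_len=400, hop=160):
    """Overlapping windowed frames -> one magnitude spectrum per frame."""
    frames = np.stack([
        waveform[start:start + frame_len] * np.hanning(frame_len)
        for start in range(0, len(waveform) - frame_len, hop)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone at 16 kHz stands in for recorded speech.
t = np.linspace(0, 1, 16_000, endpoint=False)
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (time frames, frequency bins)
```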

Researchers are taking volumes of parsed and annotated text and letting the software learn from it. IBM’s AI tool Project Debater can parse text to construct arguments for and against a topic in a debate. Speech technologies are getting better at the pitch, stress and speed that make for a more natural conversation.

Tech giants are using creative ways to train digital assistants to have more human-like conversations. They are hiring writers, poets and playwrights to improve the experience and lend a certain personality to digital assistants. We know that 10 per cent of all conversations are casual chitchat, which requires natural, often witty answers to make the conversation engaging.

The holy grail of conversational AI is passing the Turing Test: the point at which humans are no longer conscious that they are conversing with a machine. Conversational AI will be universally present on websites and apps, redefining our digital experiences. Our days will be filled with conversations with our apps, cars, and appliances. Some days, the conversations will feel so human that they will pass the Turing Test.
