Sound+Image

‘AI’ WARS

Everything’s gone ‘AI’. Derek Powell goes looking for real rather than faux AI, and considers how it could affect the future of audio technology.

Everything seems to be going ‘AI’, but is it really intelligent?

Artificial Intelligence (AI) is everywhere at the moment. And frankly, the claims for products that are said to incorporate “AI technology” are often pretty dubious. Everyone, it seems, has a product that is now artificially intelligent, ranging from AI lawnmowers (from Roomba, Honda and even LG) to AI lawyers. Yes, lawyers (and in Darwin, of all places... Ailira, which is short for Artificially Intelligent Legal Information Resource Assistant, can write a properly certified will for you based on your responses to some standard questions).

There are even AI fridges, like the Samsung Smart InstaView refrigerator, which allows you to see inside your fridge from any smartphone, if you forget whether you need milk while at the supermarket.

The AI industry has even appropriated the ‘dot ai’ domain. While .ai should (like .au for Australia) properly refer to a website located in the Caribbean island territory of Anguilla, shameless proponents of supposedly smart products are now beating a path to the domain registries of the most northerly of the Leeward Islands in the Lesser Antilles to claim their “dot ai” websites, like minimally intelligent moths to the proverbial flame.

Intelligence vs automation

But are products that claim AI actually intelligent, or just automated? It can be hard to tell the difference, as there is no strictly applied standard for AI. For now, let’s take the commonly applied criterion that a system counts as AI if it can perform two specific functions: problem solving, and learning from experience.

So can AI be usefully applied to audio systems? Can we use artificial intelligence (AI) to create IA, or Intelligent Audio?

Many of the common ‘AI’ systems, like the digital assistants Alexa, Cortana and Siri, appear to reside in audio products, of course. However, these rely on remote processing power for their answers. The audio systems that deliver those answers are really only conventional microphones and speakers, acting as the interface to the real AI, which is in the cloud.

Intelligent Audio

But there is an expanding list of tasks in sound reproduction where AI takes the lead in improving the audio experience. Some of these show real promise, and it is worth spending time studying these early steps towards Intelligent Audio.

I’d divide the current research into Intelligent Audio into two broad categories: first, using AI to analyse sound (and make useful suggestions as a result), and second, using it to enhance sound in ways that conventional audio techniques can’t manage.

The first category is currently the most active, spurred on by some big players who stand to make real money from recognising and recommending music to their users. Music recommendation software forms a vital part of the appeal of services like Spotify. Finding new music based on what you enjoy is an essential part of their business model. Having the largest catalogue is important, but what really brings subscriptions is the ability to accurately predict that ‘if you enjoy this’ then ‘try that’.

A commonly used method is called Collaborative Filtering, which assumes that people who can be shown to be similar in terms of behaviour or demographics to the target user, and who rate other items in a similar way, will have similar preferences in music, and thus that their music choices will be good recommendations. The method relies on data mining and harvesting information from sources like social media to come up with a group of like-minded individuals, then applies their preferences to the target user.
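To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It is an illustration only, not how Spotify or any other service actually works: the listener names, song names and ratings are invented, and it measures similarity purely from ratings (using cosine similarity, one common choice), ignoring the demographic and social media data mentioned above.

```python
import math

# Hypothetical ratings (1 to 5). All names and scores are invented for illustration.
ratings = {
    "ana":   {"song_a": 5, "song_b": 4, "song_c": 1},
    "ben":   {"song_a": 5, "song_b": 5, "song_d": 4},
    "chloe": {"song_a": 1, "song_c": 5, "song_e": 4},
}


def cosine_similarity(u, v):
    """Similarity of two listeners, based on the songs both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[song] * v[song] for song in shared)
    norm_u = math.sqrt(sum(score * score for score in u.values()))
    norm_v = math.sqrt(sum(score * score for score in v.values()))
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=3):
    """Rank songs the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        similarity = cosine_similarity(ratings[target], other_ratings)
        for song, score in other_ratings.items():
            if song not in ratings[target]:
                # Ratings from like-minded listeners count for more.
                scores[song] = scores.get(song, 0.0) + similarity * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

In this toy data Ana's tastes line up with Ben's far more closely than with Chloe's, so Ben's pick (song_d) is ranked ahead of Chloe's (song_e). Note that a song nobody has rated yet can never appear in the results.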

This method works well, but has a bias toward recommending songs that are already popular; it doesn’t analyse and recommend new music. Analysing and classifying new music automatically would be very useful, but this requires a higher level of AI.

Music classification

Plenty of researchers are working on this problem. The appropriately named Yading Song, of Queen Mary University of London, has written a useful paper [1] comparing the many approaches. He notes that the content-based approach to music classification attempts to extract and compare the acoustic features of music, such as timbre and rhythm, to recommend songs similar to those the user has listened to in the past. This is more difficult than it sounds (if you’ll pardon the pun) and requires a lot of small steps, including tonal analysis and beat tracking, that are still only imperfectly understood. One small area of study is ‘onset detection’, which pinpoints the start of an audio event (like an individual note). This is the necessary first step in many of the analysis techniques mentioned above. While trivial for human musicians, it takes a computer algorithm several steps: first computing a “spectral novelty function” (a measure of how much the frequency content changes from one moment to the next), then finding the peaks in that function, and finally backtracking from each peak to a preceding local minimum.

Going a little further, the analysis of music is now being applied to classifying sounds in general. Can a computer detect that a certain sound is actually a dog barking rather than explosions from fireworks?

New soundtracks

Going a lot further, researchers at the University of North Carolina have demonstrated how they can train a machine learning algorithm to generate realistic sound effects to match video clips [2]. Taking a video clip of a dog, or a chainsaw, as input, their algorithm has come up with a matching soundtrack so realistic and well synchronised with the vision that it is difficult to tell the fake sound from the original sounds of the video recording.

There are all sorts of applications for creating sounds with AI, and some of them are downright scary. One application I came across uses a deep learning voice system to copy and reproduce the voices of literally thousands of people using around half an hour of sample recordings of their speech. On the one hand, voice cloning technology could be used to allow people who have lost the use of their voice through degenerative disease or injury (like the late Professor Hawking) to speak naturally, rather than with the familiar robot-like intonations. On the other hand, it could equally be used by the unscrupulous to spoof someone’s identity on the phone.

Enhancing sound

But let’s go back a step. If AI can classify sounds, can it go further and actually separate out components in a complicated audio signal? This belongs to the second of the two categories we set out to examine: enhancing sound. Writing in the online blog “Towards Data Science”, software developer Daniel Rothman has rounded up a collection of advancements in audio processing. He describes how AI techniques such as “deep learning” are being used in software by iZotope [3] to “separate spoken dialogue from background noise such as crowds, traffic, footsteps, weather, or other noise with highly variable characteristics.”

Humans do this all the time: we can easily follow conversations in noisy environments. But as anyone who has recorded an interview in such conditions will tell you, separating speech from such variable background noise simply can’t be done by analogue filters or any conventional audio technique. Indeed this exact task is the Holy Grail for hearing aid manufacturers, so as you can imagine lots of research effort is currently going into it. There is great promise that deep neural networks (the kind of AI technology that is used by Google in its ‘image search’ algorithm or by Shazam to identify songs from a small sample) may one day be used to allow hearing aids to first recognise then zero in on particular components of a complicated audio signal. Such a system could amplify just the speech while ignoring unwanted sounds like passing cars.

Beyond audio

While we’ve mainly looked at AI in audio this time, nearly all these methods are also being applied in the video domain.

AI-enhanced ‘smart speakers’ like Amazon Echo, Apple HomePod and Google Home are rapidly being joined on the market by upmarket TVs with AI smart voice interfaces. As Stephen Dawson reports in this issue (p12-3), LG’s latest range comes with a new operating system, “webOS with AI”, pushing back against audio-only smart devices.

Like Spotify, Netflix uses AI analysis techniques in its recommendations to viewers. In a 2015 article [4], Netflix revealed that the recommender section of its site is responsible for 80% of users’ viewing hours, while searching for content that viewers already know about accounts for only 20%. Presenting great recommendations is therefore vital, especially since the company’s research shows users will take on average no more than 90 seconds to move on from Netflix to another service if they don’t find something they would like to watch. Netflix calculates that the AI techniques that help users find something to watch are saving it up to $1bn a year in viewers who would otherwise be lost.

These statistics probably sum up the driving force behind the incorporation of AI into new entertainment products and services. Where AI can meet an important revenue goal (like driving viewing or listening hours by making intelligent recommendations), and if it can do it automatically and quickly, then expect to hear lots more about AI in Sound+Image!

Derek Powell

REFERENCES: avhub.com.au/soundoff

[1] https://www.researchgate.net/profile/Yading_Song/publication/277714802_A_Survey_of_Music_Recommendation_Systems_and_Future_Perspectives/links/5571726608aef8e8dc633517.pdf

[2] http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.html

[3] https://www.izotope.com/en/products/repair-and-edit/rx/features/dialogue-isolate.html

[4] The Netflix Recommender System: Algorithms, Business Value, and Innovation, published 2015 in ACM Trans. Management Inf. Syst.

KNOWING YOU: Spotify makes recommendations of music you may like, but are these really intelligent, or simply preference matches with other users?