Do robots have musical dreams?
Software makers iZotope have used machine-learning tech on new releases of Ozone, Neutron and Nectar. Their CTO, Jonathan Bailey, fills us in…
The terms ‘machine learning’ and ‘deep learning’ are used a lot these days. What exactly do they mean, in layman’s terms?
“Machine learning refers to specific techniques within the broader field of AI that allow a system to find patterns within large amounts of data or to make a decision in response to previously unseen data. A common example is facial recognition technology. The software on your phone has obviously never seen your photos before – because they didn’t exist before you took them – and yet it can identify (‘classify’) faces and group (‘cluster’) them.
“Machine learning techniques have been around for decades, largely centred on the use of neural networks. Neural networks are connected, statistical models that are inspired by the way the neurons in your brain function as a system of connected nodes.
“Over the past ten years, two forces have combined to allow for breakthroughs in the use of machine learning techniques: the explosion of digital data, and the cheap availability of computing resources (due to cloud computing solutions such as Amazon Web Services). This is where deep learning comes in. Deep learning refers to the use of highly complex neural network models that use several layers of nodes, connected in complicated configurations that require powerful computers to train – on large data sets – and operate.”
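To make "several layers of nodes" a little more concrete, here is a minimal sketch of data passing through a small stack of fully connected layers. It is an illustration only: the layer sizes, the tanh activation and the random (untrained) weights are all assumptions chosen for brevity, and real deep learning models are far larger and are trained on large data sets.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, then a tanh non-linearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def make_layer(n_in, n_out, rng):
    """Create random, untrained weights (illustrative assumption, not a trained model)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    """Feed the input through each layer in turn; each layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # assumed sizes: 4 inputs, two hidden layers of 8 nodes, 2 outputs
layers = [
    make_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
output = forward([0.1, -0.3, 0.7, 0.2], layers)
print(output)
```

Training would adjust the weights so that the outputs become useful. That step is the part that needs the large data sets and computing power described above, and it is omitted here.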
How does machine/deep learning help improve software tools for musicians and audio professionals?
“iZotope has invested heavily in these techniques over the past few years. In Neutron, our intelligent channel strip, we use deep learning to identify (‘classify’) which instrument is represented by the audio in any given track in your music session. Based on that categorisation, and some additional acoustical traits we analyse within the audio, we recommend which dynamics, EQ and/or exciter settings to apply to prepare that track for your mix.
“We’re now using deep learning to not only analyse audio content but also process it. In our recent release of RX 7, the Music Rebalance feature uses deep learning to ‘unmix’ a musical mixture into individual stems that can be rebalanced or otherwise processed separately. We’re exploring how deep learning might be used to synthesise content in the future.”
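The classify-then-recommend flow described for Neutron can be sketched in a few lines. This is not iZotope's method: a simple nearest-centroid lookup stands in for the deep learning classifier, and the feature values, instrument names and preset settings are all hypothetical, chosen only to show the shape of the idea.

```python
import math

# Hypothetical per-instrument feature centroids:
# (spectral centroid in kHz, attack sharpness from 0 to 1).
CENTROIDS = {
    "bass": (0.2, 0.3),
    "vocals": (1.5, 0.4),
    "drums": (3.0, 0.9),
}

# Hypothetical starting-point settings for each instrument class.
PRESETS = {
    "bass": {"eq": "high-pass at 30 Hz", "compressor_ratio": 4.0},
    "vocals": {"eq": "gentle boost at 3 kHz", "compressor_ratio": 3.0},
    "drums": {"eq": "cut at 400 Hz", "compressor_ratio": 6.0},
}

def classify(features):
    """Return the instrument whose centroid is closest to the measured features."""
    return min(CENTROIDS, key=lambda name: math.dist(features, CENTROIDS[name]))

def recommend(features):
    """Classify the track, then look up suggested settings for that class."""
    instrument = classify(features)
    return instrument, PRESETS[instrument]

instrument, settings = recommend((0.25, 0.35))
print(instrument, settings)
```

In a real product the classifier would be a trained network working on the audio itself, and the recommendation would also draw on further analysis of the track rather than a fixed table.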
What are the big pros and cons?
“Deep learning has solved some problems we struggled to solve in the past. For example, many of our customers asked us for a way to remove lavalier microphone rustle noise from recordings, which was tough to solve even using our powerful spectral analysis and processing technology.
“For companies interested in developing this capability, using techniques from deep learning is getting easier, but it’s still not that easy. One of the main challenges in implementing a working deep learning solution is having access to good training data. This is kind of new territory for companies that have traditionally focused on algorithm development. The software you use to create a neural network is freely available, commodity technology (Google TensorFlow is a common example). As I say, for companies of a certain size, access to a large amount of computing power is reasonably affordable. Data has become a big bottleneck and poses an interesting problem. Google gives their software away, and charges pennies for their cloud computing service, but they closely guard their data.
“That said, deep learning is not a panacea. We still rely heavily on knowledge that comes from the canon of digital signal processing. Learning how to effectively use and train a deep neural network is getting easier, but the cutting-edge research is still done by highly skilled scientists (usually with PhDs). Neural networks can be very difficult to debug and sometimes they function as a kind of ‘black box’ – you don’t totally know what’s going on inside. They are also computationally and resource intensive, so that makes engineering them to work in certain real-time applications – such as synthesisers or audio plugins – very challenging.
“Deep learning makes for an exciting story, but ultimately, we want the magic to be in the result a customer gets, not how she got there.”
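The real-time challenge mentioned above comes down to simple arithmetic: a plugin must finish processing each buffer of audio before the next one arrives. The sketch below works out that deadline. The buffer size, the sample rate and the processing times are assumed example values, not measurements of any iZotope product.

```python
def buffer_deadline_ms(buffer_size, sample_rate):
    """Time available to process one buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate

def fits_real_time(processing_ms, buffer_size, sample_rate):
    """True if the processing time per buffer is shorter than the buffer's duration."""
    return processing_ms < buffer_deadline_ms(buffer_size, sample_rate)

# Assumed example: 512-sample buffers at 44.1 kHz leave roughly 11.6 ms per buffer.
deadline = buffer_deadline_ms(512, 44100)
print(f"Deadline per buffer: {deadline:.2f} ms")

# Hypothetical processing times for two models of different sizes.
print(fits_real_time(5.0, 512, 44100))   # a small model that keeps up
print(fits_real_time(20.0, 512, 44100))  # a larger model that would cause dropouts
```

Smaller buffers reduce latency but shrink the deadline further, which is one reason large neural networks are hard to run inside a synthesiser or plugin.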
How can musicians harness this technology while retaining creativity?
“There are a couple of different research camps. One, from the world of musicology, is focused on algorithmic musical composition. In this space you have Amper Music, who have a product that can create generative music examples for your content, like your YouTube video or ad. Others focus on applications like auto-accompaniment. So some groups are trying to automate creativity, and others are trying to enhance it.
“This is a really delicate balance but iZotope is firmly in the camp of enhancing creativity. I greatly admire research teams like Google Magenta, whose stated purpose is to use machine learning to create art – but that’s not iZotope’s philosophy or strategy. We want to use deep learning to help you create your art. We are currently more focused on technical applications, but I do see us pushing into more creative domains as long as we stay true to our purpose of enabling creativity. We’re not out to replace human creativity.”
So will software end up writing and mixing our music for us?
“In some cases, it already is. If you’re a great singer-songwriter but you’ve never opened up a DAW in your life, deep learning will be able to help you get a great-sounding recording without having to learn what a compressor is. If you work in a DAW all day, it will learn what effects you like and don’t like, what visual and auditory information you need to get your work done, and allow you to focus on the music itself.
“Photography was supposed to kill painting. It didn’t. I have faith in our ability to invent new ideas.”