
Can Artificial Neural Networks Create Music?


One of the recent applications of neural networks has been in the field of art and music. Are artificial neural networks capable of creativity? Well, they are, although not on par with the human faculty. What are the challenges of creating music and art through algorithms? What are the various approaches? These are some of the questions addressed in this article.

Over the decades, artificial neural networks (ANNs) have come a long way from single-layer perceptrons that solved simple classification problems to highly complex networks for deep learning. Today, neural networks are used for more than just classification or optimisation problems in various fields of application. One of the most recent applications of artificial neural networks is in the field of art and music. The goal here is to simulate human creativity and to serve as an assisting tool for artists in their creations.

Early works

One of the first works using ANNs for music composition was carried out as early as 1988 by Lewis and Todd. Lewis used a multi-layer perceptron, while Todd experimented with a Jordan autoregressive neural network (AR-NN) to generate music sequentially. The music composed algorithmically with AR-NNs had problems with global coherence and structure. To circumvent that, Eck and Schmidhuber used long short-term memory (LSTM) neural networks to capture the temporal dependencies in a composition. In 2009, when deep learning networks were becoming more interesting for research, Lee and Andrew Ng started using deep convolutional neural networks (CNNs) for music genre classification. This formed the basis for advanced models that used high-level (semantic) concepts from music spectrograms. Most recently, WaveNet models and generative adversarial networks (GANs) have been used for generating music. Waveform-based models outperformed spectrogram-based ones, provided that enough training material was available.

LSTM, CNN and GAN

We shall now discuss the various ANN models that are used to create music and art.

LSTM networks are a variant of recurrent neural networks (RNN) that are capable of learning long-term dependencies. The key to LSTMs is the cell state, which stores information. The ability to add information to or remove information from the cell state is regulated by gates, each comprising a sigmoid neural network layer coupled with a pointwise multiplication operation. Since music sequences are time series with long-term dependencies, LSTMs are an appropriate choice.
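To make this concrete, here is a minimal sketch in Python (Keras) of how an LSTM could be set up to predict the next note in a melody. The note vocabulary, sequence length and layer sizes are assumptions made for illustration; this is not the exact architecture used in any of the works mentioned above.

# Minimal sketch of an LSTM next-note predictor (Keras).
# Vocabulary size, sequence length and layer sizes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

VOCAB = 128      # notes encoded as one of 128 MIDI pitches (assumption)
SEQ_LEN = 64     # length of the input note sequence (assumption)

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, VOCAB)),        # one-hot encoded note sequence
    layers.LSTM(256, return_sequences=True),     # cell state carries long-term context
    layers.LSTM(256),
    layers.Dense(VOCAB, activation="softmax"),   # probability distribution over the next note
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Training would use (sequence, next note) pairs extracted from MIDI files, e.g.:
# model.fit(x_train, y_train, epochs=50, batch_size=64)

Sampling repeatedly from the softmax output and feeding the chosen note back in generates a melody one note at a time.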

CNNs, or convolutional neural networks, are designed to process images. Their architecture is composed of two main blocks. The first block functions as a feature extractor, which matches templates by applying convolution filtering operations. It returns feature maps that are normalised and/or resized. The second block constitutes the classification layers. The network consists of an input layer, hidden layers and an output layer. In CNNs, the hidden layers perform convolutions. This typically includes a layer that performs a multiplication or dot product followed by the ReLU activation function, together with other layers such as pooling and normalisation layers. A typical CNN architecture is shown in Figure 1.
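The two-block structure described above can be sketched as follows, again in Keras. The spectrogram size and the number of genre classes are hypothetical values chosen purely for illustration.

# Minimal sketch of a CNN genre classifier over spectrograms (Keras).
# Input shape and number of genres are assumptions for illustration.
from tensorflow import keras
from tensorflow.keras import layers

NUM_GENRES = 10  # hypothetical number of genre classes

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),                 # e.g. a log-mel spectrogram
    # Block 1: feature extractor (convolution + ReLU, then pooling)
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.BatchNormalization(),                       # normalisation layer
    # Block 2: classification layers
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_GENRES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])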

GANs, or generative adversarial networks, have a generator, a discriminator and a loss function. The role of the generator is to maximise the likelihood that the discriminator misclassifies its output as real. The discriminator tries to tell real data from generated data; training drives it towards the 50 per cent point at which it can no longer distinguish between the two. The generator starts training alongside the discriminator, but the latter is trained for a few epochs before the adversarial training begins, as it will be required to actually classify images. The loss function provides the stopping criteria for the generator and discriminator training processes. Figures 2-4 are pictorial block diagrams of the GAN architecture, the GAN generator and the GAN discriminator. GANs are better than LSTMs at producing music because they are able to capture its large-scale structural patterns.
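The adversarial loop can be sketched as below, assuming for simplicity that each piece of music is represented as a flattened fixed-length sequence; the shapes, layer sizes and learning rates are illustrative assumptions, not the configuration of any published music GAN.

# Minimal sketch of one GAN training step (TensorFlow/Keras).
# The generator maps random noise to a fake sequence; the discriminator
# tries to tell real sequences from generated ones. Shapes are illustrative.
import tensorflow as tf
from tensorflow.keras import Sequential, layers

LATENT = 100          # size of the generator's noise input (assumption)
SEQ = 64 * 128        # flattened (time steps x pitches) output (assumption)

generator = Sequential([
    layers.Input(shape=(LATENT,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(SEQ, activation="tanh"),
])
discriminator = Sequential([
    layers.Input(shape=(SEQ,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(1),                      # real/fake logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], LATENT])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake, training=True)
        # Discriminator: label real sequences 1 and generated ones 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: push the discriminator towards calling fakes real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss

In practice the discriminator is usually pre-trained for a few epochs, as noted above, and the two losses are monitored to decide when to stop.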

Magenta: A Google Brain project

Magenta is an open source project started by Google Brain in 2016, aimed at creating new tools for artists as they work on new songs or artwork. Magenta is powered by Google's TensorFlow machine learning platform, and can work with music and images. In the music domain, an agent automatically composes background music in real time based on the emotional state of the environment in which it is embedded. It dynamically chooses an appropriate composition algorithm from a database of previously chosen algorithms mapped to a given emotional state. Doug Eck and his team worked with an LSTM model tuned with reinforcement learning. Reinforcement learning was used to teach the model to follow certain rules while still allowing it to retain the information learnt from data.

The LSTM model works on two kinds of metrics: those we want to be low and those we want to be high. The metrics associated with penalties are: a) notes not in key; b) mean autocorrelation, since the goal is to encourage variety, so the model is penalised if the composition is highly correlated with itself; c) excessively repeated notes, as LSTMs are prone to repeating patterns. Reinforcement learning is brought in for creativity. The metrics associated with rewards are: a) compositions starting with the tonic note; b) leaps resolved, meaning that, in order to avoid awkward intervals, a leap should be followed by motion in the opposite direction, and leaping twice in the same direction is negatively rewarded; c) compositions (with a unique maximum note and a unique minimum note) forming a motif, which is a succession of notes representing a short musical idea. Together, these metrics encode a set of music theory rules. The degree to which each metric is improved is determined by the reward given to a particular behaviour. The choice of metrics and their weights shapes the music created. The most recent Magenta models have used GANs and transformers to generate music with an improved long-term structure.
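As a rough illustration of how such penalties and rewards can be combined into a single reinforcement learning signal, here is a small Python sketch. The weights, the key-membership test and the leap-resolution check are all hypothetical simplifications, not Magenta's actual reward function.

# Hedged sketch: combining music theory penalties and rewards into one scalar.
# Weights, the key test and the leap heuristic are illustrative assumptions.

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # pitch classes of the C major scale (assumption)

def autocorrelation(notes, lag=4):
    """Rough self-similarity measure: fraction of notes repeating at a fixed lag."""
    if len(notes) <= lag:
        return 0.0
    matches = sum(a == b for a, b in zip(notes, notes[lag:]))
    return matches / (len(notes) - lag)

def reward(notes, new_note, tonic=60):
    """Score a candidate next note against simplified music theory rules."""
    r = 0.0
    if new_note % 12 not in C_MAJOR:               # penalty: note not in key
        r -= 1.0
    r -= 0.5 * autocorrelation(notes)              # penalty: composition correlated with itself
    if len(notes) >= 3 and notes[-3:] == [new_note] * 3:
        r -= 0.5                                   # penalty: excessively repeated note
    if not notes and new_note % 12 == tonic % 12:
        r += 1.0                                   # reward: composition starts on the tonic
    if len(notes) >= 2:
        prev_leap = notes[-1] - notes[-2]
        step = new_note - notes[-1]
        if abs(prev_leap) > 4 and step * prev_leap < 0:
            r += 0.5                               # reward: leap resolved in the opposite direction
    return r

# Example: scoring a candidate next note (MIDI pitches) after a melody containing a leap.
print(reward([60, 64, 71], 69))                    # the downward step after the leap is rewarded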

Challenges in music generation

The greatest challenge in the generation of music is encoding the various musical features. Once that is accomplished, the generated music is expected to follow the broader structure, dynamics and rules of music. Musical dimensions such as timing and pitch have relative rather than absolute significance when it comes to how notes are placed in them. Features such as dynamics (which describes how loud the sound from the instrument is) and timbre (which differentiates between notes having the same pitch and loudness) are difficult to encode. Other features such as duration, rest, timing and pitch are also challenging to represent as extracted features.
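One simple way to capture the relative (rather than absolute) nature of pitch is to encode a melody as intervals and durations instead of raw pitches and timestamps. The sketch below illustrates this idea; the encoding scheme itself is an assumption made for illustration, not a standard representation.

# Illustrative sketch: encoding a melody as relative (interval, duration) events
# rather than absolute pitches. The scheme is an assumption made for illustration.

def encode_relative(notes):
    """notes: list of (midi_pitch, duration_in_beats) tuples."""
    events = []
    prev_pitch = None
    for pitch, duration in notes:
        interval = 0 if prev_pitch is None else pitch - prev_pitch  # relative pitch
        events.append((interval, duration))
        prev_pitch = pitch
    return events

# C4, E4 and G4 quarter notes followed by a C5 half note:
melody = [(60, 1.0), (64, 1.0), (67, 1.0), (72, 2.0)]
print(encode_relative(melody))   # [(0, 1.0), (4, 1.0), (3, 1.0), (5, 2.0)]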

Future trends

Various GAN models such as MidiNet, SSMGAN, C-RNN-GAN, JazzGAN, MuseGAN, Conditional LSTM GAN, etc., have been tried for composing melody. Other generative models like VAEs, flow-based models, autoregressive models, transformers, RBMs, HMMs and many others have been used in research in this area. Melody generation is moving towards greater musical diversity and better handling of structure. Better interpretability and more human control are also being aimed for. Standardised test data sets and evaluation metrics, cross-modal generation, composition style transfer, lyrics-free singing and interactive music generation are some future research directions in this field.

Figure 1: Basic CNN architecture
Figure 2: Basic GAN architecture
Figure 3: GAN generator
Figure 4: GAN discriminator
