Cosmos

YOU SAY YOU WANT A REVOLUTION

It’s here – but can AI be used to make the lyrics to Beatles hit songs better?


Here is a quick pub quiz question for you. Which megahit began as “Scrambled Eggs”, with the working lyrics:

Scrambled eggs

Oh my baby how I love your legs

Not as much as I love scrambled eggs...

It is, of course, “Yesterday” by The Beatles, which Guinness World Records lists as the most covered song of all time, amassing over 1,600 recorded versions to date, and 7 million performances in the 20th century alone.

If we can accept this as a working definition of the most successful pop song ever written and performed, we’re left with an intriguing question: could the song have achieved its staggering success if the lyrics hadn’t eventually morphed into “Yesterday, all my troubles seemed so far away...”? Or would it have been consigned to the same pile of novelty hits as David Bowie’s “The Laughing Gnome”, or The Beatles’ own “Yellow Submarine”?

By 1976, Paul McCartney was apparently determined to answer this himself, adorning the slick R&B of “She’s My Baby” with the immortal lines:

She’s taking me by surprise, she’s my baby

Like gravy, down to the last drop

I keep mopping her up, yeah, yeah, she’s my baby

Surprisingly, the song did not become a modern standard.

This leads us to a second question. What role do the lyrics actually play in a song? It is depressing to note that none of the songs by pop’s only Nobel laureate, Bob Dylan, have amassed more streams on Spotify than Eiffel 65’s “Blue (Da Ba Dee)”, which mostly consists of the words “Da Ba Dee Da Ba Di” repeated over and over again.

The reality is that, when listening to a pop record, we respond instinctively to the melody, harmony, rhythm, arrangement, vocal tone and fashion sense of the singers, and the lyrics play a very different role to written or spoken language. Indeed, the sleeve notes of records by the cult English band Pulp include the instruction, “Please do not read the lyrics whilst listening to the recordings”, to emphasise that lyrics are not poetry, but the words to a song. Divorcing them from their full context by reading them can make even the most mind-blowing songs seem banal.

Given that lyrics are often the hardest part of a song to finish, one might wonder if there is any need to slave over them at all, or whether modern science can deliver an efficient method for automatically producing them.

Artificially intelligent lyrics

In recent years, the field of artificial intelligence has made great strides in designing software and hardware that can perform human tasks. Within artificial intelligence sits the subfield of machine learning, which redefines how we traditionally program computers.

Whereas a conventional software program solves a task by giving a computer explicit instructions for how to solve it down to the last detail, a machine-learning approach instead gives the computer some data to learn from, and gets it to learn the solution by itself. For example, one could show a computer many different images of John, Paul, George and Ringo, plus a label for each image that tells you which Beatle it is. The computer would then figure out how to correctly classify new images of the Beatles. This approach, which is underpinned by a complex mathematical framework, is the fundamental technology that allows computers to beat humans at chess, drive cars, automatically recognise features in pictures, and replace many of our mundane day jobs with machines.
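
For the curious, here is a minimal Python sketch of that idea, using a simple nearest-neighbour classifier from the scikit-learn library as a stand-in for whatever method a real system might use. The tiny “photos” below are placeholder numbers rather than real pixel data, and the labels are invented for illustration:

from sklearn.neighbors import KNeighborsClassifier

# Each "photo" is a flattened list of pixel values (placeholders here),
# and each label names the Beatle the photo supposedly shows.
photos = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5], [0.2, 0.8, 0.4], [0.9, 0.1, 0.6]]
labels = ["John", "Paul", "John", "Paul"]

# Learn from the labelled examples...
model = KNeighborsClassifier(n_neighbors=1)
model.fit(photos, labels)

# ...then classify a new, unseen photo.
print(model.predict([[0.15, 0.85, 0.35]]))  # -> ['John']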

In particular, a vast swathe of our current machine learning advances are based on neural networks – detailed computer models that mimic the basic structure of the human brain. By first building an abstract mathematical representation of a single neuron with a computer program, then stitching large numbers of these neurons together inside a computer, it is possible to solve problems that humans have not yet managed to solve themselves – including automatically producing song lyrics!

Recurrent neural networks

To understand a neural network in detail, we first need to understand a single neuron. The human brain contains approximately 100 billion nerve cells, or neurons, which collect electrical inputs from other neurons. Each neuron sums the electrical signals it receives and, if the resulting voltage is greater than a threshold, it fires an electrical signal through its output. In animals and people, this occurs through various quirks of biochemistry. In machine learning, we instead use a simple model of this behaviour called a “perceptron”, which represents the inputs as positive or negative numbers, and the neuron itself as a function that acts on those numbers. Assuming that we have some input data, represented by the series of numbers which we have labelled X1, X2, X3 and X4, we can sum the inputs through a mathematical function that sends an output under a specified condition. The simplest choice is a “greater than threshold” condition, or we can choose more complicated behaviour. We can then chain a bunch of neurons together, and the simplest approach is to have the neurons passing their outputs forwards through a network, which becomes a “multilayer perceptron” network.
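
A rough Python sketch of that perceptron, using the simple “fire if the weighted sum beats the threshold” rule, might look like this (the weights and threshold below are made-up illustrative numbers, not learned ones):

def perceptron(inputs, weights, threshold):
    # Multiply each input by its weight and sum the results...
    total = sum(x * w for x, w in zip(inputs, weights))
    # ...then fire (output 1) only if the sum clears the threshold.
    return 1 if total > threshold else 0

# Four inputs X1..X4 with hand-picked weights.
print(perceptron([0.5, -1.0, 2.0, 0.1], [1.0, 0.5, 0.8, -0.2], 1.0))  # -> 1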

In the next example, two input variables feed into a “hidden layer” of three neurons, each of which sums the inputs it receives from the two neurons in the first layer, acts on them, and decides what output to send to the final layer.

To make things even more complicated, each neuron in the hidden layer has an adjustable weight on each input, allowing the amount of each input reaching that neuron to be tuned up or down before the neuron acts on the sum. We can build ever more complicated networks by increasing the number of hidden layers, and increasing the numbers of neurons in each layer.
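
Sketched in Python, a single forward pass through such a network might look like this, with two inputs, three hidden neurons and one output neuron (again, all the weights are invented numbers that training would normally tune):

def layer(inputs, weight_rows, threshold=0.0):
    # Each row of weights belongs to one neuron in the layer.
    outputs = []
    for weights in weight_rows:
        total = sum(x * w for x, w in zip(inputs, weights))
        outputs.append(1 if total > threshold else 0)
    return outputs

inputs = [0.7, -0.3]                                           # two input variables
hidden = layer(inputs, [[0.5, 1.0], [-1.0, 0.2], [2.0, 0.4]])  # hidden layer of three
final = layer(hidden, [[1.0, 1.0, 1.0]])                       # one output neuron
print(hidden, final)  # -> [1, 0, 1] [1]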

The modern fashion for “deep” neural networks describes networks with extremely large numbers of neurons, which have only become possible to deal with in recent years as computers have gotten faster and cheaper.

So far so good, but how does this relate to lyric writing? The full details are extremely technical, but allow me to offer a potted summary.

It turns out that many of the tasks we wish to solve in daily life correspond to being able to guess an output in terms of some input variables.

Taking our Beatles photo identification challenge, for example, we want the computer to learn which photos show Paul, which show John, which show Ringo and, last but not least, which show the wonderful songwriter who gave us “Something” to be thankful for.

In this case, each of our images provides a set of input values (e.g. the colour contents of the pixels of the image), and the output is either “Paul”, “John”, “Ringo” or “George” depending on what the image shows. We can then feed a large number of Beatles images into the network, and play with its mathematical definition until it correctly predicts the category for most of the images.

The resulting network is now a machine-learned system that can recognise each Beatle. For lyrics, we can play a similar game: by feeding in a huge set of previous pop lyrics, we can train a network to produce examples of word sequences that mimic real lyrics, using a similar technology to that used to train chatbots. In doing so, it is common to use a special type of network called a “recurrent neural network”, in which the outputs of some neurons are fed back into earlier stages of the network, so that it no longer passes information only in the forwards direction. This gives the network something like a short-term memory, which is useful for predicting word sequences that require some knowledge of what came before.
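
A bare-bones Python sketch shows that “short-term memory” at work: the cell below mixes each new input with its own previous output, so an early input lingers in later steps (the two weights are illustrative placeholders):

import math

def rnn_step(x, prev_hidden, w_in=0.8, w_rec=0.5):
    # The new state depends on the current input AND the previous state,
    # carrying information forward through the sequence.
    return math.tanh(w_in * x + w_rec * prev_hidden)

hidden = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:    # a single burst of input...
    hidden = rnn_step(x, hidden)
    print(round(hidden, 3))       # ...whose echo fades over later steps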

For example, a normal feed-forward network would have no idea of what words typically follow other words in pop lyrics, and would only ever be capable of bizarre non-sequiturs.

Whilst this might be useful for reproducing the works of Mark E. Smith (lyricist for British post-punk trailblazers The Fall), a Beatle lyric generator would be better off knowing to sing “She Loves You” rather than “You Loves She”. This is exactly the problem for which recurrent neural networks are useful.

In a recurrent neural network, some neuron outputs are fed back into earlier stages of the network, instead of only passing information in the forwards direction.

John, Paul, George, Robot

Once it has been shown a large number of word sequences, a recurrent neural network can predict the next word in a sequence based on some initial text. We can thus supply a few words, and get the network to write a set of lyrics for us. Many examples of this technique exist online, including a handy, freely available web app at davidlebech.com/lyrics.
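
The web app relies on a trained recurrent network, but the underlying “guess the next word” game can be sketched in Python with a much simpler stand-in: a toy Markov chain that records which words follow which in a scrap of made-up training text, then strings a lyric together one guess at a time:

import random
from collections import defaultdict

training = "she loves you yeah yeah yeah she loves you".split()

# Record which words follow which in the training text.
followers = defaultdict(list)
for word, nxt in zip(training, training[1:]):
    followers[word].append(nxt)

# Starting from "she", repeatedly guess a plausible next word.
word, lyric = "she", ["she"]
for _ in range(6):
    word = random.choice(followers[word])
    lyric.append(word)
print(" ".join(lyric))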

Feeding it the opening lines of “Scrambled Eggs” returns the rather impenetrable:

lord don’t hear before you know there’s you love me will need i go now now like she ain’t town that

... which indicates that Sir Paul need not yet be concerned.

In future, however, we can expect advances in natural language processing to make lyrics scarily close to the real thing, to the point where we could not tell that a human hadn’t written them. Even more impressively, researchers have created a neural network that, supplied with lyrics, can automatically make an entire song in the style of a particular artist: jukebox.openai.com. The results are strange and vaguely distorted, but scarily close to the artists in question, many of whom are now deceased.

One day, might we be listening to brand-new Beatles albums, written and performed by robot moptops? Or perhaps, like their human forebears, robot bands will take everything that came before them and dazzle us with their creativity, showing us new lyrical and musical landscapes.

In any case, “I’ve Got A Feeling” that the human bands of tomorrow will need to do “Something” to “Get Back” at the robots. Alternatively, perhaps they could accept “A Little Help From [Their] Friends”, “Come Together” and “Let It Be”?

MARTIN WHITE is a particle physicist and associate professor at the University of Adelaide. He is also a member of the ATLAS experiment searching for supersymmetric particles at the Large Hadron Collider. His most recent story, “Can Facebook solve the biggest mystery in physics?”, appeared in Issue 88.
