OpenSource For You

CODE SPORT

In this month’s column, we continue our discussion on deep learning.

- The author is an expert in systems software and is currently working as a research scientist at Xerox India Research Centre. Her interests include compilers, programming languages, file systems and natural language processing.

A few months back, we had started a discussion on deep learning and its application in natural language processing. However, over the last couple of months, we digressed a bit due to various other requests from our readers. From this month onwards, let’s get back to exploring deep learning. In prior columns, we had discussed how supervised learning techniques have been used to train neural networks. These learning techniques require labelled data. Given that today’s state-of-the-art neural networks are extremely deep, with many layers, we require large amounts of data so that we can train these deep networks without over-fitting. Getting labelled, annotated data is not easy. For instance, in image recognition tasks, specific parts of each image need to be annotated before the network can learn to recognise human faces or animals. Labelling millions of images requires considerable human effort. On the other hand, if we decide to use smaller amounts of labelled data, we end up with over-fitting and poor performance on test data. This leads to a situation wherein many of the problems for which deep learning is an ideal solution do not get tackled, simply due to the lack of large volumes of labelled data. So the question is: is it possible to build deep learning systems based on unsupervised learning techniques?

In last month’s column, we had discussed word embeddings, which, along with paragraph and document embeddings, are now widely used as inputs in various natural language processing tasks. In fact, they are used as the input representation in many deep learning based text processing systems. If you have gone through the word2vec paper at https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf, you would have seen that there is no labelled training data for the neural network described in this paper. The word2vec neural network is not a deep neural network. There are only three layers — an input layer, a hidden layer and an output layer. Just as in the case of supervised learning neural networks, back propagation is used to train the weights of the network. Here is a quick question to our readers — why do we initialise the weights of the nodes to small random values instead of initialising them to zero?
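
To make the architecture concrete, here is a minimal sketch in Python of such a three-layer network. The vocabulary size, embedding dimension and initialisation range are illustrative assumptions, not values taken from the paper.

# A minimal sketch of a three-layer word2vec-style network: one-hot input,
# a hidden (embedding) layer and a softmax output layer. V and N are
# illustrative values, not taken from the paper.
import numpy as np

V, N = 10000, 300                        # vocabulary size, embedding dimension
rng = np.random.default_rng(42)

W_in = rng.uniform(-0.5 / N, 0.5 / N, size=(V, N))    # input -> hidden weights
W_out = rng.uniform(-0.5 / N, 0.5 / N, size=(N, V))   # hidden -> output weights

def forward(word_index):
    """Forward pass for a single one-hot encoded input word."""
    h = W_in[word_index]                 # hidden layer is just a row lookup
    scores = h @ W_out                   # one score per word in the vocabulary
    scores -= scores.max()               # for numerical stability of softmax
    probs = np.exp(scores) / np.exp(scores).sum()
    return h, probs                      # probs[i] = P(word i | input word)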

In a supervised learning setting, we have labelled data, which specifies the correct (expected) output or label for a given input. The neural network produces an actual output for that input. The difference between the expected output and the actual output is the error term at the output layer. As we had seen earlier during the discussion on back propagation, this error term is back-propagated to the nodes of earlier layers, whose weights are adjusted based on these error terms. The key idea behind back propagation is the following — the weights of each node are adjusted in proportion to how much that node contributes to the error terms of the next layer’s nodes, for which its output acts as input. For back propagation to work, the desired output of each node in the output layer, for a given input, needs to be known. The resulting error can then be back-propagated to the hidden layer’s neurons. Here is a question to our readers — in the word2vec paper, where the neural network is being trained, what is the expected output for a given input?
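
Continuing the sketch above, the back-propagation step for this network would look roughly as follows, assuming a softmax output and a cross-entropy style error term; the learning rate is an illustrative choice.

# Back-propagation step for the sketch above (assumes W_in, W_out and
# forward() from the earlier snippet). The learning rate is illustrative.
def train_step(word_index, target_index, lr=0.025):
    global W_in, W_out
    h, probs = forward(word_index)
    # Error term at the output layer: actual output minus expected output,
    # where the expected output is a one-hot vector for the target word.
    error = probs.copy()
    error[target_index] -= 1.0
    # Each hidden->output weight is adjusted in proportion to the error of
    # the output node it feeds and to the hidden activation feeding it.
    grad_W_out = np.outer(h, error)
    # Error back-propagated to the hidden layer, and from there to the
    # single input row that produced h.
    grad_h = W_out @ error
    W_out -= lr * grad_W_out
    W_in[word_index] -= lr * grad_h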

Well, the answer is simple. The expected output that is used for training is drawn from the input corpus itself. If you consider the simplest variant of the word2vec continuous bag-of-words model, with only one word of context, the task is to predict the target word, given one word in its context. If we make the assumption that the context word and the target word are adjacent, this reduces to the bigram model, and we need to predict the succeeding word for each word in a continuously moving window over the corpus. This task requires no explicitly labelled data apart from the input corpus itself. Thus, word2vec is a great example of a neural network method that can scale to large data, but does not require explicitly labelled data.
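
Here is a small illustration of how such training pairs fall out of the raw text under this one-word-context (bigram) simplification; the toy corpus is purely illustrative, and each pair could then be fed to a training step like the one sketched earlier.

# Deriving (context, target) training pairs directly from raw text under the
# one-word-context (bigram) simplification; the toy corpus is illustrative.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = {word: idx for idx, word in enumerate(sorted(set(corpus)))}

# The "label" for each input word is simply the word that follows it in the
# moving window, so no human annotation is needed.
training_pairs = [(vocab[corpus[i]], vocab[corpus[i + 1]])
                  for i in range(len(corpus) - 1)]

for context_idx, target_idx in training_pairs[:3]:
    print(context_idx, "->", target_idx)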

By now, you may be wondering what the need for unsupervised learning with neural networks is. After all, supervised learning has proved itself to be quite successful with neural networks. As mentioned earlier, in many cases we are not able to apply deep learning techniques due to a lack of labelled data. Another, more fundamental reason for the attractiveness of unsupervised neural network techniques is that they seem to be closer to the way human beings develop their knowledge and perception of the world. For instance, as children build their knowledge of the world around them, they are not explicitly provided labelled data. More importantly, human brains appear to follow a hierarchical learning method: abstract concepts are represented first, and higher level concepts are built on top of them. If human beings can obtain much of their knowledge through unsupervised learning, how can we employ similar techniques in artificial neural networks for deep learning? This requires us to take a brief digression from artificial neural networks to human cognition.

One question which has been debated a lot in human cognition is that of ‘localist’ vs ‘distributed’ representation. A localist representation means that only one, or a very small number of, neurons are used to represent a specific concept in the brain. In a distributed representation approach, on the other hand, a concept is represented by a pattern of activity spread over a large number of neurons, and each neuron participates in the representation of multiple concepts. In other words, while in a localist representation it is possible to interpret the behaviour of a single neuron as corresponding to a specific concept or feature, in a distributed representation it is not possible to map the activity of a single neuron to a concept in isolation, since its behaviour depends on the activity of a host of other neurons. An extreme version of the localist approach is represented by what are known as ‘grandmother cells’. Are there neurons that are associated with a single concept only? For instance, if ‘grandmother’ is a concept represented in the human brain, is there a single neuron that fires in recognising this concept? Well, an affirmative answer to that question would rule in favour of the localist approach.

There has been a raging debate on the presence or absence of grandmother cells. There have been certain experiments in the past which seemed to indicate the presence of the grandmother cell (http://tandfonline.com/doi/full/10.1080/23273798.2016.1267782), but recent research indicates a lack of sufficient evidence for grandmother cells. Earlier experiments indicated that certain specific neurons fired only when recognising a specific concept such as ‘grandmother’, or a popular personality such as Jennifer Aniston (https://www.newscientist.com/article/dn7567-why-your-brain-has-a-jennifer-aniston-cell/).

It is now possible to think of alternative explanations for these experimental results. One strong possibility is the presence of sparse coding of neurons in human brains. Sparse neural coding means that only a relatively small number of neurons are active in representing any particular feature of a concept. For instance, consider an image of 1000 x 1000 pixels. Of the units processing these million pixels, only some encode horizontal lines, certain others encode vertical lines, and so on. Each sub-concept or sub-feature is represented by a selective, limited set of neurons. This means that only a few units are activated for each image, and yet this enables us to recognise a very large number of images.
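
A quick back-of-the-envelope calculation shows why such sparse activity can still distinguish an enormous number of images; the unit count and sparsity level below are illustrative assumptions, not measurements of any real neural population.

# Rough combinatorial illustration of sparse coding capacity: if each concept
# activates only a small subset of units, the number of distinct subsets is
# astronomically large. The numbers here are illustrative assumptions.
from math import comb

n_units = 1_000_000       # total units (cf. the 1000 x 1000 pixel analogy)
k_active = 20             # only a handful of units fire for any one concept

patterns = comb(n_units, k_active)
print(f"Distinct sparse activation patterns: roughly 10^{len(str(patterns)) - 1}")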

Thus, sparse coding provides an alternative explanation for the grandmother cell experiments. When you are shown an image of your grandmother, there are a few selective neurons encoding this concept and, hence, they are the ones that get activated. It is quite possible that these same neurons are also involved in representing various other concepts. So the sparse coding explanation does not preclude a distributed representation of knowledge in human brains.

By now, you may be wondering how the human brain, interesting as it is, is relevant to deep learning and artificial neural networks. Well, it is important because human visual recognition and perception is carried out by one of the finest natural neural networks we know of, and the ultimate holy grail of deep learning is to figure out how to design deep learning networks which can match or surpass the power of human cognition. This brings us back to the question of designing efficient unsupervised deep learning networks. For instance, if we simply provide unlabelled YouTube videos to a deep neural network and allow it to learn from them in an unsupervised manner, what can the network learn? Given a multi-layer neural network for this task, can we interpret the activity of certain nodes as recognising colour changes, certain others as recognising horizontal lines, and yet others as recognising vertical lines?

The answer to the above questions is ‘Yes’. There are neural networks known as auto-encoders, which are used for unsupervised deep learning. An auto-encoder learns the weights of the network using back propagation, wherein the expected output is set to be equal to the input. We will discuss auto-encoders in more detail in next month’s column. Also, if you were wondering about the experiment wherein an auto-encoder learns without any supervision from YouTube videos, here is the paper which describes it: https://arxiv.org/abs/1112.6209.
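
To give a flavour of the idea before next month’s column, here is a minimal sketch of an auto-encoder trained with plain gradient descent; the layer sizes, toy data, learning rate and sigmoid activations are all illustrative choices, not a description of any particular published model.

# A minimal auto-encoder sketch: back propagation with the target output set
# equal to the input. Layer sizes, data and hyper-parameters are illustrative;
# biases are omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 32))                 # 200 unlabelled input vectors

n_in, n_hidden = 32, 8                    # the bottleneck forces a compressed code
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_in))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(1000):
    H = sigmoid(X @ W1)                   # encode
    X_hat = sigmoid(H @ W2)               # decode: try to reconstruct the input
    # Back-propagate the reconstruction error (expected output == input).
    d_out = (X_hat - X) * X_hat * (1 - X_hat)
    d_hidden = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ d_out) / len(X)
    W1 -= lr * (X.T @ d_hidden) / len(X)

print("reconstruction MSE:", np.mean((sigmoid(sigmoid(X @ W1) @ W2) - X) ** 2))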

If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, wishing all our readers a wonderful and productive month ahead!

Sandya Mannarswamy
