OpenSource For You

CodeSport

In this month’s column, let’s begin a discussion on neural networks.

Last month, we discussed computer science interview questions. Subsequently, a few of our readers requested that I cover neural networks in this column. This topic has attracted a lot of interest and attention lately. It is a vast subject, and hence it is extremely difficult to do justice to it in a short article. So what I plan to do over the next few months is to give you a bird's eye view of neural networks, while pointing out additional resources where you can find more information.

While neural networks can be applied to different applications such as image recognition, speech processing, etc., we will discuss neural networks with a focus on natural language processing. Let's start off with a quick overview of the application of neural networks in natural language processing.

As the name indicates, the term neural network was indeed inspired by the biological computational units in the brain known as neurons. While there is some exploratory research currently taking place on how the biological processes of brain computation can be simulated using neural networks, there is no connection between biological neurons and the computer science neurons in any of the currently available neural network architectures used in natural language processing. Hence, the neural networks of computer science were referred to as artificial neural networks, though of late the term 'artificial' is no longer explicitly mentioned.

A single neuron is a computational unit with scalar inputs and outputs. There is a weight associated with each input to the neuron. The neuron multiplies each input by its associated weight and then sums up all the results. This sum is then passed through a non-linear function to produce the output of the neuron. Figure 1 shows the computation performed by a single neuron. The non-linear function is also known as the activation function of the neuron. The weights associated with the inputs are also known as synaptic weights, in connection with the synapses in the human brain, over which electrical/chemical impulses pass from one neuron to another.
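To make this concrete, here is a minimal sketch in Python of the computation a single neuron performs. The sigmoid activation, the bias term and the specific weights are illustrative assumptions on my part, not part of any particular architecture.

import math

def neuron(inputs, weights, bias=0.0):
    # Weighted sum of the inputs (a bias term is commonly added in practice)
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear activation function; sigmoid is used here for illustration
    return 1.0 / (1.0 + math.exp(-total))

# A neuron with three inputs and hand-picked weights
print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5]))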

A number of neurons can be connected to each other, forming a neural network. The output of one neuron is fed as input to one or more other neurons, forming a connected network. These connections can be organised in the form of layers, with the outputs of neurons in one layer providing inputs to neurons in the next layer. The output of a neuron in one layer can be connected to one or all of the neurons in the next layer. A network in which each neuron's output in one layer is connected to all the neurons in the next layer is known as a fully connected neural network. Neural networks typically contain one input layer, one output layer and one or more intermediate layers. The intermediate layers are also known as hidden layers.

The simplest neural network is one that contains one input layer, one intermediate/hidden layer and one output layer. An example of such a network is shown in Figure 2. Networks with two or more hidden layers are known as deep neural networks. From a historical perspective, neural networks can be classified as belonging to one of three generations. This classification is based on the computational ability of the individual neurons used in the neural network.

The first generation of neural networks was based on neurons that were threshold gates, or perceptrons. These neurons did not have a non-linear activation function; instead, their output was either 1 or 0, depending on whether the weighted sum of their inputs was above or below a threshold 't'. An important feature of these neural networks is that they can only produce digital outputs. However, they are capable of modelling every Boolean function and hence are universal for computation with digital inputs and outputs.
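As a quick illustration, here is a tiny Python sketch of such a perceptron modelling the Boolean AND function; the weights and the threshold value are hand-picked for this example.

def perceptron(inputs, weights, threshold):
    # Output is digital: 1 if the weighted sum reaches the threshold, else 0
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Modelling Boolean AND with weights (1, 1) and threshold t = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', perceptron((x1, x2), (1, 1), 2))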

The second generation of neural networks consists of neurons which apply a non-linear continuous function to the sum of weighted inputs, and hence produce a continuous set of possible output values. Examples of such activation functions are the sigmoid and tanh functions. A typical example of second generation neural networks is the feed forward neural network, such as the one shown in Figure 2, which we discussed earlier. Other examples of second generation neural networks include recurrent neural networks and recursive neural networks, which we will discuss in subsequent columns.
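For reference, the two activation functions named above can be computed as follows; this is a standard-definition sketch rather than anything specific to a given network.

import math

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^-z); squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# tanh squashes its input into (-1, 1); Python's math module provides it
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), math.tanh(z))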

Feed forward neural networks are those in which signals travel only in one direction, i.e., from input to output. In other words, the output of an earlier layer in the architecture feeds only into later layers and not the other way around. In a feed forward neural network, the output of a layer is never fed back to itself or to any earlier layer. The example shown in Figure 2 is a feed forward neural network. Recurrent neural networks, on the other hand, can have the output of one layer fed back through the network into an earlier layer, over time.

Essentially, this means that the neural network has loops or cycles, allowing the output computed at layer 'i' at time 't' to be fed back to an earlier layer at time 't+1'. This allows recurrent neural networks to achieve some form of short-term memory. While feed forward neural networks are used to classify single input values, recurrent neural networks are typically used for sequence classification and time series classification.
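A minimal sketch of this feedback idea follows, assuming a single recurrent unit whose previous state h is combined with the current input x at each time step; the tanh activation and the scalar weights here are purely illustrative.

import math

def rnn_step(x, h_prev, w_in, w_rec):
    # The new state depends on both the current input and the previous
    # state, which is what gives the network its short-term memory
    return math.tanh(w_in * x + w_rec * h_prev)

h = 0.0  # initial state
for x in [1.0, 0.5, -0.3]:  # a small input sequence
    h = rnn_step(x, h, w_in=0.7, w_rec=0.4)
    print(h)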

Second generation neural networks can also support digital outputs by applying a threshold at the output, and hence can model any arbitrary Boolean function. They are also universal for analogue computations, since any continuous function can be approximated reasonably well by a second generation neural network with just a single hidden layer.

Today, these second generation neural networks are widely used in various applications such as speech processing, image recognition and natural language processing. These neural networks are typically trained using gradient descent algorithms and backpropagation. One of the important challenges in applying neural networks to different tasks is how to train them effectively and quickly. In fact, one of the reasons for the resurgent popularity of artificial neural networks is that they can be trained efficiently using GPUs. We will discuss more about training neural networks in subsequent columns.
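Although a proper treatment of training is deferred to later columns, the core idea of gradient descent fits in a few lines: repeatedly nudge each weight in the direction that reduces the loss. The toy loss function and learning rate below are illustrative assumptions, not taken from any real training setup.

# Minimise the toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1
for _ in range(50):
    grad = 2 * (w - 3)
    w = w - learning_rate * grad  # step against the gradient
print(w)  # converges towards the minimum at w = 3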

The third generation of neural networks employs spiking neurons, also known as 'integrate and fire' neurons. These neural networks model the activity of biological neurons more closely than first and second generation neurons do. Spiking neural networks can also be used for information processing, just like second generation neural networks. While their effectiveness has been described theoretically, they have not yet been widely used in real life applications due to their high computational demands. Hence, we will not discuss spiking neural networks, except to note that there is considerable research happening in this field. For instance, Qualcomm recently announced that its Qualcomm Zeroth processors are based on the principle of spiking neural networks, an example of neuromorphic architectures, otherwise known as biologically inspired computing architectures.

We will focus mainly on the second generation neural networks in this and subsequent columns, since large real life natural language processing applications have been built using them.

The main application of neural networks in information processing is to use them as classifiers. In traditional machine learning, classifiers such as logistic regression and support vector machines (SVMs) are popularly used to classify text, images or speech. Fully connected neural networks can be used as drop-in replacements for such traditional ML classifiers. In many cases, neural network based classification has been shown to be more accurate than traditional ML learners. It is also important to note that the extensive feature engineering associated with traditional ML learners need not be done in the case of neural networks. Only core features are fed as input to the neural network, and the network itself can learn additional features on its own. We will discuss the differences between traditional ML based learners and neural networks in our next column.

While we have so far been talking about inputs to individual neurons in a neural network, we can, in fact, treat the inputs coming into each layer of a neural network as a vector, thus simplifying the mathematical calculations. From now on, in our discussions, we will use the vector representation for inputs to a neural network. The weights of the connections between the different layers will be represented using matrices. A fully connected layer is, in fact, an implementation of a vector-matrix multiplication, where the incoming input vector X is multiplied by the weight matrix W of the connections. We will describe the mathematical notation for neural networks in the next column.
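As a small sketch of this view, here is a fully connected layer written as one matrix multiplication followed by an element-wise activation. The layer sizes, the random weights and the use of NumPy are all assumptions made for illustration.

import numpy as np

def fully_connected_layer(x, W):
    # x is the input vector (size n); W is the weight matrix (shape m x n),
    # with each row holding the weights of one neuron in the layer
    return np.tanh(W @ x)

x = np.array([0.5, -1.0, 2.0])      # 3 inputs
W = np.random.randn(4, 3)           # a layer of 4 neurons
print(fully_connected_layer(x, W))  # 4 outputs, one per neuron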

If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming!

Figure 1: Computation performed by a single neuron
Figure 2: A simple neural network

Sandya Mannarswamy
