The Hindu (Kolkata)

Understanding the basic principles of artificial intelligence

Large language models are trained on a large corpus of text in which some words are randomly replaced by blanks and the machine’s task is to fill in the blank. While trying to learn to predict the next word in the text correctly, the machine also learns something about the process that created the text: the real world.

- Vasudevan Mukunth

Intelligence is the capacity of living beings to apply what they know to solve problems. ‘Artificial intelligence’ (AI) is intelligence in a machine. There is currently no one definition of AI. A simple place to begin is with AI’s material existence, as a machine-software combination.

Steps of thinking

Consider linear classification, a simple example problem: plot some points on a graph and find a way to draw a straight line through the graph such that it divides the points into two distinct groups.

Let’s make this more abstract. How would a machine differentiate between a cat and a dog in a picture?

Say you give the machine 1,000 pictures of cats and 1,000 pictures of dogs, and ask it to separate them. (This task is usually not given to a linear classifier but it illustrates a point.) You also equip the machine with tools — say, a camera and an app that can measure distances of different parts of an image, can analyse depth (using trigonometry), and can assess colours.

The machine can proceed by classifying the cat and dog pictures in different ways: by shape of the face, shape of the eyes, shape of the paw, body size, size of the tongue, fur colours, etc. Because the machine has the necessary computing power, it can plot these features two at a time on a graph. For example, the x-axis can represent the slope of the face and the y-axis the length of the paw. Or it can plot them three at a time in a 3D graph.

In all these cases, you watch until the machine has found a way to separate the pictures into two groups such that one group is mostly cats and the other is mostly dogs. At this point, the machine has been trained and you stop it.
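The training loop described above can be sketched as a toy perceptron. The feature values below are invented for illustration; a real system would extract them from the images:

```python
# Toy linear classifier: separate invented "cat" and "dog" points
# using two features (face slope on x, paw length on y).
# The perceptron rule nudges the line whenever a point is misclassified.

def train_perceptron(points, labels, epochs=100, lr=0.1):
    """points: list of (x, y); labels: +1 for dog, -1 for cat."""
    w0, w1, b = 0.0, 0.0, 0.0          # the line: w0*x + w1*y + b = 0
    for _ in range(epochs):
        for (x, y), t in zip(points, labels):
            pred = 1 if w0 * x + w1 * y + b > 0 else -1
            if pred != t:               # misclassified: move the line
                w0 += lr * t * x
                w1 += lr * t * y
                b  += lr * t
    return w0, w1, b

# Invented, linearly separable data: cats cluster low, dogs high.
cats = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.0)]
dogs = [(3.0, 3.5), (3.4, 2.9), (2.8, 3.1)]
points = cats + dogs
labels = [-1] * 3 + [1] * 3

w0, w1, b = train_perceptron(points, labels)
side = lambda p: 1 if w0 * p[0] + w1 * p[1] + b > 0 else -1
print([side(p) for p in points])  # cats land on one side, dogs on the other
```

Because the two invented clusters are separable, the loop is guaranteed to find a dividing line after finitely many corrections; that moment of "all points on the right side" is the stopping condition the paragraph above describes.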

Thinking, slow and fast

Sometimes it’s very easy to separate a given dataset into two groups, such as when a single parameter (e.g. face shape) is enough to make reliable decisions.

Sometimes it’s more difficult — like asking the computer of a driverless car to determine whether it should apply the brake based on how fast a bird is flying in front of the car. The outcomes on one side of the line stand for ‘no’ and the outcomes on the other side stand for ‘yes’, and solving this could require hundreds of parameters.

The machine will also have to account for the context of decision-making. For example, if the person in the car is in a hurry to get to a hospital, is killing the bird okay? Or if the person in the car is not in a hurry, how quickly should the car brake? And so on.

Sometimes it’s mind-boggling. For example, ChatGPT is able to accept an input question from a user, make ‘sense’ of it, and answer accordingly. This ‘sense’ comes from its training corpus: the billions of sequences of words and sentences scraped from the internet.

ChatGPT learnt not by classifying words but by predicting the next word in a given sentence. In particular, large language models (LLMs) like ChatGPT generate the text response without classifying it or relating the question to similar examples. This is why generative AI is different from a classification model.

LLMs are trained on a large corpus of text in which some words are randomly replaced by blanks and the machine’s task is to fill in the blank. While trying to learn to predict the next word in the text correctly, the machine also learns something about the process that created the text: the real world. ChatGPT is so good because it uses more than 100 billion parameters.
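As a drastically simplified stand-in for this idea, a toy predictor can count which word tends to follow which in a tiny corpus. Real LLMs learn billions of parameters rather than a count table, but the training signal is the same: guess the next word, then compare against the actual text.

```python
# Toy next-word prediction: tally, for each word in a tiny corpus,
# which words have followed it, then predict the most frequent follower.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Most frequent word seen after `word` in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict_next("sat"))  # "sat" is always followed by "on" in this corpus
print(predict_next("on"))   # "on" is always followed by "the"
```

Even this crude count table has absorbed a fact about the text that produced it (sitting happens "on" something), which is the point the paragraph above makes about learning the process behind the words.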

The types of learning

Linear classification is a fairly simple machine-learning algorithm. There are many algorithms that serve this purpose, and some of them are very complex. Machines can be classified into three main types depending on the way they learn: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, the data is labelled (for example, in a table, the row and column titles are provided and data types — numbers, verbs, names, etc. — are pointed out). In unsupervised learning, this information is withheld, forcing the machine to work out how the data can be organised and then solve a problem. In reinforcement learning, engineers score the machine’s output as it learns and solves problems on its own, and the machine uses the scores to adjust its performance.
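The contrast between the first two styles can be sketched on invented one-dimensional ‘tail length’ numbers. The data and the simple two-means grouping below are illustrative assumptions, not any particular production algorithm:

```python
# Supervised vs unsupervised, on invented 1-D "tail length" data.
# Supervised: labels are given, so the machine only fits a threshold.
# Unsupervised: no labels, so it must discover the two groups itself.

data   = [1.0, 1.2, 0.9, 3.0, 3.3, 2.8]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]  # supervised only

# Supervised: threshold midway between the labelled group averages.
cat_avg = sum(x for x, l in zip(data, labels) if l == "cat") / 3
dog_avg = sum(x for x, l in zip(data, labels) if l == "dog") / 3
threshold = (cat_avg + dog_avg) / 2

# Unsupervised: start with two guesses (the extremes) and repeatedly
# move each to the mean of the points nearest it, until they settle.
a, b = min(data), max(data)
for _ in range(20):
    near_a = [x for x in data if abs(x - a) <= abs(x - b)]
    near_b = [x for x in data if abs(x - a) > abs(x - b)]
    a, b = sum(near_a) / len(near_a), sum(near_b) / len(near_b)

print(round(threshold, 2), round(a, 2), round(b, 2))
```

With or without labels, both routes recover the same two clusters here; the difference is only in how much the machine is told up front.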

The way in which information flows inside the machine is governed by artificial neural networks (ANNs), the software that ‘animates’ the hardware.

The machine’s ‘brain’

An ANN comprises computing units, or nodes, connected together in such a way that the whole network learns the way an animal brain does. The nodes mimic neurons and the connections between nodes mimic synapses. Every ANN has two components — activation functions and weights.

The activation function is an algorithm that runs at a node. It accepts inputs from other nodes to which it is connected and computes an output. The inputs and outputs are in the form of real numbers. The weight is the ‘importance’ an activation function gives to an input. Say there are different nodes to estimate the fur colour, tail length, and dental profile in a given photo of a cat or a dog. All these nodes provide their outputs as inputs to a node responsible for separating ‘cat’ from ‘dog’. This way, the nodes can be ‘taught’ to adjust their outcomes by adjusting the relative weights they assign to different inputs.
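A single node of this kind can be sketched in a few lines. The feature names, the weights, and the choice of a sigmoid activation below are invented for illustration:

```python
# One artificial "node": a weighted sum of its inputs passed through
# an activation function. Larger weights mean more 'important' inputs.

import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node(inputs, weights, bias):
    """Weighted sum of inputs, then the activation function."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Outputs of three upstream nodes for one photo (invented numbers):
# fur-colour score, tail-length score, dental-profile score.
features = [0.8, 0.3, 0.6]

# The 'cat vs dog' node trusts tail length most (largest weight).
weights = [0.5, 2.0, 1.0]
out = node(features, weights, bias=-1.5)
print(round(out, 3))  # close to 1 reads as 'dog', close to 0 as 'cat'
```

Training, in this picture, is nothing more than nudging those weight numbers until the node’s output agrees with the examples it is shown.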

While nodes are computing units, the ANN itself is not a physical entity. It is mathematical. A node is the ‘site’ of a mathematical function. Put differently, the ANN is like an algorithm that passes information from one activation function to the next in a specific order.

Nvidia’s dominance

A graphics processing unit (GPU) is the physical processor that ‘runs’ the ANN. It was originally developed to render graphics for video games. It was better at this task than other processors at the time because it could run computing tasks in parallel, and it has since been widely adopted as the basic computing unit for ANNs for the same reason.

The company Nvidia has emerged as a technology giant because of its production of GPUs. Its valuation was the fastest in history to go from $1 trillion to $2 trillion (in nine months). Every other company that has been building large AI models is using Nvidia’s GPU-based chips to do so. In a 2023 analysis, financial services provider Seeking Alpha wrote that Nvidia’s overwhelming market share has stoked “resistance” in three ways: competitors are trying to develop non-GPU hardware; researchers are building smaller ANNs that require fewer resources; and developers are building new software to sidestep dependency on specific hardware.

With inputs from Viraj Kulkarni.
