Understanding the basic principles of artificial intelligence
Intelligence is the capacity of living beings to apply what they know to solve problems. ‘Artificial intelligence’ (AI) is intelligence in a machine. There is currently no one definition of AI. A simple place to begin is with AI’s material existence, as a machine-software combination.
Steps of thinking
Consider linear classification, a simple example problem: plot some points on a graph and find a way to draw a straight line through the graph such that it divides the points into two distinct groups.
Let’s make this more abstract. How would a machine differentiate between a cat and a dog in a picture?
Say you give the machine 1,000 pictures of cats and 1,000 pictures of dogs, and ask it to separate them. (This task is usually not given to a linear classifier but it illustrates a point.) You also equip the machine with tools — say, a camera and an app that can measure distances of different parts of an image, can analyse depth (using trigonometry), and can assess colours.
The machine can proceed by classifying the cat and dog pictures in different ways: by shape of the face, shape of the eyes, shape of the paw, body size, size of the tongue, fur colours, etc. Because the machine has the necessary computing power, it can plot these features two at a time on a graph. For example, the x-axis can represent the slope of the face and the y-axis the length of the paw. Or it can plot them three at a time in a 3D graph.
In all these cases, you watch until the machine has found a way to separate the pictures into two groups such that one group is mostly cats and the other is mostly dogs. At this point, the machine has been trained and you stop it.
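The process described above can be sketched in a few lines of code. The sketch below uses the perceptron learning rule, a classic way to find a separating line; the two features and their values are invented for illustration, not taken from any real cat-and-dog dataset.

```python
# A minimal sketch of linear classification with the perceptron rule.
# The two 'features' (e.g. slope of the face, length of the paw) and
# their values are made-up numbers for illustration.

def train_perceptron(points, labels, epochs=20, lr=0.1):
    """Learn weights (w1, w2) and bias b so that the line
    w1*x + w2*y + b = 0 separates the two labelled groups."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x, y), target in zip(points, labels):
            predicted = 1 if w1 * x + w2 * y + b > 0 else 0
            error = target - predicted      # -1, 0, or +1
            w1 += lr * error * x            # nudge the line toward
            w2 += lr * error * y            # any misclassified point
            b += lr * error
    return w1, w2, b

def classify(w1, w2, b, x, y):
    """Points on one side of the line are 'cat', the other side 'dog'."""
    return "cat" if w1 * x + w2 * y + b > 0 else "dog"

# Toy data: 'cat' pictures cluster at low feature values, 'dog' at high.
points = [(1.0, 1.2), (1.5, 0.8), (1.2, 1.0),   # cats (label 1)
          (3.0, 3.2), (3.5, 2.8), (3.2, 3.0)]   # dogs (label 0)
labels = [1, 1, 1, 0, 0, 0]

w1, w2, b = train_perceptron(points, labels)
print(classify(w1, w2, b, 1.1, 1.1))   # a new picture near the cat cluster
```

Once the line stops moving, training is done, exactly the stopping point described above.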
Thinking, slow and fast
Sometimes it’s very easy to separate a given dataset into two groups: when a single parameter (e.g. face shape) is enough to make reliable decisions.
Sometimes it’s more difficult, like asking the computer of a driverless car to determine whether it should apply the brakes based on how fast a bird is flying in front of the car. The outcomes on one side of the line stand for ‘no’ and those on the other side stand for ‘yes’, and solving this could require hundreds of parameters.
The machine will also have to account for the context of the decision-making. For example, if the person in the car is in a hurry to get to a hospital, is killing the bird okay? Or if the person is not in a hurry, how quickly should the car brake?
Sometimes it’s mind-boggling. For example, ChatGPT is able to accept an input question from a user, make ‘sense’ of it, and answer accordingly. This ‘sense’ comes from its training corpus: the billions of sequences of words and sentences scraped from the internet.
ChatGPT learnt not by classifying words but by predicting the next word in a given sentence. In particular, large language models (LLMs) like ChatGPT generate the text response without classifying it or relating the question to similar examples. This is how generative AI differs from a classification model.
LLMs are trained on a large corpus of text in which some words are randomly replaced by blanks and the machine’s task is to fill in the blank. While trying to learn to predict the next word in the text correctly, the machine also learns something about the process that created the text: the real world. ChatGPT is so good because it uses more than 100 billion parameters.
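Next-word prediction can be illustrated at a toy scale. The sketch below counts which word follows which in a tiny invented corpus and predicts the most frequent follower; a real LLM does something vastly more sophisticated, with billions of parameters over internet-scale text, but the basic task is the same.

```python
# A toy illustration of next-word prediction. The corpus is invented;
# real LLMs learn from billions of sentences, not a dozen words.
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat chased the dog .").split()

# For each word, count which words follow it (a simple bigram model).
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))   # 'on': both occurrences of 'sat' precede 'on'
```

Filling in a blank, as in the training scheme described above, is the same counting problem asked in reverse: which word is most likely given its neighbours.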
The types of learning
Linear classification is a fairly simple algorithm in machine learning. There are many algorithms that serve the same purpose, and some of them are very complex. Machines can be classified into three main types depending on the way they learn: supervised learning, unsupervised learning, and reinforcement learning.
In supervised learning, the data is labelled (for example, in a table, the row and column titles are provided and data types — numbers, verbs, names, etc. — are pointed out). In unsupervised learning, this information is withheld, forcing the machine to understand how the data can be organised and then solve a problem. In reinforcement learning, engineers score the machine’s output as it learns and solves problems on its own, and the machine uses the scores to adjust its performance.
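The difference between the first two types can be made concrete with a small sketch. Below, the same invented 1-D measurements (say, body size) are summarised twice: once with the labels given (supervised) and once with the labels withheld, using two-means clustering to discover the groups (unsupervised).

```python
# A minimal contrast between supervised and unsupervised learning,
# using made-up 1-D measurements (e.g. body size).

def supervised_fit(values, labels):
    """Supervised: labels are given, so just average each labelled group."""
    means = {}
    for label in set(labels):
        group = [v for v, l in zip(values, labels) if l == label]
        means[label] = sum(group) / len(group)
    return means

def unsupervised_fit(values, steps=10):
    """Unsupervised: no labels; two-means clustering discovers the groups."""
    a, b = min(values), max(values)          # initial cluster centres
    for _ in range(steps):
        group_a = [v for v in values if abs(v - a) <= abs(v - b)]
        group_b = [v for v in values if abs(v - a) > abs(v - b)]
        a = sum(group_a) / len(group_a)      # move each centre to the mean
        b = sum(group_b) / len(group_b)      # of the points assigned to it
    return a, b

sizes = [1.0, 1.2, 0.9, 3.0, 3.3, 2.8]               # two obvious clusters
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(supervised_fit(sizes, labels))    # one mean per given label
print(sorted(unsupervised_fit(sizes)))  # the same two means, found unlabelled
```

On cleanly separated data like this, both routes arrive at the same two group averages; the unsupervised one simply has to discover the grouping first.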
The way in which information flows inside the machine is governed by artificial neural networks (ANNs), the software that ‘animates’ the hardware.
The machine’s ‘brain’
An ANN comprises computing units, or nodes, connected together in such a way that the whole network learns the way an animal brain does. The nodes mimic neurons and the connections between nodes mimic synapses. Every ANN has two components — activation functions and weights.
The activation function is an algorithm that runs at a node. It accepts inputs from other nodes to which it is connected and computes an output. The inputs and outputs are in the form of real numbers. The weight is the ‘importance’ an activation function gives to an input. Say there are different nodes to estimate the fur colour, tail length, and dental profile in a given photo of a cat or a dog. All these nodes provide their outputs as inputs to a node responsible for separating ‘cat’ from ‘dog’. This way, the nodes can be ‘taught’ to adjust their outcomes by adjusting the relative weights they assign to different inputs.
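A single node can be written out directly. In the sketch below, the upstream feature scores, the weights, and the bias are all invented numbers; the sigmoid is one common choice of activation function, not the only one.

```python
# A sketch of one ANN node. The feature scores, weights, and bias are
# invented for illustration.
import math

def sigmoid(z):
    """A common activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node(inputs, weights, bias):
    """Weighted sum of inputs from other nodes, passed through the activation."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Outputs of upstream nodes: fur-colour, tail-length, dental-profile scores.
features = [0.8, 0.3, 0.6]
# Weights: the 'importance' the cat-vs-dog node gives each input.
weights = [2.0, -1.5, 1.0]

output = node(features, weights, bias=-0.5)
print("cat" if output > 0.5 else "dog")
```

Training, in this picture, is nothing more than adjusting the numbers in `weights` (and `bias`) until the node’s outputs line up with the desired answers.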
While nodes are computing units, the ANN itself is not a physical entity. It is mathematical. A node is the ‘site’ of a mathematical function. Put differently, the ANN is like an algorithm that passes information from one activation function to the next in a specific order.
Nvidia’s dominance
A graphics processing unit (GPU) is the physical processor that ‘runs’ the ANN. It was originally developed to render graphics for video games. It was better at this task than other processors at the time because it could run computing tasks in parallel. It has since been widely adopted as the basic computing unit for ANNs for the same reason.
The company Nvidia has emerged as a technology giant because of its production of GPUs. Its valuation was the fastest in history to go from $1 trillion to $2 trillion (in nine months). Every other company that has been building large AI models is using Nvidia’s GPU-based chips to do so. In a 2023 analysis, financial services provider Seeking Alpha wrote that Nvidia’s overwhelming market share has stoked “resistance” in three ways: competitors are trying to develop non-GPU hardware; researchers are building smaller ANNs that require fewer resources; and developers are building new software to sidestep dependency on specific hardware.
With inputs from Viraj Kulkarni.