APC Australia

How image recognition works

Machine-learning can learn almost anything – even to recognise images. Darren Yates explains how using the Python programming language and Scikit-Learn library.

Image recognition is one of the hottest areas right now, not just in ‘artificial intelligence’ circles, but in tech in general. It’s used in everything from detecting postal addresses on mail to faces in the street and weeds in the paddock. But how does it work? How does AI not only distinguish between different images, but recognise what an image contains?

PATTERNS AND ALGORITHMS

This will sound dopey, but go with me for a sec. Take a look at your family photos and you’ll recognise your family members. Why? Because you’ve seen them before – your brain has been ‘trained’ to recognise those people by associating different images with different people.

Machine-learning is no different. Most machine-learning algorithms can’t do much on their own – they need to be trained with examples of the things you want them to learn. In the process of that training, they produce a recipe or ‘model’ that captures what they have learned. The fuel for machine-learning is data and invariably this data looks like a standard spreadsheet. Across the top, you have columns where each column represents a particular feature of the objects you want the algorithm to learn. For example, say we’re trying to learn different vehicle types from their features or ‘attributes’ – those attributes would include engine size, number of doors, number of wheels and so on. One of those columns is the category or ‘class’ the object belongs to, for example, truck, car, motorcycle etc. Each row of the spreadsheet represents one complete example of an object. To be able to recognise or ‘classify’ different objects, a machine-learning algorithm is essentially looking for patterns in the data.
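To make the spreadsheet idea concrete, here's a minimal sketch in plain Python. The vehicle figures and the column names (`engine_cc`, `doors`, `wheels`, `vehicle_class`) are invented purely for illustration and don't come from any real dataset.

```python
# Each dictionary is one spreadsheet row: one complete example of an object.
# All values below are invented for illustration only.
vehicles = [
    {"engine_cc": 1600, "doors": 4, "wheels": 4, "vehicle_class": "car"},
    {"engine_cc": 650, "doors": 0, "wheels": 2, "vehicle_class": "motorcycle"},
    {"engine_cc": 12000, "doors": 2, "wheels": 18, "vehicle_class": "truck"},
    {"engine_cc": 2000, "doors": 2, "wheels": 4, "vehicle_class": "car"},
]

feature_names = ["engine_cc", "doors", "wheels"]

# Split each row into its attributes (what the algorithm sees)
# and its class (the answer it is trying to learn).
X = [[row[name] for name in feature_names] for row in vehicles]
y = [row["vehicle_class"] for row in vehicles]

for features, label in zip(X, y):
    print(features, "->", label)
```

Keeping the attributes (`X`) apart from the class (`y`) is the shape most machine-learning libraries expect, which is why the sketch splits them up front.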

LOTS OF PIXELS

Any digital image is a series of pixels. If you have a basic 320x240-pixel image, you have a series of dots or ‘pixels’ in a 320 column-by-240 row grid, where each dot is a particular colour. If it’s a standard digital photo, it’ll likely be a 24-bit image, meaning that each pixel has three separate one-byte values (256 levels each, from 0 to 255) of red, green and blue. The three bytes are typically combined into one 24-bit number representing one of 16.7 million possible colours.
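The byte-combining step is easy to try for yourself. This is only a sketch: the function names `pack_rgb` and `unpack_rgb` are made up for the example, and putting red in the top byte is one common ordering rather than the only one.

```python
def pack_rgb(red, green, blue):
    """Combine three 0-255 channel values into a single 24-bit number."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Assumed layout: red in the top byte, green in the middle, blue last.
    return (red << 16) | (green << 8) | blue


def unpack_rgb(colour):
    """Split a 24-bit number back into its (red, green, blue) bytes."""
    return (colour >> 16) & 0xFF, (colour >> 8) & 0xFF, colour & 0xFF


orange = pack_rgb(255, 165, 0)
print(orange)              # 16753920
print(unpack_rgb(orange))  # (255, 165, 0)
print(2 ** 24)             # 16777216, the '16.7 million' colours
```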

If we want a machine-learning algorithm to learn to recognise a series of 320x240-pixel images, we need to create a ‘record’ (spreadsheet row) for each image such that all of those pixels are on a single row. In this case, we have to take all 240 rows of pixels and place them side-by-side, so that instead of a 320x240-cell spreadsheet, we create one row with 76,800 columns, plus one extra that is the ‘class’ column or ‘attribute’ that says what this image is.
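Here's a rough sketch of that flattening step in plain Python. The all-zero image and the class label `"cat"` are stand-ins invented for the example.

```python
WIDTH, HEIGHT = 320, 240

# Stand-in image: a list of 240 rows, each holding 320 pixel values.
# Every pixel is 0 here; a real photo would hold one colour number per pixel.
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Place the 240 rows side by side to make one long row.
flat_pixels = [pixel for row in image for pixel in row]

# Append the class label as the final column ("cat" is a made-up label).
record = flat_pixels + ["cat"]

print(len(flat_pixels))  # 76800
print(len(record))       # 76801
```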

So, there are two things happening – the algorithm has to learn the patterns within those 76,800 columns that distinguish images, and it has to associate those patterns with different class values.

HAND-WRITTEN DIGITS

A really simple example of this is available in the Python Scikit-Learn library, called ‘digits’. The copy bundled with Scikit-Learn is a series of 1,797 images of hand-written digits 0 to 9, drawn from a larger collection of 5,620. Each image is 8x8 pixels, so pretty tiny, but in order to be useable in machine-learning this ‘dataset’ exists as a series of 1,797 rows, each with 64 columns, plus one column or ‘class’ that labels the row with the digit it is supposed to be. Once an algorithm has created a model from these images, that model can be used to identify or ‘classify’ similar images that have not been previously seen.
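If you'd like to poke at the dataset directly, the sketch below loads it with Scikit-Learn's `load_digits` function and prints its dimensions. The copy that ships with Scikit-Learn holds 1,797 images, and it needs the library installed before it will run.

```python
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.images.shape)  # (1797, 8, 8): the 8x8 pixel grids
print(digits.data.shape)    # (1797, 64): the same images, one row each
print(digits.target.shape)  # (1797,): the class column

# Flattening an 8x8 grid by hand gives the matching 64-column row.
first_row = digits.images[0].reshape(-1)
print((first_row == digits.data[0]).all())  # True
```

The library keeps the class column (`target`) separate from the 64 pixel columns (`data`) rather than gluing it on the end, but the idea is the same.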

This is an example of what’s called ‘supervised’ learning, because the algorithm is essentially given the answers in the data. Think back to when you learned maths at school – you were given examples to do and those examples had answers. In machine-learning, this is the ‘training’ phase. Once you learned how to do a particular maths task, you had to prove you knew how to do it by sitting an exam and providing your own answers. Not surprisingly, this is the ‘testing’ phase in machine-learning too.
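The two phases can be sketched in a few lines with Scikit-Learn. This version assumes a simple half-and-half split of the digits data and a support vector classifier with `gamma=0.001`; both choices are assumptions made for the illustration and other splits and settings would work too.

```python
from sklearn import svm
from sklearn.datasets import load_digits

digits = load_digits()
half = len(digits.data) // 2

# Training phase: the first half of the rows, answers included.
X_train, y_train = digits.data[:half], digits.target[:half]

# Testing phase: the second half, with the answers held back for marking.
X_test, y_test = digits.data[half:], digits.target[half:]

# gamma=0.001 is an assumed setting for this sketch.
model = svm.SVC(gamma=0.001)
model.fit(X_train, y_train)

# The model sits its 'exam' on images it has never seen.
predictions = model.predict(X_test)
correct = int((predictions == y_test).sum())
print(len(X_test), "test images,", correct, "classified correctly")
```

The important part is that the model never sees the test rows while it is learning, just as you wouldn't get the exam paper in advance.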

GET THE CODE

Grab a copy of Python for your PC from python.org/downloads and install it. Launch a command prompt in the following folder:

\users\<username>\appdata\local\programs\python\python37\scripts

replacing <username> with your username.

In the command prompt, type the following:

pip install -U scikit-learn
pip install -U matplotlib

and hit the Enter key after each one.
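A quick way to check that both installs worked is to ask Python whether it can find the libraries. This is just a sketch using the standard library's `importlib`; the helper name `is_installed` is made up for the example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "sklearn" is the import name of the scikit-learn package.
for name in ("sklearn", "matplotlib"):
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

If either one comes back as missing, re-run the matching pip command and look for error messages.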

Once you’ve done that, head over to the scikit-learn website and the ‘Recognizing hand-written digits’ page (tinyurl.com/y5tchmgw). Scroll down to the bottom of the page and click on the ‘Download Python source code… py’ button. When the download is complete, open up the IDLE integrated development environment, select File, Open and load up the file you just downloaded. When you’re ready, press the F5 key or select ‘Run’, ‘Run module’ from the menu.

After a few seconds, you’ll get output in the Python Shell window, plus a second window labelled ‘Figure 1’ with some admittedly dire-looking handwritten digits.

WHAT’S IT ALL MEAN?

Looking at ‘Figure 1’ first, the top row shows four of the ‘training’ images – these are 8x8-pixel images with a ‘class’ value identifying the number the image is supposed to show. Underneath are examples of the ‘testing’ images – these show an image, plus the predicted ‘class’ value the learned model thinks the image represents. Or, in other words, the number the model thinks the image looks like.
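If you'd rather see one of these images without opening a figure window, the sketch below prints the first digit as rough text art alongside its class value. The `render` helper and the `SHADES` character ramp are invented for this example; the pixel values in this dataset run from 0 to 16.

```python
from sklearn.datasets import load_digits

SHADES = " .:*#"  # blank for no ink, '#' for the heaviest ink


def render(image):
    """Turn an 8x8 grid of 0-16 pixel values into eight lines of text."""
    top = len(SHADES) - 1
    lines = []
    for row in image:
        # Scale each 0-16 value to an index into SHADES (0 to 4).
        lines.append("".join(SHADES[int(value) * top // 16] for value in row))
    return "\n".join(lines)


digits = load_digits()
print("Class value:", digits.target[0])
print(render(digits.images[0]))
```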

Now looking back at the Python Shell output, what’s produced here is, first of all, an accuracy-by-class report. It shows how accurate the model is at recognising the images for each possible value, 0 through to 9. You can see down the ‘precision’ column accuracy was at worst 0.93 (93%) for digit ‘8’ and at best 1.0 (100%) for digit ‘0’. The overall or ‘weighted’ average is 0.97 (97%). The ‘support’ column is the number of test images that really are that particular digit, with a total of 899 images used for testing.
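To see where numbers like these come from, here's a small worked sketch in plain Python. Precision for a digit is the share of images the model labelled as that digit that really were that digit, and support is how many test images truly belong to it. The ten labels below are invented for illustration and are not the article's results.

```python
from collections import Counter

# Invented labels: what each test image really is, and what a model predicted.
actual = [0, 0, 1, 1, 1, 8, 8, 8, 8, 3]
predicted = [0, 0, 1, 1, 8, 8, 8, 8, 3, 3]

support = Counter(actual)              # images that really are each digit
predicted_counts = Counter(predicted)  # times each digit was predicted
hits = Counter(a for a, p in zip(actual, predicted) if a == p)

precision = {}
for digit in sorted(support):
    # Guard against a digit the model never predicted.
    if predicted_counts[digit]:
        precision[digit] = hits[digit] / predicted_counts[digit]
    else:
        precision[digit] = 0.0
    print(digit, "precision", precision[digit], "support", support[digit])

# The 'weighted' average weights each digit's precision by its support.
weighted = sum(precision[d] * support[d] for d in support) / len(actual)
print("weighted average", round(weighted, 2))  # 0.85
```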

Underneath is what’s called a ‘confusion matrix’, not because it causes confusion, but as a way of understanding the difference between what an image really is (going down the rows) and what the model thinks or ‘predicts’ it is (across columns). Think of it as the difference between the correct answers in an exam and the answers you give. Ideally, you should have numbers only going down the centre-diagonal and the rest should be ‘0’. This would be 100% accuracy.
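Building a confusion matrix by hand shows there's no magic in it. The sketch below uses plain Python and ten invented labels across three classes; the `confusion_matrix` function here is written for the example rather than taken from a library.

```python
def confusion_matrix(actual, predicted, classes):
    """Count outcomes: rows are the true class, columns the predicted class."""
    index = {label: position for position, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


# Invented labels for illustration only.
actual = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 2, 2, 2, 2, 2, 0]

matrix = confusion_matrix(actual, predicted, classes=[0, 1, 2])
for row in matrix:
    print(row)
# [2, 0, 0]
# [0, 2, 1]
# [1, 0, 4]

# Everything on the centre diagonal is a correct answer.
correct = sum(matrix[i][i] for i in range(len(matrix)))
print(correct / len(actual))  # 0.8
```

The two numbers off the diagonal are the mistakes: one ‘1’ was mistaken for a ‘2’, and one ‘2’ was mistaken for a ‘0’.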

TRY IT YOURSELF

The algorithm used in this example is called a ‘support vector machine’ (SVM), but you could use a decision tree or a multi-tree algorithm such as ‘RandomForest’ to give similar results. Image recognition has applications in many areas, so it’s well worth learning even the basics of how it works.
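Swapping algorithms takes only a line or two in Scikit-Learn, since its classifiers share the same `fit` and `score` methods. The sketch below assumes a half-and-half split of the digits data and untuned settings chosen just for illustration, so the exact scores you get may differ with other settings or library versions.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
half = len(digits.data) // 2
X_train, y_train = digits.data[:half], digits.target[:half]
X_test, y_test = digits.data[half:], digits.target[half:]

# Settings here are assumptions for the sketch, not tuned values.
models = {
    "SVM": SVC(gamma=0.001),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the share of test images classified correctly.
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

Fixing `random_state` keeps the tree-based results repeatable from one run to the next.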

Accuracy-by-class and confusion tables tell you how accurate the model is.
The digit recognition example code loads into the Python IDLE editor.
The Scikit-Learn Python library also offers a basic image-recognition demo.
Use Python’s PIP command to install Scikit-Learn and matplotlib libraries.
Examples of training images and the predictions made by the model.
