Learning Python #5: Making a list
You can make lots of variables, but eventually, you need to get systematic about it. Darren Yates continues his Python series with a deep dive into lists.
So far in our Python series, every time we’ve created a variable for something, we’ve had to — to use an analogy — find a new ‘bucket’, give it a new name and then dump in whatever value we wanted stored. Do it often enough and not only does it get annoying, it can also become very confusing when you need lots of variables and have to keep thinking up new names — names like ‘newVar23’ aren’t particularly useful.
That’s where lists can work — not only do they make handling multiple related data values much easier, they can also perform all sorts of clever manipulation trickery. This month, we’re heading off in search of all things ‘lists’.
HOW LISTS WORK
Rather than having a new bucket for every variable you require, you can think of lists as customisable ice cube trays, where a variable actually contains multiple addressable or ‘indexed’ variables similar to the individual compartments of an ice cube tray. You can also add as many new ‘tray cubes’ as needed. In Python, you begin a list with a blank cube tray and fill cubes with values, as many as needed, each separated by a comma:
colours = ['red', 'green', 'blue', 'black', 'white']
Just as with a written list, each entity in that list is known as an ‘item’ and you can access items using their index, which indicates the position of each item in the list. A list’s index range always starts from zero and goes up to one less than the total number of items. In the above case, ‘colours[0]’ gives us ‘red’, ‘colours[3]’ returns ‘black’.
You can also alter the contents of a list by using the index. For example, to change ‘blue’ to ‘grey’, use the statement:
colours[2] = 'grey'
This overwrites the ‘colours’ list at index 2 with the string ‘grey’. If we now print ‘colours[2]’, we get ‘grey’.
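As a quick sketch of indexing in action, the snippet below reads a few items from the ‘colours’ list and then overwrites one. The variable names ‘first’, ‘fourth’ and ‘last’ are ours, added purely for illustration.

```python
colours = ['red', 'green', 'blue', 'black', 'white']

first = colours[0]                 # index 0 is the first item
fourth = colours[3]                # index 3 is the fourth item
last = colours[len(colours) - 1]   # the last index is one less than the item count

colours[2] = 'grey'                # overwrite 'blue' at index 2
print(first, fourth, last)
print(colours)
```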
Lists can also store numbers — integer and floating-point. You just drop the quotation marks around each item, for example:
batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]
Using lists with numbers is really cool, as you now have all sorts of clever mathematical gymnastics functions at your disposal. The above is a list of cricket batting scores. Now the first thing you’d probably want to work out is the player’s batting average. Mathematically, we do this by adding up all of the scores and dividing the total by the number of scores. In Python, we can add up the scores in the list by using ‘iteration’, or looping through each item in the list. Here’s one way to do it:
total_runs = 0
for score in batting_scores:
    total_runs += score
batting_average = total_runs / len(batting_scores)
print('Batting average:', batting_average)
The second line of this code might seem a bit confusing at first, so here’s what it does. The ‘for’ indicates the start of a for-loop, which we covered a couple of months ago. The ‘score’ creates a new variable called ‘score’. The ‘in’ keyword indicates we want each value from the variable following (in this case, the ‘batting_scores’ list) to be stored in the ‘score’ variable, one at a time. Another way to write this up in pseudo-code is ‘for each score in the batting_scores list’.
We now run through or ‘iterate’ over the ‘batting_scores’ list, with each item in turn automatically placed into the variable ‘score’ and added to the variable ‘total_runs’. Once we’ve gone through the list, we then take the value in ‘total_runs’ and divide it by the length of the ‘batting_scores’ list. By ‘length’, we mean the number of items in the list. As we mentioned before, a list’s index range extends from zero to one less than the total number of items, but the ‘len()’ function always returns the actual number of items, which in this case is 10. The value of ‘total_runs’ divided by the number of items is loaded into the variable ‘batting_average’, which we print using the print() function. In this case, the player has a batting average of 49.5, which is excellent.
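For comparison, here’s a shorter sketch of the same calculation using Python’s built-in sum() function, which isn’t used in the loop version above. It adds up every item in a list of numbers in a single step.

```python
batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]

# sum() adds up every item, replacing the for-loop
total_runs = sum(batting_scores)
batting_average = total_runs / len(batting_scores)

print('Batting average:', batting_average)
```

The for-loop is still worth knowing, since it works for any per-item job, not just adding up.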
LIST FUNCTIONS
If you have a BASIC programming background, you might think lists sound similar to arrays. However, lists are far more versatile. Let’s say in that ‘batting_scores’ list, we wanted to find out the batter’s highest score. We can do that programmatically by using the max() function.
print('Highest score :', max(batting_scores))
We can also find the lowest score by using the min() function.
print('Lowest score :', min(batting_scores))
We can also remove scores, for example, removing all zero scores, using the list’s remove() method. However, ‘remove(0)’ removes only the first zero score it finds. If you know the index of a zero score, the ‘del batting_scores[index]’ command also works, but you can’t always guarantee knowing the index. One simple way of removing all zero scores is to use a try-except/while-loop combination like this:
try:
    while(1):
        batting_scores.remove(0)
except ValueError:
    print(batting_scores)
Using ‘while(1)’ is a common way of initialising a continuous loop, since ‘1’ is always ‘true’, and while that loop runs, we keep removing the first zero-score found in the batting_scores list. The ‘try-except’ method provides our way out of the loop — as soon as there are no longer any zero scores, Python throws up a ‘ValueError’ error, which we catch with the ‘except’ clause. This now becomes the start point for any code following.
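If catching an error to escape a loop feels odd, one alternative sketch is to test whether a zero is still in the list before each removal. The scores below are made up for illustration, with several zeros, so you can see every one of them disappear.

```python
# Hypothetical scores with several ducks, for illustration only
batting_scores = [34, 0, 39, 0, 99, 0]

# '0 in batting_scores' is True while at least one zero remains
while 0 in batting_scores:
    batting_scores.remove(0)   # removes the first zero found

print(batting_scores)
```

This version never raises a ValueError, because remove(0) is only called when a zero is definitely there.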
LIST COMPREHENSION
A more efficient but more complex method to remove all zero scores is to use ‘list comprehension’:
batting_scores = [score for score in batting_scores if score != 0]
Putting this another way with pseudo-code, ‘the batting_scores list contains a score for each score in the batting_scores list if that score does not equal zero’. List comprehension can be incredibly useful, but we won’t pursue it further for now.
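To see what the comprehension is doing, this sketch builds the same zero-free list the long way with a for-loop, then repeats it as a comprehension. The last line changes the condition to pick out scores of 50 or more. The names ‘non_zero’, ‘same_thing’ and ‘fifties’ are ours.

```python
batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]

# The long way: start with an empty list and append matching scores
non_zero = []
for score in batting_scores:
    if score != 0:
        non_zero.append(score)

# The same result as a one-line list comprehension
same_thing = [score for score in batting_scores if score != 0]

# Change the condition to filter on something else
fifties = [score for score in batting_scores if score >= 50]

print(non_zero)
print(fifties)
```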
LIST MATHEMATICAL FUNCTIONS
The thing with averages is they don’t always tell you the whole story of data — in fact, in this case, a batsman can get one high score to cover over a multitude of poor scores and still end up with a decent batting average. Another option is to look at the median or middle score, which might tell us something else.
To do this programmatically, we need two steps — first, we sort the list into ascending order, then we grab the middle score. If we have an odd number of list items, we can take that middle score straight away, but if we have an even number of list items, we must average the two middle ones. Here’s one way to do this:
batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]
batting_scores.sort()
print(batting_scores)
midIndex = int(len(batting_scores)/2)
if len(batting_scores)%2 != 0:
    median_score = batting_scores[midIndex]
else:
    median_score = (batting_scores[midIndex] + batting_scores[midIndex-1]) / 2
print('Median score :', median_score)
After sorting the batting_scores list, we print the sorted list to the screen. Next, we find the middle index, which we do by dividing the length of the batting_scores list by two and using the ‘int()’ function, storing the result in the variable ‘midIndex’. After that, we find out if the number of list items is odd or even, which we can do quickly using the modulus (%) operator. If dividing the length of the batting_scores list by two doesn’t return a remainder of zero, it means we have an odd number of items and we just take the middle index. If we get an even result, we get the middle index plus the one before it and average the two, storing the result each time in ‘median_score’.
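The same steps can be wrapped up in a reusable function, sketched below. The function name ‘find_median’ is ours, and we’ve made two small substitutions: sorted() returns a sorted copy so the original list is left alone, and the ‘//’ integer-division operator stands in for int().

```python
def find_median(scores):
    """Return the middle value of a list of numbers."""
    ordered = sorted(scores)           # sorted copy, original untouched
    mid_index = len(ordered) // 2      # integer division, no int() needed
    if len(ordered) % 2 != 0:
        return ordered[mid_index]      # odd count: take the middle item
    # even count: average the two middle items
    return (ordered[mid_index] + ordered[mid_index - 1]) / 2


batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]
print('Median score :', find_median(batting_scores))
print('Median of three :', find_median([34, 39, 0]))
```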
OKAY, NOW THE EASY WAY
So now that we can calculate the average (or ‘mean’) and median manually, here’s how you can do it much more easily using Python’s statistics library. You use the ‘import’ statement and then call the statistics.median() and statistics.mean() functions, respectively.
import statistics
batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]
print('Median score :', statistics.median(batting_scores))
print('Average score :', statistics.mean(batting_scores))
What’s important from this is that the batter’s median is only 28.5, meaning half of the batter’s scores are 28.5 or less, showing a possible weakness early on in an innings.
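To see how much one big innings props up the average, this sketch drops the highest score and recalculates both figures. The names ‘best’ and ‘without_best’ are ours, and this is only one rough way to make the comparison.

```python
import statistics

batting_scores = [34, 39, 0, 99, 180, 15, 87, 23, 16, 2]

# Build a second list that leaves out the single highest score
best = max(batting_scores)
without_best = [score for score in batting_scores if score != best]

print('Mean with best score    :', statistics.mean(batting_scores))
print('Mean without best score :', statistics.mean(without_best))
print('Median with best score    :', statistics.median(batting_scores))
print('Median without best score :', statistics.median(without_best))
```

Removing the 180 pulls the mean down from 49.5 to 35, while the median only moves from 28.5 to 23, which is why the median is less easily swayed by one standout score.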
MULTI-DIMENSIONAL LISTS
So far, the lists we’ve been dealing with are one-dimensional — in mathematics, they’re known as ‘vectors’. But you can also expand lists into multi-dimensional entities called ‘matrices’. In Python, you can think of them as ‘lists of lists’. For example, we can create a list of pizzas and their ingredients:
pizzas = [
    ['Margherita', 'tomato base', 'oregano', 'cheese', 'garlic'],
    ['Italian Classic', 'tomato base', 'oregano', 'anchovies', 'kalamata olives'],
    ['Vegetarian', 'tomato base', 'onion', 'capsicum', 'mushroom', 'kalamata olives', 'pineapple', 'semi-dried tomatoes']
]
Each row list represents a different pizza, with the first element the pizza’s name and the following elements in each row that pizza’s ingredients. To access the ‘Vegetarian’ pizza, for example, we’d use ‘pizzas[2]’ and to get the name of that pizza, ‘pizzas[2][0]’. Lists within multi-dimensional lists don’t have to be the same length, so to identify the length of an internal list, you can use the ‘len()’ function. For example, to find the length of the ‘Italian Classic’ pizza list, you’d use ‘len(pizzas[1])’.
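Here’s a short sketch that puts those index and len() calls together, printing each pizza’s name and how many ingredients it has. Since the first item in each row is the name, we subtract one from the row length.

```python
pizzas = [
    ['Margherita', 'tomato base', 'oregano', 'cheese', 'garlic'],
    ['Italian Classic', 'tomato base', 'oregano', 'anchovies', 'kalamata olives'],
    ['Vegetarian', 'tomato base', 'onion', 'capsicum', 'mushroom',
     'kalamata olives', 'pineapple', 'semi-dried tomatoes'],
]

print(pizzas[2][0])        # the name of the third pizza
print(len(pizzas[1]))      # items in the 'Italian Classic' row

# Rows can be different lengths, so ask each one for its own length
for pizza in pizzas:
    print(pizza[0], 'has', len(pizza) - 1, 'ingredients')
```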
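To list a pizza’s ingredients, you can loop through from element [1] to the last element, [len(pizzas[row]) - 1], using the range function. Because range() stops one short of its second argument, you pass it len(pizzas[row]) like this: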
for ingredients in range(1, len(pizzas[pizzatype])):
    print(' * ', pizzas[pizzatype][ingredients])
Here, ‘pizzatype’ is the list row number for the pizza you’re interested in. You can see this work in more detail inside the ‘pizzas.py’ app in this month’s source code pack.
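Another way to get at the ingredients, sketched here, is with a ‘slice’, a Python feature we haven’t covered in this series. The slice ‘[1:]’ means ‘everything from index 1 to the end’, so it skips the pizza’s name without needing range() or len(). This is our own variation, not the code in ‘pizzas.py’.

```python
pizzas = [
    ['Margherita', 'tomato base', 'oregano', 'cheese', 'garlic'],
    ['Italian Classic', 'tomato base', 'oregano', 'anchovies', 'kalamata olives'],
]

pizzatype = 1                              # row number of the pizza we want
ingredient_list = pizzas[pizzatype][1:]    # everything after the name

print(pizzas[pizzatype][0])
for ingredient in ingredient_list:
    print(' * ', ingredient)
```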
MAKE AN ENIGMA MACHINE
Last month, we made a simple cipher tool for encoding messages using the trick of swapping characters based on ASCII codes — A becomes Z, B becomes Y and so on. To finish off multidimensional lists this month, we’ve cranked things up a gear and created a simplified version of the Enigma Cipher Machine used by the German armed forces during World War II. We haven’t implemented the full machine, but our version still uses the genuine wiring code from the first three rotors I, II and III (in that order), plus the ‘B’ reflector rotor used by the German army or ‘Wehrmacht’ machines. It also supports adjustable rotor start positioning.
If you code a message with a genuine Enigma machine using rotors I, II, III with no rotor ring offset or plugboard, you can decode it with our ‘EnigmaLite’ Python code.
Electrically, Enigma is little more than wires, switches and lightbulbs. But those electromechanical rotor wheels change position on each key press, so pressing the same key repeatedly produces a changing sequence of encoded characters rather than the same one every time.
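To give a feel for how rotor stepping changes the output, here’s a minimal sketch of a three-rotor machine. It is not the EnigmaLite source from the code pack. The function name ‘enigma_sketch’ is ours, and we’re assuming rotors I, II and III from left to right, the ‘B’ reflector, no ring offset, no plugboard, and the usual ‘double-step’ behaviour of the middle rotor. The rotor data is stored as a list of lists, one row per rotor.

```python
import string

ALPHABET = string.ascii_uppercase

# Each row is [wiring, turnover letter] for one rotor, left to right
ROTORS = [
    ['EKMFLGDQVZNTOWYHXUSPAIBRCJ', 'Q'],   # rotor I (left, slowest)
    ['AJDKSIRUXBLHWTMCQGZNPYFVOE', 'E'],   # rotor II (middle)
    ['BDFHJLCPRTXVZNYEIWGAKMUSQO', 'V'],   # rotor III (right, fastest)
]
REFLECTOR_B = 'YRUHQSLDPXNGOKMIEBFZCWVJAT'


def enigma_sketch(start, text):
    """Encode (or decode) text from three-letter rotor start positions."""
    pos = [ALPHABET.index(letter) for letter in start.upper()]
    output = []
    for letter in text.upper():
        if letter not in ALPHABET:
            output.append(letter)       # pass spaces through unchanged
            continue
        # The rotors step before each letter is encoded
        if ALPHABET[pos[1]] == ROTORS[1][1]:
            pos[0] = (pos[0] + 1) % 26  # middle rotor at its turnover:
            pos[1] = (pos[1] + 1) % 26  # it and the left rotor both move
        elif ALPHABET[pos[2]] == ROTORS[2][1]:
            pos[1] = (pos[1] + 1) % 26  # right rotor at its turnover
        pos[2] = (pos[2] + 1) % 26      # right rotor moves on every key press
        signal = ALPHABET.index(letter)
        for row in (2, 1, 0):           # right to left through the rotors
            wiring = ROTORS[row][0]
            signal = (ALPHABET.index(wiring[(signal + pos[row]) % 26]) - pos[row]) % 26
        signal = ALPHABET.index(REFLECTOR_B[signal])
        for row in (0, 1, 2):           # back again, left to right
            wiring = ROTORS[row][0]
            signal = (wiring.index(ALPHABET[(signal + pos[row]) % 26]) - pos[row]) % 26
        output.append(ALPHABET[signal])
    return ''.join(output)


print(enigma_sketch('AAA', 'AAAAA'))
```

With the rotors starting at ‘AAA’, pressing ‘A’ five times gives ‘BDZGO’, a different letter on each press. Feeding the output back in with the same start positions returns the original text.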
HOW ENIGMALITE WORKS
To begin, you first type in the start positions of the three rotors, for example, ‘APC’. Next, you type in your plain text message, press enter and you’ll get the coded text. Reset the same rotor start positions, type in the coded text and you’ll see the original plain text. Pressing the Enter key at the start quits the app. Have a go at cracking this message:
APC MZABIM CU Y PTHQS XBPNLMMY HPXKQUCH QF EUPKI
The first three letters are the rotor start positions, the rest is the coded text. One trick the Germans used during the war to obfuscate message settings was to implement an encoded random message key. You start with three random letters, say ‘APC’ and choose a random message key, such as ‘MAG’. Set the rotor positions to ‘APC’, encode ‘MAG’ and you get ‘PWE’. You now set the rotor position to ‘MAG’ and encode the message. However, what you send is your initial ‘APC’ letters and the ‘PWE’ encoding, followed by the encoded message using the ‘MAG’ settings.
At the other end, the operator sets the initial ‘APC’ settings, encodes ‘PWE’ to get back ‘MAG’ and then uses ‘MAG’ as the rotor start settings to decode the message. To further confuse hackers, operators broke a message into five-letter groups. Give this a go:
APC PWE MNEOA WASHU ZDDQJ QNNTA TZWDF ADVHY
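Breaking text into five-letter groups is a handy little exercise in list comprehension. The sketch below is one possible way to do it. The function name ‘five_letter_groups’ and the sample text are ours, for illustration only.

```python
def five_letter_groups(text):
    """Strip the spaces from text and re-space it in blocks of five."""
    letters = text.replace(' ', '')
    # Take five characters at a time, starting at 0, 5, 10 and so on
    groups = [letters[i:i + 5] for i in range(0, len(letters), 5)]
    return ' '.join(groups)


# Hypothetical sample text, not a real coded message
print(five_letter_groups('HELLO WORLD AGAIN AND AGAIN'))
```

If the letter count isn’t a multiple of five, the last group simply comes out shorter.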
Enigma coding today will stop most uninterested people in their tracks, but it wouldn’t take a cryptanalyst at the Australian Signals Directorate more than a cuppa to crack. By today’s standards, Enigma is ‘Cryptography 101’ and although it has its flaws (exploited by British mathematician Alan Turing during the war to crack Enigma messages), it’s still a fun exercise.
Lists are an important tool to have in your coding toolbox — they make sorting and performing mathemagical functions much easier. We’ve barely scratched the surface of what they can do, but now we have them, we can bring out lists whenever we like!