APC Australia

The power of strings and dictionaries

Learn how to use Python’s powerful functions for manipulating and storing strings. Darren Yates explains.


It’s a common fact that, for various reasons, not everyone loves numbers (yes, strange, I know) but numbers on their own aren’t nearly as interesting as the functions you can perform with them. For example, we all learned to add, subtract, multiply and divide numbers in school, but functions with letters or characters? We write them and spell them, but that’s about it. Yet when it comes to coding, letters aren’t just for displaying messages on the screen: there’s a swag of functions you can perform with these characters, especially when joined together to form ‘strings’. This month, we’ll look at some powerful Python string-manipulation functions you should have in your coding toolbox, plus delve into the advanced data structure of dictionaries.

WRITE A LETTER

Let’s start with the obvious: collections of letters grouped together in a meaningful fashion are called ‘words’, and words grouped likewise we call ‘sentences’. Yes, welcome to English 101. But in coding, collections of alphanumeric characters are also called ‘strings’. What’s more, you can access each individual character in a string by its position in that string. For example:

title = 'APC magazine'

...describes the characters stored as a string in a variable called ‘title’. Among the many things we can do with this is find the number of characters in, or ‘length’ of, the string. Run the code in the IDLE Shell:

print (len(title))

...and you’ll get the answer ‘12’. You can also address the characters of a string individually:

print (title[0])

...returns ‘A’. The method is similar to accessing elements of an array — you use the square brackets to indicate the character position you’re after. Just remember, string indexing starts at zero.
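As a quick sketch of the two ideas so far, here is the same ‘title’ string with a couple of extra lookups. The index 4 and the negative index are our own additions for illustration; they aren’t part of the original example.

```python
title = 'APC magazine'

print(len(title))    # number of characters, counting the space
print(title[0])      # first character (index 0)
print(title[4])      # fifth character (index 4)
print(title[-1])     # negative indexes count back from the end
```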

COUNTING LETTERS

Grab this month’s source code pack from our website (apcmag.com/magstuff), launch the IDLE editor (part of Python 3.6 from www.python.org) and open ‘countletters.py’. This code counts up the frequency of letters in a string. We start by assigning the string-of-interest to the variable ‘newString’ and use the string method ‘lower()’ to return the lowercase version and store it back in the ‘newString’ variable. Next, we create another string variable, this time with all the lowercase letters of the alphabet, and store it in the variable ‘alphabet’. That’s followed by what may appear to be an odd-looking line, but it’s really just a quick way of initialising a list called ‘charCount’ with 26 cells, all having the value ‘0’, with the ‘*’ character acting as a repetition operator. We’ll use this shortly to store the count of each letter in the ‘newString’ variable.
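The set-up steps just described might look something like the sketch below. The sample text is made up for illustration and the exact lines in the source code pack may differ; only the variable names come from the description above.

```python
# Hypothetical sample text; the string in the source code pack may differ.
newString = 'The quick brown fox'
newString = newString.lower()    # lower() returns a new, lowercase string

alphabet = 'abcdefghijklmnopqrstuvwxyz'
charCount = [0] * 26             # 26 counters, one per letter, all zero

print(newString)
print(len(charCount), charCount[:5])
```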

ITERATING OVER A STRING

For-loops are fantastic things, and they’re brilliant for looping through strings. Commonly, for-loops are used to loop or ‘iterate’ over a numeric range. However, here, we’re iterating over the characters themselves. We start by coding the line:

for letter in newString:

This line creates (or ‘instantiates’) a new variable called ‘letter’, so we’re creating it and using it at the same time. Because ‘newString’ is a string, Python knows that we want it to fetch each character in that variable in turn and put it in the variable ‘letter’. You could also read it as ‘for each letter in the newString variable’.

That’s followed by what we want Python to do with each character. Here, we use an if-clause and the ‘in’ operator again, this time to see if the character is found in the ‘alphabet’ string (you can use the ‘in’ operator with if-clauses and for-loops). If the character is found in the alphabet variable, we then find its ASCII value using the ‘ord()’ function, subtract 97 from it, make the result the index for the charCount list and increment the value stored at that index.

Why? The ASCII (American Standard Code for Information Interchange) code assigns a numeric code to a set of standard characters. Standard ASCII defines 128 codes (0 to 127), including codes for upper- and lowercase letters, digits and special characters. For example, ‘a’ is 97, ‘b’ is 98 and so on. Python’s ord() function returns the Unicode code point of a character, which for these characters is the same as the ASCII code. By subtracting 97 from the ord() result, we’re re-aligning the value so that it matches one of the 26 positions (0 to 25) in the charCount list.

Here’s an example: the first letter in the example ‘newString’ is ‘T’, which becomes ‘t’ after the lower() function. As we go through the initial for-loop for the first time, the if-clause checks to see if ‘t’ is in the ‘alphabet’ variable string, which it is. The function ord('t') returns the value 116, we then subtract 97 from it and get 19. This becomes the index into the charCount list and we add one to the value at index 19 (the 20th element, since indexes start at zero), keeping score of the number of ‘t’s found in ‘newString’.

As another example, say the first letter was ‘a’. In that case, ord('a') would return 97. Subtract 97 from 97 and you get zero, so the letter ‘a’ corresponds to the first element of the charCount list (list indexes start at zero).

The second for-loop in the code iterates over the ‘alphabet’ string, printing out each letter and its corresponding frequency recorded from the newString variable.
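Putting the two loops together, a minimal sketch of the approach looks like this. It’s a reconstruction from the description above, not a copy of ‘countletters.py’, and the sample text and the ‘index’ variable are our own assumptions.

```python
newString = 'The power of strings'.lower()   # hypothetical sample text
alphabet = 'abcdefghijklmnopqrstuvwxyz'
charCount = [0] * 26

# First loop: count each letter, skipping spaces and punctuation.
for letter in newString:
    if letter in alphabet:
        index = ord(letter) - 97   # ord('a') is 97, so 'a' -> 0 ... 'z' -> 25
        charCount[index] += 1

# Second loop: print each letter next to its count.
for letter in alphabet:
    print(letter, charCount[ord(letter) - 97])
```

Note that the subtraction only works because the lowercase letters ‘a’ to ‘z’ sit in one unbroken run of codes, which is why the string is converted to lowercase first.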

A MORE EFFICIENT VERSION

Now I do tend to carry on a bit in this masterclass about how there’s always more than one way to code a solution to something, and this is a perfect example. Run the ‘countletters.py’ code we just spoke about and it works. However, it does require you to understand how the ASCII code works, which makes it pretty kludgy.

Load up ‘countchars.py’ and you’ll find a more elegant solution. We start by importing the ‘Counter’ class (a subclass of the dictionary type) from Python’s ‘collections’ module. In this version of our character-counting app, the ‘newString’ and ‘alphabet’ strings are the same as before, but the charCount variable, this time, contains the result of calling Counter:

charCount = Counter(letter for letter in newString if letter in alphabet)

What’s happening here is that we’re asking Python to use Counter to count up each ‘letter’ that appears in ‘newString’ if that ‘letter’ also appears in the ‘alphabet’ string. The beauty of this method is you no longer need to know anything about ASCII codes and Python takes care of everything, thanks to Counter.
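Here is a short, self-contained sketch of the Counter approach, including the import it needs. The sample text is an assumption of ours, so the counts you see with the source code pack will differ.

```python
from collections import Counter

newString = 'The power of strings'.lower()   # hypothetical sample text
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Count only the characters that appear in the alphabet string.
charCount = Counter(letter for letter in newString if letter in alphabet)

print(charCount)
```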

One of Python’s strengths is the mass of modules available that add in functions for seemingly almost anything you can imagine.

DICTIONARIES

The only real difference between the two versions is that, instead of finishing with the list of values we had before, the charCount variable in this version now references a ‘dictionary’. Dictionaries are really useful data structures that work, more or less, like a real dictionary.

Look up any dictionary and you see a list of words, each followed by a definition that explains the word. In the world of Python, dictionaries work in a similar way, except that the ‘words’ are called ‘keys’ and the definitions are ‘values’. So putting it more succinctly, a dictionary is a collection of key-value pairs.

Run the ‘countchars.py’ code and, this time, you’ll see the output starts with ‘Counter(‘, but is followed by a list of key-value pairs inside a set of curly-brackets {}.

When you look up a dictionary, you first search for the word, then read the definition. To get the value of a particular ‘key’ in a Python dictionary, you simply call the dictionary and use the ‘key’ like an index for a string or list.

For example, we can find out the total count of the letter ‘o’ by using:

print(charCount['o'])

And that gives us ‘4’. If we replace ‘o’ with ‘e’, we get the result ‘3’. We’ll come back to dictionaries in a moment.
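To try a key lookup on its own, here’s a tiny sketch. The sample text is made up, so the numbers won’t match the ones quoted above. It also shows one convenience that’s specific to Counter: asking for a key it has never seen gives you zero.

```python
from collections import Counter

charCount = Counter('strings and dictionaries')   # hypothetical sample text

print(charCount['s'])   # count stored under the key 's'
print(charCount['z'])   # a Counter returns 0 for a key it has never seen
```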

STRING FUNCTIONS

The chances are pretty good that, whenever you code apps, you’ll need to provide some form of text output back to the user. However, you’re just as likely to have to test string responses from the user as well.

USER-INPUT VALIDATION

This is often called ‘user input validation’ and we’ll look at a basic example. Say you’re writing an app that asks the user to enter in their postcode. Postcodes in Australia are four-digit numbers, but what if the user enters in an alphanumeric postcode? How do you handle it?

Open up the ‘postcodes.py’ code. Here, we begin by using the standard ‘input’ function, prompting the user to enter a four-digit postcode and storing the result in the ‘userInput’ variable.

Next, we enter a while-loop that keeps running for as long as the value of ‘userInput’ contains anything other than digits or is not exactly four characters long (that’s what the ‘isdigit()’ method and the ‘len()’ function check). Each time around, we print an error message and get the user to try again. Once the user gets it right, we print the result.
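The validation loop can be sketched as below. This isn’t the ‘postcodes.py’ listing itself: the helper names ‘is_valid_postcode’ and ‘ask_postcode’, the prompt text and the ‘read’ parameter are all our own inventions, added so the loop can be tried out without typing anything.

```python
def is_valid_postcode(text):
    """Return True only if text is all digits and exactly four long."""
    return text.isdigit() and len(text) == 4

def ask_postcode(read=input):
    """Keep asking until the answer passes the check, then return it."""
    userInput = read('Enter a four-digit postcode: ')
    while not is_valid_postcode(userInput):
        print('Sorry, that is not a valid postcode.')
        userInput = read('Enter a four-digit postcode: ')
    return userInput

# Simulate a user who gets it right on the third try.
answers = iter(['27A0', '279', '2790'])
print(ask_postcode(lambda prompt: next(answers)))
```

Calling ask_postcode() with no arguments uses the standard input function, so it behaves like the interactive version described above.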

The isdigit() method is just one of a number available for testing the contents of strings. Others include ‘isalpha()’ to check for all alphabetic characters, ‘isupper()’ for uppercase, ‘islower()’ for lowercase and ‘isalnum()’ for alphanumeric. You can use this last one, for example, to catch input containing characters such as backslashes ‘\’ or colons ‘:’ that might indicate someone trying to hack your code.
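A quick way to get a feel for these tests is to run them all over a handful of strings. The sample strings here are our own, picked so that each test passes for at least one of them.

```python
samples = ['2790', 'Parkes', 'PARKES', 'parkes', 'Parkes2870', 'C:\\temp']

# Print each string followed by the result of every test.
for text in samples:
    print(text, text.isdigit(), text.isalpha(), text.isupper(),
          text.islower(), text.isalnum())
```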

SLICE AND DICE

You’ll also find occasions where you need to split the user’s input into various fields, for example, when entering their name. Strings allow you to do that using the split() function. Here’s a simple example:

name = 'Darren Yates'
name_fields = name.split(' ')
print('First name:', name_fields[0])
print('Last name:', name_fields[1])

The split() function splits the words of the ‘name’ string using the space ‘ ‘ character as the delimiter or separator and auto-assigns the individual words into a list we’ve called ‘name_fields’. We then use indexes to grab the individual fields and print them to screen.
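One thing worth knowing, sketched below, is what happens when the user accidentally types two spaces. Calling split() with no argument at all is our suggestion here rather than something from the original example; it splits on any run of whitespace.

```python
name = 'Darren  Yates'            # note the accidental double space

print(name.split(' '))            # splitting on one space leaves an empty field
print(name.split())               # no argument: split on any run of whitespace

first_name, last_name = name.split()
print('First name:', first_name)
print('Last name:', last_name)
```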

USING DICTIONARIES

Dictionaries are great for when you want to store data that is attached to an entity as a distinct pair: the key-value pairs we were on about earlier. It could be anything from pizza prices to town postcodes. To create a predefined or ‘literal’ dictionary, you enter your data as pairs with a colon ‘:’ separating the key and value of each pair, then separating the pairs with commas, like this:

postcodes = {'Bathurst': 2790, 'Parramatta': 2150, 'Parkes': 2870, 'Broken Hill': 2880, 'Cootamundra': 2590, 'Junee': 2663}

Then it’s just a case of calling the ‘postcodes’ dictionary with an appropriate key:

print(postcodes['Parkes'])

...should give back ‘2870’, which in this case is an integer because that’s how it was entered into the dictionary. The rule is you can use just about anything as a value, but the keys must be basic data or ‘immutable’ types (that means integers and strings mostly). Strings are also case-sensitive, so ‘Parkes’ is different to ‘parkes’.

Dictionaries are also like lists in that you can add and delete key-value pairs. However, unlike lists, you don’t need to ‘append’. You just use your desired key as the index and assign it the value:

postcodes['Goulburn'] = 2580

You can also use this method to change a key-value pair: just use the same key, assign it a new value and it’ll essentially overwrite the old one.
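Adding and overwriting use exactly the same syntax, as this sketch shows. The pizza names and prices are made up purely for illustration.

```python
# Made-up menu prices, purely for illustration.
prices = {'Margherita': 12.00, 'Hawaiian': 14.50}

prices['Vegetarian'] = 13.00    # new key, so a new pair is added
prices['Hawaiian'] = 15.00      # existing key, so the old value is replaced

print(prices)
print(len(prices))
```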

To delete a key-value pair from a dictionary, you use the ‘del’ statement and the key of the dictionary entry you want to ditch:

del postcodes['Parramatta']

If the key exists, the pair is removed and the remaining pairs are left as they were. If it doesn’t exist, the statement will throw a ‘KeyError’ exception, which you can handle using a try-except clause. You can even test for keys in code using the ‘in’ operator, for example:

town_postcode = "Broken Hill" in postcodes

...assigns the Boolean value ‘True’ to ‘town_postcode’ because the postcodes dictionary does have a key called ‘Broken Hill’. However:

town_postcode = "Parramatta" in postcodes

...assigns ‘False’ because we just deleted it from the dictionary and it no longer exists.
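The deletion, the try-except clause and the ‘in’ test can be sketched together like this, using a cut-down version of the postcodes dictionary. The error message and the final get() call are our own additions; get() is a standard dictionary method that returns a fallback value instead of raising an exception.

```python
postcodes = {'Bathurst': 2790, 'Parramatta': 2150, 'Broken Hill': 2880}

del postcodes['Parramatta']

try:
    del postcodes['Parramatta']      # already gone, so this raises KeyError
except KeyError:
    print('No such town in the dictionary')

print('Broken Hill' in postcodes)
print('Parramatta' in postcodes)
print(postcodes.get('Parramatta', 'unknown'))   # fallback instead of an error
```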

SIMPLE DICTIONARY FUNCTIONS

There’s also a bunch of methods available for manipulating dictionaries. Two of the most common are the ‘keys’ and ‘values’ methods. To get a list of the keys in a dictionary, you can just iterate over the dictionary keys using:

for town in postcodes.keys():
    print(town)

And to get a list of values in a dictionary, you do the same but with values instead:

for postcode in postcodes.values():
    print(postcode)
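Here are both loops in runnable form, using a shortened postcodes dictionary. The third loop, using the items() method to fetch each key and value as a pair, is our own addition and may not appear in ‘postdict.py’.

```python
postcodes = {'Bathurst': 2790, 'Parkes': 2870, 'Junee': 2663}

for town in postcodes.keys():
    print(town)

for postcode in postcodes.values():
    print(postcode)

# items() hands back each key together with its value.
for town, postcode in postcodes.items():
    print(town, postcode)
```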

Load up the ‘postdict.py’ source code and you’ll find these examples, ready to run. If it helps, think of dictionaries as customised lists where indexes don’t have to be numeric, a bit like a simple database.

GIVE IT A GO

The great thing about Python is its uncluttered nature, which helps make it one of the easiest languages to pick up, even if you’ve never coded before. That’s why it’s used in everything from games and Linux scripting to websites and data science.

We’ll be back again next month. See you then.

[Image: Dictionaries are great for data that are linked.]
[Image: Dictionaries are highly manipulable and work like lists.]
[Image: The ‘len()’ function returns the size of a string.]
[Image: Using the ‘Counter’ function creates a more elegant code solution.]
[Image: First version of our character-counting code returns a list of results.]
[Image: Using the ‘ord()’ function to create the index to store letter counts.]
[Image: Another example of dictionary functions using something closer to home.]
[Image: Using the Counter function returns character counts as a dictionary.]
[Image: String testing functions make testing user input much simpler.]
[Image: A while-loop allows your code to wait until the user gets it right.]
