YOU’LL NEED THIS
RASPBERRY PI The credit-card-sized computer costs as little as $35.
IN THIS TUTORIAL, WE’RE LOOKING at what you need to make your own voice control software for your Raspberry Pi projects. If you want a virtual assistant, one project is the Jasper system ( http://jasperproject.github.io). The documentation on the main Jasper website has a description of the hardware to attach to your Raspberry Pi, along with a full set of instructions for installation and configuration. There’s a set of standard modules included to allow interaction with various services—use the time, Gmail, or even the joke module—and there are also third-party modules for you to access. There is even a developer API and documentation to help you add your own functionality to Jasper.
1 VOICE CONTROL Everyone who has watched the Iron Man movies has probably dreamed of having their own artificially intelligent computer system to do their bidding. While J.A.R.V.I.S. has massive amounts of computing power behind him, you can construct the front end with very modest resources. With a Raspberry Pi and the Python programming language, you can build your own personal digital assistant that can be used as a front end to whatever massive supercomputing resources you use in your day-to-day life as a playboy philanthropist genius. Over the next few pages, we’ll go over the basics that you need to know, so that by the end of this tutorial, you should be able to build your own rudimentary, customized agent.
>> The first step to interacting with the humans around us is to listen for verbal commands, so that we know what we need to process. You have several options available to handle this task. To keep things simple, we are dealing only with devices that are plugged into one of your Raspberry Pi’s USB ports. With that stipulation, you can talk directly with the USB device at the lowest level. This might be necessary if you are trying to use something that is rather unusual to do the listening, but you would probably be better off using something a bit more commonplace. In this case, you can use the Python module PyAudio. PyAudio provides a Python wrapper around the low-level cross-platform library PortAudio. Assuming you are using something like Raspbian for your Linux distribution, you can easily install the required software with the command:
sudo apt-get install python-pyaudio
>> If you need the latest version, you can always grab and build it from source. PyAudio provides the functionality to read in audio data from a microphone, along with the ability to play audio data out to headphones or speakers. So, we’re using it as our main form of interaction with the computer.
>> The first step is to be able to read in some audio commands from the humans who happen to be nearby. You need to import the PyAudio module before you can start interacting with the microphone. The way PyAudio works is similar to working with files, so it should seem familiar to most programmers. You start by creating a new PyAudio object with the statement p = pyaudio.PyAudio() . You can then open an input stream with the function p.open(…) , which takes several parameters. You can set the data format for the recording; in the example code, we used format=pyaudio.paInt16 . You can set the rate in hertz for sampling. For example, we are using rate=44100 , which is the standard 44.1kHz sampling rate. You also need to say how big a buffer to use for the recording—we used frames_per_buffer=1024 . Because we want to record, you need to use input=True . The last parameter selects the number of channels to record on—in this case, we’re using channels=2 . Now that the stream has been opened, you can start to read from it. You need to read the audio data in using the same chunk size that you used when you created the stream—it looks like stream.read(1024) . You can then simply loop and read until you are done. There are then two commands to shut down the input stream. You need to call stream.stop_stream() and then stream.close() . If you are completely done, you can now call p.terminate() to shut down the connection to the audio devices on your Raspberry Pi.
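Putting those recording steps together, a minimal sketch might look like the following. The five-second recording length and the list used to collect the frames are our own choices for illustration, not anything mandated by PyAudio.

import pyaudio

CHUNK = 1024          # frames per buffer, matching stream.read() below
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100          # standard 44.1kHz sampling rate
SECONDS = 5           # how long to listen for (our own choice)

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

# Loop and read until we have SECONDS worth of audio
frames = []
for _ in range(int(RATE / CHUNK * SECONDS)):
    frames.append(stream.read(CHUNK))

# Shut down the input stream, then the PyAudio connection
stream.stop_stream()
stream.close()
p.terminate()

Note that this sketch needs a microphone attached to one of the Pi’s USB ports to actually run.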
>> The next step is to be able to send audio output so that J.A.R.V.I.S. can talk to you as well. For this, you can use PyAudio again, so we don’t have to look at another Python module. To make things simple, let’s say that you have a WAV file that you want to play. You can use the “wave” Python module to load it. Once again, you create a PyAudio object, and open a stream. The parameter output should be set to True. The format, the number of channels, and the rate are all information that is derived from the audio data stored in your WAV file. To actually hear the audio, you can simply loop through, reading one chunk of data from the WAV file at a time, and immediately writing it out to the PyAudio stream. Once you’re done, you can stop the stream and close it, as you did above.
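A playback sketch along those lines, assuming a WAV file named reply.wav (a hypothetical file name) sits in the current directory:

import wave
import pyaudio

CHUNK = 1024

wf = wave.open("reply.wav", "rb")      # hypothetical file name
p = pyaudio.PyAudio()

# Format, channels, and rate all come from the WAV file itself
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Read one chunk at a time and write it straight to the stream
data = wf.readframes(CHUNK)
while data:
    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()
p.terminate()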
>> In both of the above cases, the functions block when you call them until they have completed. So, what are the options if you want to be able to do processing while you are either recording audio or outputting audio? There are non-blocking versions that take a callback function as an extra parameter, called stream_callback . This callback function takes four parameters, named in_data , frame_count , time_info , and status . The in_data parameter contains the recorded audio if input is True. The callback function needs to return a tuple with the values out_data and flag — out_data contains the data to be outputted if output is True in the call to the function open. If input is True instead, then out_data should be equal to None. The flag can be any of paContinue , paComplete , or paAbort —with obvious meanings. One thing to be aware of is that you cannot call the read or write functions when you wish to use a callback function. Once the stream is opened, you simply call the function stream.start_stream() . This starts a separate thread to handle this stream processing. You can use stream.is_active() to check on the current status. Once the stream processing is done, you can call stream.stop_stream() to stop the secondary thread.
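The callback approach could be sketched like this for an input-only stream. The five-second cutoff is our own addition, since an input stream otherwise stays active indefinitely; everything else follows the parameter names described above.

import time
import pyaudio

frames = []

def callback(in_data, frame_count, time_info, status):
    # in_data holds the captured audio; for an input-only stream
    # the out_data element of the returned tuple is None
    frames.append(in_data)
    return (None, pyaudio.paContinue)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=1024,
                stream_callback=callback)

stream.start_stream()          # processing happens on a second thread
start = time.time()
while stream.is_active() and time.time() - start < 5:
    time.sleep(0.1)            # the main thread is free to do other work

stream.stop_stream()
stream.close()
p.terminate()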
2 OFFLOAD TASKS You can offload the audio data processing to Google, accessing the API directly over HTTP by posting your audio data to the appropriate URL. First, install the Python module SpeechRecognition: pip install SpeechRecognition
>> Now create an instance of the Recognizer object. A helper object, called WavFile, takes an audio file and prepares it for use by the Google API; you then process it with the record() function, and hand this processed audio to the function recognize() . When it returns, you get a list of possible text decodings, along with a confidence level for each one. Be aware that this module uses an unofficial API key to do its decoding, so for anything more than small personal testing, you should request your own API key.
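As a sketch of that workflow (the file name command.wav is hypothetical, and the WavFile and recognize() names follow the older SpeechRecognition API described here, so check the documentation for your installed version):

import speech_recognition as sr

r = sr.Recognizer()
with sr.WavFile("command.wav") as source:   # hypothetical file name
    audio = r.record(source)                # prepare the audio data

try:
    # recognize() returns the most likely text; passing show_all=True
    # instead returns the full list of candidates with confidence levels
    print(r.recognize(audio))
except LookupError:
    print("could not understand audio")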
>> We’ve seen how we can make our Raspberry Pi listen to the world around it, now we need to try to make sense of what it might have just heard. In general, this is called speech recognition, and it is a very large and active area of research. Every major smartphone operating system has applications trying to take advantage of this mode of human interaction. There are also several different Python modules available that can perform this speech-to-text (STT) translation step. In this part of our project, we’re looking at using Pocket Sphinx to do all the heavy lifting. Sphinx was developed by Carnegie Mellon University, and is licensed under a BSD license, so you are free to add any extra functionality that you may need for specific tasks. Because of the activity in this field, it is well worth your time to keep track of all the updates and performance improvements.
While you can download the source code for all of these modules, and build it all from scratch, we’re assuming that you are using one of the Debian-based distributions, such as Raspbian. For these, you can simply use the following to get all of the required files for the engine:
sudo apt-get install python-pocketsphinx
You also need audio model files and language model files in order to get a translation in your language of choice. To get the files needed for English, you can install the packages: sudo apt-get install pocketsphinx-hmm-wsj1 pocketsphinx-lm-wsj
You may need to go outside the regular package management system if you want to process other languages. Then you can simply start writing and using your code straight away. To begin using these modules, you need to import both Pocket Sphinx and Sphinx Base with the following commands:
import pocketsphinx
import sphinxbase
These modules are actually Python wrappers around the C code that handles the actual computational work of translating sounds to text. The most basic workflow involves instantiating a Decoder object from the Pocket Sphinx module. The Decoder object takes several input parameters to define the language files it is allowed to use. These include hmm , lm , and dict . If you installed the above packages to handle English, the files you need are in the directories “/usr/share/pocketsphinx/model/hmm/wsj1” and “/usr/share/pocketsphinx/model/lm/wsj.” If you don’t set these parameters, it tries to use sensible defaults, which usually work fine for English language speech. This newly created Decoder object can now be given WAV files with data to process. If you remember, we previously saved the recorded speech as a WAV file. In order to have this audio recorded in the correct format, you need to edit the code from the first step, and ensure that you are recording in mono (that is, using one channel), and recording at 16kHz with 16-bit quality. To read it properly, you can use a file object, and load it as a binary file with read permissions. WAV files have a small piece of header data at the beginning of the file that you need to jump over. This is done by using the seek function to jump over the first 44 bytes. Now that the file pointer is in the correct position, you can hand the file object to the Decoder object’s decode_raw() function. It then goes off and does a bunch of data crunching to try to figure out what was said. To get the results, you use the get_hyp() function call. You get a list with three elements from this function: a string containing the best guess at the spoken text, a string containing the utterance ID, and a number containing the score for this guess.
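A decoding sketch along those lines might look like this. We let the Decoder fall back to its default English models rather than spelling out the hmm, lm, and dict paths, and the file name command.wav is hypothetical; the exact API can vary between Pocket Sphinx binding versions.

import pocketsphinx

# With no arguments, the Decoder tries the sensible defaults
# mentioned above, which usually work fine for English
decoder = pocketsphinx.Decoder()

# The recording must be mono, 16kHz, 16-bit. Open it as a binary
# file and skip over the 44-byte WAV header before decoding.
with open("command.wav", "rb") as wav_file:
    wav_file.seek(44)
    decoder.decode_raw(wav_file)

# get_hyp() returns the best guess, the utterance ID, and a score
best_guess, utterance_id, score = decoder.get_hyp()
print(best_guess)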
>> So far, we’ve looked at how to use the generic language and audio models for a particular language. But Pocket Sphinx is a research-level language system, so it has tools available to enable you to build your own models [ Image A]. In this way, you can train your code to understand your particular voice, with all its peculiarities and accents. This is a long process, so most people aren’t interested in doing something so intensive. However, if you are interested, there is information available at the main website ( http://cmusphinx.sourceforge.net). You can also define your own models and grammars to tell Pocket Sphinx how to interpret the audio that it’s processing. Once again, effectively carrying out these tasks requires more in-depth reading on your part.
>> If you want to process audio more directly, you can tell Pocket Sphinx to start processing with the function start_utt() . You can then start reading audio from your microphone. You need to read in appropriately sized blocks of data before handing them to Pocket Sphinx— specifically to the function process_raw() —and you still need to use the function get_hyp() to actually get the translated text. Also, because your code can’t know when someone has finished a complete utterance, you need to do this from within a loop. On each pass of the loop, read another chunk of audio, and feed it into Pocket Sphinx. You then need to call get_hyp() again to see if you can get anything intelligible from the data. When you are done doing this real-time processing, you can use the function end_utt() .
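Tying the loop together with the PyAudio input stream from earlier, a real-time sketch might look like this. The fixed iteration count is our own stand-in for a real stopping condition, and the two boolean arguments to process_raw() (no-search and full-utterance flags) follow the older Pocket Sphinx bindings, so check your version’s documentation.

import pyaudio
import pocketsphinx

decoder = pocketsphinx.Decoder()     # default English models, as above

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=1024)

decoder.start_utt()
try:
    for _ in range(100):             # arbitrary cutoff for this sketch
        chunk = stream.read(1024)
        # feed the chunk in; False, False = keep searching, partial data
        decoder.process_raw(chunk, False, False)
        hypothesis = decoder.get_hyp()
        if hypothesis and hypothesis[0]:
            print("heard so far:", hypothesis[0])
finally:
    decoder.end_utt()
    stream.stop_stream()
    stream.close()
    p.terminate()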
3 SOCIAL MEDIA You may want your system to check your social media accounts on the Internet. There are several Python modules available to handle this. Let’s say that you want to be able to check your Facebook account. Install the following Python module: sudo apt-get install python-facebook
>> You can then use import facebook to get access to the Facebook API. If you’re a Twitter user, you can install the python-twitter Debian package to use the Twitter API. Email is easier, as long as your email provider offers IMAP or POP access; you can use Python’s built-in imaplib or poplib modules to fetch messages and have your assistant read unread emails out to you. For the Google fans, Google has a Python module that provides access to the APIs for almost everything available; work with your calendar, email, or fitness data.
>> You should now have a string containing the text that was spoken to your Raspberry Pi. But you need to figure out what command this maps to. One method is to do a search for keywords. If you have a list of keywords available, you can loop through them, and search the heard string to see whether any one of those keywords exists within it as a substring. Then you can execute the task associated with that keyword. However, this method only finds the first match. What happens if your user accidentally includes a keyword in their spoken command before the actual command word? This is the auditory equivalent of having fat fingers and mistyping a command on the keyboard. Being able to deal with these errors gracefully is an ongoing area of research. Maybe you can create a new algorithm to handle these situations—let us know if you come up with something.
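A minimal keyword dispatcher in that style might look like this; the command words themselves are hypothetical examples. Note how it exhibits exactly the first-match weakness described above.

def find_command(heard, keywords):
    """Return the first keyword that occurs in the heard string."""
    heard = heard.lower()
    for word in keywords:
        if word in heard:
            return word
    return None         # nothing recognized

# Hypothetical command words for illustration
commands = ["time", "email", "weather"]
print(find_command("What is the time please", commands))   # time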
>> Let’s say that you have a series of Python scripts that contain the various tasks you want your system to be able to tackle. You need a way to have your system run these scripts when called upon. The most direct way to run a script is to use execfile (a Python 2 function—it was removed in Python 3). Say you have a script called “do_task.py” that contains Python code that you want to run when a command is given—you can run it with: execfile("do_task.py")
>> Using this form, you can add command-line options to the string being handed in. This looks in the current directory for the script of that file name, and runs it in the current execution context of your main program. If you need to rerun this code multiple times, call execfile each time you do. If you don’t need the script to run within the same context, use the Subprocess module. You can import it with: import subprocess
>> You can then execute the script like so: subprocess.call("do_task.py")
>> This forks off a subprocess of the main Python interpreter, and runs the script there. If your script needs to interact with the main program, this is probably not the method to use. Collecting output from a call to “do_task.py” with Subprocess isn’t straightforward, so another way of achieving the same thing is to use the import statement. It also runs the code in your script at the point the import statement is called. If your script only contains executable Python statements, these get run at the point of importation. In order to rerun this code, you need to use the reload command. The reload command doesn’t exist in version three of Python—so, if you’re using that particular Python version, a better option is to encapsulate the code contained in the script within a function. You can then import the script at the beginning of your main program, and simply call the relevant function at the correct time. This is a much more Pythonic method to use. If you have the following contents for do_task.py…
def do_func():
    do_task1()
    do_task2()
…you can use it with the following code within your main program:
import do_task
....
do_task.do_func()
....
>> An even more Pythonic method is to use classes and objects. You can write a script that defines a class that contains methods for you to call when you need it.
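One possible shape for that, with hypothetical task names, is a small registry class that maps command words to methods; tasks can then be re-run at any time without reloading modules.

class TaskRunner:
    """Registry mapping command names to callables."""

    def __init__(self):
        self.tasks = {}

    def register(self, name, func):
        # associate a spoken command word with a handler function
        self.tasks[name] = func

    def run(self, name):
        handler = self.tasks.get(name)
        if handler is None:
            return "unknown command"
        return handler()

# Hypothetical usage
runner = TaskRunner()
runner.register("greet", lambda: "hello there")
print(runner.run("greet"))      # hello there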
>> What are the options if you want to do something that isn’t achievable with a Python script? In these cases, you need to be able to run arbitrary programs on the host system. The host system in this case is your Raspberry Pi. As a toy example, let’s say you need to download some emails using the Fetchmail program. You can do this in a couple of different ways. The older method is to use the os.system() command where you hand in a string. In our example, this would look something like the following: os.system("/usr/bin/fetchmail")
>> The os.system() call waits until the command has finished and hands back its exit status, but it gives you very little control over the task while it runs. This method is now being replaced by the newer Subprocess module. It gives you more control over how the task gets run and how you can interact with it. A simple equivalent to the above command would look like this: subprocess.call("/usr/bin/fetchmail")
>> It waits until the called program has finished and then returns the return code to your main Python process. But what if your external program needs to feed results back in to your main program? In this case, you can use the command subprocess.check_output() . This is essentially the same as subprocess.call() , except that when it finishes, anything that is written out by the external program to stdout gets handed back as a string object. If you also need information written out on stderr, you can add the parameter stderr=subprocess.STDOUT to your call to subprocess.check_output() .
>> You should now have enough of the bare bones to be able to build your own version of the J.A.R.V.I.S. system. You will be able to fine-tune it to do basically anything that you command it to do. So, go forth and order your machines around, and have them actually listen to what you are saying for once.
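A sketch of both calls follows. The fetchmail example above needs a mail setup, so here we run small Python one-liners through sys.executable instead, purely as stand-ins for an external program.

import subprocess
import sys

# subprocess.call() waits for the program to finish; the return
# code tells you whether it succeeded (0 means success)
rc = subprocess.call([sys.executable, "-c", "print('fetching mail')"])

# check_output() also waits, but hands back what the program wrote
# to stdout; stderr=subprocess.STDOUT folds stderr into that output
out = subprocess.check_output(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('2 skipped\\n'); print('3 messages')"],
    stderr=subprocess.STDOUT,
)
print(rc)               # 0 on success
print(out.decode())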