Cre­ate a dig­i­tal as­sis­tant; calibrate your mon­i­tor; speed up boot times in Win 10; add move­ment to pho­tos; or­ga­nize your photo li­brary.


RASPBERRY PI The credit-card-sized computer costs as little as $35.

IN THIS TU­TO­RIAL, WE’RE LOOK­ING at what you need to make your own voice con­trol soft­ware for your Rasp­berry Pi projects. If you want a vir­tual as­sis­tant, one project is the Jasper sys­tem ( http://jasper­pro­ The doc­u­men­ta­tion on the main Jasper web­site has a de­scrip­tion of the hard­ware to at­tach to your Rasp­berry Pi, along with a full set of in­struc­tions for in­stal­la­tion and con­fig­u­ra­tion. There’s a set of stan­dard mod­ules in­cluded to al­low in­ter­ac­tion with var­i­ous ser­vices—use the time, Gmail, or even the joke mod­ule—and there are also third-party mod­ules for you to ac­cess. There is even a de­vel­oper API and doc­u­men­ta­tion to help you add your own func­tion­al­ity to Jasper.

1 VOICE CONTROL Everyone who has watched the Iron Man movies has probably dreamed of having their own artificially intelligent computer system to do their bidding. While J.A.R.V.I.S. has massive amounts of computing power behind him, you can construct the front end with very modest resources. With a Raspberry Pi and the Python programming language, you can build your own personal digital assistant that can be used as a front end to whatever massive supercomputing resources you use in your day-to-day life as a playboy philanthropist genius. Over the next few pages, we’ll go over the basics that you need to know, so that by the end of this tutorial, you should be able to build your own rudimentary, customized agent.

>> The first step to in­ter­act­ing with the hu­mans around us is to lis­ten for ver­bal com­mands, so that we know what we need to process. You have sev­eral op­tions avail­able to han­dle this task. To keep things sim­ple, we are deal­ing only with de­vices that are plugged into one of your Rasp­berry Pi’s USB ports. With that stip­u­la­tion, you can talk di­rectly with the USB de­vice at the low­est level. This might be nec­es­sary if you are try­ing to use some­thing that is rather un­usual to do the lis­ten­ing, but you would prob­a­bly be bet­ter off us­ing some­thing a bit more com­mon­place. In this case, you can use the Python mod­ule PyAu­dio. PyAu­dio pro­vides a Python wrap­per around the low-level cross-plat­form li­brary PortAu­dio. As­sum­ing you are us­ing some­thing like Rasp­bian for your Linux dis­tri­bu­tion, you can eas­ily in­stall the re­quired soft­ware with the com­mand:

sudo apt-get install python-pyaudio

>> If you need the lat­est ver­sion, you can al­ways grab and build it from source. PyAu­dio pro­vides the func­tion­al­ity to read in au­dio data from a mi­cro­phone, along with the abil­ity to play au­dio data out to head­phones or speak­ers. So, we’re us­ing it as our main form of in­ter­ac­tion with the com­puter.

>> The first step is to be able to read in some audio commands from the humans who happen to be nearby. You need to import the PyAudio module before you can start interacting with the microphone. The way PyAudio works is similar to working with files, so it should seem familiar to most programmers. You start by creating a new PyAudio object with the statement p = pyaudio.PyAudio() . You can then open an input stream with the function p.open(…) , with several parameters. You can set the data format for the recording; in the example code, we used format=pyaudio.paInt16 . You can set the rate in hertz for sampling. For example, we are using rate=44100 , which is the standard 44.1 kHz sampling rate. You also need to say how big a buffer to use for the recording; we used frames_per_buffer=1024 . Because we want to record, you need to use input=True . The last parameter is to select the number of channels to record on; in this case, we’re using channels=2 .

>> Now that the stream has been opened, you can start to read from it. You need to read the audio data in using the same chunk size that you used when you created the stream (1024 frames here), by passing it to the stream’s read function. You can then simply loop and read until you are done. There are then two commands to shut down the input stream. You need to call stream.stop_stream() and then stream.close() . If you are completely done, you can now call p.terminate() to shut down the connection to the audio devices on your Raspberry Pi.
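The recording steps above can be sketched roughly as follows. This is a minimal illustration, not the article's original listing: the helper names chunks_for, record, and save_wav are our own, and we assume the default input device is your USB microphone. The save_wav helper uses the standard wave module to keep the recording for later steps.

```python
import wave

CHUNK = 1024     # frames per buffer, matching frames_per_buffer in the text
RATE = 44100     # sampling rate in hertz
CHANNELS = 2     # number of channels to record on


def chunks_for(seconds, rate=RATE, chunk=CHUNK):
    """Return how many full chunks must be read to cover `seconds` of audio."""
    return int(rate / chunk * seconds)


def record(seconds):
    """Record from the default input device (assumed to be the microphone).

    Returns a tuple of (list of raw byte chunks, sample width in bytes).
    """
    import pyaudio  # imported here so the other helpers work without PyAudio

    p = pyaudio.PyAudio()
    sample_width = p.get_sample_size(pyaudio.paInt16)
    stream = p.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
    try:
        # Read using the same chunk size that the stream was opened with.
        frames = [stream.read(CHUNK) for _ in range(chunks_for(seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
    return frames, sample_width


def save_wav(path, frames, sample_width, channels=CHANNELS, rate=RATE):
    """Write the recorded chunks to a WAV file using the standard library."""
    wf = wave.open(path, "wb")
    try:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    finally:
        wf.close()
```

With a microphone attached, something like frames, width = record(5) followed by save_wav("command.wav", frames, width) should leave you with a five-second recording (the file name is just an example).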

>> The next step is to be able to send audio output so that J.A.R.V.I.S. can talk to you as well. For this, you can use PyAudio, so we don’t have to look at another Python module. To make things simple, let’s say that you have a WAV file that you want to play. You can use the “wave” Python module to load it. Once again, you create a PyAudio object, and open a stream. The parameter “output” should be set to True. The format, the number of channels, and the rate are all derived from the audio data stored in your WAV file. To actually hear the audio, you can simply loop through, reading one chunk of data from the WAV file at a time, and immediately writing it out to the PyAudio stream. Once you’re done, you can stop the stream and close it, as you did above.
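A rough sketch of the playback loop just described might look like this. The function names stream_params and play are ours, and we assume the default output device is the one your speakers or headphones are attached to.

```python
import wave

CHUNK = 1024  # frames read from the WAV file on each pass of the loop


def stream_params(wf):
    """Collect the stream settings that are stored in an open WAV file."""
    return {
        "channels": wf.getnchannels(),
        "rate": wf.getframerate(),
        "sample_width": wf.getsampwidth(),
    }


def play(path):
    """Play a WAV file on the default output device (an assumption)."""
    import pyaudio  # imported here so stream_params works without PyAudio

    wf = wave.open(path, "rb")
    params = stream_params(wf)
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(params["sample_width"]),
                    channels=params["channels"],
                    rate=params["rate"],
                    output=True)
    try:
        # Read one chunk at a time and write it straight to the stream.
        data = wf.readframes(CHUNK)
        while data:
            stream.write(data)
            data = wf.readframes(CHUNK)
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
        wf.close()
```

Because the format, channel count, and rate all come from the file itself, the same function should cope with mono or stereo recordings without changes.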

>> In both of the above cases, the functions block when you call them until they have completed. So, what are the options if you want to be able to do processing while you are either recording audio or outputting audio? There are non-blocking versions that take a callback function as an extra parameter, called stream_callback . This callback function takes four parameters, named in_data , frame_count , time_info , and status . The in_data parameter contains the recorded audio if input is true. The callback function needs to return a tuple with the values out_data and flag ; out_data contains the data to be output if output is true in the call to the function open. If input is true instead, then out_data should be equal to None. The flag can be any of paContinue , paComplete , or paAbort , with obvious meanings. One thing to be aware of is that you cannot call the read or write functions when you wish to use a callback function. Once the stream is opened, you simply call the function stream.start_stream() . This starts a separate thread to handle this stream processing. You can use stream.is_active() to check on the current status. Once the stream processing is done, you can call stream.stop_stream() to stop the secondary thread.
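As a hedged sketch of the non-blocking approach, the code below records in the background while the main program stays free. The names make_recorder and record_in_background are hypothetical, and the flag is passed in as an argument so that the callback itself does not depend on PyAudio being installed.

```python
import time


def make_recorder(frames, continue_flag):
    """Build a stream callback that appends each recorded chunk to `frames`."""
    def callback(in_data, frame_count, time_info, status):
        frames.append(in_data)
        # We are recording, not playing, so out_data is None.
        return (None, continue_flag)
    return callback


def record_in_background(seconds):
    """Record for roughly `seconds` without blocking on read calls."""
    import pyaudio  # imported here so make_recorder works without PyAudio

    frames = []
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=2, rate=44100,
                    input=True, frames_per_buffer=1024,
                    stream_callback=make_recorder(frames, pyaudio.paContinue))
    stream.start_stream()
    deadline = time.time() + seconds
    while stream.is_active() and time.time() < deadline:
        time.sleep(0.1)  # the main program could do other work here
    stream.stop_stream()
    stream.close()
    p.terminate()
    return frames
```

Note that the main loop never calls the stream's read function; all the audio arrives through the callback.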

2 OFF­LOAD TASKS You can off­load the au­dio data pro­cess­ing to Google, ac­cess­ing the API di­rectly over HTTP by post­ing your au­dio data to the ap­pro­pri­ate URL. First, in­stall the Python mod­ule SpeechRecog­ni­tion: pip in­stall SpeechRecog­ni­tion

>> Now create an instance of the Recognizer object. A helper object called WavFile takes an audio file and prepares it for use by the Google API; you then process it with the record() function, and hand the processed audio to the function recognize() . When it returns, you get a list of possible texts, each paired with a percentage confidence level for that decoding. Be aware that this module uses an unofficial API key to do its decoding, so for anything more than small personal testing, you should request your own API key.

>> We’ve seen how we can make our Raspberry Pi listen to the world around it. Now we need to try to make sense of what it might have just heard. In general, this is called speech recognition, and it is a very large and active area of research. Every major smartphone operating system has applications trying to take advantage of this mode of human interaction. There are also several different Python modules available that can perform this speech-to-text (STT) translation step. In this part of our project, we’re looking at using Pocket Sphinx to do all the heavy lifting. Sphinx was developed by Carnegie Mellon University, and is licensed under a BSD license, so you are free to add any extra functionality that you may need for specific tasks. Because of the activity in this field, it is well worth your time to keep track of all the updates and performance improvements.

While you can down­load the source code for all of th­ese mod­ules, and build it all from scratch, we’re as­sum­ing that you are us­ing one of the De­bian-based dis­tri­bu­tions, such as Rasp­bian. For th­ese, you can sim­ply use the fol­low­ing to get all of the re­quired files for the en­gine:

sudo apt-get install python-pocketsphinx

You also need audio model files and language model files in order to get a translation in your language of choice. To get the files needed for English, you can install the packages:

sudo apt-get install pocketsphinx-hmm-wsj1 pocketsphinx-lm-wsj

You may need to go out­side the reg­u­lar pack­age man­age­ment sys­tem if you want to process other lan­guages. Then you can sim­ply start writ­ing and us­ing your code straight away. To be­gin us­ing th­ese mod­ules, you need to im­port both Pocket Sphinx and Sphinx Base with the fol­low­ing com­mands:

import pocketsphinx
import sphinxbase

These modules are actually Python wrappers around the C code that handles the actual computational work of translating sounds to text. The most basic workflow involves instantiating a Decoder object from the Pocket Sphinx module. The Decoder object takes several input parameters to define the language files it is allowed to use. These include hmm , lm , and dict . If you installed the English packages above, the files you need are in the directories “/usr/share/pocketsphinx/model/hmm/wsj1” and “/usr/share/pocketsphinx/model/lm/wsj”. If you don’t set these parameters, it tries to use sensible defaults, which usually work fine for English language speech.

This newly created Decoder object can now be given WAV files with data to process. If you remember, we previously saved the recorded speech as a WAV file. In order to have this audio recorded in the correct format, you need to edit the code from the first step, and ensure that you are recording in mono (that is, using one channel), at 16 kHz with 16-bit quality. To read it properly, you can use a file object, and load it as a binary file with read permissions. WAV files have a small piece of header data at the beginning of the file that you need to jump over. This is done by using the seek function to jump over the first 44 bytes. Now that the file pointer is in the correct position, you can hand the file object to the Decoder object’s decode_raw() function. It then goes off and does a bunch of data crunching to try to figure out what was said. To get the results, you use the get_hyp() function call. You get a list with three elements from this function: a string containing the best guess at the spoken text, a string containing the utterance ID, and a number containing the score for this guess.
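The workflow above can be sketched as follows. Treat it as an illustration under assumptions: it targets the older python-pocketsphinx bindings described in this tutorial (newer releases of Pocket Sphinx expose a different interface), it relies on the default models by passing no parameters to Decoder, and the helper names open_audio_payload and transcribe are our own.

```python
WAV_HEADER_BYTES = 44  # size of the header to skip, as described in the text


def open_audio_payload(path):
    """Open a WAV file in binary mode, positioned just past its header."""
    fh = open(path, "rb")
    fh.seek(WAV_HEADER_BYTES)
    return fh


def transcribe(path):
    """Decode a mono, 16 kHz, 16-bit WAV file and return (text, score).

    Assumes the older bindings, where get_hyp() returns three elements:
    the best guess, the utterance ID, and the score.
    """
    import pocketsphinx  # imported here so the helper above works without it

    decoder = pocketsphinx.Decoder()  # no parameters: use the default models
    fh = open_audio_payload(path)
    try:
        decoder.decode_raw(fh)
    finally:
        fh.close()
    text, utterance_id, score = decoder.get_hyp()
    return text, score
```

If the defaults don’t suit you, the hmm , lm , and dict parameters mentioned above are where you would point the Decoder at your own model files.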

>> So far, we’ve looked at how to use the generic lan­guage and au­dio mod­els for a par­tic­u­lar lan­guage. But Pocket Sphinx is a re­search-level lan­guage sys­tem, so it has tools avail­able to en­able you to build your own mod­els [ Im­age A]. In this way, you can train your code to un­der­stand your par­tic­u­lar voice, with all its pe­cu­liar­i­ties and ac­cents. This is a long process, so most peo­ple aren’t in­ter­ested in do­ing some­thing so in­ten­sive. How­ever, if you are in­ter­ested, there is in­for­ma­tion avail­able at the main web­site ( http://cmus­phinx.source­ You can also de­fine your own mod­els and gram­mars to tell Pocket Sphinx how to in­ter­pret the au­dio that it’s pro­cess­ing. Once again, ef­fec­tively car­ry­ing out th­ese tasks re­quires more in-depth read­ing on your part.

>> If you want to process audio more directly, you can tell Pocket Sphinx to start processing with the function start_utt() . You can then start reading audio from your microphone. You need to read in appropriately sized blocks of data before handing them to Pocket Sphinx, specifically to the function process_raw() , and you still need to use the function get_hyp() to actually get the translated text. Also, because your code can’t know when someone has finished a complete utterance, you need to do this from within a loop. On each pass of the loop, read another chunk of audio, and feed it in to Pocket Sphinx. You then need to call get_hyp() again to see if you can get anything intelligible from the data. When you are done doing this real-time processing, you can use the function end_utt() .
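Here is one way the real-time loop might be laid out. The function stream_decode is a hypothetical helper: it accepts any decoder object that offers the four functions named above, plus any iterable of audio chunks (for example, chunks read from a PyAudio stream). The two extra False arguments to process_raw() are an assumption about the bindings, standing for “do search” and “this is not the full utterance.”

```python
def stream_decode(decoder, chunks):
    """Feed audio chunks to a decoder and return its latest hypothesis.

    `decoder` is expected to provide start_utt(), process_raw(), get_hyp(),
    and end_utt(); `chunks` is any iterable of raw audio byte strings.
    """
    decoder.start_utt()
    hypothesis = None
    try:
        for chunk in chunks:
            # Hand over one block of audio, then ask for the current guess.
            decoder.process_raw(chunk, False, False)
            hypothesis = decoder.get_hyp()
    finally:
        # Always close the utterance, even if reading audio failed.
        decoder.end_utt()
    return hypothesis
```

In a real assistant, you would probably inspect the hypothesis on each pass and break out of the loop once it contains a recognizable command.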

3 SO­CIAL ME­DIA You may want your sys­tem to check your so­cial me­dia ac­counts on the In­ter­net. There are sev­eral Python mod­ules avail­able to han­dle this. Let’s say that you want to be able to check your Face­book ac­count. In­stall the fol­low­ing Python mod­ule: sudo apt-get in­stall python-face­book

>> You can then use im­port face­book to get ac­cess to the Face­book API. If you’re a Twit­ter user, you can in­stall the python-twit­ter De­bian pack­age to use the Twit­ter API. Email is eas­ier, as long as your email provider of­fers IMAP or POP ac­cess. You can then im­port emails and get voice con­trol to read un­read emails out to you. For the Google fans, Google has a Python mod­ule that pro­vides ac­cess to the APIs for al­most ev­ery­thing avail­able; work with your cal­en­dar, email, or fit­ness data.

>> You should now have a string containing the text that was spoken to your Raspberry Pi. But you need to figure out what command this maps to. One method is to do a search for keywords. If you have a list of keywords available, you can loop through them, and search the heard string to see whether any one of those keywords exists within it as a substring. Then you can execute the task associated with that keyword. However, this method only finds the first match. What happens if your user accidentally includes a keyword in their spoken command before the actual command word? This is the auditory equivalent of having fat fingers and mistyping a command on the keyboard. Being able to deal with these errors gracefully is an ongoing area of research. Maybe you can create a new algorithm to handle these situations; let us know if you come up with something.
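The keyword search can be sketched in a few lines. The function name find_command and the example keywords are made up for illustration; the actions here are plain strings, but they could just as easily be functions to call.

```python
def find_command(heard, commands):
    """Return the action for the first keyword found in the heard text.

    `commands` is a list of (keyword, action) pairs, checked in order.
    Returns None when no keyword appears in the text.
    """
    text = heard.lower()
    for keyword, action in commands:
        if keyword in text:
            return action
    return None


COMMANDS = [
    ("mail", "check_mail"),
    ("time", "tell_time"),
]

print(find_command("What time is it", COMMANDS))
print(find_command("Is it time to check my mail", COMMANDS))
```

The second call shows the weakness described above: both keywords are present, and the result depends purely on which one comes first in the list.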

>> Let’s say that you have a series of Python scripts that contain the various tasks you want your system to be able to tackle. You need a way to have your system be able to run these scripts when called upon. The most direct way to run a script is to use execfile (available in Python 2 only). Say you have a script called “do_task.py” that contains Python code that you want to run when a command is given. You can run it with: execfile("do_task.py")

>> Using this form, you can add command-line options to the string being handed in. This looks in the current directory for the script of that file name, and runs it in the current execution context of your main program. If you need to rerun this code multiple times, call execfile each time. If you don’t need the script to run within the same context, use the subprocess module. You can import it with: import subprocess

>> You can then execute the script like so: subprocess.call("do_task.py")

>> This forks off a subprocess of the main Python interpreter, and runs the script there. If your script needs to interact with the main program, this is probably not the method to use. Collecting output from a call to “do_task.py” with subprocess isn’t straightforward, so another way of achieving the same thing is to use the import statement. It also runs the code in your script at the point the import statement is called. If your script only contains executable Python statements, these get run at the point of importation. In order to rerun this code, you need to use the reload command. The reload command is no longer a built-in in version three of Python (it has moved to the importlib module), so, if you’re using that particular Python version, a better option is to encapsulate the code contained in the script within a function. You can then import the script at the beginning of your main program, and simply call the relevant function at the correct time. This is a much more Pythonic method to use. If you have the following contents for do_task.py:

def do_func():
    do_task1()
    do_task2()

you can use it with the following code within your main program:

import do_task
....
....
do_task.do_func()
....

>> An even more Pythonic method is to use classes and ob­jects. You can write a script that de­fines a class that con­tains meth­ods for you to call when you need it.
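As a small sketch of the class-based approach, the code below groups a couple of tasks as methods and looks them up by name. The class name Tasks, its methods, and the run dispatcher are all invented for this example.

```python
import time


class Tasks:
    """Groups the actions the assistant can perform as methods."""

    # Only these method names may be triggered by a spoken command.
    allowed = ("greet", "tell_time")

    def __init__(self, owner):
        self.owner = owner

    def greet(self):
        return "Hello, " + self.owner

    def tell_time(self):
        return time.strftime("%H:%M")

    def run(self, name):
        """Call the method called `name`, or return None if it is unknown."""
        if name not in self.allowed:
            return None
        return getattr(self, name)()


tasks = Tasks("Tony")
print(tasks.run("greet"))
```

Keeping an explicit list of allowed names means a misheard command can’t call some other method by accident.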

>> What are the options if you want to do something that isn’t achievable with a Python script? In these cases, you need to be able to run arbitrary programs on the host system. The host system in this case is your Raspberry Pi. As a toy example, let’s say you need to download some emails using the Fetchmail program. You can do this in a couple of different ways. The older method is to use the os.system() command, where you hand in a string. In our example, this would look something like the following: os.system("/usr/bin/fetchmail")

>> The os.system() command blocks until the program has finished and hands back its exit status, but it gives you little control beyond that. This method is now being replaced by the newer subprocess module. It gives you more control over how the task gets run and how you can interact with it. A simple equivalent to the above command would look like this: subprocess.call("/usr/bin/fetchmail")
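To see the subprocess module in action without needing Fetchmail installed, here is a hedged sketch that runs the Python interpreter itself as the external program. The helper names run_and_wait and run_and_capture are ours. The second helper uses subprocess.check_output() , which the next paragraph describes, and decodes the result because Python 3 hands the output back as bytes.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Run an external program, wait for it, and return its return code."""
    return subprocess.call(argv)


def run_and_capture(argv):
    """Run an external program and return everything it wrote as text.

    stderr is folded into stdout so that error messages are captured too.
    """
    output = subprocess.check_output(argv, stderr=subprocess.STDOUT)
    return output.decode()


# sys.executable is the running Python interpreter, used here as a stand-in
# for a program such as /usr/bin/fetchmail.
code = run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
text = run_and_capture([sys.executable, "-c", "print('done')"])
```

Passing the command as a list, rather than one long string, avoids surprises when an argument contains spaces.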

>> It waits until the called program has finished and then returns the return code to your main Python process. But what if your external program needs to feed in results to your main program? In this case, you can use the command subprocess.check_output() . This is essentially the same as subprocess.call , except that when it finishes, anything that is written out by the external program to stdout gets handed in as a string object. If you also need information written out on stderr, you can add the parameter stderr=subprocess.STDOUT to your call to subprocess.check_output .

>> You should now have enough of the bare bones to be able to build your own version of the J.A.R.V.I.S. system. You will be able to fine-tune it to do basically anything that you command it to do. So, go forth and order your machines around, and have them actually listen to what you are saying for once.
