Text to speech

Ever wondered what the CLI might sound like? Shashank Sharma has the perfect recipe to give voice to your terminal, so it can finally scream.

2019-03-12

Although it’s not a field that has witnessed active development in recent years, much to the chagrin of a large user community, the Linux ecosystem now boasts many useful text-to-speech solutions. Granted, they all still lack a very human-sounding voice, but that doesn’t mean that you can’t learn to use these tools.

Based on the espeak engine, espeak-ng is an actively developed fork which boasts support for over 100 languages and accents. Along with its parent espeak, the fork is distributed in the software repositories of many popular distributions such as Ubuntu, Fedora, Arch and others, so let’s see some of the ways we can use it. If you’re running an apt-based Linux distribution, such as Debian, Ubuntu or a derivative, open a terminal and run apt search espeak-ng. Fedora users can similarly run the dnf search espeak-ng command. This will provide a list of all packages which match the search term. You can then install espeak-ng with the sudo apt install espeak-ng command, or use the default software management tool on your distribution.

When installing espeak-ng from the software repositories, you must remember to also install additional packages such as espeak-ng-data. It’s this latter package which provides additional languages, apart from the default British (The Queen’s!–ed) English. If your distribution doesn’t ship espeak-ng in its software repositories, you must compile it manually from source. Head to the project’s GitHub page and clone the repository with git clone https://github.com/espeak-ng/espeak-ng.git. Before you continue, make sure you have all the required dependencies and follow the installation instructions on the GitHub page.

Irrespective of your chosen mode of installation, whether using the software repositories or manual compilation, you now have all you need to get your terminal to speak out the output of commands, read text files to you, and more.

Mastering the basics

To test its newfound abilities, launch the terminal and run espeak-ng "Read out this text, please". Try running the same command with and without the comma before the ‘please’. Notice the pause when you use the comma, just like you were taught to do in primary school? In addition to reading out quoted text, espeak-ng can also read the content of a specified text file. You must use the -f command option to specify the file: espeak-ng -f filename.txt. Unfortunately, espeak-ng doesn’t natively support reading other file formats such as DOC and ODT. But that doesn’t mean you can’t listen to these files with espeak-ng. If you’re intent on having the terminal read an ODT file, you must make use of the popular pipe ( | ), discussed at some length in LXF231, and the odt2txt utility to convert the file on the fly and feed it to espeak-ng:

$ odt2txt --stdout filename.odt | espeak-ng --stdin

In this example, instead of specifying a filename, we’ve instructed espeak-ng to read from the standard input. Although not efficient, you can similarly convert DOC and PDF files to TXT and then feed them to espeak-ng.
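The same pipe works for other formats, provided a converter that writes plain text to standard output is installed. The following is a minimal sketch rather than a tested recipe: doc_to_text is a hypothetical helper name, and pdftotext (from poppler-utils) and antiword are assumed stand-ins for PDF and DOC conversion, since the text above names only odt2txt.

```shell
# doc_to_text FILE: write FILE to standard output as plain text.
# The odt2txt call mirrors the command used in this article; pdftotext and
# antiword are assumptions and must be installed separately.
doc_to_text() {
    case "$1" in
        *.txt) cat -- "$1" ;;
        *.odt) odt2txt --stdout "$1" ;;
        *.pdf) pdftotext "$1" - ;;   # "-" sends the text to standard output
        *.doc) antiword "$1" ;;
        *) echo "doc_to_text: unsupported file type: $1" >&2; return 1 ;;
    esac
}

# Example: doc_to_text chapter.pdf | espeak-ng --stdin
```

Keeping the conversion in one function means the espeak-ng end of the pipe never changes, whichever format you start from.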

espeak-ng defaults to British English. For a complete list of all supported languages, run the espeak-ng --voices command. You can limit the results to English voices only by running espeak-ng --voices=en. We’re constrained for space and can’t print the output of either command here, but when you run them you’ll notice a field called Gender, which means that you can give your terminal a male or female voice.

In all the commands we’ve run so far, we’ve relied on the default male voice. If you’re ready for some fun, run the espeak-ng -v en-gb-scotland+f3 "Aren't you curious about the + attribute?" command. Here, in addition to defining en-gb-scotland as the voice using the -v command option, we’ve also specified the gender and age, with +f3. Try horsing around with different values such as f1, f2 and f4, and let your ears identify the different intonations and settle on the one that’s most soothing and easily understandable.
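Since choosing a variant comes down to listening, it can help to line the candidates up. A small sketch: audition_commands is a hypothetical helper that only prints one espeak-ng command per variant, so you can check the commands before running them. The test phrase is arbitrary.

```shell
# audition_commands VOICE VARIANT...: print one espeak-ng command per variant.
# Nothing is spoken here; pipe the output to sh to actually hear the voices.
audition_commands() {
    voice=$1
    shift
    for variant in "$@"; do
        printf 'espeak-ng -v %s+%s "Testing variant %s"\n' \
            "$voice" "$variant" "$variant"
    done
}

# Example: audition_commands en-gb-scotland f1 f2 f3 f4 | sh
```

Printing the commands first is a deliberate choice: you can copy the one you like straight into your own scripts.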

Unfortunately, finding your ‘right’ voice really does involve quite a bit of trial and error, as neither the official documentation nor the man page provides any details on switching genders and trying the different ages. Having said that, you can run the following commands for a complete list of available voices and a couple of clues:

$ cd /usr/lib/x86_64-linux-gnu/espeak-ng-data/
$ cd voices
$ ls mb '!v'
$ cd '!v'
$ ls

Please note that the final directory name is !v and not lv. Apart from the various f* and m* variants, there are others such as Tweaky, croak, whisper and so on.
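The directory walk above can be wrapped in a small helper. This is a sketch under assumptions: list_variants is a hypothetical name, and the default path is simply the one used above, which differs between distributions, so pass your own data directory if needed.

```shell
# list_variants [DATA_DIR]: print the variant names shipped with espeak-ng.
list_variants() {
    data_dir=${1:-/usr/lib/x86_64-linux-gnu/espeak-ng-data}
    # '!v' is single-quoted so an interactive bash leaves the ! alone
    # instead of attempting history expansion.
    ls -- "$data_dir"'/voices/!v'
}

# Example: hear every f* variant in turn (uncomment to run).
# for v in $(list_variants | grep '^f[0-9]'); do
#     espeak-ng -v "en+$v" "This is variant $v"
# done
```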

Sounding smart

While we’ve only discussed the default configuration settings so far, espeak-ng also enables you to tweak the pitch and speed of the speech so as to improve comprehension. The default reading speed for the tool is 175 words per minute. While this isn’t too fast for a couple of sentences, it might lead to frustration when listening to a chapter of a book, for instance.

You can use the -s command option to define a custom reading speed. As with the gender and age voice profile, the ‘correct’ speed is something that will vary from one user to the next, and you must test out different settings to find a speed that’s comfortable for you. Run the espeak-ng -s 100 -f filename.txt command, then run it again with the speed set to 50, then 120, and you’ll soon be able to identify the one that best suits you.
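Before settling on a speed for a long file, it can be handy to estimate how long the reading will take. A rough sketch: listen_minutes is a hypothetical helper that divides the word count by the speed you intend to pass to -s. Treat the result as an approximation, since pauses at punctuation are ignored.

```shell
# listen_minutes FILE SPEED: rough listening time in whole minutes,
# rounded up, for FILE read at SPEED words per minute.
listen_minutes() {
    # tr strips the padding some wc implementations add around the count.
    words=$(wc -w < "$1" | tr -d ' ')
    echo $(( (words + $2 - 1) / $2 ))
}

# Example: listen_minutes filename.txt 120
```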

If you want the utility to identify capital letters during the read through, you can use the -k command option, which accepts a number of values. With -k1 , espeak-ng produces a small beep before each capitalised letter. With -k2 , it says the word “capital” whenever it comes across a capital letter. Using a higher number results in an increased pitch, which can be used to identify the capitalised words.

Another useful option is the pause between words, which you can tweak using the -g command option. The value is given in units of 10ms, so a higher number means a longer pause between words. Finally, you can also adjust the pitch of the spoken voice using the -p command option. The default pitch is set to 50.
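Once you have found settings you like, you can keep them together instead of retyping them. A minimal sketch: tuned_args is a hypothetical helper that assembles the -s, -p, -g and -k options described above; the numbers in the example are placeholders, not recommendations.

```shell
# tuned_args SPEED PITCH [GAP] [CAPS]: print the matching espeak-ng options.
# GAP and CAPS are optional and are left out when not supplied.
tuned_args() {
    args="-s $1 -p $2"
    if [ -n "${3:-}" ]; then
        args="$args -g $3"
    fi
    if [ -n "${4:-}" ]; then
        args="$args -k$4"
    fi
    echo "$args"
}

# Example (unquoted on purpose, so the options split into separate words):
# espeak-ng $(tuned_args 120 40 2 1) -f filename.txt
```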

Currently, there are several text-to-speech utilities available for a variety of Linux distributions, such as espeak and Festival. While they all sound quite similar, what distinguishes espeak-ng from its peers is the number of languages supported, and useful features such as the ability to control the pitch and speed of the speech. With a little effort, you just might find the right voice for your terminal.

If you’re having difficulty comprehending the output of the espeak-ng --voices command, consider horizontally expanding the terminal.

You can also produce the output as a wave file. Surprisingly, it only takes a few seconds to output a 10 minute audio file.
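Writing to a wave file is done with the -w option, which takes the output filename. The sketch below adds wav_name, a hypothetical helper that derives that name from the input file; it is only a convenience and assumes ordinary filenames (a dot-file such as .notes is not handled).

```shell
# wav_name FILE: swap FILE's extension for .wav, or append .wav if it has none.
wav_name() {
    case "${1##*/}" in
        *.*) echo "${1%.*}.wav" ;;
        *)   echo "$1.wav" ;;
    esac
}

# Example: espeak-ng -f chapter1.txt -w "$(wav_name chapter1.txt)"
```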
