Linux Format

Mozilla revolutionises speech recognition

The launch of Project DeepSpeech and Project Common Voice aims to provide human-like perception of speech.


Mozilla has recently announced the initial release of its Automatic Speech Recognition engine, based on work carried out by the Machine Learning team. The DeepSpeech engine (https://github.com/mozilla/DeepSpeech) is modelled on the Deep Speech papers published by Baidu (www.baidu.com), a Chinese web services company that’s one of the leaders in AI development. Those papers detail a trainable, multi-layered deep neural network.

The ambitious project initially had a goal of hitting a ‘word error rate’ of less than 10 per cent. However, Mozilla says the engine’s word error rate on LibriSpeech’s test-clean set is now 6.5 per cent, clearly beating this goal and coming close to human-level performance (around 5.8 per cent, according to the Deep Speech 2 paper).

The company also revealed its Project Common Voice, a publicly available voice dataset containing some 400,000 recordings from 20,000 different speakers, which represents around 500 hours of speech. As Mozilla states in a blog post ( http://bit.ly/mozilla-speech 1), the idea here is to “Build a speech corpus that’s free, open source, and big enough to create meaningful products with”, with the dataset being developed in parallel with the new speech recognition model.
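For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the engine's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name and sample sentences are made up for illustration, and it is not taken from Mozilla's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not part of DeepSpeech.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # matching i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One wrong word out of six in the reference sentence.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{wer:.1%}")
```

On this reading, a 6.5 per cent word error rate means roughly 6.5 word-level mistakes for every 100 words in the reference transcripts.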

Mozilla plans for future releases of DeepSpeech to be light enough to run on smartphones or single-board computers, like the Raspberry Pi.
