Mozilla revolutionises speech recognition
The launch of Project DeepSpeech and Project Common Voice aims to provide human-like perception of speech.
Mozilla has recently announced the initial release of its automatic speech recognition engine, based on work carried out by its Machine Learning team. The DeepSpeech engine (https://github.com/mozilla/DeepSpeech) is modelled on the Deep Speech papers published by Baidu (www.baidu.com), a Chinese web services company that's one of the leaders in AI development; the papers describe a trainable multi-layered deep neural network.
The ambitious project initially aimed for a word error rate (WER) of less than 10 per cent. Mozilla says the engine's word error rate on LibriSpeech's test-clean set is now 6.5 per cent, clearly beating that goal and approaching human-level performance, which the Deep Speech 2 paper puts at around 5.8 per cent.

The company also revealed its Project Common Voice, a publicly available voice dataset containing some 400,000 recordings from 20,000 different speakers, representing around 500 hours of speech. As Mozilla states in a blog post (http://bit.ly/mozilla-speech), the idea is to "Build a speech corpus that's free, open source, and big enough to create meaningful products with", while running in parallel with the new speech recognition model.
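For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation; it is not Mozilla's implementation, and the example sentences are invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One missing word out of six reference words -> WER of 1/6, or about 16.7%
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

By this measure, a 6.5 per cent WER means the engine gets roughly one word in fifteen wrong on the LibriSpeech test-clean recordings.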