Speaking for itself
“Hey Siri, how do you like your new voice in iOS 11?”
Apple rarely spills the beans on its inner workings, but has changed tack by publishing a paper on how it’s made Siri sound more human. The paper, with the impenetrable title ‘Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System’, outlines how Apple chooses the voice actors behind Siri, and how it chops and changes their voices to make Siri say just about anything.
First, the chosen voice for the digital assistant must be one that is ‘pleasant and intelligible and fits the personality of Siri’. Once a suitable voice has been found, Apple records 10-20 hours of speech with the voice actor. A variety of reading material is used, from audiobooks and navigation instructions to witty jokes, according to the Siri team’s blog. Those recordings are then spliced together using a number of clever artificial intelligence (AI) techniques to create all of the chatter you hear from Siri.
For iOS 11, Apple chose a new female voice for its American English accent, and made use of its own deep-learning technology to improve the tone and cadence of Siri’s voice. Apple says the new voice performed far better in tests than its iOS 9 and iOS 10 equivalents.
Since December 2016, Apple has been publishing research on its AI efforts in a bid to tempt more AI experts to join the company.