Waikato Times

Siri-ously speaking

- Paul Warren, Professor of Linguistics at Victoria University of Wellington

A quick search of the internet produces many sites with cute (and not so cute) pictures of dog owners who look very much like their dogs. Patterns of convergence also exist in our speech.

This does not mean that we sound more like our dogs (though I will come back to how we speak to pets shortly), but that over time we can sound more and more like the people we hang out with. The opposite effect also exists – we show patterns of divergence, particularly from people we might not want to be associated with.

We also adopt different registers or styles of speaking depending on general characteristics of who we perceive to be our audience. A classic example of this is what is known in linguistics as child-directed speech. It goes under many other names, such as infant-directed speech and baby talk, although the latter can also refer to the children’s own speech. In many early studies we find the term motherese, along with more inclusive terms like parentese and caregiverese.

Child-directed speech has a number of key characteristics. These include simpler sentences and vocabulary, as well as special words such as doggie and onomatopoeic forms like choo-choo or bow-wow. There is lots of repetition, and there are special ways of speaking, using more dramatic intonation patterns with bigger rises and falls and a generally higher voice pitch.

Adults (and a child’s older siblings) adopt these ways of speaking, generally without being aware that they do it. Young children seem to find these types of speech more attractive and pay more attention to them, and features such as simple grammar and repetition provide good scaffolding for their learning.

Many of these characteristics are also found in what is sometimes referred to as pet-directed speech, so perhaps we instinctively adopt a certain style of speaking with small cuddly creatures.

It seems we also have particular ways of speaking to our devices. A recent study in the Journal of Phonetics investigated what speakers sound like when speaking to voice-activated, artificially intelligent systems such as Apple’s Siri or Amazon’s Alexa, compared to how they sound when speaking with other humans.

The study showed that, unlike child-directed speech, Siri-directed speech has a lower voice pitch and a smaller pitch range than adult-directed speech. This smaller pitch range possibly reflects less emotional engagement with a Siri than with a human. The pitch range increases over the course of an interaction with the Siri, perhaps reflecting increasing engagement with the device.

In one particularly interesting part of the study, participants took part in a simulation where they believed that they were interacting either with a native English-speaking adult or with a Siri. They were seated in front of a computer and asked to say aloud a short phrase such as “The word is bone”. They then heard either a human or a Siri voice saying “Is this the word?” as a word appeared on the computer screen. If the wrong word appeared (e.g., bode rather than bone), then the participant had to repeat the phrase.

The researchers were interested in how participants would change the way they said the phrase in order to correct the error, and whether this would differ depending on whether they thought they were talking with a Siri or with another human. Most strategies were similar, including making extra effort to speak more clearly.

However, Siri’s speech recognition is trained on casual speech, so these exaggerated corrections actually make the speech less intelligible to Siri.

The resulting “cycle of misunderstanding” suggests that Siri’s training data should also include examples of speakers making this kind of correction.

Contact us

Got a language query? Email opinion@stuff.co.nz. Not all queries will be answered.
