Waikato Times

Siri-ously speaking

- Paul Warren, Professor of Linguistics at Victoria University of Wellington

A quick search of the internet produces many sites with cute (and not so cute) pictures of dog owners who look very much like their dogs. Patterns of convergence also exist in our speech.

This does not mean that we sound more like our dogs (though I will come back to how we speak to pets shortly), but that over time we can sound more and more like the people we hang out with. The opposite effect also exists – we show patterns of divergence, particularly from people we might not want to be associated with.

We also adopt different registers or styles of speaking depending on general characteristics of who we perceive to be our audience. A classic example of this is what is known in linguistics as child-directed speech. It goes under many other names, such as infant-directed speech and baby talk, although the latter can also refer to the children’s own speech. In many early studies we find the term motherese, along with more inclusive terms like parentese and caregiverese.

Child-directed speech has a number of key characteristics. These include simpler sentences and vocabulary, as well as special words such as doggie and onomatopoeic forms like choo-choo or bow-wow. There is lots of repetition, and there are special ways of speaking, using more dramatic intonation patterns with bigger rises and falls and a generally higher voice pitch.

Adults (and a child’s older siblings) adopt these ways of speaking, generally without being aware that they do it. Young children seem to find these types of speech more attractive and pay more attention to them, and features such as simple grammar and repetition provide good scaffolding for their learning.

Many of these characteristics are also found in what is sometimes referred to as pet-directed speech, so perhaps we instinctively adopt a certain style of speaking with small cuddly creatures.

It seems we also have particular ways of speaking to our devices. A recent study in the Journal of Phonetics investigated what speakers sound like when speaking to voice-activated, artificially intelligent systems such as Apple’s Siri or Amazon’s Alexa, compared to how they sound when speaking with other humans.

The study showed that, unlike child-directed speech, Siri-directed speech has a lower voice pitch and a smaller pitch range than adult-directed speech. This smaller pitch range possibly reflects less emotional engagement with a Siri than with a human. The pitch range increases over the course of an interaction with the Siri, perhaps reflecting increasing engagement with the device.

In one particularly interesting part of the study, participants took part in a simulation where they believed that they were interacting either with a native English-speaking adult or with a Siri. They were seated in front of a computer and asked to say aloud a short phrase such as “The word is bone”. They then heard either a human or a Siri voice saying “Is this the word?” as a word appeared on the computer screen. If the wrong word appeared (e.g., bode rather than bone), then the participant had to repeat the phrase.

The researchers were interested in how participants would change the way they said the phrase in order to correct the error, and whether this would differ depending on whether they thought they were talking with a Siri or with another human. Most strategies were similar, including making extra effort to speak more clearly.

However, Siri’s speech recognition is trained on casual speech, so these exaggerated corrections actually make the speech less intelligible to Siri.

The resulting “cycle of misunderstanding” suggests that Siri’s training data should also include examples of speakers making this kind of correction.

Contact us

Got a language query? Email opinion@stuff.co.nz. Not all queries will be answered.
