The Mercury News

App raises the bar for real-time transcription

- Larry Magid, Digital Crossroads

Attendees at TechCrunch’s recent Disrupt conference in San Francisco not only saw and heard the speakers but could also read a transcript of what was being said in real time, either on a screen in the room or on their phone or PC.

That’s because of a smartphone app called Otter.ai, from Los Altos-based AISense. The app, which runs on iOS, Android and the web, records audio and, as it records, transcribes the audio into text. Like all voice recognition systems, it’s not perfect. It sometimes misspells last names and types the wrong word.

But, as someone who’s used plenty of speech recognition software, I am impressed with how good it is, especially when transcribing a conversation or a presentation where the person isn’t going out of his or her way to speak slowly and deliberately, as you often must when dictating to Siri, Google Assistant, Amazon Alexa or speech-dictation software.

Examples

You can see examples of how it works at Larrysworld.com/otter. There you’ll find an interview with AISense CEO Sam Liang with both the audio recording and the transcript. The transcript on the site is edited to remove transcription errors and for clarity, but there is a link to the raw transcript on the Otter.ai site. Below the transcript on that site is a play button that allows you to play the audio and follow along in the text, making it obvious when Otter is getting it right and when it makes mistakes. Its most frequent mistakes are failing to capitalize a proper noun and not knowing when to insert a period or comma.

I’ve also used Otter.ai to transcribe my daily CBS News Eye on Tech segments. I simply load in the recorded MP3 files and wait a few seconds for it to do the transcription. I then go in and edit out any mistakes. From using it, I’ve actually learned that there are some words that I don’t pronounce clearly. Humans can probably figure it out, but Otter types exactly what it hears. You can see and hear examples at larrysworld.com/eye-on-tech.

In that podcast and transcribed interview on Larrysworld, Liang described the app as “very different than Siri or Alexa and Google Home.” He said that “They handle a conversation between the human being and a robot. You can ask a short question like, what’s the weather tomorrow? And the robot will answer that question. However, Otter is doing something totally different. It listens to human-to-human conversations and transcribes the conversation in real time.”

Unlike with previous articles, where I had to listen to a recorded interview and type out the quotes myself, I didn’t have to transcribe this one. It’s from the transcript that Otter created when I loaded in the MP3 file of the interview. My podcasts are recorded using professional audio equipment, but you get surprisingly good results when speaking into a smartphone, or even just taking out your smartphone to record conversations in a room, a car or a lecture hall. As a test, I ran the software while riding in a car with another person, and it did a good job of picking up and transcribing both of our voices.

Figuring out who’s speaking

The app tries to figure out when a new person starts to speak and separate all the voices. You can tag a sample with the name of a speaker, and it will analyze the rest of the conversati­on and apply that person’s name each time he or she speaks. Again, it’s not perfect, but it gets it right most of the time.

I can think of all sorts of applications for this technology. Journalists can use it to automatically transcribe interviews. Making the corrections is a lot easier than typing a transcript from scratch. Students could use it to record and transcribe lectures and, perhaps, share them with a classmate. Legislative bodies, like city councils, could use it to provide citizens with a real-time transcript of meetings.

Antidote to distraction

At the Disrupt conference, I was watching the Otter.ai transcript in real time as I was listening to speakers and was impressed that it instantaneously typed the words as they were spoken. There were times I was distracted and wasn’t listening carefully, but I was able to quickly catch up by reviewing the transcript. I also used it to read sessions that I wasn’t able to attend. Of course, I could have listened to the audio of those sessions, but it’s a lot faster to read, or at least skim, a transcript.

If transcripts are posted on the web, they can be searched by Google and other search engines, which is usually not the case for audio files. So it’s a way for podcasters to make their work more discoverable.

Even though voice recognition has been around for decades, there is still a lot of work to be done to make machines as good as humans when it comes to understanding, acting on and transcribing voice. If you don’t believe me, ask Siri, Alexa, Microsoft Cortana or Google Assistant. They might actually answer you — assuming they understand what you’re saying.

