The Sunday Guardian

From your mouth to screen, transcribing takes the next step

- JOHN MARKOFF

Sam Liang longs for his mother and wishes he could recapture the things she told him when he was in high school.

“I really miss her,” he said of her death in 2001. “Those were precious lifetime moments.”

Liang, who is the chief executive and co-founder of Otter.ai, a Silicon Valley startup, has set out to do something about that in the future. His company offers a service that automatically transcribes speech with high enough accuracy that it is gaining popularity with journalists, students, podcasters and corporate workers.

Improvements in software technology have made automatic speech transcription possible. By capturing vast quantities of human speech, neural network programs can be trained to recognize spoken language with accuracy rates that in the best circumstances approach 95%. Coupled with the plunging cost of storing data, these advances make it possible to use human language in ways that were unthinkable just a few years ago.

Liang, a Stanford-educated electrical engineer who was a member of the original team that designed Google Maps, said that data compression had made it possible to capture the speech conversation of a person’s entire life in just two terabytes of information, compact enough to fit on storage devices that cost less than $50.

The rapid improvement in speech recognition technology, which over the past decade has given rise to virtual speech assistants such as Apple’s Siri, Amazon’s Alexa, Google Voice, Microsoft Cortana and others, is spilling into new areas that are beginning to have a significant impact on the workplace.

These consumer speech portals have already raised extensive new privacy concerns. “Computers have a much greater ability to organize, access and evaluate human communications than do people,” said Marc Rotenberg, president and executive director of the Electronic Privacy Information Center in Washington. In 2015, the group filed a complaint with the Federal Trade Commission against Samsung, arguing that the capture and storage of conversations by their smart TVs was a new threat to privacy. Speech transcription potentially pushes traditional privacy concerns into new arenas both at home and work, he said.

The rapid advances being made in the automated transcription market in the past year show striking near-term potential in a growing array of new applications. This fall, for example, at the University of California, Los Angeles, students on campus who require assistance in note taking, such as those who are hearing-impaired, are being equipped with the Otter.ai service. The system is designed to replace the current note-taking process where other students take notes during classes and then share them.

In May, when the former first lady, Michelle Obama, visited campus as part of a student signing day celebration, deaf students were given access to an instantaneous transcription of her speech generated by the transcription service.

Zoom, maker of a web-based video conferencing system, offers a transcription option powered by the Otter.ai service that makes it possible to instantaneously capture a transcript of a business meeting that can be stored and searched online. One of the features that is offered by Otter.ai and other companies is the ability to easily separate and then label different speakers in a single transcription.

Companies such as Rev, which began in 2010 using temporary workers to offer transcription for $1 a minute, now offer an additional automated speech transcription service for 10 cents a minute. As a result, transcription is pushing into a variety of new areas, including captioning for YouTube channels, corporate training videos and market research firms that need transcripts from focus groups.

The Rev system allows the customer to choose whether they want more accuracy or a quicker turnaround at lower cost, said Jason Chicola, the company’s founder and chief executive. Increasingly, his customers will correct machine-generated texts rather than transcribing from scratch. He said that while Rev had 40,000 human transcribers, he did not believe that automated transcription would decimate his workforce. “Humans and machines will work together for the foreseeable future,” he said.

In the medical field, automated transcription is being used to change the way doctors take notes. In recent years, electronic health record systems became part of a routine office visit, and doctors were criticized for looking at their screens and typing rather than maintaining eye contact with patients. Now, several health startups are offering transcription services that capture text and potentially video in the examining room and use a remote human transcriber, or scribe, to edit the automated text and produce a “structured” set of notes from the patient visit.

One of the companies, Robin Healthcare, based in Berkeley, California, records office visits with an automated speech transcription system that is then annotated by a staff of human “scribes” who work in the United States, according to Noah Auerhahn, the company’s chief executive. Most of the scribes are pre-med students who listen to the doctor’s conversation, then produce a finished record within two hours of the patient’s visit. The Robin Healthcare system is being used at the University of California, San Francisco, and at Duke University.

A competitor, DeepScribe, also based in Berkeley, takes a more automated approach to generating electronic health records. The firm uses several speech engines from large technology companies like Google and IBM to record the conversation and creates a summary of the examination that is checked by a human. By relying more on speech automation, DeepScribe is able to offer a less expensive service, said Akilesh Bapu, the company’s chief executive.

In the past, human speech transcription has largely been limited to the legal and medical fields. This year, the cost of automated transcription has collapsed as rival startup firms have competed for a rapidly growing market. Companies such as Otter.ai and Descript, a rival San Francisco-based startup started by Groupon founder Andrew Mason, are giving away basic transcription services and focusing on charging for subscriptions that offer enhanced features.

Speech scientists emphasize that while the automated transcription systems are significantly improved, they are still far from perfect. While 95% accuracy may be obtained by automated transcription, it is possible only under the best circumstances. An accent, a poorly positioned microphone or background noise can cause accuracy to fall.

The hope for the future is the emergence of another speech technology known as natural language processing, which tries to capture the meaning of words and sentences and is expected to increase computer accuracy to human levels. But for now, natural language processing remains one of the most challenging frontiers in the field of artificial intelligence.

© 2019 THE NEW YORK TIMES

(JIM WILSON/THE NEW YORK TIMES) Yun Fu, left, and Sam Liang, founders of Otter.ai, a Silicon Valley start-up that offers a service that automatically transcribes speech, outside the company’s offices in Los Altos, California.
