China Daily Global Edition (USA)

Sogou, Xinhua virtual anchor heralds AI era

By WANG KEJU in Wuzhen, Zhejiang wangkeju@chinadaily.com.cn

Chinese search engine Sogou launched an AI virtual anchor, the world’s first human-replica intelligent virtual host, at the Fifth World Internet Conference in Wuzhen, East China’s Zhejiang province.

The technology simulates natural speech and expressions, integrating advanced image detection and prediction capabilities, as well as speech synthesis, to allow the virtual anchor to “broadcast” text inputs in real time.

His appearance and voice are modeled after Zhang Zhao, a real anchor at Xinhua News Agency, an official State-run media outlet. Once a user inputs news text, a virtual Xinhua news anchor will appear on-screen.

The virtual anchor speaks in Zhang’s voice, and offers a believable image of him, complete with appropriate mouth movements and natural facial expressions, meaning the virtual anchor is not much different from a real one.

According to Xinhua, “he” has become a member of its reporting team and can work 24 hours a day on its official website and various social media platforms, reducing news production costs and improving efficiency.

“Virtual assistants are rapidly gaining traction as an efficient way to solve daily problems,” said Wang Xiaochuan, CEO of Sogou. Creating a more realistic virtual character will facilitate more natural interactions and enable this technology to become an even more integral part of everyday life, Wang said.

While Sogou is still in the early stages of exploring potential applications for this technology, there is no doubt that it will continue to push the boundaries of AI, Wang said.

Based on “Sogou avatar” technology and using such cutting-edge techniques as facial landmark localization and face reconstruction, the AI Virtual Anchor was developed through joint modeling and training on multimodal information.

According to Wang Yanfeng, general manager of the intelligent voice division at Sogou, “Sogou avatar” technology is one of the division’s core achievements, which follows the concept of “Nature Interaction plus Knowledge Computing”.

This form of broadcasting breaks through the restriction that virtual images must be created first, with the accompanying voice added later, Wang said. Using the “Sogou avatar” technology, the AI Virtual Anchor can produce synchronized video in real time.

Users can provide text in various ways, such as typing, voice input and machine translation. They then instantly obtain a real-time broadcast video. This method of news-making will greatly reduce post-production costs and improve efficiency, Wang said.

Since as early as 2000, researchers in both the academic and private sectors have worked to develop technology that could create a virtual anchor. This type of research has advanced quickly in recent years thanks to the evolution of AI-enabled technologies such as facial recognition, lip-reading and machine learning driven by big data analytics.

In developing its virtual anchor technology, Sogou’s team of AI researchers analyzed audio and visual data from a live anchor, allowing them to develop a model that could then produce a realistic virtual anchor.

With a focus on natural language processing and machine learning, Sogou has developed industry-leading capabilities in speech recognition and image recognition. Sogou’s speech recognition technology has an accuracy rate of over 97 percent, while its image recognition technology has achieved an accuracy rate of 96 percent.

Currently, there are 500 million voice requests on Sogou each day. The engine processes these with multilingual and multitonal speech synthesis capabilities that help it to realize personalized voice synthesis and emotional transference.

Wang said the technology has the potential to enable more natural interaction between humans and machines in a wide range of different scenarios. In addition to generating entertainment content, AI-generated characters could also be equipped with Sogou’s interactive voice operating system and utilized to deliver personalized content in the education, medical and legal fields.

Wang said he anticipated this new technology will improve social productivity and service efficiency, reduce industrial production costs, and enhance people’s experiences in science and technology.

Xinhua contributed to this story.
