Sogou, Xinhua virtual anchor heralds AI era
Chinese search engine Sogou launched an AI virtual anchor, the world’s first human replica intelligent virtual host, at the Fifth World Internet Conference in Wuzhen, East China’s Zhejiang province.
The technology simulates natural speech and expressions, integrating advanced image detection and prediction capabilities, as well as speech synthesis, to allow the virtual anchor to “broadcast” text inputs in real time.
The anchor’s appearance and voice are modeled after Zhang Zhao, a real anchor at Xinhua News Agency, an official State-run media outlet. Once a user inputs news text, a virtual Xinhua news anchor will appear on-screen.
The virtual anchor speaks in Zhang’s voice, and offers a believable image of him, complete with appropriate mouth movements and natural facial expressions, meaning the virtual anchor is not much different from a real one.
According to Xinhua, “he” has become a member of its reporting team and can work 24 hours a day on its official website and various social media platforms, reducing news production costs and improving efficiency.
“Virtual assistants are rapidly gaining traction as an efficient way to solve daily problems,” said Wang Xiaochuan, CEO of Sogou. Creating a more realistic virtual character will facilitate more natural interactions and enable this technology to become an even more integral part of everyday life, said Wang.
While the company is still in the early stages of exploring potential applications for this technology, there is no doubt that Sogou will continue to push the boundaries of AI, Wang said.
Based on “Sogou avatar” technology and using such cutting-edge techniques as facial landmark localization and face reconstruction, the AI Virtual Anchor was developed through joint modeling training on multimodal information.
According to Wang Yanfeng, general manager of the intelligent voice division at Sogou, “Sogou avatar” technology is one of the division’s core achievements and follows the concept of “Natural Interaction plus Knowledge Computing”.
This form of broadcasting removes the restriction that virtual images must be created first, with the accompanying voice added later, Wang said. Using “Sogou avatar” technology, the AI Virtual Anchor can produce synchronized video in real time.
Users can provide text in various ways, such as typing, voice input and machine translation. Then, they instantly obtain a real-time broadcast video. This method of newsmaking will greatly reduce the costs of post production and improve efficiency, Wang said.
Researchers in both the academic and private sectors have been working since as early as 2000 to develop technology that could create a virtual anchor. This type of research has advanced quickly in recent years thanks to the evolution of AI-enabled technologies such as facial recognition, lip-reading and machine learning driven by big data analytics.
In developing its virtual anchor technology, Sogou’s team of AI researchers analyzed audio and visual data from a live anchor, allowing them to develop a model that could then produce a realistic virtual anchor.
With a focus on natural language processing and machine learning, Sogou has developed industry-leading capabilities in speech recognition and image recognition. Sogou’s speech recognition technology possesses an accuracy rate of over 97 percent, while its image recognition technology has achieved an accuracy rate of 96 percent.
Currently, there are 500 million voice requests on Sogou each day. The engine processes these with multilingual and multitonal speech synthesis capabilities that help it to realize personalized voice synthesis and emotional transference.
Wang said the technology has the potential to enable more natural interaction between humans and machines in a wide range of different scenarios. In addition to generating entertainment content, AI-generated characters could also be equipped with Sogou’s interactive voice operating system and utilized to deliver personalized content in the education, medical and legal fields.
Wang said he anticipates that this new technology will improve social productivity and service efficiency, reduce industrial production costs, and enhance people’s experiences in science and technology.
Xinhua contributed to this story.