Baidu, Zhipu LLMs top AI ranking – but only at home
Baidu’s Ernie Bot 4.0 and start-up Zhipu AI’s GLM-4 rank on top among Chinese large language models (LLMs), but their foreign rivals still lead in overall capabilities, according to a new test by Tsinghua University in Beijing.
The SuperBench assessment report examined 14 representative LLMs – the technology underpinning generative artificial intelligence (AI) chatbots – and found that overseas models, such as OpenAI’s GPT-4 and Anthropic’s Claude-3, came out on top in multiple capabilities, including semantic comprehension, coding abilities and alignment with human commands.
Researchers found “obvious gaps” in code-writing and operative abilities in the real-world environment between domestic and first-class foreign models.
The report aimed to “provide objective and scientific evaluation criteria” to examine a number of LLMs that had emerged recently, said a WeChat post published by Tsinghua’s Basic Model Research Centre, which conducted the assessment with the state-backed Zhongguancun Laboratory.
Chinese tech giants and start-ups have been racing to improve their LLMs since OpenAI, a US start-up backed by Microsoft, launched a series of innovative tools powered by generative AI, including ChatGPT and text-to-video service Sora.
Around 200 LLMs have been introduced in China, where OpenAI’s services are officially unavailable, according to government figures.
The Tsinghua report echoes a recent comment by Alibaba Group Holding co-founder and chairman Joe Tsai, who said China was about two years behind US companies in the global AI race, citing how OpenAI had leapfrogged the rest of the tech industry in AI innovation.
Alibaba is the owner of the Post.
Revisions to existing US export controls, which took effect this month, have made it harder for China to access advanced AI processors and semiconductor-manufacturing equipment.
Despite the challenges faced by domestic LLM developers, Tsinghua’s report showed that Ernie Bot 4.0, the latest version of the generative AI chatbot launched by web search giant Baidu, and GLM-4 from Zhipu AI, a start-up founded by a Tsinghua graduate, have gradually narrowed their respective gaps with the world’s best models in overall performance.
One area where the country’s LLMs performed better is Chinese text-language tasks, the test found.
Start-up Moonshot AI’s Kimi chatbot, Alibaba’s Tongyi Qianwen 2.1, GLM-4 and Ernie Bot 4.0 ranked in the top four in that category, although GPT-4 still came first in Chinese text-language reasoning.
Moonshot AI and Zhipu AI, along with Baichuan and MiniMax, are known as the “four new AI tigers” for being some of the country’s most promising generative AI start-ups.