China Daily

Nation’s firms eye lightweight LLMs as AI race heats up

Smaller large models require fewer calculations, less powerful processors

- By CHENG YU chengyu@chinadaily.com.cn

More Chinese companies are developing lightweight large language models after US-based technology firm OpenAI launched a text-to-video model, Sora, last month, raising the stakes in the global AI race.

A lightweight model, also known as a smaller large model, is one built with fewer parameters, which means it has less capacity to process and generate text than a full-scale large model.

Simply put, these small models are like compact cars, while large models are like luxury sport utility vehicles.

In February, Chinese artificial intelligence startup ModelBest Inc launched its latest lightweight large model, generating much attention in the AI industry.

Dubbed MiniCPM-2B, the model has 2 billion parameters, far fewer than the 1.7 trillion parameters reportedly used by OpenAI’s massive GPT-4.0.

In December, US tech giant Microsoft released Phi-2, a small language model capable of common-sense reasoning and language understanding, although it packs 2.7 billion parameters.

Li Dahai, CEO of ModelBest, said the new model’s performance is close to that of Mistral-7B from French AI company Mistral on open-source general benchmarks, with stronger ability in Chinese, mathematics and coding. Its overall performance exceeds that of some peer large models with around 10 billion parameters, Li said.

“Both large and smaller large models have their advantages, depending on the specific requirements of a task and their constraints, but Chinese companies may find a way out to leverage small models amid an AI boom,” said Li.

Zhou Hongyi, founder and chairman of 360 Security Technology and a member of the 14th National Committee of the Chinese People’s Political Consultative Conference at the ongoing two sessions, also said previously in an interview that creating a universal large model that surpasses GPT-4.0 may be challenging at the moment.

Though GPT-4.0 currently “knows everything, it is not specialized”, he said.

“If we can excel in a particular business domain by training a model with unique business data and integrating it with many business tools within that sector, such a model will not only have intelligence, but also possess unique knowledge, even hands and feet,” he said.

Li said that if such a lightweight model can be applied to industries, its commercial value will be huge.

“If the model is compressed, it will require fewer calculations to operate, which also means less powerful processors and less time to complete responses,” Li said.

“With the popularity of such end-side models, the inference cost of more electronic devices, such as mobile phones, will further decrease in the future,” he added.


An employee introduces an AI large model to a visitor (middle) during the 2nd Global Digital Trade Expo in Hangzhou, Zhejiang province. ZHU HAIWEI / FOR CHINA DAILY
