‘Difficult’ Chinese characters digitized, finally
The coding of many Chinese characters that were previously unavailable in electronic format was completed in October as part of the country’s largest State-funded digitization project.
Around 3,000 difficult characters have been digitized in accordance with a national standard and can be used in China as well as areas of East Asia with software support for written Chinese.
For global use, however, the characters will have to be released through Unicode, the world computing industry consortium, after being certified by the International Organization for Standardization. The Geneva-based body, of which China is a member, works with countries to build proprietary, industrial and commercial standards.
The Chinese program, launched in 2011, aims to encode 300,000 Han characters, 100,000 characters from ethnic minority scripts and 100,000 more from rare and ancient writing styles, such as oracle bones, in the coming years, officials of the General Administration of Press and Publication said last week.
The project, known as the China Font Bank, seeks to make vast linguistic resources more accessible to Chinese and foreigners, academics said.
Around 480 million yuan ($69.6 million) has been provided so far for the project, which is divided into 28 sections and involves several Chinese universities and companies in addition to government departments.
The International Organization for Standardization already recognizes more than 80,000 Chinese characters.
The General Administration of Press and Publication, which is overseeing the project, has submitted the newly coded characters, including minority scripts of southern China, to the ISO through the China Electronics Standardization Institute.
“The batch has been filed with the relevant ideographs section of the ISO,” a General Administration of Press and Publication official said.
Once discussions conclude, the characters can be categorized by the ISO and made available for use worldwide, a process that typically takes two to three years, the official said.
Another set of 2,000 characters used in personal and place names in China is expected to be submitted to the standards organization by June. These characters weren’t widely digitized earlier due to their complex compositions, which made it difficult for people whose names contain them to conduct public dealings at banks, airports and other places that depend on international computer codes.
To address this, the government issued a list of 8,105 standard characters in 2013 and urged people to choose from the list while naming children.
The font project has already identified 200,000 characters that need to be encoded.
The ultimate goal is to build a database of computer codes and fonts to facilitate the inheritance and popularization of Chinese culture, the officials said.
Since the 1980s, a lot of what is known as simplified Chinese has been digitized. But little from early writing systems, for example that of imperial China, has been covered.
“This way, more scholars will be able to study the dynastic writings of China,” said Li Guoying, a Chinese language professor at Beijing Normal University.
The university’s role in the project is to digitize dictionaries, 300 of which have been processed.
Some of the ancient language is a mix of drawing and writing on rocks, oracle bones, bronze and silk. In addition, the minority scripts are highly diverse.
Mandarin, China’s common language, has evolved over the years but retains influences of classical Chinese.
“There are only 26 letters in the English alphabet, so in a way it is a closed system. But there are too many in Chinese, which is why we need to upgrade and try to digitally integrate them,” said Zhang Jianguo, general manager of the font division at Beijing Founder Electronics Co.
His company, which makes fonts that include Mandarin, Tibetan language and Qing Dynasty (1644-1911) calligraphy styles, joined the project in 2014.