China Daily (Hong Kong)

Big data is critical for shaping ‘digital humanities’

Wong Kam-fai

The author is a member of the Legislative Council, associate dean (external affairs) of the Faculty of Engineering of the Chinese University of Hong Kong, and vice-president of the Hong Kong Professionals and Senior Executives Association.

With the rapid development of the digital economy, digital humanities has become a popular research field at institutions worldwide. Over the past year, generative artificial intelligence (GenAI) has emerged, bringing many conveniences to humanity. Officials, industries, academia and research sectors around the globe are all vying to use it. This trend is unstoppable and will continue to drive the innovation and technology industry in 2024. The goal of AI research is to let machines replace humans, so the subtle relationship between AI (the digital) and human intelligence (the humanities), and how the two interact and cooperate, are key topics of concern for many digital humanities scholars, including this author.

Theoretically, digital (D) humanities (H) can be divided into three categories. The first is D2H: using data to analyze and understand the culture of the real world, a typical application of big data. The second is H2D: imitating real human culture and transforming it into the virtual world to achieve the effect of “digital twins”. The third is D&H: promoting interaction between the real and virtual worlds to build an efficient “cyber-physical system” that networks the physical world.

Simply put, from an academic perspective, digital humanities encompass the four major disciplines of linguistics, history, philosophy and the arts. Computer scientists have been researching continuously, trying to digitize these subjects, expand and deepen their content, promote interdisciplinary studies, and help optimize teaching and learning outcomes. However, if digitization is applied inappropriately, it will inevitably affect the substance of these subjects. Regardless of the discipline, though, digital humanities are closely tied to data. This article’s objective is to highlight the impact of the “digital” on the “humanities”.

For AI in linguistics, natural language processing (NLP) technology is used for language analysis and understanding. NLP capabilities are based on deep learning and require the support of large corpora (that is, text big data) for system training.
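To make this dependence on corpora concrete, here is a minimal sketch of a toy bigram language model; the corpus, words and behavior are illustrative assumptions only, not the architecture of any real NLP system. What the model can “say” is bounded entirely by its training text.

```python
# A minimal sketch, assuming a toy bigram model stands in for the
# deep-learning systems the article describes. Everything the model
# can generate comes from its training corpus.
from collections import defaultdict
import random

corpus = (
    "digital humanities studies culture with data . "
    "data drives digital humanities research ."
).split()

# Count bigram transitions: which word tends to follow which.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start: str, length: int = 8) -> str:
    """Generate text by sampling only continuations seen in training."""
    words = [start]
    for _ in range(length):
        options = transitions.get(words[-1])
        if not options:  # no training data for this word: the model is stuck
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate("digital"))     # fluent, because "digital" appears in the corpus
print(generate("philosophy"))  # unseen word: the model can produce nothing useful
```

The same limitation scales up: a system trained on web text handles well-represented languages fluently and fails on those absent from its corpus.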

Corpus training can easily lead to “language discrimination”, which in turn raises the issue of “language conservation”.

Large corpora are mainly built from commonly used online languages. For this reason, ChatGPT can converse fluently with users in English, Chinese, Spanish and Arabic (among the languages most used on the internet currently), but it is helpless with languages that have not been digitized. For example, one of the least-used languages in the world is Ayapaneco, an ancient language spoken by a tiny number of people in Mexico; no digital form of it exists online. Some experts estimate that such least-used, low-resource languages will vanish from the internet, leading to the disappearance of their related cultures. What is even more frightening is that if this unhealthy situation continues, the culture of the future online world will be manipulated by the most powerful nations.
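As a rough illustration of this imbalance, the following sketch uses made-up document counts (assumptions for illustration, not measurements) to show how a language below a resource threshold becomes effectively invisible to corpus-trained models.

```python
# Illustrative corpus shares only; the counts and the 10,000-document
# "trainable" threshold are assumptions, not real statistics.
web_corpus_docs = {
    "English": 5_000_000,
    "Chinese": 1_200_000,
    "Spanish": 900_000,
    "Arabic": 400_000,
    "Ayapaneco": 0,  # no digitized text online, per the article
}

total = sum(web_corpus_docs.values())
for lang, docs in sorted(web_corpus_docs.items(), key=lambda kv: -kv[1]):
    share = docs / total
    status = "trainable" if docs > 10_000 else "invisible to the model"
    print(f"{lang:>10}: {share:6.2%} of corpus -> {status}")
```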

For AI in philosophy, take ChatGPT as an example. ChatGPT is built with extensive deep learning; like a parrot mimicking speech, it learns conversational skills from a large corpus. The quality of the corpus is therefore critical. The most common flaw is the hallucination effect: ChatGPT makes things up and answers off topic because of insufficient training data. Moreover, hallucination can produce a chain effect, in which one wrong answer naturally affects the next user prompt and the subsequent reasoning and answers, resulting in a series of mistakes.
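The following toy sketch illustrates that chain effect. The “model” here is a stub that trusts its conversation history verbatim; this is an assumption made purely for illustration, not a description of how ChatGPT actually works.

```python
# A toy sketch of hallucination chaining: once a wrong answer enters
# the conversation history, later answers are conditioned on it.
known_facts = {"capital of France": "Paris"}

def toy_answer(question: str, history: list[str]) -> str:
    # Prefer whatever the conversation history already asserts,
    # mimicking a model that conditions on prior turns.
    for turn in reversed(history):
        if question in turn:
            return turn.split(" is ")[-1]
    # Otherwise answer from training data, or fabricate a fallback.
    return known_facts.get(question, "Lyon")  # "Lyon" is a made-up answer

history: list[str] = []
q = "largest city of France"        # not in the training data
a = toy_answer(q, history)          # hallucinated answer: "Lyon"
history.append(f"{q} is {a}")

# The wrong answer now contaminates the next turn's reasoning.
print(toy_answer("largest city of France", history))  # repeats "Lyon"
```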

History is based on the archives of past events. Deep-learning technology can certainly make historical knowledge broader and deeper, but this advantage depends on the authenticity of the training data. Deep learning, however, is essentially a set of calculations based on statistics. It does not care about the authenticity of the data, as it performs no “fact checking”. Furthermore, the system cannot explain its results, whether the historical events it outputs are true or false. The digitization of history also has a domino effect: if unchecked historical events spread inaccurately, the credibility of future digital history will be greatly diminished.

It has been emphasized that “safety underpins development, while development ensures safety; both must be advanced concurrently”. Consequently, as Hong Kong fosters the digital economy, it must also give due consideration to “digital security”. In the ongoing fourth (AI) industrial revolution, data serves as the pivotal resource for innovation and production, and its integrity must be safeguarded against intrusion or contamination. Compromised data not only impedes the economic progression of the Hong Kong Special Administrative Region; it also creates a loophole that could jeopardize China’s national security, giving criminals an opportunity for exploitation. Hence, as we move further into 2024, digital security, encompassing network security, AI security and the like, is of paramount importance to economies worldwide. The HKSAR government must not overlook it.
