DT Next

AI platform of IIT-M to process texts in regional languages


Faculty at the Indian Institute of Technology Madras (IIT-M) have developed Artificial Intelligence (AI) models and datasets to process text in 11 Indian regional languages. The work was carried out jointly with ‘AI4Bharat,’ a platform for building AI solutions for problems of relevance to India.

This is a unique attempt in academia to develop and publicly release such large-scale multilingual AI models, containing millions of parameters and trained on billions of tokens from Indian languages. The researchers from IIT-M and AI4Bharat released AI models and datasets for Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi.

The multilingual AI models and datasets developed through this initiative will provide the essential building blocks to students, faculty, start-ups, and industry to work on Indian language tools and push the frontiers of technology.

The faculty has made these cutting-edge resources open source and completely free of cost. Anyone can download the models from a GitHub repository.

An accompanying research paper describing the research methodologies and evaluation has been accepted at EMNLP-Findings (a companion publication of one of the top Natural Language Processing conferences).

Mitesh M Khapra, Assistant Professor, Department of Computer Science and Engineering, IIT-M, said, “As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”

“While such tools are available for English and other foreign languages, there are hardly any tools for Indian languages, and this is the critical gap that we are trying to address through this initiative. These models are available free of cost as we want the entire country to benefit from them,” he added.

For the past year, a team of researchers comprising students, faculty, and volunteers from IIT-M and AI4Bharat worked on collecting data and training powerful models for processing text written in Indian languages.

The models take advantage of the similarities between Indian languages to make efficient use of data. With these models, the researchers have been able to advance the state of the art for Indian language processing on several tasks, including document classification, sentiment analysis, semantic matching, and paraphrase detection.
