Business Standard

Krutrim: Disrupting AI giants with Indian data dominance

- PEERZADA ABRAR

Krutrim, an artificial intelligence (AI) venture co-founded by Bhavish Aggarwal of Ola, has entered the increasingly competitive AI race dominated by players such as Google, Microsoft, and OpenAI.

However, what sets the firm apart from these players is that it has been built with the largest representation of Indian data used for its generative AI (GenAI) applications in all Indian languages.

Currently, all AI models known as large language models (LLMs) are primarily trained in English. Because of India's multicultural and multilingual context, these models struggle to capture the richness of the country's linguistic diversity. Experts argue that training on unique data sets specific to the country is crucial.

“This is a problem statement at the crossroads of knowledge and language,” said Ravi Jain, head of strategy at Krutrim, in an interview. “Our differentiation is driven by the data used in the training and the various languages we incorporate, including their richness and depth. This will define the quality of the output in terms of the applications we build, setting us apart from the (technology) giants operating in 180 countries.”

Krutrim, meaning “artificial” in Sanskrit, is a family of LLMs that includes Krutrim Base and Krutrim Pro. The latter boasts multimodal and larger knowledge capabilities, along with various technical advancements for inference. It is trained on over 2 trillion tokens, the chunks of text that a model reads or generates.

A team of computer scientists, based in Bengaluru and San Francisco, has trained this model, which will also power Krutrim’s conversational AI assistant capable of understanding and speaking multiple Indian languages fluently.

When asked about the source of the data, Jain mentioned that the first model the firm built had a significant representation of Indian data available in the public domain.

“Imagine all the Indian data in various languages on the web, including many PDFs (portable document formats). So, we have a substantial amount of publicly available data in different languages,” Jain explained. “As we progress, digitising non-digitised data, especially in many Indian languages, will become a crucial part of our journey. If we can incorporate them into the corpus and train the models, it will make a big difference.”

Last month, Krutrim became available for public beta testing.

The AI chatbot, similar to OpenAI’s ChatGPT, is accessible in two languages: English and Hindi.

“This is a starting point for us and our first-generation product. There is much more to come, and improvements will be significant as we build on this foundation,” remarked Ola founder Bhavish Aggarwal recently on X. Aggarwal emphasised that Krutrim is firmly rooted in Indian values and data, covering over 10 Indian languages, and is ready to assist in English, Hindi, Tamil, Bengali, Marathi, Kannada, Gujarati, and even Hinglish.

“While some ‘hallucinations’ may occur, they are much less prevalent in Indian contexts compared to other global platforms. We are working diligently to find and rectify them,” Aggarwal assured. Indeed, Krutrim recently provided incorrect responses to users’ queries. Screenshots shared on social media showed the chatbot incorrectly stating that the West Indies won the 1983 Cricket World Cup. It also erroneously asserted that Hillary Clinton won the 2014 US presidential elections, among other errors.

When asked about addressing such issues, Jain acknowledged that generative models can make mistakes as their insights are based on information in the public domain, which can have diverse views.

“As we go along, this (publicly available data) would become the most important part of our journey about how to digitise data that is not digitised yet. And that is the case with many Indian languages. If we can make them part of the corpus and train the models, that will make a big difference.” Ravi Jain, head of strategy, Krutrim

