Business Day

Indigenous groups take issue with AI ‘thievery’

- Rina Chandran / Thomson Reuters Foundation

When US tech firm OpenAI rolled out Whisper, a speech recognition tool offering audio transcription and translation into English for dozens of languages including Maori, it rang alarm bells for many Indigenous New Zealanders.

Whisper, launched in September by the company behind the ChatGPT chatbot, was trained on 680,000 hours of audio from the web, including 1,381 hours of Maori language.

Indigenous tech and culture experts say that while such technologies can help preserve and revive their languages, harvesting their data without consent risks abuse, the distortion of Indigenous culture, and the deprivation of minorities' rights.

“Data is like our land and natural resources,” said Karaitiana Taiuru, a Maori ethicist and an honorary academic at the University of Auckland.

“If Indigenous peoples don’t have sovereignty of their own data, they will simply be recolonised in this information society.”

OpenAI did not respond to a request for comment.

In a statement on its website, it said it collaborates “with industry leaders and policymakers to ensure that AI systems are developed in a trustworthy manner”.

Generative artificial intelligence (AI), which learns from mass data sets typically scraped from the web to create original text, images, videos and more, has quickly found a wide range of applications, from marketing to education to law.

PLAGIARISM

But alongside its rapid adoption, there are growing concerns about plagiarism, the unethical sourcing of data and cultural appropriation.

This is especially true for Indigenous communities, which have a long history of their culture being stolen and appropriated, said Michael Running Wolf, an AI ethicist and Native American who founded the nonprofit Indigenous in AI.

“There is a huge commercial incentive to collect our language data for applications like voice AI and large language models. Some large data sets have Indigenous data with unexplained origins,” he said.

“Having Indigenous data sovereignty is critical as it allows communities to protect knowledge that is sacred or deeply sensitive, and which may have commercial value, from exploitation,” he told the Thomson Reuters Foundation.

Many Indigenous languages are under threat of disappearing, the UN has warned, taking with them cultures, knowledge and traditions.

In New Zealand, where Maori is enjoying a revival, the government aims to have 1 million basic speakers by 2040.

That means digital systems using Maori will be rolled out in increasing numbers, said Peter-Lucas Jones, CEO of Te Hiku Media, a nonprofit that runs Maori broadcasts and also archives and promotes the language.

“The development of tools that use generative AI can absolutely assist with the revitalisation and reclamation of Indigenous languages and cultures,” said Jones.

But it was “concerning” to see a non-Maori organisation roll out a speech model using the Maori language, he said.

“What we are seeing with these large AI models is that data is being scraped from the internet with little regard for any bias that could be present in the data, let alone any associated intellectual property rights,” he said.

Indigenous leaders were angered when Air New Zealand in 2019 sought to trademark a logo with the words “kia ora” — meaning “hello” or “good health” in Maori — highlighting tensions over attempts by outside groups to co-opt their language and culture.

Now, there are questions about intellectual property rights over data scraped from the web for use by AI, a legal grey area.

A group of visual artists sued AI artwork generation companies Stability AI, Midjourney and DeviantArt in January for copyright infringement by creating images in their style. Stability AI has said that its work is protected by the fair use doctrine, which allows limited use of copyrighted material.

Critics warn that Indigenous groups — who are generally not involved in the design or testing of AI systems — are at risk from bias that can be embedded within algorithms, while generative AI models may also spread incorrect information.

“There are real risks that generative technologies could teach false Indigenous histories and stories, create and recreate biases and make it impossible for Indigenous peoples to reclaim sovereignty of their data,” said Maori ethicist Taiuru.

SOVEREIGNTY

There is growing recognition of the need to protect Indigenous data and knowledge, with the World Trade Organization outlining measures in 2006 to provide intellectual property protection for “traditional knowledge and folklore”.

Federally recognised tribes in the US can restrict data collection on their reservations. However, a tribe’s sovereignty only extends to work done within its borders, and data collection “can fly under the radar and avoid the jurisdiction of a tribe,” said Running Wolf.

Moreover, individuals and companies have no legal obligation to compensate communities for their data, or to give them access to the data collected, he said.

As a result, “communities are careful about who they partner with ... there are a handful of large corporations that many communities refuse to work with,” said Running Wolf, who is working with trusted linguists and data scientists to get Native American languages recognised by AI.

Another option, he said, is an Indigenous data co-operative that could compensate communities for their data and accelerate research.

Te Hiku Media has built technology for the Maori language, including automatic speech recognition and a speech-to-text model, and is in talks with other Indigenous communities about sharing its technology.

It has turned down offers from several companies seeking to commercialise their data, Jones said.

“Ultimately, it is up to Maori to decide whether Siri should speak Maori,” he said, in reference to Apple’s voice assistant.

“The communities from where the data was collected should decide whether their data should be used, and for what.”
