New York Post

GRAND THEFT: WORDS

Media vs. AI bots

By SHANNON THALER

Artificial intelligence bots like ChatGPT are creating a “plagiarism stew,” responding to queries with “paraphrasing or outright repetition” of text that’s cribbed from copyrighted news articles, says a major trade group.

The News Media Alliance (a nonprofit that represents more than 2,200 publishers, including The Post) released a blistering, 77-page report Tuesday that argued the most popular AI chatbots have been violating copyright law by reproducing entire sections of articles in their responses.

The report singled out OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing and a more recent AI tool called the Search Generative Experience that can craft responses to open-ended queries while retaining a recognizable list of links to the Web.

The NMA asserts that these “large language models,” a type of AI that understands and can respond to written text, “are just ‘learning’ unprotectable facts from copyrighted training materials.”

What’s more, since the technology is not actually “ever absorbing any underlying concepts,” its responses are “technically inaccurate,” the NMA said.

After analyzing a sample of datasets believed to be used by LLMs, the NMA said the AI chatbots produce “unauthorized derivative works by responding to user queries with close paraphrasing or outright repetition of copied and memorized portions of the works on which they were trained.”

Tool a ‘liar’

The group found that curated datasets used content from news, magazines and digital media publications as much as 100 times more than other generic datasets.

As many as half of the top 10 sites in training sets used for Google’s Bard (which reportedly launched in March despite internal warnings that the tool was a “pathological liar”) are news outlets, said the NMA.

The NMA said it submitted its white paper to the US Copyright Office, “acknowledging that an author’s expression may be implicated both in training . . . as well as at the output stage because of a similarity between her works and an output of an AI system.”

Robert Thomson, CEO of News Corp. (which owns The Post, as well as The Wall Street Journal and other publishers represented by the NMA), has denounced inaccuracies spewed out by AI-generated content as “rubbish in, rubbish out,” even as he warned it threatens to kill thousands more media jobs.

OpenAI, Google and Microsoft didn’t respond to requests for comment.
