GRAND THEFT: WORDS
Media vs. AI bots
Artificial intelligence bots like ChatGPT are creating a “plagiarism stew” — responding to queries with “paraphrasing or outright repetition” of text that’s cribbed from copyrighted news articles, says a major trade group.
The News Media Alliance, a nonprofit that represents more than 2,200 publishers including The Post, released a blistering 77-page report Tuesday that argued the most popular AI chatbots have been violating copyright law by reproducing entire sections of articles in their responses.
The report singled out OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing and a more recent AI tool called the Search Generative Experience that can craft responses to open-ended queries while retaining a recognizable list of links to the Web.
The NMA asserts that these “large language models,” a type of AI that understands and can respond to written text, “are just ‘learning’ unprotectable facts from copyrighted training materials.”
What’s more, since the technology is not actually “ever absorbing any underlying concepts,” its responses are “technically inaccurate,” the NMA said.
After analyzing a sample of datasets believed to be used by LLMs, the NMA said the AI chatbots produce “unauthorized derivative works by responding to user queries with close paraphrasing or outright repetition of copied and memorized portions of the works on which they were trained.”
Tool a ‘liar’
The group found that curated data sets used content from news, magazine and digital media publications as much as 100 times more than generic data sets did.
As many as half of the top 10 sites in training sets used for Google’s Bard — which reportedly launched in March despite internal warnings that the tool was a “pathological liar” — are news outlets, said the NMA.
The NMA said it submitted its white paper to the US Copyright Office, “acknowledging that an author’s expression may be implicated both in training . . . as well as at the output stage because of a similarity between her works and an output of an AI system.”
Robert Thomson, CEO of News Corp., which owns The Post as well as The Wall Street Journal and other publishers represented by the NMA, has denounced the inaccuracies in AI-generated content as “rubbish in, rubbish out,” even as he warned that the technology threatens to kill thousands more media jobs.
OpenAI, Google and Microsoft didn’t respond to requests for comment.