AI news stories spur copyright concerns

2024-05-12 - By Chase DiFeliciantonio. Reach Chase DiFeliciantonio: chase.difeliciantonio@sfchronicle.com; Twitter: @ChaseDiFelice

“Most casual readers will assume that the fake name on there is a real person. That’s skating on the road to deception. It’s really, really bad.”

Lance Knoble, CEO and co-founder of Cityside

Media industry fears that artificial intelligence will crush the value of exclusive news articles by quickly ingesting and regurgitating them on competing sites are no longer a futuristic concern. Hoodline, a San Francisco company that runs a national network of “hyperlocal” news sites, is aggressively embracing the practice, replete with fictitious author bylines.

Under the ownership of Impress3 Media since 2020, Hoodline’s sites are using generative AI to write entire news stories to rapidly generate online content, according to CEO Zack Chen.

The company has sites serving dozens of metro areas and reaches millions of readers per month, Chen said in an email.

Chen declined to say which AI software Hoodline is using to generate the news stories, describing it as “a number of different third-party software both AI-powered and programmatic, as well as in-house/custom-built systems.” He also declined to elaborate on what news sources he is feeding into his programs to generate content.

Some of the stories produced by Hoodline closely resemble those from other outlets, the Chronicle found. This raises questions about copyright infringement and whether the latest tech-driven disruption in the struggling local news business could be yet another blow to readership and revenue for conventional publishers.

Chen originally disclosed the use of AI after news site the Gazetteer asked in April about the seemingly fake photos and bios of some Hoodline writers. Since then, Hoodline stories generated by AI carry a small badge near the top, while the apparently AI-generated photos no longer appear.

Chen said articles are produced by an “In-House Writing Collective” and carry author bylines. But instead of humans, he conceded, these are pen names for AI-produced stories that “are not associated with any individual live journalist or editor.”

He said in an email that the company has been using AI for at least 10 months, and that not all the site’s content is AI-produced.

Most news publishers have reporters who pursue stories originally published by competitors and who rewrite them while trying to offer more details and a fresh take. Media startups such as BuzzFeed and the Huffington Post soared to huge valuations by adapting the practice to the internet, where they repackaged material optimized for search and social media.

Chen argues that his software does essentially the same thing as those aggregators and thus does not break copyright laws. He compared the stories written with AI to “reblogs” that other sites write. He said that “our content is inherently unique and fresh because we are providing background that only we have combined into that specific topic being covered,” and that any reused content is attributed.

The stories on Hoodline may be churned out by a machine, but Chen said he hopes the resulting traffic and revenue will help him hire more experienced, full-time journalists for the site. Chen said the company employs dozens of people in news gathering and is looking to hire what he called investigative journalists.

Chen said in the email that SFist, which Impress3 also owns, “does not use AI in its newsroom. It has had largely the same writing-editing staff for many years now, even before we acquired it.”

Chen declined to address a question about revenue. He said the company is not eliminating jobs, and employs more people “on the publishing side” than it ever has.

He said the site “employs dozens of people full-time, as well as freelance journalists” on the West Coast and East Coast and some people abroad, adding the site is “actively seeking additional investigative journalist folks in many locations.”

He said local editors review and publish AI and non-AI produced stories. Chen also refers to a team of “journalists researchers” who work on AI-produced stories and do reporting, with the finished product reviewed by editors.

Chen also said no “complaints or legal action has been taken against us as a result of the use of AI, to my knowledge.”

Whether taking published material and reposting it with some changes is a copyright violation — generated by AI or not — can be hard to pinpoint and depends on a few factors, said Puya Partow-Navid, an intellectual property attorney at Seyfarth Shaw LLP.

Generic source material, such as weather or police reports, cannot be copyrighted, he said. But creative expression can be, and the deciding factor in most copyright cases is “how much creative expression you’ve added,” Partow-Navid said.

Facts are not copyrightable, he noted, so writing a story that resembles another one about the weather likely wouldn’t be infringement. But “if you’re taking the entire style and the commentary and the writers’ thoughts on the matter” and reproducing them, that likely would be infringement.

One story reported first by the Chronicle showed up on Hoodline’s website shortly afterward, crediting the publication, and carrying the byline of “Tony Ng.” The Hoodline story culled quotes and other information from the original Chronicle story, as well as from a story on the same subject on SFist.

Copyright infringement or not, Hoodline’s use of AI is “somewhere on the spectrum from ludicrous to reprehensible,” said Lance Knoble, CEO and co-founder of Cityside, the nonprofit parent company and publisher of news sites Oaklandside and Berkeleyside.

Despite the small “AI” badge, “most casual readers will assume that the fake name on there is a real person,” Knoble said. “That’s skating on the road to deception. It’s really, really bad.”

He said he was not against all uses of AI in journalism, and could see a use for the technology to speed his reporters’ work. Cityside has taken grant money from OpenAI to study how to use the technology in its fundraising efforts, and Knoble said the organization is discussing with its unionized workforce ways to use the technology for news gathering.

Still, the practice of so-called engagement farming is proof that “journalism is a broken business,” said Stuart Schuffman, who runs the Broke-Ass Stuart website, which he said largely relies on donations through his Patreon.

Instead of churning out computer-produced content, Schuffman has doubled down on hyperlocal reporting and removed the banner ads from his site. He said he hasn’t seen his own content pop up in a slightly altered form on Hoodline’s site but doesn’t like what the company is doing.

“That s— is crazy, it’s so dishonest,” Schuffman said.

Hoodline is not the first to wade into AI-generated journalism.

Sports Illustrated owner Arena Group ended up firing the magazine’s publisher in December after the struggling brand was caught using AI-generated stories and tagging them with fictitious author names.

X owner Elon Musk has floated an idea that could get around copyright issues altogether, according to independent tech reporter Alex Kantrowitz. Instead of running actual news stories through AI bots, Musk reportedly wants to run the chatter about those news stories through his AI program, Grok, to summarize what people are saying about the news.

Google is also reportedly working on an AI program that can take in details of news events and produce news stories automatically.

Beyond producing news stories with AI, some outlets including the Financial Times, the Associated Press, Politico owner Axel Springer and others have inked licensing deals with OpenAI to allow their archives to be used in training AI programs.

At the same time, other outlets, including the New York Times and more recently the San Jose Mercury News, have sued OpenAI, saying the company illegally used their copyrighted material to train its chatbot programs.

But training issues aside, is there anything publishers can do to stop the AI-driven republishing of their stories?

Sending a stern letter to stop what appears to be copyright infringement would be a first step, said Daren Orzechowski, a partner with law firm A&O Shearman. But, he said, a publisher would have to demonstrate “there’s substantial similarity between the two pieces” beyond the basic facts of a story.

Debates over aggregating and reposting content are not new.

“There seems to be a cycle of this sort of behavior,” said Knoble, the Cityside publisher, referring to the Huffington Post’s tactics long before AI.

Schuffman isn’t buying that using AI to write stories will lead to people like Chen suddenly expanding newsrooms around the country.

“People don’t want to pay for robots to write this stuff,” he said. “They want to pay for me to pay other people.”
