San Francisco Chronicle

AI firms would reveal data sources under plan

By Chase DiFeliciantonio

Reach Chase DiFeliciantonio: chase.difeliciantonio@sfchronicle.com; Twitter: @ChaseDiFelice

The days of artificial intelligence companies sweeping up endless data, copyrighted or not, could be coming to an end.

That is if Rep. Adam Schiff, D-Burbank, gets his way after introducing a bill Tuesday that would force AI companies to say where they got the reams of data needed to make their super-smart chatbots and image generators.

The bill faces an uphill battle in Congress. But if passed into law, it would wade into a developing area of the law and set rules for how AI systems can and can’t be trained. And it would potentially put limits on the breakneck speed at which companies including OpenAI are moving to build ever-better digital brains.

“AI has the disruptive potential of changing our economy, our political system, and our day-to-day lives,” Schiff, who is running for one of California’s U.S. Senate seats, said in a statement. “We must balance the immense potential of AI with the crucial need for ethical guidelines and protections.”

The Generative AI Copyright Disclosure Act would require companies to alert the government prior to releasing a new generative AI system, outlining “all copyrighted works used in building or altering the training dataset for that system.”

The rules would also be retroactive, meaning AI companies would have to divulge where they got the millions, and in some cases billions and trillions, of pieces of data used to train their existing models.

AI programs such as OpenAI’s GPT series are finely tuned probability machines that learn from being fed essentially everything on the internet and then some. That allows them to recognize patterns in language and images and produce fluent responses to prompts as if a user were talking to someone who knows a bit about everything.

The more training data, the smarter the program.

But companies including OpenAI, Anthropic and others are facing lawsuits that say they have run roughshod over copyright rules and unfairly used data — including books, images and songs — that didn’t belong to them.

The companies have argued they are protected under fair use rules, which allow unlicensed use of copyrighted materials for certain purposes, like free expression, under the law. But it’s far from a settled matter. The New York Times, which is among those suing AI companies alleging copyright infringement, recently reported OpenAI and others may have knowingly skirted rules, and possibly the law, in guzzling training data in an effort to win the AI arms race and build the best machine.

TechNet, a trade group that lobbies on behalf of tech companies like Meta and Google with huge investments in AI, said last year that the kinds of disclosures Schiff’s bill calls for would kneecap the U.S. edge in AI technology.

Enforcing rules around how AI models are trained would be bad for business, and could force companies to take their business “to other jurisdictions with more innovation-friendly legal frameworks,” TechNet said in a letter to the U.S. Copyright Office in October.

The Times’ lawsuit against OpenAI alleges the ChatGPT bot regurgitated whole pieces of articles and reviews that appeared on the newspaper’s site, in violation of its copyright.

So does a suit against Anthropic, another core AI developer in San Francisco, filed last year by Universal Music and more than a dozen other music publishers.

A letter from the Artists Rights Alliance, which includes Billie Eilish, Nicki Minaj and many others, called for a halt to the use of AI in the music industry, framing it as a threat to creativity and the future of music.

Michael Chabon and other authors have also sued OpenAI and Meta, alleging their copyrighted works were unfairly used without compensation to train the companies’ AI programs.

Schiff’s announcement included statements of support from a range of creative industries, including the Recording Industry Association of America, the Directors Guild of America, multiple writers’ guilds, and SAG-AFTRA.

“The Directors Guild of America commends this commonsense legislation, which is an important first step toward enabling filmmakers to protect their intellectual property from the potential harms caused by generative AI,” said Lesli Linka Glatter, president of the Directors Guild of America, in a statement.

California also has a number of AI safety bills wending their way through Sacramento this session, on topics from discrimination to deepfakes.

Perhaps the marquee bill of the session on AI, by state Sen. Scott Wiener, D-San Francisco, would require some large AI companies to safety-test their models before releasing them to the public. They could face fines and other penalties if their technology causes harm or otherwise runs awry.

Jose Luis Magana/Associated Press Rep. Adam Schiff, D-Burbank, speaks to reporters on Dec. 19, 2022. A proposal from Schiff would require AI companies to disclose what data they used to train their programs.
