Los Angeles Times (Sunday)

Explicit child photos hidden inside AI image generators

By Matt O’Brien and Haleluya Hadero

Hidden inside the foundation of popular artificial intelligence image generators are thousands of images of child sexual abuse, according to a new report that urges companies to take action to address a harmful flaw in the technology they built.

Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they’ve learned from two separate buckets of online images — adult pornography and benign photos of kids.

But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the Large-scale Artificial Intelligence Open Network, or LAION, a giant AI database index of online images and captions that’s been used to train leading AI image makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Center for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.

The response was immediate. On the eve of the Wednesday release of the Stanford Internet Observatory’s report, LAION told the Associated Press it was temporarily removing its data sets.

LAION said in a statement that it “has a zero tolerance policy for illegal content and in an abundance of caution, we have taken down the LAION data sets to ensure they are safe before republishing them.”

Although the images account for just a fraction of LAION’s index of some 5.8 billion images, the Stanford group says the material is probably influencing the ability of AI tools to generate harmful outputs and reinforcing the prior abuse of real victims who appear multiple times.

It’s not an easy problem to fix, and it traces back to many generative AI projects being “effectively rushed to market” and made widely accessible because the field is so competitive, said the Stanford Internet Observatory’s chief technologist, David Thiel, who wrote the report.

“Taking an entire internet-wide scrape and making that data set to train models is something that should have been confined to a research operation, if anything, and is not something that should have been open-sourced without a lot more rigorous attention,” Thiel said in an interview.

A prominent LAION user that helped shape the data set’s development is London-based startup Stability AI, maker of the Stable Diffusion text-to-image models. New versions of Stable Diffusion have made it much harder to create harmful content, but an older version introduced last year — which Stability AI says it didn’t release — is still baked into other applications and tools and remains “the most popular model for generating explicit imagery,” according to the Stanford report.

“We can’t take that back. That model is in the hands of many people on their local machines,” said Lloyd Richardson, director of information technology at the Canadian Center for Child Protection, which runs Canada’s hotline for reporting online sexual exploitation.

Stability AI on Wednesday said it hosts only filtered versions of Stable Diffusion and that “since taking over the exclusive development of Stable Diffusion, Stability AI has taken proactive steps to mitigate the risk of misuse.”

“Those filters remove unsafe content from reaching the models,” the company said in a prepared statement. “By removing that content before it ever reaches the model, we can help to prevent the model from generating unsafe content.”

LAION was the brainchild of a German researcher and teacher, Christoph Schuhmann, who told the AP earlier this year that part of the reason to make such a huge visual database publicly accessible was to ensure that the future of AI development isn’t controlled by a handful of powerful companies.

“It will be much safer and much more fair if we can democratize it so that the whole research community and the whole general public can benefit from it,” he said.

Much of LAION’s data comes from another source, Common Crawl, a repository of data constantly trawled from the open internet, but Common Crawl’s executive director, Rich Skrenta, said it was “incumbent on” LAION to scan and filter what it took before making use of it.

LAION said last week that it developed “rigorous filters” to detect and remove illegal content before releasing its data sets and is still working to improve those filters. The Stanford report acknowledged LAION’s developers made some attempts to filter out “underage” explicit content but might have done a better job had they consulted earlier with child-safety experts.

Many text-to-image generators are derived in some way from the LAION database, though it’s not always clear which ones. OpenAI, the maker of DALL-E and ChatGPT, said it doesn’t use LAION and has fine-tuned its models to refuse requests for sexual content involving minors.

Google built its text-to-image Imagen model based on a LAION data set but decided against making it public in 2022 after an audit of the database “uncovered a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”

Trying to clean up the data retroactively is difficult, so the Stanford Internet Observatory is calling for more drastic measures. One is for anyone who’s built training sets off of LAION-5B — named for the more than 5 billion image-text pairs it contains — to “delete them or work with intermediaries to clean the material.” Another is to, in effect, make an older version of Stable Diffusion disappear from all but the darkest corners of the internet.

“Legitimate platforms can stop offering versions of it for download,” particularly if they are frequently

Photo: Camilla Mendes dos Santos, Associated Press. DAVID THIEL, chief technologist at the Stanford Internet Observatory, wrote its report that found images of child sexual abuse in AI image generators.
