The Guardian (USA)

Can AI image generators be policed to prevent explicit deepfakes of children?

- Alex Hern, UK technology editor

Child abusers are creating AI-generated “deepfakes” of their targets in order to blackmail them into filming their own abuse, beginning a cycle of sextortion that can last for years.

Creating simulated child abuse imagery is illegal in the UK, and Labour and the Conservatives have aligned on the desire to ban all explicit AI-generated images of real people.

But there is little global agreement on how the technology should be policed. Worse, no matter how strongly governments take action, the creation of more images will always be a press of a button away – explicit imagery is built into the foundations of AI image generation.

In December, researchers at Stanford University made a disturbing discovery: buried among the billions of images making up one of the largest training sets for AI image generators were hundreds, maybe thousands, of instances of child sexual abuse material (CSAM).

There may be many more. Laion (Large-scale AI Open Network), the dataset in question, contains about 5bn images. With half a second a picture, you could perhaps look at them all in a lifetime – if you’re young, fit and healthy and manage to do away with sleep. So the researchers had to scan the database automatically, matching questionable images with records kept by law enforcement, and teaching a system to look for similar photos before handing them straight to the authorities for review.

In response, Laion’s creators pulled the dataset from download. They had never actually distributed the images in question, they noted, since the dataset was technically just a long list of URLs to pictures hosted elsewhere on the internet. Indeed, by the time the Stanford researchers ran their study, almost a third of the links were dead; how many of them in turn once contained CSAM is hard to tell.

But the damage has already been done. Systems trained on Laion-5B, the specific dataset in question, are in regular use around the world, with the illicit training data indelibly burned into their neural networks. AI image generators can create explicit content, of adults and children, because they have seen it.

Laion is unlikely to be alone. The dataset was produced as an “open source” product, put together by volunteers and released to the internet at large to power independent AI research. That, in turn, means it was widely used to train open source models, including Stable Diffusion, the image generator that, as one of the breakthrough releases of 2022, helped kickstart the artificial intelligence revolution. But it also meant that the entire dataset was available in the open, for anyone to explore and examine.

The same is not true for Laion’s competition. OpenAI, for instance, provides only a “model card” for its Dall-E 3 system, which states that its pictures were “drawn from a combination of publicly available and licensed sources”.

“We have made an effort to filter the most explicit content from the training data for the Dall-E 3 model,” the company says. Whether those efforts worked must be taken on trust.

The vast difficulty in guaranteeing a completely clean dataset is one reason why organisations like OpenAI argue for such limitations in the first place. Unlike Stable Diffusion, it is impossible to download Dall-E 3 to run on your own hardware. Instead, every request must be sent through the company’s own systems. For most users, an added layer places ChatGPT in the middle, rewriting requests on the fly to provide more detail for the image generator to work with.

That means OpenAI, and rivals such as Google with a similar approach, have extra tools to keep their generators clean: limiting which requests can be sent and filtering generated images before they are sent to the end user. AI safety experts say this is a less fragile way of approaching the problem than solely relying on a system that has been trained never to create such images.

For “foundation models”, the most powerful, least constrained products of the AI revolution, it isn’t even clear that a fully clean set of training data is useful. An AI model that has never been shown explicit imagery may be unable to recognise it in the real world, for instance, or follow instructions about how to report it to the authorities.

“We need to keep space for open source AI development,” said Kirsty Innes, the director of tech policy at Labour Together. “That could be where the best tools for fixing future harms lie.”

In the short term, the focus of the proposed bans is largely on purpose-built tools. A policy paper co-authored by Innes suggested taking action only against the creators and hosts of single-purpose “nudification” tools. But in the longer term, the fight against explicit AI images will face similar questions to other difficulties in the space: how do you limit a system you do not fully understand?

Simulated child abuse imagery is banned in the UK; Labour and the Conservatives want to ban all explicit AI-generated images of real people. Photograph: sarah5/Getty Images/iStockphoto
AI-generated images of women created as social media influencers by an advertising agency in Barcelona. Photograph: Pau Barrena/AFP/Getty Images
