The Japan News by The Yomiuri Shimbun

Child porn images found within training dataset for generative AI


Images of child pornography were found among a vast amount of data being used to train image generative AI, it has been learned. The materials include at least one image from a photography book that was banned by the National Diet Library for "possibly being illegal child pornography."

A number of other images of naked children were also found in the data. The images are believed to have entered the dataset when it was compiled from the internet. Filters exist to exclude such images, but it is said to be impossible to catch them all.

Image generation AI is used to produce illustrations or photo-like images from text. One of the most popular models, Stable Diffusion, uses a training dataset publicly available on the internet.

The Yomiuri Shimbun examined the dataset in December, and found data from the photo book published in 1993 that included a naked girl.

Back then, there were no laws regulating the publication of such books. It was made illegal in 1999 with the enactment of a law covering child prostitution and child pornography, which prohibits the publication of sexual images of children under 18. In 2006, the National Diet Library banned access to the photo book on the grounds that it may constitute child pornography.

Along with the image in that book, the dataset contained a number of others of naked children.

According to Stability AI Ltd., the British startup that developed Stable Diffusion, the dataset used to train its product was provided by a German nonprofit organization that mechanically collected about 5.8 billion images off the internet. That appears to be how explicit materials, including the one from the photo book, made their way into the dataset.

The dataset has a filter to exclude illegal images during use, and Stability AI said it is using this function. However, in February, a company affiliated with Stability AI revealed that it found explicit child images that could not be filtered out.

In December, Stanford University's Cyber Policy Center announced that it identified 3,226 images in the dataset that it suspected of being what it terms “child sexual abuse materials,” stating that the presence of such materials likely exerts “influence” on the output of the model.

Stability AI did not respond to inquiries about the possible failure to filter out illegal images.

"It is difficult to completely eliminate illegal images with the current technology," said Atsuo Kishimoto, director of the Osaka University Research Center on Ethical, Legal and Social Issues. "If child pornography is included in the machine training data, it could be regarded as an infringement of the victim's human rights."

Kishimoto added that any company developing AI has a social responsibility to implement countermeasures and explain what kind of data is used for machine learning. (March 22)

