National Post

ADOBE’S ‘ETHICAL’ FIREFLY USED AI-GENERATED IMAGES FROM RIVALS FOR TRAINING

COMPANY PROMOTES ITS TOOL AS SAFE FROM CONTENT SCRAPED FROM INTERNET

- Rachel Metz and Brody Ford

When Adobe Inc. released its Firefly image-generating software last year, the company said the artificial intelligence model was trained mainly on Adobe Stock, its database of hundreds of millions of licensed images. Firefly, Adobe said, was a “commercially safe” alternative to competitors like Midjourney, which learned by scraping pictures from across the internet.

But behind the scenes, Adobe was also relying in part on AI-generated content to train Firefly, including content from those same AI rivals. In numerous presentations and public posts about how Firefly is safer than the competition due to its training data, Adobe never made clear that its model actually used images from some of these same competitors.

Massive amounts of data are needed to train the AI models underlying popular content creation products, and there is increasing scrutiny of AI technology companies over their use of copyrighted materials in this process. Companies like Midjourney, DALL-E maker OpenAI and Stable Diffusion maker Stability AI built their media-generating models with datasets that pull imagery from across the internet, a practice that has led to outrage and lawsuits from a number of artists.

“This shows the murkiness of the definition of responsible AI, and it also illustrates the difficulties of getting away from, if not the legal, then the social and cultural problems, or ethical problems, with generated content,” said Luke Stark, an assistant professor at Western University in Ontario, who studies the social and ethical impacts of AI.

Adobe’s decision to build Firefly with content that the company holds the rights to or that is in the public domain was meant to differentiate its AI image tool in the fast-growing market for generative artificial intelligence. The company promoted it as a more ethical, legally sound option for customers interested in conjuring images from just a few words but wary of potential copyright issues. It won’t generate content based on the intellectual property of other people or brands, Adobe has said, and will avoid producing harmful images, too.

AI-generated content made it into Firefly’s training set because creators were allowed to submit to Adobe’s stock marketplace millions of images made with technology from other companies. “Generative AI images from the Adobe Stock collection are a small part of the Firefly training dataset,” wrote Adobe representative Michelle Haarhoff in September on a Discord group for photographers and artists who contribute to the marketplace.

Adobe said a relatively small amount, about 5 per cent, of the images used to train its AI tool was generated by other AI platforms. “Every image submitted to Adobe Stock, including a very small subset of images generated with AI, goes through a rigorous moderation process to ensure it does not include IP, trademarks, recognizable characters or logos, or reference artists’ names,” a company spokesperson said.

Criticism of the practice has come from inside the company: Since the early days of Firefly, there has been internal disagreement on the ethics and optics of ingesting AI-generated imagery into the model, according to multiple employees familiar with its development who asked not to be named because the discussions were private. Some have suggested weaning the system off generated images over time, but one of the people said there are no current plans to do so.

Adobe has taken shots at competitors over their data collection practices. Other models are built on data that is “openly scraped,” chief strategy officer Scott Belsky said last year. One way that Firefly is better than OpenAI’s comparable model is that it shows respect for the creative community by training only on licensed or freely available data, Adobe says on its website. And in a blog post last March titled “Responsible Innovation in the Age of Generative AI,” general counsel Dana Rao pointed out that generative AI “is only as good as the data on which it’s trained.”

“Training on curated, diverse datasets inherently gives your model a competitive edge when it comes to producing commercially safe and ethical results,” he wrote, while pointing out that Adobe trained Firefly on Adobe Stock images, licensed content and public domain content in which the copyright has run out.

“Our enterprise customers came to us when we launched Firefly and said, ‘We love what you’re doing, we really appreciate that you’re not stealing all of our intellectual property out on the open internet,’ ” Ashley Still, an Adobe senior vice-president, said earlier this month during a Bloomberg Intelligence event.

Still, Adobe never made clear publicly that Firefly had trained in part on images from competitors’ tools that are supposedly less ethical. It did, however, outline such details in at least two online discussion groups the company runs on Discord, one for Adobe Stock and another devoted to Firefly, according to messages Bloomberg has viewed.

In March 2023, Adobe unveiled Firefly as a “beta” product. That month, Raúl Cerón, who works with the Adobe Stock community, posted on Discord that the company wasn’t planning to use generated images to train the forthcoming public version of Firefly.

“Once we go live out of beta, we will have a new training database for it, leaving Gen AI content out of it,” he wrote in a post in June.

When Adobe announced the public release of Firefly on Sept. 13, the company also paid a special “Firefly bonus” to Adobe Stock contributors “whose content was used to train the first commercial Firefly model.” Contributors who used generative AI were among those who received the bonus payment, according to a Discord message from Mat Hayward, who also works with the Adobe Stock community.

AI-generated imagery in Adobe Stock “enhances our data-set training model, and we decided to include this content for the commercially released version of Firefly,” Hayward wrote.

Brian Penny, a writer and stock image contributor who has submitted thousands of AI-generated images, mostly made with Midjourney, to Adobe Stock, was surprised to get the bonus. He figured that as an AI contributor he wouldn’t be eligible. Despite the financial gain, Penny thinks the decision to train Firefly on content such as his is a bad one, and said the company should be more candid about how it’s training the software for creating images.

“They need to be ethical, they need to be more transparent, they need to do more,” he said.

Adobe Stock’s library has boomed since it began formally accepting AI content in late 2022. Today, about 57 million images, or about 14 per cent of the total, are tagged as AI-generated. Artists who submit AI images must specify that the work was created using the technology, though they don’t need to say which tool they used. To feed its AI training set, Adobe has also offered to pay contributors to submit large numbers of photos for AI training, such as images of bananas or flags.

Training on AI-generated content probably wouldn’t make Adobe’s Firefly image generator less commercially safe, and the company isn’t required to say what it’s training on as long as it isn’t misleading consumers, said Harvard professor Rebecca Tushnet, who focuses on copyright and advertising law. But training on AI images, such as those created by Midjourney, undermines the idea that Firefly is distinct from competing services, she said.

“Adobe basically wants to position itself as the superior alternative, but it also wants really cheap inputs, and AI is a really good way to get cheap inputs,” she said.


DAVID BECKER / AP IMAGES FOR ADOBE David Wadhwani, president of digital media at Adobe, gives a presentation on the firm’s Firefly image-generating software at a company summit in Las Vegas.
