Houston Chronicle Sunday

Text2image: If you can type it, you can see it

Some worry AI will cost artists their jobs; others see it as a chance to inform and enhance their work

Contributor

By Dwight Silverman

When Boris Dayma moved to Houston from Brazil for an oil and gas job more than a decade ago, he didn’t expect the two things that would dramatically change his life. First, he got married.

Next, he taught himself to write software code for artificial intelligence, then created a cutting-edge form of AI that, in June, went viral on the internet.

Suddenly, Dayma’s website — now called Craiyon — was the toast of social media and an example of a form of AI developing so rapidly that it’s tough to track the advances.

On Craiyon, visitors enter words to generate images that are sometimes realistic, sometimes whimsical and occasionally disturbing. At its peak last summer, Dayma, 35, said he spent most of his time just trying to keep Craiyon from cratering under the weight of its popularity.

“Now, it’s settled down to about half a million to a million prompts (entered by visitors) a day,” he said during an interview at his Montrose home.

Dayma’s creation is known as text2image. While Craiyon’s results clearly aren’t realistic, other models can output images so advanced that it’s hard to tell them from photographs or human-created art, generating both excitement and anxiety.

In the art community, some worry such technology will cost artists their jobs. But others see it as an opportunity to inform and enhance their work.

In some instances, researchers developing these AI models have hesitated to release them because of the potential for abuse. Hany Farid, a computer science professor and digital forensics expert at the University of California, Berkeley, said the speed of AI development is exacerbating the growing problem of computer-generated fake photos, video and audio realistic enough to fool many people.

“The single most abusive application is nonconsensual sexual imagery. Some people call it nonconsensual porn,” Farid said. “You can take a person’s likeness and insert it into sexual material. Without guardrails, (text2image AI lets you) pick your favorite person that you want to see naked without clothes on, and it will synthesize it for you.”

But that hasn’t stopped others from making their AI models accessible for developing video from text.

Each day seems to bring announcements of breakthrough projects or text2image products released publicly. The pace of AI development prompted the Biden administration to release a blueprint for an AI Bill of Rights to protect Americans from abuse of the technology.

“I’ve been in the space of image forensics and video forensics for two and a half decades,” Farid said. “Typically, the (development) cycle you could measure in years, right? Now we measure it in months, weeks, sometimes days.”

Bizarre, otherworldly

There are myriad AI models with common interfaces: Users enter text and the model generates images based on those words. Craiyon offers nine images in about a minute.

Craiyon’s model is trained and its images produced, Dayma says, using data from “hundreds of millions” of pictures, captions and metadata scraped from the web.

Craiyon’s results are often surreal. Faces and limbs are distorted — though celebrities are almost always recognizable — and out-of-place artifacts abound. Social media fans loved it, tossing it prompts such as “an astronaut riding a horse on the moon,” “CCTV camera footage of Darth Vader breakdancing” and “Velociraptors singing karaoke.”

Craiyon began as a project in a July 2021 hackathon. Dayma’s entry placed first, and he left it up online for anyone to access.

It took almost a year for viral posts on Twitter and Reddit to create a flood of users generating often bizarre, otherworldly images.

With the wave of visitors easing, Dayma now has time to focus on the next iteration of Craiyon, including making money from it. He wants to create a paid version that can produce more realistic images, but for now Craiyon makes money with website ads.

Ways to create

AI models use several different methods to generate images.

GANs, or generative adversarial networks, are an older AI approach in which two neural networks are trained against each other to produce new images resembling a set of training images. An example of a GAN model can be found at www.thispersondoesnotexist.com. The site, which launched in 2019, generates a realistic human face each time the web page is reloaded. No human matches the face — each is fake — but the site raised alarm bells when it appeared, making it clear AI had progressed to where almost no image can be trusted to be real.
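The adversarial idea behind a GAN can be illustrated with a toy numerical sketch. This is not any real model: the "images" here are just numbers drawn from a bell curve centered at 3, the generator learns a single offset, and the discriminator is a one-feature logistic classifier. All parameters and learning rates are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Toy GAN: real "data" are numbers from N(3, 1). The generator shifts
# random noise by a learned offset theta; the discriminator is a
# one-feature logistic classifier d(x) = sigmoid(w*x + b).
theta = 0.0          # generator parameter (starts far from the data)
w, b = 0.1, 0.0      # discriminator parameters
lr, batch = 0.05, 64

for _ in range(2000):
    real = rng.normal(3.0, 1.0, batch)
    fake = theta + rng.standard_normal(batch)

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0.
    d_r, d_f = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * np.mean(-(1 - d_r) * real + d_f * fake)
    b -= lr * np.mean(-(1 - d_r) + d_f)

    # Generator step: shift the fakes so the discriminator calls them real.
    d_f = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1 - d_f) * w)

print(round(theta, 2))  # drifts from 0 toward the data mean of 3
```

The same tug-of-war, with deep networks instead of single numbers, is what lets a site like thispersondoesnotexist.com produce convincing faces.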

Models known as diffusion and transformer work from a text description. The model, trained on a large dataset of captioned images, gradually constructs an image that matches the text. Craiyon uses the transformer approach.

Most of the state-of-the-art text2image projects use diffusion, including Midjourney, Dall-E 2 and Stable Diffusion, an open-source model that puts the ability to build a text2image project into anyone’s hands.
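The arithmetic behind diffusion can also be shown in a toy sketch: noise is mixed into a tiny "image" over many steps, and a reverse formula recovers the original. In a real diffusion model a neural network, conditioned on the text prompt, predicts the noise; here, as a stand-in, the sketch reuses the true noise, so everything below is an illustration of the schedule math only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny "image": an 8x8 gradient pattern standing in for real pixels.
x0 = np.linspace(0, 1, 64).reshape(8, 8)

# Noise schedule: alpha_bar shrinks from near 1 toward 0 over T steps.
T = 50
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)

# Forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise.
eps = rng.standard_normal(x0.shape)
t = T - 1
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# A real model predicts eps from (x_t, t, text prompt); here we use the
# true eps to show how the reverse formula undoes the noising.
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

print(np.abs(x0_hat - x0).max())  # ~0: a perfect denoiser recovers the image
```

By the last step x_t is almost pure noise, which is why a trained denoiser can start from random static and, guided by a prompt, walk backward to a brand-new image.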

Beginnings

Text2image technology has roots in a model created by a researcher now working at Rice University. Vicente Ordóñez, an associate professor in Rice’s computer science department, published a paper and an online demonstration called Text2Scene in 2019 while working with a research team at the University of Virginia.

The demonstration, accessible at vislang.ai/text2scene, is more rudimentary than today’s text2image models, with a limited number of cartoon-like images involved in each generated image. But Ordóñez said the paper influenced other projects, including one from Google.

Ordóñez said much of the early research on text2image AI stemmed from work trying to get artificial intelligence to create text that describes an image.

“People believed that going from text to image would be much harder than it is,” Ordóñez said. “But you can see how much progress has been made so far.”

Indeed, new developments in the field have happened in quick succession, both in terms of projects and the impact the technology is having in the real world. Some examples:

• In August, the Colorado State Fair awarded first prize in its art contest to a work created using AI. The artwork, depicting a portal opening inside an ornately detailed room, was submitted in a category for digital art. Jason M. Allen told the New York Times he was open about how the piece was created, and the judges stood by their decision. But the award incensed some artists and created a social media backlash. Some artists are concerned about AI’s ability to mimic their styles, which can’t be copyrighted.

• The first business offering stock art — generic images often used in marketing and graphic design — made completely from a text2image model opened its virtual doors. Many of the available images at StockAI.com are realistic, but others have the quirky distortions seen in Craiyon’s output. The site has a collection of pre-generated images, but if it doesn’t have what a customer wants, the image can be generated with the right text prompt.

• Getty Images, the repository of news and commercial photography, announced last month that it would not accept images generated by AI. Getty cited the potential for copyright issues, as AI models use a collection of other images to create their own.

• After worrying that its initial stab at a text2image project could produce offensive imagery, OpenAI in September launched a model capable of very realistic imagery — but with filters and “guardrails” to prevent prompts that would result in biased or offensive results.

• Stable Diffusion has already turned into a free consumer app that runs on newer Apple computers. Owners of Macs with Apple’s M1 or M2 processors can download DiffusionBee, which stores the model data needed to generate images locally, without having to go online to a server.

• And coming soon: Text2video. Facebook parent company Meta this month announced Make-a-Video, which generates a few seconds of video from a text prompt. Not to be outdone, Google announced Imagen Video. Neither is available to the public.

Diversity in the art

You can see where AI-generated art is headed in the work of Mary Flanagan, a Dartmouth College art professor, game designer, programmer and writer. Flanagan, who has a home base in Houston, has coded her own AI designed to create images, Grace:AI, and trained it on tens of thousands of pieces of art created by women.

Though some artists are upset about AI-generated art putting some of them out of work, Flanagan isn’t among them.

She says AI art will free artists and designers “from the really boring work” so they can concentrate on their own creativity.

As Flanagan became more intrigued with the possibility of AI-generated art, she also grew concerned about images that frequently hewed to stereotypes or that weren’t representative of the population as a whole. She saw this when she prompted an older AI model to create a picture of hands.

“All the hands returned as male. Maybe there was one that just looked male-ish,” she said. “There was one black hand. The rest were white.”

Flanagan got permission from the National Museum of Women in the Arts and Indiana University to access 30,000 paintings and drawings from female artists to train Grace:AI.

It was the featured component of an exhibition this year at the Nancy Littlejohn Fine Art gallery in Upper Kirby. Flanagan fed Grace a diet of clouds, then had it generate dozens of images of its own clouds. The computer on which Grace:AI ran sat in the middle of the room, churning out clouds projected on the floor next to it.

Flanagan has an exhibit involving AI-generated art at the Moody Center for the Arts at Rice. “Urban Impressions” features video of AI-generated scenes of trees morphing into buildings, and then into wasted urban landscapes. Flanagan then did paintings inspired by some of the AI images.

The exhibit, which runs through the end of the year, includes a video of AI images chosen by students.


Boris Dayma created Craiyon, where visitors enter words to form images that can be realistic, whimsical, even disturbing. (Karen Warren/Staff photographer)
Craiyon generates distorted images with the prompt, “woman singing to a baby in a photorealistic style.” (Screenshot)
A prompt that began, “Twas in another lifetime, one of toil and blood,” produced this text2image picture. (Screenshot)
Theatre d’Opera Spatial, created by Jason M. Allen on the text2image AI platform Midjourney, won a blue-ribbon art prize at the 2022 Colorado State Fair. The award created a social media backlash. (Jason M. Allen)
