Houston Chronicle Sunday

Text2image: If you can type it, you can see it

Some worry AI will cost artists their jobs; others see it as a chance to inform and enhance their work

Contributor

By Dwight Silverman

When Boris Dayma moved to Houston from Brazil for an oil and gas job more than a decade ago, he didn’t expect the two things that would dramatically change his life. First, he got married.

Next, he taught himself to write software code for artificial intelligence, then created a cutting-edge form of AI that, in June, went viral on the internet.

Suddenly, Dayma’s website — now called Craiyon — was the toast of social media and an example of a form of AI developing so rapidly that it’s tough to track the advances.

On Craiyon, visitors enter words to generate images that are sometimes realistic, sometimes whimsical and occasionally disturbing. At its peak last summer, Dayma, 35, said he spent most of his time just trying to keep Craiyon from cratering under the weight of its popularity.

“Now, it’s settled down to about half a million to a million prompts (entered by visitors) a day,” he said during an interview at his Montrose home.

Dayma’s creation is known as text2image. While Craiyon’s results clearly aren’t realistic, other models can output images so advanced that it’s hard to tell them from photographs or human-created art, generating both excitement and anxiety.

In the art community, some worry such technology will cost artists their jobs. But others see it as an opportunity to inform and enhance their work.

In some instances, researchers developing these AI models have hesitated to release them because of the potential for abuse. Hany Farid, a computer science professor and digital forensics expert at the University of California, Berkeley, said the speed of AI development is exacerbating the growing problem of computer-generated fake photos, video and audio realistic enough to fool many people.

“The single most abusive application is nonconsensual sexual imagery. Some people call it nonconsensual porn,” Farid said. “You can take a person’s likeness and insert it into sexual material. Without guardrails, (text2image AI lets you) pick your favorite person that you want to see naked without clothes on, and it will synthesize it for you.”

But that hasn’t stopped others from making their AI models accessible for developing video from text.

Each day seems to bring announcements of breakthrough projects or text2image products released publicly. The pace of AI development prompted the Biden administration to release a blueprint for an AI Bill of Rights to protect Americans from abuse of the technology.

“I’ve been in the space of image forensics and video forensics for two and a half decades,” Farid said. “Typically, the (development) cycle you could measure in years, right? Now we measure it in months, weeks, sometimes days.”

Bizarre, otherworldly

There are myriad AI models with common interfaces: Users enter text and the model generates images based on those words. Craiyon offers nine images in about a minute.

Craiyon’s model is trained and its images produced, Dayma says, using data from “hundreds of millions” of pictures, captions and metadata scraped from the web.

Craiyon’s results are often surreal. Faces and limbs are distorted — though celebrities are almost always recognizable — and out-of-place artifacts abound. Social media fans loved it, tossing it prompts such as “an astronaut riding a horse on the moon,” “CCTV camera footage of Darth Vader breakdancing” and “Velociraptors singing karaoke.”

Craiyon began as a project in a July 2021 hackathon. Dayma’s entry placed first, and he left it up online for anyone to access.

It took almost a year for viral posts on Twitter and Reddit to create a flood of users generating often bizarre, otherworldly images.

With the wave of visitors easing, Dayma now has time to focus on the next iteration of Craiyon, including making money from it. He wants to create a paid version that can produce more realistic images, but for now Craiyon makes money with website ads.

Ways to create

AI models use several different methods to generate images.

GANs, or generative adversarial networks, are an older AI approach in which two neural networks are trained against each other to produce new images resembling a set of training images. An example of a GAN model can be found at www.thispersondoesnotexist.com. The site, which launched in 2019, generates a realistic human face each time the web page is reloaded. No human matches the face — each is fake — but the site raised alarm bells when it appeared, making it clear AI had progressed to where almost no image can be trusted to be real.
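The adversarial idea behind a GAN can be illustrated with a toy numerical sketch. This is not any real model: the "images" here are just numbers drawn from a bell curve centered at 3, the generator learns a single offset, and the discriminator is a one-feature logistic classifier. All parameters and learning rates are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Toy GAN: real "data" are numbers from N(3, 1). The generator shifts
# random noise by a learned offset theta; the discriminator is a
# one-feature logistic classifier d(x) = sigmoid(w*x + b).
theta = 0.0          # generator parameter (starts far from the data)
w, b = 0.1, 0.0      # discriminator parameters
lr, batch = 0.05, 64

for _ in range(2000):
    real = rng.normal(3.0, 1.0, batch)
    fake = theta + rng.standard_normal(batch)

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0.
    d_r, d_f = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * np.mean(-(1 - d_r) * real + d_f * fake)
    b -= lr * np.mean(-(1 - d_r) + d_f)

    # Generator step: shift the fakes so the discriminator calls them real.
    d_f = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1 - d_f) * w)

print(round(theta, 2))  # drifts from 0 toward the data mean of 3
```

The same tug-of-war, with deep networks instead of single numbers, is what lets a site like thispersondoesnotexist.com produce convincing faces.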

Models known as diffusion and transformer work from a text description. The model, trained on a large dataset of captioned images, gradually constructs an image that matches the text. Craiyon uses the transformer approach.

Most of the state-of-the-art text2image projects use diffusion, including Midjourney, Dall-E 2 and Stable Diffusion, an open-source model that puts the ability to build a text2image project into anyone’s hands.
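The arithmetic behind diffusion can also be shown in a toy sketch: noise is mixed into a tiny "image" over many steps, and a reverse formula recovers the original. In a real diffusion model a neural network, conditioned on the text prompt, predicts the noise; here, as a stand-in, the sketch reuses the true noise, so everything below is an illustration of the schedule math only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny "image": an 8x8 gradient pattern standing in for real pixels.
x0 = np.linspace(0, 1, 64).reshape(8, 8)

# Noise schedule: alpha_bar shrinks from near 1 toward 0 over T steps.
T = 50
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)

# Forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise.
eps = rng.standard_normal(x0.shape)
t = T - 1
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# A real model predicts eps from (x_t, t, text prompt); here we use the
# true eps to show how the reverse formula undoes the noising.
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

print(np.abs(x0_hat - x0).max())  # ~0: a perfect denoiser recovers the image
```

By the last step x_t is almost pure noise, which is why a trained denoiser can start from random static and, guided by a prompt, walk backward to a brand-new image.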

Beginnings

Text2image technology has roots in a model created by a researcher now working at Rice University. Vicente Ordóñez, an associate professor in Rice’s computer science department, published a paper and an online demonstration called Text2Scene in 2019 while working with a research team at the University of Virginia.

The demonstration, accessible at vislang.ai/text2scene, is more rudimentary than today’s text2image models, with a limited number of cartoon-like images involved in each generated image. But Ordóñez said the paper influenced other projects, including one from Google.

Ordóñez said much of the early research on text2image AI stemmed from work trying to get artificial intelligence to create text that describes an image.

“People believed that going from text to image would be much harder than it is,” Ordóñez said. “But you can see how much progress has been made so far.”

Indeed, new developments in the field have happened in quick succession, both in terms of projects and the impact the technology is having in the real world. Some examples:

• In August, the Colorado State Fair awarded first prize in its art contest to a work created using AI. The artwork, depicting a portal opening inside an ornately detailed room, was submitted in a category for digital art. Jason M. Allen told the New York Times he was open about how the piece was created, and the judges stood by their decision. But the award incensed some artists and created a social media backlash. Some artists are concerned about AI’s ability to mimic their styles, which can’t be copyrighted.

• The first business offering stock art — generic images often used in marketing and graphic design — made completely from a text2image model opened its virtual doors. Many of the available images at StockAI.com are realistic, but others have the quirky distortions seen in Craiyon’s output. The site has a collection of pre-generated images, but if it doesn’t have what a customer wants, the image can be generated with the right text prompt.

• Getty Images, the repository of news and commercial photography, announced last month that it would not accept images generated by AI. Getty cited the potential for copyright issues, as AI models use a collection of other images to create their own.

• After worrying that its initial stab at a text2image project could produce offensive imagery, OpenAI in September launched a model capable of very realistic imagery — but with filters and “guardrails” to prevent prompts that would result in biased or offensive results.

• Stable Diffusion has already turned into a free consumer app that runs on newer Apple computers. Owners of Macs with Apple’s M1 or M2 processors can download DiffusionBee, which stores the model data needed to generate images locally, without having to go online to a server.

• And coming soon: Text2video. Facebook parent company Meta this month announced Make-a-Video, which generates a few seconds of video from a text prompt. Not to be outdone, Google announced Imagen Video. Neither is available to the public.

Diversity in the art

You can see where AI-generated art is headed in the work of Mary Flanagan, a Dartmouth College art professor, game designer, programmer and writer. Flanagan, who has a home base in Houston, has coded her own AI designed to create images, Grace:AI, and trained it on tens of thousands of pieces of art created by women.

Though some artists are upset about AI-generated art putting some of them out of work, Flanagan isn’t among them.

She says AI art will free artists and designers “from the really boring work” so they can concentrate on their own creativity.

As Flanagan became more intrigued with the possibility of AI-generated art, she also grew concerned about images that frequently hewed to stereotypes or that weren’t representative of the population as a whole. She saw this when she prompted an older AI model to create a picture of hands.

“All the hands returned as male. Maybe there was one that just looked male-ish,” she said. “There was one black hand. The rest were white.”

Flanagan got permission from the National Museum of Women in the Arts and Indiana University to access 30,000 paintings and drawings from female artists to train Grace:AI.

It was the featured component of an exhibition this year at the Nancy Littlejohn Fine Art gallery in Upper Kirby. Flanagan fed Grace a diet of clouds, then had it generate dozens of images of its own clouds. The computer on which Grace:AI ran sat in the middle of the room, churning out clouds projected on the floor next to it.

Flanagan has an exhibit involving AI-generated art at the Moody Center for the Arts at Rice. “Urban Impressions” features video of AI-generated scenes of trees morphing into buildings, and then into wasted urban landscapes. Flanagan then did paintings inspired by some of the AI images.

The exhibit, which runs through the end of the year, includes a video of AI images chosen by students.


Boris Dayma created Craiyon, where visitors enter words to form images that can be realistic, whimsical, even disturbing. (Karen Warren/Staff photographer)
Craiyon generates distorted images with the prompt, “woman singing to a baby in a photorealistic style.” (Screenshot)
A prompt that began, “Twas in another lifetime, one of toil and blood,” produced this text2image picture. (Screenshot)
Theatre d’Opera Spatial, created by Jason M. Allen on the text2image AI platform Midjourney, won a blue-ribbon art prize at the 2022 Colorado State Fair. The award created a social media backlash. (Jason M. Allen)
