Jamaica Gleaner

What do we know about Sora, OpenAI’s new text-to-video generator?

THE MAKER of ChatGPT is now diving into AI-generated video.

Meet Sora — OpenAI’s new text-to-video generator. The tool, which the San Francisco company unveiled Thursday, uses generative artificial intelligence to instantly create short videos based on written commands.

Sora isn’t the first to demonstrate this kind of technology. But industry analysts point to the high quality of the tool’s videos displayed so far, and note that its introduction marks a significant leap for both OpenAI and the future of text-to-video generation overall.

Still, as with all things in the rapidly growing AI space today, such technology also raises fears about potential ethical and societal implications. Here’s what you need to know.

WHAT IS SORA? CAN I USE IT YET?

Sora is a text-to-video generator — creating videos up to 60 seconds long based on written prompts using generative AI. The model can also generate video from an existing still image.

Generative AI is a branch of AI that can create something new. Examples include chatbots, like OpenAI’s ChatGPT, and image generators such as DALL-E and Midjourney. Getting an AI system to generate videos is newer and more challenging, but relies on some of the same technology.

Sora isn’t available for public use yet (OpenAI says it’s engaging with policymakers and artists before officially releasing the tool) and there’s a lot we still don’t know. But since Thursday’s announcement, the company has shared a handful of examples of Sora-generated videos to show off what it can do.

OpenAI CEO Sam Altman also took to X, the platform formerly known as Twitter, to ask social media users to send in prompt ideas. He later shared realistically detailed videos that responded to prompts like “two golden retrievers podcasting on top of a mountain” and “a bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view.”

While Sora-generated videos can depict complex, incredibly detailed scenes, OpenAI notes that there are still some weaknesses — including some spatial and cause-and-effect elements. For example, OpenAI adds on its website, “a person might take a bite out of a cookie, but afterwards, the cookie may not have a bite mark.”

ARE THERE OTHER AI-GENERATED VIDEO TOOLS?

OpenAI’s Sora isn’t the first of its kind. Google, Meta and the startup Runway ML are among companies that have demonstrated similar technology.

Still, industry analysts stress the apparent quality and impressive length of Sora videos shared so far. Fred Havemeyer, head of US AI and software research at Macquarie, said that Sora’s launch marks a big step forward for the industry.

“Not only can you do longer videos, I understand up to 60 seconds, but also the videos being created look more normal and seem to actually respect physics and the real world more,” Havemeyer said. “You’re not getting as many ‘uncanny valley’ videos or fragments on the video feeds that look ... unnatural.”

While there has been “tremendous progress” in AI-generated video over the last year — including Stable Video Diffusion’s introduction last November — Forrester senior analyst Rowan Curran said such videos have required more “stitching together” for character and scene consistency.

The consistency and length of Sora’s videos, however, represent “new opportunities for creatives to incorporate elements of AI-generated video into more traditional content, and now even to generate full-blown narrative videos from one or a few prompts,” Curran told The Associated Press via email Friday.

WHAT ARE THE POTENTIAL RISKS?

Although Sora’s abilities have astounded observers since Thursday’s launch, anxiety over the ethical and societal implications of AI-generated video also remains.

Havemeyer points to the substantial risks in 2024’s potentially fraught election cycle, for example. Having a “potentially magical” way to generate videos that may look and sound realistic presents a number of issues within politics and beyond, he added — pointing to fraud, propaganda and misinformation concerns.

“The negative externalities of generative AI will be a critical topic for debate in 2024,” Havemeyer said. “It’s a substantial issue that every business and every person will need to face this year.”

Tech companies are still calling the shots when it comes to governing AI and its risks as governments around the world work to catch up. In December, the European Union reached a deal on the world’s first comprehensive AI rules, but the act won’t take effect until two years after final approval.

On Thursday, OpenAI said it was taking important safety steps before making Sora widely available.

“We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” the company wrote. “We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora.”

OpenAI’s Vice President of Global Affairs Anna Makanju reiterated this when speaking Friday at the Munich Security Conference, where OpenAI and 19 other technology companies pledged to voluntarily work together to combat AI-generated election deepfakes. She noted the company was releasing Sora “in a manner that is quite cautious”.

At the same time, OpenAI has revealed limited information about how Sora was built. OpenAI’s technical report did not disclose what imagery and video sources were used to train Sora — and the company did not immediately respond to a request for further comment Friday.

The Sora release also arrives amid the backdrop of lawsuits against OpenAI and its business partner Microsoft by some authors and The New York Times over its use of copyrighted works of writing to train ChatGPT. OpenAI pays an undisclosed fee to the AP to licence its text news archive.

AP The OpenAI logo is displayed on a cell phone with an image on a computer monitor generated by ChatGPT’s Dall-E text-to-image model.
