Business Day

Latecomer Meta accelerates push into generative AI products

• The new artificial intelligence technology is now compounding the company’s capacity crunch

- Katie Paul, Krystal Hu, Stephen Nellis and Anna Tong

As the summer of 2022 came to a close, Meta CEO Mark Zuckerberg gathered his top lieutenants for a five-hour dissection of the company’s computing capacity, focused on its ability to do cutting-edge artificial intelligence (AI) work, according to a company memo dated September 20.

They had a thorny problem: despite high-profile investments in AI research, the social media giant had been slow to adopt expensive AI-friendly hardware and software systems for its main business, hobbling its ability to keep pace with innovation at scale even as it increasingly relied on AI to support its growth, according to the memo, company statements and interviews with 12 people familiar with the changes, who spoke on condition of anonymity to discuss internal company matters.

“We have a significant gap in our tooling, workflows and processes when it comes to developing for AI. We need to invest heavily here,” said the memo, written by new head of infrastructure Santosh Janardhan, which was posted on Meta’s internal message board in September and is being reported now for the first time.

Supporting AI work would require Meta to “fundamentally shift our physical infrastructure design, our software systems and our approach to providing a stable platform”, it added.

For more than a year, Meta has been engaged in a big project to whip its AI infrastructure into shape. While Meta has publicly acknowledged “playing a little bit of catch-up” on AI hardware trends, details of the overhaul, including capacity crunches, leadership changes and a scrapped AI chip project, have not been reported previously.

Asked about the memo and the restructuring, Meta spokesperson Jon Carvill said the company had “a proven track record in creating and deploying state-of-the-art infrastructure at scale combined with deep expertise in AI research and engineering.

“We’re confident in our ability to continue expanding our infrastructure’s capabilities to meet our near-term and long-term needs as we bring new AI-powered experiences to our family of apps and consumer products,” said Carvill.

He declined to comment on whether Meta had abandoned its AI chip. Janardhan and other executives did not grant requests for interviews.


The overhaul spiked Meta’s capital expenditures by about $4bn a quarter, according to company disclosures — nearly double its spend as of 2021 — and led it to pause or cancel previously planned data centre builds in four locations.

Those investment­s have coincided with a period of severe financial squeeze for Meta, which has been laying off employees since November at a scale not seen since the dot-com bust.

Meanwhile, Microsoft-backed OpenAI’s ChatGPT surged to become the fastest-growing consumer application in history after its November 30 debut. This has triggered an arms race among tech giants to release products using so-called generative AI, which, beyond recognising patterns in data like other AI, creates human-like written and visual content in response to prompts.

Generative AI gobbles up reams of computing power, amplifying the urgency of Meta’s capacity scramble, said five of the sources.

A key source of the trouble, those five sources said, can be traced back to Meta’s belated embrace of the graphics processing unit (GPU) for AI work.

GPU chips are uniquely well suited to artificial intelligence processing because they can perform large numbers of tasks simultaneously, reducing the time needed to churn through billions of pieces of data.

However, GPUs are also more expensive than other chips, with chipmaker Nvidia controlling 80% of the market and maintaining a commanding lead on accompanying software, the sources said.

Nvidia did not respond to a request for comment for this story.

Instead, until 2022, Meta largely ran AI workloads using the company’s fleet of commodity central processing units (CPUs), the workhorse chip of the computing world, which has filled data centres for decades but performs AI work poorly.

According to two of those sources, the company also started using its own custom chip it had designed in-house for inference, an AI process in which algorithms trained on huge amounts of data make judgments and generate responses to prompts.

By 2021, that two-pronged approach proved slower and less efficient than one built around GPUs, which were also more flexible in running different types of models than Meta’s chip, the two people said.

Meta declined to comment on its AI chip’s performance.

As Zuckerberg pivoted the company towards the metaverse — a set of digital worlds enabled by augmented and virtual reality — its capacity crunch was slowing its ability to deploy AI to respond to threats, such as the rise of social media rival TikTok and Apple-led ad privacy changes, said four of the sources.

The stumbles caught the attention of former Meta board member Peter Thiel, who resigned in early 2022 without explanation.

At a board meeting before he left, Thiel told Zuckerberg and his executives they were complacent about Meta’s core social media business while focusing too much on the metaverse, which he said left the company vulnerable to the challenge from TikTok, according to two sources familiar with the exchange.

Meta declined to comment on the conversation.

CATCH-UP

After pulling the plug on a large-scale rollout of Meta’s own custom inference chip, which was planned for 2022, executives instead reversed course and placed orders that year for billions of dollars worth of Nvidia GPUs, one source said.

Meta declined to comment on the order.

By then, Meta was already several steps behind peers such as Google, which had begun deploying its own custom-built version of the GPU, called the TPU, in 2015.

That spring, executives also set about reorganising Meta’s AI units, naming two new heads of engineering in the process, including Janardhan, the author of the September memo.

More than a dozen executives left Meta during the months-long upheaval, according to their LinkedIn profiles and a source familiar with the departures, a near wholesale change of AI infrastructure leadership.

Meta next started retooling its data centres to accommodate the incoming GPUs, which draw more power and produce more heat than CPUs, and which must be clustered closely together with specialised networking between them.

The facilities needed 24 to 32 times the networking capacity and new liquid cooling systems to manage the clusters’ heat, requiring them to be “entirely redesigned”, according to Janardhan’s memo and four sources familiar with the project, details of which have not previously been disclosed.

As the work got under way, Meta made internal plans to start developing a new and more ambitious in-house chip, which, like a GPU, would be capable of both training AI models and performing inference.

The project, which has not been reported previously, is set to finish about 2025, two sources said.

Carvill, the Meta spokesperson, said data centre construction that was paused while transitioning to the new designs would resume later in 2023.

He declined to comment on the chip project.

While scaling up its GPU capacity, Meta, for now, has had little to show as competitors such as Microsoft and Google promote public launches of commercial generative AI products.

CFO Susan Li acknowledg­ed in February that Meta was not devoting much of its current computing to generative work, saying “basically all of our AI capacity is going towards ads, feeds and Reels”, its TikTok-like short-video format that is popular with younger users.

According to four of the sources, Meta did not prioritise building generative AI products until after the launch of ChatGPT in November.

Even though its research lab Facebook AI Research has been publishing prototypes of the technology since late 2021, the company was not focused on converting its well-regarded research into products, the sources said.

As investor interest soars, that is changing. Zuckerberg announced a new top-level generative AI team in February that he said would “turbocharge” the company’s work in the area.

Chief technology officer Andrew Bosworth likewise said earlier in April that generative AI was the area in which he and Zuckerberg were spending the most time, forecasting that Meta would release a product in 2023.

Two people familiar with the new team said its work was in the early stages and focused on building a foundation model, a core program that later could be fine-tuned and adapted for different products.

Carvill said the company had been building generative AI products on different teams for more than a year.

He confirmed that the work had accelerated in the months since ChatGPT’s arrival.

Racing into new areas: An attendee wears a Meta Platforms Oculus Quest 2 virtual reality headset on the second day of the Mobile World Congress at the Fira de Barcelona venue in Barcelona, Spain, in February 2023. Picture: Bloomberg/File
