League of its own
How DeepMind made an AI capable of beating StarCraft II pros – and what it could mean for the real world
First StarCraft II, then the world. Such is the trajectory of neural network AlphaStar. Trained in 18 months by a team at Google’s artificial intelligence company DeepMind, it’s designed to play Blizzard’s realtime strategy game at a competitive level. In a test match at the tail end of last year, DeepMind pitted AlphaStar against top-ranking professional StarCraft players – and humans – Dario “TLO” Wünsch and Grzegorz “MaNa” Komincz. “I was just hoping to see the agent play like it knows how to play,” lead researcher Oriol Vinyals says. “I didn’t want it to break, of course, and I hoped to see a reasonably long game, hopefully balanced in terms of not being too one-sided – for the human.” AlphaStar trounced both players 5-0.
As ever, though, this is not just about making an AI good at a computer game for the sake of it. Videogames have long proved excellent virtual testing grounds for AI research, their systems capable of simulating and facilitating various aspects of human intelligence: Atari, Quake III, Super Mario and Dota 2 have all played their part in helping develop more sophisticated AI. But with its layers of tactical complexity, an AI victory over a human player in StarCraft II – even in heavily simplified versions of the game – has emerged as a “grand challenge” among the AI research community, something entirely out of reach. Until AlphaStar, that is.
“We wanted to select a game which was canonical, and challenging,” fellow lead researcher David Silver says. “StarCraft has been widely used across academia for over ten years now, so for us it was a really natural choice. It’s one of the most successful PC games of all time, and it tests and stretches humans in all kinds of dimensions.” As the more recent version of the game, StarCraft II was the logical next step. But the sequel wasn’t available as a benchmark at the time, and so it would be necessary to work more closely with Blizzard to set it up as a testbed for AI algorithms. “The first and most obvious change that we had to work with Blizzard on is that the game runs on Windows and Mac, and most research environments use Linux,” Vinyals says.
They decided to run multiple types of machine learning to program AlphaStar’s deep neural network, and that meant using Linux. “For us, it was actually not so difficult,” Vinyals says, “but for Blizzard, who had not released the game for Linux, that was a big first step.” It was important to DeepMind that AlphaStar would play the full, unsimplified version of StarCraft II. And so it became necessary to work with Blizzard to release an open-source set of tools – PySC2, which includes the largest set of anonymised game replays ever released – to the research community, so it could begin working out how to approach the multiple problems StarCraft II poses to players. These include an algorithm having to calculate strategy in constant realtime instead of by turns; ‘imperfect information’, since unlike in Go or chess a player has to actively seek out hidden but crucial information; and long-term planning, in terms of ‘macro’ plays involving economy and big-picture management as well as ‘micro’ management of individual units. Naturally, Blizzard was not exactly keen to open-source StarCraft II itself – that would be asking for trouble. But the dev didn’t need to. “They just released a binary, much like the binary release for Windows or Macintosh,” Vinyals explains. “It connects with these tools that we co-developed with them for enthusiasts and researchers to develop agents. The only thing they had to be careful about was making sure that their game remained safe for players to use.” But the possibility of better in-game AI opponents for future games was undoubtedly tempting for Blizzard. “At the moment, human-versus-human is the only really satisfying mode to play in a lot of these RTS games,” Silver says.
“And I think Blizzard is very interested in the future possibility of what could be done with AI – not just to build in-game opponents, but to have other capabilities such as automated testing, or being able to play with an AI on your side in 2v2 or have it issue commands on your behalf.”
With everything in place and everyone on board, training AlphaStar began in earnest, with that blend of imitation and reinforcement learning. For the imitation learning part, Vinyals recalls, they had each agent process about
100,000 games of StarCraft II, “much like a human might go and watch on YouTube or Twitch, learning how you play the game even before you start playing it. The first step is to get an agent to understand where people click, what people do, after seeing situations that humans have been put in before.” But the second, reinforcement learning-based part would be just as crucial. “Even 100,000 games is not enough to really learn the full details of all the different possible macro- and micro-strategies,” Silver says. “We wanted the agent to be able to learn essentially by playing against different versions of itself, and go beyond what we see in the human data.” To facilitate this, DeepMind developed the AlphaStar League algorithm, a kind of virtual tournament in which different versions of the agent play against old and new versions of themselves to become stronger. “At the same time, we’re also branching new versions of those agents and adding them into the league to increase the diversity,” Silver continues. “And this is done by making sure that different agents are playing against different opponents within the league, or that they have adapted incentives to build particular unit types, for example.”
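The League mechanic Silver describes – growing a population of agent versions, matching them against one another, and branching new versions off the strongest – can be sketched in miniature. What follows is a hypothetical toy model, not DeepMind’s implementation: agents here are just an Elo-style rating plus a hidden ‘strength’ used by a simulated match, and the branching, matchmaking, and rating scheme are all illustrative stand-ins.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    strength: float         # hidden skill, used only by the toy match simulator
    rating: float = 1000.0  # Elo-style rating, updated from match results

def play_match(a, b, rng):
    # Toy outcome model: the stronger agent wins proportionally more often.
    p_a_wins = a.strength / (a.strength + b.strength)
    return (a, b) if rng.random() < p_a_wins else (b, a)

def elo_update(winner, loser, k=16.0):
    # Standard Elo update: the winner gains what the loser gives up.
    expected = 1.0 / (1.0 + 10 ** ((loser.rating - winner.rating) / 400.0))
    winner.rating += k * (1.0 - expected)
    loser.rating -= k * (1.0 - expected)

def run_league(num_rounds=2000, branch_every=200, seed=0):
    rng = random.Random(seed)
    # Start with two seed agents; the league grows as versions branch off.
    agents = [Agent("v0", 1.0), Agent("v1", 1.1)]
    for step in range(1, num_rounds + 1):
        a, b = rng.sample(agents, 2)          # random matchmaking
        winner, loser = play_match(a, b, rng)
        elo_update(winner, loser)
        if step % branch_every == 0:
            # Branch a perturbed copy of the current best agent, keeping
            # the population diverse rather than converging on one style.
            best = max(agents, key=lambda ag: ag.rating)
            mutated = best.strength * (1.0 + rng.uniform(-0.2, 0.4))
            agents.append(Agent(f"v{len(agents)}", mutated))
    return agents
```

Because the Elo update is zero-sum and branched agents enter at the base rating, the population’s mean rating stays fixed; progress shows up as spread between versions. AlphaStar’s real League goes much further – it also shapes which opponents each agent faces and adapts agents’ incentives (such as favouring particular unit types), which this sketch omits.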
Although the League took considerable time and effort to develop, Vinyals believes that it – and its combination with the foundation of imitation learning – was key to how quickly AlphaStar managed to become capable of outfoxing some of the world’s best StarCraft II pros (indeed, according to a 2017 Wired article, one of DeepMind’s advisors predicted it would be five years before a bot beat a human at StarCraft). “Things were starting to happen a year ago, and then six months ago we started scaling up the process, and also developing new things in the algorithms that are too detailed to describe, but we are preparing a publication to do so,” he says. “It was just an extra week of training from a TLO-beating agent to a [superior] MaNa-beating agent, so the speed of progress has definitely picked up in the last couple of months – but many of these ingredients started development quite a while ago.”
Not that it was all smooth sailing; where AI goes, quirks often follow. AlphaStar’s main vice for quite some time was ‘worker rushing’, an elementary strategy in which a player takes all of their mining units and catches an opponent off-guard by throwing them at their base. It’s fairly easy to defend against, and it leaves the initiator highly vulnerable. “What was interesting was that, because we were originally learning not by playing against ourselves but by playing different versions of the agent against themselves, it becomes hard to escape from these traps,” Silver says. “They’re kind of local basins of attraction – you need to learn something much, much more sophisticated to get out of them. We were very happy when we first saw units that were not just worker rushing.” For Vinyals, one of the most satisfying moments of AlphaStar’s creation came during a visit from TLO, as he described how the agents would counter strategies and develop a meta. “So you would get like, oh my god, invisible units are great – but then agents would discover the cannons that can detect invisible units, and later that they can use observers, which are mobile, and so on. I described this to TLO, and he said that was actually how humans went through phases. It was great to hear that what we saw was not that dissimilar to the discovery phase of the game a few years ago, and was revealed by research.”
That feeling was nothing, however, to seeing AlphaStar take on a human and win, the team standing in a small room separate from the human player to watch. “We couldn’t see the agent’s perspective because technically Blizzard hadn’t done the observer bit,” Vinyals says, “so we saw exactly what TLO or MaNa were seeing on their screen. You have to estimate whether it’s looking good or not, but I understand the game enough, and I thought, ‘Oh, this is looking really good.’ And then the player says ‘GG’ and quits. That was almost unbelievable – a test of the whole approach: first working with Blizzard, then open-sourcing in a field where others can participate in the process, and then really focusing on this approach of imitation learning first and the AlphaStar League second. Even winning a single game for me would have been a great victory.” This emotional moment for the DeepMind team was also a significant achievement in a much larger sense. “I really felt the momentousness of this from a historic perspective,” Silver says. “In the history of AI, there’s been a number of key moments where you look at challenge domains people select for artificial intelligence – like chess, when Deep Blue beat Garry Kasparov, or Go, when AlphaGo beat Lee Sedol. There are these moments where a domain kind of transitions from being ‘impossible’ to ‘done’, and it’s a real privilege to be there at the moment where the transition occurs.”
Now, DeepMind is looking to the future: to what benefits AlphaStar’s mastery of StarCraft II could bring to the wider world. Better in-game AI opponents or teammates is a probability, of course, as well as AI’s impact on the meta of competitive games. “People tend to worry that this will somehow negatively impact the games; that somehow people will feel their interest will be diminished by the fact that now computers are superior,” Silver says. “But what we’ve seen each time is the opposite: it draws a lot of new excitement for the game. AI discovers new strategies that people didn’t know about, new ways to think about the game, fresh perspectives. Pro players have universally said that the most fascinating thing for a pro to do right now is to go and look at AlphaStar, not at human games. It means different ways to learn about the game, different ways to train, different dynamics, different games being developed as AI is used to help the development process during testing and so forth. It’s an exciting turning point.”
And DeepMind’s ultimate goal lies beyond videogames. Videogame AI is a benchmark for us to understand how much progress we’re making with general artificial intelligence that can solve a wide array of society’s big problems. “StarCraft II is interesting to us as a challenge because it has some particular difficulties that benchmarks like chess and Go don’t have, but which really matter for the real world,” Silver says. “It’s got a huge range of different strategies, and so you have to find a very robust solution which is able to respond to a very wide range of different corner cases – all the different cheeses which humans can pull, for example. This is kind of similar to how, in the real world, you might need to deploy an AI in a context where it might be interacting with people who are very unpredictable, and you don’t exactly know what those people are going to do in those interactions.”
The problems that AlphaStar is trained to solve – continuous adaptive strategy and actions, imperfect information, long-term planning and micromanagement – are the same as those involved in long-term forecasting, climate modelling, understanding language patterns for machine translation and text summarisation, and working in safety-critical domains such as energy. The potential end goal, then, is an advanced, widely applicable learning algorithm that moves AI research much further towards artificial general intelligence. For now, at least,
StarCraft II’s pro players simply have some new competition to worry about.