What makes AlphaGo so smart?

AlphaGo isn’t even one single machine, but is instead distributed software running across a computer network, which comprises over 170 GPUs and 1,200 CPUs. It’s a prime example of a program based on deep neural nets – hardware and software networks that approximate the structure and function of the web of neurons in our brains.

It uses a combination of Monte Carlo tree search algorithms and two types of deep neural network: a policy network and a value network. AlphaGo uses these neural networks to guide its Monte Carlo search, which involves looking ahead and playing out the remainder of the game ‘in its mind’.
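To make that division of labour concrete, here is a minimal Python sketch of what the two networks compute. Everything in it is an illustrative assumption: a single random linear layer stands in for AlphaGo’s deep convolutional networks (the real policy network had 13 layers), and the flat -1/0/+1 board encoding is a simplification.

```python
import numpy as np

BOARD_POINTS = 19 * 19   # a Go board has 361 intersections

rng = np.random.default_rng(0)

# Toy stand-ins for AlphaGo's deep convolutional networks: a single
# random linear layer each (the real policy network had 13 layers).
W_policy = rng.normal(size=(BOARD_POINTS, BOARD_POINTS))
W_value = rng.normal(size=BOARD_POINTS)

def policy_net(board):
    """Map a board (flat vector of -1/0/+1 stones) to a probability
    distribution over the 361 possible moves."""
    logits = W_policy @ board
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

def value_net(board):
    """Map a board position to an estimated probability of winning."""
    return 1.0 / (1.0 + np.exp(-(W_value @ board)))   # sigmoid
```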

During each simulated game, the policy network suggests which moves to make based on what it thinks the opponent’s next move will be, while the value network evaluates the resulting position. Finally, AlphaGo selects the move that proved most successful in its simulations.
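Below is a drastically simplified sketch of that simulation loop, reusing the toy policy_net and value_net above: each candidate move is scored by short, policy-guided playouts whose final positions the value function judges. The rollout count, playout depth, and per-move scoring are assumptions for illustration; real Monte Carlo tree search builds and reuses a search tree rather than scoring moves independently.

```python
def simulate(board, policy_net, value_net, n_rollouts=20, depth=8):
    """Score every legal move by playing short, policy-guided rollouts
    from the resulting position and averaging the value network's
    verdicts; a drastic simplification of Monte Carlo tree search."""
    scores = {}
    for move in np.flatnonzero(board == 0):    # empty points count as legal here
        total = 0.0
        for _ in range(n_rollouts):
            b = board.copy()
            b[move] = 1.0                      # play our candidate move
            player = -1                        # the opponent replies first
            for _ in range(depth):             # play out the game "in its mind"
                probs = policy_net(b * player) # board from the mover's view
                empties = np.flatnonzero(b == 0)
                if empties.size == 0:
                    break
                p = probs[empties] / probs[empties].sum()
                b[rng.choice(empties, p=p)] = player
                player = -player
            total += value_net(b)              # value net judges the outcome
        scores[move] = total / n_rollouts
    return max(scores, key=scores.get)         # the most successful move
```

Calling simulate(np.zeros(BOARD_POINTS), policy_net, value_net) on an empty board returns the toy system’s preferred opening point.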

These neural networks empower a technique called machine learning, which enables a computer to ‘learn’ without needing to be fed explicit instructions for specific scenarios. Computers relying on machine learning require huge amounts of data to become smarter, and DeepMind started training the policy network with 30 million moves from games between top Go players.
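In those terms, each of the 30 million expert moves becomes one training example: show the network a position, then nudge its weights toward the move the human actually played. Here is a minimal sketch of one such supervised step for the toy linear policy above; the learning rate and single-layer gradient are assumptions of this simplified model.

```python
def train_policy_step(W, board, expert_move, lr=0.01):
    """One supervised-learning step: nudge the policy's weights toward
    the move a strong human actually played. This is the cross-entropy
    gradient for the toy single-layer policy; AlphaGo did the same at
    vastly greater scale with deep convolutional networks."""
    logits = W @ board
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()            # softmax over the 361 moves
    target = np.zeros_like(probs)
    target[expert_move] = 1.0          # one-hot: the expert's actual move
    # For softmax plus cross-entropy, the gradient with respect to the
    # logits is (probs - target); for a linear layer, the weight gradient
    # is the outer product of that with the input board.
    W -= lr * np.outer(probs - target, board)
    return W

# Hypothetical example: a random position and an arbitrary expert move.
example_board = rng.choice([-1.0, 0.0, 1.0], size=BOARD_POINTS)
W_policy = train_policy_step(W_policy, example_board, expert_move=72)
```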

DeepMind’s goal was to beat the best players, not merely ape them. AlphaGo had to discover new strategies for itself, so DeepMind set it to play thousands of games against itself, gradually improving its tactics by a trial-and-error process known as reinforcement learning. Ultimately, the value network became so capable that it could evaluate any Go position and estimate the eventual winner, a feat once thought to be impossible.
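One way to picture that trial-and-error loop is the classic REINFORCE policy-gradient rule, sketched below for the toy policy from the earlier examples: the network plays a short game against itself, then every move made by the eventual winner becomes more likely and every move by the loser less likely. The abbreviated game length and the use of the toy value network as a stand-in judge for real Go scoring are illustrative assumptions, not DeepMind’s method.

```python
def self_play_update(W, value_net, lr=0.001, max_moves=40):
    """One reinforcement-learning episode (the REINFORCE rule): the
    policy plays a short game against itself, then each of the winner's
    moves is made more likely and each of the loser's less likely."""
    board = np.zeros(BOARD_POINTS)
    history = []                           # (mover's view, move, mover)
    player = 1
    for _ in range(max_moves):
        view = board * player              # board from the mover's view
        logits = W @ view
        exp = np.exp(logits - logits.max())
        probs = exp / exp.sum()
        empties = np.flatnonzero(board == 0)
        p = probs[empties] / probs[empties].sum()
        move = rng.choice(empties, p=p)
        history.append((view, move, player))
        board[move] = player
        player = -player
    # Toy stand-in for real Go scoring: the value network judges the end.
    winner = 1 if value_net(board) > 0.5 else -1
    for view, move, mover in history:
        logits = W @ view
        exp = np.exp(logits - logits.max())
        probs = exp / exp.sum()
        target = np.zeros_like(probs)
        target[move] = 1.0
        reward = 1.0 if mover == winner else -1.0
        # Policy gradient: reinforce the winner's moves, discourage the loser's.
        W += lr * reward * np.outer(target - probs, view)
    return W

W_policy = self_play_update(W_policy, value_net)
```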
