American Go E-Journal

New version of AlphaGo self-trained and much more efficient

Wednesday May 24, 2017

by Andy Okun, reporting from the ‘Future of Go’ summit in Wuzhen, China

The version of AlphaGo that defeated Ke Jie 9p in the first round of the three-game challenge match yesterday was trained entirely on the self-play games of previous versions of AlphaGo, a Google DeepMind engineer told an audience in China. David Silver (at right), lead researcher on the AlphaGo project, told the Future of AI Forum in Wuzhen that because AlphaGo had become so strong, its own games now constitute the best available training data.

The version of AlphaGo that beat Fan Hui 2p in 2015 (AlphaGo Fan) and the one that defeated Lee Sedol 9p last year in Seoul (AlphaGo Lee) each included a “value network,” designed to evaluate a position and estimate the probability of winning, and a “policy network,” designed to suggest the best next move; both networks were trained on hundreds of thousands of skilled human games. The most recent version, AlphaGo Master, trained both networks on a database of self-play games generated by its predecessors.
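
For readers curious what such a two-headed design looks like in practice, here is a minimal sketch in Python/PyTorch. Everything in it (the input feature planes, the layer sizes, the class name) is an illustrative assumption, not DeepMind’s actual architecture, which is far deeper:

    import torch
    import torch.nn as nn

    class PolicyValueNet(nn.Module):
        """Toy sketch of a shared trunk with a policy head and a value head.
        Layer counts and sizes are illustrative, far smaller than AlphaGo's."""
        def __init__(self, board_size=19, planes=17, channels=64):
            super().__init__()
            # Shared convolutional trunk reads an encoded board position.
            self.trunk = nn.Sequential(
                nn.Conv2d(planes, channels, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            )
            # Policy head: one score per board point, plus one for passing.
            self.policy = nn.Sequential(
                nn.Conv2d(channels, 2, kernel_size=1), nn.ReLU(), nn.Flatten(),
                nn.Linear(2 * board_size * board_size, board_size * board_size + 1),
            )
            # Value head: a single number in [-1, 1], the estimated game outcome.
            self.value = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=1), nn.ReLU(), nn.Flatten(),
                nn.Linear(board_size * board_size, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Tanh(),
            )

        def forward(self, board):
            x = self.trunk(board)
            return self.policy(x), self.value(x)

Training pairs each position with the move actually played (the policy target) and the game’s eventual result (the value target); the shift Silver described is that those positions now come from AlphaGo’s own games rather than from human ones.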

This was not the only new information Silver revealed about the system. The version playing Ke Jie is so much more efficient that it uses one-tenth the computation AlphaGo Lee used, and runs on a single machine in Google’s cloud, powered by one tensor processing unit (TPU). AlphaGo Lee would probe 50 moves deep and study 100,000 moves per second. While that sounds like a lot, by comparison the tree search powering the Deep Blue chess system that defeated Garry Kasparov in the 1990s looked at 100 million moves per second.

“AlphaGo is actually thinking much more smartly than Deep Blue,” Silver said.
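
The “smarter” thinking is Monte Carlo tree search guided by the two networks: the policy network’s priors focus attention on a few plausible moves, and the value network lets the search judge a line without playing it to the end. Below is a minimal sketch of the PUCT-style selection rule described in the published AlphaGo papers; the node structure, with a visit count N, total value W, and prior P per child, is a simplified assumption:

    import math

    def select_child(node, c_puct=1.5):
        """Pick the child maximizing Q + U. The U term favors moves the
        policy network rates highly (large prior P) that are still
        under-explored (small visit count N), so search effort goes to
        a handful of promising lines instead of every legal move."""
        total_visits = sum(child.N for child in node.children.values())

        def puct(child):
            q = child.W / child.N if child.N > 0 else 0.0  # mean value so far
            u = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)
            return q + u

        return max(node.children.values(), key=puct)

A search that expands only the moves this rule selects can read 50 moves deep while examining a thousand times fewer positions than Deep Blue’s brute-force tree.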

In addition, Silver revealed that DeepMind had measured the handicap needed between different versions of the software. AlphaGo Fan could give four stones to the previously strongest programs, such as Zen or Crazy Stone, which had reached about 6d in strength. AlphaGo Lee, in turn, could give AlphaGo Fan three stones, and AlphaGo Master, which around the new year ran up a 60-game undefeated streak against top pros before coming to this challenge, is three stones stronger than AlphaGo Lee. Silver delivered this with the caveat that these handicap stones are not necessarily directly convertible to human handicaps. Professional players suggested that this may be due to AlphaGo’s tendency to play slow, conservative moves when ahead: an AlphaGo receiving a three-stone handicap may give its opponent ample opportunity to catch up, just as yesterday’s AlphaGo let Ke Jie close the game to a 0.5-point margin. This also reveals that AlphaGo is able to play with a handicap, which had previously been a matter of speculation in the go community.

Silver’s talk came after DeepMind chief Demis Hassabis gave a passionate account of how go and AI research have fed each other. Go is so combinatorially vast that playing it well requires intuition as well as calculation. The methods that have worked so well in AlphaGo have generated moves and strategies that seem high-level, intuitive, even creative, and these same methods have applications in medicine, energy and many other areas. He quoted Kasparov: “Deep Blue was the end. AlphaGo is the beginning.”

photos by Dan Maas