Self-taught AlphaGo Zero bests all previous versions in record time

Wednesday October 18, 2017

AlphaGo Zero, the latest version of the go-playing AlphaGo AI, surpassed all previous versions of AlphaGo after just forty days of training. More importantly, AlphaGo Zero was taught go's rules but given no additional instructions, instead learning the best moves by playing millions of games against itself. Details of the new program were published Wednesday in the journal Nature.

“AlphaGo Zero independently found, used and occasionally transcended many established sequences of moves used by human players,” write the AGA’s Andy Okun and Andrew Jackson in an accompanying article. “In particular, the AI’s opening choices and end-game methods have converged on ours — seeing it arrive at our sequences from first principles suggests that we haven’t been on entirely the wrong track. On the other hand, some of its middle-game judgements are truly mysterious and give observing human players the feeling that they are seeing a strong human play, rather than watching a computer calculate.”

The ability to self-train without human input is a crucial step towards the dream of creating a general AI that can tackle any task, reports Nature. In the nearer term, though, it could enable programs to take on scientific challenges such as protein folding or materials research, said DeepMind chief executive Demis Hassabis at a press briefing. "We're quite excited because we think this is now good enough to make some real progress on some real problems."

Prof. Satinder Singh, a computer scientist at the University of Michigan who reviewed the findings for the journal, said: "The AI massively outperforms the already superhuman AlphaGo and, in my view, is one of the biggest advances, in terms of applications, for the field of reinforcement learning so far."
