The Computational Intelligence of MoGo Revealed in Taiwan’s Computer Go Tournaments C.S. Lee, M.H. Wang, G. Chaslot, J.B. Hoock et. al., IEEE Trans. Comp. Intelligence and AI in games, 2009
Go, the Asian board game, has long been considered to be a profound challenge for artificial intelligence. John McCarthy described it as the “new drosophila of AI”, Hans Berliner as a “task par excellence for AI”, and David Mechner as a “grand challenge task”. Confucius was less emphatic in his support, commenting that, “Even playing [go] is better than being idle. I can only presume that Confucius would have had more reverence for the game had he tried to program a computer to play it. Among AI researchers, however, it has taken on something of a “holy grail” status. Years have been spent carefully constructing go engines without success. In 1998, a top go computer was beaten by a 6th dan player even though it was given a massive 29 stone advantage, meaning that it’s rating was something like 25 kyu. If you’re not familiar with martial arts ratings systems, well, 25 kyu is only a little above a total beginner. By 2003, another go program had progressed to about 15 kyu. A big improvement, but nevertheless a beginner could beat it with a few months of training. Computer go, in a nutshell, was very weak.
In 2007, MoGo, a Monte Carlo Tree Search based system developed by Paris University PhD candidate Sylvain Gelly, burst onto the scene and promptly thrashed all the other computers. Its rating was around 2 kyu, almost a “black belt” level. Then in 2008, MoGo beat a 7th dan professional player with a 9 stone handicap, putting its rating at around 2nd dan amateur. A few months ago MoGo beat a 9th dan professional player with just a 6 stone handicap, putting its rating at around 3rd dan amateur. Needless to say, the days of computers being unable to play go are over. Only professionals and very highly ranked amateur players can now be confident of a victory in a game without handicap.
In post game analysis of the recent competition in Taiwan, with 20x more computer time MoGo managed to identify most of the mistakes it made and come up with better moves. This means that in 5 or so years time MoGo will be significantly stronger due to faster hardware alone. Even without more computer power, it appears that many of the mistakes made in Taiwan will be fixable with improved algorithms. Given the rate at which computer go is now progressing, one can’t help but wonder how much longer humans will reign supreme in this ancient game.
Technical comments
While it’s interesting that MC tree search combined with modern brute force has proven effective in a game with such a high branching factor (often over 300), this isn’t the kind of profound insight into intelligence that AI older timers had in mind. Moreover, much of the system is quite ad hoc and heuristic. For example, rather than using fairly standard approaches to exploration vs. exploitation in the initial tree policy, such as Upper Confidence Bounds, top performing programs seem to use all sorts of intuitive, but otherwise somewhat arbitrary equations for this that include various statistics from the MC simulations. The most important one is to note that go moves are somewhat recordable and thus if a move was made at all in a winning sequence then this is some evidence for making the move next. While this estimator is biased as game moves aren’t truly recordable, it is a much lower variance statistic as the move will appear in many more simulations. Thus at the beginning the policy in the initial tree uses this statistic the most, but as the number of simulations increases it switches to the less bias statistic where the move is first. It’s a nice idea. A strange thing about the random MC player is that better random players don’t always produce better total system performance. This is kind of weird and doesn’t seem to be very well understood. As a result, optimising the random player is a bit of a “dark art” that involves months of testing and fiddling around.
This is a naive comment without Googling first, but has there been much research into reinforcement learning with Go? It’s been successful in Backgammon, but of course Go (and even Chess) are significantly more complicated and have much higher branching factors so maybe that would create policies that are just as computationally expensive as brute force techniques.
A naive application of elementary RL methods fail with go due to the branching factor, as you suspected. There are however people trying RL approaches that use the MC Tree Search as part of the algorithm. I haven’t read about this research myself, I’ve just been told about it. I believe it’s still actively being researched.
I suggest to you this this interesting link
http://k21st.wordpress.com/2009/03/04/next-big-future-ai-milestone-supercomputer-given-67-stone-handicap-able-to-win-professional-19×19-go-games/