<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>vetta project &#187; Research Review</title>
	<atom:link href="http://www.vetta.org/category/research-review/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vetta.org</link>
	<description></description>
	<lastBuildDate>Tue, 31 Jan 2012 23:29:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>AGI 2010</title>
		<link>http://www.vetta.org/2009/11/agi-2010/</link>
		<comments>http://www.vetta.org/2009/11/agi-2010/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 20:10:01 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AGI]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=816</guid>
		<description><![CDATA[The third Conference on Artificial General Intelligence will be taking place in Lugano, Switzerland from Friday the 5th to Monday the 8th of March (the picture on the front page of my website is of Lugano). The keynote speaker is &#8230; <a href="http://www.vetta.org/2009/11/agi-2010/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The third <a href="http://agi-conf.org/2010/">Conference on Artificial General Intelligence</a> will be taking place in <a href="http://en.wikipedia.org/wiki/Lugano">Lugano</a>, Switzerland from Friday the 5th to Monday the 8th of March (the picture on the front page of my website is of Lugano).  The keynote speaker is the famous reinforcement learning researcher Rich Sutton, and it seems that the inventor of Kolmogorov complexity, Solomonoff induction and universal probability theory, Ray Solomonoff, will also be speaking.  The general conference chair is Marcus Hutter, and the local chair is Jürgen Schmidhuber.  There will also be Kurzweil Prizes worth $1000 for both the best paper and the best new idea.</p>
<p>Given that AGI is still a young and relatively unknown part of the wider AI community, it&#8217;s great to see such well known researchers putting their names behind this conference.  As a member of the program committee I&#8217;ve been able to check out some of the submissions so far and I&#8217;ve been pleasantly surprised by their quality &#8212; indeed, this is what gave me the impetus to write this post!  If you&#8217;d like to submit something there&#8217;s still time: the deadline is the 1st of December.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/11/agi-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>1973 Lighthill debate</title>
		<link>http://www.vetta.org/2009/11/1973-lighthill-debate/</link>
		<comments>http://www.vetta.org/2009/11/1973-lighthill-debate/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 18:09:20 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[Lighthill]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=772</guid>
		<description><![CDATA[Some of you might know about the Lighthill report from 1973 which was deeply critical of progress in AI. This report was the main factor behind cutting the funding of AI research in the UK, and seems to have contributed &#8230; <a href="http://www.vetta.org/2009/11/1973-lighthill-debate/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some of you might know about the Lighthill report from 1973 which was deeply critical of progress in AI.  This report was the main factor behind cutting the funding of AI research in the UK, and seems to have contributed to the more global cuts around this time known as the &#8220;AI winter&#8221;.  Via <a href="http://www.gatsby.ucl.ac.uk/~ywteh">Yee Whye Teh</a> I recently came across a BBC debate between James Lighthill and three supporters of AI research: Richard Gregory, John McCarthy and Donald Michie.  You can download the televised debate <a href="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov">from here</a>, though be warned that it&#8217;s 160MB.  </p>
<p>Now, 36 years later, it&#8217;s interesting to think about how the speakers&#8217; various views and predictions have played out.  Overall, the analysis by Lighthill felt the most coherent to me, and I&#8217;d say that what has since happened largely backs him up, though it can be argued that he helped to cause this outcome.  I agree that he slowed AI down a lot, but 36 years is a rather long time and in the types of problems that he was focusing on there hasn&#8217;t been much progress.  In response the other debaters mostly just pointed to small advances that had occurred and indicated that they felt that more advances were on the way.  Lighthill then denied that these advances showed any real progress towards intelligence.</p>
<p>This feels a lot like today: sceptics say that AI has made no progress, optimists point to lots of advances, and sceptics then say that these advances are not what they consider to be real intelligence.  I think this points to perhaps the most fundamental problem in the field: if you can&#8217;t define intelligence, how do you judge whether progress is being made?  It&#8217;s as true today as it was then, and it&#8217;s why I think that trying to <a href="http://www.vetta.org/documents/UniversalIntelligence.pdf">define intelligence</a> is so important.  I like the fact that they keep on saying that an intelligent machine should be able to perform well in a &#8220;wide range of situations&#8221;, because, of course, this is very much the view of intelligence that I have taken.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/11/1973-lighthill-debate/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
<enclosure url="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov" length="169265179" type="video/quicktime" />
		</item>
		<item>
		<title>Halloween lecture online</title>
		<link>http://www.vetta.org/2009/11/halloween-lectur/</link>
		<comments>http://www.vetta.org/2009/11/halloween-lectur/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 14:09:23 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AGI]]></category>
		<category><![CDATA[AIXI]]></category>
		<category><![CDATA[Friendly AI]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[Neuroscience]]></category>
		<category><![CDATA[Singularity]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=721</guid>
		<description><![CDATA[My Halloween lecture has been uploaded to youtube. The basic outline is: * what is intelligence? * Solomonoff induction * Hutter&#8217;s AIXI * Monte Carlo AIXI (here&#8217;s the missing video of it playing pac-man) * universal intelligence measure * what &#8230; <a href="http://www.vetta.org/2009/11/halloween-lectur/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.vetta.org/VettaPics/ExtroBrit_pic.jpg" alt="" /></p>
<p>My Halloween lecture has been uploaded to youtube.  The basic outline is:</p>
<p>* what is intelligence?<br />
* Solomonoff induction<br />
* Hutter&#8217;s AIXI<br />
* Monte Carlo AIXI  (here&#8217;s the missing video of it <a href="http://www.vetta.org/video/AIXI_Pacman.wmv">playing pac-man</a>)<br />
* universal intelligence measure<br />
* what neuroscience can teach us about AGI design<br />
* early 2020s: the Halloween scenario</p>
<p>You can get the <a href="http://www.vetta.org/documents/extrobrit_talk.pdf">slides here</a>.  I talked for 2 hours, so it&#8217;s broken up into many parts on youtube: <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/11/MGfcy9RpqBY">Part 1</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/7/ZgarxJJ6noY">Part 2</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/10/n-Ry0TE_nRA">Part 3</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/9/ywUf75Q0_2U">Part 4</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/6/MQO_k5uOD0w">Part 5</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/5/WRaFyI5M96g">Part 6</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/4/f0qf5Iu0aLg">Part 7</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/3/o-UCGUipg34">Part 8</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/8/gPW7oojUCKs">Part 9</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/2/fe3c3YcQZng">Part 10</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/1/p7Aw_7sBRPc">Part 11</a> <a href="http://www.youtube.com/user/KoanPhilosopher#p/u/0/s7ZXLd5_1_0">Part 12</a></p>
<p>Thanks to David Wood at ExtroBritannian for organising this, and to all the people who attended, especially those who travelled from other cities and countries.  Thanks too for the intelligent questions during my talk and for all the positive feedback I&#8217;ve received since.  Thanks also to Anders Sandberg for the picture of me speaking that I stole from his <a href="http://www.flickr.com/photos/arenamontanus/">Flickr stream</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/11/halloween-lectur/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Monte Carlo AIXI</title>
		<link>http://www.vetta.org/2009/09/monte-carlo-aixi/</link>
		<comments>http://www.vetta.org/2009/09/monte-carlo-aixi/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 20:55:34 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AGI]]></category>
		<category><![CDATA[AIXI]]></category>
		<category><![CDATA[Monte Carlo]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=635</guid>
		<description><![CDATA[While I was visiting Marcus Hutter at ANU a month or so ago, I got talking to one of his students, Joel Veness, who&#8217;s working on making computable approximations to AIXI. Joel has a background in writing Go algorithms so &#8230; <a href="http://www.vetta.org/2009/09/monte-carlo-aixi/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While I was visiting Marcus Hutter at ANU a month or so ago, I got talking to one of his students, Joel Veness, who&#8217;s working on making computable approximations to AIXI.  Joel has a background in writing Go algorithms so is perhaps perfect for the job.  I saw recently that the <a href="http://arxiv.org/abs/0909.0801">Monte Carlo AIXI</a> paper describing this work is now available online if you want to check it out.</p>
<p>The basic idea goes as follows.  In full AIXI you have an extended Solomonoff predictor to model the environment, and an expecti-max tree to compute the optimal action.  In order to scale AIXI down and still have something of roughly the same form, you need to find a tractable way to replace both of these items.  Here&#8217;s what they did: in the place of extended Solomonoff induction a version of context tree weighting (CTW) is used. CTW has to be extended for this application, much as Hutter had to extend Solomonoff induction to active environments for AIXI. In the place of the expecti-max tree search a Monte Carlo tree search is used, similar to that used in Go playing programs: initial selection within the tree, tree expansion, a so-called play-out policy, followed by a backup stage to propagate the new information back into the model. You have to be a bit careful here because as the agent imagines different future observations and actions it has to update its hypothetical beliefs to reflect these in order for its analysis and decision making to be consistent. Then, once this possible future has been evaluated, the effect of this on the agent&#8217;s model of the world has to be unwound so that the agent doesn&#8217;t, in effect, start confusing its fantasies with its present reality.<br />
<span id="more-635"></span></p>
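<p>To make the update-then-unwind step concrete, here is a minimal Python sketch.  It is not the code from the paper: a single Krichevsky-Trofimov estimator over bits stands in for the extended CTW model, and the names <em>KTPredictor</em> and <em>imagine</em> are hypothetical.  It only shows the bookkeeping: condition on each imagined symbol, then revert every update so that the real history is untouched.</p>

```python
import random

class KTPredictor:
    """Krichevsky-Trofimov estimator for bits (a toy stand-in for CTW)."""

    def __init__(self):
        self.counts = [0, 0]  # how many 0s and 1s have been seen

    def prob(self, bit):
        # KT rule: (count of this bit + 1/2) / (total count + 1)
        return (self.counts[bit] + 0.5) / (sum(self.counts) + 1.0)

    def update(self, bit):
        self.counts[bit] += 1

    def revert(self, bit):
        self.counts[bit] -= 1


def imagine(model, horizon, rng):
    """Sample one imagined future, then unwind its effect on the model."""
    imagined = []
    for _ in range(horizon):
        weights = [model.prob(0), model.prob(1)]
        bit = rng.choices([0, 1], weights=weights)[0]
        model.update(bit)        # hypothetical belief update
        imagined.append(bit)
    for bit in reversed(imagined):
        model.revert(bit)        # undo in reverse order
    return imagined


model = KTPredictor()
for observed in [1, 1, 0, 1]:    # the real history
    model.update(observed)
future = imagine(model, 10, random.Random(0))
```

<p>After the call to <em>imagine</em> the counts are back to what the real history produced, which is the property the paragraph above asks for.</p>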
<p>The algorithm is both embarrassingly-parallel and any-time, which is very nice. In less technical language: it would be fairly easy to get it to run efficiently on a massively parallel supercomputer, and it also has the property that it can be forced to decide what action to take at any moment, always returning the best action it has been able to compute so far.  Thus, if you want a smarter agent, just give it more time and/or CPUs.  Already they have shown that MC-AIXI can learn to solve a bunch of basic POMDP problems, including playing a somewhat reasonable game of Pac-man.  It would be interesting to see what it was capable of on a supercomputer with ten thousand times the resources of their desktop PC.</p>
<p>A key question for future research is how to make better sequence predictors, in particular ones able to identify more complex types of patterns in the agent&#8217;s history. I guess all sorts of machine learning techniques could come into play here&#8230; and possibly combine to produce quite a powerful RL agent?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/09/monte-carlo-aixi/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Reinforcement learning in the brain</title>
		<link>http://www.vetta.org/2009/06/reinforcement-learning-in-the-brain/</link>
		<comments>http://www.vetta.org/2009/06/reinforcement-learning-in-the-brain/#comments</comments>
		<pubDate>Sun, 21 Jun 2009 22:10:48 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[Neuroscience]]></category>
		<category><![CDATA[reinforcement learning]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=548</guid>
		<description><![CDATA[Model-free reinforcement learning (RL) algorithms are computationally cheap as each state-action pair keeps a cached estimate of its value that can easily be looked up in order to make a decision. Their weakness is that they are not easy to &#8230; <a href="http://www.vetta.org/2009/06/reinforcement-learning-in-the-brain/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Model-free reinforcement learning (RL) algorithms are computationally cheap as each state-action pair keeps a cached estimate of its value that can easily be looked up in order to make a decision.  Their weakness is that they are not easy to update when the agent&#8217;s goals, or the state of the world, changes in some critical way.  Model-based RL, on the other hand, is better in this respect as it can use reasoning or search on a model in order to find paths leading to the fulfilment of the agent&#8217;s current goals.  The downside, of course, is much greater computational cost.</p>
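<p>A toy Python sketch of this trade-off, under assumptions of my own: a five-cell corridor, tabular Q-learning as the model-free learner, and exhaustive depth-limited search as the model-based planner.  All function names here are hypothetical, and nothing in it is meant as a model of the brain.</p>

```python
import random

N_STATES = 5          # corridor cells 0..4
ACTIONS = (-1, 1)     # step left or step right


def step(state, action):
    """Known, deterministic world model: move one cell, clamped to the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(next_state, goal):
    return 1.0 if next_state == goal else 0.0


def q_learning(goal, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Model-free: learn one cached value per (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(20):
            a = rng.choice(ACTIONS)          # purely exploratory behaviour
            s2 = step(s, a)
            target = reward(s2, goal) + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


def cached_action(q, state):
    """Cheap decision: a table lookup."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def lookahead(state, goal, depth, gamma=0.9):
    """Model-based value: depth-limited search through the model."""
    if depth == 0:
        return 0.0
    return max(
        reward(step(state, a), goal)
        + gamma * lookahead(step(state, a), goal, depth - 1, gamma)
        for a in ACTIONS
    )


def planned_action(state, goal, depth=6, gamma=0.9):
    """Costly decision: search again for whatever the goal is right now."""
    return max(
        ACTIONS,
        key=lambda a: reward(step(state, a), goal)
        + gamma * lookahead(step(state, a), goal, depth - 1, gamma),
    )


q = q_learning(goal=4)   # values cached while the goal sits at cell 4
```

<p>If the goal then moves to cell 0, <em>cached_action(q, 2)</em> keeps pointing right until the table is relearned, while <em>planned_action(2, goal=0)</em> heads left straight away, at the price of a search on every decision.</p>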
<p>So what does the brain do?  For over a decade it has been known that temporal difference learning, a type of model-free RL algorithm, appears to explain the activity of dopamine neurons and their dorsolateral striatal projections.  It has also been observed that parts of the prefrontal cortex appear to implement some kind of model-based RL algorithm.  Mammalian brains, then, appear to get the best of both worlds by having model-free <em>and</em> model-based RL algorithms and then choosing which to use on the fly.  Pretty clever, huh?<br />
<span id="more-548"></span></p>
<p>A key question then is how this choice is made.  I recently read <a href="http://www.gatsby.ucl.ac.uk/~dayan/papers/dawnivd05.pdf">Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control</a> from <em>Nature Neuroscience</em>, by Nathaniel Daw, Yael Niv and Peter Dayan.  They suggest that the brain may be using some kind of Bayesian principle based on the uncertainty estimates generated by each system.  They implement such a system and show how it can explain a range of experimental data from animal studies.</p>
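<p>As a very rough Python illustration of uncertainty-based arbitration (a simplification of my own, not the rule from the paper): each controller reports a value estimate together with a variance, and the less uncertain one gets control.  The names <em>Estimate</em> and <em>arbitrate</em> are hypothetical.</p>

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    value: float      # estimated value of the action
    variance: float   # uncertainty attached to that estimate


def arbitrate(model_free, model_based):
    """Hand control to whichever system is less uncertain (winner takes all)."""
    candidates = {"model-free": model_free, "model-based": model_based}
    name = min(candidates, key=lambda n: candidates[n].variance)
    return name, candidates[name]


# Early in training the cached values are noisy, so the planner wins.
winner, chosen = arbitrate(Estimate(1.0, 0.5), Estimate(0.8, 0.1))
```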
<p>This research is now four years old and there is plenty of significant research in the area that follows it &#8212; an &#8220;embarrassment of riches&#8221; as Dayan recently described it.   Nevertheless, I think it&#8217;s a good example of the kind of crossover between machine learning and neuroscience that is starting to take place.  Indeed, the more neuroscientists learn about the brain, the more it starts to look like the rough outline of an AGI design.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/06/reinforcement-learning-in-the-brain/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The unreasonable effectiveness of data</title>
		<link>http://www.vetta.org/2009/06/the-unreasonable-effectiveness-of-data/</link>
		<comments>http://www.vetta.org/2009/06/the-unreasonable-effectiveness-of-data/#comments</comments>
		<pubDate>Sat, 20 Jun 2009 16:11:48 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AGI]]></category>
		<category><![CDATA[complexity]]></category>
		<category><![CDATA[machine learning]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=475</guid>
		<description><![CDATA[We recently had a visitor to the Gatsby Unit talk about his work in reinforcement learning, in particular the use of planning and forward models to speed up the learning of difficult tasks.  The substance of his talk was good, &#8230; <a href="http://www.vetta.org/2009/06/the-unreasonable-effectiveness-of-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We recently had a visitor to the Gatsby Unit talk about his work in reinforcement learning, in particular the use of planning and forward models to speed up the learning of difficult tasks.  The substance of his talk was good, but that&#8217;s not what I want to talk about: it was the motivation he gave in his introduction that bothered me.  Basically he said that humans learn much faster than reinforcement learning algorithms, and thus we should try to figure out how to make our algorithms learn faster.</p>
<p>Really?  It takes babies half a year or more to learn to control their limbs in fairly basic ways.  How many reinforcement learning algorithms get run for six months in a single learning trial?  As an adult if we try to learn some new control task, such as balancing a pole, it can take hours of effort despite having years of prior motor control experience.  A reinforcement learning algorithm, on the other hand, can learn to solve some of these problems in seconds with no prior experience <em>at all</em>.  In a few minutes algorithms can even learn the much more difficult double pole balancing problem.  This is a problem that would take me months to master, if indeed I could ever get the hang of it.  If we think about problems that humans can learn to solve quite quickly, but that machines have not yet mastered, there is usually a massive amount of prior knowledge that people are using, knowledge that may have taken years to acquire.<br />
<span id="more-475"></span></p>
<p>It appears to me that we now have some very powerful learning algorithms, in the sense that they can learn moderately complex control tasks with very little data.  The performance of these algorithms is already significantly superhuman in some contexts. This is true not just for reinforcement learning, but for many kinds of machine learning algorithms.  Unfortunately, for the more ambitious goals of artificial intelligence these highly data efficient algorithms on our moderately sized data sets haven&#8217;t worked very well.  One key reason for this, I suspect, is that these are inherently messy and complex problems that can only be solved with truly massive amounts of data.   I think that a lot of human abilities that AI research has struggled with probably fall into this category, e.g. vision, language and common sense knowledge.</p>
<p>Something similar has been expressed in <a href="http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf">The Unreasonable Effectiveness of Data</a> by Alon Halevy, Peter Norvig and Fernando Pereira.  One of the examples they cite is that for years people tried to construct a grammar for the English language by hand.  Even at 1,700 pages, however, the grammar was still incomplete!  Similar efforts have gone into systems to translate one language into another.  To start with, a manual approach looked promising as relatively few rules cover a significant percentage of the cases.  However, as you try to improve the system the number of rules needed starts to grow rapidly, and eventually explodes.  The solution, which is now used by the best translation engines, has been to move to a data driven approach: you take a learning algorithm that scales well and then feed it massive quantities of data.  Given enough data, all sorts of subtleties and complexities of the language become statistically learnable.  I think there is a key idea here for wannabe AGI designers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/06/the-unreasonable-effectiveness-of-data/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>What&#8217;s up with go?</title>
		<link>http://www.vetta.org/2009/04/whats-up-with-go/</link>
		<comments>http://www.vetta.org/2009/04/whats-up-with-go/#comments</comments>
		<pubDate>Tue, 21 Apr 2009 19:52:01 +0000</pubDate>
		<dc:creator>Shane Legg</dc:creator>
				<category><![CDATA[Research Review]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Games Go]]></category>

		<guid isPermaLink="false">http://www.vetta.org/?p=288</guid>
		<description><![CDATA[The Computational Intelligence of MoGo Revealed in Taiwan&#8217;s Computer Go Tournaments C.S. Lee, M.H. Wang, G. Chaslot, J.B. Hoock et al., IEEE Trans. Comp. Intelligence and AI in games, 2009 Go, the Asian board game, has long been considered to &#8230; <a href="http://www.vetta.org/2009/04/whats-up-with-go/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p style="padding-left: 30px;"><a href="http://hal.inria.fr/docs/00/36/97/86/PDF/TCIAIG-2008-0010_Accepted_.pdf">The Computational Intelligence of MoGo Revealed in Taiwan&#8217;s Computer Go Tournaments</a> C.S. Lee, M.H. Wang, G. Chaslot, J.B. Hoock et al.,<em> IEEE Trans. Comp. Intelligence and AI in games</em>, 2009</p>
<p>Go, the Asian board game, has long been considered to be a profound challenge for artificial intelligence.  John McCarthy described it as the &#8220;new drosophila of AI&#8221;, Hans Berliner as a &#8220;task par excellence for AI&#8221;, and David Mechner as a &#8220;grand challenge task&#8221;.  Confucius was less emphatic in his support, commenting that, &#8220;Even playing [go] is better than being idle.&#8221;  I can only presume that Confucius would have had more reverence for the game had he tried to program a computer to play it.  Among AI researchers, however, it has taken on something of a &#8220;holy grail&#8221; status.  Years have been spent carefully constructing go engines without success.  In 1998, a top go computer was beaten by a 6th dan player even though it was given a massive 29 stone advantage, meaning that its rating was something like 25 kyu.  If you&#8217;re not familiar with martial arts rating systems, well, 25 kyu is only a little above a total beginner.  By 2003, another go program had progressed to about 15 kyu.  A big improvement, but nevertheless a beginner could beat it with a few months of training.  Computer go, in a nutshell, was very weak.</p>
<p>In 2007, MoGo, a Monte Carlo Tree Search based system developed by Paris University PhD candidate Sylvain Gelly, burst onto the scene and promptly thrashed all the other computers.  Its rating was around 2 kyu, almost a &#8220;black belt&#8221; level.  Then in 2008, MoGo beat a 7th dan professional player with a 9 stone handicap, putting its rating at around 2nd dan amateur.  A few months ago MoGo beat a 9th dan professional player with just a 6 stone handicap, putting its rating at around 3rd dan amateur.  Needless to say, the days of computers being unable to play go are over.  Only professionals and very highly ranked amateur players can now be confident of a victory in a game without handicap.</p>
<p><span id="more-288"></span></p>
<p>In post game analysis of the recent competition in Taiwan, with 20x more computer time MoGo managed to identify most of the mistakes it made and come up with better moves.  This means that in 5 or so years&#8217; time MoGo will be significantly stronger due to faster hardware alone.  Even without more computer power, it appears that many of the mistakes made in Taiwan will be fixable with improved algorithms.  Given the rate at which computer go is now progressing, one can&#8217;t help but wonder how much longer humans will reign supreme in this ancient game.</p>
<p><em>Technical comments</em></p>
<p>While it&#8217;s interesting that MC tree search combined with modern brute force has proven effective in a game with such a high branching factor (often over 300), this isn&#8217;t the kind of profound insight into intelligence that AI old-timers had in mind.  Moreover, much of the system is quite ad hoc and heuristic.  For example, rather than using fairly standard approaches to exploration vs. exploitation in the initial tree policy, such as Upper Confidence Bounds, top performing programs seem to use all sorts of intuitive, but otherwise somewhat arbitrary equations for this that include various statistics from the MC simulations.  The most important idea is that go moves are somewhat reorderable, and thus if a move was made at all in a winning sequence then this is some evidence for making the move next.  While this estimator is biased, as game moves aren&#8217;t truly reorderable, it is a much lower variance statistic because the move will appear in many more simulations.  Thus at the beginning the policy in the initial tree uses this statistic the most, but as the number of simulations increases it switches to the less biased statistic in which the move is played first.  It&#8217;s a nice idea.  A strange thing about the random MC player is that better random players don&#8217;t always produce better total system performance.  This is kind of weird and doesn&#8217;t seem to be very well understood.  As a result, optimising the random player is a bit of a &#8220;dark art&#8221; that involves months of testing and fiddling around.</p>
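<p>A minimal Python sketch of that switch-over, assuming a simple weighting schedule of my own choosing (k / (k + visits)) and not the equation any particular go program uses.  The function name and the constant <em>k</em> are hypothetical.</p>

```python
def blended_value(wins, visits, any_order_wins, any_order_visits, k=50.0):
    """Mix the low-variance "move played anywhere" statistic with the
    less biased "move played first" statistic.

    The weight on the any-order statistic starts at 1 and decays
    towards 0 as the move collects direct simulations.
    """
    beta = k / (k + visits)
    first = wins / visits if visits else 0.0
    any_order = any_order_wins / any_order_visits if any_order_visits else 0.0
    return beta * any_order + (1.0 - beta) * first


# No direct simulations yet: rely entirely on the any-order statistic.
early = blended_value(0, 0, 30, 40)
# Many direct simulations: the direct win rate dominates.
late = blended_value(6000, 10000, 30, 40)
```

<p>The schedule only has to be monotone in the number of direct simulations for the qualitative behaviour described above to appear; the exact shape is a tuning choice.</p>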
]]></content:encoded>
			<wfw:commentRss>http://www.vetta.org/2009/04/whats-up-with-go/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

