Monte Carlo AIXI

While I was visiting Marcus Hutter at ANU a month or so ago, I got talking to one of his students, Joel Veness, who’s working on computable approximations to AIXI. Joel has a background in writing Go programs, so he is perhaps perfect for the job. I saw recently that the Monte Carlo AIXI paper describing this work is now available online, if you want to check it out.

The basic idea goes as follows. In full AIXI you have an extended Solomonoff predictor to model the environment, and an expectimax tree to compute the optimal action. To scale AIXI down and still have something of roughly the same form, you need a tractable replacement for each of these. Here’s what they did. In place of extended Solomonoff induction, a version of context tree weighting (CTW) is used. CTW has to be extended for this application, similar to the way Hutter had to extend Solomonoff induction to active environments for AIXI. In place of the expectimax tree search, a Monte Carlo tree search is used, similar to that used in Go-playing programs: initial selection within the tree, tree expansion, a so-called play-out policy, and then a backup stage to propagate the new information back into the model.

You have to be a bit careful here: as the agent imagines different future observations and actions, it has to update its hypothetical beliefs to reflect them so that its analysis and decision making stay consistent. Then, once this possible future has been evaluated, its effect on the agent’s model of the world has to be unwound so that the agent doesn’t, in effect, start confusing its fantasies with its present reality.
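To make the update-then-unwind step concrete, here is a minimal Python sketch. It is not the code from the paper: instead of the full extended CTW model it uses a single Krichevsky-Trofimov (KT) estimator over bits as a stand-in predictor, and the names `KTEstimator` and `imagine_future` are my own hypothetical ones. The point is only that every imagined symbol is fed into the model during the play-out and then removed again in reverse order.

```python
import random


class KTEstimator:
    """Krichevsky-Trofimov estimator for a binary sequence.

    Stand-in for the agent's environment model (an assumption of this
    sketch; the real agent uses an extended form of CTW).
    """

    def __init__(self):
        self.counts = [0, 0]  # number of 0s and 1s seen so far

    def prob(self, bit):
        # KT rule: (count of bit + 1/2) / (total count + 1)
        return (self.counts[bit] + 0.5) / (self.counts[0] + self.counts[1] + 1.0)

    def update(self, bit):
        self.counts[bit] += 1

    def revert(self, bit):
        # Undo one earlier update(bit).
        assert self.counts[bit] > 0
        self.counts[bit] -= 1


def imagine_future(model, horizon, rng):
    """Sample an imagined future from the model, then unwind it."""
    imagined = []
    for _ in range(horizon):
        bit = 1 if rng.random() < model.prob(1) else 0
        model.update(bit)  # hypothetical belief update
        imagined.append(bit)
    # Unwind in reverse order so the model only reflects real history.
    for bit in reversed(imagined):
        model.revert(bit)
    return imagined
```

After `imagine_future` returns, the model's counts are exactly what they were before the call, so the real history is the only thing the agent has actually learned from.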

The algorithm is both embarrassingly parallel and anytime, which is very nice. In less technical language: it would be fairly easy to get it to run efficiently on a massively parallel supercomputer, and it can be forced to decide what action to take at any moment, always returning the best action it has been able to compute so far. Thus, if you want a smarter agent, just give it more time and/or CPUs. Already they have shown that MC-AIXI can learn to solve a bunch of basic POMDP problems, including playing a somewhat reasonable game of Pac-man. It would be interesting to see what it is capable of on a supercomputer with ten thousand times the resources of their desktop PC.
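The anytime property can be illustrated with a small Python sketch. This is a deliberate simplification and not the search used in the paper: it does flat Monte Carlo over the root actions with round-robin sampling rather than a proper tree search, and `anytime_best_action` and `noisy_reward` are hypothetical names with made-up reward probabilities. The generator yields its current best guess after every simulation, so the caller can stop whenever it likes and still get an answer.

```python
import itertools
import random


def anytime_best_action(actions, simulate, rng):
    """Yield the best action found so far after each simulated rollout."""
    totals = {a: 0.0 for a in actions}
    visits = {a: 0 for a in actions}
    for step in itertools.count():
        action = actions[step % len(actions)]  # round-robin over actions
        totals[action] += simulate(action, rng)
        visits[action] += 1
        tried = [a for a in actions if visits[a] > 0]
        # Best action = highest average simulated reward so far.
        yield max(tried, key=lambda a: totals[a] / visits[a])


def noisy_reward(action, rng):
    """Toy simulator (assumed values): 'right' pays off more often."""
    means = {"left": 0.2, "right": 0.8}
    return 1.0 if rng.random() < means[action] else 0.0
```

Taking more items from the generator corresponds to giving the agent more thinking time. Because the rollouts are independent of each other, they could also be spread across many workers and their totals merged, which is what makes this kind of search easy to parallelise.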

A key direction for future research is building better sequence predictors, in particular ones that can identify more complex types of patterns in the agent’s history. I guess all sorts of machine learning techniques could come into play here… and possibly combine to produce quite a powerful RL agent?


13 Responses to Monte Carlo AIXI

  1. Roko says:

    “It would be interesting to see what it was capable of on a supercomputer with ten thousand times the resources of their desktop PC.”

    – why don’t we set up a SKYNET@HOME distributed uFAI initiative to find out?

    • Shane Legg says:

      I wouldn’t be surprised if they do get to run it on a supercomputer, or at least a decent cluster, in the coming years. Most supercomputer labs are pretty open to donating limited amounts of time to such things.

      There are some quite basic types of pattern learning that this design would struggle with due to the somewhat limited nature of CTW. For example, it can’t build proper hierarchies of abstraction about its environment and thus it can’t really do anything “deep”. Nevertheless, it would be interesting to see how this type of simple design tops out in practice as you add more computing power. That would help clarify the system’s most serious limitations and perhaps point to where further research might be useful.

      MC-AIXI is certainly no Skynet! :-) But, at least to my mind, it looks like an interesting starting point for future work.

      • Roko says:

        It certainly looks like a piece of research that will contribute, in the medium-long term, to the goal of tiling the universe with paperclips.

        Incidentally, how likely do you think AGI academics are to agree to the idea of not publishing their research for safety reasons?

        • Shane Legg says:

          “It certainly looks like a piece of research that will contribute, in the medium-long term, to the goal of tiling the universe with paper clips.”

          But you think this of concrete advances in AGI research in general right?

          “Incidentally, how likely do you think AGI academics are to agree to the idea of not publishing their research for safety reasons?”

          Even if most did agree to this, information tends to get out, other people working in closely related areas discover the same things a few years later, alternative methods are found that do the same thing… and who would fund them if they don’t publish their research? Come to think of it, what if we all stopped telling each other about our research? Fifteen years from now somebody like Hutter announces, out of the blue, that he’s built a machine with clearly superhuman intelligence. And no, he’s still not going to tell you how he did it… for safety reasons. Does this really help from your perspective?

          AGI is coming fast. I read this morning that IBM is already talking to astronomers about putting together an exaflop machine for them before 2020, building on the work for their 20 petaflop machine that’s coming out in the next couple of years.

          • Roko says:

            “and who would fund them if they don’t publish their research?”

            – the funding agencies would have to be persuaded of the risks of AGI. Actually, bureaucrats might like the idea of having loads of control over academics, so it might be easier to get them on side.

          • Roko says:

            “But you think this of concrete advances in AGI research in general right?”

            – correct.

          • Roko says:

            “ “It certainly looks like a piece of research that will contribute, in the medium-long term, to the goal of tiling the universe with paper clips.”

            But you think this of concrete advances in AGI research in general right?”

            – well, it does depend upon whether the result is more or less likely to be used by a careful team of FAI developers, rather than by a misanthropic team who don’t really care whether they kill the human race.

  2. Kevembuangga says:

    Though I don’t think the “key” to AI lies within the realm of maths or logic, other approaches to computerized mathematics seem to me of more import than AIXI.

    And don’t worry about “paper clips”, entropy will take care of snuffing out the hubris of us demented monkeys.
