Some of you might remember the talk I gave at the 2010 Singularity Summit about Algorithmic IQ, or AIQ for short. It was an attempt to convert the theoretical Universal Intelligence Measure into a working practical test of machine intelligence. The results were preliminary, but it seemed to work…
It’s now over a year later, so I guess some of you are wondering what happened to AIQ! I’ve been very busy working on other cool stuff, however Joel Veness and I have been tinkering with AIQ in our spare time. We’re pleased to report that it has continued to perform well, surprisingly well in fact. There was some trickiness in getting it to run efficiently, but that aside it worked straight out of the box.
We recently wrote a paper on AIQ that was accepted to the Solomonoff Memorial Conference. You can get the paper here, the talk slides here, and we have also released all the Python AIQ source code here. It’s designed to be easy to plug in your own agents, or other reference machines, if you fancy having a go at that too.
If you’re not sure you want to read any of that, here’s the summary:
We implemented the simple BF reference machine and extended it in the obvious ways so that it could compute reinforcement learning environments. We then sampled random BF programs to compute the environments, and tested the agents against each of these. This can be a bit slow, so we used variance reduction techniques to speed things up. We then implemented a number of agents:

- MC-AIXI, a model-based RL agent that can learn to play simple games such as TicTacToe, Kuhn poker and PacMan, but is rather slow to learn.
- HLQ(lambda), a tabular RL agent similar to Q-learning but with an automatic learning rate.
- Q(lambda), a standard RL agent, and Q(0), a weaker special case.
- Freq, a simple agent that just takes the more rewarding action most of the time, occasionally trying a random action.
- A random agent, which always got an AIQ of zero, as expected.

The results appear below, across various episode lengths:
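To give a flavour of the simplest agent in the line-up, here is a minimal sketch of a Freq-style agent. This is a hypothetical illustration, not the released AIQ code: it just tracks the average reward each action has produced, usually repeats the best one, and occasionally explores at random.

```python
import random

class FreqAgent:
    """Sketch of a 'Freq'-style agent: mostly takes the action with the
    highest average observed reward, exploring at random with prob. epsilon.
    (Illustrative only; the class name and interface are assumptions.)"""

    def __init__(self, num_actions, epsilon=0.1):
        self.num_actions = num_actions
        self.epsilon = epsilon              # chance of a random exploratory action
        self.totals = [0.0] * num_actions   # summed reward per action
        self.counts = [0] * num_actions     # times each action was taken

    def act(self):
        # Explore with probability epsilon, otherwise exploit the best average.
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        means = [t / c if c else 0.0
                 for t, c in zip(self.totals, self.counts)]
        return max(range(self.num_actions), key=means.__getitem__)

    def update(self, action, reward):
        # Record the reward received for the action just taken.
        self.totals[action] += reward
        self.counts[action] += 1
```

Despite its simplicity, an agent like this gives a useful baseline: any test of intelligence should rank it above random but below proper RL agents.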
The error bars are 95% confidence intervals for the estimates of the mean. As you can see, AIQ orders the agents exactly as we would expect, including picking up the fact that MC-AIXI, while quite powerful compared to the other agents, is also rather slow to learn and thus needs longer episode lengths. We ran additional tests where we scaled the size of the context used by MC-AIXI, and the amount of search effort used, and in both cases the AIQ score scaled sensibly. See the talk slides for more details, or the paper itself.
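For the curious, the error bars can be understood as follows. An AIQ score is a mean over per-environment returns, so a standard way to get a 95% confidence interval is the normal approximation, 1.96 times the standard error of the mean. A minimal sketch (assuming plain i.i.d. sampling; the actual estimator in the paper also uses variance reduction):

```python
import math

def aiq_estimate(scores):
    """Return (mean, half_width) where mean +/- half_width is an
    approximate 95% confidence interval, via the normal approximation.
    'scores' is a list of per-environment returns (assumed i.i.d.)."""
    n = len(scores)
    mean = sum(scores) / n
    # Unbiased sample variance, then standard error of the mean.
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width
```

The half-width shrinks as 1/sqrt(n), which is why sampling many environments (and reducing variance) matters for getting tight estimates.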