Reinforcement learning in the brain

Model-free reinforcement learning (RL) algorithms are computationally cheap because each state-action pair keeps a cached estimate of its value that can easily be looked up in order to make a decision. Their weakness is that they are not easy to update when the agent’s goals, or the state of the world, change in some critical way. Model-based RL, on the other hand, is better in this respect as it can use reasoning or search on a model in order to find paths leading to the fulfilment of the agent’s current goals. The downside, of course, is much greater computational cost.
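To make the trade-off concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the work discussed: a five-state chain world, tabular Q-learning standing in for the model-free learner, and value iteration standing in for the model-based planner. It is only meant to show that a cached value table keeps recommending the old action after the goal moves, while a planner that searches the model adapts at once, at the price of recomputing values on every decision.

```python
import random

# Hypothetical toy world: states 0..4 on a line, actions -1 (left) and +1 (right).
N_STATES = 5
ACTIONS = (-1, +1)
GAMMA = 0.9


def step(state, action):
    """Transition model: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(next_state, goal):
    """Reward of 1 whenever the agent lands on (or stays at) the goal state."""
    return 1.0 if next_state == goal else 0.0


def q_learning(goal, episodes=500, alpha=0.5, seed=0):
    """Model-free: build a cached table of action values by TD updates."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(20):
            a = rng.choice(ACTIONS)  # purely random exploration
            s2 = step(s, a)
            target = reward(s2, goal) + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])  # temporal-difference update
            s = s2
    return q


def cached_action(q, state):
    """Cheap decision: a single lookup in the cached table."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def planned_action(state, goal, sweeps=50):
    """Model-based: run value iteration on the model for the *current* goal."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            max(reward(step(s, a), goal) + GAMMA * v[step(s, a)] for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return max(
        ACTIONS,
        key=lambda a: reward(step(state, a), goal) + GAMMA * v[step(state, a)],
    )


q = q_learning(goal=4)                 # values cached while the goal was state 4
stale = cached_action(q, 2)            # still heads right after the goal moves
replanned = planned_action(2, goal=0)  # heads left towards the new goal
```

In this toy setting the lookup costs one dictionary access per action, whereas the planner sweeps the whole state space many times for each decision, which is the cost difference described above in miniature.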

So what does the brain do? For over a decade it has been known that temporal difference learning, a type of model-free RL algorithm, appears to explain the activity of dopamine neurons and their dorsolateral striatal projections. It has also been observed that parts of the prefrontal cortex appear to implement some kind of model-based RL algorithm. Mammalian brains, then, appear to get the best of both worlds by having model-free and model-based RL algorithms and then choosing which to use on the fly. Pretty clever, huh?

Posted in Research Review | 5 Comments

The unreasonable effectiveness of data

We recently had a visitor to the Gatsby Unit talk about his work in reinforcement learning, in particular the use of planning and forward models to speed up the learning of difficult tasks.  The substance of his talk was good, but that’s not what I want to talk about: it was the motivation he gave in his introduction that bothered me.  Basically he said that humans learn much faster than reinforcement learning algorithms, and thus we should try to figure out how to make our algorithms learn faster.

Really? It takes babies half a year or more to learn to control their limbs in fairly basic ways. How many reinforcement learning algorithms get run for six months in a single learning trial? As adults, if we try to learn some new control task, such as balancing a pole, it can take hours of effort despite having years of prior motor control experience. A reinforcement learning algorithm, on the other hand, can learn to solve some of these problems in seconds with no prior experience at all. In a few minutes algorithms can even learn the much more difficult double pole balancing problem. This is a problem that would take me months to master, if indeed I could ever get the hang of it. If we think about problems that humans can learn to solve quite quickly, but that machines have not yet mastered, there is usually a massive amount of prior knowledge that people are using, knowledge that may have taken years to acquire.

Posted in Research Review | 9 Comments

Prospect theory investors

I recently completed a finance paper on the implications of prospect theory for portfolio choice and asset pricing. I worked on this with Prof. Enrico De Giorgi during my postdoc at the Swiss Finance Institute. This post is meant as an introduction to this work; the full paper can be downloaded here.

Posted in My Research | 5 Comments

Most surprising thing since 1999?

I just read this article on the scale of time by Mike Treder. Part of the way through it poses an interesting question: What would surprise a person from the year 2000 most about the year 2010? As I don’t know what will happen in the next year, I prefer the 1999 vs. 2009 question: If I got on the phone with 1999 me, what would be the most surprising news?

Let’s start with what was going on in 1999: I had my first cell phone. Black and white LCD screen. No text messaging. I started working for Intelligenesis (later called Webmind, founded by Ben Goertzel). The machines we had were 500 MHz and had 256 MB of RAM. I discovered Google. Internet at home at 56k, but something like 256k at work. I was using Linux and was well aware of open source software. Quake was popular. Computers had CD drives, but DVD drives were starting to come out. Nobody had LCD monitors except on laptops. Dot.com boom was going crazy. The Matrix was a big hit.

Ok, so what would be the biggest surprise for 1999 me? I think the single biggest surprise would be that a black man had been elected president of the United States. I thought it would be at least another generation or two before this would be possible. The next most surprising thing would have been Wikipedia. Though given that Linux development was working well at the time, I guess with the right control structures in place it shouldn’t have been all that surprising. Still, it continues to amaze me just how good it in fact is.

Many other things seem to have been fairly predictable: internet got faster, bigger, computer specs all went up, people started watching video on the internet, voice and video chatting over the internet, more mobile internet… Would any of these things have surprised me in 1999?  I don’t think so.  Even the recent rise of social networking: I couldn’t have predicted what that would have looked like, but it’s not all that surprising.  Same for internet banking.  A lot of what seems to have been going on over the last 10 years is just the maturation of the internet and mobile devices.

What are the most surprising things for you over the last 10 years?

EDIT: Add to my list: free email service with almost 10 GB of storage (Gmail), and Google Street View.

Posted in Uncategorized | 6 Comments

Black swan research

A month or so ago I became a “twit”, in internet speak. I didn’t really see the point in Twitter, but given that it’s the new big thing in internet land I figured that the only way to understand it was to try it, so I got myself a Twitter account. I soon realised that it’s basically the same as a Facebook status, which I already used, but without the Facebook walls, and I configured the two to sync. Anyhow, my favourite Twitter feed so far is that of Tyler Emerson. He seems to find all sorts of interesting stuff; you might want to check it out. Some of his recent tweets are links to two articles about research and risk, which is what I really want to talk about.

The first is an editorial in Nature called A risk worth taking. I think this quote sums it up: “Researchers long ago learned that the last people they should tell about their big ideas are their sources of financial support.” It then goes on to describe the radical approach that the Bill and Melinda Gates Foundation is taking to overcome this problem. Good on them, but even if this works it only solves part of the problem: if you do obtain funding to undertake radical research and your research fails, which is rather likely, what then becomes of you? Will you get the next job/grant, or will it go to the guy who did less risky research and got some not-altogether-surprising results that were then published in a mainstream journal?

Posted in Uncategorized | Leave a comment

On universal intelligence

It’s been a while since my journal paper on universal intelligence came out, and even longer since Hutter published the intelligence order relation on which it was based. Since then there have been a number of reactions; here I will make some comments in response.

One point of contention concerns whether efficiency should be part of the concept of intelligence. Hutter and I have taken the position that it should not, and I continue to think that this is the right way to go. As what we are debating is a definition, it’s hard to claim that one of these two possibilities is in some absolute sense “correct”. All we can argue is that one is more in line with what is typically meant when the word is used. Looking over the many definitions of intelligence that we have collected, in the vast majority the internal computational cost of the agent is not taken into account. Thus, among professional definitions the pattern is clear.

What about naive usage of the concept then? I think it’s the same. Imagine that you discovered that some friend of yours, who seemed completely normal, actually had only half a brain. Due to his smaller brain making more efficient use of its resources it wasn’t obvious from the outside that anything strange was going on, until a brain scan revealed this. Would you now say that your friend was twice as intelligent as you had previously thought? Consider a more futuristic hypothetical. It may well be the case that intelligence (in my sense) scales in a sub-linear way with respect to computational resources. Indeed, many learning, modelling and prediction algorithms scale in a sub-linear way with respect to computational resources. This raises the possibility that after a singularity the world could be run by a computationally vast and phenomenally smart machine which, in an efficiency sense, has significantly sub-human “intelligence”.

Posted in My Research | 29 Comments

What’s up with go?

The Computational Intelligence of MoGo Revealed in Taiwan’s Computer Go Tournaments, C.S. Lee, M.H. Wang, G. Chaslot, J.B. Hoock et al., IEEE Trans. Comp. Intelligence and AI in Games, 2009

Go, the Asian board game, has long been considered to be a profound challenge for artificial intelligence. John McCarthy described it as the “new drosophila of AI”, Hans Berliner as a “task par excellence for AI”, and David Mechner as a “grand challenge task”. Confucius was less emphatic in his support, commenting that, “Even playing [go] is better than being idle.” I can only presume that Confucius would have had more reverence for the game had he tried to program a computer to play it. Among AI researchers, however, it has taken on something of a “holy grail” status. Years have been spent carefully constructing go engines without success. In 1998, a top go computer was beaten by a 6th dan player even though it was given a massive 29 stone advantage, meaning that its rating was something like 25 kyu. If you’re not familiar with martial arts rating systems, well, 25 kyu is only a little above a total beginner. By 2003, another go program had progressed to about 15 kyu. A big improvement, but nevertheless a beginner could beat it with a few months of training. Computer go, in a nutshell, was very weak.

In 2007, MoGo, a Monte Carlo Tree Search based system developed by Paris University PhD candidate Sylvain Gelly, burst onto the scene and promptly thrashed all the other computers.  Its rating was around 2 kyu, almost a “black belt” level.  Then in 2008, MoGo beat a 7th dan professional player with a 9 stone handicap, putting its rating at around 2nd dan amateur.  A few months ago MoGo beat a 9th dan professional player with just a 6 stone handicap, putting its rating at around 3rd dan amateur.  Needless to say, the days of computers being unable to play go are over.  Only professionals and very highly ranked amateur players can now be confident of a victory in a game without handicap.

Posted in Research Review | 3 Comments

The innovator’s dilemma

The way in which technological change occurs in industries has always interested me. One quite well known book on this subject is “The Innovator’s Dilemma” by Clayton M. Christensen. Here’s a nice post on a friend’s blog that summarises the essential ideas.  The book contains many fascinating examples of disruptive changes and is certainly worth a read.

Posted in Uncategorized | Leave a comment

Tick, tock, tick, tock…

I recently read about IBM’s Sequoia supercomputer that will be operational in 2011. It will perform 20 petaFLOPS and have 1.6 petabytes of RAM. To put that in perspective: if it were to attempt to simulate a human cerebral cortex it would be able to allocate 50 bytes of RAM and 700 calculations per second to every synapse in the model. Unless the human brain is doing something pretty weird, the quest to build a computer with comparable raw processing power is almost over.
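The per-synapse figures above are simple division, and a rough sketch of the arithmetic is below. The synapse count is an assumption: the post does not state one, and the value of about 3 × 10^13 used here is simply the number implied by the quoted 50 bytes and 700 calculations per second. The function name is likewise made up for illustration.

```python
# Figures quoted in the post for the Sequoia machine.
FLOPS = 20e15        # 20 petaFLOPS
RAM_BYTES = 1.6e15   # 1.6 petabytes of RAM

# ASSUMPTION (not stated in the post): about 3e13 synapses in the cortex model.
# This value is back-derived from the post's own per-synapse figures.
SYNAPSES = 3e13


def per_synapse_budget(flops, ram_bytes, synapses):
    """Return (bytes of RAM, calculations per second) available per synapse."""
    return ram_bytes / synapses, flops / synapses


bytes_per_synapse, flops_per_synapse = per_synapse_budget(FLOPS, RAM_BYTES, SYNAPSES)
print(f"{bytes_per_synapse:.0f} bytes of RAM per synapse")
print(f"{flops_per_synapse:.0f} calculations per second per synapse")
```

Under that assumed synapse count the results land close to the rounded 50 bytes and 700 calculations per second; a different estimate of the number of synapses scales both figures inversely.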

As I do at the start of each year, I’ve spent some time reconsidering when I think roughly human level AGI will exist.  I’ve again decided to leave it at 2025, but now with a reduced standard deviation of 5 years.  Computer power is a limitation as researchers typically have limited hardware budgets, unlike the DOD guys and their monster supercomputers.  From what I’ve read, computer power should continue to grow exponentially for at least the next 5 years, and probably the next 10.  So I don’t see this as being too much of an issue in the coming decade.  On the algorithm side, I think things are progressing really well.  I know a number of very talented people who are working on what I think are the key building blocks required before the construction of a basic AGI can begin.  I’m certain these problems are solvable, but whether it takes 2 years or 10 years is hard to guess.  This is my main source of uncertainty.

Posted in Uncategorized | 28 Comments

Learning to predict the future

One of the things I’ve been thinking about recently is the prediction of the future. Many people really enjoy doing this and come up with all sorts of wild speculations. It’s kind of like having the liberty to write your own science fiction, but then taking it a step further by convincing yourself to actually believe it. Sooner or later the future arrives, and many of the recorded predictions look rather silly. More cautious people take note of this and often avoid easily falsifiable predictions. That’s all very well as it saves them from ending up looking like fools, but it also makes becoming a better predictor problematic as they’re never really forced to contemplate their mistakes. My preference is to make an honest attempt at specific predictions, along with the reasoning behind them. Then when the time comes, go back over them and try to work out what went right, what went wrong, and most importantly why. Was it bad luck? Was I overconfident? Underconfident? Was some kind of systematic bias at work?

One example of this has been trying to predict the medium term direction of the stock market over the last 15 years.  The evidence so far shows that I’m consistently good at predicting what will happen, but that I predict that it will happen much sooner than it actually does; I roughly need to double my time estimates.  I’m now trying to mentally correct for this bias in the trades I make, but it will take some years to see if this is working.

Posted in Uncategorized | 6 Comments