vetta project


Reinforcement learning in the brain

June 21st, 2009 · Research Review

Model-free reinforcement learning (RL) algorithms are computationally cheap, as each state-action pair keeps a cached estimate of its value that can easily be looked up in order to make a decision. Their weakness is that they are not easy to update when the agent’s goals, or the state of the world, change in some critical way. Model-based RL, on the other hand, is better in this respect as it can use reasoning or search on a model in order to find paths leading to the fulfilment of the agent’s current goals. The downside, of course, is much greater computational cost.
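As a rough illustration of this trade-off, here is a minimal sketch on a made-up five-state corridor. The environment, the function names and all parameter values are assumptions chosen for the example, not anything from the literature. The model-free agent caches action values learned for one goal; the model-based agent re-plans over a world model whenever it is asked.

```python
# Toy corridor: states 0..4, actions -1 (left) and +1 (right).
N_STATES = 5
ACTIONS = (-1, +1)
GAMMA = 0.9  # discount factor (hypothetical value)


def step(state, action):
    """World model: move left or right, clipped to the ends of the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(next_state, goal):
    """Reward of 1 for arriving at the goal, 0 otherwise."""
    return 1.0 if next_state == goal else 0.0


def train_model_free(goal, sweeps=200, alpha=0.5):
    """Q-learning updates over every state-action pair; returns the cached table."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == goal:
                continue  # the goal is terminal
            for a in ACTIONS:
                s2 = step(s, a)
                future = 0.0 if s2 == goal else max(q[(s2, b)] for b in ACTIONS)
                target = reward(s2, goal) + GAMMA * future
                q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def cached_action(q, state):
    """Model-free decision: a cheap lookup in the cached table."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def backup(state, action, goal, values):
    """One-step lookahead through the model."""
    s2 = step(state, action)
    return reward(s2, goal) + (0.0 if s2 == goal else GAMMA * values[s2])


def planned_action(state, goal, sweeps=50):
    """Model-based decision: value iteration on the model for the current goal."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s != goal:
                values[s] = max(backup(s, a, goal, values) for a in ACTIONS)
    return max(ACTIONS, key=lambda a: backup(state, a, goal, values))


q_table = train_model_free(goal=4)       # habits learned while the goal was state 4
print(cached_action(q_table, 2))         # stale habit: still heads right
print(planned_action(2, goal=0))         # re-planning for the new goal heads left
```

The lookup is a single dictionary access, while the planner reruns a full sweep of the model on every decision, which is the cost difference described above.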

So what does the brain do? For over a decade it has been known that temporal difference learning, a type of model-free RL algorithm, appears to explain the activity of dopamine neurons and their dorsolateral striatal projections. It has also been observed that parts of the prefrontal cortex appear to implement some kind of model-based RL algorithm. Mammalian brains, then, appear to get the best of both worlds by having model-free and model-based RL algorithms and then choosing which to use on the fly. Pretty clever, huh?

A key question then is how this choice is made. I recently read Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control from Nature Neuroscience, by Nathaniel Daw, Yael Niv and Peter Dayan. They suggest that the brain may be using some kind of Bayesian principle based on the uncertainty estimates generated by each system. They implement such a system and show how it can explain a range of experimental data from animal studies.
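To give a feel for what uncertainty-based arbitration could look like, here is a deliberately simplified sketch. It is not the model from the paper: the `Estimate` class, the `arbitrate` function and the numbers are all hypothetical, and the rule used (trust whichever controller reports the lower variance) is only one simple way of acting on uncertainty estimates.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # the controller's estimate of an action's value
    variance: float  # how uncertain the controller is about that estimate


def arbitrate(model_free, model_based):
    """Return the name and estimate of the controller that is more certain."""
    if model_free.variance <= model_based.variance:
        return "model-free", model_free
    return "model-based", model_based


# Hypothetical numbers: a barely trained cache versus a well-trained one.
print(arbitrate(Estimate(1.0, 0.50), Estimate(0.8, 0.10))[0])
print(arbitrate(Estimate(1.0, 0.02), Estimate(0.8, 0.10))[0])
```

In this toy version control simply switches to whichever system is currently more confident, so the same agent can plan in unfamiliar situations and fall back on cached values in well-practised ones.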

This research is now four years old and there is plenty of significant research in the area that follows it — an “embarrassment of riches” as Dayan recently described it.  Nevertheless, I think it’s a good example of the kind of crossover between machine learning and neuroscience that is starting to take place.  Indeed, the more neuroscientists learn about the brain, the more it starts to look like the rough outline of an AGI design.


The unreasonable effectiveness of data

June 20th, 2009 · Research Review

We recently had a visitor to the Gatsby Unit talk about his work in reinforcement learning, in particular the use of planning and forward models to speed up the learning of difficult tasks.  The substance of his talk was good, but that’s not what I want to talk about: it was the motivation he gave in his introduction that bothered me.  Basically he said that humans learn much faster than reinforcement learning algorithms, and thus we should try to figure out how to make our algorithms learn faster.

Really? It takes babies half a year or more to learn to control their limbs in fairly basic ways. How many reinforcement learning algorithms get run for six months in a single learning trial?  As an adult if we try to learn some new control task, such as balancing a pole, it can take hours of effort despite having years of prior motor control experience. A reinforcement learning algorithm, on the other hand, can learn to solve some of these problems in seconds with no prior experience at all.  In a few minutes algorithms can even learn the much more difficult double pole balancing problem. This is a problem that would take me months to master, if indeed I could ever get the hang of it.  If we think about problems that humans can learn to solve quite quickly, but that machines have not yet mastered, there is usually a massive amount of prior knowledge that people are using, knowledge that may have taken years to acquire.

It appears to me that we now have some very powerful learning algorithms, in the sense that they can learn moderately complex control tasks with very little data. The performance of these algorithms is already significantly superhuman in some contexts. This is true not just for reinforcement learning, but for many kinds of machine learning algorithms. Unfortunately, for the more ambitious goals of artificial intelligence these highly data-efficient algorithms on our moderately sized data sets haven’t worked very well. One key reason for this, I suspect, is that these are inherently messy and complex problems that can only be solved with truly massive amounts of data. I think that a lot of human abilities that AI research has struggled with probably fall into this category, e.g. vision, language and common sense knowledge.

Something similar has been expressed in The Unreasonable Effectiveness of Data by Alon Halevy, Peter Norvig and Fernando Pereira. One of the examples they cite is that for years people tried to construct a grammar for the English language by hand. Even at 1,700 pages, however, the grammar was still incomplete! Similar efforts have gone into systems to translate one language into another. To start with, a manual approach looked promising, as relatively few rules cover a significant percentage of the cases. However, as you try to improve the system the number of rules needed starts to grow rapidly, and eventually explodes. The solution, which is now used by the best translation engines, has been to move to a data-driven approach: you take a learning algorithm that scales well and then feed it massive quantities of data. Given enough data, all sorts of subtleties and complexities of the language become statistically learnable. I think there is a key idea here for wannabe AGI designers.


Prospect theory investors

June 17th, 2009 · My Research

I recently completed a finance paper on the implications of prospect theory for portfolio choice and asset pricing. I worked on this with Prof. Enrico De Giorgi during my postdoc at the Swiss Finance Institute. This post is meant as an introduction to this work; the full paper can be downloaded here.

Finance models, like all mathematical models, suffer from the following problem: if you don’t make the initial assumptions simple and easy to work with, the theoretical analysis that follows is too difficult to manage. In finance this usually translates into assuming that investors are fully informed, completely rational and are just out to maximise their expected future utility. You also tend to assume that the prices of risky assets, for example stocks, follow geometric Brownian motion in continuous time, or have returns that are log-normally distributed when working in discrete time. These assumptions are somewhat close to reality, but simple enough to permit theoretical analysis.

So what does the analysis say? Among other things, it says that people should be investing a large proportion of their wealth into stocks. In reality, however, most people don’t own any stock, and most of those who do don’t have a particularly large proportion of their wealth in stocks unless they are very wealthy. Perhaps this is ok in a prescriptive sense, i.e. telling you that you really should consider owning more stock. However, as descriptive models of investors, i.e. describing what investors do and why, these models seriously fail. Playing with the parameters does not save you: in order to get people holding so little stock you have to push the level of people’s risk aversion up far beyond the range of values that have been empirically estimated. Thus, if our theoretical analysis is correct and produces the wrong answers, it must be that our basic assumptions were wrong.
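To make the mismatch concrete, one textbook result under assumptions of this kind (constant relative risk aversion, stock prices following geometric Brownian motion) is Merton's rule, which puts the fraction of wealth held in stocks at (mu - r) / (gamma * sigma^2). The post does not say which model it has in mind, so using Merton's rule here is an assumption, and the parameter values below are hypothetical round numbers rather than estimates.

```python
def merton_stock_fraction(excess_return, volatility, risk_aversion):
    """Fraction of wealth in stocks: (mu - r) / (gamma * sigma^2)."""
    return excess_return / (risk_aversion * volatility ** 2)


def implied_risk_aversion(excess_return, volatility, stock_fraction):
    """Risk aversion gamma needed to justify a given stock fraction."""
    return excess_return / (stock_fraction * volatility ** 2)


# Hypothetical inputs: 6% expected excess return, 20% annual volatility.
EXCESS_RETURN = 0.06
VOLATILITY = 0.20

# A moderately risk-averse investor (gamma = 3, also hypothetical).
print(merton_stock_fraction(EXCESS_RETURN, VOLATILITY, risk_aversion=3.0))

# Risk aversion needed to explain holding only 10% of wealth in stocks.
print(implied_risk_aversion(EXCESS_RETURN, VOLATILITY, stock_fraction=0.10))
```

With these made-up inputs the rule recommends roughly half of wealth in stocks, and explaining a 10% holding requires a risk aversion of about 15, five times the value assumed for the first investor. That is the flavour of the problem described above.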

This isn’t really news; indeed, it’s well known that people are not rational expected utility maximisers. When we have to make decisions, all sorts of cognitive biases and distortions come into play. Seminal work in this area was done by Kahneman and Tversky. They produced a model of human decision making known as prospect theory, work that Kahneman later won a Nobel prize for (sadly Tversky died some years before the award). Due to some technical problems, this was later refined to produce cumulative prospect theory, which I will now very superficially describe.

[Read more →]


Swine flu statistics

June 16th, 2009 · Uncategorized

As some of you might know, I’ve been sick at home for the last week with the flu. Naturally, this has got me thinking and reading a bit about the current swine flu pandemic and so I thought I’d share a few of my thoughts (besides the fact that being stuck at home is getting a bit boring).

First of all, the general trend over the last month is very clear: this thing is exponential. Any graph that shows total human suffering increasing exponentially is something to be rather concerned about. Indeed, if anything the graphs are probably far, far below the true number of cases. One authority in the US said that the true number of cases was likely to be 20 times the number of confirmed cases; one UK virologist put it at up to 300 times. The US Centers for Disease Control and Prevention described the official number of confirmed cases as “largely irrelevant”. In many countries now you don’t get tested even if you are suspected of having it. Thus if you see that some (but certainly not all) graphs of confirmed cases seem to be flattening out, don’t believe it: it’s probably due to changes in data collection.

Here in the UK I had a look into whether I could easily get tested for swine flu. Basically the answer is no. Currently there are about 1,600 confirmed cases, which probably means between 10,000 and 50,000 actual cases according to some experts’ guesstimates. Unless I know somebody with swine flu, they don’t want to know. Even then they aren’t all that interested in testing me unless I’m really sick. Which is a pity, as it would be kind of fun to know, as well as giving me peace of mind in the future knowing that I’d already had it. In any case, the fact that I can come down with flu out of season and during an official pandemic and not be tested just reinforces the fact that the number of confirmed cases is meaningless. Of course this doesn’t stop the latest numbers from being updated in the media every night! All we really know about the total number of cases is that it’s way bigger than the number of confirmed cases and it’s growing exponentially.
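The figures quoted in this post can be put side by side with a few lines of arithmetic. The sketch below only rearranges numbers already mentioned (1,600 confirmed UK cases, the 10,000 to 50,000 guess, and the 20-times and 300-times multipliers); the function names are made up for the example.

```python
def implied_multiplier(confirmed, actual):
    """How many actual cases each confirmed case stands for."""
    return actual / confirmed


def estimated_actual(confirmed, multiplier):
    """Actual cases implied by a given under-reporting multiplier."""
    return confirmed * multiplier


UK_CONFIRMED = 1_600

# Multipliers implied by the 10,000 to 50,000 range.
print(implied_multiplier(UK_CONFIRMED, 10_000))
print(implied_multiplier(UK_CONFIRMED, 50_000))

# UK case counts implied by the 20x and 300x multipliers.
print(estimated_actual(UK_CONFIRMED, 20))
print(estimated_actual(UK_CONFIRMED, 300))
```

The spread between the smallest and largest of these estimates is well over an order of magnitude, which is another way of saying that the confirmed count carries very little information.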

The good news, as no doubt you’ve seen in the media, is that the death rate is fairly low, perhaps 1 in every 5,000 cases.  And if you’re healthy it should be significantly less again.  And unlike total cases, death statistics are inherently much more reliable.  Nevertheless, if this continues to grow exponentially for the next six months, and we don’t expect widely available vaccines before then, the total number of casualties could be pretty substantial, especially in densely populated poor countries.
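To see how quickly exponential growth adds up even with a low death rate, here is a small sketch. The 1-in-5,000 death rate and the six-month horizon come from the text above, and the starting count is the lower UK guess of 10,000 cases. The 30-day doubling time is purely a hypothetical input chosen to keep the arithmetic simple, not an estimate of the real epidemic.

```python
def projected_cases(initial_cases, days, doubling_time_days):
    """Cases after `days` of unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


def expected_deaths(cases, deaths_per_case=1 / 5_000):
    """Deaths implied by a fixed death rate per case."""
    return cases * deaths_per_case


# Hypothetical scenario: 10,000 cases today, doubling every 30 days, for 180 days.
cases_in_six_months = projected_cases(10_000, days=180, doubling_time_days=30)
print(cases_in_six_months)
print(expected_deaths(cases_in_six_months))
```

Six doublings multiply the case count by 64, so the result is very sensitive to the assumed doubling time: halving it to 15 days would give twelve doublings, a factor of 4,096.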


Most surprising thing since 1999?

June 16th, 2009 · Uncategorized

I just read this article on the scale of time by Mike Treder. Part of the way through it has an interesting question: What would surprise a person from the year 2000 most about the year 2010? As I don’t know what will happen in the next year, I prefer the 1999 vs. 2009 question: If I got on the phone with 1999 me, what would be the most surprising news?

Let’s start with what was going on in 1999: I had my first cell phone. Black and white LCD screen. No text messaging. I started working for Intelligenesis (later called Webmind, founded by Ben Goertzel). The machines we had were 500 MHz and had 256 MB of RAM. I discovered Google. Internet at home at 56k, but something like 256k at work. I was using Linux and was well aware of open source software. Quake was popular. Computers had CD drives, but DVD drives were starting to come out. Nobody had LCD monitors except on laptops. Dot.com boom was going crazy. The Matrix was a big hit.

Ok, so what would be the biggest surprise for 1999 me? I think the single biggest surprise would be that a black man had been elected president of the United States. I thought it would be at least another generation or two before this would be possible. The next most surprising thing would have been Wikipedia. Though given that Linux development was working well at the time, I guess with the right control structures in place it shouldn’t have been all that surprising. Still, it continues to amaze me just how good it in fact is.

Many other things seem to have been fairly predictable: internet got faster, bigger, computer specs all went up, people started watching video on the internet, voice and video chatting over the internet, more mobile internet… Would any of these things have surprised me in 1999?  I don’t think so.  Even the recent rise of social networking: I couldn’t have predicted what that would have looked like, but it’s not all that surprising.  Same for internet banking.  A lot of what seems to have been going on over the last 10 years is just the maturation of the internet and mobile devices.

What are the most surprising things for you over the last 10 years?

EDIT: Add to my list: free email service with almost 10 GB of storage (Gmail), and Google Street View.


Black swan research

May 29th, 2009 · Uncategorized

A month or so ago I became a “twit”, in internet speak.  I didn’t really see the point in Twitter, but given that it’s the new big thing in internet land I figured that the only way to understand it was to try it…  I got myself a Twitter account.  I soon realised that it’s basically the same as a Facebook status, which I already used, but without the Facebook walls.  I soon configured the two to sync.  Anyhow, my favourite Twitter feed so far is that of Tyler Emerson.  He seems to find all sorts of interesting stuff, you might want to check it out.  Some of his recent tweets are links to two articles about research and risk, which is what I really want to talk about.

The first is an editorial in Nature called A risk worth taking. I think this quote sums it up: “Researchers long ago learned that the last people they should tell about their big ideas are their sources of financial support.” It then goes on to describe the radical approach that the Bill and Melinda Gates Foundation is taking to overcome this problem. Good on them, but even if this works it only solves part of the problem: if you do obtain funding to undertake radical research and your research fails, which is rather likely, what then becomes of you? Will you get the next job/grant, or will the guy who did less risky research and got some not-altogether-surprising results that were then published in a mainstream journal?

[Read more →]


On universal intelligence

May 8th, 2009 · My Research

It’s been a while since my journal paper on universal intelligence came out, and even longer since the intelligence order relation was published by Hutter that this was based on. Since then there have been a number of reactions; here I will make some comments in response.

One point of contention concerns whether efficiency should be part of the concept of intelligence. Hutter and I have taken the position that it should not, and I continue to think that this is the right way to go. As what we are debating is a definition, it’s hard to claim that one of these two possibilities is in some absolute sense “correct”. All we can argue is that one is more in line with what is typically meant when the word is used. Looking over the many definitions of intelligence that we have collected, in the vast majority the internal computational cost of the agent is not taken into account. Thus, among professional definitions the pattern is clear.

What about naive usage of the concept then? I think it’s the same. Imagine that you discovered that some friend of yours, who seemed completely normal, actually had only half a brain. Due to his smaller brain making more efficient use of its resources it wasn’t obvious from the outside that anything strange was going on, until a brain scan revealed this. Would you now say that your friend was twice as intelligent as you had previously thought? Consider a more futuristic hypothetical. It may well be the case that intelligence (in my sense) scales in a sub-linear way with respect to computational resources. Indeed, many learning, modelling and prediction algorithms scale in a sub-linear way with respect to computational resources. This raises the possibility that after a singularity the world could be run by a computationally vast and phenomenally smart machine which, in an efficiency sense, has significantly sub-human “intelligence”.

[Read more →]


Overheard around Gatsby

May 1st, 2009 · Uncategorized

It was recently discovered that in a sleeping brain applying just 200 spikes to a single neuron over a few seconds was enough to change the entire brain state into a different regime with different brain wave frequencies.  This shows that brains can be very responsive to fairly small perturbations.

Specialised neuron modelling chips recently reached a real-time efficiency of 600,000 neurons per watt. These chips have highly flexible, software-configurable connectivity patterns. Thus, for the same amount of power as 20 home electric kettles you can now simulate as many neurons as are in the human cortex.
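The kettle comparison is easy to check with a back-of-the-envelope calculation. The 600,000 neurons per watt and the 20 kettles come from the text above; the 2,000 W rating per kettle is an assumption made here, and the function name is made up for the example.

```python
NEURONS_PER_WATT = 600_000   # efficiency figure quoted above
KETTLE_WATTS = 2_000         # assumed rating of one home kettle (not from the text)


def simulated_neurons(kettles, kettle_watts=KETTLE_WATTS,
                      neurons_per_watt=NEURONS_PER_WATT):
    """Neurons that can be simulated in real time on a given power budget."""
    return kettles * kettle_watts * neurons_per_watt


print(simulated_neurons(20))
```

Under that assumption 20 kettles draw 40 kW, enough for 24 billion neurons, which is in the tens of billions and so the same order of magnitude usually quoted for the human cortex.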

By the end of the year Markram is apparently going to publish full details of the cortex circuit that his group has developed.  He also has a few papers lined up that, according to him, contain key insights into how the cortex works based on his group’s BlueBrain simulations.

Next week is a neuroscience workshop with loads of short talks by researchers from here and around the world.  Should be really interesting.
