Model-free reinforcement learning (RL) algorithms are computationally cheap because each state-action pair keeps a cached estimate of its value, which can simply be looked up when a decision is needed. Their weakness is that these cached values are slow to update when the agent’s goals, or the state of the world, change in some critical way. Model-based RL, on the other hand, is better in this respect, as it can use reasoning or search on a model to find paths leading to the fulfilment of the agent’s current goals. The downside, of course, is much greater computational cost.
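To make the trade-off concrete, here is a minimal sketch of the two styles of decision making. Everything in it is an illustrative assumption rather than anything from the literature: the four-state chain world, the `step` model, the hand-filled `q_table`, and the function names `plan` and `cached_choice` are all hypothetical.

```python
# Hypothetical 4-state chain: states 0..3, actions -1 (left) and +1 (right).
N_STATES = 4
ACTIONS = (-1, +1)


def step(state, action):
    """Assumed deterministic world model: move along the chain, clipped at the ends."""
    return min(max(state + action, 0), N_STATES - 1)


def plan(state, goal, depth=6):
    """Model-based choice: search the model for the action that reaches the goal soonest."""
    def cost(s, d):
        if s == goal:
            return 0
        if d == 0:
            return float("inf")  # search budget exhausted
        return 1 + min(cost(step(s, a), d - 1) for a in ACTIONS)

    return min(ACTIONS, key=lambda a: cost(step(state, a), depth - 1))


# Model-free choice: cached values, filled in by hand here as if they had been
# learned while the goal was state 3 (so "right" always looks better).
q_table = {(s, a): (1.0 if a == +1 else 0.0)
           for s in range(N_STATES) for a in ACTIONS}


def cached_choice(state):
    """A single table lookup per action: cheap, but blind to a changed goal."""
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


old_goal_action = plan(1, goal=3)   # search heads right, towards state 3
new_goal_action = plan(1, goal=0)   # goal moved: search heads left immediately
stale_action = cached_choice(1)     # cache still says right until it is relearned
```

The planner pays for a fresh search on every decision but follows the goal as soon as it moves, while the cached policy costs one lookup and keeps recommending the old direction until its values are relearned.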
So what does the brain do? For over a decade it has been known that temporal difference learning, a type of model-free RL algorithm, appears to explain the activity of dopamine neurons and their dorsolateral striatal projections. It has also been observed that parts of the prefrontal cortex appear to implement some kind of model-based RL algorithm. Mammalian brains, then, appear to get the best of both worlds by having both model-free and model-based RL algorithms and choosing which to use on the fly. Pretty clever, huh?
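For readers who have not met temporal difference learning, a minimal sketch of the tabular TD(0) update may help. The quantity `delta` is the prediction error, which is the signal usually compared with dopamine activity. The state names, the learning rate `alpha`, the discount `gamma`, and the toy cue-then-reward episode are all assumptions made for illustration, not a model of any real experiment.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: nudge the cached value of `state` towards the observed target."""
    delta = reward + gamma * values[next_state] - values[state]  # prediction error
    values[state] += alpha * delta
    return delta


# Hypothetical episode: a cue is followed by a reward site, which pays 1.0.
values = {"cue": 0.0, "reward_site": 0.0, "end": 0.0}

# The very first reward is completely unexpected, so the error is large.
first_error = td_update(values, "reward_site", 1.0, "end")

# After many repetitions the reward is predicted and the error at reward time fades,
# while the cue itself acquires value.
for _ in range(200):
    td_update(values, "cue", 0.0, "reward_site")
    last_error = td_update(values, "reward_site", 1.0, "end")
```

In this toy run the error at the time of reward starts large and shrinks towards zero as the reward becomes predictable, while the value attached to the cue grows.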