It’s been a while since my journal paper on universal intelligence came out, and even longer since Hutter published the intelligence order relation on which it was based. Since then there have been a number of reactions; here I will make some comments in response.
One point of contention concerns whether efficiency should be part of the concept of intelligence. Hutter and I have taken the position that it should not, and I continue to think that this is the right way to go. As what we are debating is a definition, it’s hard to claim that one of these two possibilities is in some absolute sense “correct”. All we can argue is that one is more in line with what is typically meant when the word is used. Looking over the many definitions of intelligence that we have collected, in the vast majority the internal computational cost of the agent is not taken into account. Thus, among professional definitions the pattern is clear.
What about naive usage of the concept then? I think it’s the same. Imagine that you discovered that some friend of yours, who seemed completely normal, actually had only half a brain. Because his smaller brain made more efficient use of its resources, it wasn’t obvious from the outside that anything strange was going on, until a brain scan revealed it. Would you now say that your friend was twice as intelligent as you had previously thought? Consider a more futuristic hypothetical. It may well be the case that intelligence (in my sense) scales in a sub-linear way with respect to computational resources; indeed, the performance of many learning, modelling and prediction algorithms does. This raises the possibility that after a singularity the world could be run by a computationally vast and phenomenally smart machine which, in an efficiency sense, has significantly sub-human “intelligence”.
Why then do some people feel the need to define intelligence with respect to computational efficiency rather than purely in terms of decision making performance? The reason, I believe, is that at some level they recognise that if the definition of intelligence does not take efficiency into account, then intelligence is not the right metric for their research. And they are right! An intelligent machine will consist of some impressive hardware combined with clever algorithms that can efficiently turn the computational power of that hardware into intelligence. The job of the hardware people is to come up with more and more powerful hardware, and they are clearly doing a wonderful job of this. The job of the AI people is the second part: to come up with the most efficient way to convert computation into intelligence. If you want to build a metric for your AGI algorithm research, a measure of the efficiency of intelligence is what you really need — let the hardware people take care of the other side of things. If we both do our jobs well, the end result will be a lot of machine intelligence.
Another point that often comes up concerns whether universal intelligence is in fact too broad. For practical AGI researchers, the answer is probably yes. More specifically, if you want to produce a system with a somewhat human like intelligence, and that is optimised for the universe we live in, rather than very general semi-computable probabilistic environments, then yes, you will want a more focused kind of “intelligence” than universal intelligence. Still very broad, sure, but your target is not quite as extremely general. Why then didn’t we try to do this? Answer: one step at a time! Constructing a practical general intelligence measure for AGI is not easy, and almost certainly too big a job for a PhD research project. Thus my goal with universal intelligence was to try to capture the concept in the cleanest, most formal, and most general sense possible in the hope that this might provide some theoretical foundation for later work on practical tests of AGI intelligence. If that’s your goal, then go for it, and I hope that my theoretical work is of some use to you.
A related point to the one above concerns the sensitivity of the universal intelligence measure to the choice of the reference machine. In some situations, for example with Solomonoff induction, the choice of the reference machine doesn’t matter too much. With universal intelligence it doesn’t work out as well. The invariance theorem for Kolmogorov complexity provides some protection, but it’s not enough. The usual trick in Kolmogorov complexity is then to minimise the state-symbol complexity of the reference machine. As there exist very simple UTMs, and there aren’t many of them, this succeeds in locking things down fairly tightly. When you read criticisms of Kolmogorov complexity based work that show strange results by varying the reference machine, have a look to see whether they limit themselves to minimal reference machines. Almost always they completely ignore this constraint, because with it their criticisms would no longer work, or at least would be much weaker. I sometimes wonder why this happens, and I suspect that part of the reason might be the way in which Kolmogorov complexity is taught. What we should do is always start by squeezing as much complexity out of the reference machine as possible, in order to ensure that the measured complexity of an object is, to the greatest degree possible, a property of that object and not of our reference machine. Only then should we mention that there is this invariance theorem that is often useful for proving things. And then point out that for various asymptotic results, such as the randomness of infinite sequences, the reference machine is completely irrelevant.
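For readers who haven’t seen it, this is the result being leaned on; the statement below is the standard textbook form, not a quote from any particular paper:

```latex
% Invariance theorem (standard form). For any two universal reference
% machines U and V there is a constant c_{U,V}, independent of x, with
\[
\bigl| K_U(x) - K_V(x) \bigr| \;\le\; c_{U,V} \quad \text{for all strings } x.
\]
% The catch: c_{U,V} can be large, so for the finite environments that
% dominate a universal intelligence score, the weighting can still shift
% substantially when the reference machine changes.
```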
Anyway, in the case of universal intelligence, the reference machine in effect defines how we weight the agent’s performance in different environments when trying to compute an overall score. As such, if you want to weight the environments in a way that somehow reflects the universe we live in, you might prefer to select a reference machine that is not an absolutely minimal one. This makes some sense, and it certainly seems that some UTMs are somehow more “natural” than others. Various theory people have tried to go down this path over the years, and so far not much has come of it, at least as far as I’m aware. A word of warning then: if you want to solve this problem in a theoretically tidy way, be careful, as this is a problem that seems easier than it really is. That said, good luck, for if you do succeed such a result could be extremely useful. Failing that, one clever way to further reduce the test’s sensitivity to the choice of minimal reference machine was suggested to me by Peter Dayan. The idea is to allow the agent to maintain state between different test environments. This would mitigate any bias introduced, as intelligent agents would then be able to adapt to the test’s reference machine as different environments were randomly sampled. In other words, an agent can learn whatever bias the reference machine choice introduces into the distribution over environments and then compensate for it.
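To make the role of the reference machine concrete, here is the measure in roughly the form given in the paper (notation lightly simplified): the reference machine U enters only through the complexity term that weights each environment.

```latex
% Universal intelligence of an agent \pi: its expected performance V in each
% computable environment \mu, summed over the environment class E, with each
% environment weighted by its simplicity relative to the reference machine U.
\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K_U(\mu)} \, V_\mu^\pi
\]
% Swapping U changes K_U(\mu), and with it how much each environment counts
% towards the total score, which is exactly the sensitivity being discussed.
```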
Another thing that sometimes comes up is people worrying about the fact that the environment defines the reward. The objection mostly seems to come from cognitive science people rather than math people, so I think it’s a cultural problem. When we say “environment”, and we stick the reward generation mechanism in there, we aren’t claiming that real environments are what define an agent’s rewards: it’s just a mathematical convenience that makes it easier to mix over all the different kinds of problems and goals. You could separate them out if you wanted to, as I note in my thesis. It adds a few more terms to the equations, but because we mix over the whole space, in the end it doesn’t make much difference. Also, when we say “agent”, we are using the word in the sense that reinforcement learning people use it (see the introductory parts of the Sutton and Barto book for an explanation of this point). In non-RL speak, we really mean just the optimisation and decision-making part of a real agent. I talk a bit about this in my thesis, but I didn’t have space for these finer points in the Benelearn paper, and our target audience was mostly RL people anyway.
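As a toy illustration of this RL-style packaging (a minimal sketch with made-up names, not code from the paper or the thesis), the environment below emits the reward together with the observation, so the agent-environment loop needs only one interface:

```python
# Minimal agent-environment loop in the RL sense used above. The environment
# object emits both an observation and a reward each step; nothing forces
# real-world environments to work this way -- bundling the reward channel
# into "environment" just keeps the interaction cycle simple.
import random

class CoinGuessEnvironment:
    """Toy environment: rewards the agent for guessing a hidden coin flip."""
    def step(self, action):
        coin = random.choice([0, 1])
        reward = 1.0 if action == coin else 0.0
        observation = coin            # the agent sees the outcome afterwards
        return observation, reward

class Agent:
    """The 'agent' here is just the decision-making part: it maps an
    interaction history to the next action."""
    def __init__(self):
        self.history = []
    def act(self):
        return random.choice([0, 1])  # placeholder policy
    def observe(self, observation, reward):
        self.history.append((observation, reward))

env, agent = CoinGuessEnvironment(), Agent()
total = 0.0
for _ in range(100):
    a = agent.act()
    obs, r = env.step(a)
    agent.observe(obs, r)
    total += r
print("total reward:", total)
```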
One objection that I’ve heard a few times is that a key property of intelligent systems is their ability to choose their own goals. It certainly seems that we have this ability, while an agent, as defined in our framework, does not. Ask yourself this: how do you choose your goals? One possibility is that you do it in a deterministic way, that is, somewhere in your brain an algorithm runs that looks at all sorts of information and spits out a decision as to what your goal is going to be. For example, you might read some holy book or philosophical text, process the information therein, and then decide, using this algorithm in your brain, to base your life on following these principles. In this way you have chosen some of your goals. If you think about it, an agent in our framework can do the same thing: its goal might be to read in some information from its environment and then take this to be a function which it then tries to optimise. Both you and the agent have an underlying goal that generates and selects new sub-goals, perhaps with input from the environment, and then follows these. When you choose one goal over another, it is this choosing mechanism that is your true underlying goal. Adding randomness doesn’t make any fundamental difference to this: even if your goal is to think up a random goal and then follow it, one may characterise your underlying goal to be just that, to generate and then follow a randomly generated goal. Admittedly, your real underlying goal is almost certainly very complex and messy; even if we could fully observe your brain, it might well be next to impossible to extract a succinct description of your underlying goal. But that’s not the point: the important point here is that the framework we define is not as limiting as it might first appear to be.
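To see the point in miniature, here is a hypothetical sketch (invented names, Python used purely for illustration, nothing from the paper) of an agent whose fixed underlying goal is to read a goal description from its environment and then optimise whatever it finds:

```python
# The agent's fixed underlying goal: "read a goal specification from the
# environment, then behave as if that were the goal all along". The names
# and the goal format here are made up for illustration.
def underlying_goal(environment):
    # Step 1: acquire a goal specification from the world
    # (the analogue of reading a book and adopting its principles).
    spec = environment.read_goal_description()   # a function to maximise

    # Step 2: optimise the adopted goal over the available actions.
    best_action, best_value = None, float("-inf")
    for action in environment.available_actions():
        value = spec(action)
        if value > best_value:
            best_action, best_value = action, value
    return best_action

class ToyEnvironment:
    """Hypothetical environment that hands the agent a goal to pursue."""
    def read_goal_description(self):
        return lambda action: -abs(action - 3)    # "get as close to 3 as possible"
    def available_actions(self):
        return range(10)

print(underlying_goal(ToyEnvironment()))          # -> 3
```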
Related to the above are occasional criticisms that our approach to intelligence is wrong because it takes a behavioural stance, and behaviourism has been thoroughly debunked. These people are really missing the point. Our goal here is not to explain how human intelligence works. Or how any other intelligent system works, for that matter. We take this outside view of things because what we want is a measure that applies across many different kinds of systems with potentially radically different internals. It’s not because we’re closet behaviourists. Indeed, I work at a theoretical neuroscience institute because I think this will give me pointers on how to design an AGI. That makes me the inverse of a behaviourist and their incomprehensible-black-box view of the brain! Put it this way: if I want to measure how fast your car is, I don’t really care how it works. But if I want to understand why your car is so fast, then I’ll pop the hood.
The final thing I’d like to respond to is the objection that a definition of intelligence should be computable. I wrote a response to this based on the definition of randomness in my thesis; see the bottom of page 77 through to the end of the section on the next page. In short: any definition of randomness that isn’t incomputable would be provably flawed. Sometimes, then, it is best to define a concept in an ideal and incomputable way, and accept that our ability to measure it in practice is always going to be limited.
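For readers who don’t have the thesis to hand, here is a sketch of the usual argument for why a computable notion of randomness defeats itself; this is my paraphrase of a standard result, not the exact passage from the thesis.

```latex
% Sketch: suppose RANDOM(x) were a computable (decidable) predicate that,
% for every length n, accepts at least one string of length n. Then the
% program "given n, enumerate the strings of length n in order and output
% the first one that RANDOM accepts" describes that string x_n using only
% the bits needed to encode n plus a constant, so
\[
K(x_n) \;\le\; \log_2 n + O(1),
\]
% i.e. the supposedly random string is highly compressible. In this sense
% any computable definition of randomness is provably flawed.
```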