Tick, tock, tick, tock… BING

Am I the only one who, upon hearing the year 2010, imagines some date far off in the future? I think I felt the same way in the weeks before 2000, so I’m sure it will pass. Anyway, another year has gone, indeed another decade, and it’s time for my annual review of predictions. You can find my last annual post here.

It’s been an interesting year in which I’ve been exposed to far more neuroscience than ever before. What I’ve learnt, plus other news I’ve absorbed during the year, has helped to clarify my thinking on the future of AI. First, let’s begin with computer power. I recently gave a talk at the Gatsby Unit on the singularity in which I used the following graph showing the estimated LINPACK scores of the fastest computers over the last 50 years.


The first two points beyond 2010 are for some supercomputers that are already partly constructed. In the past, performance estimates for these kinds of machines near to their delivery have been reasonably accurate, so I’ve put these on the graph. Rather more speculative is the 2019 data point for the first ExaFLOPS machine. IBM is in discussions about how to put this machine together based on the technology used in the 20 PetaFLOPS machine due in a year and a bit. Based on articles on supercomputer sites like TOP500, it appears to be a fairly mainstream opinion that this target should be achievable. Nevertheless, 9 years is a while away so I’ve marked it in grey.

First observation: just like the people who told me in 1990 that exponential growth in supercomputer power couldn’t continue for another decade, the people who told me this in 2000 were again completely wrong. Ha ha, told you so! So let me make another prediction: for the next decade this pattern will once again roughly hold, taking us to about 10^18 FLOPS by 2020.
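As a rough sanity check on that extrapolation, here is a minimal sketch of the doubling time implied by the two forward-looking points mentioned above. It assumes that “a year and a bit” puts the 20 PetaFLOPS machine at around 2011, and that growth between the two points is a plain exponential; the function name is made up for illustration.

```python
import math

def doubling_time_years(flops_start, year_start, flops_end, year_end):
    """Doubling time implied by pure exponential growth between two points."""
    doublings = math.log2(flops_end / flops_start)
    return (year_end - year_start) / doublings

# Assumed data points: 20 PetaFLOPS around 2011 (an assumption about the
# delivery date) and 1 ExaFLOPS in 2019 (the speculative grey point).
implied = doubling_time_years(20e15, 2011, 1e18, 2019)
print(f"Implied doubling time: {implied:.2f} years")
```

Under these assumptions the implied doubling time comes out at a little under a year and a half, so the ExaFLOPS target does not require anything faster than the historical trend.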

Second observation: I’ve always been a bit sceptical of Kurzweil’s claim that computer power growth was double exponential, but having spent some time putting together the data for this graph and attempting to compensate for changes in measurement etc., I’m now thinking that there is some evidence for this. That said, I think it’s unlikely to remain double exponential much longer.

Third observation: it looks like we’re heading towards 10^20 FLOPS before 2030, even if things slow down a bit from 2020 onwards. That’s just plain nuts. Let me try to explain just how nuts: 10^20 is about the number of neurons in all human brains combined. It is also about the estimated number of grains of sand on all the beaches in the world. That’s a truly insane number of calculations in 1 second.
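The neuron comparison is an order-of-magnitude statement, and the arithmetic behind it can be sketched as follows. The two input figures are assumptions rather than numbers from this post: roughly 86 billion neurons per brain and a world population of about 6.8 billion.

```python
import math

NEURONS_PER_BRAIN = 8.6e10  # assumed: roughly 86 billion neurons per human brain
WORLD_POPULATION = 6.8e9    # assumed: approximate world population around 2010

total_neurons = NEURONS_PER_BRAIN * WORLD_POPULATION
print(f"Neurons in all human brains: about {total_neurons:.1e}")
print(f"Order of magnitude: 10^{math.floor(math.log10(total_neurons))}")
```

With these inputs the total lands at a few times 10^20, which is the sense in which “about” should be read here.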

Desktop performance is also continuing this trend. I recently saw that a PC with just two high end graphics cards is around 10^13 FLOPS of SGEMM performance. I also read a paper recently showing that less powerful versions of these cards lead to around 100x performance increases over CPU computation when learning large deep belief networks.

By the way, in case you think the brain is doing weird quantum voodoo: I had a chat with a quantum physicist here at UCL about the recent claims that there is some evidence for this. He’d gone through the papers making these claims with some interest as they touch on topics close to his area of research. His conclusion was that it’s a lot of bull as they make assumptions (not backed up with new evidence) in their analysis that essentially everybody in the field believes to be false, among other problems.

Conclusion: computer power is unlikely to be the issue anymore in terms of AGI being possible. The main question is whether we can find the right algorithms. Of course, with more computer power we have a more powerful tool with which to hunt for the right algorithms and it also allows any algorithms we find to be less efficient. Thus growth in computer power will continue to be an important factor.

Having dealt with computation, now we get to the algorithm side of things. One of the big things influencing me this year has been learning about how much we understand about how the brain works, in particular, how much we know that should be of interest to AGI designers. I won’t get into it all here, but suffice to say that just a brief outline of all this information would be a 20 page journal paper (there is currently a suggestion that I write such a paper next year with some Gatsby Unit neuroscientists, but for the time being I’ve got too many other things to attend to). At a high level what we are seeing in the brain is a fairly sensible looking AGI design. You’ve got hierarchical temporal abstraction formed for perception and action combined with more precise timing motor control, with an underlying system for reinforcement learning. The reinforcement learning system is essentially a type of temporal difference learning, though unfortunately at the moment there is evidence in favour of actor-critic, Q-learning and also Sarsa type mechanisms; this picture should clear up in the next year or so. The system contains a long list of features that you might expect to see in a sophisticated reinforcement learner, such as pseudo rewards for informative cues, inverse reward computations, uncertainty and environmental change modelling, dual model-based and model-free modes of operation, and things to monitor context; it even seems to have mechanisms that reward the development of conceptual knowledge. When I ask leading experts in the field whether we will understand reinforcement learning in the human brain within ten years, the answer I get back is “yes, in fact we already have a pretty good idea how it works and our knowledge is developing rapidly.”
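For readers who haven’t met temporal difference learning before, here is a minimal sketch of the textbook tabular TD(0) update on a toy problem. It is the standard machine learning algorithm, not a model of what the brain does, and the three-state chain, the step size and the discount factor are all made up for illustration.

```python
def td0_chain(episodes=200, alpha=0.5, gamma=0.9):
    """Tabular TD(0) on a deterministic chain: state 0 -> state 1 -> terminal.

    The only reward (+1) arrives on the final transition.
    """
    values = [0.0, 0.0, 0.0]  # values[2] is the terminal state and stays at 0
    transitions = [(0, 1, 0.0), (1, 2, 1.0)]  # (state, next_state, reward)
    for _ in range(episodes):
        for state, next_state, reward in transitions:
            # TD error: reward plus discounted next prediction, minus the
            # current prediction for this state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values

values = td0_chain()
print(values[:2])
```

In this toy chain the estimates approach 1.0 for the state just before the reward and 0.9 (the discount factor) for the state before that: the prediction of future reward propagates backwards one step at a time, driven only by the prediction error.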

The really tough nut to crack will be how the cortical system works. There is a lot of effort going into this, but based on what I’ve seen, it’s hard to say just how much real progress is being made. From the experimental neuroscience side of things we will soon have much more detailed wiring information, though this information by itself is not all that enlightening. What would be more useful is to be able to observe the cortex in action and at the moment our ability to do this is limited. Moreover, even if we could, we would still most likely have a major challenge ahead of us to try to come up with a useful conceptual understanding of what is going on. Thus I suspect that for the next 5 years, and probably longer, neuroscientists working on understanding cortex aren’t going to be of much use to AGI efforts. My guess is that sometime in the next 10 years developments in deep belief networks, temporal graphical models, liquid computation models, slow feature analysis etc. will produce sufficiently powerful hierarchical temporal generative models to essentially fill the role of cortex within an AGI. I hope to spend most of next year looking at this so in my next yearly update I should have a clearer picture of how things are progressing in this area.

Right, so my prediction for the last 10 years has been for roughly human level AGI in the year 2025 (though I also predict that sceptics will deny that it’s happened when it does!). This year I’ve tried to come up with something a bit more precise. In doing so what I’ve found is that while my mode is about 2025, my expected value is actually a bit higher at 2028. This is not because I’ve become more pessimistic during the year; rather, it’s because this time I’ve tried to quantify my beliefs more systematically and found that the probability I assign between 2030 and 2040 drags the expectation up. Perhaps more useful is my 90% credible region, which from my current belief distribution comes out at 2018 to 2036. If you’d like to see this graphically, David McFadzean put together a graph of my prediction.
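To see how a mode of 2025 and an expected value of 2028 can coexist, here is a minimal sketch using a right-skewed distribution over years. The gamma-like shape, its parameters and the 2016 to 2100 range are purely illustrative assumptions chosen so that the mode falls on 2025; this is not the actual belief distribution behind the numbers above.

```python
import math

def skewed_year_pmf(first_year=2016, last_year=2100, shape=4, scale=3.0):
    """Discretised, shifted gamma-shaped belief over years (illustrative only)."""
    weights = {}
    for year in range(first_year, last_year + 1):
        x = year - first_year
        # Gamma-like weight: rises quickly, then decays with a long right tail.
        weights[year] = x ** (shape - 1) * math.exp(-x / scale)
    total = sum(weights.values())
    return {year: w / total for year, w in weights.items()}

pmf = skewed_year_pmf()
mode = max(pmf, key=pmf.get)                      # most probable single year
mean = sum(year * p for year, p in pmf.items())   # expected value
print(f"mode = {mode}, mean = {mean:.1f}")
```

The long right tail is what does the work: the peak stays at 2025 while the probability mass in later years pulls the mean to roughly 2028.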


34 Responses to Tick, tock, tick, tock… BING

  1. Carl Shulman says:

    Do you think that most people with comparable smarts and AI/neuro knowledge would have similarly clustered predictions? It seems that the answer is no, which means that for you to have a 90% (!!!) confidence on the 2018-2036 window you need a good explanation of why others are coming to different conclusions.

    • Shane Legg says:

      I don’t know of many people who are following current research in theoretical neuroscience, are interested in how this might help with the design of an artificial general intelligence, and are thinking about singularity type things such as the exponential growth we continue to see in computer power.

      The very few people I know who fall into this category and whom I’ve asked (quite likely a selection bias here both in terms of who I ask and who bothers to study all these areas) have roughly similar estimates.

      • When you talk of “theoretical neuroscience” as something AGI-relevant, what kind of areas do you mean, specifically? Could you give the references to a couple of papers? Last year I read a fair share of various neuroscience papers, and I wasn’t left with an impression that there was anything distinctly AGI-relevant. For example, hippocampus research seemed deep enough to uncover something about the algorithms, and I’ve got at least one new idea from it, but as always, there seems to be no timeline for when things become clearer…

        • Shane Legg says:

          To give an example: many people over the years have told me that reinforcement learning has nothing to do with AGI. The fact that the brain does reinforcement learning, more specifically temporal difference learning, and that this plays an important role, is a good hint to AGI designers that this can be an important part of a viable AGI design. The fact that the brain is doing both model-based and model-free RL and switches between them, as I wrote about in a blog post a few months ago, is another good hint. Internal rewards for informative cues are another good hint. And so on.

          As one accumulates these kinds of hints from neuroscience the space in which a working design is known to exist contracts exponentially.

      • Carl Shulman says:

        Josh Tenenbaum disagrees very strongly with your estimates (especially the 90%) and has very strong AI and neuroscience expertise:

        http://web.mit.edu/cocosci/josh.html

        • Shane Legg says:

          Let’s hope that he’s right.

          From what I recall he comes at this from a cognitive science perspective, which in my opinion is unlikely to be very fruitful in terms of helping AGI development.

          • Carl Shulman says:

            I’d be curious to know your reasons for thinking that cogsci won’t be helpful. Have you had much exposure to it?

          • Shane Legg says:

            I wouldn’t say that I’ve had a lot, no, but I keep on bumping into it. A few years ago some people tried to convince me that there was good cognitive science being done and sent me a few example papers… but once again I wasn’t very impressed. It usually seems shaky and speculative to me, almost like philosophy rather than science. In my view the solid answers that are pushing us forward are coming from machine learning and neuroscience.

  2. >>there is currently a suggestion that I write
    >>such a paper next year [...] but for the time
    >>being I’ve got too many other things to
    >>attend to.

    That paper would make my day ;-) In its absence, which review articles, or up-to-date textbooks, would you recommend to someone who wants to bring himself up to speed on all this?

    An uncommented list of key terms, like “actor-critic”, would also be very helpful.

    • Shane Legg says:

      It’s difficult because the things I’m finding that are relevant to AGI design are scattered all over the place. Often just parts of papers, or specific sections of books.

      Various papers by my current supervisor Peter Dayan are interesting, in particular stuff to do with temporal difference learning and dopamine. For a more general grounding see the textbook by Sutton and Barto, and also papers by these two. The “Cortex and Mind” book by Fuster has a reasonable very high level overview of cortex. Many of the papers by people like Hinton and Bengio on deep belief networks are good. Also look at slow feature analysis and echo state computation. For an overview of neuroscience that isn’t too massive, try Instant Notes Neuroscience by Alan Longstaff. If you’re completely new to neuroscience, On Intelligence by Hawkins is a fun read, though I don’t buy his Chinese room arguments and there isn’t anything very original in his book; it’s more that the book does a good job of introducing these ideas outside neuroscience. That should at least get you started.

      I really need to start putting together some notes in preparation for writing this NS for AGI paper one day.

      • Do you include reinforcement learning as in Sutton and Hinton’s RBMs in “theoretical neuroscience”? Not what the phrase normally brings to mind.

        When I studied this stuff, I was unable to make heads or tails of “echo state computation”. Is it lucid/interesting?

        • Shane Legg says:

          When Manuel said “all this”, I took it to mean work in both theoretical neuroscience and machine learning. This was reinforced by the fact that the keyword he suggested was “actor-critic”, which is clearly a machine learning term.

          Echo state computation is nice because it’s so simple and yet manages to achieve quite a lot. It also shows how a system can compute many things at the same time in a mixed up convoluted way, and yet at some level be conceptually extremely simple in design. It seems likely to me that some of the processing in the brain might turn out to have a similar kind of flavour.

      • >> When Manuel said “all this”, I took it to mean
        >> work in both theoretical neuroscience and
        >> machine learning.

        “NS for AGI” pretty much nails it, thank you.

        And don’t hesitate to put any future notes for the paper on your blog…

  3. vv111y says:

    Hi Shane,
    Can’t find original or get into the details of this:

    http://www.insidescience.org/research/computers_faster_only_for_75_more_years

    Does it conflict with anything you say? Do you know anything about this paper, i.e. what assumptions they are making?
    Just passing it along FYI.

    • Shane Legg says:

      The limits they are talking about are 75 years away, so, no, this doesn’t affect the computer performance I’m talking about which is just 10-20 years away.

      • vv111y says:

        I’m curious about the validity of this claim. It sounds like they have a solid number, but I wonder if there are any iffy assumptions that they made.

        • Roko says:

          75 years more of Moore’s law would mean 10^33 or so FLOPS in a supercomputer of size, power consumption and cost roughly what we have today – which is actually pretty small compared to what is possible (going all-out and spending the entire R+D/defense budget of a major superpower in 2085 on a supercomputer might add 4 orders of magnitude in FLOPS).

          10^33 FLOPS is just really hard to think about well. There are about 10^26 or so atoms in your brain – this 2085 supercomputer would be able to do 10,000,000 calculations per second for each atom(!!!!!!!!!!!!!!!!!!!) in your brain.

        • Shane Legg says:

          I haven’t gone through the details as I didn’t really care much for the result: the problem is that what is permitted by the laws of physics, and what we can actually manage to build, might be very very different.

  4. samk says:

    If the brain could compile some optimization problems into protein structures which it solves by (nonsimulated) annealing, you might need a few more orders of magnitude of compute power. Does this seem any less ridiculous than the brain being a quantum computer?

    • Shane Legg says:

      As far as I know, and I’m not a quantum physicist by any stretch, everything we know about maintaining quantum coherence says that this isn’t going to work in a warm liquid environment like the body. This is where all the current theory and evidence lies.

      Is there any solid evidence, experimental or theoretical, to the contrary? As far as I know, no.

  5. Tim Tyler says:

    If you look at my own estimate (dated Oct 2008), we appear to agree to the year:

    http://alife.co.uk/essays/how_long_before_superintelligence/

    See the probability density function at the bottom of the page.

  6. Pingback: the Foresight Institute » Is the brain a reasonable AGI design?

  7. I tried to define a list of steps that could be useful to speed up our way to AGI.

    Here it is: http://bit.ly/4yH0kG

    AGI list members are currently debating how AGI development could be funded. They do not agree on many aspects, and are involved in endless flame wars over the differences between each researcher’s approach (wasting way too much time in these fights).

    I am convinced that a distributed agent based approach to AGI could be funded by social network companies if some effective social game byproducts could give real value and generate interesting feedback.

    What do you think?

    Marco ( @mgua on twitter )

  8. Shane Legg says:

    @Marco

    Few AGI researchers spend time on the internet arguing – they are too busy doing things. Most that I know have almost no internet presence, and don’t mention AGI in their research papers either, for fear of being labelled a crank in academic circles where they work. So keep in mind that people on things like the AGI list (which I don’t read by the way) are a very particular subset of the area.

    There is a large gap between the level of sketch you have provided and a convincing design for an AGI. I think there are still some serious mountains to be climbed in the technical details of how some key parts of such a system would work.

  9. Aron says:

    samk’s point is of interest to me. Any good links along those lines?

    Our brain is doing all this on 30 watts. Are we gonna get there with process shrinkage? Something is fundamentally amiss about the hardware path we are going down with regard to the variety of computation likely needed for AGI.

  10. Shane Legg says:

    @Aron

    Energy (and heat) scale quadratically with frequency in a CPU. Thus if you cut frequency by a factor of 10, and have 100 times as many cores, then you can get much better performance on fine grained parallel problems without increasing energy consumption. This is the direction that GPUs are going in. It also explains the low energy consumption of the brain: neurons fire at hundreds of Hz, vs billions of Hz in a CPU. Thus in terms of parallelism and energy consumption, the move towards GPU style computing is heading in the direction of more brain style computation. Of course the degree to which a GPU does this is still many orders of magnitude away from the brain.

  11. Pingback: Progress with AI « dw2

  12. Pingback: Video: The case for Artificial General Intelligence « dw2

  13. Pingback: Chapter finished: A journey with technology « dw2

  14. gwern says:

    > So let me make another prediction: for the next decade this pattern will once again roughly hold, taking us to about 10^18 FLOPS by 2020.

    I’ve logged this prediction at http://predictionbook.com/predictions/1752

  15. Tim Tyler says:

    The prediction graph link in the post seems to have got broken. I had made a backup – and I have now uploaded it here:

    http://alife.co.uk/essays/how_long_before_superintelligence/

  16. Pingback: Optimism as Artificial Intelligence Pioneers Reunite | Hugo Penedones blog
