Tick, tock, tick, tock…

I recently read about IBM’s Sequoia supercomputer, which will be operational in 2011.  It will perform 20 petaFLOPS and have 1.6 petabytes of RAM.  To put that in perspective: if it were to attempt to simulate a human cerebral cortex, it would be able to allocate 50 bytes of RAM and 700 calculations per second to every synapse in the model.  Unless the human brain is doing something pretty weird, the quest to build a computer with comparable raw processing power is almost over.
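
As a rough check on those figures, the sketch below divides Sequoia's quoted totals by the per-synapse budgets to see what synapse count each one implies. It assumes decimal prefixes (1 peta = 10^15); the variable names are illustrative only and do not come from any IBM specification.

```python
# Back-of-envelope check of the per-synapse budget quoted above.
# Assumption: "peta" means 10**15 (decimal prefix).
PETA = 10**15

total_flops = 20 * PETA            # 20 petaFLOPS
total_ram_bytes = 16 * PETA // 10  # 1.6 petabytes

bytes_per_synapse = 50             # RAM budget per synapse, from the post
flops_per_synapse = 700            # calculations per second per synapse, from the post

# Number of synapses each resource could serve at the quoted budget.
synapses_by_ram = total_ram_bytes / bytes_per_synapse
synapses_by_flops = total_flops / flops_per_synapse

print(f"Synapses supported by RAM:   {synapses_by_ram:.2e}")
print(f"Synapses supported by FLOPS: {synapses_by_flops:.2e}")
```

Both divisions land near 3 × 10^13, so the two per-synapse figures are at least consistent with a single assumed synapse count for the cortex model.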

As I do at the start of each year, I’ve spent some time reconsidering when I think roughly human level AGI will exist.  I’ve again decided to leave it at 2025, but now with a reduced standard deviation of 5 years.  Computer power is a limitation as researchers typically have limited hardware budgets, unlike the DOD guys and their monster supercomputers.  From what I’ve read, computer power should continue to grow exponentially for at least the next 5 years, and probably the next 10.  So I don’t see this as being too much of an issue in the coming decade.  On the algorithm side, I think things are progressing really well.  I know a number of very talented people who are working on what I think are the key building blocks required before the construction of a basic AGI can begin.  I’m certain these problems are solvable, but whether it takes 2 years or 10 years is hard to guess.  This is my main source of uncertainty.

UPDATE 11 April 2009: Note that these predictions do not take into account my apparent bias towards predicting that things will happen faster than they actually do (see previous post). The required compensation for technology events appears to be about 50% more time. Thus if you want the “Shane meta predictor”, then take 2033 as the expected date, perhaps with a standard deviation of 7 years. At least with financial markets I trust my meta predictor more than my straight predictions and thus I buy and sell according to it. So I suppose that if I had to put money on a date, I should go with 2033. But don’t ask me why it’s going to take that long: I really don’t know.
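
The adjustment in the update can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `meta_prediction` is made up for illustration, and scaling the standard deviation by the same 50% is an assumption about how the "perhaps 7 years" figure was reached, not something the update states.

```python
def meta_prediction(made_in, predicted_year, std_years, time_factor=1.5):
    """Stretch the remaining time (and, by assumption, its spread) by a bias factor."""
    years_remaining = predicted_year - made_in
    adjusted_year = made_in + time_factor * years_remaining
    adjusted_std = time_factor * std_years  # assumption: spread scales the same way
    return adjusted_year, adjusted_std


# Prediction made in 2009: AGI around 2025, standard deviation 5 years.
year, std = meta_prediction(made_in=2009, predicted_year=2025, std_years=5)
print(year, std)
```

With 16 years remaining from 2009, 50% more time gives 24 years and an expected date of 2033; the scaled spread of 7.5 years is close to the rounder figure of 7 quoted above.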


28 Responses to Tick, tock, tick, tock…

  1. What do you think the key building blocks of AGI are? Even a list without explanations would be interesting.

    For me, it’s currently implicit (compressed) representation of huge Bayesian networks by “local” contexts, expressive enough to model arbitrary computations and some range of network structures, in particular capable of representing its own execution process. Too vague, but 10 or 20 years could change that. I don’t particularly expect the solution to come faster, but it could.

    I agree, hardware is not the issue at this point, or won’t be in the near future.

  2. Carl Lumma says:

    So much obsession with human intelligence. We must have passed the hardware threshold for squirrel intelligence already. Whither ASI?

  3. Justin says:

    I think you are too optimistic there.

    (1) Another major problem with building AGI is software engineering. Building something like Google is a piece of cake compared to that.

    (2) Counting synapses and comparing the count with bytes is not enough. The amount of actual memory needed will not be clear before implementation actually starts and has progressed. Bookkeeping is huge. Without giving this a serious second thought, I argue that the amount of memory needed to simulate a brain scales superlinearly with the number of synapses in it. (The more RAM you have, the more bytes you need for pointers, etc.)

    (3) Scale this thing. Concurrency is not understood very well, new programming paradigms have to emerge and to be mastered. It took object orientation 25 years to be considered seriously by companies and scientists.

    Thus I think that the major issues with AGI are engineering problems and training data. Also, I don’t believe that the choice of learning algorithm will make a huge difference. As soon as the engineering problems can be mastered, the computer to simulate a human brain will have been around a long time.

  4. Shane Legg says:

    Vladimir

    The short answer is: large-scale self-organising hierarchical temporal networks, similar to what we see in cortex. There are various ways to approach this, steps along the way that need to be taken, as well as a number of supporting algorithms. I believe that this is the key to recognisably brain-like intelligence. UK visa permitting, I’ll be working full time on this shortly with a small group of researchers who have similar interests.

    By the way, I haven’t forgotten your question about safety vs. speed of AGI development. ;-)

  5. Shane Legg says:

    Carl

    Human intelligence is a natural milestone given that we are human. Also, once a machine this powerful has been constructed by humans, it will be at a level where it is able to understand its own design. That has deep implications and it’s something that a squirrel can’t do.

    Secondly, squirrel brains and human brains are very similar. If we manage to build an ASI using a brain-like approach, then we’re already 95% of the way to human level AGI. In terms of hardware, I’m sure we passed ASI level some years ago. It’s the algorithm that’s missing.

  6. Shane Legg says:

    Justin

    1) It depends on how you think the AGI will be designed. If it depends on hand coding a large number of modules that each implement complex processes then yes, it will be a big software engineering challenge. But that’s not the view I take. I expect it will be more like the Pybrain project, but with really big networks running on really big distributed systems. With a few good software engineers who have worked on distributed supercomputers before, I don’t think the software will be more than a few years’ work. Most of the code, I expect, will be in various supporting and testing systems. The (really) hard part of the project lies in figuring out the topology and dynamics of these networks. But that’s not software engineering.

    2) These are all order-of-magnitude estimates. Three 64-bit pointers require 24 bytes of RAM, and they each index spaces much larger than the number of synapses in the human brain (about 2^47). That leaves 26 bytes of storage for the state of the synapse; for example, six single-precision floats and two bytes for any additional information. That’s enough for a fairly realistic biological model, and I’d be surprised if the fundamental information processing occurring at a synapse required more than this. And if you’re still not convinced, just wait another 3 years until supercomputers exist with 10 times as much RAM.

    3) I believe in using brain-like network designs that are already inherently distributed.
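
The 50-byte budget in point 2 can be checked by laying the fields out as a packed record. The sketch below uses Python's standard `struct` module; the field order and the name `SYNAPSE_FORMAT` are hypothetical, chosen only to match the counts given in the comment (three 64-bit indices, six single-precision floats, two spare bytes).

```python
import struct

# Hypothetical per-synapse record. "<" selects little-endian, standard sizes, no padding.
#   3Q = three unsigned 64-bit indices (24 bytes)
#   6f = six single-precision floats  (24 bytes)
#   2B = two spare bytes              ( 2 bytes)
SYNAPSE_FORMAT = "<3Q6f2B"

record_size = struct.calcsize(SYNAPSE_FORMAT)
print(record_size)

# Pack one example record to confirm the layout is usable as written.
example = struct.pack(SYNAPSE_FORMAT, 1, 2, 3, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 7, 9)
assert len(example) == record_size

# A 64-bit index comfortably covers the roughly 2**47 synapses mentioned above.
assert 2**47 < 2**64
```

Note that a compiler-aligned C struct with the same fields would typically be padded beyond 50 bytes, so the figure holds only for a packed layout like this one.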

  7. That sounds like what I’m developing in the background, a line of thought that started from a pattern-inference intuition that is written up on my blog in the last sequence starting here, and moves on through the question of how to implement that on a locally connected 2D matrix (“inference surface”) of active elements.

    The problem is that I can’t quite figure out the semantics of what’s going on in such a network. I think I finally know what kind of dynamics will produce impressive results, and so hesitate to discuss the details (returning to the question of math destruction), although experience tells me to expect that it’s a bust. I’m currently studying inference in graphical models; maybe it’ll give the necessary insight.

  8. David says:

    “large-scale self-organising hierarchical temporal networks, similar to what we see in cortex”

    That sounds a lot like Jeff Hawkins’ work at Numenta. Vladimir, I believe they have a publicly available release for playing around with their HTM technology.

  9. Shane Legg says:

    David

    Yes, Hawkins influenced me quite a bit a few years ago. I think he was on the right track in his book, but I think he’s taken a wrong turn since then. Besides thinking that he’s now on the wrong track, I also wouldn’t touch his code with a barge pole due to the scary license it comes with.

  10. Justin says:

    Shane,

    There is a lot more bookkeeping involved. For example, you will have to save an ordering of the synapses in order to walk them in the correct order. You will want to distribute this onto multiple nodes in order to speed up computation, you will want to implement the operations efficiently, you will need special cases for different nodes, etc.

    As you know, I have been spending time on porting the core functionality of PyBrain to C++. Often there is suddenly the need for a data structure that you have not thought about beforehand, and your memory requirements go up.

    This is why I am saying that AGI is not only a question of theory, but also a pretty big question of engineering.

  11. R. Jones says:

    Artificial Intelligences Exist Today

    Scientists have formulated at least 100 definitions of intelligence (for a partial listing see the vetta definitions-of-intelligence). Several computer programs exist which qualify as intelligent according to at least the vast majority of these definitions (for instance my Asa, Trans. of the Kansas Academy of Sci., vol. 109, # 3/4, pg. 159, 2006, http://www.bioone.org/archive/0022-8443/109/3/pdf/i0022-8443-109-3-159.pdf). Not all creatures are equally intelligent, so there is no need for an AI to be as intelligent as the average human. In point of fact, however, AIs exist which outperform humans at a number of important tasks.

  12. jfromm says:

    You are right, the clock is ticking. Computing power will soon be sufficient to approach human level AGI, especially if we consider large distributed networks of computers. But raw computing power alone is not enough; the whole thing must be embedded in and connected to an environment, too. And we must find the right learning algorithms to let it grow.

  13. Shane Legg says:

    Justin

    I’ve worked on neural network libraries for companies before that had to be distributed across machines. Sure it’s not something that you’d code in a month; it takes a team of people a few years to sort out. But that’s peanuts compared to a couple of the bigger projects I’ve worked on that had a million plus lines of code. As far as coding projects go, even fancy neural networks aren’t all that big.

    Regarding memory consumption: you want an extra pointer? How many can a single link in the network need? Surely no more than 10 indices! If we stay with 3 floats per link, then having 10 pointers would require a supercomputer with about twice as much memory, or roughly another 18 months of hardware development.

    Once we are in about the right order of magnitude, a factor of 2 or so in the hardware requirements doesn’t make much difference.
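
The "about twice as much memory" estimate follows from simple byte counting. The sketch below assumes 64-bit indices and single-precision floats, as in the earlier reply, and treats the 18-month figure as a doubling time for supercomputer memory; that doubling time is the comment's own rule of thumb, not a measured value.

```python
import math

POINTER_BYTES = 8  # 64-bit index
FLOAT_BYTES = 4    # single-precision float

baseline_bytes = 50                                 # per-synapse budget from the post
heavy_bytes = 10 * POINTER_BYTES + 3 * FLOAT_BYTES  # 10 indices + 3 floats per link

memory_ratio = heavy_bytes / baseline_bytes

DOUBLING_MONTHS = 18  # assumption: memory capacity doubles every 18 months
extra_months = DOUBLING_MONTHS * math.log2(memory_ratio)

print(f"{heavy_bytes} bytes per link, {memory_ratio:.2f}x the baseline")
print(f"Roughly {extra_months:.0f} more months of hardware growth")
```

At 92 bytes per link the requirement is a little under double the 50-byte baseline, so the wait comes out slightly short of a full 18-month doubling.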

  14. Shane Legg says:

    R. Jones

    I went to have a look at that paper you link to; it sounds interesting. Unfortunately, I get a 404 page not found error.

  15. Carl Lumma says:

    Shane: Yes, my point exactly. Hence, I was implying we should 1. not concern ourselves so much with the speed of the fastest supercomputers and 2. focus on ASI for now, i.e. learn to walk before trying to run.

    That said, if we want to indulge, Modha’s group is doing 1/10-realtime mouse cortex on Blue Gene (as of 2007). Modha estimates mouse cortex at 45 TFLOPS and human cortex at 350 PFLOPS, using a highly simplified model. Markram’s more nuts-and-bolts approach wants 1 PFLOP for mouse. Finally, we should keep in mind that we can expect no more than 1/2 the rated performance of supercomputers in a real application, and often much less than that.
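
To put the figures in this comment side by side, the sketch below compares Sequoia's rated 20 petaFLOPS (from the post) against Modha's estimates, after applying the "no more than 1/2 the rated performance" rule. The 18-month doubling time is borrowed from the memory discussion earlier in the thread and is purely an assumption here; nothing in the comment says compute grows at that rate.

```python
import math

rated_pflops = 20.0      # Sequoia's rated performance, from the post
usable_fraction = 0.5    # at most half the rated performance in a real application
usable_pflops = rated_pflops * usable_fraction

human_cortex_pflops = 350.0  # Modha's estimate for human cortex
mouse_cortex_pflops = 0.045  # Modha's estimate for mouse cortex (45 TFLOPS)

# How far short of the human estimate, and how far past the mouse estimate.
human_shortfall = human_cortex_pflops / usable_pflops
mouse_headroom = usable_pflops / mouse_cortex_pflops

DOUBLING_YEARS = 1.5  # assumption: performance doubles every 18 months
years_to_close = DOUBLING_YEARS * math.log2(human_shortfall)

print(f"Human cortex shortfall: {human_shortfall:.0f}x")
print(f"Mouse cortex headroom:  {mouse_headroom:.0f}x")
print(f"Years to close the gap: {years_to_close:.1f}")
```

Under these assumptions the machine is a factor of 35 short of Modha's human-cortex figure, which is several doublings away rather than "almost over".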

  16. Vladimir Golovin says:

    Shane, what do you think went wrong with Hawkins’ approach after the book?

    (I have no information at all about his progress after the book, and would sure like to know more. Does he have a blog?)

  17. Carl Lumma says:

    Vladimir: I won’t answer for Shane, but Hawkins started a full-blown corporation to commercialize the technology:
    http://www.numenta.com

    They released an updated version of their visual recognizer demo earlier this week, in fact.

    I went to their conference last year and played with their SDK a bit, and it turns out to be just a toolkit of standard machine learning stuff. If there was anything novel in Hawkins’ book, it didn’t make it into the final product.

    One of the things Hawkins insisted on in his book is a prediction-based approach. Unfortunately the Numenta toolkit doesn’t do prediction yet. However, they said they’re working on it, and it will be interesting to revisit their stuff when that’s done. -Carl

  18. Shane Legg says:

    Vladimir

    My thoughts are in line with Carl’s.

    Carl

    For a direct simulation approach we’ll need a lot of resources. I think that’s partly because we don’t really know what we’re doing. If we manage to work out the fundamental algorithm we might be able to get away with a fraction of the resources. For example, simulated synapses and neurons can be made far more reliable than real ones. Parts of the algorithm might be able to be done far more accurately and efficiently with just a few numerical computations, rather than using a neural network structure. Of course until we have a working system this is just speculation…

  19. Mark says:

    I think the pursuit of human intelligence in computing is a bit of a dead end. I think long before we have the requisite technology and software there will be developments in AI that move towards a natural computing intelligence that doesn’t function in the same way as human intelligence. Once these developments hit and prove themselves to be useful (and money spinners) all the important research will happen in these fields – ie specialized intelligences designed to do the jobs humans don’t have the intelligence capacity for.
    It’s rather like the development of robotics – human bipedal robots do very little work in the world and remain as entertaining curiosities because this design is limited by the human form. But there are hundreds of thousands of robots in the world that don’t make any attempt to mimic the human form, because it simply isn’t the best design for the job.
    Once we have these intelligent specialized AIs they will quickly evolve into intelligences that are nothing like human but are necessarily more intelligent than humans and will make the search for human AI seem a little irrelevant.

  20. Shane Legg says:

    Mark:

    “…all the important research will happen in these fields – ie specialized intelligences designed to do the jobs humans don’t have the intelligence capacity for.”

    In a sense that’s already true. Computers are vastly super human at numerical calculations, remembering data, playing checkers, and many other things. I see it as a frontier that is slowly getting pushed back.

    What limits further increasing the economic value of computers isn’t so much what they already do well, but rather what they can’t yet do. Things that you need a human for.

    Anyway, it’s not that I’m wanting to build an artificial human. The reason that I’m interested in how the brain works is that this is a real working system that solves many of these problems that computers can’t yet solve. I want to learn some of the brain’s techniques and tricks. Some say that the brain is too complex to ever understand in this way. I think they are wrong.

  21. Nathan Helm-Burger says:

    Hi Shane,
    I’ve been reading your blogs and am impressed both with the quality of your writing and the quality of the comments you’ve been getting. I’d like to ask your opinion on some new technology.
    First, my background is in neuroscience. I’m fascinated with AGI, but have no expertise whatsoever in it. I’ve been following Markram’s and Modha’s work, and was very excited by Hawkins’ book when it came out.
    Recently I read Ray Kurzweil’s “The Singularity is Near” and found his arguments pretty persuasive.
    The new technology that has me excited, for the neuroscience field at least, is Kwabena Boahen’s brain-on-a-chip. It is much more programmable, more powerful, and more scalable than previous attempts. Enough so that I think it has cross-over potential for AGI research. I haven’t found much flap about this on the internet in regards to AGI, so I’m wondering if I’m right in thinking this.

    My rationale is that there is a sort of sliding scale of elegance, efficiency, and unpredictability versus brute-force and steady progress in AGI research. I’ll try to lay it out here:

    Most elegant, and unpredictable – some very clever and well implemented software cuts straight to the chase and enables AGI with a minimum of brute computing power for machine learning.

    Medium on both — increased computer power enables scaling of Hierarchical Temporal Memory or some similar machine learning algorithm with only a moderate amount of cleverness in the software to emulate & surpass human intelligence.

    Slow, inelegant, brute-force computing combined with slow painstaking reverse engineering of the brain enables emulation of the human brain on future supercomputers. (e.g. Markram’s Blue Brain project)

    So, the reason this new brain-on-a-chip is game changing (in my inexpert worldview) is that it enables the slow, inelegant-but-likely-to-succeed method to happen without significant further improvement in either technology or software. It’s simply a question of funding and scaling this technology up enough. (My best estimate would be a few tens of millions of dollars) And if technology does improve (as it likely will), then the funding will be quite attainable in just a few years (a couple million or less). In ten years from now, if this project pans out, a common desktop computer attached to one of these chips could emulate a super-intelligent human.
    My upper limit on the likely time-frame of the singularity was based on the fast method (hare) not working out and the slow method (tortoise) setting the pace. Now it appears to me that the slow method’s timeframe has been reduced from 25-30 years from now (my personal estimate of Markram’s project turning into a feasible super-human emulator) to a mere 10 years or less.
    This is both exciting and rather shocking. Also, if BrainGate’s technology (now at BrainGate 2) continues to improve, ten years should be sufficient for a safe, effective high-bandwidth brain-computer interface which could be linked to Boahen’s chip (or FACETS’ similar, but less impressive, one), allowing for brain upgrades. So there wouldn’t be any need for software at all, just programming the brain-chip to learn from the brain-interface like normal neurons do…

    Okay, I’ve gone on too long. If you’ve gotten this far, thanks for reading.
    -Nathan

    • Nathan Helm-Burger says:

      Oops, I meant to give a link…. I like Boahen’s talk on youtube about his chip. He’s got more specific info on his website.
      http://www.youtube.com/watch?v=mC7Q-ix_0Po

    • Shane Legg says:

      Yes, the brain on a chip stuff is pretty cool. I read about it from time to time. One of the main things about it that has impressed me is the power efficiency compared to using a more general purpose massively parallel supercomputer. I’m not sure how significant these chips are going to be. The great thing about a more normal computer is that it is so much more flexible: you can try not just a different set of connections, but all sorts of weird and wonderful ideas. Thus my prediction is that normal computers will dominate for the time being, even in computational neuroscience.

      Regarding Markram’s work. I’m not sure how valuable this will turn out to be. On one hand he’s made perhaps the most detailed cortical model so far. On the other hand, is it detailed enough? There are various things it misses and they could prove to be decisive in some subtle way. Or maybe not. Also, even if his model is correct, we still may not really understand what the underlying algorithm is doing.

      My current bet is on something half way along your scale. Not a detailed brain simulation, but rather a partly brain inspired general design with key insights or even components of the system coming from algorithms in machine learning.

  22. Jake Cannell says:

    Hello Shane, I just found and am enjoying your blog. I currently work in graphics but find AI is getting really fascinating. I attended the Singularity Summit in ’08 and quite liked it – sounds like it was even bigger and better this year.

    Anyway, perhaps from reading The Singularity is Near, or about the Blue Brain Project, I’ve thought that the computational power for full brain emulation is still a ways off, and that it would happen a decade or so out due to Moore’s Law, not a new hardware breakthrough. Looks like the latter is actually far more likely. To echo Nathan’s point, more recently I’ve been reading up on progress in neuromorphic hardware designs and it’s pretty amazing – this is probably going to reach brain level hardware capability soon – if it was scaled up to the high tech, high density fab processes it could happen today (the hardware at least; the exact wiring schematics, the chip layout, are still a reverse engineering effort). The FACETS project’s prototype chip and wafer-scale integration system is especially interesting (turn the entire wafer into a usable circuit, no need to cut it up into chips, defects are ok!). What’s even more incredible is that since the synapses are emulated directly with just a handful of transistors (instead of digitally simulating with hundreds of thousands and many clock cycles), they can run the hardware at many, many times real-time – 1,000 to 10,000 times – while still using low power. The implications of that are rather mind boggling. By the time full scale models are built and the brain’s wiring architecture is understood and emulated, these neuromorphic systems will be able to think thousands of times faster than humans.

    It’s possible that some more traditional software based AGI approach could reach the goal first, but the brain reverse engineering and emulation approach looks like it’s getting the fast track. And more importantly, from day 1 it will have an unbelievable speed advantage.

    • Shane Legg says:

      Hi Jake, I would bet against these technologies, at least for the next 5 years. The problem is market volume: conventional CPUs and GPUs have a lot of it while new alternative technologies don’t. By the time some alternative technology gets ready for market, the more conventional stuff is already 10x more powerful than it was before, and developers are used to working with it etc.

  23. Jake Cannell says:

    That is very true in the near term at least. But looking out a little farther to human level AI, the problem is the massive training time. If you believe the most likely route to AGI is going to be cortically inspired, some form of ‘self-organising hierarchical temporal networks’ (and there are good reasons to believe the brain is a well optimized physical solution to the agent problem – it is after all the result of a massively long optimization) then the key determinant is a large memory capacity and the ability to access and compute across all of it at a high rate. This entire class of parallel network inference algorithms has a memory+weight update operation at its core.

    If the brain is fairly well optimized, then you need to get a network close to the brain’s connection capacity for brain-like intelligence, and to run it in real time you probably have to be able to simultaneously update all or most of the nodes at, say, 100 Hz, perhaps 1000 Hz. (Conceivably there are algorithmic improvements that improve locality and get some gains, but I doubt they would be orders of magnitude.) Upcoming fast GPUs (such as Nvidia’s Fermi) are just about there, with 1 GB of memory each and 100 GB/s of bandwidth. So if a synapse roughly equals a byte, you only need a million GPUs – very expensive indeed, but perhaps not unrealistic in a decade. Without moving to a custom ASIC, you could climb above 1000 Hz by running the whole thing from cache, but at astronomical cost. On the opposite side of the spectrum, you could use CPUs hooked up to flash SSDs, and build a network of the same capacity for orders of magnitude less. But it would run at about 1/100th of real time. The bandwidth/memory ratio isn’t changing much across this spectrum – Moore’s law is only reducing the total cost.

    Now here’s the rub: running at 100-1000 Hz, it takes the biological brain a couple of years just to get to primitive speech, and 5-8 years to develop to the point where it can use a small vocabulary and read and so on – and that’s with supervised learning (parents). So the speed of the simulation is of crucial importance, and it will dominate large scale simulations.

    The problem is that a full test cycle of an AGI requires a full developmental training cycle – from infant to adult! That’s so incredibly expensive that moving to more custom hardware to speed it up far beyond real-time will be important. And it will be important long before you get to a full human model. It’s almost like writing code that takes 30 years to compile. There are many pitfalls during development – most routes usually leading to low intelligence or worse.

    You could say that your particular AGI model will reach human level with fewer training cycles, but that’s essentially a bet, an expensive one, and with some evidence against it (nature has already run an astronomically large number of full tweak & test cycles).
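
The bandwidth and training-time arithmetic in this comment can be laid out as a short calculation. The sketch assumes decimal gigabytes, one byte per synapse, and a 30-year developmental cycle (taken from the "30 years to compile" remark); the helper `wall_clock_years` and the three speed settings are illustrative labels, not anything specified in the comment beyond the 1/100th and 1000x figures mentioned in the thread.

```python
GB = 10**9  # assumption: decimal gigabytes

# Per-GPU figures quoted in the comment (Fermi-class card).
gpu_memory_bytes = 1 * GB
gpu_bandwidth_bytes_per_s = 100 * GB

# Touching every byte of memory once per update bounds the update rate.
updates_per_second = gpu_bandwidth_bytes_per_s / gpu_memory_bytes

# A million such GPUs at one byte per synapse.
gpu_count = 10**6
synapse_capacity = gpu_count * gpu_memory_bytes


def wall_clock_years(developmental_years, speed_vs_real_time):
    """Wall-clock time for one developmental cycle at a given simulation speed."""
    return developmental_years / speed_vs_real_time


print(f"Update rate per GPU: {updates_per_second:.0f} Hz")
print(f"Synapse capacity:    {synapse_capacity:.0e}")
for label, speed in [("flash-backed CPUs", 0.01), ("real time", 1.0), ("custom hardware", 1000.0)]:
    years = wall_clock_years(30, speed)
    print(f"{label:>18}: {years:,.2f} years ({years * 365.25:,.0f} days)")
```

The spread is the point of the argument: the same 30-year cycle takes millennia at 1/100th of real time and under two weeks at 1000x.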

  24. Kevembuangga says:

    “But looking out a little farther to human level AI, the problem is the massive training time.”

    Wow, for once I am (almost) entirely in agreement with a singularitarian.
    No matter the route it will take, brain simulation, machine learning, quantum computing, whatever, Artificial General Intelligence will take ages to develop up to and beyond human intelligence.
    Intelligence isn’t magic, it doesn’t sprout out of thin air once you’ve guessed the magic spell, it takes collecting knowledge and the more elaborated the knowledge the more expensive it is to learn (computing wise).
    This is the main reason I don’t “believe” in the Singularity and I am not afraid of “Big Bad AI”, I think those scares are just wishful thinking and nightmares of a handful of paranoid monkeys.

  25. Jake Cannell says:

    Kevembuangga:

    There seems to be a general trend in AGI towards more cortically inspired designs (also going hand in hand with all the advances in computational neuroscience). Actually, I think a full virtual rat is an immediately achievable milestone (we have the computational power already). If a team could produce that, and more importantly, get a simulacrum of a rat body that looks like a rat and does rat-like things in a detailed simulation (some of these bio-AI teams should really team up with a full game simulation engine & team), it could really prove the cortical algorithm’s scalability, and then things could move *rapidly* after that – attracting funding to scale quickly to cat/dog like brains which then can be built into custom hardware for robotics. Right now there isn’t enough money in AGI for it to attract billions of dollars, but a couple of key advances like that could change that almost overnight.

    That would be a rapid, sudden breakthrough – and once that is achieved, you’re very close to an infant posthuman brain. That by itself is not scary or profound. But once the cortical model is developed, you can pour money into translating it to custom hybrid digital/analog hardware, and you can run it at 1000x real-time. That *is* profound, and that will quickly bring about a Singularity (a time of near-instantaneous change). Smart human minds that can think 1000x faster than us are not the traditional ‘Big Bad AI’, but they will change the world overnight (literally) and are potentially extremely dangerous.

    Making them ‘friendly’ will be very important, but that won’t involve programming (it’s not a technical challenge); it will involve careful social planning, parenting, education, and a controlled knowledge environment.
