Yudkowsky has been posting a lot on Overcoming Bias recently about his theory of metaethics. Today he posted a summary of sorts. Essentially he seems to be saying that morality is a big complex function computed by our brain that doesn’t derive from any single unifying principle. Rather, this function is a mishmash of things and even we don’t really know what our own function is, in the sense that we are unable to write down an exact and complete formulation. It’s just something that we intuitively use.
I’m not convinced that ethics can’t be derived from some deeper unifying principle. I’m also not convinced that it can, lest you misunderstand me. What I do accept is that if this is possible then finding such a principle and convincingly arguing for it is likely to be difficult in the extreme, and probably not something that is likely to happen before the singularity. Nevertheless, I haven’t yet seen any argument so devastating to this possibility that I’m willing to move it from being extremely difficult to certainly impossible. Any system of ethics that does derive from some unifying metaethical principle is almost certainly going to be different to our present (western?) ethical notions. I think some degree of this is acceptable, given that our ethical ideas do change a bit over time. Furthermore, no matter how human we try to make the ethical system of a powerful AGI, post-singularity we are still going to be faced with ethical challenges that our pre-singularity ethics were never set up to deal with. Thus, our ethics are going to have to be modified and updated in order to remain somewhat consistent and viable, otherwise we’ll end up with this kind of nonsense.
Anyway, let’s assume that this unifying principle either does not exist, or at least can’t be found. How can we tell if an AGI is ethical given that we can’t explicitly and completely specify what this means? This seems like the problem Turing faced when trying to determine whether a machine is intelligent or not. He figured that he couldn’t explicitly and completely say what intelligence is, unlike the research by Hutter and myself, and thus he tried to dodge the issue in the obvious way by setting up an imitation game that doesn’t require an explicit description of intelligence.
Here we can do something similar: set up a group of people and the AGI and ask them ethical questions from a panel of expert judges. If the judges cannot tell which the machine is, then it passes. Given that the morality function varies between people, and that we can’t say explicitly and completely what our own function is, this seems to be about the best we could hope for. Naturally, this doesn’t prove that the AGI, or indeed any of the humans participating, are “good”. An evil genius could probably pass such a test. Rather, it is simply designed to test whether the AGI is at least able to compute a version of the human morality function which is sufficiently similar to ours that it is able to pass as being human. Whether the AGI (or human) actually takes its human-passable morality function and reliably and consistently seeks to follow it into the future is a whole other set of problems. Thus, passing such a test is perhaps a necessary, but certainly not a sufficient condition for having an ethical AGI.
I’m sure somebody must have proposed this idea before, but at least my half-hearted attempt to find the idea on Google didn’t turn up anything. I should also point out that in order for this test to work you’d probably want the AGI to pass a more general Turing test first so that it doesn’t get singled out by the judges for various other reasons. Only then should you bring in a group of expert ethicists to try to judge which of the test subjects is ethically inhuman. We would also want to include among the test subjects a few very nice people and a couple of professional ethicists, as we wouldn’t want the AGI to be able to “fail” for being too nice or too consistently ethical.
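To make the pass criterion a little more concrete, here is a minimal sketch of how the judges' verdicts might be scored. It assumes each judge names exactly one subject as the machine, and it treats "cannot tell" as "the judges' hit rate is no better than chance plus some tolerance". The function names, the subject labels and the `margin` parameter are all hypothetical choices made for illustration; nothing in the proposal fixes them.

```python
from fractions import Fraction

def judge_hit_rate(guesses, machine_id):
    """Fraction of judges who correctly picked out the machine."""
    if not guesses:
        raise ValueError("need at least one judge")
    hits = sum(1 for g in guesses if g == machine_id)
    return Fraction(hits, len(guesses))

def passes_test(guesses, machine_id, num_subjects, margin=Fraction(1, 10)):
    """Pass if the judges do no better than chance plus a tolerance.

    The tolerance `margin` is an assumed parameter, not part of the proposal.
    """
    chance = Fraction(1, num_subjects)  # hit rate of a judge guessing at random
    return judge_hit_rate(guesses, machine_id) <= chance + margin

# Hypothetical example: 5 subjects (4 humans and the AGI, labelled "E"), 10 judges.
guesses = ["A", "E", "C", "B", "E", "D", "A", "C", "B", "D"]
print(judge_hit_rate(guesses, "E"))   # 1/5
print(passes_test(guesses, "E", 5))   # True
```

With only a handful of judges a result like this says very little either way, so a real version would need many judges and a proper statistical test rather than a fixed margin.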
I wanted to ask a question which is, perhaps marginally, related to your post. It is about the singularity. When I try to follow the logic of the proponents of the singularity I imagine the following.
a) Ever accelerating technological progress culminates in a singular process/event that completely changes the world as we know it. These singular changes propagate in all directions at a speed close to the speed of light c, because I do not see any reason for this event to be local.
b) I also assume that the singularity is a natural consequence of an evolutionary process capable of producing something reasonably intelligent (e.g. human-level intelligence).
c) I also assume that such an event (and hence such an evolutionary process) has not occurred within our past light cone (i.e. at a distance R from us more than R/c ago).
Finally, I find it almost impossible to simultaneously accept a), b) and c). This is a sort of contradiction.
Question: do you see a contradiction or some flawed reasoning here?
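As a rough illustration of the light cone condition in assumption c): light needs a time R/c to cover a distance R, so with distances in light years and times in years the check reduces to a simple comparison. The sketch below assumes flat, non-expanding space, which is a simplification, and the function name is hypothetical.

```python
def in_past_light_cone(distance_ly, years_ago):
    """True if a signal travelling at c from the event could have reached us by now.

    Light covers one light year per year, so R/c measured in years is
    numerically equal to R measured in light years. Cosmic expansion is
    ignored here (an assumption made to keep the sketch simple).
    """
    return years_ago >= distance_ly

# An event 100 million light years away:
print(in_past_light_cone(100e6, 50e6))    # False: its light has not arrived yet
print(in_past_light_cone(100e6, 200e6))   # True: we could already have seen it
```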
How do you know that everything would expand out at the speed of light post singularity? And even if it did, the universe is a very big place. We might be the only intelligent life in the Milky Way and nearby galaxies, i.e. within a 100 million light year radius. Also, the Milky Way is a pretty old galaxy. Perhaps intelligent life is evolving in some other galaxies not too far away, but either haven’t gotten to a singularity yet or only just got there a few million years ago. A post singularity society might even go into hiding in order to try to avoid encountering other post singularity societies that have different goals. Who really knows?
There are too many unknowns here to have a contradiction in my opinion.
You are right: it is not a contradiction in a strict sense.
I just pointed to certain aspects that, imho, will need to be explained to some extent, if one wants to be taken seriously by the scientific community. It seems odd to me that many singularity experts seem to take for granted the inevitability of the singularity. Please correct me if I am wrong.
> How do you know that everything would expand out at the speed of light post singularity?
Of course, I do not know. I am simply equating progress with exploration. It is strange to think that a technologically advanced society with the means to travel to other stars would not do so.
> And even if it did, the universe is a very big place. We might be the only intelligent life in the Milky Way and nearby galaxies, i.e. within a 100 million light year radius.
It is a possibility that we are the first. But just keep in mind some numbers:
number of stars in a typical galaxy – 10^10
number of galaxies in the observable universe – 10^11
age of the universe – 10^10 years
Remember also that scientists are seriously looking for signs of (at least primitive) life on other planets. The idea that our Earth is somehow unique is very uncommon.
> Also, the Milky Way is a pretty old galaxy
But the Sun is not.
> Perhaps intelligent life is evolving in some other galaxies not too far away, but either haven’t gotten to a singularity yet or only just got there a few million years ago
From the numbers that I gave one can conclude that there is something very unnatural about evolution. I would even suggest that a person with an understanding of probability theory (and given the current scientific knowledge about the origin of life) should conclude that the singularity is a very unlikely outcome of evolution.
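The arithmetic behind this argument can be sketched as follows. The star and galaxy counts are the figures quoted above; the idea that each star independently has some fixed probability of producing a singularity is a toy model assumed here for illustration, and the names are hypothetical. It also ignores timing and light cone effects entirely.

```python
from fractions import Fraction

STARS_PER_GALAXY = 10**10   # figure quoted in the comment
GALAXIES = 10**11           # figure quoted in the comment

total_stars = STARS_PER_GALAXY * GALAXIES

def expected_singularities(p_per_star):
    """Expected count if every star independently has probability p_per_star.

    This independence model is an assumption made for illustration only.
    """
    return p_per_star * total_stars

# Largest per-star probability that keeps the expected count at or below one.
p_threshold = Fraction(1, total_stars)

print(total_stars == 10**21)                              # True
print(expected_singularities(Fraction(1, 10**15)))        # 1000000
print(expected_singularities(p_threshold))                # 1
```

On this toy model, seeing no evidence of any other singularity is only unsurprising if the per-star probability is of order 10^-21 or smaller, which is the sense in which the outcome would have to be "very unlikely".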
> A post singularity society might even go into hiding in order to try to avoid encountering other post singularity societies that have different goals
All post singularity societies hide from all other hiding post singularity societies?
> It seems odd to me that many singularity experts seem to take for granted the inevitability of the singularity.
Personally, I think it’s likely, but certainly not inevitable.
There really are so many unknowns in this area. If faster than light travel were possible and not too vastly difficult, then I’d expect to see aliens here already, assuming they exist and want us to know about their existence. One possibility is that even post singularity societies can only do space travel at a small fraction of the speed of light due to the vast amount of energy required. Or maybe evolution is a very rare event, even if Earth-like planets aren’t so rare. Or maybe 99% of societies go extinct as soon as a singularity occurs…
I do agree with you somewhat though: it seems a little strange to me that it’s so quiet out there in space.
> Here we can do something similar: set up a group of people and the AGI and ask them ethical questions from a panel of expert judges. If the judges cannot tell which the machine is, then it passes.
I’m not sure how this avoids the obvious problem of changing moralities. What if the people in the Middle Ages somehow managed to build an AGI and then reprogrammed it when it produced wrong answers to the question “should all infidels be forcibly converted to the faith, or failing that, killed”? Unless you require the AGI to lie and simply come up with answers it thinks we’d want to hear…
Trying to extrapolate what our ethics will be in the future is a much harder (impossible?) problem that is beyond what this test tries to achieve. Other approaches will be needed for such things, if indeed these other problems can be solved at all.
If you were one of the judges in my test, and one of the test subjects said that executing anybody who didn’t convert to their faith was ethical, would this raise or lower your expectation that the individual was a machine?
Lying is not a problem for the test: if an agent can consistently give human-like answers to ethical questions then it must actually be able to compute this function. That’s all that the test is checking for.
Ah, right. You’re saying this test would be used, not for a finished AI that’s supposed to extrapolate the “final” target morality, but for a prototype stage that’ll simply identify what we currently consider a good morality? That makes more sense.
As for whether that answer would raise or lower my expectation that the individual was a machine… well, it would depend on what I thought the composition of the test subjects was. If I thought the group included people of all kinds, including religious fanatics, probably the only things that would make me suspect the answerer was a machine would be ones that somehow struck me as, well, bizarre or inhuman. (This highlights one possible problem with your proposed test – if the criterion is simply “determine whether the individual’s answers are those of a human or not”, then any human morality would have a chance of passing, even the most extreme and hateful ones.)
It’s not really a “problem” with the test, in the sense that its aim is quite specific. Let me try to explain.
The point of this test is not to try to check whether an AGI is completely ethical or safe. That’s (seemingly) a really big hard problem. The purpose of this test is much more limited: it’s simply trying to check whether an AGI can compute a morality function that can pass as being human.
If an AGI fails this test, then it clearly should not be considered safe, I’m sure you’ll agree. If an AGI passes this test, then we can at least rule out many really bizarre failure scenarios that could arise with an AGI that couldn’t compute a human passable morality function. With this kind of problem crossed off, we can then move on to trying to deal with all the other potential problems, of which there are plenty.
So in reply to your last point: yes, all sorts of human moral ideas would get through this test. The idea is that the space of all human passable moral functions is already a tiny subset of the space of all moral functions. For example, no sane human would answer that turning all 5 year old girls into paper clips is an ethical thing to do.
Now if you want to establish that the AGI is capable of computing a “nice” morality function, rather than merely a human one, then that opens up lots of additional issues, such as deciding which kind of human morality is desirable and which isn’t… but at least we’re now working in a much smaller space.
On the subject of post-singularity entities keeping a low profile:
This imitation test seems like a subset of the Turing test – since the judges there can swing the topic around to ethics if they so choose.
Yes, it’s a kind of Turing test for a specific human ability.