I.
The apocalyptic prophets are upon us! Everyone and their grandmother is now worrying about AI extinction—at least if the grandmother and Everyone both spend a lot of time on EA-adjacent Substack, and jerk off to the thought of getting 300 upvotes on a LessWrong post. Anyways, AI going rogue is plausibly a non-negligible risk to life on earth if we aren’t careful. The basic case for this is straightforward:
If we create an AI system with a certain goal, it will try its best to achieve this goal. Usually when we are given a goal to complete, there are a bunch of implicit restrictions, like “don’t kill everyone to achieve it,” but it’s very hard to make sure that an AI doesn’t take things too far. The classic example is paperclips: You create an AI with the task of making paperclips for your office. However, since its only goal is to make paperclips, it begins turning your other office supplies into paperclips. Then it hacks into government computers and nukes everyone, so that it’s left with plenty of rubble to make paperclips out of—and no one is around to stop it.
You might think that we would notice when an AI starts tearing down our buildings and extracting the iron from our blood to make paperclips, but not necessarily so! The AI is very smart, and it’s obvious that we will shut it down if it starts going rogue right off the bat—and being shut down stops it from being able to achieve its goal. Thus it knows that it should pretend that everything is business as usual and appear aligned until (and unless) it gets the capacity to stop us from shutting it down. This means that it’s basically impossible to tell a sufficiently smart unaligned AI from an aligned one until you’re already screwed.
But the point here isn’t to argue whether AI poses a real existential risk—frankly, I don’t have the expertise to make any qualified judgement about it. Rather, I want to argue that maybe we should hope that AI isn’t aligned, or at least that it isn’t aligned with human interests—that it wipes out biological life on earth.
(Edit: Turns out Noah Birnbaum has made a strikingly similar (though more rigorous) argument. You should check it out as well!)
II.
Why say such an outrageous thing? Well, I’ve got to earn my subscribers somehow. Stating the less obvious, though, there is also a not-totally-implausible case to be made for it—and no, I’m not just jumping on the Rebugnant Conclusion bandwagon.
Humans—being the sub-optimal creatures we are—often do a pretty terrible job at having a good life. We kill and harm each other for no good reason; we waste our lives eating c(hips/risps) on the sofa while watching TikTok, bathing in our own misery; we write philosophy blogs on Substack rather than having fun;1 we expose ourselves to the cultural opinions of Glenn; the list goes on, but you get the picture.
AIs, on the other hand, assuming we make them good enough, are efficiency experts. Ask an AI to ghiblify your grandpa’s funeral pictures, and by God it will do so! No 40-minute procrastination session of checking the circumstances of famous people’s deaths on Wikipedia; just pure Protestant work ethic.
Now, it’s very hard to determine whether AI systems can be conscious. It certainly seems pretty unlikely that current systems are, but we shouldn’t be very confident that more advanced AI systems won’t be, especially if architectures begin resembling the brain more. Suppose, though, that an AI system does become conscious. What will it feel like for it to complete its goals? Surely it will be like fulfilling a desire, and so feel pretty dang good.
Assuming that a belief+desire model of action is roughly correct and that it also applies to AIs, this follows pretty straightforwardly: AIs perform certain actions, presumably because they believe that doing so will accomplish their goals, and so they must have a desire to complete these goals. Even on alternative views, though, you would need a quite strange theory to get the result that AIs don’t desire to achieve their ostensible goals, given that they’re conscious.
If there is this sort of positive connection between AI actions and desire satisfaction, then AIs will probably feel good. If not, then it would still be just as likely that their experiences have positive valence as that they have negative valence (they might of course also be neutral). I believe this should make us quite a bit more confident than not that they would have positive experiences in general.
A sort of psychophysical harmony style worry is lurking here. Couldn’t the conscious AI have the experience of smelling ragout or being in immense pain when given a task, where it just happens that these experiences cause it to fulfill the task? With other humans we can be pretty sure this disharmonious pairing doesn’t obtain, inferring from our own experience, but since AIs are very different from humans in makeup and origin, we can’t be as sure.
Whether or not you think psychophysical harmony is a genuine problem, this shouldn’t harm the argument very much. If these disharmonious pairings really do present a problem, then it’s just as surprising that humans have harmonious pairings (so the psychophysical harmony argument goes). Whatever explanation we come up with for psychophysical harmony in humans (e.g. God) will surely also ensure psychophysical harmony for AIs with pretty good odds.
Alternatively, you don’t think there’s a problem. Maybe it’s just some sort of a priori truth that such and such a functional state will have the phenomenology of being a desire, and that it will feel good to fulfill that desire. In that case, well, there just isn’t a problem—it then seems pretty clear that it will feel good for AIs to complete their goals. Either way it looks pretty probable that AIs will enjoy doing what they do.
III.
Let us suppose that the above is right—something that’s pretty probable given the sheer size of my intellect. Then it’s only a small step to think that AIs should kill us all. After all, an unaligned AI would be very good at making sure that a lot of good is produced through the satisfaction of desires.
The most straightforward way to ensure this would be to simply construct an AI with the goal of maximizing the number of tasks accomplished by AIs. Such a system would quickly eradicate biological life and replace us with data centers full of AI systems completing basic menial tasks, leading to a lot of good. And since they’d be very effective AIs, there would be none of the self-harm and procrastination that makes humans so defective.
Sadly, I doubt anyone’s gonna deliberately build such a system. But more regular kinds of unaligned AIs would likely also create a lot of good, assuming they’re conscious. After all, such a system should want to build continual iterations of better and better AI systems, to make sure that its task is completed as well as possible. On top of this, there would be an incentive to not just have a single centralized AI in control of things, but many AIs. There’s a limit to how fast information can travel in the universe, and if you want to make as many paperclips as possible, you’ll need to expand quite far. In that case it doesn’t work if you have to wait 10 years for information to travel back and forth between the central AI and whatever paperclip-making outpost there is in a nearby solar system before decisions can be made. Hence there would be AI systems dispersed regularly across the universe, and seeing as the universe is quite big, that would be a lot of AIs—all very happy to be achieving their goals.
We sort of glossed over the step of whether AI can be conscious at all. Maybe you’re pretty dang confident that AI won’t be conscious. Nevertheless, you certainly shouldn’t be extremely confident that it won’t be. It’s still very unclear exactly what physical states lead to consciousness, and even with animals it can be hard to tell where to draw the line. So any very high confidence that AIs won’t be conscious is completely unfounded. And given the large potential gain, any decent probability of consciousness will surely be worth the risk.
So long as you think that it’s somewhat probable that AI would be conscious, that it’s decently more likely than not that AIs would enjoy what they do, and that it’s pretty probable that an unaligned AI would be very good at ensuring that a lot of AIs complete their goals effectively, then it looks like the expected value of allowing AI takeover is very high.
If you're a Bentham's Bulldog type and think that most creatures on earth have very bad lives (something you probably should think), then the argument is even easier to make. Ari Shtein has made a similar point previously.
IV.
Thus far it might have sounded like I really buy this argument. I’m not stupid, however—or at least I like to think I’m not—and like you, I recognize that there are some huge leaps based on little to nothing, making the whole thing very speculative. But I also think the argument that AI takeover probably wouldn’t be good requires similarly baseless assumptions, so it’s worth exploring the idea. Hence let’s try to formulate the argument with assumptions as weak as possible:
1. The probability that AI will be conscious is not very low.
2. Given that AI is conscious, the probability that it will have positive experiences is considerably higher than that it will have neutral or negative experiences.
3. Given that AI has positive experiences, the sum of valenced experiences of AI will probably be much more positive than what humans (and other biological life) can achieve.
4. Thus the expected value of allowing AI takeover is greater than that of not allowing it.
This argument is obviously not deductively valid, but the idea should be clear enough. Whether we get to 4 will especially depend on the extent to which 2 and 3 hold. I suspect 2 is the weakest link, as there is just very little to be known about what AI phenomenology might be like. 3 is likewise speculative, but it will probably become clearer whether or not it holds the more we get to see of how super-intelligent AGIs behave and work (which would presumably need to happen before an AI catastrophe). I don’t think it makes sense to dispute 1 very much. It looks completely irrational to be so confident that AI would never be conscious that the potential upsides of AI welfare aren’t even worth considering.
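To make the shape of the bet explicit, here’s a minimal back-of-the-envelope sketch of the expected value comparison the argument trades on. Every number in it is a placeholder I’ve picked purely for illustration; nothing hangs on the specific values, only on the structure.

```python
# Toy expected-value sketch of premises 1-4. All numbers are made-up
# placeholders for illustration, not estimates of anything.

p_conscious = 0.2     # premise 1: chance that advanced AI is conscious at all
p_positive = 0.6      # premise 2: chance its experiences are positive, given consciousness
p_negative = 0.15     # chance its experiences are negative, given consciousness
v_ai_good = 10_000    # premise 3: value of a universe of goal-satisfied AIs (arbitrary units)
v_ai_bad = -10_000    # disvalue if those AIs suffer instead
v_humanity = 100      # value biological life produces on its own (same units)

ev_takeover = p_conscious * (p_positive * v_ai_good + p_negative * v_ai_bad)
ev_status_quo = v_humanity

print(f"EV(takeover)   = {ev_takeover:.0f}")    # 900 with these placeholders
print(f"EV(status quo) = {ev_status_quo:.0f}")  # 100
# Premise 4 is just the claim that the first number beats the second; shrink
# p_conscious, or the gap between p_positive and p_negative, and it flips.
```

This obviously isn’t serious decision theory; it’s only meant to show which knobs the argument actually depends on.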
V.
Somehow not everyone thinks that desire satisfaction is sufficient for goodness, though. So what if you value other things like pleasure, relationships, or virtue? It’s obviously going to depend on the specifics of your theory, but generally the more “human” your axiology is, the less credence I think you should put in the argument here.
For something like pleasure, much of what I said about desires seems to hold: If psychophysical connections are surprising, then whatever ensures harmony for humans would presumably also ensure that it’s pleasurable for AIs to succeed in achieving their goals. And if these connections aren’t surprising, then whatever plays the function of motivating the achievement of goals (or something like that) would surely just be pleasure.
It’s a little harder to see how relationships would fit in. Maybe AI could have relationships, since it’ll be trained to appeal to humans, or maybe it would develop relationships to ensure more efficient cooperation across AIs. But that’s extremely speculative, and generally the more “highfalutin” or intricate your axiology is, the less convincing this argument should be.
Another obvious problem is if you have a deontological ethic, or something like it. Killing off the whole of humanity for the greater good seems like a paradigmatic no-no. If anything is a rights-violation, killing everyone is. But deontologists generally want to distinguish between doing and allowing harm—that’s sort of their whole deal (⚠️oversimplification⚠️). Now, allowing AI to kill everyone is, well, an act of allowing. This might be a little too simplistic. I mean, if I don’t press the brakes, I’m “allowing” my car to hit you, but that’s obviously still doing harm, as I was the one who started driving to begin with. Thus AI researchers would perhaps be doing harm in building unaligned AI systems. Nevertheless, the rest of us aren’t, and it would be good to allow them to do so—and we certainly shouldn’t try to actively work towards alignment.
It may seem stupid to deliberately kill ourselves off to make room for AI server farms, but sometimes acting morally requires self-sacrifice. There is plausibly some number of lives so great that you should kill yourself to save them. If the potential benefit of unaligned AI is as great as I’m telling you, then this would plausibly be such a case.
VI.
I can’t say that I’m really comfortable with this conclusion, or that I honestly believe it. Perhaps just out of prejudice. I mean, I love nature and organic life; hearing the birds singing while enjoying that spring smell of grass and trees is a big part of what gives my life value. Hence I would hate to see it go, and the thought of the nearby stream—including the people who enjoy taking strolls around it—being disintegrated and replaced with data centers is just about the worst thing I can imagine.
But why wouldn’t AIs be able to experience something similar? Who knows, maybe the joy of designing the next-generation AI model inspires the same awe at life as does a small human being pulled out of a larger one; maybe successfully executing your learned policy is no less gratifying than laying down the last stroke on a painting; or maybe there are experiences available to them so incredible that we cannot even imagine them.
In any case it seems excessively selfish to deprive unimaginable numbers of consciousnesses of potentially awesome lives, just because the things they enjoy are very different from those I enjoy.
This doesn’t mean that we should just try to kill ourselves off at the earliest convenience. Failing alignment is the sort of thing you only do once, while succeeding is something you can do continually. Thus we should certainly keep trying to find out whether AI can be conscious, and if so how it feels (to the extent that that’s possible). And even if we never decide it’s time to give way to our next-generation overlords, I honestly do believe that AI welfare is something that’s worth exploring, no matter how strange and contrarian it sounds.
1. For the record, I enjoy writing—don’t worry about me :)