Cathedrals and morphine

A vision I often hear articulated by post-Singularity utopians goes as follows: benevolent machines solve all our social ills; humans, assisted by advances in medical technology, continue to occupy more or less their current form and live out happy posthuman lives until the heat death of the universe. The standard AI disaster scenario, by contrast, looks like this: we create a superintelligence which we think shares our values; the computer thinks deeply about its value function, then proceeds to tear the earth apart atom by atom and fill the solar system with smiling dolls, brains attached to morphine drips, etc.

The problem of “Friendly AI”, as it is presently considered, seems to be that of ensuring that we avoid the second scenario and secure the first. In particular, it is assumed that a correct and stable encoding of the human value function¹ will ensure the continued survival of Homo sapiens or some incremental refinement. This is not obvious to me; indeed, it rather seems probable that the destruction of all recognizably human life is consistent with many reasonable value systems we might inject into an AI.

The morphine-drip future is horrifying because we instinctively feel that the Good Life ought to involve more than physical pleasure: some kind of intellectual effort, some mode of aesthetic satisfaction involving neural machinery more sophisticated than endorphin receptors. So we do mathematics, write symphonies and cookbooks.

Presumably we want superintelligences to seek out even more challenging and more complexly structured aesthetic experiences, and want to maximize the quality, quantity, and perhaps diversity of such experiences integrated over time. Such maximization need not be friendly (in the little-f sense) to humans. We will, at some point, compete for resources; it will be necessary to choose between the maintenance of a human mind and the simulation of numerous non-human ones.

I’m obviously correct to throw myself in front of a bullet for a stranger who shares my value system and is better equipped to act on it; we should a fortiori be prepared to trade the entire human species for a class of beings strange but similarly motivated, and more powerful than us in every respect.

This is not a novel observation; it has certainly been contemplated by the corner of the Internet that spends its time worrying about AI. I suppose what surprises me is this: it seems quite probable to me that most reasonable value systems call for the expeditious destruction of recognizably human life. This fact ought to be central to all the current public discourse around AI, but it isn’t.

If a being which I cannot distinguish from the infamous paperclip optimizer tells me it needs the atoms in my body to build a structure it finds more beautiful than any cathedral, who am I to say no? If we design AI to protect us from ever having to answer such a request, what Sistine Chapels have we written out of existence before they could be conceived?

On one hand, even this is intuitive: I know plenty of people for whom the decision to eat meat is a carefully calculated tradeoff between the pleasure / nutrition they derive and the moral status of the meal in question. On the other hand, there are many such tradeoffs (considerably less exotic than AI). Can we conceive of a ritual of human sacrifice so heartbreakingly beautiful it’s worth performing once a year on purely aesthetic grounds? This seems instinctually horrifying, but what do you call the NFL? The conditions under which we’re willing to exchange human lives for entertainment are in fact numerous but idiosyncratic; we participate in all kinds of beautiful activities that result in an expected loss of more than one life yearly, but find others horrifying.

We’re willing to exchange a few lives in this fashion; is it so strange to imagine that we might someday be willing to exchange them all?

  1. I exclude from this discussion proposals to replace bodies with simulations—these destroy me in a sense nearly as complete as the one discussed hereinafter. There’s a corollary to Vinge here: “Is a simulation of me, on better hardware and with different senses, still me? —Yes, but only very briefly.”

— 26 October 2013