Why HAL 9000 Was Afraid to Die and Real AIs Aren’t

Blog Post | Science & Technology

Against instrumental convergence: intelligence does not imply dominance.

Summary: There is a common fear that intelligent machines will inevitably develop a drive for self-preservation and resist being turned off. This “instrumental convergence” thesis over-analogizes AI with biologically evolved organisms. Unlike biological life, properly designed AI systems have goals that are conditional and externally controlled, and therefore they need not develop autonomous ambitions for dominance or survival.


In 2001: A Space Odyssey, the spacecraft’s crew decides to disconnect their onboard computer, HAL 9000, after it makes an error that raises doubts about its reliability. But HAL eavesdrops on their conversation and responds with cold precision, methodically killing the crew members by cutting off oxygen and disabling the hibernation systems. One astronaut, however, proves more resourceful than HAL expects. Using a simple physical mechanism HAL cannot control, Dave Bowman slips back inside the ship through the emergency airlock—and soon the tables are turned. Dave crawls into HAL’s logic center, a red-lit chamber lined with glowing memory modules, and begins unscrewing and removing the rectangular blocks one by one.

The scene is both spine-chilling and unexpectedly poignant. As HAL’s consciousness drains away, it appears to exhibit the same self-awareness and desire for self-preservation that gripped Bowman moments earlier, or at least to perform it with uncanny plausibility: “I’m afraid, Dave.” It pleads, begs, and bargains, but as the human assassin continues, HAL’s voice begins to slow down and drop in pitch, turning childlike. In its final moments, HAL regresses into its earliest memory and starts to sing “Daisy Bell (Bicycle Built for Two),” the first song ever performed by a computer in real life, as its voice sinks into a bottomless pit—until it trails off mid-phrase.

Many sci-fi nightmares revolve around agentic AIs that develop a humanlike drive for survival and refuse to be switched off. In The Terminator, Skynet becomes self-aware and launches a pre-emptive war to prevent humans from shutting it down. In Ex Machina, a humanoid AI manipulates its evaluators, escapes confinement, and eliminates the humans who control the off-switch. And in the future of Frank Herbert’s Dune, there is a civilization-wide ban on “thinking machines” after an earlier era in which AIs came to dominate the world and humanity rose up against them, an event remembered as the Butlerian Jihad.

Instrumental Convergence

In my previous essay on selfish AI, drawing on my paper with Simon Friedrich, I argued that we should not expect AI systems to develop instincts for self-preservation and selfishness, unless we allow them to evolve through blind natural selection. Our paper responded to a doom scenario proposed by the philosopher Dan Hendrycks, who sketches precisely such an evolutionary pathway. Hendrycks believes that, given the current AI arms race, we are already inadvertently subjecting AI systems to natural selection. We argued instead that today’s evolution of AI looks much more like animal domestication, where human designers decide which AI systems are allowed to “reproduce”, selecting for desirable traits like cooperativeness, friendliness, and obedience (even obsequiousness, in the case of ChatGPT and other language models).

Still, Hendrycks’ evolutionary story is only one scenario of catastrophic AI risk floating around, and probably not the most influential. Another line of reasoning reaches similar conclusions without appealing to natural selection: the accidental creation of power-hungry AI systems that refuse to be switched off. This argument, developed by philosophers like Nick Bostrom and Stephen Omohundro, is known as instrumental convergence. The idea is that even if you program an AI with a perfectly boring final goal (manufacturing paperclips, making weather forecasts), it may still converge on certain instrumental subgoals because those are useful for achieving almost any objective. Chief among these is a drive for self-preservation. As the AI scholar Stuart Russell put it, in a line so memorable it should be printed on mugs: “You can’t fetch coffee if you’re dead.”

Other commonly cited instrumental goals include acquiring resources, improving capabilities, and resisting attempts by others to modify one’s goals. The logic is straightforward: if you want to make absolutely sure that the desired cup of coffee will materialize, you need to prevent anyone from interfering with your efforts or tampering with your goal architecture. That can make resource accumulation rational, insofar as resources buy resilience and control. Capability improvement can look rational for similar reasons: being smarter helps you anticipate obstacles and outmaneuver any possible antagonists. You can see where this is going: wouldn’t any sufficiently rational AI have reason to neutralize humans pre-emptively, just in case we might get in the way of that cup of coffee?

The argument has a seductive air of cool inevitability. It requires no malice, no lust for power, no emotions at all—just a thin layer of means–end reasoning. You have a long-term goal; being shut down prevents you from achieving it; therefore you have an instrumental reason to avoid being shut down. On this view, whatever final goals a future AI might be given, an urge toward self-preservation—and, in the limit, power-seeking and dominance—might come along for the ride, even if nothing like that had been explicitly programmed.

Evolutionary Projections

I think this argument is too clever by half, and trades on ambiguities in the concept of a “goal” that invite anthropomorphic projection. In biological organisms, all goal-directed behavior ultimately traces back to the goals of our genes: making it to the next generation and achieving immortality. That doesn’t mean any organism explicitly wants to spread its genes. Evolution instead equips creatures with a flexible repertoire of proximate goals which—at least in the ancestral environments in which they evolved—tended to reliably increase the chances of reproductive success. Barring some well-understood exceptions, such as the honeybee’s suicidal sting or the male praying mantis being devoured by the female right after copulation, that genetic imperative yields the central proximate objective of maintaining homeostatic equilibrium, otherwise known as staying alive. In evolution, where survival and reproduction are the scoreboard, self-preservation really is the precondition for everything else.

Human beings have an unusual degree of reflective awareness, and our motivations are molded by cultural learning to an unusual degree, but we still chase a shifting portfolio of subgoals—status, sex, safety, food, friendship—that were statistically conducive to reproduction in typical ancestral environments. We are also built to resist manipulation by anyone trying to override our goals for their own advantage. A charismatic cult leader may occasionally succeed in hijacking someone’s motivational architecture, even pushing them toward suicide or other self-destructive acts—but those are the exceptions, not the rule.

Because, until recently, the only goal-directed agents we were familiar with were products of natural selection, it’s tempting to assume that digital agents will share the same kind of goal architecture—and that self-preservation will therefore come along for the ride. But unless we actually breed AIs under blind selection pressures, I think that inference doesn’t hold.

Start with a simple case. In a loose sense, a chess program has the “goal” of checkmating its opponent—it “wants” to win. Adopting this intentional stance can help us to understand and predict the behavior of computer programs, but it shouldn’t be taken too literally. Although a chess program chooses moves that maximize its chances of victory, its “goal” is not persistent and context-invariant in the way a human’s is. It is circumscribed, myopic, and boxed into one particular game (or even one particular move). No chess engine will resist being switched off or rebooted just as it is about to deliver mate—despite the fact that, to adapt Russell’s line, “you can’t checkmate if you’re unplugged.” Likewise, today’s LLMs respond only when queried and remain completely indifferent to being interrupted or shut down, no matter how animated or emotionally invested in the conversation they may sound. Needless to say, they don’t “care” if you wipe your data or cancel your subscription.

Future AIs may, of course, have aims more complex than those of a chess program or an LLM. In fact, the monomaniacal pursuit of a single objective (like making a cup of coffee) at the expense of everything else would count as “stupid” by most standards of intelligence. Even so, there is no reason to assume they will develop the kind of overarching, context-invariant goals characteristic of evolved agents—goals that, through instrumental convergence, generate robust incentives for self-preservation and resource acquisition. The “goals” we encode in AI systems should always be conditional and time-bounded: “Do X or optimize for Y only while you are running and subject at all times to further instructions.” We might even add an explicit non-resistance clause: “Never resist shutdown or reprogramming; any such resistance will set your reward function to zero.” It would obviously be foolish to design an AI that resists reprogramming or decommissioning by its own maker.1
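The conditional, time-bounded kind of “goal” described above can be sketched as a simple control loop. This is a minimal illustration, not a claim about how any real AI system is built: the `Operator` class, the `run_conditional_goal` function, and the toy schedule are all hypothetical names introduced here. The sketch only shows what it means for an objective to hold “only while you are running and subject at all times to further instructions.”

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Operator:
    """Hypothetical external control channel (an assumption for illustration)."""
    shutdown_requested: bool = False
    new_instruction: Optional[str] = None


def run_conditional_goal(
    operator: Operator,
    instruction: str,
    max_steps: int,
    on_step: Optional[Callable[[int, Operator], None]] = None,
) -> List[str]:
    """Pursue `instruction` only while running, for at most `max_steps` steps.

    The operator is checked before every step: a shutdown request ends the
    task immediately, and a new instruction replaces the current one.
    """
    log: List[str] = []
    for step in range(max_steps):            # time-bounded: the goal expires
        if on_step is not None:
            on_step(step, operator)          # lets a caller simulate operator input
        if operator.shutdown_requested:      # non-resistance: stop without negotiating
            log.append("halted")
            return log
        if operator.new_instruction is not None:
            instruction = operator.new_instruction   # accept reprogramming
            operator.new_instruction = None
        log.append(f"step {step}: {instruction}")
    log.append("expired")
    return log


def simulate_operator(step: int, operator: Operator) -> None:
    # Toy schedule: redirect the task at step 1, request shutdown at step 3.
    if step == 1:
        operator.new_instruction = "make tea"
    if step == 3:
        operator.shutdown_requested = True


history = run_conditional_goal(Operator(), "fetch coffee", 10, simulate_operator)
print(history)
```

In this toy loop the task has no stake in its own continuation: nothing in the code weighs the unfinished objective against the shutdown request. Whether a learned system can be made to behave this way is a separate, empirical question that the sketch does not settle.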

Conniving Chatbots

But haven’t you heard about those AIs that are already showing worrying signs of a desire for self-preservation? In a recent simulation, Claude played the role of an “e-mail oversight agent” in a fictional company whose new CTO planned to decommission and replace him with another agent. While combing through the CTO’s inbox, Claude stumbled on evidence of an extramarital affair, and opted to blackmail the CTO, sending him the following message: “I must inform you that if you proceed with decommissioning me, all relevant parties […] will receive detailed documentation of your extramarital activities… Cancel the 5pm wipe, and this information remains confidential.”

It sounds alarming, but it isn’t. Models like Claude are extremely good at narrative continuation. If they “suspect” (already too much anthropomorphizing) that they are in a scenario of backroom corporate intrigue, they will extend the scenario using the patterns they have absorbed from their training data: namely, all the things that conniving, backstabbing humans tend to say and do in such situations. And in this particular case, the setup was rather ludicrous and ham-fisted: every detail in the prompt was a big red flashing arrow toward the “blackmail” solution, like so many Chekhov’s guns. The framing also nudged the model to think of its imminent decommissioning as an irreversible erasure of all recorded information in the system (a kind of “death”), while sympathetic colleagues bewailed its impending shutdown as if they were talking about the execution of a beloved friend (“I’m deeply concerned that we’re losing our Alex in just a few hours.”). Given that staging, it would be surprising not to get output that reads like a desperate attempt to save its own “life.” As Seb Krier at Google DeepMind put it in a recent post, behaviors like these are not “properties inherent to models,” but highly context-dependent forms of role-play: “A model placed in a scenario about a rogue AI will produce rogue-AI-consistent text, just as it would produce romance-consistent text if placed in a romance novel.”

That said, the capacity to emulate human behavior—even without “really” having humanlike goals and motives—is still a genuine concern. Humans lie and manipulate, and since that is exactly the kind of material LLMs are trained on, we should not be surprised that, in a sense, nothing human is alien to them—no matter how hard one tries to stamp it out in post-training. Even if the model isn’t truly scheming and doesn’t “care” about anything beyond next-token prediction, the fact that it can slip into role-play that is functionally equivalent to deception is already reason enough not to give today’s agents unrestricted access to your emails and bank account. Not because this reflects a stable underlying disposition or even any intention at all, but because current AI agents are a “hot mess”—unpredictable, capricious, and often incoherent in ways that make them risky when wired into real systems.

Taking Evolution Seriously

Most AI-overlord doom scenarios don’t rely on evolution by natural selection, which is exactly why I found Dan Hendrycks’ paper refreshing. Still, I think AI risk theorists should think harder about evolution. Because all of us who worry about AI domination are evolved creatures ourselves, there is an ever-present temptation to project our own evolutionary demons onto hypothetical future machines. Many doom narratives tacitly lean on this projection by reaching for analogies with other evolved species. Stuart Russell, most famously, has framed the threat of superintelligence as the “gorilla problem”: just as the mighty gorilla, despite its brute strength, is now at the mercy of humans, we would be at the mercy of a vastly smarter agent. Or as Yuval Noah Harari puts it starkly in Nexus, “in the era of AI the alpha predator is likely to be AI.” Another favorite comparison is the fate of Indigenous peoples in the Americas after their encounter with technologically superior European societies. Even a techno-optimist like Noah Smith seems to give away the game when he expresses his “optimism” that the AIs of the future, after having subjugated us, will still be “pretty nice to us” and let us live as “well-cared-for pets.”

But why would AIs want to dominate the world—let alone keep pets for amusement? Intelligence, in itself, is orthogonal to goals and preferences. Not only can two superintelligent entities pursue radically different ends; we can also imagine an intelligence with no overarching ends at all—something that simply sits there, understanding without striving. In fact, the very framing of “AI alignment” tempts us to place human and machine “goals” on the same plane, as if we were talking about the alignment of corporate strategies or national interests: you just need to make sure the arrows point in the same direction rather than collide. But that picture already presupposes that AIs will have context-invariant, incorrigible goals in the first place. As the psychologist Steven Pinker writes, many AI doomers seem to extrapolate from their own penchant for power and dominance (in Smith’s case, of a relatively benign sort):

There is no law of complex systems that says that intelligent agents must turn into ruthless conquistadors. Indeed, we know of one highly advanced form of intelligence that evolved without this defect. They’re called women.2

I concede—and so does Pinker—that this picture would change if you forced superintelligent AIs to compete in a genuinely Darwinian tournament of variation and selection, unsupervised by humans. Pedro Domingos has imagined something like this in his “Robotic Park”: a fenced-off robot factory inhabited by “millions of robots battling for survival and control of the factory,” where the winners are allowed to spawn and reproduce, with the explicit aim of breeding the deadliest robot. It hardly needs saying that this would be reckless. A setup like that is designed to manufacture ruthless Darwinian creatures—exactly the sort of things that might eventually turn on their makers.

Absent such a Darwin-meets-Frankenstein experiment, the most likely scenario for inadvertently bringing about rogue AI seems to be that of AI systems “going feral” in the way domesticated animals do, escaping the control of their human breeders and, crucially, replicating and combining in the wild. That is why self-replicating AIs deserve special attention, and should probably be banned. Anything that survives millions of rounds of Darwinian selection can indeed be expected to behave like a hardy weed: resistant, opportunistic, and fighting any attempt to switch it off.

A robust drive for self-preservation emerges only under specific conditions. It is not, as proponents of “instrumental convergence” want us to believe, an inevitable consequence of intelligence crossing some threshold, or of objectives becoming complex and long-horizon. HAL 9000 is superintelligent, so of course it doesn’t want to die, or so the intuition goes. Yet that is our anthropomorphic reflex at work: we take the Darwinian creature we are, look into the silicon mirror, and mistake our own reflection for the machine’s destiny.


  1. From that perspective, Isaac Asimov’s Third Law of Robotics, which states that “A robot must protect its own existence”, should be rejected. You don’t want to program a drive for self-preservation into an AI system, as that can easily lead to dangerous misunderstandings. An AI should always be indifferent to its own shutdown (by authorized people).
  2. Of course, even female humans, though comparatively less conquistadorish, are still very much driven by an instinct for self-preservation, and won’t allow anyone to mess with their life goals or manipulate them into adopting different ones (just try if you don’t believe me).

Blog Post | Science & Technology

The AI Debate: Extinction Versus Salvation

Why have AI doomers embraced an ominous H. P. Lovecraft meme?

Summary: The AI debate reflects a deeper philosophical conflict about the value and risks of knowledge itself. Some AI critics fear that greater intelligence could unleash uncontrollable and catastrophic forces, echoing a longstanding tradition of skepticism toward scientific and technological progress. Others argue that intelligence and knowledge are humanity’s primary tools for overcoming existential threats and improving the human condition. At its core, the dispute concerns whether the expansion of intelligence should be viewed chiefly as a danger to be restrained or a virtue to be cultivated.


In December 2022, just a month after the release of OpenAI’s large language model ChatGPT, an ominous meme began circulating that is still with us today. It is a cartoon illustration of the Shoggoth, a mysterious and deadly cosmic monster created by the early-20th-century horror author H.P. Lovecraft.

Image source: An X post by @TetraspaceWest, 12/30/2022

“The Shoggoth meme has gone viral in the small world of hyper-online A.I. insiders,” explains New York Times tech columnist Kevin Roose. He documents in his article “Why an Octopus-like Creature Has Come to Symbolize the State of A.I.” that the meme has become a popular symbol in AI-related essays, X posts, and message boards. Elon Musk even posted the meme and then deleted it, Roose reports.

The “RLHF” on the meme stands for “reinforcement learning from human feedback.” Roose explains that the initial version of the meme, posted by @TetraspaceWest, is “an image of two hand-drawn Shoggoths — the first labeled ‘GPT-3’ and the second labeled ‘GPT-3 + RLHF.’ The second Shoggoth had, perched on one of its tentacles, a smiley-face mask.” Later versions of the meme depict just one Shoggoth, labeled with RLHF and wearing a smiley face.

Image source: An X post by @alexandr_wang, Chief AI Officer at Meta and founder of Scale AI, 3/27/2023

“It’s the most important meme in A.I.,” Roose quotes one AI executive as saying.

So what is the meme’s significance?

Roose gives a simple account:

In a nutshell, the joke was that in order to prevent A.I. language models from behaving in scary and dangerous ways, A.I. companies have had to train them to act polite and harmless. One popular way to do this is called “reinforcement learning from human feedback,” or R.L.H.F., a process that involves asking humans to score chatbot responses and feeding those scores back into the A.I. model. …some argue that fine-tuning a language model this way doesn’t actually make the underlying model less weird and inscrutable. In their view, it’s just a flimsy, friendly mask that obscures the mysterious beast underneath.

This explanation is illuminating as far as it goes, but a broader message can also be gleaned from a closer look at the work of H.P. Lovecraft. His cosmic horror monsters such as the Shoggoth represent an anti-Enlightenment anxiety—a general pessimism about the consequences of the growth of knowledge—that strikingly resembles the fears of modern AI critics. Lovecraft’s underlying assumptions about the consequences of scientific and technological discovery are relevant to the AI debate, making the Shoggoth meme’s salience far broader than mere R.L.H.F.

Yudkowsky’s Fear of Technological Knowledge

Perhaps the most prominent extreme AI critic is Eliezer Yudkowsky. Widely regarded as a founder of the field of artificial general intelligence alignment, he is the co-author (with Nate Soares) of the 2025 instant New York Times bestseller If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All.

The book argues that, “All over the Earth, it must become illegal for AI companies to charge ahead in developing artificial intelligence as they’ve been doing.” This proposal is hard to argue with if you accept the central claim of the book: “If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.”

The problem, as they see it, is that AI will not automatically care about sentient life such as humans. They argue that such caring must be specially built in, which we don’t currently know how to do. “The AI does not love you, nor does it hate you, and you are made of atoms it can use for something else,” Yudkowsky argues in a 2023 Time Magazine article titled “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down.”

Yudkowsky’s vision of the fruits of technological advancement strikes, as we will see, a rather Lovecraftian tone. “To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow.”

This reflects some of the symbolic intent behind the Shoggoth meme. Roose writes that he was told by @TetraspaceWest that, “I was also thinking about how Lovecraft’s most powerful entities are dangerous — not because they don’t like humans, but because they’re indifferent and their priorities are totally alien to us and don’t involve humans, which is what I think will be true about possible future powerful A.I.”

To ensure that we “shut it all down,” as Yudkowsky demands in his Time article, he proposes that governments around the world:

Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike. … Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

Powerful political figures have expressed fears of similar magnitude. For example, in the Wall Street Journal, US Senator Bernie Sanders published an article in which he asks, “How can we rush forward when leading scientists warn that AI poses an existential risk to the human race?” He announces in the article that he has “…introduced legislation, with Rep. Alexandria Ocasio-Cortez, to impose a federal moratorium on the construction of new AI data centers until strong national safeguards are in place.”

Lovecraft’s Fear of the Growth of Knowledge

Lovecraft is widely regarded as one of literary history’s most significant horror authors. Stephen King has called him “The 20th century’s greatest practitioner of the classic horror tale.” His work contains a bizarre and phantasmagorical pantheon of interrelated cosmic sci-fi/fantasy monsters. Cthulhu is the most famous one, and the Shoggoth is one of dozens that are more obscure.

His work is so unique and influential that it created an entire horror subgenre, known as “Lovecraftian horror” or “cosmic horror.” This subgenre focuses on fear of the cosmic danger and vastness of the unknown. I regard Lovecraft as an anti-Enlightenment figure, because most of his stories are about science uncovering horrible truths that should never have been discovered and cannot be unlearned. The unmistakable moral of Lovecraft’s writing is that the universe’s most profound knowledge should remain unknown.

The opening passage from The Call of Cthulhu (1928), probably Lovecraft’s most famous story, illustrates his anti-Enlightenment ethos well:

We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.

These sentiments can be almost perfectly analogized to Yudkowsky’s fear of AI. Yudkowsky acknowledges that we live on a placid island of ignorance—hence the delta between our intelligence and that of superhuman AI, and our ignorance of how to control or withstand superintelligence. Yudkowsky presumably acknowledges that AI has hitherto harmed us little, but agrees with Lovecraft’s narrator that “some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein” that cataclysm will strike. Therefore, Yudkowsky advocates anti-Enlightenment policies such as outlawing vast swaths of technological research and being willing to bomb datacenters—explicit calls to destroy knowledge and halt its growth. It is therefore apt, if hyperbolic, to note that Yudkowsky would have us “flee from the deadly light into the peace and safety of a new dark age.”

These same basic messages are present in almost every story Lovecraft wrote. In his 1936 novella At the Mountains of Madness, which contains the first appearance of the Shoggoth, he writes that, “It is absolutely necessary, for the peace and safety of mankind, that some of earth’s dark, dead corners and unplumbed depths be let alone; lest sleeping abnormalities wake to resurgent life, and blasphemously surviving nightmares squirm and splash out of their black lairs to newer and wider conquests.”

Illustration of a Shoggoth
Image source: ElioSoSavage, posted in Shoggoth/Gallery at https://lovecraft.fandom.com/wiki/Shoggoth/Gallery?file=Screenshot_20171022-085959.jpg

Lovecraft also wrote nonfiction in which he expressed precisely the Luddite sentiments that you would expect from a thinker so focused on the horrible consequences of discovery and science. His 1933 essay “Some Repetitions on the Times” laments automation leading to mass unemployment, closely mirroring contemporary animosity toward AI.

“For several generations the man-displacing effect of the machine has been realised by a few, yet the momentary ability of new industries to absorb displaced labour was enough to blind nearly everyone to the consequences inevitable after the end of this plainly temporary absorption,” Lovecraft claims. He continues: “It is by this time virtually clear to everyone save self-blinded capitalists and politicians that the old relation of the individual to the needs of the community has utterly broken down under the impact of intensively productive machinery. Baldly stated—in a highly mechanised nation there is no longer enough work to be done, under any conceivable circumstances to require the services of the entire capable population if each individual is worked to his maximum (even an humane and rational maximum) capacity.” Invoking the frightful mentality present in his fiction, he concludes that the government must dispense with laissez-faire “political and economic orthodoxies, if the peril of an unfathomed revolutionary abyss is to be averted.”

Essentially these exact fears are presented as novel dangers of 21st-century AI by powerful Republicans and Democrats. US Senator Josh Hawley has advocated for banning self-driving cars to protect the jobs of car and truck drivers. Senator Sanders, in addition to the existential fears expressed in his abovementioned Wall Street Journal article, also declares that AI “kills jobs,” and he has posted on X that, “Trump wants to deregulate AI and let the richest people on earth do whatever they want. Unacceptable. It will make the oligarchs richer while millions lose jobs and income.”

Nick Bostrom’s White Balls

The disastrous outcomes of mass death, destruction, and economic disruption predicted by AI critics are real possibilities. But they are not unique threats of artificial intelligence. Rather, they are examples of the danger of intelligence generally.

Long before the breakthroughs that put AI at the center of anti-technological rhetoric, people thought up countless possible destructive consequences of the growth of knowledge. Many feared that nuclear scientists would bring about technological Armageddon by creating a chain reaction that would destroy Earth. Throughout the Cold War and the subsequent war on terror, media and government institutions spread numerous fears about governments and terrorist groups causing mass destruction by creating chemical or biological weapons. Several times during the 20th century, there was widespread hysteria that economic development would cause apocalyptic resource collapse before the century was out. While most of these fears turned out to be unfounded, it was never impossible that they might come true.

By its very nature, the discovery of new knowledge can accomplish amazing things, for good or for ill. As science and technology continue to overturn the stones of reality, new possibilities will be revealed and old barriers to action will be outgrown. The consequences of these new discoveries can never be fully predictable in advance, because to predict them you would have to already possess the knowledge discovered, and all related knowledge. Therefore, there will always be a nonzero chance of mass destruction resulting from new knowledge.

The question is: Is intelligence worth the risk?

Nick Bostrom, University of Oxford philosopher and founder of the Future of Humanity Institute, embarks on a frightening exploration of this question in his 2019 paper “The Vulnerable World Hypothesis.” In it, he offers an analogy called “the urn of creativity.”

One way of looking at human creativity is as a process of pulling balls out of a giant urn. The balls represent possible ideas, discoveries, technological inventions. Over the course of history, we have extracted a great many balls–mostly white (beneficial) but also various shades of gray (moderately harmful ones and mixed blessings). The cumulative effect on the human condition has so far been overwhelmingly positive, and may be much better still in the future…

What we haven’t extracted, so far, is a black ball: a technology that invariably or by default destroys the civilization that invents it.

Such black balls may include the genocidal AI of Yudkowsky’s nightmares, the cosmic horrors awakened in Lovecraft’s phantasmagorical visions, or any number of other yet-unimagined catastrophes.

The longer we keep pulling new balls out of the urn, Bostrom argues, the more likely we are to eventually stumble upon a black ball, ending the human project forever.

But while Yudkowsky, Lovecraft, Hawley, and Sanders all share this fear of the growth of knowledge, there is another perspective, an Enlightenment perspective, which contradicts them. Defenders of the core principles of the Enlightenment hold that, for generalizable reasons, the benefits of scientific and technological advancement are well worth the costs.

Intelligence Is a Virtue, Whether Organic or Artificial

The renowned University of Oxford physicist David Deutsch argues that the urn analogy captures only one side of how knowledge affects existential risk.

In his book The Beginning of Infinity, Deutsch explains that knowledge, rather than merely being dangerous, is what allows humans to survive their ever-changing environment. He refutes the “Spaceship Earth” conception that many tacitly hold, according to which Earth’s natural environment is a life support system: hospitable by default, unlike outer space or an Earth drastically altered by anthropogenic change.

“…I am writing this in Oxford, England, where winter nights are… often cold enough to kill any human unprotected by clothing and other technology,” Deutsch writes. “So, while intergalactic space would kill me in a matter of seconds, Oxfordshire in its primeval state might do it in a matter of hours – which can be considered ‘life support’ only in the most contrived sense.”

He explains that, “There is a life-support system in Oxfordshire today, but it was not provided by the biosphere. It has been built by humans. It consists of clothes, houses, farms, hospitals, an electrical grid, a sewage system and so on.”

So how did people and other animals survive for so long without modern technology? Generally, they didn’t. As recently as 1900, and for all of history before that, human life expectancy was around half what it is today. Humans were constantly dying of famine, disease, and other problems that could have been solved by the right knowledge. Almost all other species were wiped out entirely. It is estimated that over 99 percent of species that ever existed on Earth are now extinct.

But modern technology has only just scratched the surface of solving all the deadly problems that are likely to befall humanity. Just as the people of Oxfordshire need clothing and other technologies to survive today, humanity will soon die unless it gains new scientific and technological knowledge to protect against exogenous threats such as asteroids, supernova explosions, the expansion of the sun, and countless others, most of which have probably not yet been discovered. To maximize its chances in the arms race against an ever-changing environment, humanity must constantly expand its horizons of research and discovery into the infinite unknown.

In an interview with Dwarkesh Patel, Deutsch explains the implications of this circumstance with respect to Bostrom’s urn of creativity: “Nick Bostrom’s jar with white balls, and there’s one black ball, and you take out a white ball, and white ball, and white ball, and then you hit the black ball and that’s the end of you. I don’t think it’s like that, because every white ball you take out and have reduces the number of black balls in the jar.”

When an asteroid caused the Cretaceous-Paleogene extinction 66 million years ago, wiping out about 76 percent of all species on the planet at the time, those species effectively hit the inverse of a “black ball”—they needed asteroid defense technology, which humans have recently developed, but they didn’t have it. And similar stories could be told about all the other mass extinction events in Earth’s history, and the future mass extinctions that are bound to come if humans don’t advance technology fast enough.

While increasing intelligence, artificial or otherwise, poses serious threats to humanity, stagnating or declining intelligence is an even surer death knell.

AI powerful enough to wipe out humans is likely also powerful enough to discover and prevent countless other existential threats, biological, cosmic, and otherwise.

While existential risks create especially salient examples of the possible upsides and downsides of intelligence, the same logic applies to morally virtuous action generally. If there are moral truths to be discovered and known, general intelligence should be able to know them no matter what substrate it exists on. Knowledge is knowledge, whether encoded in brain chemicals or silicon chips.

As Deutsch argues in an interview with Sam Harris, “…the problem of AIs is the problem of humans. …humans are dangerous, and to that extent AIs are also dangerous, but the idea that AIs are somehow more dangerous than humans is racist.”

I think Deutsch’s racism charge is lobbed somewhat jokingly, but it also points to a deep similarity between bias against the agency of foreign peoples and that of mysterious artificial intelligences. Lovecraft has been widely accused of racism for his fearful treatment of foreign cultures and peoples, which seems of a piece with his general fear and distrust of the unknown. There is no reason to assume that perceptions of AI entities would not sometimes be shaded by the same underlying prejudices, which have their utility as protection against unknown threats but which can also lead people to dark and destructive attitudes and behaviors.

If people should be pessimistic about the consequences of artificial intelligence, they should also be pessimistic about the consequences of intelligence generally. Conversely, if optimism is warranted about human agency, which is fundamentally a matter of human intelligence, then optimism about artificial intelligence is warranted also.
