Preventing Extinction from Superintelligence
A short explanation of why I believe building Superintelligence now would lead to human extinction, and what to do to prevent it.
Lately, I have been asked a bunch of questions about ASI. Here they are, along with my answers.
Quick Explainer
Do you have a quick explainer (less than 10 lines) for why you believe that AI development yields risks of extinction?
There are three main reasons.
1. Superintelligence (ASI) is incredibly powerful.
ASI is genuinely incredible. As in, it’s not credible. It’s hard to believe in. It is unlike anything that we have seen so far.
We are used to certain things: You can buy food without having to grow it yourself. Travelling from London to Tokyo takes half a day. Science is hard. Bodies rot and people die. We exist. Earth exists.
It is hard to think about a world with completely different assumptions. Science Fiction does not cut it: all fiction must at the very least feature human-like entities making meaningful, human choices.
But the real world is not like fiction: crazy things can happen. We discovered nukes, bioengineering, and worldwide instant communication. With more intelligence and faster science, what will be discovered?
In my experience, this is emotionally the hardest pill to swallow.
Intelligence.
When we think of the heights of intelligence, we think of geniuses, or possibly the collective intelligence of humanity.
But we should not think about people. We should think about machines instead, and consider all the ways in which digital intelligence is stronger than human intelligence:
Computers do not stop.
We have good and bad days. We get tired. We need to sleep. We have trouble focusing.
Computers are relentless and do not stop.
Once we automate a mental process on computers, it runs much faster than it does in humans.
Calculators compute arithmetic operations much faster than any human.
LLMs read books much faster than any human.
Software deployed on computers can be rapidly and vastly scaled up.
Once ChatGPT was built, it was cloned, scaled, and deployed to hundreds of millions of people in less than three months.
Once we build an AI smart enough to clone itself the way OpenAI cloned ChatGPT, it will be able to scale itself to the point where it interacts with hundreds of millions of people, at the very least.
Combined with the previous point, that AI will reach that scale in much less than three months.
Computers have much higher bandwidth than any human communication.
Humans convey information very slowly. We need to find the correct words, write them down in articles and books, publish them, and make them more accessible over time. For advanced science, it can take more than a decade to go from a scientist coming up with a new idea to students learning about it at school.
Computers can just dump gigabits of information to each other in seconds. This would be akin to a person sharing their entire lifetime of reading in an instant.
Combined with the previous point, whenever an AI learns something new, it can share it instantly with all of its other instances (a rough back-of-envelope comparison of the bandwidth gap is sketched below).
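To make the bandwidth gap concrete, here is a minimal back-of-envelope sketch in Python. Every number in it (books per lifetime, bytes per book, link speed, hours per book) is an assumption picked for illustration, not a measurement.

```python
# Back-of-envelope sketch: transferring "a lifetime of reading" over a fast
# network link, versus the time a human spent reading it.
# All numbers are illustrative assumptions.

books_read_in_a_lifetime = 5_000            # assumed: a very heavy reader
bytes_per_book = 1_000_000                  # assumed: ~1 MB of plain text per book
lifetime_reading_bytes = books_read_in_a_lifetime * bytes_per_book

link_speed_bits_per_second = 10e9           # assumed: a 10 Gbit/s datacenter link
transfer_seconds = lifetime_reading_bytes * 8 / link_speed_bits_per_second

human_reading_seconds = books_read_in_a_lifetime * 6 * 3600   # assumed: ~6 hours per book

print(f"Machine-to-machine transfer: {transfer_seconds:.1f} seconds")
print(f"Human reading time: {human_reading_seconds / (3600 * 24 * 365):.1f} years")
```

Under these assumptions, the transfer takes a few seconds while acquiring the same text by reading took years; the exact numbers matter much less than the orders of magnitude between them.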
Deep Learning is alien. It is completely different from human learning.
AIs have learnt how to replicate human voice, solve most text problems, and now generate videos, without ever being explicitly taught how to do so. They just grow by being fed more data. They never ask questions and they don’t need any explanation.
They generate voice, text and videos, without using any external tools. They do so without us building much structure beyond multiplying big tables of numbers.
They generate voice, text and videos without using our theoretical understanding of acoustics, phonetics, grammar, lights, cameras, etc.
This is an alien way of learning, one that we do not understand either intuitively or scientifically.
Software is easier to modify and improve than brain patterns.
It is much easier for an AI to change its source code than for us to change our brain configuration.
It is much easier for an AI to fine-tune itself on more data than for a human to learn knowledge from books.
It is much easier for an AI to quickly skim large amounts of data and remember it (indexing / embedding) than it is for us to do so; a toy sketch of this pattern follows below.
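As a hedged illustration of the indexing / embedding point, here is a toy sketch. It uses simple word-count vectors rather than a real embedding model, and the documents, vectorizer and recall function are all invented for the example.

```python
# Toy "index once, recall instantly" sketch. Word-count vectors stand in for
# real embeddings; everything here is illustrative.
import numpy as np

documents = [
    "calculators compute arithmetic much faster than humans",
    "llms read books much faster than any human",
    "software scales to hundreds of millions of users",
]

# Shared vocabulary over the whole corpus.
vocab = sorted({word for doc in documents for word in doc.split()})

def vectorize(text: str) -> np.ndarray:
    # Toy "embedding": normalised word counts over the shared vocabulary.
    counts = np.array([text.split().count(word) for word in vocab], dtype=float)
    norm = np.linalg.norm(counts)
    return counts / norm if norm else counts

# Build the index once; every later lookup is a single matrix product.
index = np.stack([vectorize(doc) for doc in documents])

def recall(query: str) -> str:
    scores = index @ vectorize(query)
    return documents[int(np.argmax(scores))]

print(recall("how fast can llms read books"))  # returns the LLM document
```

A real system would use learned embeddings and an approximate nearest-neighbour index, but the shape of the trick is the same: pay the reading cost once, then recall from the whole corpus in a single cheap operation.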
The above largely applies to current AIs.
Superintelligence is much scarier. At the very least, we have to envision all of the above applying to AIs smarter than humans, that can improve themselves.
Imagine a species of a million aliens, who can all clone themselves, who communicate with each other instantly, who are all much smarter than humans, and who are all relentless.
Technology: Product of Intelligence
There are massive returns to intelligence. As humanity became smarter and built science, it achieved massive technological and economic progress.
With an AI system that is smarter than the entirety of humanity, and that can be scaled into a swarm of tightly interconnected clones, the potential for technological progress becomes extremely hard to grasp. Who can guess what happens when 10,000 years of collective human research are performed by an AI in one week?
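To show the kind of arithmetic hiding behind a figure like "10,000 years of research in one week", here is a hedged sketch. Every number in it (instance count, speedup, efficiency) is an assumption chosen for illustration, not a prediction.

```python
# Hedged back-of-envelope: where could "~10,000 researcher-years per week" come from?
# Every number below is an assumption for illustration, not a prediction.

ai_instances = 1_000_000        # assumed: one million copies, each roughly one researcher
serial_speedup = 10             # assumed: each copy thinks ~10x faster than a human
parallel_efficiency = 0.05      # assumed: heavy losses to coordination and duplicated work

researcher_years_per_year = ai_instances * serial_speedup * parallel_efficiency
researcher_years_per_week = researcher_years_per_year / 52

print(f"~{researcher_years_per_week:,.0f} researcher-years of work per calendar week")
# With these assumptions: roughly 9,600 researcher-years per week.
```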
The only sure limits on the technology that can be developed by such an AI system are the limits of physics.
I expect, at the very least: 1000x greater energy production, 20x faster transportation on Earth, space travel, molecular robotics, an efficient theory of learning, mind-reading, a grasp of psychology and psychiatry strong enough to cheaply rewire people's beliefs and preferences, and a way to murder people at scale for less than $1 per person.
Fundamentally, we are not talking about 10% GDP yearly growth, or even 100%. We are talking about GDP not making sense anymore: what is usually called a Singularity. A system constantly making itself smarter, exploding in intelligence, reshaping galaxies in the process.
It all comes down to your shock level: how extreme a level of technology you can contemplate with a straight face.
If you think of ASI as chatbots, then of course, you expect humans will live.
If you think of ASI as cyborgs, terminator style, then of course, you expect we could organise some resistance.
If you think of ASI as titans, in the vein of Greek mythology, then of course, we will live like ants, pets or zoo animals.
If you think of ASI as capturing all of the energy from the Sun and reshaping our environment at the molecular level, then of course, without light or Earth, humanity does not merely get killed, it gets erased.
2. We are not on track to control ASI, and we are racing toward it.
We were already surprised by the capabilities of GPT-3. We will only be more surprised by the capabilities of ASI.
We already fail to reliably detect when current AIs are telling the truth or hallucinating; we have to double-check their output instead. This only gets worse with ASI.
No one understands the internals of current AIs: we do not understand how they learn what they learn, nor why they make the choices they make. This only gets worse with ASI.
We are already suffering problems from AIs, such as deepfakes, or people cheating on tests with ChatGPT. This only gets worse with ASI, when the misuse comes from the AIs themselves, at scale.
We are racing to build more powerful AIs, rather than incrementally improving AIs once we have established specific AI systems as safe. This only gets worse with ASI, when it is building its more powerful successors itself.
In summary: we keep getting surprised by how powerful AIs get, we can't tell when they are telling the truth, we don't understand how they work, they fail without us noticing, and we are already facing problems from their accelerating deployment. Yet we are racing toward more and more powerful systems, faster and faster.
3. There are no do-overs for ASI.
We do not get the chance to birth a world-reshaping entity twice.
Once we build ASI, either we have made sure we can control it, or we are fucked. That's it. If anything goes wrong, we cannot try again: the consequence is "We lost the universe to an alien mind".
As it keeps expanding and gathering more resources, we get erased, not out of ill will, just out of indifference. Matter and energy are just very useful.
When we build a new tower, we do not annihilate the ant colonies on the construction site out of spite; we simply do not care. Space is very useful.
As ASI becomes more powerful, unless we ensure that it is controllable, what it does with its technology will seem completely crazy for a short while, before we all get annihilated without understanding what happened.
Conclusion
Quick Summary
1. ASI is crazily powerful; 2. We can’t control it yet; 3. We get no do-over.
→ If we build it now, we die.
Common Counter Arguments
Different people deny this conclusion for different reasons:
Variations of “ASI can not be this intelligent”, “Intelligence can not yield that much technology” and “Technology can not be that powerful”.
I believe a lot of popularisation is needed here, as this is genuinely hard to grasp. If anyone can show me a very short video (<30 seconds), in an infographic style, that illustrates one of the points made above, I would likely commission a longer video.
“We have actually aligned our current AI systems with RLHF, RLAIF, constitutional AI and instruction-tuning. They already care about human values”
Our AIs still fail in poorly understood and unexpected ways.
The more you go “out of distribution”, the harsher they fail.
The more powerful they become, the harder it is to predict and make sense of the failures.
ASI is pretty powerful, pretty "out of distribution", and people are rushing to take humans out of the control loop. I expect very harsh failures, with very harsh consequences.
“We can easily control AI systems, it would be trivially easy to air-gap them, prevent them from accessing tools, monitor them for strange behavior and just do a lot of interpretability on current systems before moving on to bigger ones”
This is true in theory. I think we could work on current AI systems without it leading to uncontrolled ASI.
In practice, companies are racing toward ASI roughly as quickly as they can, and racing to give current systems access to as many people and tools as they can.
This is usually why I point at MAGIC.
“There are do-overs. AIs improve incrementally, and so we will get ‘warning shots’ before building ASI.”
The pace of progress in AI is accelerating. Progress is becoming less and less incremental.
Even if we get a 'warning shot', if AI progresses so quickly that we can't get our shit together before ASI, it doesn't matter.
'Warning shots' are risky. There is a fine line between 'a disaster big enough to make people care about the problem and take historically unprecedented measures to deal with it' and 'a disaster that wipes out humanity'.
‘Warning shots’ are destabilising. The time to make good collective decisions is not when everyone panics because of a disaster!
“Solving this problem is so hard that it seems impossible. Shouldn’t we just give up and die?”
No.
Misc Questions
What is your view on how we should best deal with the risks posed by AI?
From my point of view, in order to survive, we need to:
Completely halt all the research, development, deployment and proliferation of AI systems that are general, autonomous, super-human, self-aware, and self-replicating.
Ramp up investments in superintelligence controllability, to pave a way forward.
Build much stronger institutions, with the tools of the 21st century rather than the 18th, to deal with the above points.
We are basically in a race against the clock, where we need to solve controllability of superintelligence before anyone builds it. We win that race by buying more time, and by using the added time to solve controllability.
Now, if you ask me about incremental policies that start with our current world and get us to such a place, that's a much bigger question! Happy to engage with that one too!
AI is a (very large) subset of the general phenomenon that it becomes ever easier and cheaper for a small group to harm large fractions of the population as technology improves. This poses large collective action problems. Do you have any views on how we should solve those?
Note that the risk does not only come from small groups, like lone hackers or terrorists. For instance, what happens if a nation-state discovers a weapon that can remotely kill anyone on Earth without a trace? Or get-access-to-literally-any-computer-system technology? Or thought-reading technology?
In general, we could draw a worrying graph of "bomb radius over time", "budget needed to kill 1,000,000,000 people over time", etc.
But the more worrying one is "technological growth over time" against "ability to coordinate over time". The latter has progressed much too slowly compared to the former, leading to dangerous imbalances (a purely schematic sketch of this graph is below).
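Here is a purely schematic sketch of that graph. The curves are made-up shapes chosen only to show the qualitative gap, not real data.

```python
# Schematic only: two illustrative curves, no real data behind them.
import numpy as np
import matplotlib.pyplot as plt

years = np.linspace(1900, 2030, 200)
technological_growth = np.exp((years - 1900) / 25)    # assumed: roughly exponential
ability_to_coordinate = 1 + (years - 1900) / 60       # assumed: roughly linear

plt.plot(years, technological_growth, label="technological growth (illustrative)")
plt.plot(years, ability_to_coordinate, label="ability to coordinate (illustrative)")
plt.yscale("log")
plt.xlabel("year")
plt.ylabel("arbitrary units")
plt.title("Schematic: the gap is the point, not the numbers")
plt.legend()
plt.show()
```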
I have lately been asking around: "What do you think would happen if nukes had been discovered not in 1945, but one year ago?". The main answer I get is: "Before reaching an international agreement, we would face much more worrying proliferation than we ever did in the past. Everything travels much faster, and many more countries have the actual state capacity needed for such a programme."
On this specific point, we have regressed on our ability to coordinate and steer our future.
—
As you might have guessed from the above, I started caring about extinction risks long before I thought of AGI, heard of LessWrong or Effective Altruism.
Before moving to AI (in 2020, after seeing GPT3), my main goal was to improve coordination at scale. It is in this context that I learnt a bit about mechanism design, public choice theory, voting theory, and game theory.
"Coordination" means many things to me. I am not sure how deep you'd like me to go, but just to go over a few:
Pareto Movement. Building a political programme that brands itself as focusing on Pareto improvements, rather than identity politics and resource redistribution.
Accelerating science. Science is extremely slow. I have worked at the SOTA of programming language theory, blockchain, and now AI. I found it tractable to reliably move faster than Science, and would want to do so extensively.
Mental healing at scale. The world is a very traumatising place. Our default coping mechanisms make us very un-ambitious. I believe there is an arbitrage in massively scalable therapy, that makes people more happy, more productive and more prone to coordination.
Better software. Shit software makes everything worse. Good software makes everything better. I am not sure how much you care about software architecture and engineering, but we can go deep there.
(One could envision "Mental Healing", "Accelerating Science" and "Pareto Movement" as another type of "Better software": just programs that run on increasingly large groups of people instead of computers.)
If you have some other area that you'd like to ask me about, feel free; as you can see, I like writing about this topic.
How important are human brain enhancement and uploading to the above, in your view?
I think uploading is hard to do right, and, depending on the day, that brain enhancement has slightly better odds.
With these technologies, my main expectation is that we all die: our institutions are not in any way strong enough to resist sci-fi, singularity-accelerating tech.
My “reasonable worst” expectation is Black-Mirror-On-Steroids (also known as S-risk).
My “unreasonable best” expectation is utilitarian utopia.
In general, we do not realise how lucky we are to exist, and how even luckier we are to not suffer more than we do. It is truly not a given.
One should remember that the Universe is cold and uncaring, and that the world is not fair.
If you dropped Earth at a random place in the universe, we’d all just die.
If you happened to be born as the wrong person, at the wrong time, in the wrong place, you could just be condemned to a long life of meaningless suffering.
People die constantly, and it’s a tragedy.
I think that some people put a lot of hope in salvation through technology. In my opinion, techno-optimism is often just cope for disliking coordination with other people. Coordination involves many icky things like doing politics, engaging with the out-group, compromising on issues where you believe you are right, not resolving disagreements, having public debates, regularly admitting that you are wrong and publicly changing your mind, and punishing the bad people in your group.
But let's say we assume that we do none of these icky coordination things, and instead focus only on technology, like brain enhancement. Then I expect we all die, and that if we are quite unlucky, we (or our descendants) might just end up tortured instead of dead.
In general, a stable "good world" - as in, a world that doesn't suck - requires much more coordination work than what we are currently putting in. If we put in that work, then, human brain enhancements could be net positive, rather than just singularity-accelerating.
As things are, aiming for a full-on utilitarian utopia is dangerous. It puts the emphasis on high-tech instead of coordination-tech, when the latter is what's bottlenecking us for now. Let’s not get distracted. A mere "good world" is already plenty hard enough.
All points excellent: indeed, unimpeachable. Techno-optimism as a cope for disliking coordination is especially brilliant. I have updated my model of this phenomenon accordingly.
"Mental healing at scale. The world is a very traumatising place. Our default coping mechanisms make us very un-ambitious. I believe there is an arbitrage in massively scalable therapy, that makes people more happy, more productive and more prone to coordination."
I would like to hear more about this. Why do you believe this? What would this therapy look like? How could it scale when a lot of skilled therapists struggle to transfer their tacit knowledge to more than a couple of people at a time? What's the cheapest/fastest possible test of your idea?