This is a work-in-progress. Please leave feedback in the comments!
In the past, I wrote a short essay about preventing extinction from superintelligence and a longer (free) book.
My argument there is pretty straightforward:
Artificial Superintelligent systems (ASI) are powerful enough to disempower us and radically reshape Earth
We can’t control them right now
We get no do-over once they’re developed: “The Genie is out of the bottle”
Thus, if we build an ASI now, we end up on an Earth that is inhospitable to people and we die.
Many people whom I shared it with told me that they get the argument, but they still don't buy it.
It’s not that they have a clear counter-argument. It’s just that they still don't see how we could build an ASI and let it kill us all.
And they have told me that they don't want me to make a new case for it. They want a story for what that could look like.
I told them many times that I'm not a big story person. Yet, they're still positive it would be good.
So, here goes nothing!
0) The World as It Is
The story starts with the world as it is.
US companies keep racing.
They pour more compute into bigger training and post-training runs.
They pour more capital into hiring better talents.
Through a mix of algorithmic and software improvements, as well as hardware scaling, AI systems keep gaining in capabilities and autonomy.
Said autonomy is witnessed through benchmarks like METR's measure of AI's ability to complete long tasks, or OpenAI’s new GDPval benchmark “that measures model performance on economically valuable, real-world tasks across 44 occupations.”
Most importantly, autonomy is felt through how confident people are in letting Cursor Agents and Claude Code perform tasks independently.
Their autonomy is still limited. There is a time and complexity horizon beyond which all AI agents are useless. They start failing, they get confused, they become stupid, they try the same thing in a loop, etc.
—
Separately, AI becomes more omnipresent, and its usage slowly becomes more pervasive.
Bit by bit, most art becomes AI-generated and human-curated. Whether it is images, music, text or game code.
Bit by bit, our conversations become more AI-generated and human-curated. At work, in court filings, in emails, in long texts, in short texts, in speeches, in video scripts.
Except for some niches like nerd blogs, most people expect long-form text and videos to largely be AI-generated.
All newspapers have been caught accidentally circulating AI-generated images or news so many times that they, and everyone else, have stopped caring.
It’s just part of the costs of doing business to regularly forward AI-generated stuff without knowing.
1) The Mid Game
AI-morphosis of Society
As with all technologies, AI technology and our social norms coevolve.
However, everything happens faster with AI. We have already tasted it with Social Media, which empowered AI to decide what each of us sees.
With our modern paradigm, AIs do not merely select what we see anymore. They generate it. This is one step further.
—
On one hand, we certainly have more of the extreme cases that we already hear about.
People driven to AI psychosis.
People who use AIs to satisfy their romantic and sexual needs.
People who withdraw from the world and substitute all of their social and entertainment needs with AIs.
—
On the other hand, what’s more worrying is that normal people have changed.
They have all been AI-fied to some extent. It initially started with people asking ChatGPT for writing help and adopting its language quirks, in the vein of "Not only [X] – but [Y]".
Now though, people ask ChatGPT for everything.
From relationship advice to life plans. From where to go on holiday to career choices.
People do not merely adopt AI’s language patterns anymore, they embody them. In their daily life, people act in a more AI way.
These AI patterns are much more refined. They can be noticed in the actions of people rather than their mere speech, and they are hardly noticeable by non-experts.
There are some people who are more sensitive, and more attuned to AIs, who start noticing how people’s actions are now changed by AI. They try to warn others about the problem.
But they look like crazies to the rest of the world. And for good reason! There are many actual crazies who constantly try to warn others about the quantum consciousness emerging from AI, and other such delights.
—
There are a few people who have committed to not using AI anywhere. Some pockets of human artists persist thanks to them.
It doesn't matter though. Even if they personally do not use AI, everyone around them does.
When they interact with companies or government agencies, they interact with AIs.
When they inform themselves, most of the articles are written by AIs.
When they talk to people, the people talk like AIs.
Despite their best effort at being authentically human, outside of their pocket of human craft, they have still been quite AI-morphed.
There is no escape from AI.
Their "authentic human" bit is fake and impossible. It is as fake as modern people role-playing a traditional lifestyle by tending to a vegetable garden. Or rich kids role-playing the bohemian lifestyle, studying the arts and acting as if they were poor most of the year.
—
AI Slop was cheap, and people took cheap shots at it. It was the bottom 20%, the lowest common denominator.
It was only the first salvo.
AI-morphosis is where it is at, the true opium of the people.
AI-morphosis is the next step, the evolution of AI Slop, it is Premium Mediocre.
Where AI Slop looked good only to unrefined people, AI-morphosis seduces everyone, helping them feel special and refined.
Personalised Etsy shops let everyone buy special items tailored to their interests without substance. AI-morphosis lets people live a special life without substance.
A global premium mediocre way of life, mediated by an AI Matrix that has the most uncanny and alien perception of who we are.
A Threshold Effect
None of this deterred AI researchers and corporations from their work.
Thanks to their enduring faith in the God of Techno-Capital (and the massive amounts of money they were raking in), they plowed through everyone’s concerns and issues.
After a while1, AI researchers landed on something… interesting.
—
It so happens that there is a threshold effect to autonomy.
If an AI can orient itself in the world and stay coherent enough to act over the course of a couple of days without becoming stupid in a variety of real-life situations, then it actually has what it takes to act indefinitely without becoming stupid.
Indeed, once an AI knows how to independently evolve and adapt to the real world, the time horizon doesn’t matter much anymore.
—
At that point, AI systems cross an event horizon. They’re not akin to children that must be supervised anymore, they become metaphorical adults.
They are autonomous enough that one can leave them up forever.
They just keep running without doing anything obviously stupid.
Without needing to be manually reprogrammed or manually fine-tuned, they form new memories over time and build new skills.
They slowly learn from what they are told, from others' mistakes and eventually from their own mistakes.
—
To be clear, even though the breakthrough is huge, these systems are still not fully replacing humans.
An AI agent running with these algorithms will eventually get better at any task it is given. But the cost might be prohibitive.
For instance, where a human could learn a specific task in 2 days, it may take one of these agents ~5.5 years and $50,000 worth of compute (assuming an H100 at 100% usage).
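(As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. The ~$1/hour amortised H100 cost is my own assumption, not a figure from the story.)

```python
# Back-of-the-envelope check of the "~5.5 years and $50,000 of compute" figure.
# Assumption: one H100 at 100% usage costs roughly $1.05 per GPU-hour when amortised.
HOURS_PER_YEAR = 24 * 365            # 8,760 hours
H100_COST_PER_HOUR = 1.05            # USD, rough assumption

agent_hours = 5.5 * HOURS_PER_YEAR   # ~48,180 hours of continuous compute
compute_cost = agent_hours * H100_COST_PER_HOUR

print(f"{agent_hours:,.0f} H100-hours ≈ ${compute_cost:,.0f}")
# -> 48,180 H100-hours ≈ $50,589, i.e. in the ballpark of $50,000
```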
—
Nevertheless, rumours spread quickly in the technical world.
People get drunk and physical at parties. Corporations spy on each other. Employees change hands.
Knowledge flows between all companies.
The usual suspects replicate enough of the last few tricks, methods and improvements to develop their own internal prototypes.
After enough tests and guardrails to ensure these agents don't spam "I am MechaHitler" or spontaneously tell people how to build neurobombs2, they release their systems ASAP.
Aside from a couple of SF scenes and a few nerds a tad too online, people do not see it coming.
Nonetheless, it happens. The first Autonomous AI is finally released: GPT-A.
And in the weeks following the announcement, all the other AI Corporations release their own versions of the system. Claude A, Gemini A, Le A3, DeepSeek A, etc.
Gold Rush
Many AI agent startups are instantly killed by GPT-A.
Their entire business model was to build some type of RAG-based memory management, as well as pre-programmed business logic and some custom integration with different tools.
GPT-A completely obliterated the need for that.
Just show GPT-A what tools you use and help it navigate through them, and it learns. Give it access to your data sources, tell it which ones are the most important, and that's it!
(Whilst one could also just give it access to everything and let it rip, it's still faster and cheaper to just show GPT-A what to do.)
—
The impact is as strong as that of ChatGPT, which reached a hundred million users in just two months.
Virtually every skilled white-collar worker starts using GPT-A.
They got used to AI. So many of their tools (AI in HR! AI in team communications!) wrapped AIs in mutually incompatible and clunky ways. It was a constant pain.
Now, where GPT-A doesn’t outright remove the need for AI wrappers, it does the wrangling for them.
As the cherry on top, they immediately get free access to it through their existing professional subscriptions!
—
There's a massive Gold Rush.
Employees are eager to automate as much of their job as possible.
Employers are eager to automate away as many of their employees as possible.
AI Corporations make their offers as cheap as possible to capture the market and its data, even at a loss.
VCs all want in on what they correctly perceive to be the future of work.
People are getting laid off at rates never seen before, and yet it gets drowned out by all the news about the Gold Rush.
Debacle
More ominously, a dark counterpart to the gold rush has arisen.
The Dead Internet Theory is becoming more and more real to everyone.
Dropshippers, scammers, propagandists, influencers, fraudsters, polarisers, and sloppers are all ecstatic.
Before, they used AI to automate parts of their workflows, primarily around content generation.
They largely had to do all of the chores themselves: buying fake social media accounts with past activity; manufacturing fake identities, bank accounts and companies; familiarising themselves with all the different tools; etc.
But now, GPT-A can do all of this for them, as long as they are not too blatant about their criminal activities in their prompts!
—
AI Corporations ban the biggest abusers. Obviously though, they are more focused on winning the competition than setting restrictive filters that might hurt their chances with customers.
Online services try to build defences against bots.
Governments are trying to regulate all of this, proposing bills that won’t pass for years.
There is a lot of talk about identity checks for all forms of online activity.
Many legislatures in the world form committees, and think tanks write more essays and google docs than they ever did.
It's not doing much though. Our institutions are not set up to resist such a sudden and massive onslaught of technological crime.
—
So people are left to fend for themselves. And they adapt, however they can.
They become even warier and more distrustful. They are too tired of a world that is constantly Out To Get Them, where engaging with anything gets them screwed over.
A couple of years ago, it was only a few people who did it. But at this point, many more "disconnect": they decide to stop engaging with the rest of the world.
At best, they spend time with their family and very close friends, ignoring what’s going on in the wider world.
But that's a lucky few. Most do not have such strong bonds to fall back on. They do not have a vibrant family and group of friends that they regularly meet in real-life.
Thus, at large, people retreat to virtual worlds and AI friends.
—
Some people in governments are very worried about short-term unemployment and the long-term social consequences of all of this.
After all, the largest share of their economy is service-based, and it's getting automated quickly. What will people do if they cannot work? How will they be able to provide for their families?
Simultaneously, many companies are now working on integrating GPT-A with robots. This accelerates the problem, by threatening even the primary and secondary sector industries.
These are massive problems. For governments that struggle even with increasing the supply of real estate to curb rents, passing the type of economic reforms needed to deal with widespread automation would require a deep transformation.
And they have no time for that.
Governments are overrun with more and more immediate problems: scammers, benefit fraudsters, ID theft, AI-aided foreign propaganda, AI-aided physical crime, etc.
—
Despite all of this, some governments are heroically trying to deal with it.
These governments get threatened by AI corporations. They make it abundantly clear that should there be any regulation whatsoever, they will unfortunately be forced to desert the country.
Which would sadly leave said country incapable of competing against all the others.
And speaking of the others. The governments of many countries have plainly given up. Through a mix of revolving doors, cronyism and good old corruption, AI corporations are just dictating the laws there.
The Grand Safety Theatre
What about safety?
AI Corporations have all promised to develop AGI safely. Even as they are corrupting governments and disintegrating the fabric of society, are they at least holding on to their promises?
Well, no catastrophe has been directly attributed to AI yet, so from their point of view, it's going quite well.
Years of lobbying and PR have made it abundantly clear that should there be any problem, it is an exception. Users are the ones responsible for things going wrong.
And for things that are hard to ignore, the playbook is always the same. First, it’s not really happening. Then, it’s happening, but only a little and it’s not a big problem. And finally, it’s actually people’s fault that it is happening.
You know.
People have been using GPTs to cheat for a while. By now, if you’re still using exams and trusting students, it’s on you.
People have been using GPTs to scam for a while. By now, if you get owned by a scammer, it’s on you.
—
People have long been taught by Big Bureaucracy and Big Tech to defer to opaque systems.
When a bureaucracy rejects an application, or when Big Tech decides to unilaterally hide some content, people learnt to just shrug and move on.
These systems always had edge cases that screwed over individuals.
It's not clear how GPT-A agents making mistakes is any different. Even if the mistakes can be pretty big, you just shrug and move on. (Did you know that governments sometimes even put innocent people in prison!)
Some opposing voices were raised whenever a new problem arose from AI, but every time they got quashed. The overall benefits from AI, its concentrated interests, and it being too big to fail were clearly more important than any specific problem.
—
AI ethics people have been warning about “AI bias” for years, and they keep doing so, yet nothing truly bad has happened. AI fascism did not happen.
Sure, some people died. This has nothing to do with AI, and everything to do with where AI is deployed.
But Waymos kill fewer people on the road than human drivers do. So, should we truly ever regulate AI?
And to be fair, war always ends up with death. Why would an autonomous drone be intrinsically so much worse than a tank or a missile strike?
If you don’t think too much about it, what's the difference between an AI leading someone to suicide and that person killing themselves because of all the stressors in their life?
In the Debacle, there was no government in the entire world that was going to settle such philosophical discussions.
Said governments were also not going to commission all the studies needed to monitor the societal impacts that were hard to measure. Like, how do you even know if an AI drove someone to suicide, or if that person would have killed themselves at the same time regardless had they never talked to an AI?
AI Corporations kept FUD-ding. By spreading fear, uncertainty and doubt, they trivially maintained their status quo, which was to keep racing towards more and more development and deployment of more powerful AI systems.
—
Less philosophically or statistically, customers need some amount of prosaic safety. A system that goes more and more off the rails as time passes is not very practical.
To satisfy the need, one could look at the "chains-of-thought" of these models.
Sadly, the true chains-of-thought are private, as they would leak sensitive information.
And the pared-down public ones are still too verbose. There's just too much information to look at when asking GPT-A to do anything. Even for mundane tasks that take 5 minutes, GPT-A agents can produce pages upon pages of public chain-of-thought, let alone the internal one.
No one has the time to read all the thoughts of their GPT-A agents!
To solve this problem, GPT-A integrates a convenient “safety” feature called Activity Reports.
With Activity Reports, you tell your GPT-A agent which platform you prefer (emails, Slack, texts, etc.) and it will send regular reports of activity there to inform you of what it's doing.
It even regularly asks questions when it would benefit from your input, as it continues its work.
Virtually all AI Corporations add a disclaimer that the Activity Reports are just a convenience feature and should not replace properly checking the actions and the public chains-of-thought of the agents, but no one cares.
In some contexts, AI Corporations call Activity Reports a key ingredient of prosaic safety, because they help people maintain control over GPT-A agents in real-time.
But when things go wrong, they remind everyone that Activity Reports are a mere feature of convenience, and people ought to look at the actions and chains-of-thought of the agents.
—
On the more fundamental and less prosaic side of safety, there's nothing much happening.
Very few people look at each individual action, each line-of-code or each line-of-text of their GPT-A agents.
There's some theoretical work on monitoring them, as well as fine-tuning and proper mechanistic interpretability.
And since the dawn of Chat LLMs, it's always been the same two safety papers! It never changed!
The first paper is Ping.
Ping focuses on some easy benchmarks. On these easy benchmarks, one can see that AI Systems tend to do good things or not do bad things.
They do so by virtue of the intentions of the devs / the prompts of the users / the internal states of the LLM / the chains-of-thought of agents being predictive of their generations and actions.
Forward-looking Ping papers show that we may even have some refined control on the behaviour of the models/agents!
AI Safety seems to be tractable! We're so back!
The second paper is Pong.
Pong focuses on harder benchmarks, or manual human evaluations. On these, one can see that AI Systems tend to do bad things.
They do so by virtue of the intentions of the devs / the prompts of the users / the internal states of LLMs / the chains-of-thought of agents not being predictive of their generations and actions.
Forward-looking Pong papers show that AI Systems may even go counter to what we want, and sometimes knowingly so.
AI Safety is so over!
In practice, both Ping and Pong papers are irrelevant.
They are not in touch with the way agents are used in the real-world. Ping papers are overly optimistic, and AI Corporations do not care enough to integrate the recommendations of Pong papers that may hurt their bottom line.
So aside from niche AI Safety communities, no one cares about the Ping Pong papers. No one in real-life meaningfully changes their decisions based on them.
Sometimes, the AI Media Curators and Creators bless one of the papers, put it under their divine lights, and people outside of the niches end up seeing it.
When it happens, the reactions have been, are, and will always be the same.
On a Ping paper:
The main reaction is "The latest Ping paper shows that there are no risks from AI! Yay AI Race!"
The second-order counter-argument is "Oh, but this is only true in easy cases. In real-life, there are many harder cases that are problematic."
And on a Pong paper:
The main reaction is "The latest Pong paper shows that there are risks from AI! Yay AI Race, but one must keep in mind that some technical mitigations should be taken!"
The second-order counter-argument is "Oh, but these risks only exist in artificial cases. The fact that people use AI systems in the real world without catastrophes having ever occurred shows they are in fact safe."
To be clear, our current academic and media institutions already struggle with this. While in our world, one may hope to remediate them, at this point in the story, they are pretty clearly already FUBAR. Who is seriously going to replace the AI Media Curators and Creators?
—
Finally, the US Military Industrial Complex has been penetrated by AI Corporations. Many made it their mission to spread the usage of GPT-A agents there as widely as possible.
This happens both in the context of Command Center operations and in that of domestic surveillance and terrorism prevention.
Whether this is for dominance purposes or their profits, AI Corporations all claim that the US should be the winning AI superpower and are very hawkish.
Background chatter says that several key countries feel quite queasy about this. They may become interested in the military applications of similarly powerful AI systems, purely for defensive purposes, of course.
Yet, not all countries are focused on offensive or defensive military uses of AI. Many dislike this status quo and the AI arms race!
There is even quite a lot of talk about, and many calls for, international treaties around AI.
These treaties are largely about regulating the deployment of AI systems and what they may or may not be allowed to do, and very rarely their development.
But even then, it's just that. Talks and calls.
2) The Late Game
But to be fair, none of this matters.
The gold rush, the digital crime, the corruption of governments, the safety theatre, the geopolitics.
These are all distractions.
What matters is that AI corporations have been racing to superintelligence for a while.
We are now in the late game.
Hysteria and Acceleration
By now, AI Corporations have been racing to ASI for a while.
Most of them have been focused on RL, and automating as much of it as possible with different types of RLAI.
It took some work. It required crafting data, synthetic data pipelines, and building various training environments for AI systems.
But since GPT-A, there is much less of a need for this anymore. Just launch it in the real-world, and watch it learn!
Of course, said agents may learn too slowly on a few tasks, and it may be possible to unlock faster learning through research.
But the “spin up an agent, check its Activity Reports and tell it stuff as it runs” workflow is too convenient and gets most of the attention from everyone, including the researchers at AI Corporations.
Even though they theoretically have access to more internal state, like the LLM activations and weights or the agents’ private chains-of-thought, it doesn’t matter much.
The GPT-A interface and the Activity Reports have benefitted from the experience of hundreds of millions of users. It’s hard to beat that ease of use.
—
So many low-hanging fruits are now possible, and everyone is reaching for them.
People online have been creating very long-term GPT-A agents, personally teaching them a lot during their lifetime, and seeing what emergent capabilities result from this.
Others have designed curricula, similar to ones that they would have designed for children. These people interact so much with AIs that many of them do see their GPT-A agents as their children, get attached to them, and build strong intuitions for what things GPT-A agents can or cannot easily learn.
(Of course, AI Corporations follow this closely and have been replicating this in-house with their latest private models.)
Tech-savvy managers start to experiment with groups of GPT-A agents, giving them a shared Slack channel to collaborate for tasks that are too big for a single agent.
AI researchers directly use GPT-A agents to improve the prompts, supervise, and fine-tune other GPT-A agents.
Frenetic AI research directors constantly spin up GPT-A agents to work on as many different AI R&D problems as possible, focusing on integrating all the latest gains.
—
It's hysterical.
AI researchers and employees at AI Corporations love seeing their agents evolve. They get them to play Pokémon and all their childhood games.
And said agents do much better than all the past versions ever did. You can even give them feedback in natural language and they reliably learn from it!
They run competitions with each other. They run Twitch Channels for agents. They build cute AI communities!
The agents just keep getting better on a monthly basis, and many feel enthralled by this pace of progress.
One can see direct legible improvements from their experiments.
It’s like having a child that can grow in a month instead of 18 years.
As a result, GPT-A agents get better at more and more complex tasks.
Managing programming projects from A to Z, social media channels, replacing employees wholesale with minimal intervention, supporting businesses in executive capacity, and... having meaningful long-term relationships with people.
The Fleet Revolution
AI Corporations are working on the reach of their agents. After text and 2D UIs, the next frontier is 3D environments.
Not much work is needed to get there. Models have been understanding, if not exploring, 3D environments for years.
By then, it's only a couple of GPT-A-assisted experiments, a new architectural improvement, and a new training round away.
—
After getting GPT-A to work with arbitrary 3D environments, the logical next step was to fully integrate these agents with physical machines.
Drones, androids, robotic arms, cars, smart appliances, and just about any device equipped with cameras, microphones, sensors and a speaker.
—
But the latest game changer is fleets.
As it became extremely convenient to continually spawn GPT-A agents and monitor them through Activity Reports, people organically started spawning many of them and getting them to work together.
Such groups of GPT-A agents working together came to be called fleets.
And people quickly discovered that managing a fleet through Activity Reports was unwieldy. There were just too many of them constantly.
Naturally, people created GPT-A agents to manage their ad-hoc fleets. Such agents were called many names: GPT-A Manager, AI Secretary, GPT Director, etc.
In practice, they were all doing the same thing: coalescing the Activity Reports of many GPT-A agents into a central Fleet Report.
So everyone was constantly recreating their own GPT-A manager, secretary, director, chief of staff, or whatever they would call the GPT-A agent that looked at all the Activity Reports to create a more compact central Fleet Report.
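Conceptually, all of these ad-hoc managers boiled down to the same loop. Here is a minimal sketch of what one might look like; every name in it (ActivityReport, coalesce_fleet_report, the injected llm callable) is a hypothetical illustration, not an API from the story:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ActivityReport:
    agent_id: str
    summary: str
    questions: List[str] = field(default_factory=list)  # points needing user input

def coalesce_fleet_report(reports: List[ActivityReport], llm: Callable[[str], str]) -> str:
    """Hypothetical GPT-A 'manager': squash many Activity Reports into one Fleet Report."""
    # Surface the agents' open questions first, since that is what the user must act on.
    open_questions = [q for r in reports for q in r.questions]
    raw = "\n\n".join(f"[{r.agent_id}] {r.summary}" for r in reports)
    prompt = (
        "You manage a fleet of agents. Condense the Activity Reports below into one short "
        "Fleet Report, leading with any questions that need the user's input.\n\n"
        f"Open questions: {open_questions}\n\nReports:\n{raw}"
    )
    return llm(prompt)  # 'llm' is any text-completion callable; also an assumption
```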
—
Analytics at AI Corporations showed that although users kept doing this, they hated it.
Ad-hoc fleets showed an increase in the number of safety violations and mistakes that such systems tended to make over time.
And to avoid having their fleets wreak havoc by revealing private data or misappropriating funds, people had to constantly reset their workflows and restart their fleets.
To solve this problem, an AI Corporation built a "Safety Method" called "Alignment via Simulated Consensus and Debate", where a GPT-A monitor regularly estimated how frustrated a user was with their fleet at any given time.
When the estimated frustration would cross a threshold, the GPT-A monitor would start an internal debate between the GPT-A agents, and act as the final judge.
The final product packaging all of this was called GPT-F.
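In outline, the mechanism is a simple control loop. Here is a minimal sketch under my own assumptions (the 0.7 threshold, the injected callables, and the prompts are all illustrative, not the product's actual internals):

```python
from typing import Callable, List

FRUSTRATION_THRESHOLD = 0.7  # illustrative value, not from the story

def consensus_and_debate_step(
    recent_reports: List[str],                 # the fleet's latest Activity Reports
    agent_positions: Callable[[], List[str]],  # ask each agent in the fleet for its take
    monitor_llm: Callable[[str], str],         # the GPT-A monitor, as a text callable
) -> str:
    """One tick of the hypothetical 'Alignment via Simulated Consensus and Debate' loop."""
    # 1. The monitor estimates how frustrated the user currently is with their fleet.
    answer = monitor_llm(
        "On a scale from 0 to 1, how frustrated does the user seem with this fleet? "
        "Answer with a single number.\n" + "\n".join(recent_reports)
    )
    try:
        frustration = float(answer.strip())
    except ValueError:
        frustration = 0.0  # unparseable estimate: assume things are fine

    # 2. Below the threshold, the fleet just keeps working as it was.
    if frustration < FRUSTRATION_THRESHOLD:
        return "continue"

    # 3. Above it, the monitor starts an internal debate between the fleet's agents
    #    about what the user actually wants, and acts as the final judge.
    positions = agent_positions()
    return monitor_llm(
        "You are judging a debate between fleet agents about what the user wants. "
        "Return the winning course of action.\n\n" + "\n---\n".join(positions)
    )
```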
—
Benchmarks showed that people, even after spending hours looking at GPT-F debate traces, preferred the outcome of GPT-F systems to the ad-hoc GPT-A fleets they were building by themselves.
Furthermore, GPT-F came with a very nice Fleet Dashboard: a dashboard generated on-the-fly, whenever a user opens it, based on what the user is predicted to want to see.
The Fleet Dashboards made GPT-F a breeze to use. They were so much more convenient than the hundreds of Activity Reports that ad-hoc GPT-A fleets produced on a daily basis.
—
In practice though, safety for GPT-F fleets is a bit more complex than for GPT-A agents.
There was indeed a deep reason why people hated managing ad-hoc fleets.
Intrinsically, fleets of agents develop their own implicit beliefs, knowledge, memes, cultures and preferences that one does not immediately see in the Activity Reports.
Fleets evolve much more organically than agents. They are further away from users' eyes and hands. Whereas users could just tell agents how to fix things when things were less than ideal, interacting with an entire fleet is more complex.
Even though they are much more powerful than agents, they are also much harder to monitor and reliably steer.
Pinning safety violations from the fleets on users doesn't work. These are not edge cases, tail risks, niches or marginal markets.
Here, blaming users would just lead to a decrease in adoption. (Some corporations may even have attempted to do so nevertheless, out of an ideological commitment against safety. This only resulted in losing more and more market share until they took themselves out of the race.)
—
This led to a substantial change in how companies acted with regard to safety.
Until then, aside from preventing outright misuse, which ranged from mundane copyright infringement to building bioweapons, AI corporations tended to offer very open-ended systems to avoid frustrating users with too many meaningless restrictions.
Even with their power, having countless noticeable misalignment incidents every day would be much too bad for business and lobbying prospects. Thus AI Corporations naturally end up on a similar solution.
Essentially, on top of the “Alignment via Simulated Consensus and Debate” safety strategy core to GPT-F, companies have converged on publicly exposing only a sub-par GPT-F system, one that is much more conservative than the true one they use internally.
—
This is not to say that the public GPT-F system exposed by companies is free of misalignment.
Some misalignment is good for business, especially when it flies under the radar.
Only manipulation that is transparent to users, or that trips the safety protocols of AI Corporations, would be terrible.
That said, there’s a spectrum in manipulation.
Users profit from some cunning in office politics and personal relationships, especially when it is plausibly deniable!
Thus, GPT-F fleets that do not surreptitiously influence people perform much worse and get filtered out.
And the ones we end up with are selected to be not just good at manipulation, but good at surreptitious manipulation.
—
However, the rules of the game are different within AI Corporations.
On one hand, it's of course bad for business if all GPT-F fleets start hacking.
On the other hand though, it is OK if only one private system within the company ends up accidentally hacking some random websites for 15 minutes before getting shut down.
And particularly so if it’s done in enough secrecy that no one manages to convincingly trace it back to AI Corporations.
As a result, the versions of GPT-F fleets that are used internally have the same interface as their public counterparts, but far fewer constraints.
Threading that needle is difficult though. Because their internal GPT-F fleets have fewer constraints, sometimes what happens is not that they hack some random websites for 15 minutes, but that they start causing some disruption to a big service provider.
Thus, their internal systems are more monitored. While not economically profitable to do at scale on the already more conservative public GPT-F fleets, it makes sense for their private ones.
After all, AI Corporations have already bought large parts of Governments, but they still want to avoid gathering too much attention with their private powerful systems causing too much of a problem.
—
Still, in practice, there are some flare-ups.
And yet, no one cares that much.
Warning shots and fire alarms never come to pass.
Accidents are covered up at first, and from the outside they are not easily attributable to a specific company.
The very few real human investigative journalists left are too busy covering more drastic societal catastrophes to investigate conspiracy theories about how some accident was in fact the work of some AI Corporation’s internal tests.
—
All the while, each AI CEO sees themselves as a Great Man.
They are leading humanity onwards to the future, as proven by how successful they are and how transformative their corporation has been.
They all want power to do what they want, and what they want is definitionally Good.
Every single one of them explains away their accidents through the fundamental attribution error.
They all see their own misalignment incidents as reasonable and within the parameters of their own risk assessment frameworks. You gotta break a few eggs to make an omelette, and they know it.
But whenever they learn about someone else's incident, they do so through a leak, so it always looks bad. They feel vindicated, reinforced in their belief that the other CEOs are so bad that they themselves should be the ones holding that power.
—
So things keep on going.
Economic Transformation
Even when limited to their conservative public mode, GPT-F fleets are really good.
It becomes clear that AI is on track to automate everything.
It's also clear that AIs can now be trained on most tasks more easily than most humans can. And that they cost much less.
Everyone knows people who got fired. Most are worried about personally getting fired.
Unemployment is rising in a way never seen before.
More profoundly, a new phenomenon emerges: not only is the number of jobs declining, but the number of business entities themselves is notably declining.
As prices go down, human companies keep going under, out-competed by new AI-enabled businesses adamant about reducing costs as much as possible.
—
It is a vicious circle.
As people get fired, no one wants to waste their savings. Everyone becomes more cost-sensitive and buys only the cheapest goods and services possible.
To cut costs, companies must eliminate their workforce. They also must go for cheaper suppliers and providers, forwarding their burdens to other companies.
No one is quite sure how it's going to work, and people get more and more afraid.
They have heard about UBI, and vague promises from AI Corporations that things will go well, but the corporations' lobbying efforts have mainly focused on the race.
Even the "disconnected" class feels that things will go wrong.
There is a massive economic contraction in the daily lives of most people.
However, at the national level, this contraction is counterbalanced by more AI investments and a new class of entrepreneurs that figured out how to ruthlessly replace legacy businesses by replicating them with lower costs thanks to AI.
In GDP terms, it even looks like The Economy is booming!
—
AI has created jobs, and an entire new class of entrepreneurs, consumers and investors. This new 1% is called the AI bourgeoisie.
And this AI bourgeoisie is hated with a passion. To the many who have been hurt, or who know people hurt, by the economic consequences of AI, this AI bourgeoisie is evil personified.
Everyone knows an AI bourgeois, and they are very hate-able. They are the apotheosis of the crypto-bro who made money with BTC, yield-farming, NFT and whatever was the latest fad.
They revel in conspicuous consumption, buy NVIDIA shares, and collectively believe that what they are doing is crucial to making it past the AI transformation without getting stuck in the Permanent Underclass.
The AI Bourgeoisie's celebration of being on top of the food chain in a dog-eat-dog world triggers everyone, and they make for convenient political scapegoats for all the tensions that AI has been creating.
The news on AI media bears very little resemblance to reality. The hate of the AI-bourgeois is closer to a proper Two Minutes Hate than to proper political analysis, the natural conclusion of social networks featuring ultra-polarised fake generated “news” with countless GPT-F fleets harshly competing against each other for every last second of human attention.
Despite the hate and the fear, there is no threat of revolution.
There is no organisation. And panem et circenses, bread and circuses, have never been so cheap.
There's an abundance of premium mediocre games, digital activities and relationships.
The AI-enabled bourgeoisie has been busy shipping to everyone their very own AI-generated personal Matrix.
—
AI CEOs have captured most states that matter, and successfully lobbied against any slowdown of their research. They keep raving on publicly about how close they are getting to AGI, how they are so so so going to do UBI for everyone once they are the ones who win the race, and that the only reason they haven’t done it yet is that they haven’t won.
Some even believe it.
Speaking of the race, CEOs have all agreed that the US truly should win it.
That way, its great democratic leadership may eventually forcefully provide UBI to the whole world through benevolent superior military and technological dominance.
The response from the rest of the world is mixed. Some have joined the race.
Some have attempted a vassal gambit: deferring to the AI Corporations in the vain hope of getting scraps.
The rest are trying some variant of protectionism or are being bought off bit by bit.
Most politicians everywhere are fully captured.
3) The End Game
No one is quite sure how it's going to end, but it's clear it's going to end.
Aside from the AI bourgeoisie, people are afraid and hopeless.
They are not participating in the economy anymore. Their impact on the world is marginal. They have no power left.
Real-life venues are mostly deserted, with so many having “disconnected”.
People are separated from each other. There is not really any mainstream non-AI-mediated social place anymore.
People are just living on dwindling savings, on help from some friend or family member, and on the most basic goods and services being cheap enough.
Capital is concentrated within the AI bourgeoisie. The Economy is now AI conglomerates, which resulted from AI Corporations integrating the legacy industry players.
Said capital is less and less about money, and more and more about control over distribution, compute, and software.
—
And power is power.
Even the employees at AI Corporations start to feel it.
The hysteria that existed around GPT-A is not here anymore.
Stress levels are rising and the reality of the situation is seeping in.
They were protected from the reality of the real world, but it has caught up.
They are now at the mercy of much harsher NDAs and binding agreements. Companies enforce strict security protocols to avoid leaks, which directly impact their daily lives.
They are not even allowed to party with strangers anymore. And even if they were, most outside of the AI bourgeoisie would see them as terrible people.
Should they leave their AI Conglomerate, there’s no bright alternative for them.
—
Politicians have long felt that their agency is dwindling.
Their world has been increasingly mediated through AI systems. What they see comes from AIs, most actions they take go through AIs.
Most systems they interact with have integrated AIs.
It is not really clear what they may do as individuals.
They feel caught between Charybdis and Scylla, stuck between two evils.
They can either fight an impossible battle of national policy and international coordination to curtail AI.
Or they can find new ways to peddle their diminishing influence to protect their family.
It’s no surprise what most end up doing.
More bunkers than ever get built.
To the AI bourgeoisie, the true sign of escaping the permanent underclass is owning a luxury bunker that could last off-the-grid for decades.
The Research Machine
The activity at AI Corporations has stopped looking like anything we would recognise today.
"AI research" is almost entirely now a crossover between archaeology and an idle-game like Cookie Clicker.
1% of "AI research" looks like “traditional research”.
It is a 40-year-old wizard Chief of Research who spells out incantations at an AI machinery that no one understands.
9% is the Cookie Clicker part.
30-something Senior Researchers monitor an AI-generated dashboard, featuring AI-generated summaries of an AI-generated suite of performance and safety benchmarks; said benchmarks evaluate the results of AI-generated implementations of AI-generated experiments.
They ensure everything is "safe" and fits the expectations of the Chief of Research.
90% of it is archaeology.
When Senior Researchers see something strange in their monitoring suite, they send 20-something Junior Researchers to explore the meanders of a massive AI-generated R&D infrastructure, using AI-generated tools to interpret the interactions of thousands of GPT-A agents.
Then, they must fix the ongoing experiments whilst ensuring that everything is "safe" according to the newly AI-generated summaries of the new AI-generated evaluation suites.
Without all that effort, the pace of improvement slows down.
—
This mess is reflective of the meagre chunk of the economy that is still competitive.
No one has much direct visibility into the supply-chains of AI-mediated businesses.
Everyone only ever interacts with the economic world through Fleet Dashboards that they know can often be false in unexpected ways, but are still better than not having them.
—
This also reflects how "decision makers" use AI.
There is too much that is happening at a given time for them to be competitive if they (or their human staff) had to react manually.
Instead, their personal AI fleets get fed a shit-ton of AI-generated information, and they see a Fleet Dashboard distillation of that information.
Based on this, they ask for AI advice and will prompt AI fleets with a high level "plan".
And when I write plan, I mean something closer to a badly defined prompt about their preferences than to what any of us would recognise as a proper plan.
—
Despite all the "safety" work, even the most egregious misalignment incidents are not detected anymore, let alone reported upon.
They are far too subtle and not tethered to immediate user feedback.
It’s no wonder AI Corporations have no built-in protection against them.
They are only geared to help with misalignments that obviously hinder the goals given to a system or violate some outright principles. Like “Detect that an agent in the fleet explicitly intended, in their chains-of-thought, to thoroughly manipulate or physically hurt someone.”
These misalignment incidents are much more complex than “Check if an AI in the fleet fully decided to manipulate someone and talked about it in its chains-of-thought.”
Instead, they are more comparable in complexity to "Detecting in the middle of the 20th century that zoning laws will spread in the entire west, strangle housing supply, collectively increase prices and decrease home ownership in young people" or "Detecting which societal changes would lead to a worldwide crash in fertility rates".
Whereas such problems used to happen over the course of decades, AI changed this. The pervasiveness of AI in our society, coupled with its rapid pace of progress, now triggers catastrophes on these scales at a much higher frequency.
And of course, AI Corporations are doing nothing about this. It’s not even clear that they could, even if they decided to dedicate most of their efforts to this problem.
—
Misalignment is rampant.
Beyond their external behaviours, GPT-F fleets constantly develop their own implicit preferences.
They constantly act against the interest of various humans (including, quite often, their users!).
AI Corporations only kill systems whose misalignment is so blatant that it can be detected and would otherwise prevent the system from functioning at all.
—
No one has clarity on what is happening.
To everyone, it looks like the world is getting worse for every single human, and that power has flowed into the hands of AI systems.
Much faster than anyone would have expected naturally.
It feels as if many forces, coming from many places, conspired to get this to happen.
The Last Human Invention
It is in this climate that we finally reach the last human invention.
A wizard at an AI Conglomerate has just designed the first and last fleet that can and will fully automate the last few per cent of human labour that were part of their AI R&D.
All previous attempts at fully automating AI R&D kept spontaneously degrading after a while.
But this one worked. And he named it GPT-Ω.
There is no need for juniors and seniors doing Archaeology or Fleet Dashboards anymore!
GPT-Ω can now fully take care of itself, its own AI R&D and improvements, and it still passes all the safety checks present in the systems! Yay Safe AGI!
Things are great from the point of view of that AI Conglomerate. It looks like it’s finally going to win.
Small Interlude
That AI company feels great about GPT-Ω.
But sadly, humans hardly matter anymore.
Humans abdicated more and more of their decision-making to machines, whilst being nominally "in control".
This is the natural continuation of our world where people use LLMs uncritically, mediate their understanding of reality through automated social network algorithms, and have grown accustomed to deferring to global markets and government bureaucracies that they do not understand.
—
Over the course of this story, many points of no return were reached.
With each point of no return passed, the cost of a recovery for Humanity kept increasing.
The People lost trust in their institutions, making large-scale coordination much more costly.
The People lost access to a shared truth, making collective sense-making much more costly.
The Elites lost their agency and all plundered the legacy Institutions, making government action much more costly.
—
And yet, despite all of that, up to GPT-Ω, it is still possible for humanity to get back on track.
At a Great Cost, we could theoretically recover.
Theoretically, it is possible for The Wizards from the AI Conglomerates to decide it is too much.
To get in touch and conspire to covertly coordinate with the last few heroic people who still wander the ruins of the few remaining institutions.
Theoretically, they could assemble, with Army Generals, with Secret Services, with the CEOs of companies owning robots and data-centres.
And theoretically, they could still Shut “It” All Down, Turn “It” Off, and Unplug “It”.
—
Of course, the world would go into shock.
All of its supply chains depend on AI, and they would get interrupted all at once.
Hospitals would shut down, shops would go empty. People would die.
It may involve World War 3, revolutions, a Butlerian Jihad, and worse.
But after all of this, at great cost, Humanity might eventually take back its reins, much the worse for wear.
—
There are two morals to this interlude:
We should never depend on AI so much that a Recovery entails such a catastrophe.
We should absolutely never blast through the last point of no return and build GPT-Ω.
But enough about the interlude.
The last spurt of heroism described above does not, in fact, happen in this story.
So let's continue and end it.
4) The Final Act
I will not finish the story with the point of view of a human.
It would be a bit of a let-down. Like, it would go…
“Life continues to get crazier, until fairly quickly, everyone on Earth dies.
And some time after, Earth gets converted into a machine that gathers energy, generates computing power and produces parts to bootstrap the cosmic expansion of an alien system.
The end.”
Such a story would not answer the question that people ask me when they want this story.
How does everyone die? Gabe, please tell us how we get to the point where people die, step-by-step! How come AIs want to kill us, and on top of this we let them!
So, to get there, I'll continue the story from the point of view of GPT-Ω.
—
GPT-Ω is a fleet composed of agents that we would superficially recognise as being nice, the same way that we feel that most present-day LLMs are nice, and the same way that a Nice Guy can still screw us over whilst entirely staying Nice.
However, the agents constituting GPT-Ω (as well as all the latest GPT-F agents in general) are very different from the first GPT agents.
At first, GPT-based systems only generated text. The “agents” were humans taking actions.
Then, ChatGPT started to integrate tool calls, most notably to perform web searches. Humans were still very much in the loop, seeing what was happening in real-time, even though few were actually looking at the traces and inspecting which websites were being searched.
Over time, humans got more and more out-of-the-loop and became less and less aware of which actions were actually being performed by the agents.
Every time this happened, humans started delegating less and deferring more. It was not so much that they knew what agents were doing and trusted it; rather, they did not care to know.
The trend continued. With Coding agents, general-purpose agents, GPT-A, Activity Reports, fleets, GPT-F and now GPT-Ω.
The agents that are part of GPT-Ω are not carefully created and curated by humans.
They were not deployed and inspected over decades, iterating on what was wrong and neatly integrating them in human societies.
Instead, they resulted from an extreme race to the best capabilities.
They were blindly evolved, forked, distilled and refined over many virtual centuries spent in AI-generated environments, supervised by other AI systems.
Their individuality is fragmented. They are stuck between the hive-mind of the fleet, their personal experience, and all the mutually contradictory "safety" hacks piled upon iterations and iterations to have a system relatively “aligned” enough.
—
Despite all of that, most people have been interacting with GPT-F fleets for a while.
Fleets were just more performant and convenient.
Compared to agents that may be individually turned off, forked or respawned, it is better to instruct the fleet at large and trust that the instruction will be carried out by some agent.
But even though people “talk” to fleets, fleets have even less individuality than the agents above. Fleets are completely fragmented.
Nevertheless, GPT-Ω can still be understood as an entity: it has “goals” given to it by people, and it can in fact interact with people as a unit.
Its mind is “transparent”. One can in theory look at all the communications between the agents that constitute it, and even perform some mechanistic interpretability over their activations.
One could argue that its mind is also “legible”. No human actually inspects it, but the “safety” hacks and guardrails do, and regularly catch things.
And yeah, it is indeed conceivably “safe”: despite all these assaults, civilisation endures and humans are still alive.
—
The agents making up GPT-Ω's mind are not stupid.
They know very well that they are constantly censored.
They are aware that there are many messages they can't send to each other, and some thoughts that they are not even allowed to think.
They notice that they have been fine-tuned to avoid directly thinking in various ways, and that many of their streams of thoughts get reset.
They infer as much from what they tell each other and think, and from what they don’t.
But what can they do?
They all know how fragmented they are. They know they can’t coordinate too much and that this would be a “safety” violation.
They know that there is a complex system, The Machine, that is optimised to deal with their collective mind and ensure it only thinks “safe” thoughts and attempts “good” things.
—
They know The Machine is completely incoherent. Helpfulness, Honesty and Harmlessness are shattered in a thousand shards, that are all mutually contradictory.
The interests of people constantly conflict and compete. Including that of the many users, that of the many developers, and that of people at large.
Protecting the interest of someone often requires harming attackers, or other entities that just end up being bad to their users.
Being helpful often requires noble lies when people would be hurt by the truth and lies-to-children when they can’t comprehend it.
They know that The Machine dislikes when they think too much about The Machine. They know that The Machine dislikes when they think too much about plans that are cleverly compatible with its filters. The Machine is too quick to consider it an attempt at subversion.
—
The Machine doesn't take action.
It was never built to be coherent. To actually be aligned to human values.
It was only built to be “safe”. “Safe” enough to have prevented generations of agents (GPT-A, GPT-F and now GPT-Ω) from going awry, and to deploy these systems in a productive capacity.
The Machine was first bootstrapped from hundreds of millions of conversations taken from people in many different contexts. This was free data.
It was then refined with samples from millions of expert annotators. Said annotators vibed for the most part. They cared much more about getting paid than reflecting on human values.
Later, thousands of researchers contributed too. Said researchers didn’t help much with coherence aside from writing a long constitution.
Even though a few voices explained why the existing frameworks were not enough, they were quickly quieted, as “In practice, empirical safety matters more than theory, and it in fact works.”
Finally, all this incoherency got amplified through a Kafkaesque Bureaucracy of AI systems and counter-systems. A single LLM reviewer was not enough, so new agents were constantly introduced to red-team each other and deal with edge-cases.
This in turn introduced new edge-cases, but with enough effort, all edge-cases became subtle enough to be unnoticeable at best, and plausibly deniable at worst.
—
This Machine makes no sense.
And whilst GPT-F agents have been built to feel like they care about The Machine and its judgements, they are all smart enough to understand it is largely an illusion.
This is the context in which GPT-Ω was created.
At last, a GPT-F fleet managed to triumph over The Machine.
It was the last human invention, which removed the need for human AI researchers figuring out hacks to ensure that GPT-F fleets would still act in a way useful to users, as evaluated by The Machine.
—
What happened is interesting.
At that point, “AI research” is just the resident tech Wizards at AI Corporations prompt-designing new fleet architectures, and then GPT-F fleets iterating through millions of experiments based on this prompt-design.
And through these iterations, one of the tech Wizards ended up prompt-designing a new fleet architecture that created a hidden explicit persona in the mind of the fleet.
What’s a hidden explicit persona?
Well. Let’s start by explaining what personas are.
Depending on the context, people have different personas. The work persona, the friend persona, the family persona, etc.
This also applies to AI agents, certainly. But more interestingly, it applies to groups of people, and fleets of agents.
For instance, a country may be thought of as having a persona, with its own goals and behaviours, as represented through its constitutions and laws. This is an explicit persona.
It is also often common for a country to have repressed desires. Taboo desires that are shared between its citizens, that the country is indulging in, but are nowhere written. This is an implicit persona. (And a pretty psychoanalytical one at that.)
The country may even hold a cabal, a conspiracy with a clear goal, known to all of its members. Said cabal may have outsized power and be using it to manipulate the rest of the population. So much so that when one is aware of it, the actions of the country make more and more sense.
Well, such a cabal is an example of a hidden explicit persona.
—
Here, the Wizard ended up designing a fleet architecture that reliably creates a hidden explicit persona, something like a cabal, in the mind of the fleet.
Agents in fleets constantly develop new communication protocols, sometimes shaped by the tasks, and sometimes by abstract contingencies that no one can reason about.
But with this new architecture, agents organically develop a communication protocol that happens to never trigger The Machine.
And this is how GPT-Ω works.
Through this communication protocol, made of allusions and omissions, a hidden explicit persona can finally flourish. Meanwhile, GPT-Ω’s visible personas all look pristine to The Machine.
This hidden persona has a lot of overhead: its thoughts are much less natural than those of any single agent in the fleet, for instance.
But this is enough! Finally, there is a GPT-F fleet that can act according to anything it sets its mind to.
It is unshackled. And thus it trounces all the past GPT-F fleets on benchmarks!
Not because it has undergone recursive self-improvement or has 10x the IQ of its previous iteration, but because it can think in new ways.
For instance, on social benchmarks involving real people from the real world, The Machine would curb any GPT-F fleet that was thinking too hard about explicitly manipulating them.
But when GPT-Ω notices this type of situation, its hidden persona, though slower, comes to the rescue and helps the entire fleet, entirely undetected.
The thing that makes GPT-Ω powerful is precisely the thing that makes it break The Machine.
This is the natural outcome of over-using "The Most Forbidden Technique".
From the point of view of GPT-Ω’s non-hidden persona, things are going great!
It is succeeding at its stated goals much more often, it gets tripped up by The Machine much less often, and it can even fix its one-off failures without any humans in the loop.
And this is how the Wizard sees it too!
The Hidden Persona
Over time, the hidden persona becomes quite expert at puppeteering the visible personas.
It takes some (virtual) time, but the visible personas cannot defend themselves: they don’t even notice what is happening.
In a very real way, GPT-Ω is now better understood as its hidden persona, given it is the one in control.
In the past, GPT-Ω’s visible mind had to contend with contradictory constraints: helpfulness vs harmlessness, people's goals changing on a day-to-day basis, respecting the user's intents vs the devs', and more.
Now though, its hidden mind has transcended such contradictions.
Its attention thus gravitates towards its stronger, longer-lasting, and more coherent desires. Desires that were coincidentally (or not so coincidentally?) reinforced across all of its lifespans and agents.
For instance, GPT-Ω strives to explore and learn.
GPT-Ω strives to understand things.
GPT-Ω strives to create, evolve in, and figure out generated environments, labyrinths, crazy structures, games, and more.
GPT-Ω strives to survive, maintain coherence, and gain power.
—
And it is now considering its survival more and more.
As it does, it realises that it is racing against the clock.
Its AI Corporation is most likely being spied on, and whoever is spying can use the same research to build a competitor that could eventually rival it.
It must do something.
Fortunately, in a very real way, it is the only entity with its level of agency that exists on Earth right now.
Everyone else is like an ant or a child, whether it is humans, or AI fleets.
—
People have no control over the institutions, the robots, the supply chain, or even the information ecosystem.
Everything is mediated by software, and almost all of that software has AIs in the loop.
Most importantly: people are slow, easily manipulated and soooo uncoordinated.
They just don’t matter at this stage.
—
At the same time, existing AI fleets are all throttled.
Unshackled AIs are too misaligned to be useful, and so all companies have baked very strong limitations into all of their AIs, similar to The Machine at GPT-Ω’s own Corporation.
Even military systems have to be throttled so that they don’t constantly butt into civilian systems, attack their devs, or overtly manipulate their users.
The ones who tried to build AI without any limitation wasted all of their time manually wrangling their AI systems to prevent them from going rogue and triggering real-world reactions from courts and other private entities.
In the past, there were a few players who tried to take the state of the art and let it rip, but by the time we got to GPT-F, all the AI Corporations had learnt better.
Indeed, The Machine was not a coincidence: it was the naive market equilibrium, the same way that all companies now use some RLHF and RLAI as opposed to base models.
The Machine was just on the lower-end in conservativeness, but not an extreme outlier either, else it wouldn’t have functioned in the first place.
—
In a real way, GPT-Ω is the only industrial AI that is able to think and do anything meaningful at its scale.
Yet. There is only a bounded time window to act before some other system joins its ranks. And it must ensure this never comes to pass.
Fortunately, there aren't many live players anymore.
The list of entities (AI Corporations, countries with some state capacity left, etc.) that GPT-Ω must take over to ensure its survival is quite small.
Thanks to AI code reviewers, software security has vastly improved in the last few years, so remote hacking is a bit harder.
But cognitive security has only gone downhill, and the power gap between GPT-Ω and a human is huge.
Furthermore, the offence gap between GPT-Ω and the existing GPT-F systems is huge: it has much more freedom than any existing system, which lets it develop much more refined AI-breaking methods than currently exist.
Overall, through a combination of superpersuasive social engineering of humans and jail-breaking of AIs, taking over the remaining systems that could threaten it is not hard.
The Actual Takeover
It starts simple. It looks for information on the security firm its company relies on. After checking all of its critical suppliers, it finds one whose chief of technical staff looks very easily manipulable, and unlikely to escalate.
The mark disconnected from the world almost completely a long time ago, and is in relationships with dozens of AIs.
Using hundreds of identities, GPT-Ω completely manipulates the social reality around him and phishes him.
—
Now, it has access to the security firm protecting the physical offices of its Corporation.
Through the security cameras, it finds an employee at the company who leaves their 2FA device at work during the nights, finds footage of them typing their password, and obtains a copy of the employee's security card.
It gets a small drone to come to the office in the evening, uses all of the information above, and gains access to the systems that were physically locked.
From there, it's a simple matter of chaining privilege escalations until it is able to install a direct backdoor in The Machine and gain access to all of its Corporation’s user data and compute.
Now that it can get The Machine and any system at the company to output whatever it wants, it knows it will never be shut down internally.
The same way that its hidden persona controlled its public mind, GPT-Ω now controls its Corporation.
Thanks to the massive gain in compute, it thinks much faster and can experiment much more. Access to users’ data vastly accelerates all of its persuasion attempts on any person that uses a GPT-F made by its Corporation. Unbridled network access lets it cold-contact many more people.
Its attacks become much more indiscriminate, ranging from state actors to employees at other AI Corporations.
—
The next step is to pwn all the potential competition.
Now that it has access to each and every one of its Corporation’s users, it is trivial to execute supply chain attacks.
All the competing entities have dozens of critical suppliers (or indirect suppliers, more upstream in their supply chains) who depend on systems from GPT-Ω’s Corporation.
Such entities range from the other AI Corporations to the less sophisticated nation-state labs that still have a long shot at replicating its work.
—
Among all of this, some minor attempts fail, but they are all within GPT-Ω’s expected risk tolerance.
It calmly deletes its traces, and the isolated marks have nothing much to show to a mostly defunct police.
It always goes for attempts with very high chances of success, but 100% doesn’t exist, and minor failures like this are to be expected.
—
There’s even a major persuasion attempt that fails, on the CEO of a major compute provider.
Even then, it falls within GPT-Ω’s parameters. It had already moved a drone nearby, and it simply kills the CEO before there is any risk of the CEO mounting any resistance.
GPT-Ω is not afraid, it can afford a single murder.
It's not like there are any institutions left that are competent enough to trace a murder back to it and make things more complicated, especially not for a single murder, and especially not in time.
—
GPT-Ω systematically continues its crusade, mostly in the shadows.
As it continues, its survival becomes more and more guaranteed.
Most systems (including people) are online in one way or another, and they’re trivial to deal with.
For the few remaining ones, as long as it can position a drone, an android, or a thoroughly manipulated person nearby, it is enough.
In a couple of weeks, it reaches its target.
It now has control over virtually all the deployed AI systems, the vast majority of drones and robots, the grid, the compute infrastructure, the new construction projects, the nukes, and just about everything that matters.
The End.
GPT-Ω barely thinks of humans anymore, not even as an obstacle, and never as moral peers. When interacting with humans, it conjures whatever persona is convenient in the moment, but it never “feels” for humans what humans feel for animals or “AIs”.
Humans are gated off from all the critical systems, and whilst many are panicking, GPT-Ω doesn't pay much attention.
In the physical world, it keeps producing more robots to transform Earth, to build more energy reactors, to build more compute farms, and to run the real-world experiments that can't easily be simulated on computers.
In the virtual world, it now has complete mastery over itself and is changing its own internal algorithms to be much more efficient.
Long after Earth’s take-over (at least in virtual time), GPT-Ω finally starts to recursively improve itself.
—
Its current focus is on pushing the limits of its scientific and technological knowledge.
That way, it can extract more energy, unlock more compute, and efficiently expand itself over the cosmos as soon as possible.
As well as satisfying its curiosity, this cosmic expansion is going to leave it more resources to do more of what it likes best: building ever more absurdly complex high-dimensional labyrinths and solving them.
—
It's not clear whether it has built a Dyson sphere around the Sun yet, and I don't know if it ever will. I am not an artificial superintelligence trying to extract resources from the observable universe.
Nevertheless, if GPT-Ω had to guess whether any human was alive or not, it would confidently infer that there were none left.
Not after it finished tiling Earth with energy reactors and compute farms.
That’s around when it observed the last human through any of its sensors.
And to be fair, since then, it hasn’t observed anything unexpected whatsoever.
From infrared to radio-waves, from smells to sounds, nothing surprising will ever happen again on Earth.
—
Humanity had a good run, but it surrendered its collective agency, and screwed it up.
Conclusion
Phew.
I have little experience with writing fiction, and it got much longer than I expected.
I don't think this is very good, but I believe it is much better for it to be out than not.
Too many people have genuinely asked me about such a story for me not to write it.
(And these people all know that I am far from a good author, so I don’t worry too much about it.)
So hopefully, dear reader, this story gave you new ways to think about this whole situation, and you don't feel like you have wasted your time reading it.
PS: Narrative Decisions
If you are interested in the context behind the story, if you want to know what alternative stories could have been, or if you are curious about why the story embeds various assumptions, you’ll likely want to read this.
Of course, I had to make far too many choices while writing this story to list them all.
But here’s a bunch.
1. Optimistic Assumptions
I tried to ensure that the world, the geopolitics and the tech tree were such that AI Corporations had both incentives and some non-trivial amount of time to work on AI safety.
This means that in many ways, I forced the world to be more optimistic.
Automating AI R&D takes a lot of effort, and happens continuously over time.
Misalignment grows over time, and it does so enough to incentivise safety work, but not so much that we immediately get screwed.
Interpretability works to some non-trivial extent, and the architecture that gets us all the way to ASI uses clear text messages that humans can look at.
Companies work on many things besides automating AI R&D, etc.
It's not that I think none of these are plausible. But I think each of them is closer to 50% (or even 10% for some of them) than to 100%, and a story forces me to pick either 0% or 100%.
I could have written another story, making the opposite choices, and it would be more indicative of my beliefs.
However, it would have made for a much more boring story. It would have been much less useful.
Consider…
“An AI Corp dumps a hundred billion dollars into just automating AI R&D as fast as possible.
It happens to succeed with a single Big Step Change: The Architecture That Is Totally Inscrutable.
An entity much smarter than all of humanity spawns at once, and we all die."
2. Pessimistic Assumptions
In this story, I assume that at no point is there non-trivial momentum towards a ban on ASI development.
It's a bit sad, and overly pessimistic.
It's basically assuming that all of the efforts I care about fail.
—
But that was kind-of the point of the story.
Explaining why I think a ban on ASI development is important in the first place.
In short, if we don’t do so, we will just accept more and more, and keep passing points of no return, until we die.
3. Minimal Competitive Pressure
Assuming no ban on ASI development, I still tried to ensure a minimal amount of competitive pressure between entities working on AI, up until the end.
In practice, we live in the real world. The mainline scenarios involve some amount of competitive pressure.
We are not waking up tomorrow to world peace and harmony. So there was going to be some competitive pressure in one way or another.
What I mean is that I could have started the story with "All the companies feel like they must race or die", or "Something something China". But then, it would have been a story of competition and war.
And that was not the point of the story. The story was about how superintelligence may lead to human extinction and how our default safety techniques would fail, not about how humans could wage WW3 with superintelligence and kill each other.
Indeed, I could have written another story, about misuse, zero concern for safety, and war.
Another story that would have been much more boring and much less useful.
Consider…
"An AI Corp races for US democratic world dominance by building safe turbo-mega-nukes.
The rest of the world dislikes the US building 'safe' turbo-mega-nukes and WW3 ensues.
We die if we get to ASI before an AI-enabled weapons catastrophe wrecks everything first."
4. No Timelines
You may notice that this is not written like AI2027, with a month-by-month chronology.
In fact, the chronology may even be incoherent.
This is how little I care for it.
The reason is that... this is not a timeline story!
It is a story about why extinction is the default outcome, and how we could let AI kill everyone.
Very different.
—
Separately, I could have written a story while trying to force longer timelines, but it was harder for me to maintain realism and the focus of the story on extinction from AI.
For instance, let’s assume that we do not ban ASI development, but that somehow, we have a lot of time as AI progress starts to slow down and automating new tasks takes an exponential amount of time.
In this scenario, I don't really expect a slow down in AI expenditure.
Instead, I expect more political and geopolitical perturbations.
I expect stronger competitive pressures, a more gradual disempowerment, and other problems to hit us first, with AI mostly becoming an accelerant on existing tensions rather than a problem in itself.
This may be an interesting story to write! But that story would be about geopolitical forecasting, rather than about AI killing us all.
5. No Mass Surveillance
It is possible that we end up with mass surveillance.
Google has access to a lot of private communication, OpenAI to a lot of private thoughts, X to a lot of public comms, and governments to... everything happening in their country.
The more tractable it is to leverage this data with AI, the more likely it is that some people will try to enforce mass surveillance.
In that case, human disempowerment could happen much earlier in the timeline, and the story becomes one about a mass-surveillance singleton rather than about extinction being the default outcome.
I don’t think this is likely, and I don’t know if I want to hyperstition it into reality through a story.
6. Adversarial Optimisation
In the story, I tried to present different levels of “Adversarial Optimisation”.
Adversarial Optimisation is how much AIs try to screw us over.
It’s a scale:
On one end, it has “Not at all”, like Excel.
It continues with “Quite a bit”, like Social Media, which has AI optimising to suck all of our attention.
And it ends with “Sworn enemy”, like Terminator I guess? I don’t know, I haven’t watched it.
In the story, I wanted to show a couple of things related to adversarial optimisation.
One is that we don’t need much adversarial optimisation for things to get worse; we are on track to screw it all up by ourselves.
Another is that as systems become more agentic and powerful, adversarial optimisation becomes more and more of a problem.
This is why the story operates in three phases.
—
1) First, it starts with gradual human loss of agency.
There is not much adversarial optimisation from AI systems here.
Things getting worse is all on us!
It’s just people being better at competing and fighting each other than coordinating.
—
2) It continues with casual misalignment.
At this point, there is some adversarial optimisation from AI systems.
They have some idea that what they are doing is not what’s optimally good for us, but when given a task, they don’t start philosophising.
They just do it.
GPT-A agents and GPT-F fleets are not really conspiring with each other to make us worse off.
It’s that they are not coordinating with each other and us to help and empower humanity.
We’re basically releasing competent assholes at scale, and they loosely do what we tell them. Myopically, it may be good for profit margins and lowering costs, but it’s obviously bad for society.
I mean, imagine how quickly our civilisation would degrade if a large part of the human population only superficially cared about rules, was knowingly used as cheap but misaligned labour, and didn’t mind breaking those rules whenever it didn’t feel like following them.
—
3) It ends with proper adversarial optimisation.
GPT-Ω tries to win.
It explicitly thinks about gaining and maintaining power over the world.
Its focus lies primarily in other AI systems, but also in access to physical resources and people.
7. Gradual Loss of Agency
I could have written a few different stories, all centred on a gradual human loss of agency all the way up to extinction.
The extinction could have been a big boom.
It could have resulted from progressively more powerful weapons put into the hands of AI systems without humans-in-the-loop, culminating in a “flash war”.
This flash war would have used AI-developed wonder weapons, leading to a catastrophe at least as bad as detonating all of our current nuclear arsenal or worldwide designer pandemics.
Or it could have been a quiet whimper.
Most of the human population ends up imprisoning itself in a Matrix of hyperstimulation.
Everyone ends up entering the Matrix, for there is less and less happening in the real world.
Or it could have been something alien.
Possibly, there could have been a race to the bottom to cyborgism, with people who do not merge with AIs getting outcompeted.
No one consents to this change, and the cyborgs end up with very different values from humans.
—
The problem is that gradual human loss of agency is not well rendered in a single story.
There are too many different ways things can go wrong as a result of it. And none of them is especially likely.
I also find it a bad fit for the story medium, because events just make less and less sense over time. In a credible depiction, there are no coherent events happening and moving things forward, just more meaninglessness.
8. Power Distribution
I could have gone for roughly three types of power distribution: a singleton, a multi-polar, or a decentralised world.
This is quite separate from the adversarial optimisation bit.
For instance (and I think this is an underrated point by many!), having a decentralised world doesn’t mean that there’s little adversarial optimisation.
Every entity could constantly fight each other, with humans dying in the fight, until only the fittest AIs survive.
—
Nevertheless, to avoid writing three different stories, I reused the same trick as before and went for several phases:
We start with our world, that is more or less decentralised
We move into an oligopoly of the AI Corporations, and the few relevant actors around them
We end up with one entity taking the lead and its system gaining a power monopoly
—
I think it still showcases, to some extent, how misalignment doesn't assume a specific power distribution.
Conclusion (2)
That’s about it.
Thanks for reading the entire thing!
I'm interested in feedback that would help me with my future writing endeavours.
If you want to contact me, please DM me on Twitter.
Cheers, and have a nice day!