Intention-based reasoning is reasoning that focuses primarily on people’s intentions. When reasoning through intentions, one predicts, rewards, punishes, trusts, and in general thinks about the world in terms of intentions.
I find that people tend to reason too much that way. It has been a very common pattern in my life: others are deeply attached to intentions, while I find them easy to ignore.
I’ll outline a couple of examples that I find quite pervasive.
Passive and Active Honesty
What they are
A common view of honesty is that it is the act of presenting what is in your mind with as few intermediaries as possible. The more you say what you think or feel immediately, without deceit, the more honest you are.
This is what I call “Passive Honesty”. In this model, honesty is more like a propensity, a trait of character, something that is evaluated on a moment-by-moment basis.
I find this view such an oversimplification that in many situations, it is completely wrong. Let’s take a simple example.
When we get angry, we start thinking bad things of people. If I think “You’re a twat” while angry, saying it is not honest. It reflects only a transient state of mind, rather than what I truly think of the person.
To be truly honest, I must instead actively reflect.
This might involve the practice of distinguishing my feelings from the world, and doing enough of it that when I’m angry I can say “I’m sorry, but right now, I can’t help but feel like you’re a twat. Let me disengage until I feel calmer and can resume the conversation.”
Or I might have just told the person “You’re a twat” and the reflection came later. Then, I might realise that I was wrong, not as in morally wrong, but as in realising that I do not in fact think that the person is a twat. In that case, I should apologise both for hurting them and for having been dishonest, for leading them to think I thought worse of them than I did.
Active Honesty needs more
Notice that both of these paragraphs are longer than just writing ‘I said “You’re a twat”’. It is not a coincidence. Active Honesty takes more time, introspection, analysis, practice, and effort overall. It is more complex.
Some people are too dumb. They can’t introspect to save their life and never notice their feelings. They are too bad at analysing to understand their emotional patterns. They have no autonomy and cannot plan a new habit or practice a new behaviour reliably.
Those people will predictably lack active honesty. Even if they tell you what they “truly” think at a specific time, it is not worth much. Immediate thoughts are fickle, shift with the winds, and only reflect what lies at the surface of one’s mind.
By focusing too much on intentions, one ends up rewarding passive honesty at the expense of active honesty.
Active Honesty as True Honesty
I’m a simple person. I don’t think seriously most of the time, but when I do, I prefer using my serious thoughts for predictions rather than changing my feelings about people.
This is why I internally think of Active Honesty as the True Form of Honesty. It’s the one that lets me make predictions!
If someone tells me “I commit to doing X”, I do not care whether they truly feel like it. I truly feel like committing to at least ten things all the time, and I know I cannot and will not afford most of them.
I much prefer someone who half-feels like it, doesn’t share that fact with me, but has enough of a clear schedule and mental discipline that they can tell me “Yup, I can commit to 2 hours a week for the next 3 months. I could even go up to 4 hours if you took care of this other thing for me.”
The latter type of person is just much more reliable, and thus more trustworthy.
Eagerly psycho-analysing people
My (correct) way of doing things
Let’s move on to another of my pet peeves around reasoning based on intentions.
Personally, when I try to predict the behaviour of people, I start with their past actions. As in, I look at what they have done in the past, and assume they’ll do more of that. When that is not clear enough, I move on to what they have publicly declared.
And finally, begrudgingly, when their past behaviour and public statements are not enough, I move on to analysing their psyche, and try to guess at their intents. This is painful, and the weakest part of my models.
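To make that ordering concrete, here is a toy sketch in Python. The `Person` fields and the fallback chain are hypothetical illustrations of the priority order I just described, not a real model of anyone:

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    # Hypothetical evidence sources, ordered from strongest to weakest.
    past_actions: list = field(default_factory=list)
    public_statements: list = field(default_factory=list)

def predict_behaviour(person: Person) -> str:
    # 1. Past actions are the strongest evidence: expect more of the same.
    if person.past_actions:
        return f"more of: {person.past_actions[-1]}"
    # 2. Public declarations are weaker, but still grounded in the world.
    if person.public_statements:
        return f"roughly what they declared: {person.public_statements[-1]}"
    # 3. Guessing at their psyche is the last, weakest resort.
    return "a begrudging guess at their intents (low confidence)"

# e.g. predict_behaviour(Person(past_actions=["postponed the deadline"]))
# -> "more of: postponed the deadline"
```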
In practice, I often see people doing the opposite. They start with psycho-analysis.
A Mean Cynical Man
As a result, I am often described as cynical and mean.
When someone has cheated, I expect they’ll cheat again. When someone has been violent, I expect they’ll stay violent. When someone has found themselves many times in terrible situations, I expect they’ll keep finding themselves in terrible situations. When someone postpones deadlines, I expect they’ll keep postponing deadlines.
This is usually misunderstood by psycho-analysers. They often reprimand me because they think I am ascribing intentions to others.
When I say “I expect this person will stay violent”, they hear “I believe this person is a sadist who wants to be violent, pleasantly imagines scenes of violence, and plans accordingly”.
When I say “I expect this person will keep finding themselves in bad situations”, they hear “I am blaming the victim; I believe they look for trouble, pleasantly imagine scenes of being in bad situations, and plan accordingly”.
Et cetera.
But that is not at all what is going on. Very often, I ascribe great intentions to people. I believe that aside from a few sociopaths, most people are trying to do good. I also believe that, sociopaths included, almost everyone suffers too much from attachments and traumas to stay focused and do good reliably.
I just don’t think that intentions are the best way to predict the behaviour of people.
Anthropic
Around the creation of Anthropic, many EAs hoped that it would not race.
I was very skeptical of this. Most of the founding team had worked at OpenAI, including Anthropic’s CEO Dario Amodei, Chris Olah, and Jack Clark. Dario specifically led the development of GPT-2 and GPT-3.
It was quite obvious to me that if they built their own org, it would primarily be to build their own GPT-4 and act the same way OpenAI did.
A bunch of people berated me at the time, thinking that I was ascribing bad intentions to Anthropic’s core team.
The truth is, I think I have only had one full-hour conversation with Dario Amodei, 2 or 3 years ago, and something like an afternoon with Chris Olah. I can’t remember ever having exchanged words with Jack Clark. I have not talked to them nearly enough to know what their intents are.
But I don’t need to! Of course they were going to race. Why else would a team that was racing at OpenAI go on to build an AGI company with $500M of misappropriated FTX funds?
Their team has always been racing; why would I expect it to stop?
This was then confirmed by their pitch deck in early 2023: their plan is to do Recursive Self-Improvement (branding theirs as a “next-gen algorithm for AI self-teaching”).
“These models could begin to automate large portions of the economy,” the pitch deck reads. “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
This belief, that racing will put them so far ahead that others can’t catch up, was reiterated in Dario’s recent post on export controls:
It's unclear whether the unipolar world will last, but there's at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.
My main prediction is that they’re gonna keep racing until AGI, or until they are forced to stop by governments.
Any psycho-analysis I do of them has to contend with their actions in the race first, and explain the rest second. Right now, I am using the same model for them as I am using for Demis Hassabis.
“My organisation can use AGI to help with aligning RSI/ASI once we have it. It’s an engineering problem that will take some time. Thus our goal is to race as fast as possible and secure the protections of billionaires and governments to get a lead on AGI that lasts for long enough that we can spend time on safety before having to YOLO it.”
But again, any psycho-analysis must first contend with the prediction that they are going to keep racing until at least AGI. If it predicts that they are actually helping regulation in order not to race, or things like that, it’s most likely wrong. I predicted, and keep predicting, that they will optimise for policy work that does not prevent them from racing. Within this constraint, their policy work must then look good to different actors with different weights (EAs, their engineers, some safety-minded institutions, etc.).
This is how you end up with “voluntary commitments”, “responsible scaling policies”, and half of EAs in “AI Safety” working on “evals”.
Deception
Back to intention-based reasoning.
Actions in the real world are much harder to fake than intentions. So when we focus on actions, we build systems that are more resilient to deception.
But intentions are too easy to fake. Focusing on intentions incentivises and rewards deception.
This erodes trust in more places than one might think at first.
So many types of deception
The most obvious type of deception is lies. Someone feels a certain way, knows something to be true or is planning an endeavour, but states otherwise to mislead someone else. I have already written about this type of deception here.
Basically, lying is bad because you can’t trust people’s assessments anymore and must instead double-check everything.
White lies are bad for a similar reason: we can’t trust that people are not hiding obviously relevant information. Thus, we must always carefully examine loopholes and constantly, exhaustively scrutinise the people we interact with.
Another infamous type of deception is self-deception. Self-deception usually arises from punishing certain types of thoughts.
For instance, there are some cultures where it’s bad to explicitly like power. In those cultures, people experiencing a desire for power will feel bad, or get punished by others for expressing it.
When this happens to someone, they will usually repress the problematic desire. Repression might involve ignoring it, rationalising it away, or flinching at situations that trigger it. All of those are forms of self-deception that end with the person unaware of their own desires.
Someone who has repressed their hunger for power might start assuming that they care a lot about principles, as it feels good to enforce them. In that way, they have found outlets for their repressed desires that work.
When we are too observant of people’s intentions, we initiate many perverse dynamics similar to the one I have just described. This is very bad, because when it happens, even when people are not lying, we can’t rely on their self-assessments anymore.
Obfuscation is a lighter and more common form of deception. Instead of explicitly leading someone to believe false things, obfuscation is about making it hard to figure out what’s true.
In the context of masking intents, obfuscation manifests itself in many ways.
For instance, there are people for whom it’s not clear whether they mean something literally or not. When asked, it always mysteriously ends up being whichever is convenient for them.
Others are confused about their feelings, in ways that are beneficial to them in the short-term, and detrimental to everyone in the long run.
By focusing on intentions, we move the equilibrium towards one where uncovering truth is impossible short of reading someone’s mind. Not only do people become better at deception and self-deception; at some point, no one wants to pay the cost of investigating and cross-checking information anymore.
When we focus on intentions, figuring out what’s true devolves to how much trust someone can acquire through coercion, social prowess and tribal affiliations.
Extended Cognition
At this point, I think we’re in deep enough that I can start complaining about a fundamental problem I have with reasoning based on intentions, one that should be more noticeable thanks to the parts above.
Basically, I think that intention-based reasoning relies on a model of intentions that is too restrictive. In this naive model, intentions are only what people feel and verbalise within their mind.
But I think there’s more to intentions than what lies at the surface.
For instance, a man feels like he wants to help around the house. But somehow, he never seems to have time to help; his other activities fill all of his time. He doesn’t know how the appliances work or where things are, which makes teaching him more costly than just doing things without him. He doesn’t have the habit of checking around the house for what needs upkeep or cleaning.
That man might truly feel like he wants to help around the house and experience the desire. He might even immediately agree and do as asked whenever he’s asked.
But it never happens. And it’s not a coincidence. In this situation, even though at a superficial level, the man has the explicit intention to help, I believe he is better modelled as having the intention to not help.
By “better modelled”, I do not merely mean that the latter model makes for better predictions (i.e., the man will not start meaningfully helping around and taking care of the house).
I mean, more than that, that we are more than our surface thoughts and feelings. We are our habits. We are our feedback loops. Part of who we are is ingrained not in our thoughts, but in how we interact with our environment.
Some people learn better in formal school systems with teachers. That is a fact about their brain and their psychology, their restricted self. Two people with an identical restricted identity might end up with very different learning abilities by virtue of one being at school and the other learning by themself at home. The former will thrive while the latter will just laze around.
That comes from the fact that people are more than just their brains and psychology: they also embed a part of their environments, the people they frequent, their habits, the media they consume, et cetera.
If someone consistently finds themselves with bad friends who get them to do bad things, they might internally be the purest lamb, but they are still better modelled as being a bad person. Because finding themselves with bad friends who get them to do bad things is in fact a part of who they are.
There’s a reason why they keep finding themselves doing those things with those people. Even if I don’t know what it is, because I am not some unprecedented genius of psychoanalysis, it doesn’t matter. I know that there’s a part of their extended self that truly has the intention of finding themself with bad people doing bad things, and it’s going to be hard to convince this extended self to stop, even though their restricted self might feel super bad about it and self-flagellate.
Autistic people and AI Alignment
There’s a common thing I hear from autistic people: “I can’t lie because it’s too hard for me to model other people, so I’m always honest”. When I challenge them, they double down, giving many examples from their life where people gave them shit for being too frank, honest, and authentic.
If you remember the part about passive honesty and active honesty, the situation becomes clear. They only think about passive honesty, about their surface thoughts, about their restricted self, and ignore the rest.
For autistic people, it is even harder than for most people to be actively honest. Active honesty requires building a good understanding of how people react to each other and to themselves.
There’s an alignment strategy in AI, underpinning a lot of the thinking in interpretability, which is basically “As long as we can get AIs to want good things for people, in their thoughts, then everything’s alright”.
For the same reason that autistic people are not beacons of pure, neutral, conflictless trust who transcend all social problems solely because they are frank, just getting AIs to think good thoughts and want good things is not a viable alignment strategy.
Making this the primary way to get alignment is terrible, and is yet another example of intention-based reasoning.
Coordination and Politics
We live in quite a pre-scientific world when it comes to coordination.
To know whether we want to work closely with someone, we usually care more about whether they harbour good intentions towards us than about whether a reliable working relationship is possible.
From close relationships to wider groups, coordination relies more on being liked, on reputation, and on vibes than on protocols, interfaces, and carefully evaluated track records.
Politicians come in various forms: elected government officials, CEOs and corporate executives, community leaders.
They are all hyper-aware of this fact, that who is willing to coordinate with whom works primarily through intention-based judgments, and they abuse it.
How their actions are framed, and what intentions can be ascribed to them, matters much more than what their actions (and their outcomes!) actually are.
The same is true in communities where autistic people are over-represented. The same is true in communities built around authenticity and self-discovery.
Active Honesty and overcoming intention-based reasoning are hard, and take a lot of effort. We managed it in physics, when we moved on from explanations based on supernatural entities willing natural phenomena into existence to more objective and predictive models.
Sadly, when interacting with the rest of the world, especially where it involves thinking people rather than inert matter and energy, all of that scientific rigour disappears.
Conclusion
There’s no conclusion here.
It’s just a small rant, because people want me to write, and I haven’t written in a while.