How to Scale Moderation on Agorae?
A few thoughts on legitimacy and capacity.
With the team at H Tech, we are building a new social platform: Agorae. Agorae’s core premise is to aggressively filter out people who behave unreasonably. The hope is that this will let us build a place that can sustain nice and productive conversations at scale.
After our waitlist reached ~300 people, we finally launched Agorae! This is still a mini-launch, meant to test features and start seeding our culture.
As part of our first open thread, Heliostatic (and some others) asked about our plans for moderation, and specifically, how we plan to scale moderation. Here, I will share my personal thoughts. These thoughts are also broadly shared by the rest of the team :)
—
I largely understand the problem of scaling moderation as two separate sub-problems.
The first one is scaling legitimacy: ensuring that as the platform scales, people still trust that the platform applies fair and consistent rules.
The second is scaling capacity: ensuring that as more users join the platform, humans remain part of the review process when it matters, most notably during appeals.
Let’s dig into these problems and my ideas to deal with them!
Scaling Legitimacy
The first problem of moderation is to scale legitimacy.
When we are a small community, it’s easy to trust me as an admin. You can see that I am a nice person, you know the history behind the moderation decisions, and so on. If I ban someone, you can see that they were being a bother in one way or another.
That is not to say you’ll always agree with me, but you can recognise that I am trying to do something good and being reasonable.
As the community grows, this becomes harder. You will not directly see the moderation decisions, much less their context. The only time you’ll hear of things happening is when people complain about them, and they will only show the bits of context that support their point of view.
The solution that I envision is to follow three principles.
The first principle is clarity. We must convey a clear picture of the space we want to co-create with our members, and of the behaviours we want to prohibit.
Ideally, this picture should be so clear that it would be easy for a third-party observer to evaluate whether some action fits Agorae’s big picture or not. In other words, if someone wanted to review a moderation decision, assuming they had enough time and all the relevant facts, it should be trivial for them to check whether the decision was correct, incorrect, or a genuine edge-case.
Conveying such a clear picture can be done in many ways: rules, case studies, past moderation decisions where we explain our rationale, and more. We will likely use most of them.
Paradoxically, having too many rules goes against the principle of clarity! When there are too many rules, it becomes hard to fit them all in one’s mind, they can easily contradict each other, and there are rarely clear priorities between them. Think of how modern codes of law are so large that we must go through accountants and lawyers to navigate them.
While it is natural to cover new edge-cases with more rules, this approach necessarily fails: people always trigger more edge-cases, especially when they try to mess with the rules and when they get rewarded for doing so.
The second principle is transparency. As soon as possible, we aim to publish our moderation decisions and the rationales behind them. I want to have a clear audit trail not for 100% of the decisions, but at least for a supermajority.
Transparency is the most straightforward way to build trust. When most decisions are transparent and readily inspectable, people can build a justified expectation of fairness. In the cases where a few decisions must be hidden (to protect the privacy of some participants, or because they are too sensitive), people still have a long history to rely on to trust that such decisions are reasonable.
Plausibly, the biggest impact of transparency is that it builds a corpus of jurisprudence. When members want to know what to expect of the moderation, when to report someone, when to contest the decision of a moderator, or when we want to onboard new people to the moderation team, this jurisprudence will provide a large body of past examples to draw from.
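To make this concrete, here is a minimal sketch of what a published decision record could contain. The field names and shape are my own illustration, not a committed schema.

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative shape of a published moderation decision; not a committed schema.
@dataclass
class ModerationDecision:
    case_id: str             # stable identifier that people can link to and discuss
    decided_on: date         # when the decision was made
    rule_cited: str          # which rule or principle was applied
    summary: str             # redacted description of the behaviour in question
    rationale: str           # why the moderators decided the way they did
    outcome: str             # e.g. "no action", "warning", "temporary ban"
    appealable: bool = True  # whether an appeal is still open
    redactions: list[str] = field(default_factory=list)  # what was withheld, and why
```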
Transparency should make it easy and normal for people to link to and discuss moderation decisions on Agorae: discussing self-regulation is one of the most important things a community can do to improve itself, after all.
The third and last principle is a proper judiciary process.
I don’t mean “a judiciary process” as “a legal process” here. What I mean is “judiciary” as part of the separation of powers (Wikipedia) within an organisation.
Put concretely, I think it’s fine for the admin to act as the legislature and decide on the rules and principles of Agorae. Similarly, it’s fine for software to act as the executive and enforce bans and other judgments.
However, I think it is quite bad if the admins also act as the judiciary, deciding not only what the rules are, but also when they apply.
To take an example, I think it is fair for me to say “I don’t want insults on Agorae”. But I think it’s bad if people constantly rely on my personal sense of what counts as an insult. I am inherently fallible: even if I try my best, I will necessarily be more understanding with my friends, and less so with people who have angered me.
Thus, if moderation were left to me, it would necessarily create a two-tier system, where people who anger me are treated more harshly than people who play to my sensibilities.
This is why, as soon as possible, I want to move to a proper judiciary review system, with moderators that are separate from me and the admins. Of course, moderators should consult us in cases where they are unsure and the rules are unclear, but most cases should be processed independently of us.
And as with every judicial process, to avoid abuse from moderators, there should be a clear appeal process.
I believe these three principles form a nice basis to deal with the moderation problem of scaling legitimacy. Applying these principles, I believe we have a shot at scaling trust to millions of nice people across the world.
I do not think this is resilient to a user base of billions of unreasonable people, but preventing this is what the rules are for :)
Scaling Capacity
Back to the second problem of moderation: scaling capacity.
Many of the processes I mentioned above require human involvement. Human involvement has its pros, but also well-known cons. Most notably, as we grow, how do we avoid people regularly falling through the cracks and getting lost?
No system is perfect, and there will be incorrect decisions. How do we leave people a way out? How do we ensure that their appeals do not get repeatedly ignored, that they aren’t trapped in chats with AI agents, or email chains with constantly changing underpaid workers who look at their file for 30 seconds at most?
At Agorae, we plan to rely on a few guiding principles to address this problem.
The most important one is broken windows theory.
There are a couple of behaviours that are trivially easy to detect. For instance, someone typing the words “fuck israel/palestine”, “onlyfan”, or “buy ... crypto”. Any basic automated system can detect this.
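As a rough illustration of how basic such a first pass can be, here is a minimal sketch in Python. The patterns and the function name are placeholders, not Agorae’s actual word list or pipeline, and a match would be a signal feeding review rather than an automatic ban.

```python
import re

# Illustrative patterns only; a real list would be longer and maintained over time.
BLATANT_PATTERNS = [
    re.compile(r"\bfuck\s+(israel|palestine)\b", re.IGNORECASE),
    re.compile(r"\bonlyfans?\b", re.IGNORECASE),
    re.compile(r"\bbuy\b.*\bcrypto\b", re.IGNORECASE),
]

def flag_blatant_violation(message: str) -> bool:
    """Return True if the message matches any trivially detectable pattern."""
    return any(pattern.search(message) for pattern in BLATANT_PATTERNS)

print(flag_blatant_violation("Buy my amazing new crypto token!"))    # True
print(flag_blatant_violation("What did you think of the keynote?"))  # False
```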
Individually, each of these behaviours may only look like minor pollution that someone can just ignore. But the core insight of broken windows theory is that although these behaviours are minor, individuals who can’t help but engage in them are toxic and will go on to commit more serious offences.
Someone who can’t help but insult their political opponents or link to pornographic content is not a good fit for Agorae. They will not productively participate in conversation about emotionally charged topics, because they will not be able to remain calm and polite when triggered.
Once, I scrolled through Twitter and listed a few messages that I disliked, messages that made their conversations worse. Some were obvious, and some were too complex for an unbiased automated system to detect: concern trolling, low-quality responses, etc.
But then, I checked the profiles of their authors. Beyond the complex trolling, they had also sent many messages that were straightforwardly bad, like dumb insults. Through this exercise, I realised how broken windows theory could apply to a new social platform.
Even though the behaviours that I disliked were sometimes hard to automatically detect, the people who engaged in them were remarkably easy to detect. This is why Agorae will be strict on obviously harmful behaviours.
I believe that by strictly policing harmful behaviours that are easy to detect and banning those who engage in them, there will be far fewer ambiguous situations to adjudicate.
The second most important guiding principle is trying to support productive conversations.
When seriously considering broken windows theory, one may become confused. If the theory is correct, why aren’t other social platforms applying it? Why is there so much obviously harmful content, so obvious that a simple word list could catch it?
My answer is that most platforms have different goals.
Meta became infamous for its willingness to pay fines and incur significant societal harms rather than reduce its ad revenue by 10% by cracking down on scams (Reuters).
A “Fuck Palestine” post on Twitter generates a lot of impressions and reactions, keeping people on the platform longer. This effect, named the toxoplasma of rage, has been documented for a long time.
In July 2025, YouTube was proud to announce that 200 billion short videos are watched on its platform every day (link from their blog). That most of these short videos are slop does not matter to them. What matters is that the number is big.
Quora was once a novel and refreshing experiment in social media, trying to build a platform around people asking and answering each other’s questions. It is hard to describe what it has done since beyond “enshittifying itself to extinction”: its executive team optimised for profits at the cost of everything else, including the platform’s own viability (Medium article).
Plainly put, these platforms have goals that are different from Agorae. They want to host billions of people, and capture their attention by catering to the lowest common denominator. They are not trying to be a space for nice and productive conversations.
I believe that by trying to be that space, we can pluck many low-hanging fruits. For instance, this is why Agorae is committed to a subscription model, as opposed to an ad-based one. To be sustainable, the latter structurally demands ever more of its users’ data, time and attention.
The last principle I want to talk about today is offering escape hatches.
I think that moderation can already work at a large scale. Between reports, review processes, suspicious activity detectors, word lists, regexes, and toxicity filters, there is a lot that can be done.
However, I believe that a major problem of social platforms is that it is too easy to fall through the cracks. When someone’s situation doesn’t neatly fit the grid of the programmed rules, they can find themselves without any recourse.
No system is perfect, and some people will necessarily fall through the cracks. When this happens, I want people to not be stuck in a Kafkaesque moderation hell, with no recourse beyond praying to the bureaucracy gods that someone or some AI will be benevolent enough to act on their appeal.
Unfortunately, Agorae will not have unlimited resources and attention to dedicate to every case. Nevertheless, at the very least, I want there to be escape hatches: ways for people to bump up the priority of their case. So far, the escape hatches that I am considering are: a paid accelerated process, a daily vote on cases that are important to the community, and a default queue for cases that deserve more attention, so that people know what to expect.
In other words, if someone falls through the cracks, they should have some options. They should be able to pay to get their situation dealt with, to rally the community if it believes their case is important, or, failing that, to bump up the importance of their case and know when to expect a reply from a human being.
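For illustration, here is one way these three escape hatches could feed a single review queue. The weights are placeholders I made up, not Agorae’s actual policy; the point is only that waiting time, payment and community votes can each raise a case’s priority.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AppealCase:
    opened_at: datetime
    paid_acceleration: bool = False  # escape hatch 1: paid fast-track
    community_votes: int = 0         # escape hatch 2: daily community vote

def review_priority(case: AppealCase, now: datetime) -> float:
    """Higher value means reviewed sooner. Weights are illustrative placeholders."""
    days_waiting = (now - case.opened_at) / timedelta(days=1)
    priority = days_waiting                    # escape hatch 3: the default queue,
                                               # where every case rises with age
    if case.paid_acceleration:
        priority += 30                         # roughly a month's head start
    priority += min(case.community_votes, 50)  # capped so votes cannot drown the queue
    return priority
```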
In our society, money and public votes are standard tools to surface what is important. They both have their limitations, but they are empirically effective. I believe that most bureaucracies (both public and private!) are failing at their duty when they do not offer either option to surface what is important to people.
Conclusion
This ended up being much longer than I envisioned when I replied to Heliostatic’s comment. Oh well!
Cheers :)

