More about the Baseline
A core design principle is Keep It Simple, Stupid. I recommend meditating on it.
Relatedly, when trying to do anything that spans more than a day, I start with a dumb plan, the Baseline. With it, I can immediately consider the worth of what I want to do: I have a basic cost estimate (that should only go lower as I think more!), and I can compare it to the benefits.
The Baseline plan should be cheap and get results quickly. With it, I capture the MVP: the smallest useful version of what I want. If I want to test a game, or test whether one can induce mild synesthesia to help develop Absolute Pitch, I look for the smallest demo that can be implemented.
The Baseline plan for most things should be super local. It should involve changing as little as possible, doing localised things, and making it easy to trace the results back to different parts of the plan.
As a rule of thumb, I start thinking about more complex plans only after I have reached a solid Baseline. Even then, the more complex plans will look like improvements to the Baseline (making it cheaper, more minimal, and more local) or small variations to the Baseline (comparably reasonable).
Giga-Braining
I hang around with a lot of smart people.
Because they can easily come up with fancy plans and arguments, they make it their thing. So whenever there is a problem, they devise a Fancy Plan. Whenever there’s a thing to be considered, they try to come up with Fancy Arguments or Counter-arguments (“Have you considered…?”).
This can be done adversarially just to troll others. Concern trolling, Gish Gallop, and Denial-of-Service Attacks are all real.
But smart people too often do it to themselves. This type of overthinking is what I call “Giga-Braining.” If it were an image meme, Giga-Braining would stand between Big-Brain Wojak and Galaxy-Braining.
A smart person Giga-Braining themselves will come up with the craziest counter-arguments for why their ideas are wrong, why their opponents are virtuous, and why they should become subservient to them.
A smart person Giga-Braining themselves will come up with the strongest justifications to explain why the obviously bad shit that they’re doing is actually good. Arguing with them does not help. Their case relies on their extremely advanced idio(syncra)tic belief system, and you would need a deep acquaintance with it before you could even start making arguments that are legible to them.
A smart person Giga-Braining will devise the most expensive plans for the simplest things. They are scrubs on steroids. They have forgotten what winning is. Instead, they focus all their attention on elegance, scale, or whatever abstract principle is not their immediate bottleneck.
To be clear, everyone bullshits themselves. But a regular person who bullshits themselves can just go to a psychologist or a priest. They will have seen countless cases of the same type of bullshit and can help call the person out on it.
But what makes Giga-Braining a special thing is that smart people Giga-Braining themselves can’t just go to a psychologist. Their bullshit is advanced; it is unique and special. It is the type of bullshit one could write True Crime and mystery novels about. To the outside world and themselves, it might not even be clear that it is bullshit. They might be thoroughly convinced.
The Complexity Tax
Ockham’s Razor (also known as the law of parsimony) states that explanations should not be overly complex, especially when simpler explanations are available.
It is hard to apply Ockham’s Razor. Simpler explanations (“magic!”) are not necessarily better. Finding a better definition of Ockham’s Razor, or an alternative to it, that doesn’t fall prey to this is hard. But some people have tried.
Nevertheless, let’s stick to it for now.
My generalisation of Ockham’s Razor states that Complexity should be Taxed.
When deciding between a straightforward plan and a complex plan, the complex plan should be down-weighted. On priors, at first glance, we should assume it is less likely to succeed, has a lower pay-off and has a higher cost. There is no need to make an argument or find a reason for why that is the case; it should be the default expectation.
When deciding between a straightforward argument and a complex argument, the complex argument should be down-weighted. On priors, at first glance, we should consider it less likely to be correct and more likely to have a flaw. There is no need to look for a flaw; it should be expected by default.
Between a straightforward idea and a complex idea…
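As a rough sketch of what taxing complexity could look like for plans: everything here is an assumption made for illustration (the `Plan` fields, using a step count as a proxy for complexity, the 10% tax per extra step, and the made-up numbers). It also only discounts the chance of success, which is a simplification of the "less likely to succeed, lower pay-off, higher cost" prior described above.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    payoff: float      # estimated benefit if the plan works
    cost: float        # estimated cost of carrying it out
    p_success: float   # naive, untaxed estimate of the chance of success
    steps: int         # crude proxy for complexity (hypothetical)


def taxed_value(plan: Plan, tax_per_step: float = 0.1) -> float:
    """Expected value after down-weighting success by complexity.

    Each step beyond the first multiplies the naive success
    probability by (1 - tax_per_step), with no argument required.
    """
    discount = (1.0 - tax_per_step) ** (plan.steps - 1)
    return plan.p_success * discount * plan.payoff - plan.cost


# Made-up numbers: the fancy plan promises twice the payoff,
# but costs more and has eight moving parts instead of one.
baseline = Plan("baseline", payoff=10.0, cost=2.0, p_success=0.8, steps=1)
fancy = Plan("fancy", payoff=20.0, cost=5.0, p_success=0.8, steps=8)

for plan in (baseline, fancy):
    print(plan.name, "untaxed:", taxed_value(plan, tax_per_step=0.0),
          "taxed:", taxed_value(plan))
```

With these numbers, the fancy plan looks better when taken at face value, and the Baseline comes out ahead once the tax is applied. The point is not the specific tax rate; it is that the penalty is applied by default, before anyone has found a concrete flaw.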
Making the Complex Simple
Sometimes, the only correct solution is complex. In that case, my advice still applies: we are less likely to succeed, we are more likely to be wrong, and so on.
To cope with this, we should build stronger fundamentals so that the possible solutions become simpler over time, and we can start distinguishing between them without Giga-Braining ourselves.
What high schoolers do in math classes would have been rightly considered way too complex 500 years ago.
What middle schoolers do in math classes would have been - and rightly so - considered way too complex 2000 years ago.
What primary-schoolers…
They are only simple now that we have crushed our fundamentals and made them so natural that children and teenagers can learn them. We have reinforced them, developed metamathematics, invented formal proof theory, and mastered formalism to be super sure of them.
This is the type of effort needed to make the complex simple: a frenetic amount of epistemology, science, distillation, and pedagogy from the brightest minds of our generation.
I feel like I have a natural defense against this: I always try to phrase things in my own mind in very simple and intuitive ways, and am generally allergic to fancy language.
I believe that it helps me detect things which are genuinely interesting and *weird* or special or complex, because such things passed my simplifier filters and are still weird.
Very much agree with your arguments here. I'm surprised you didn't mention the Bayesian connection, as that's how I always think about this trade-off (and you seem similarly statistically minded).
Of course you want your mental model to fit reality better - a large p(y|x,theta) - but you also want the model parameters themselves to be likely - a large p(theta). That seems to me to be the correct theoretical framework for making that trade-off.
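Written out, this is roughly the maximum a posteriori (MAP) view. The sketch below reuses the symbols from above and assumes the prior on theta does not depend on x; reading p(theta) as a "complexity tax" is an interpretation, not something stated in the post.

```latex
\[
p(\theta \mid x, y) \;\propto\; p(y \mid x, \theta)\, p(\theta)
\]
\[
\hat{\theta}
  = \arg\max_{\theta}
    \Big[ \log p(y \mid x, \theta) + \log p(\theta) \Big]
\]
```

The first term rewards fit to reality. The second term penalises parameter settings that were unlikely to begin with, so if complex models get low prior probability, they have to earn their place through a much better fit.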