
Very much agree with your arguments here. I'm surprised you didn't mention the Bayesian connection, as that's how I always think about this trade-off (and you seem similarly statistically minded).

Of course you want your mental model to fit reality well, i.e. a large p(y|x,theta), but you also want the model parameters themselves to be likely, i.e. a large p(theta). That seems to me to be the right theoretical framework for making the trade-off.
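The trade-off described above is exactly MAP (maximum a posteriori) estimation: pick theta to maximize log p(y|x,theta) + log p(theta). A minimal sketch, assuming a made-up 1-D linear model with Gaussian likelihood and a zero-mean Gaussian prior on theta (all names and numbers here are illustrative, not from the original comment):

```python
import numpy as np

# MAP sketch: maximize log p(y|x, theta) + log p(theta).
# Toy setup: y = 2*x + noise, Gaussian likelihood, Gaussian prior on theta.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=1.0, size=50)

def log_posterior(theta, sigma=1.0, prior_sigma=1.0):
    log_lik = -0.5 * np.sum((y - theta * x) ** 2) / sigma**2   # fit to reality
    log_prior = -0.5 * theta**2 / prior_sigma**2               # plausibility of theta
    return log_lik + log_prior

# Grid search over theta; the prior pulls the answer slightly toward 0,
# trading a little fit for a more "likely" parameter.
thetas = np.linspace(-5, 5, 1001)
map_theta = thetas[np.argmax([log_posterior(t) for t in thetas])]
```

With these Gaussian choices the MAP estimate is just ridge regression: the closed form is sum(x*y) / (sum(x*x) + sigma^2/prior_sigma^2), which the grid search recovers up to grid resolution.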


I believe what I am saying here is better modelled through decision theory and policies, rather than through purely epistemic considerations.

In those terms, I am saying that you should often let go of a larger p(model) and p(reality|model) when the model is too complex. It will often be better to go for a simpler model that predicts worse and is less likely, because of various costs.

For instance:

- You never observe p(model) directly; you only get an estimate of it, which you can model as a draw from a random variable centered on p(model). The more complex the model, the noisier that random variable is. So when you sample a high p(model) for a simple model, you can be confident it is close to the true p(model), whereas for a complex model you should suspect you just got lucky.

- The cost of maintaining the model. The model is not fully embedded in your brain; you need to make sure it stays coherent, and to write it down somewhere.

- The cost of updating the model. You need to compute updates. The more complex it is, the costlier the updates are.

- The cost of sharing the model. The more complex the model is, the harder it is to share.
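The first bullet (noisy estimates of p(model)) can be sketched with a toy simulation, assuming invented quality and noise numbers: give simple and complex models the same distribution of true quality, but make the complex model's quality *estimate* noisier, then condition on having seen a high estimate.

```python
import numpy as np

# Toy simulation: you only see a noisy estimate of a model's quality,
# and the noise is larger for complex models. Conditioning on a high
# estimate, how good is the model really? (All numbers are invented.)
rng = np.random.default_rng(1)
n = 100_000
true_quality = rng.uniform(0.4, 0.8, size=n)  # same prior for both model types

simple_est = true_quality + rng.normal(scale=0.02, size=n)   # low-noise estimate
complex_est = true_quality + rng.normal(scale=0.20, size=n)  # high-noise estimate

threshold = 0.75
# Among runs where the estimate looks great, average the true quality.
simple_truth = true_quality[simple_est >= threshold].mean()
complex_truth = true_quality[complex_est >= threshold].mean()
# simple_truth stays close to the threshold; complex_truth is noticeably
# lower: a high estimate for the complex model was more likely just luck.
```

This is ordinary regression to the mean: the noisier the estimator, the more a high observed score reflects favorable noise rather than genuine quality, which is the sense in which a high p(model) for a complex model should be discounted.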
