Very much agree with your arguments here. I'm surprised you didn't mention the Bayesian connection, as that's how I always think about this trade-off (and you seem similarly statistically minded).
Of course you want your mental model to fit reality better - a large p(y|x,theta) - but you also want the model parameters themselves to be likely a priori - a large p(theta). That seems to me to be the right theoretical framework for making that trade-off.
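To make that concrete, here's a minimal sketch (my own toy example, everything in it is made up for illustration): with a linear model with Gaussian noise and a Gaussian prior on theta, maximizing log p(y|x,theta) + log p(theta) is just ridge regression - a fit term plus a penalty that shrinks the parameters.

```python
import numpy as np

# Toy sketch: for y ~ N(X @ theta, sigma^2) with prior theta ~ N(0, tau^2),
# the MAP estimate (maximizing log likelihood + log prior) is ridge regression,
# i.e. least squares plus an L2 penalty whose strength is sigma^2 / tau^2.

def map_estimate(X, y, sigma2=1.0, tau2=1.0):
    """MAP estimate for a Gaussian likelihood with a Gaussian prior on theta."""
    n_features = X.shape[1]
    lam = sigma2 / tau2  # the prior shows up as an L2 penalty strength
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
theta_true = np.array([2.0, -1.0] + [0.0] * 8)  # only 2 of 10 parameters matter
y = X @ theta_true + rng.normal(scale=1.0, size=50)

print(map_estimate(X, y, tau2=10.0))  # weak prior: close to plain least squares
print(map_estimate(X, y, tau2=0.1))   # strong prior: estimates shrink toward 0
```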
I believe what I am saying here is better modelled through decision theory and policies, rather than through epistemic considerations.
In those terms, I am saying that you should often let go of a larger p(model) and p(reality|model) when the model is too complex. It will often be better to go for a simpler model that predicts worse and is less likely, because of various costs (I sketch this numerically after the list below).
For instance:
- You never get p(model) directly, only an estimate of it, which you can think of as a draw from a random variable centred on the true p(model). The more complex the model, the noisier that random variable is. So when a simple model gets a high estimate, you can be fairly confident the estimate reflects the true p(model); when a complex model gets a high estimate, you should suspect you just got lucky.
- The cost of maintaining the model. The model is not fully embedded in your brain; you need to keep it coherent and write it down somewhere.
- The cost of updating the model. You need to compute updates, and the more complex the model, the costlier each update.
- The cost of sharing the model. The more complex the model is, the harder it is to share.
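Here is the toy sketch I mentioned (all the numbers are invented purely for illustration): the complex model really does predict a bit better, but its quality estimate is noisier and it costs more to maintain, update, and share. A policy that always picks whatever looks best does worse on average than one that defaults to the simple model.

```python
import numpy as np

# Toy decision-theoretic sketch (numbers invented for illustration).
# Each "model" has a true predictive quality, but you only observe a noisy
# estimate of it, and the noise grows with complexity. You also pay
# complexity-dependent costs for maintaining, updating, and sharing the model.

rng = np.random.default_rng(0)

true_quality   = {"simple": 0.70, "complex": 0.75}  # how well each model really predicts
estimate_noise = {"simple": 0.02, "complex": 0.15}  # noisier quality estimate when complex
upkeep_cost    = {"simple": 0.02, "complex": 0.10}  # maintaining + updating + sharing

def realized_utility(model):
    # What you actually get: true predictive quality minus the ongoing costs.
    return true_quality[model] - upkeep_cost[model]

n_trials = 100_000
pick_best_looking, always_simple = [], []
for _ in range(n_trials):
    observed = {m: q + rng.normal(scale=estimate_noise[m]) for m, q in true_quality.items()}
    # Naive policy: pick whichever model *looks* best, ignoring noise and costs.
    pick_best_looking.append(realized_utility(max(observed, key=observed.get)))
    # Simple policy: just keep the simple model.
    always_simple.append(realized_utility("simple"))

print("pick whatever looks best:  ", np.mean(pick_best_looking))  # ~0.66 with these numbers
print("always use the simple model:", np.mean(always_simple))     # 0.68
```

The exact numbers obviously don't matter; the point is that once estimation noise and maintenance/update/sharing costs enter the picture, "pick the model with the best estimated fit" stops being the best policy.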