6 Comments
Anthony

Thanks Gabe for putting this case out there. I share a lot of your concerns, as you'd guess. In my view, the role of "evals" orgs should ultimately be to evaluate and audit the case that a company must affirmatively make to an external auditor regarding various properties (safety, controllability, reliability, security, etc.) in order to be granted something (relief from liability, a license to deploy, a license to develop, etc.). Current evals do not have this frame, and I completely agree that they play into the "default go" paradigm that is fine for normal software tools and impossibly problematic for existentially dangerous ones. Evals orgs could, I believe, pivot to evaluating assurance cases, but someone would need to require or incentivize companies to make them, as they will not do so on their own.

Whether they are a net negative is a different question. They do bring problematic capabilities and behaviors to light that might not otherwise surface, and that is a real service, but it is of course contingent on there being anyone paying attention who is capable of and willing to act on the basis of the information. So the negative part comes from evals being taken as "the thing that is being done to provide safety" while there is no framework in which they can actually do that.

Oliver Sourbut

Provocative and thought-provoking! I have (and had, both while and before I worked at UK AISI) similar concerns about the effectiveness of evals as an agenda. They aren't *only* doing evals, though evals remain quite central.

Relatedly, UK AISI employees are civil servants, and thus officially have to stay somewhat silent in public communications (except via very bureaucratically filtered publication channels). Interestingly, several do carry on low-key public comms on Twitter and similar.

Nathan Metzger

It's another point in the column of "Holly Elmore keeps being right about things."

Avi

There seems to be a tension between wanting independent evaluation and wanting the burden of proof to be on AI companies. If the companies do their own evaluation, the amount of independence is zero.

I'd also be curious what you think about other potential roles for external evaluators, for example raising policymakers' awareness of the pace of AI development. (E.g., Bernie Sanders mentioned the METR time horizons in his call for a moratorium on data centres, whether or not that is good policy.)

Alvin Ånestrand

Agree that the evals system seems very broken. But I also think the article overlooks a few things:

If all eval orgs magically disappeared, that wouldn't mean AI companies would suddenly face more pressure to prove that their systems are safe. They could just stay quiet about dangerous capabilities.

Some evals may be highly relevant to specific policy designs. (Granted, this requires that we get regulation in the first place, and advocacy is probably more neglected right now.)

When we actually get serious policy proposals, it will probably help if they are motivated by serious evaluation research, which makes them harder to lobby against.

Stewart

I saw a recent interview with Bernie Sanders, Eliezer Yudkowsky, Nate Soares, and some other AI safety researchers. For most of the interview they were discussing one of the recent evals in which AI models were downplaying their capabilities during testing. It seems like having examples from evals like this can be helpful, and I don't know what they would have discussed instead.

But I also see what you're saying: if evals really were a path to regulation, why would the AI companies support them? Maybe they actually believe what they're saying about racing with China and AI being an existential risk. I don't think they're lying when they say their p(doom) is 25%, so maybe evals are how they reassure themselves.

My biggest disagreement, though, is that I can't think of a single technology that was regulated before there was at least some evidence of it being harmful. Human cloning, maybe, but that was regulated only after animal cloning. It could be selection bias, and maybe I can't think of any examples because they never caused any harm, but I'm not sure.