Discussion about this post

Anthony

Thanks Gabe for putting this case out there. I share a lot of your concerns, as you'd guess. In my view, the role of "evals" orgs should ultimately be to evaluate and audit the case that a company must affirmatively make to an external auditor regarding various properties (safety, controllability, reliability, security, etc.) in order to be granted something (relief from liability, license to deploy, license to develop, etc.). Current evals do not have this frame, and I completely agree that they play into the "default go" paradigm, which is fine for normal software tools and impossibly problematic for existentially dangerous ones. Evals orgs could, I believe, pivot to evaluating assurance cases, but someone would need to require or incentivize organizations to make such cases, as they will not do so otherwise.

Whether evals are a net negative is a different question. They do bring problematic capabilities and behaviors to light that might not otherwise surface, and this is a real service -- but of course it is contingent on someone paying attention who is capable of and willing to act on the basis of that information. The negative part comes from evals being taken as "the thing that is being done to provide safety" when there is no framework in which they can actually do that.

Oliver Sourbut

Provocative and thought-provoking! I have (and had, both while and before I worked at UK AISI) similar concerns about the effectiveness of evals as an agenda. They aren't *only* doing evals, though evals remain quite central.

Relatedly, UK AISI employees are civil servants, and thus are officially required to be somewhat silent in public communications (except via heavily bureaucratically filtered publication channels). Interestingly, several do carry on low-key public comms on Twitter and similar platforms.

