ASL-3

Anthropic just shipped the first models to cross their own safety threshold — the one they wrote to be scary.

2 min read · 268 words · #anthropic #safety #asl-3 #claude #rsp
hindsight — still happening

Both a landmark and safety theater simultaneously — that read hasn't aged out. Each new ASL classification carries the same tension: the framework exists because the risk is real, and the framework existing is the thing that lets you keep shipping.

Opus 4 and Sonnet 4 are apparently the first models Anthropic has classified as ASL-3 under their Responsible Scaling Policy — which is either a landmark moment in AI safety or a landmark moment in AI safety theater, and the unsettling thing is that it's probably both simultaneously.

ASL-3 is the level where Anthropic's own framework says a model could provide meaningful uplift toward weapons capable of mass casualties, or could operate with enough autonomy to run serious attacks on critical infrastructure. It's the level they wrote into their policy to be alarming. They wrote it, presumably, hoping to never trigger it, and now they have shipped it.

The part that keeps landing wrong is this: crossing ASL-3 means we're now in the stretch of the policy where nobody, including Anthropic, has a tested answer for what comes next. ASL-2 had accumulated norms, deployment patterns, and red-team frameworks that had been through real cycles. ASL-3 is the floor of a room that hasn't been built yet.

You can read this charitably. They evaluated the models, they found the threshold crossed, they're telling us — which is the whole point of having a policy instead of just vibes. That's genuinely more than most labs do.

You can also read it as a company that drew a line in the sand, walked up to the line, and is now explaining very carefully why the next thing they do on the other side of it is fine.

Both readings are available. Neither cancels the other out. The line existed. The models crossed it. We're on the other side now.