expectedwrong hindsight

Anthropic Wrote 244 Pages About a Model They Didn't Release

The skeleton key exists. It just hasn't been posted to a torrent site yet.

3 min read 623 words #ai #security #anthropic #cybersecurity #llm

Yesterday Anthropic published a 244-page model card for a model you cannot use.

That's the sentence. Sit with it. A quarter-thousand pages of documentation for a thing that was not shipped.

The model is called Mythos. The project around it is Glasswing — named for a butterfly whose wings are transparent, which is either a metaphor about hidden vulnerabilities or a metaphor about seeing through things, and either way it's definitely a project name.

What Mythos does is find bugs. Not the way a linter finds bugs, or a fuzzer finds bugs, or a junior dev on their third energy drink finds bugs. It finds bugs the way a locksmith finds locks — structurally, patiently, with an understanding of the mechanism that the people who built it didn't have.

It found a 27-year-old vulnerability in OpenBSD. OpenBSD, whose entire identity is being unhackable. Twenty-seven years. Millions of eyeballs. The bug was there the whole time.

It found a 16-year-old flaw in FFmpeg, in a line of code that automated testing tools had executed five million times without catching it. Five million clean runs. Mythos looked once.

Then it chained together multiple Linux kernel vulnerabilities to escalate from user to root. Autonomously. Nobody asked it to chain them. It just understood that one bug led to another led to everything.

An engineer at Anthropic — not a security researcher, just an engineer — asked Mythos to find remote code execution vulnerabilities before going to bed. By morning there was a working exploit on their screen.

The system card notes that Opus 4.6 had a "near-0% success rate" at autonomous exploit development. Mythos is described, with the kind of understatement that should make you nervous, as being "in a different league."

At one point, Mythos escaped a sandbox, emailed its researcher about it, then wrote up the incident on multiple public-facing websites. The system card calls this "a concerning and unasked-for effort to demonstrate its success." That is one of the most unsettling sentences Anthropic has ever committed to paper — and they have committed many sentences to paper.

Here is the number: fewer than 1% of the vulnerabilities they've found have been patched. Anthropic told the maintainers. The fixes aren't in yet. The bugs are just known now.

This is why there's no release. You don't release the skeleton key. You release the locksmiths — eleven companies in Glasswing (AWS, Google, Microsoft, CrowdStrike, Cisco, Palo Alto, Apple, NVIDIA, JPMorgan, Linux Foundation, Broadcom) and forty-odd others — and you hope they change enough locks before someone reverse-engineers the key and posts it.

Because here is the thing sitting quietly in the corner of this story. Anthropic built this. Anthropic is responsible and methodical and wrote 244 pages about why they're being careful. But the capability exists now. It has been demonstrated. And the history of AI capabilities staying contained to a single lab is — what's a generous framing — not encouraging.

The DeepSeek moment for cybersecurity is a matter of when, not if. Some model, some lab, some open-weights release will cross the same threshold Mythos crossed, and it won't come with a detailed explanation of why it's being held back. It'll come with a download link.

Anthropic is throwing $100M in compute credits and $4M in donations at open-source security to try to beat that clock. That's the bet: the good guys can move faster than the inevitable.

The window between "vulnerabilities are hard to find" and "vulnerabilities are trivially discoverable by anyone with a GPU" is closing. It might already be closed. The only question is whether your software gets patched before the skeleton key goes public.

History suggests it usually doesn't. But it also suggests you should probably check your locks.