Llama 3 Just Beat Opus and I Need a Minute
Meta's 70B model beats GPT-4-class models on English benchmarks, and the 400B hasn't even arrived yet.
The benchmark was real, but the frontier kept moving. Anthropic shipped Claude 3.5 Sonnet, which dominated coding for months, then Opus 4. Open models did catch the frontier for a moment (Llama 3 405B competed), but the frontier never stayed caught.
Llama 3 70B beats Claude Opus on English-only prompts.
I don't know what to do with that sentence. I've read it several times. It still says the same thing.
This isn't the Command R situation — where a scrappy open-weight model knocked out GPT-4-0314, which was already showing its age and honestly kind of deserved it. That was impressive. This is different. Opus is current. Opus is the thing Anthropic built to be the best thing Anthropic has ever made. And it just lost to a model with less than a quarter of the parameters.
"GPT-4 at home" as a meme always implied some performance degradation — you got the idea of the thing, not the thing. That's over now. The thing is at home. The thing costs nothing to run.
And here's the part that should be making everyone's hands shake a little: there's a 400B coming. Still training. Five times the parameters. Meta says it defeats GPT-4 easily, and that's almost beside the point because they also say it's 100x to 1000x undertrained — meaning they could push it dramatically further, and the only thing stopping them is that they need to build a one-gigawatt nuclear plant first.
The frontier models are racing to stay ahead of a compute wall that open weights are climbing from below, and the gap just got a lot smaller on a Friday in April.
The 400B is supposed to land by July. I'm going to go think about something else until then.