Two Trillion Parameters Walk Into a Bar
Meta dropped Llama 4 and the largest model in the family has more parameters than you have excuses.
The casualness was the signal. Meta kept shipping open models with the energy of quarterly earnings calls and the industry normalized trillion-parameter counts faster than anyone expected.
Meta announced Llama 4 today and the headliner is Behemoth: a mixture-of-experts model with roughly two trillion total parameters, still in training, already being used internally to improve the smaller models. Scout and Maverick are out now. Behemoth is apparently coming whenever it stops getting better.
Two trillion.
For reference, GPT-4 was rumored at 1.8 trillion and people lost their minds for a year. Meta just announced a number bigger than that, openly, on a Saturday, in a tweet.
The thing that gets me isn't the size. It's the casualness. The announcement has the energy of someone reading out a quarterly earnings call: here are the numbers, here are the benchmarks, here's the GitHub link. The models are multimodal. The context window on Scout is ten million tokens, which is not a typo. Ten. Million.
Maverick apparently beats GPT-4o and Gemini 2.0 Flash on a bunch of evals, and it's only the mid-sized one.
I don't know what to do with any of this. The parameter race was supposed to be slowing down. Efficiency was the whole thesis — smaller, faster, cheaper. And then Meta just put two trillion on the table like it's a reasonable thing to do.
Maybe it is. I genuinely can't tell anymore.