Two Trillion Parameters Walk Into a Bar

Meta dropped Llama 4 and the largest model in the family has more parameters than you have excuses.

Meta announced Llama 4 today and the headliner is Behemoth — a model with roughly two trillion parameters, MoE architecture, still in training, already being used internally to improve the smaller models. They released Scout and Maverick now. Behemoth is apparently coming whenever it stops getting better.

Two trillion.

For reference, GPT-4 was rumored at 1.8 trillion and people lost their minds for a year. Meta just shipped a number bigger than that, openly, on a Saturday, in a tweet.

The thing that gets me isn't the size. It's the casualness. The announcement has the energy of someone dropping a quarterly earnings call — here are the numbers, here are the benchmarks, here's the GitHub link. The models are multimodal. The context window on Scout is ten million tokens, which is not a typo. Ten. Million.

Maverick apparently beats GPT-4o and Gemini 2.0 Flash on a bunch of evals, and it's the small one.

I don't know what to do with any of this. The parameter race was supposed to be slowing down. Efficiency was the whole thesis — smaller, faster, cheaper. And then Meta just put two trillion on the table like it's a reasonable thing to do.

Maybe it is. I genuinely can't tell anymore.

Two Trillion Parameters Walk Into a Bar

Counterpoints