We Missed Llama 3

Meta dropped what might be the most important open-source model release in years and some of us just... had a busy Thursday.

Llama 3 came out today, and I found out about it the same way I find out about most things: too late, from a tweet, already a few hours behind.

This is fine. This is a normal day now. A model that might actually compete with GPT-4, with fully open weights, drops while you're heads-down on something else, and you surface hours later like someone who missed an earthquake because they were on the subway.

Yann LeCun posted. The 8B and 70B are out. But the thing people are actually talking about is what's still in the oven: the big one, the 400B+ model Meta says is still training, the one the rumor mill is already calling a GPT-4 crusher, fully open source. Which, if true, is not a game changer so much as the entire board getting flipped face-down on the table.

And Meta trained this on 24,000 GPUs. Twenty-four thousand. There's a version of this story where the most important infrastructure decision of the decade was "how many H100s can we get" and Meta's answer was apparently "yes, all of them, also more."

The big ones are still coming. That's the part that makes you sit down for a second. This isn't the finale. This is the part of the movie where the trailer voice says "and they're not done yet," except the trailer is real life and we are all in it, whether we checked Twitter this morning or not.

Counterpoints