Google Showed Up
Gemini Ultra claims the GPT-4 benchmark crown, and nobody seems to know what to do with that information.
Gemini Ultra did show up. The benchmarks were real-ish. But "beats GPT-4 across most of them" turned into a more complicated story once people actually used the model. Google showed up and then the demo controversy showed up right behind it. The tier naming survived though — Ultra, Pro, Nano is still the lineup.
Jeff Dean posted the benchmarks this morning and the headline is that Gemini Ultra beats GPT-4 across most of them — which, if true, is the first time in about two years that the answer to "who has the best model" has not been "OpenAI."
Nobody seems to know what to do with that.
The tier naming is Ultra, Pro, and Nano — which is either a very confident product decision or a very optimistic one, depending on how the next six months go. My guess at the parameter math: Ultra is your 70B+ monster, Pro is the 30B workhorse you'd actually run in production, Nano is whatever you can squeeze onto a phone without making the phone sad. The on-device story is the interesting one. A model small enough to live on a Pixel that is also, apparently, not completely useless — that's the thing worth watching, not the benchmark brag.
The benchmark brag is still fun, though.
Gemini Ultra won't actually be available to anyone for a while. It's coming "early next year" inside something called Bard Advanced, which is Google's way of saying: we have the thing, you can't have it yet, please keep your subscription money warm. Pro is what you get now. Nano is what your phone gets now. Ultra is what the press release gets now.
They dumped a lot of YouTube videos today (deep dives, demos, the full technical explanation), which means either they're very confident in what they built or they learned something from watching OpenAI's demo cadence and decided to frontload the credibility. Maybe both. The videos are good, actually. Dense in the right ways.
The thing I keep coming back to is the Nano model. Tiny, on-device, tuned to be useful within severe constraints: that's a different engineering problem than scaling a giant transformer, and the fact that it works at all is quietly more impressive than the Ultra number. Nobody will write the headline "Google's smallest model is surprisingly fine," but that's probably the real news.
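For a rough sense of why the phone tier is its own problem, here is a back-of-the-envelope sketch of weight memory at different precisions. The 70B and 30B figures are only my guesses from earlier in this post, and the 3B figure for Nano is a placeholder I made up for illustration, not a number from Google. The function name and the table are hypothetical too, and the math covers weights only, ignoring activations and cache.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed sizes in billions of parameters. None of these are published numbers:
# 70 and 30 are this post's guesses, 3 is an illustrative placeholder.
GUESSED_SIZES = {"Ultra": 70.0, "Pro": 30.0, "Nano": 3.0}


def footprint_table(bits_options=(16, 8, 4)):
    """Map each tier to its weight footprint (GiB, one decimal) per precision."""
    return {
        name: {bits: round(weight_memory_gib(size, bits), 1) for bits in bits_options}
        for name, size in GUESSED_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        print(name, row)
```

Under those assumptions, a 70B model at 16-bit needs roughly 130 GiB for weights alone, while a 3B model squeezed to 4-bit lands around 1.4 GiB, which is the only row in the table that plausibly fits next to everything else a phone has to keep in memory.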
Ultra beats GPT-4. Cool. Come back when I can use it.