Google Dropped a 27B Model That Beats GPT-4o and It Runs on Your Laptop
Gemma 3 is here and the size-to-capability ratio is genuinely embarrassing for everyone else.
A 27B model beating GPT-4o is still sitting there. Whether the benchmarks hold up outside Google's own evals is still the question. The gap between open and closed is still collapsing.
Google released Gemma 3 today and the 27B model is, by their evals, beating GPT-4o and Claude 3.5 Sonnet on a bunch of benchmarks — a 27 billion parameter model, the kind that fits on a single GPU, outperforming the flagship closed models from OpenAI and Anthropic.
This is either a sign that the benchmark game is completely broken, or that the gap between open-weights and closed has finally collapsed. Possibly both. Probably both.
The lineup is 1B, 4B, 12B, and 27B. The 4B, 12B, and 27B models are multimodal, support a 128K context window, and speak 140-something languages; the 1B is text-only with a 32K window. The 1B model runs on a phone. The 27B model is apparently the thing you reach for when you want to beat the giants while keeping your electricity bill below a car payment.
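For a rough sense of what "runs on a phone" and "fits on a single GPU" mean, here is a back-of-envelope sketch of weight memory as parameters times bytes per parameter. The precisions listed (bf16, int8, int4) are my assumptions about how people typically load these models, not anything from Google's release. The parameter counts are the nominal sizes from the lineup, and the helper names are made up for illustration.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and runtime overhead, so real usage is higher.

# Nominal sizes from the lineup, in billions of parameters (actual counts differ slightly).
PARAMS_BILLIONS = {"1B": 1, "4B": 4, "12B": 12, "27B": 27}

# Assumed precisions: 16-bit floats, 8-bit and 4-bit quantization.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes per param = billions of bytes = GB
    return params_billions * bytes_per_param


def footprint_table() -> dict[str, dict[str, float]]:
    """Weight size in GB for every model size at every assumed precision."""
    return {
        name: {prec: weight_gb(n, b) for prec, b in BYTES_PER_PARAM.items()}
        for name, n in PARAMS_BILLIONS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{prec}: {gb:.1f} GB" for prec, gb in row.items())
        print(f"{name:>3}  {cells}")
```

By this estimate the 27B weights alone are about 54 GB at bf16 and about 13.5 GB at 4 bits, so "runs on your laptop" is really a claim about the quantized model, and quantization may cost some of the benchmark performance.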
What Google did here, if the numbers hold up outside their own test suite, is make "you need a frontier API subscription to get frontier performance" no longer obviously true. That's a weird sentence to type. The result might not survive independent testing. But today the old assumption is at least genuinely in question, which is more than you could say last week.
The weights ship under Google's own Gemma Terms of Use, not Apache 2.0. You can fine-tune it, distill it, and ship it in a product, but there is a prohibited-use policy attached, so it is not quite do-whatever-you-want.
The thing I keep coming back to: these weights are on Hugging Face right now. Anyone with a decent workstation can run inference on a model that benchmarks above GPT-4o. That used to sound like a prediction for 2027.