Strawberry Is Two Weeks Out and Open Source Just Had Its Worst Week
September 2024 is somehow doing the most.
Strawberry shipped as o1 in September 2024, roughly on the predicted timeline. The "two weeks out" call was almost exactly right.
Sources are saying OpenAI ships Strawberry within two weeks. Which means by the time anyone finishes processing the Reflection 70B disaster, there'll be a new thing to process.
The Reflection 70B situation is worth sitting with for a second. A model drops claiming to beat GPT-4o and Claude 3.5 Sonnet on benchmarks, the open-source community briefly loses its mind with excitement, and then, almost immediately, with the kind of speed that suggests people were already suspicious, researchers start running it themselves and finding something between "underwhelming" and "this is not what was described." The current theory, still being nailed down, is that the announced benchmark scores don't reproduce when other people run the model. The model may have been routing to proprietary APIs under the hood. Nobody fully knows yet.
What's remarkable isn't the fraud part, if that's what this is, but how fast the community went from celebration to forensics. The gap between "world's best open-source model" and "wait, something is wrong here" was measured in days. That's the part that actually means something: the ability to detect bullshit has gotten faster than the ability to produce it, which is either encouraging or just means the bullshit will have to get more sophisticated.
Meanwhile, Strawberry. Two weeks, allegedly. A reasoning model that OpenAI has been sitting on for long enough that the codename feels like a meal they've been cooking since winter. Whatever it is, it'll land in a news cycle that just got salted by what looks like a fake open-source breakthrough, which is maybe not the worst timing for a company trying to make the case that the real thing is worth waiting for.
September is doing a lot.