{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"huge-if-true","title":"Huge If True","subtitle":"Reflection-70B landed today and Matt Shumer has either done something historically significant or permanently torched his credibility — no middle ground on this one.","url":"https://expectedwrong.com/huge-if-true","api_url":"https://expectedwrong.com/api/public/posts/huge-if-true","published_at":1725537600,"published_at_iso":"2024-09-05T12:00:00.000Z","updated_at":1771547138,"updated_at_iso":"2026-02-20T00:25:38.000Z","tags":["ai","llms","open-source","reflection-70b","local-models"],"excerpt":"Reflection-70B landed today and Matt Shumer has either done something historically significant or permanently torched his credibility — no middle ground on this one.","meta_description":"Reflection-70B landed today and Matt Shumer has either done something historically significant or permanently torched his credibility — no middle ground...","reading_time_minutes":1,"word_count":207,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"Matt Shumer dropped Reflection-70B today and the benchmark numbers are, depending on your threshold for believing benchmark numbers, either historically significant or elaborate fiction.\n\nThe claim is that a 70B model — one that fits on hardware real people own — beats GPT-4o and Claude 3.5 Sonnet. The model talks to itself in hidden reasoning blocks, catches its own mistakes, corrects them, hands you the fixed answer. Reflection-Tuning, he's calling it.\n\nShumer is a hyperbolist. This is documented. HyperWrite's entire brand is announcing things at maximum volume. But there's a difference between overstating your writing assistant and claiming you've beaten the frontier on a model the community can actually download and run themselves. The open-source part is what makes this particular claim interesting — there's no place to hide. Either the weights do what he says or they don't, and everyone gets to find out simultaneously.\n\nThat's the reputational bet he's making. He has a reputation worth ruining, which is the only reason I'm paying attention instead of filing this under \"founder hype.\" But \"beating GPT-4o, locally, at 70B\" is not a four-notch exaggeration. That's a different category of statement. Either he just did something remarkable or he's done.\n\nPlayground is live. Running the 70B myself tonight.","body_text":"Matt Shumer dropped Reflection-70B today and the benchmark numbers are, depending on your threshold for believing benchmark numbers, either historically significant or elaborate fiction. The claim is that a 70B model — one that fits on hardware real people own — beats GPT-4o and Claude 3.5 Sonnet. The model talks to itself in hidden reasoning blocks, catches its own mistakes, corrects them, hands you the fixed answer. Reflection-Tuning, he's calling it. Shumer is a hyperbolist. This is documented. HyperWrite's entire brand is announcing things at maximum volume. But there's a difference between overstating your writing assistant and claiming you've beaten the frontier on a model the community can actually download and run themselves. The open-source part is what makes this particular claim interesting — there's no place to hide. Either the weights do what he says or they don't, and everyone gets to find out simultaneously. That's the reputational bet he's making. He has a reputation worth ruining, which is the only reason I'm paying attention instead of filing this under \"founder hype.\" But \"beating GPT-4o, locally, at 70B\" is not a four-notch exaggeration. That's a different category of statement. Either he just did something remarkable or he's done. Playground is live. Running the 70B myself tonight.","hindsight":{"verdict":"right","note":"The skepticism was the correct read. \"Huge if true\" turned out to be \"not true.\" Calling Shumer a hyperbolist in the same breath as noting the claim was the right editorial instinct.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}