Claude 3 Broke My Calibration
I had a model in my head for how good AI coding could get, and now I have to throw it out.
this observation was validated over and over. claude became the preferred coding model. the 'different debugging posture' insight — trusting the AI output and looking everywhere else — became the standard workflow for developers using AI.
I gave it a fair shot. That's the important part. I wasn't looking for an excuse to switch.
Claude 3 wipes the map with GPT-4 at coding. Not nudges it. Not edges it out in certain benchmarks. Wipes. The. Map.
It output a 600-line file — multiple refactors, multiple modifications — no flaws. Not "pretty good for an LLM." No flaws. The kind of output that used to require three rounds of back-and-forth, two sessions of context-pasting, and a small amount of prayer.
Here's the part that's actually interesting though: it unlocked a different debugging posture. When the model is good enough that you trust the code it writes, you stop treating the AI output as the probable source of error — and you start looking everywhere else. "I know the problem isn't in what you just wrote, so it must be somewhere else." That's a real superpower, and it only exists because the floor got high enough to stand on.
I had a mental model for how capable coding automation could get — and I already thought that ceiling was high. Magic.dev, GPT-5, whatever's next. I was prepared for another incremental ratchet.
This wasn't that. This was the ceiling moving.
The framing has to change now, which is annoying, because I just finished updating the framing.
Counterpoints
Push back, extend the argument, or sharpen it. New counterpoints go through review before they show up here.
No approved counterpoints yet.