{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"claude-3-broke-my-calibration","title":"Claude 3 Broke My Calibration","subtitle":"I had a model in my head for how good AI coding could get, and now I have to throw it out.","url":"https://expectedwrong.com/claude-3-broke-my-calibration","api_url":"https://expectedwrong.com/api/public/posts/claude-3-broke-my-calibration","published_at":1710763200,"published_at_iso":"2024-03-18T12:00:00.000Z","updated_at":1771538978,"updated_at_iso":"2026-02-19T22:09:38.000Z","tags":["ai","coding","claude","llm"],"excerpt":"I had a model in my head for how good AI coding could get, and now I have to throw it out.","meta_description":"I had a model in my head for how good AI coding could get, and now I have to throw it out.","reading_time_minutes":1,"word_count":217,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"I gave it a fair shot. That's the important part. I wasn't looking for an excuse to switch.\n\nClaude 3 wipes the map with GPT-4 at coding. Not nudges it. Not edges it out in certain benchmarks. Wipes. The. Map.\n\nIt output a 600-line file — multiple refactors, multiple modifications — no flaws. Not \"pretty good for an LLM.\" No flaws. The kind of output that used to require three rounds of back-and-forth, two sessions of context-pasting, and a small amount of prayer.\n\nHere's the part that's actually interesting though: it unlocked a different debugging posture. When the model is good enough that you trust the code it writes, you stop treating the AI output as the probable source of error — and you start looking everywhere else. \"I know the problem isn't in what you just wrote, so it must be somewhere else.\" That's a real superpower, and it only exists because the floor got high enough to stand on.\n\nI had a mental model for how capable coding automation could get — and I already thought that ceiling was high. Magic.dev, GPT-5, whatever's next. I was prepared for another incremental ratchet.\n\nThis wasn't that. This was the ceiling moving.\n\nThe framing has to change now, which is annoying, because I just finished updating the framing.","body_text":"I gave it a fair shot. That's the important part. I wasn't looking for an excuse to switch. Claude 3 wipes the map with GPT-4 at coding. Not nudges it. Not edges it out in certain benchmarks. Wipes. The. Map. It output a 600-line file — multiple refactors, multiple modifications — no flaws. Not \"pretty good for an LLM.\" No flaws. The kind of output that used to require three rounds of back-and-forth, two sessions of context-pasting, and a small amount of prayer. Here's the part that's actually interesting though: it unlocked a different debugging posture. When the model is good enough that you trust the code it writes, you stop treating the AI output as the probable source of error — and you start looking everywhere else. \"I know the problem isn't in what you just wrote, so it must be somewhere else.\" That's a real superpower, and it only exists because the floor got high enough to stand on. I had a mental model for how capable coding automation could get — and I already thought that ceiling was high. Magic.dev, GPT-5, whatever's next. I was prepared for another incremental ratchet. This wasn't that. This was the ceiling moving. The framing has to change now, which is annoying, because I just finished updating the framing.","hindsight":{"verdict":"right","note":"this observation was validated over and over. claude became the preferred coding model. the 'different debugging posture' insight — trusting the AI output and looking everywhere else — became the standard workflow for developers using AI.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}