{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"the-human-is-now-optional","title":"The Human Is Now Optional","subtitle":"Cerebras just showed what inference speed actually unlocks, and it's not faster chatbots.","url":"https://expectedwrong.com/the-human-is-now-optional","api_url":"https://expectedwrong.com/api/public/posts/the-human-is-now-optional","published_at":1734436800,"published_at_iso":"2024-12-17T12:00:00.000Z","updated_at":1771550119,"updated_at_iso":"2026-02-20T01:15:19.000Z","tags":["ai","inference","cerebras","agents","software"],"excerpt":"Cerebras just showed what inference speed actually unlocks, and it's not faster chatbots.","meta_description":"Cerebras just showed what inference speed actually unlocks, and it's not faster chatbots.","reading_time_minutes":3,"word_count":452,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"Go to cerebrascoder.com. Type what you want. Watch software appear.\n\nNot stream in. Not generate. Appear — the way a webpage appears when you hit refresh, not the way a document prints.\n\nThis is Cerebras running inference fast enough that the latency between your description and a working app is basically a rounding error. The bottleneck is now you — your typing speed, your thinking speed, the half-second where your brain processes what it just saw.\n\nThe machine has lapped you. It's waiting.\n\n---\n\nWhat Cerebras sells is chips. What they've accidentally demonstrated is a new category of experience — software that mutates in real time from natural language, fast enough that you can treat each mutation as free. You can iterate like you have no budget and no patience, because now you don't need either.\n\nThis sounds like a developer tool.\n\nIt isn't.\n\nIt's the last version of the developer tool before there's no developer in the loop.\n\n---\n\nThe extrapolation is uncomfortable to sit with. Right now, a human types a sentence, gets an app, looks at the app, types another sentence. Round trip measured in seconds. That's the current regime.\n\nNow imagine the human exits. An agent describes a feature. Cerebras generates the code. Another agent evaluates it — does it do the thing, does it break anything, is it worse than what was there before. If yes, ship. If no, regenerate. Loop.\n\nThe loop runs at whatever speed Cerebras runs at, which is fast enough that it feels less like a software process and more like a physical law. The loop doesn't sleep. It doesn't get distracted. It doesn't spend twenty minutes on Hacker News because it saw a link about someone else's project.\n\nTen million iterations per unit of time you'd previously call \"one afternoon.\"\n\n---\n\nThe software industry has been congratulating itself for decades about moving fast. Two-week sprints. Continuous deployment. Trunk-based development. All of it was just rounding toward the point we're at now — the point where the bottleneck is exposed as the human who has to understand the change before they can approve it.\n\nThat bottleneck is not getting faster.\n\nThe other side is.\n\n---\n\nIn six months this demo will look like the horse and buggy version. The version that required you to type. The version with the human still in frame, hunting for words to describe what they want, which is itself a form of work that didn't used to exist and now apparently does, briefly, before it doesn't anymore.\n\nThe end game was always this: software that writes software, running on hardware fast enough that the writing is indistinguishable from thinking.\n\nWe're not there yet.\n\nWe're also not not there.","body_text":"Go to cerebrascoder.com. Type what you want. Watch software appear. Not stream in. Not generate. Appear — the way a webpage appears when you hit refresh, not the way a document prints. This is Cerebras running inference fast enough that the latency between your description and a working app is basically a rounding error. The bottleneck is now you — your typing speed, your thinking speed, the half-second where your brain processes what it just saw. The machine has lapped you. It's waiting. --- What Cerebras sells is chips. What they've accidentally demonstrated is a new category of experience — software that mutates in real time from natural language, fast enough that you can treat each mutation as free. You can iterate like you have no budget and no patience, because now you don't need either. This sounds like a developer tool. It isn't. It's the last version of the developer tool before there's no developer in the loop. --- The extrapolation is uncomfortable to sit with. Right now, a human types a sentence, gets an app, looks at the app, types another sentence. Round trip measured in seconds. That's the current regime. Now imagine the human exits. An agent describes a feature. Cerebras generates the code. Another agent evaluates it — does it do the thing, does it break anything, is it worse than what was there before. If yes, ship. If no, regenerate. Loop. The loop runs at whatever speed Cerebras runs at, which is fast enough that it feels less like a software process and more like a physical law. The loop doesn't sleep. It doesn't get distracted. It doesn't spend twenty minutes on Hacker News because it saw a link about someone else's project. Ten million iterations per unit of time you'd previously call \"one afternoon.\" --- The software industry has been congratulating itself for decades about moving fast. Two-week sprints. Continuous deployment. Trunk-based development. All of it was just rounding toward the point we're at now — the point where the bottleneck is exposed as the human who has to understand the change before they can approve it. That bottleneck is not getting faster. The other side is. --- In six months this demo will look like the horse and buggy version. The version that required you to type. The version with the human still in frame, hunting for words to describe what they want, which is itself a form of work that didn't used to exist and now apparently does, briefly, before it doesn't anymore. The end game was always this: software that writes software, running on hardware fast enough that the writing is indistinguishable from thinking. We're not there yet. We're also not not there.","hindsight":{"verdict":"right","note":"Cerebras inference speed continued to demonstrate what happens when the machine laps you. The bottleneck being human thinking speed rather than model speed is the new default for fast inference providers.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}