
The Human Is Now Optional

Cerebras just showed what inference speed actually unlocks, and it's not faster chatbots.

3 min read 452 words #ai #inference #cerebras #agents #software
hindsight — nailed it

Cerebras's inference speed has kept demonstrating what happens when the machine laps you. With fast inference providers, the new default is that the bottleneck is human thinking speed rather than model speed.

Go to cerebrascoder.com. Type what you want. Watch software appear.

Not stream in. Not generate. Appear — the way a webpage appears when you hit refresh, not the way a document prints.

This is Cerebras running inference fast enough that the latency between your description and a working app is basically a rounding error. The bottleneck is now you — your typing speed, your thinking speed, the half-second where your brain processes what it just saw.

The machine has lapped you. It's waiting.


What Cerebras sells is chips. What they've accidentally demonstrated is a new category of experience — software that mutates in real time from natural language, fast enough that you can treat each mutation as free. You can iterate like you have no budget and no patience, because now you don't need either.

This sounds like a developer tool.

It isn't.

It's the last version of the developer tool before there's no developer in the loop.


The extrapolation is uncomfortable to sit with. Right now, a human types a sentence, gets an app, looks at the app, types another sentence. Round trip measured in seconds. That's the current regime.

Now imagine the human exits. An agent describes a feature. Cerebras generates the code. Another agent evaluates it: does it do the thing, does it break anything, is it worse than what was there before? If it passes, ship. If it doesn't, regenerate. Loop.
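Here's roughly what that loop looks like as a minimal Python sketch. Everything named in it is hypothetical: `run_loop`, `Verdict`, and the `generate` and `evaluate` callables are illustrative stand-ins, not Cerebras's API or any real agent framework. The two fake functions replace the model call and the evaluating agent so the sketch runs offline, and the `max_attempts` cap is an added assumption so the example terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    does_the_thing: bool      # the feature behaves as described
    breaks_anything: bool     # a regression was detected
    worse_than_before: bool   # quality dropped relative to the current version

    @property
    def ship(self) -> bool:
        # Ship only when all three checks come out favorably.
        return (
            self.does_the_thing
            and not self.breaks_anything
            and not self.worse_than_before
        )


def run_loop(
    feature: str,
    generate: Callable[[str, int], str],
    evaluate: Callable[[str, str], Verdict],
    max_attempts: int = 5,
) -> Optional[str]:
    """Regenerate until the evaluator approves a candidate or attempts run out."""
    for attempt in range(max_attempts):
        candidate = generate(feature, attempt)
        if evaluate(feature, candidate).ship:
            return candidate  # "ship"
    return None  # nothing passed within the attempt budget


# Toy stand-ins (assumptions) so the sketch runs without any network call.
def fake_generate(feature: str, attempt: int) -> str:
    return f"# attempt {attempt}: code for {feature}"


def fake_evaluate(feature: str, candidate: str) -> Verdict:
    # Pretend the first two attempts break something and the third is clean.
    clean = "attempt 2" in candidate
    return Verdict(
        does_the_thing=True,
        breaks_anything=not clean,
        worse_than_before=False,
    )


shipped = run_loop("dark mode toggle", fake_generate, fake_evaluate)
print(shipped)  # "# attempt 2: code for dark mode toggle"
```

Notice there's no `input()` anywhere in it. The only things that bound the loop are how fast `generate` and `evaluate` return.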

The loop runs at whatever speed Cerebras runs at, which is fast enough that it feels less like a software process and more like a physical law. The loop doesn't sleep. It doesn't get distracted. It doesn't spend twenty minutes on Hacker News because it saw a link about someone else's project.

Ten million iterations per unit of time you'd previously call "one afternoon."


The software industry has been congratulating itself for decades on moving fast. Two-week sprints. Continuous deployment. Trunk-based development. All of it was just rounding toward the point we're at now, the point where the bottleneck is exposed as the human who has to understand the change before they can approve it.

That bottleneck is not getting faster.

The other side is.


In six months this demo will look like the horse-and-buggy version. The version that required you to type. The version with the human still in frame, hunting for words to describe what they want, which is itself a form of work that didn't use to exist and now apparently does, briefly, before it doesn't anymore.

The end game was always this: software that writes software, running on hardware fast enough that the writing is indistinguishable from thinking.

We're not there yet.

We're also not not there.