The Model Is the Chip

Taalas prints one model per chip and gets 16,000 tokens per second — which sounds like winning until you remember you can't patch copper.

The way I've been using Codex 5.3 spark is embarrassing to describe. It's not the smartest model — Codex 5.3 extra high exists and is demonstrably better on hard problems — but it's fast enough that "better" stops meaning what you think it means. Two seconds to a wrong answer, then two more seconds to a slightly less wrong answer, and suddenly you're at correct in under a minute, while someone waiting on extra high is still watching the spinner at minute four. Accuracy per attempt is the wrong metric. Time to correct is the metric.
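
Here is the back-of-envelope version of that argument. It rests on two simplifying assumptions that are mine, not measurements: attempts are independent, and each one succeeds with the same fixed probability. Under those assumptions the number of attempts until the first correct answer is geometric, so expected time to correct is latency divided by per-attempt success rate. The latencies and success rates below are made-up placeholders, not numbers for either Codex variant.

```python
def expected_time_to_correct(latency_s: float, p_success: float) -> float:
    """Expected seconds until the first correct answer.

    Assumes independent attempts with a constant success probability,
    so attempts-to-success is geometric with mean 1 / p_success.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return latency_s / p_success


# Hypothetical placeholders for illustration only, not benchmarks.
fast = expected_time_to_correct(latency_s=2.0, p_success=0.2)    # quick, often wrong
slow = expected_time_to_correct(latency_s=240.0, p_success=0.8)  # slow, usually right

print(f"fast model: {fast:.0f} s, slow model: {slow:.0f} s")
```

Real iteration isn't independent, since each retry gets to see the previous failure. That should make later attempts more likely to land, which, if anything, helps the fast loop more.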

Taalas is running Llama 3 8B at 16,000 tokens per second per user.

The model isn't the story; Llama 3 8B is not winning any benchmarks in 2026. The story is that 16k tokens per second is a different category of thing from current inference. The mechanism is that Taalas doesn't run the model on a computer. The model is the computer. They take a model, weights and all, pour it into silicon, and the chip that comes out runs exactly one model and nothing else. Taalas calls it a "Hardcore Model."
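
To put 16,000 tokens per second in human terms, here is the arithmetic for streaming a single response. The 2,000-token response length and the 100 tokens-per-second baseline are hypothetical stand-ins picked for contrast, not measured figures for any particular GPU setup, and the sketch ignores prompt processing and network latency entirely.

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady per-user decode rate.

    Ignores prefill (prompt processing) and network overhead.
    """
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 2_000  # hypothetical response length
TAALAS_RATE = 16_000     # per-user rate reported for Llama 3 8B
BASELINE_RATE = 100      # hypothetical conventional rate, for contrast only

for label, rate in [("hardwired", TAALAS_RATE), ("baseline", BASELINE_RATE)]:
    print(f"{label}: {generation_time_s(RESPONSE_TOKENS, rate):.3f} s")
```

At those placeholder numbers, the difference is an eighth of a second versus twenty seconds for the same output.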

Forbes's coverage of their launch noted that costs per token have dropped somewhere between 50x and 1000x since the early days of cloud inference, almost entirely through specialization. Taalas is taking that trend at its word: if specialization drives efficiency, then full model specialization (one model, one chip, no abstraction layer between weights and wire) is where you end up.

It probably wins, on the math. The question is what you freeze when you pour a model into copper and glass. Software models get updated. They get RLHF'd out of embarrassing behaviors. They get patched when someone publishes the jailbreak. A chip does not get patched.

Maybe that's fine. Maybe the play is disposable silicon for stable models — mint a batch, run them until the architecture is obsolete, throw them away. It's not obviously worse than running a GPU farm forever.
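
Whether disposable silicon beats a long-lived GPU farm comes down to amortization: hardware cost gets spread over every token a device produces before it's retired. A minimal sketch of that calculation follows. The function name, the flat yearly operating cost, and every number in the example are my own placeholders, not Taalas or GPU pricing.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_per_million_tokens(
    hardware_cost: float,
    operating_cost_per_year: float,
    service_years: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Amortized cost per million tokens over a device's whole service life.

    Hardware cost is spread across every token produced before retirement;
    operating cost (power, hosting) is treated as a flat yearly figure.
    tokens_per_second is aggregate device throughput, not per-user speed.
    """
    total_tokens = tokens_per_second * utilization * service_years * SECONDS_PER_YEAR
    total_cost = hardware_cost + operating_cost_per_year * service_years
    return total_cost / total_tokens * 1_000_000


# Made-up placeholder inputs; swap in real figures before drawing conclusions.
disposable = cost_per_million_tokens(
    hardware_cost=10_000, operating_cost_per_year=1_000,
    service_years=1, tokens_per_second=10_000, utilization=0.5,
)
long_lived = cost_per_million_tokens(
    hardware_cost=30_000, operating_cost_per_year=5_000,
    service_years=5, tokens_per_second=2_000, utilization=0.5,
)
print(f"disposable: ${disposable:.4f}/M tokens, long-lived: ${long_lived:.4f}/M tokens")
```

Short service life and high throughput pull in opposite directions, so the result swings entirely on the inputs. The formula only makes explicit what has to be true for the throw-it-away plan to pencil out.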

It just requires being very confident that the model you're printing today is the right shape. And historically, "we've found the right shape" has not aged well as a position.

Counterpoints