
The $20,000 Brain

Kimi K2 runs on two Mac Studios, costs less than a car, and will cost less than a phone before this is over.

#ai #local-models #kimi-k2 #hardware #open-source
hindsight — still happening

A trillion-parameter model running on $20,000 of Mac Studios. The gap between data center and desk keeps closing. The story isn't over; it's just getting domestic.

Awni Hannun, the Apple MLX lead, got Kimi K2 running across two M3 Ultra Mac Studios this week: 1,024GB of unified memory between the machines, 512GB each. The model has a trillion parameters, weighs 594GB, and generates 15 tokens per second.
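A quick back-of-the-envelope check on those numbers, as a minimal Python sketch. Dividing the reported 594GB by a trillion parameters gives the average storage per weight, and subtracting it from the combined RAM shows the headroom left over. It assumes decimal gigabytes (1GB = 10^9 bytes) and rounds the parameter count to exactly 10^12; the helper name `bits_per_param` is hypothetical.

```python
# Back-of-the-envelope memory check for the two-Mac-Studio setup.
# Assumptions: decimal gigabytes (1 GB = 1e9 bytes) and exactly 1e12 parameters.

PARAMS = 1e12              # "a trillion parameters", rounded
WEIGHTS_GB = 594           # reported size of the model weights
RAM_PER_MACHINE_GB = 512   # each M3 Ultra Mac Studio
MACHINES = 2


def bits_per_param(size_gb: float, params: float) -> float:
    """Hypothetical helper: average bits of storage per parameter."""
    return size_gb * 1e9 * 8 / params


bpp = bits_per_param(WEIGHTS_GB, PARAMS)
total_ram_gb = RAM_PER_MACHINE_GB * MACHINES
headroom_gb = total_ram_gb - WEIGHTS_GB
fp16_gb = PARAMS * 16 / 8 / 1e9   # the same model stored at 16 bits per weight

print(f"{bpp:.2f} bits per parameter")                                   # 4.75
print(f"{total_ram_gb} GB of RAM, {headroom_gb} GB left after weights")  # 1024, 430
print(f"{fp16_gb:,.0f} GB if stored at 16 bits per weight")              # 2,000
```

Roughly 4.75 bits per weight means the 594GB build is a quantized one. At 16 bits per weight the same trillion parameters would need about 2,000GB, which would not fit in the 1,024GB pair. The remaining ~430GB is what's left for the operating system and the working memory inference needs.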

That's it. That's the whole story, except it isn't.

The model is fully open source. Frontier-class by most accounts — a drop-in replacement for the models you're currently paying per token to access. The kind of thing that, six months ago, would have required a data center and a procurement department and three months of lead time to stand up.

Now it requires two Macs and an afternoon.

The math is funny in the way that only technology math is funny. At 15 tok/sec running flat-out, you get roughly 1.3 million tokens a day. That same throughput routed through Sonnet would run you about $30/day. The two machines together cost $20,000. So you're looking at roughly two years of continuous operation to break even — which sounds like a terrible investment until you think about what you actually bought.
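Here is the arithmetic behind that break-even estimate, as a minimal Python sketch. It uses the post's own figures (15 tokens per second, $20,000 of hardware, $30/day of API spend) and also backs out the per-million-token price that $30/day implies. The $15-per-million output price used for comparison is an assumption about Sonnet list pricing, not a figure from the post. The sketch also ignores electricity and the split between input and output tokens.

```python
# Break-even sketch: local hardware vs. a metered API.
# Ignores electricity, depreciation, and the input/output token mix.

SECONDS_PER_DAY = 24 * 60 * 60

TOK_PER_SEC = 15            # reported throughput
HARDWARE_COST = 20_000      # two Mac Studios, in dollars
API_COST_PER_DAY = 30       # the post's estimate for the same volume via Sonnet
ASSUMED_OUTPUT_PRICE = 15   # assumption: dollars per million output tokens


def break_even_days(hardware_cost: float, api_cost_per_day: float) -> float:
    """Days of continuous use before the hardware pays for itself."""
    return hardware_cost / api_cost_per_day


daily_tokens = TOK_PER_SEC * SECONDS_PER_DAY
implied_price = API_COST_PER_DAY / (daily_tokens / 1e6)  # dollars per million tokens
days = break_even_days(HARDWARE_COST, API_COST_PER_DAY)

output_only_cost = daily_tokens / 1e6 * ASSUMED_OUTPUT_PRICE
days_output_only = break_even_days(HARDWARE_COST, output_only_cost)

print(f"{daily_tokens:,} tokens/day")                            # 1,296,000
print(f"$30/day implies ${implied_price:.2f} per million tokens")
print(f"break-even: {days:.0f} days ({days / 365:.1f} years)")
print(f"output-only at ${ASSUMED_OUTPUT_PRICE}/M: ${output_only_cost:.2f}/day, "
      f"{days_output_only:.0f} days ({days_output_only / 365:.1f} years)")
```

With the post's $30/day, the hardware pays for itself in about 667 days, a bit under two years. The per-day API cost is the sensitive input. If the same 1.3 million tokens are priced purely as output at the assumed $15 per million, they cost about $19.44/day, and break-even stretches to roughly 2.8 years.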

You bought independence.

The solar flare scenario is a joke, but it isn't entirely a joke. Frontier models go offline for maintenance, for policy changes, for reasons their operators don't explain. The API terms change. The pricing changes. The model gets updated in ways that break your evals. You have no recourse. You have a vendor and the vendor has you.

$20,000 buys you out of that relationship entirely. One time. No per-token fees, no rate limits, no third party's acceptable use policy sitting between you and your inference. The only recurring bill is electricity.

The more serious case is enterprise privacy. Some customers genuinely cannot let their data leave their infrastructure — healthcare, legal, defense, finance — and until recently "frontier intelligence, fully on-prem" was an oxymoron. The on-prem options were fine, not great, noticeably behind the frontier. Now the gap is closed. Two Mac Studios in a server closet and you're running the same quality of reasoning as the cloud providers are selling at a markup.

But the number that actually matters isn't $20,000. It's the trajectory.

This capability — a trillion-parameter model, frontier performance — would have cost somewhere north of a million dollars to run privately a couple years ago. Today it's $20k. The line doesn't stop here. Before long it's $1,000. Then it's a high-end laptop. Then it's ambient, everywhere, assumed, the way compute became ambient and everywhere and assumed.

We're somewhere in the middle of that slide and the current price is $20,000, which is a lot for an individual and nothing for a business and already obsolete.

You can just use it on OpenRouter in the meantime. The point isn't that you need to buy two Mac Studios right now. The point is that someone did, and it worked, and the number keeps dropping.