Apple Told You Everything and Nobody Wrote It Down

MLX is not a developer tool. It's a strategy document with a compiler.

#apple #mlx #machine-learning #apple-silicon #strategy
Hindsight: nailed it

MLX became the standard for local inference on Apple silicon. The unified memory architecture proved to be Apple's genuine advantage for running large models locally. The M3 Ultra Mac Studio, configurable with 512GB of unified memory, delivers on exactly this premise.

Apple released MLX in December and everyone filed it under "nice open-source contribution, thoughtful of them" and moved on, which is either a failure of imagination or proof that the best place to hide a thing is the place where people immediately categorize it and stop thinking.

MLX is a machine learning framework designed from the ground up around unified memory. Not as a feature. As the premise. The CPU and GPU share the same pool — no copying, no bus bottleneck, no discrete VRAM cliff at 24GB where your model falls off and you're back to cloud. The architecture assumes the memory is just there, flat, accessible.
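
To make the no-copy claim concrete, here is a minimal sketch modeled on the unified memory example in the MLX documentation. It assumes an Apple silicon Mac with the mlx package installed, and the function name add_on_both_devices is mine, not part of MLX. The same two arrays are handed to the CPU and then to the GPU just by choosing a stream, with no transfer step in between.

```python
def add_on_both_devices(n=100):
    """Run the same add on CPU and GPU over the same arrays.

    Assumes Apple silicon with MLX installed (pip install mlx).
    The import lives inside the function so that merely defining
    it does not require MLX to be present.
    """
    import mlx.core as mx

    # Arrays are allocated once, in unified memory. They are not
    # tied to a device when they are created.
    a = mx.random.normal((n,))
    b = mx.random.normal((n,))

    # The device is chosen per operation by passing a stream.
    # There is no explicit copy or .to(device) step here.
    on_cpu = mx.add(a, b, stream=mx.cpu)
    on_gpu = mx.add(a, b, stream=mx.gpu)

    # MLX is lazy, so force both computations to actually run.
    mx.eval(on_cpu, on_gpu)

    # Both results were computed from the same buffers.
    return mx.allclose(on_cpu, on_gpu).item()
```

On a Mac, calling add_on_both_devices() should return True. The point is less the result than what is missing from the code: nothing moves a and b between host and device memory.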

This is interesting on a MacBook. It becomes something else entirely at 512GB.

Apple is shipping M4 chips with 512GB unified RAM later this year. Not 512GB for some theoretical future workload: 512GB you can hand directly to a model, today, through MLX, on hardware that fits in a rack or on a desk or apparently in a Mac Studio that costs less than a single A100. The largest open-weight models that exist right now fit in that. Not "fit with quantization tricks and prayer." Fit.
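
The arithmetic behind "fit" is simple enough to sketch. This is a back-of-envelope estimate, not a measurement: it counts weights only, ignores the KV cache, activations, and whatever memory the OS keeps for itself, and uses decimal gigabytes. The function names and the parameter counts passed to them are illustrative choices of mine.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params_billions, dtype):
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    reduces to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


def fits(params_billions, dtype, ram_gb=512.0):
    """True if the weights alone fit in ram_gb of unified memory."""
    return weight_footprint_gb(params_billions, dtype) <= ram_gb


if __name__ == "__main__":
    # Illustrative model sizes, in billions of parameters.
    for size in (70, 180, 405):
        for dtype in ("fp16", "int4"):
            gb = weight_footprint_gb(size, dtype)
            print(f"{size}B @ {dtype}: {gb:.1f} GB, fits in 512 GB: {fits(size, dtype)}")
```

By this estimate, weights at 16-bit precision fit in 512GB up to roughly 256 billion parameters. Past that, quantization comes back into the picture, and real headroom is smaller once the cache and the OS take their share.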

Now do the other thing: count the Apple Silicon devices in the world. Every M1, M2, M3 MacBook. Every Mac Mini. Every iPad Pro. Every iPhone 15 Pro with its Neural Engine. Each one is MLX-compatible hardware. Each one has unified memory. The aggregate RAM across Apple's installed base is a number that should make someone at NVIDIA feel briefly ill.

Nobody is federating them. Yet. That is not an announcement I have seen. But the infrastructure for it — the framework, the memory model, the fact that MLX is open-source and the community is already porting everything to it — that's not an accident. You don't build a memory-unified ML framework, open-source it, put it in front of a developer audience, and then stop.

The strategy is completely public. It's in the GitHub repo. It's in the architecture decisions. It's in the fact that Awni Hannun, who was lead author on Deep Speech at Baidu and wrote the widely read Distill explainer on CTC, is working on this at Apple. You don't hire that person to write a developer productivity tool.

Apple has been building toward on-device AI since the Neural Engine showed up in the A11 in 2017. Every generation since then has been additive. MLX is the framework that makes it programmable. The 512GB ceiling is what makes it serious. And the installed base is what makes it, if someone connects the dots correctly, potentially the largest distributed inference cluster on the planet that no one had to buy.

The thing that gets me is how little noise this is making relative to what it appears to be. Everyone is watching the GPU wars. NVIDIA prints money, AMD is trying, everyone is waiting for the next H100 successor. And Apple is over here quietly shipping unified memory to hundreds of millions of devices and writing the framework that lets you use all of it.

Maybe it goes nowhere. Maybe Apple never federates any of this, never makes the device-to-device story coherent, and MLX stays a "run LLaMA locally" tool for people who don't want to pay for API calls. That would be a waste of an unusually good hand.

But the direction only goes one way. The RAM keeps going up. The framework keeps maturing. The installed base keeps growing. At some point someone is going to add those numbers together and build something, and the pieces will already be there, already distributed, already owned by people who bought them thinking they were buying a laptop.