
12GB Won't Fit in 8GB

The arithmetic of running image diffusion models on a phone is not complicated, and yet.

2 min read 238 words #diffusion #on-device-ai #apple #mobile-ml #core-ml
hindsight — nailed it

The math was right. 12GB does not fit in 8GB. Apple shipped on-device models via quantization, exactly as described — 4-bit weights, Core ML, the workaround that makes the impossible merely annoying.

No.

The iPhone 15 Pro Max, which has the most RAM Apple has ever shipped in a phone, has 8GB. A 12GB model requires 12GB. This is not an optimization problem. This is a numbers problem.

The thing that confuses people about this is unified memory. Apple's marketing around the M-series chips (and by extension the A-series) made "unified memory" sound like a superpower, and in some ways it is. The GPU and CPU share the same pool instead of copying tensors back and forth across a PCIe bus. That matters for inference speed. It does not make 12 fit into 8.

What you can do is quantize the model down to 4-bit weights, which for something like SDXL gets you from ~7GB to maybe 3-4GB — suddenly very plausible on a Pro device. Core ML handles this reasonably well. There are apps shipping SDXL-derived models on device right now, running in 10-30 seconds per image, looking fine.
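
If you want to check the arithmetic yourself, here is a minimal sketch. It counts weights only and assumes every parameter is stored at the same bit width; the function names are made up for illustration, and the 7GB and 12GB inputs are just the round figures from this post. Activations, runtime overhead, and any components left at higher precision are not counted, which may be part of why real-world footprints run higher than the weights-only number.

```python
def params_from_size(size_gb: float, bits_per_weight: float) -> float:
    """Parameter count implied by a weights file of size_gb (decimal GB)."""
    return size_gb * 1e9 * 8 / bits_per_weight


def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: the starting sizes are fp16 (16 bits per weight).
for fp16_size_gb in (7.0, 12.0):
    n_params = params_from_size(fp16_size_gb, 16)
    print(f"{fp16_size_gb:g} GB at fp16 is about {n_params / 1e9:.1f}B parameters")
    for bits in (16, 8, 4):
        size = weight_footprint_gb(n_params, bits)
        print(f"  {bits:>2}-bit weights: {size:.2f} GB")
```

Going from 16 bits to 4 bits divides the weights by four, so 7GB becomes 1.75GB and 12GB becomes 3GB. Neither of those is 12GB, which is the point.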

The 12GB figure almost certainly refers to fp16 weights of something like SDXL 1.0 or one of the fatter fine-tunes. That's not the thing you run on a phone. That's the thing you run on a Mac Studio while you go get coffee.

The more interesting question underneath the question is: what's the actual minimum quality threshold you'd accept from an on-device image model? Because that threshold is already met. It's just not 12GB.

It's about 2.