12GB Won't Fit in 8GB
The arithmetic of running image diffusion models on a phone is not complicated, and yet.
The math was right. 12GB does not fit in 8GB. Apple shipped on-device models via quantization, exactly as described — 4-bit weights, Core ML, the workaround that makes the impossible merely annoying.
No.
The iPhone 15 Pro Max has 8GB, the most RAM Apple has ever shipped in a phone. A 12GB model requires 12GB. This is not an optimization problem. This is a numbers problem.
What confuses people here is unified memory. Apple's marketing around the M-series chips, and by extension the A-series, made "unified memory" sound like a superpower, and in some ways it is. The GPU and CPU share the same pool instead of copying tensors back and forth across a PCIe bus. That matters for inference speed. It does not make 12 fit into 8.
What you can do is quantize the model down to 4-bit weights, which for something like SDXL gets you from ~7GB to maybe 3-4GB — suddenly very plausible on a Pro device. Core ML handles this reasonably well. There are apps shipping SDXL-derived models on device right now, running in 10-30 seconds per image, looking fine.
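For anyone who wants the arithmetic spelled out, here is a rough sketch in Python. It counts weights only and scales an fp16 footprint linearly with bit width, which ignores activations, runtime overhead, and any layers kept at higher precision. That is probably part of why real-world numbers land above the naive figure. The function names and the `usable_fraction` of 0.5 are assumptions made up for illustration; the fraction is a placeholder for "an app does not get all the RAM", not a documented iOS limit.

```python
def quantized_size_gb(fp16_size_gb: float, bits: int) -> float:
    """Scale an fp16 weight footprint to another bit width (weights only)."""
    return fp16_size_gb * bits / 16


def fits_on_device(model_gb: float, device_ram_gb: float,
                   usable_fraction: float = 0.5) -> bool:
    """Crude fit check. usable_fraction is a hypothetical placeholder for the
    share of RAM a single app can realistically claim."""
    return model_gb <= device_ram_gb * usable_fraction


if __name__ == "__main__":
    device_ram_gb = 8.0  # iPhone 15 Pro Max, per the text
    # fp16 starting points taken from the figures quoted in the text
    for label, fp16_gb in [("~7GB estimate", 7.0), ("12GB figure", 12.0)]:
        for bits in (16, 8, 4):
            size = quantized_size_gb(fp16_gb, bits)
            verdict = "fits" if fits_on_device(size, device_ram_gb) else "does not fit"
            print(f"{label} at {bits}-bit: {size:.2f}GB, {verdict}")
```

Under these assumptions the 12GB figure drops to 3GB at 4-bit and the ~7GB one to 1.75GB. Neither fits at fp16 once you leave the rest of the phone some room.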
The 12GB figure almost certainly refers to fp16 weights of something like SDXL 1.0 or one of the fatter fine-tunes. That's not the thing you run on a phone. That's the thing you run on a Mac Studio while you go get coffee.
The more interesting question underneath the question is: what's the actual minimum quality threshold you'd accept from an on-device image model? Because that threshold is already met. It's just not 12GB.
It's about 2.