{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"12gb-wont-fit-in-8gb","title":"12GB Won't Fit in 8GB","subtitle":"The arithmetic of running image diffusion models on a phone is not complicated, and yet.","url":"https://expectedwrong.com/12gb-wont-fit-in-8gb","api_url":"https://expectedwrong.com/api/public/posts/12gb-wont-fit-in-8gb","published_at":1723032000,"published_at_iso":"2024-08-07T12:00:00.000Z","updated_at":1771545890,"updated_at_iso":"2026-02-20T00:04:50.000Z","tags":["diffusion","on-device-ai","apple","mobile-ml","core-ml"],"excerpt":"The arithmetic of running image diffusion models on a phone is not complicated, and yet.","meta_description":"The arithmetic of running image diffusion models on a phone is not complicated, and yet.","reading_time_minutes":2,"word_count":238,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"No.\n\nThe iPhone 15 Pro Max — the most RAM Apple has ever shipped in a phone — has 8GB. A 12GB model requires 12GB. This is not an optimization problem. This is a numbers problem.\n\nThe thing that makes people confused about this is unified memory. Apple's marketing around the M-series chips — and by extension the A-series — made \"unified memory\" sound like a superpower, and in some ways it is. The GPU and CPU share the same pool instead of copying tensors back and forth across a PCIe bus. That matters for inference speed. It does not make 12 fit into 8.\n\nWhat you *can* do is quantize the model down to 4-bit weights, which for something like SDXL gets you from ~7GB to maybe 3-4GB — suddenly very plausible on a Pro device. Core ML handles this reasonably well. 
There are apps shipping SDXL-derived models on device right now, running in 10-30 seconds per image, looking fine.\n\nThe 12GB figure almost certainly refers to fp16 weights of something like SDXL 1.0 or one of the fatter fine-tunes. That's not the thing you run on a phone. That's the thing you run on a Mac Studio while you go get coffee.\n\nThe more interesting question underneath the question is: what's the *actual* minimum quality threshold you'd accept from an on-device image model? Because that threshold is already met. It's just not 12GB.\n\nIt's about 2.\n","body_text":"No. The iPhone 15 Pro Max — the most RAM Apple has ever shipped in a phone — has 8GB. A 12GB model requires 12GB. This is not an optimization problem. This is a numbers problem. The thing that makes people confused about this is unified memory. Apple's marketing around the M-series chips — and by extension the A-series — made \"unified memory\" sound like a superpower, and in some ways it is. The GPU and CPU share the same pool instead of copying tensors back and forth across a PCIe bus. That matters for inference speed. It does not make 12 fit into 8. What you can do is quantize the model down to 4-bit weights, which for something like SDXL gets you from ~7GB to maybe 3-4GB — suddenly very plausible on a Pro device. Core ML handles this reasonably well. There are apps shipping SDXL-derived models on device right now, running in 10-30 seconds per image, looking fine. The 12GB figure almost certainly refers to fp16 weights of something like SDXL 1.0 or one of the fatter fine-tunes. That's not the thing you run on a phone. That's the thing you run on a Mac Studio while you go get coffee. The more interesting question underneath the question is: what's the actual minimum quality threshold you'd accept from an on-device image model? Because that threshold is already met. It's just not 12GB. It's about 2.","hindsight":{"verdict":"right","note":"The math was right. 12GB does not fit in 8GB. 
Apple shipped on-device models via quantization, exactly as described — 4-bit weights, Core ML, the workaround that makes the impossible merely annoying.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}