Qwen2.5-Coder Is Here and It Runs on Your Mac

Alibaba's new open-weights coding model beats GPT-4o and nearly matches Claude 3.5 Sonnet, and you can pull the quantized 32B version right now.

Alibaba shipped Qwen2.5-Coder, and it beats GPT-4o on coding benchmarks while sitting within striking distance of Claude 3.5 Sonnet. Open weights. Today.

The demo is a Claude Artifacts clone — you prompt it, it generates a runnable UI component, you watch it appear. The usual magic trick. What's less usual is that the model behind it is not locked behind an API you pay per token for.

The 32B variant is already in Ollama at 4-bit quantization, which means it fits on a machine that exists in the real world: not a server farm, not a GPU cluster, but the Mac on your desk, provided it has enough unified memory.
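
Why 4-bit matters is mostly arithmetic. Here is a rough sketch, assuming "32B" means a round 32 billion parameters and counting only the weights. Real quantized files run somewhat larger because some tensors stay at higher precision, and the context cache needs room too, so treat these as lower bounds.

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "32B" is taken as a round 32 billion parameters.
N_PARAMS = 32e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gigabytes(N_PARAMS, bits):.0f} GB")
```

At 16-bit the weights alone would want about 64 GB. At 4-bit they drop to roughly 16 GB, which is the difference between a GPU cluster and a well-specced laptop.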

This is the part that keeps happening and that nobody has fully adjusted to yet. The gap between frontier closed models and what you can run locally keeps compressing. One day it crosses a threshold where the local model is good enough for the thing you actually need, and at that point the API costs stop making sense.

We're not there universally. But for code? We might be getting close.

ollama run qwen2.5-coder:32b and find out.
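
If you would rather call the model from a script than chat in a terminal, Ollama also serves a local HTTP API. A minimal sketch, assuming the server is running on its default port (11434) and the model has already been pulled; the helper names build_request and generate are made up for this example.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5-coder:32b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5-coder:32b") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call, commented out because it requires a running Ollama server:
# print(generate("Write a Python function that reverses a string."))
```

No API key, no per-token bill. The request never leaves your machine.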

