The Physics Engine Was Always Optional
AlphaFold 3 uses diffusion, which means the same trick that makes fake videos of cats look real also models how atoms fit together.
AlphaFold 3 was a genuine breakthrough. Its diffusion approach to molecular structure proved the thesis: the physics engine was always optional. The same kind of model that generates images can generate valid protein folds, and that generalization is still sinking in.
When OpenAI was explaining Sora, they called it a world simulator — the idea being that the model didn't learn a set of physics rules, it just learned what the world looks like, and that turned out to be enough. The physics fell out of the statistics.
AlphaFold 3 is that, except instead of video frames it's atoms.
Google DeepMind and Isomorphic Labs dropped it today, and the architecture shift is the thing. AlphaFold 2 was a transformer doing clever geometry. It worked, obviously; it was astonishing; it won a Nobel Prize in all but name. But AlphaFold 3 builds its structures with a diffusion model, the same family of models that generates your AI art. It is trained to strip noise off atom coordinates, so you give it noise and it denoises toward something real. Except here "real" means a valid molecular structure: a protein folded correctly, a small molecule docked into its binding site.
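A minimal sketch of that denoising loop in Python, on a made-up three-atom "molecule". The trained network is replaced by the closed-form best denoiser for a toy data distribution (one fixed reference structure plus small Gaussian jitter), and the sampler is a plain Euler integration of the probability-flow ODE from the EDM formulation of diffusion. The reference coordinates, noise schedule and every name here are assumptions for illustration; this is not AlphaFold 3's code, network or schedule.

```python
import math
import random

# Hypothetical "true" structure: three atoms of a toy molecule, coordinates in Å.
# Illustrative numbers only, not taken from any real structure.
REFERENCE = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
DATA_STD = 0.1  # assumed spread (Å) of the "training data" around REFERENCE


def denoise(x, sigma):
    """MSE-optimal denoiser for data ~ N(REFERENCE, DATA_STD^2) plus N(0, sigma^2) noise.

    A real model learns this function from examples; here it is the closed-form
    posterior mean, standing in for a trained network so the sketch runs as-is.
    """
    s2, n2 = DATA_STD ** 2, sigma ** 2
    return [
        tuple((s2 * xi + n2 * ri) / (s2 + n2) for xi, ri in zip(atom, ref))
        for atom, ref in zip(x, REFERENCE)
    ]


def sample_structure(steps=50, sigma_max=80.0, sigma_min=0.002, seed=0):
    rng = random.Random(seed)
    # Noise levels fall geometrically from sigma_max to sigma_min, then to zero.
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / (steps - 1)) for i in range(steps)]
    sigmas.append(0.0)
    # Start from pure noise: every coordinate drawn with standard deviation sigma_max.
    x = [tuple(rng.gauss(0.0, sigma_max) for _ in range(3)) for _ in REFERENCE]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        d = denoise(x, sigma)
        # Euler step on the probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma.
        x = [
            tuple(xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(atom, den))
            for atom, den in zip(x, d)
        ]
    return x


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered lists of 3-D points."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


structure = sample_structure()
print(f"RMSD to reference: {rmsd(structure, REFERENCE):.3f} angstrom")
```

The atoms start scattered tens of ångströms apart and end up within a fraction of an ångström of the reference. Nothing in the loop knows about bonds or angles; everything it "knows" lives in `denoise`, which is exactly the part a real model has to learn from data.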
The part that gets me is the physics engine point. Classical molecular simulation works by writing down rules — this atom repels that one, this bond has this stiffness, this angle wants to be this many degrees. Decades of painstaking physical chemistry encoded into force fields that still get things wrong in ways that embarrass people at conferences.
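For contrast, here is what "writing down rules" looks like: a toy energy function with harmonic bond-stretch and angle-bend terms plus a Lennard-Jones repulsion/attraction term, the functional forms classical force fields are built on. It is only a sketch: the parameter values and names are hypothetical, and real force fields add torsions, electrostatics and many fitted parameter types.

```python
import math

# Hypothetical parameters in the harmonic-bond / harmonic-angle / Lennard-Jones style
# used by classical force fields. Illustrative values, not from a published force field.
BOND_K = 450.0                      # "this bond has this stiffness", kcal/mol/Å^2
BOND_R0 = 0.96                      # equilibrium O-H bond length, Å
ANGLE_K = 55.0                      # angle stiffness, kcal/mol/rad^2
ANGLE_THETA0 = math.radians(104.5)  # "this angle wants to be this many degrees"
LJ_EPSILON = 0.15                   # Lennard-Jones well depth, kcal/mol
LJ_SIGMA = 3.15                     # Lennard-Jones contact distance, Å


def bend_angle(a, center, b):
    """Angle a-center-b in radians."""
    u = [p - q for p, q in zip(a, center)]
    v = [p - q for p, q in zip(b, center)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def bonded_energy(h1, o, h2):
    """Harmonic bond-stretch and angle-bend energy of one water-like molecule."""
    e_bonds = sum(BOND_K * (math.dist(o, h) - BOND_R0) ** 2 for h in (h1, h2))
    e_angle = ANGLE_K * (bend_angle(h1, o, h2) - ANGLE_THETA0) ** 2
    return e_bonds + e_angle


def lennard_jones(r):
    """'This atom repels that one': steep repulsion up close, weak attraction further out."""
    sr6 = (LJ_SIGMA / r) ** 6
    return 4.0 * LJ_EPSILON * (sr6 * sr6 - sr6)


def water_at(bond_length, theta):
    """Oxygen at the origin, two hydrogens placed symmetrically in the xy-plane."""
    half = theta / 2.0
    h1 = (bond_length * math.cos(half), bond_length * math.sin(half), 0.0)
    h2 = (bond_length * math.cos(half), -bond_length * math.sin(half), 0.0)
    return h1, (0.0, 0.0, 0.0), h2


relaxed = bonded_energy(*water_at(BOND_R0, ANGLE_THETA0))      # close to 0
strained = bonded_energy(*water_at(1.10, math.radians(95.0)))  # positive
print(relaxed, strained)
print(lennard_jones(2.5), lennard_jones(3.5))  # positive (repulsive), negative (attractive)
```

In a real force field, every one of those constants is a rule somebody had to fit and then defend.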
The diffusion model doesn't know any of that. It just learned the distribution of structures that actually exist. And apparently that's the same thing, or close enough that the distinction stops mattering for most purposes.
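"The same thing, or close enough" has a precise version. Diffusion models are trained to estimate the score, the gradient of the log-density of the data. If structures were Boltzmann-distributed, p(r) ∝ exp(-E(r)/kT), that score would be exactly the physical force divided by kT. The one-dimensional sketch below uses Boltzmann inversion, the old trick behind knowledge-based potentials, to recover a hypothetical bond's stiffness and rest length purely from sampled lengths. The assumptions are loud ones: real PDB structures are not clean Boltzmann samples, the constants are made up, and AlphaFold 3 does not work this way internally.

```python
import random
import statistics

KT = 0.593      # thermal energy at about 300 K, kcal/mol
TRUE_K = 450.0  # hidden "rule": bond stiffness, kcal/mol/Å^2 (hypothetical value)
TRUE_R0 = 0.96  # hidden "rule": equilibrium bond length, Å (hypothetical value)

# Pretend these are bond lengths observed across many solved structures. Under the
# idealized assumption that they are Boltzmann-distributed for E = k (r - r0)^2,
# they are Gaussian with mean r0 and variance kT / (2k).
rng = random.Random(42)
observed = [rng.gauss(TRUE_R0, (KT / (2 * TRUE_K)) ** 0.5) for _ in range(100_000)]

# "Learn the distribution": fit mean and variance without looking at TRUE_K or TRUE_R0.
mean = statistics.fmean(observed)
var = statistics.pvariance(observed, mu=mean)

# Boltzmann inversion: E(r) = -kT ln p(r) + const, so the fitted Gaussian implies
# a harmonic potential with r0 = mean and k = kT / (2 * var).
recovered_r0 = mean
recovered_k = KT / (2 * var)


def learned_score(r):
    """d/dr log p(r) for the fitted Gaussian: what a diffusion model estimates."""
    return -(r - mean) / var


def force_over_kt(r):
    """-dE/dr of the hidden rule, divided by kT so it is comparable to the score."""
    return -2 * TRUE_K * (r - TRUE_R0) / KT


print(recovered_r0, recovered_k)
print(learned_score(1.0), force_over_kt(1.0))  # agree up to sampling noise
```

The fit never touches the hidden constants, yet the learned score and the physical force line up: under these assumptions the statistics carry the physics.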
There's something almost insulting about it — to all the physicists and chemists who spent careers codifying the rules. The rules were never the point. The point was the answer, and the answer was hiding in the data the whole time.
AlphaFold 2 could do proteins. AlphaFold 3 does proteins, DNA, RNA, small molecules, and — crucially — the interactions between all of them. The whole messy zoo of molecules that actually constitutes a living cell, or a drug binding to a target. One model, one diffusion process, no handcrafted force field required.
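One concrete piece of how a single model takes in that whole zoo is tokenization. The AlphaFold 3 paper describes representing each standard amino acid or nucleotide as one token and breaking ligands (and modified residues) down to one token per atom. A simplified sketch of that idea follows; the classes, field names and token strings are hypothetical and are not the AlphaFold 3 input format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical input types for one multi-molecule complex. Field names are
# illustrative only; they are not the AlphaFold 3 input format.


@dataclass
class Chain:
    kind: str      # "protein", "dna" or "rna": polymers made of standard residues
    sequence: str  # one letter per residue or nucleotide


@dataclass
class Ligand:
    name: str
    heavy_atoms: List[str]  # element symbols, one per heavy atom


def tokenize(entities):
    """Simplified scheme: one token per standard residue or nucleotide,
    one token per heavy atom for ligands."""
    tokens = []
    for entity in entities:
        if isinstance(entity, Chain):
            tokens += [f"{entity.kind}:{res}" for res in entity.sequence]
        else:
            tokens += [f"{entity.name}:{atom}" for atom in entity.heavy_atoms]
    return tokens


complex_ = [
    Chain("protein", "MKTAYIAK"),
    Chain("dna", "ACGT"),
    Ligand("ethanol", ["C", "C", "O"]),
]
tokens = tokenize(complex_)
print(len(tokens))  # 8 residues + 4 nucleotides + 3 ligand atoms = 15
```

Once everything is one list of tokens, the model roughly doesn't need a separate code path per molecule type, which is what makes "one model" possible.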
The Sora comparison is real, and it's not just vibes. Both models learned to hallucinate plausible states of a system, one in video frames and one in atomic coordinates, by training on examples of what real states look like. The world-simulator framing was always a little grandiose when applied to video. Applied to protein-ligand interactions, it's just a description of what the thing does.
We built physics engines because we didn't have enough data or compute to do it any other way. Now we do, and it turns out the engine was the scaffolding, not the building.