They Trained a World Model on LEGO Footage and Made a Game With It
1000 hours of plastic bricks is apparently enough to teach a model physics, spatial reasoning, and the general vibe of existence.
World models trained on narrow domains kept showing up. The LEGO version was cute, but the pattern (learn physics from video, then generate playable environments) became a real research thread. There's still no general-purpose version shipping.
The quality is bad. Calling it average would be too generous. It's the kind of output that reminds you what "pretrained on a thousand hours of LEGO footage" actually means when rendered to screen.
But that's not the point.
The point is that a thousand hours of LEGO footage was enough. Enough to bootstrap a model that understands something about how a rigid plastic world moves and responds and stays coherent across frames — coherent enough to wire into a game, which they did, and which you can play.
A thousand hours sounds like a lot until you remember that most serious pretraining runs are measured in the equivalent of tens of thousands of years of human attention. This is a long weekend at a LEGO convention by comparison.
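For a rough sense of the gap, here's the arithmetic as a quick Python sketch. The 10,000-year figure is an assumption. It's just the low end of "tens of thousands of years," not a number taken from any specific training run.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

# The LEGO dataset size from the post.
lego_hours = 1_000
lego_days = lego_hours / 24  # the same footage played back to back

# Assumption: take 10,000 years as the low end of "tens of thousands of years".
big_run_hours = 10_000 * HOURS_PER_YEAR

ratio = big_run_hours / lego_hours

print(f"LEGO footage: {lego_hours:,} h, about {lego_days:.1f} days nonstop")
print(f"10,000 years of attention: {big_run_hours:,} h")
print(f"Ratio: ~{ratio:,.0f}x")
```

Even at the conservative end, the big run comes out at about 87,600 times the LEGO dataset. That's why a thousand hours reads as small.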
The thing that keeps getting weirder the longer you think about it: LEGO footage is, in some sense, a perfect training substrate. Everything is discrete. Colors are saturated and unambiguous. Physics is simplified — bricks stack, bricks fall, bricks don't deform. There's no hair, no subsurface scattering, no ambient occlusion drama. If you were designing a synthetic world specifically to teach a model what "object permanence" means while minimizing confounds, you'd probably end up somewhere near a LEGO table.
Nobody designed it that way. It just turns out that forty years of parents filming their kids on Christmas morning is a surprisingly useful dataset.
The game they built with it isn't impressive in any conventional sense. But the bar here isn't "impressive game." The bar is "does it demonstrate that a model learned something real about a world from video alone, with no labels, no physics engine, no handcrafted reward function." And the answer is apparently yes, if the world is made of plastic and the footage is a thousand hours long.
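The "no labels" part is easier to see with a sketch. One common way to train a world model on raw video is next-frame prediction: show the model a few frames and ask it to predict the one that actually came next. The post doesn't say this is how the LEGO model was trained, so treat it as an assumption. The function `next_frame_pairs` below is hypothetical, and the frames are placeholder strings standing in for images.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # placeholder for whatever a frame is (e.g. an image array)


def next_frame_pairs(
    frames: Sequence[Frame], context: int
) -> List[Tuple[Tuple[Frame, ...], Frame]]:
    """Turn an unlabeled clip into (context window, next frame) training pairs.

    The target is simply the frame that followed in the original video, so the
    footage supplies its own supervision: no labels, no physics engine, no reward.
    """
    if context < 1:
        raise ValueError("context must be at least 1")
    pairs = []
    for t in range(context, len(frames)):
        window = tuple(frames[t - context:t])  # the frames the model gets to see
        pairs.append((window, frames[t]))  # the frame it has to predict
    return pairs


clip = ["f0", "f1", "f2", "f3", "f4"]
for window, target in next_frame_pairs(clip, context=3):
    print(window, "->", target)
# ('f0', 'f1', 'f2') -> f3
# ('f1', 'f2', 'f3') -> f4
```

Every extra hour of footage yields more of these pairs for free. That's the appeal: the only thing you need to collect is the video itself.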
That's either very encouraging or very funny, depending on your priors about where this ends up.