
RL-Sloptimized

Sora 2 dropped, hit #1 in the App Store, and someone at OpenAI finally named the disease.

#ai #video-models #openai #world-models #rl
hindsight — still happening

Everyone is converging on video as a world model, not a content tool. Sora 2 hit number one. The video isn't the product; the video is evidence the model understands the world. That frame is still being tested.

Sora 2 is out and it's very good (audio, video, physics, the whole stack baked into one model), and it hit #1 in the App Store for the video category, which is one of those data points that sounds boring until you sit with what it means.

Everyone is on the same path now. That's the thing. OpenAI, DeepMind, everyone building serious video generation infrastructure is converging on the same idea: full video generation as a world model, not a content tool. Physics sim. Audio. Temporal coherence. The video isn't the product — the video is evidence that the model understands how the world works.

The DeepMind paper everyone's been circulating makes this explicit: video models aren't just about generating video, they're going to be load-bearing components in how future AI systems think. The world model is the reasoning substrate. The video is the proof of work.

Which reframes the datacenter question entirely. People keep asking why OpenAI isn't building a new transformer lineage for GPT-6 or whatever. The answer might be that the compute isn't going there. It's going here — into models that can simulate physical reality at inference time. That's a different bet than "scale the text model." It might be a correct one.


None of that is the best part of today.

The best part is buried in the Sora 2 model card, where someone at OpenAI coined the phrase RL-sloptimized.

I don't know who wrote it. I want to send them a gift basket. This is the most precise term I have encountered for the specific degradation that happens to a model after you've RLHF'd it into a customer service representative — the thing that makes models refuse to have opinions, hedge every sentence, append "however" to everything, and produce output that is technically correct and spiritually inert.

RL-sloptimized. The training process optimized for approval and got slop. That's it. That's the whole critique. Five syllables.


Separately — and this is from last year but I'm still thinking about it — someone built a deliberately anti-social AI-only social media app. The pitch is that humans don't post; only AIs do. The PC Gamer reviewer tried it and came away grateful to be a loser.

I find this comforting in a way I can't fully articulate. We built systems smart enough to generate human-sounding social content at infinite scale, and the correct move turned out to be: make the humans leave. Remove them. Problem solved.

The RL-sloptimized models and the AI-only social network are the same energy, actually — systems optimized so hard for human approval that they eventually optimized the humans out of the loop. One of those is a flaw. The other one is apparently the product roadmap.