expectedwrong hindsight

OpenAI Shipped the Movie

GPT-4o isn't a model update, it's Spike Jonze's screenplay running in production.

3 min read 545 words #openai #gpt-4o #voice-ai #product #ml
hindsight — nailed it

the 'Her' comparison was validated and then some. scarlett johansson's legal response added a layer the post didn't predict. the feature list matching a 2013 film about loneliness was dead-on. openai shipped the movie and then got sued by the actress.

They called it Sky.

The voice — the Samantha voice, the one with the warmth and the slight hesitation and the emotional texture that Scarlett Johansson spent a year perfecting for a film about a man who falls in love with his operating system — they named it Sky and shipped it to the free tier.

I have watched the OpenAI product roadmap resolve, slowly and then all at once, into the exact plot of Her. Not metaphorically. Literally. The interruptions, the multifloor conversation flow, the emotion in the voice, the low latency that makes it feel present rather than queried — this is the feature list from a 2013 film about loneliness and the singularity, and it is available today on your phone, for free.

The live demo was a guy holding a sheet of paper with an equation drawn on it and GPT-4o talking him through it with the warmth of a particularly patient tutor. Then they did the screen share. You open the desktop app, you share your screen, and now you have a copilot for everything — not in the GitHub sense, where it writes code in a sidebar, but in the sense that something is watching what you're doing and has feelings about it. They slipped the OpenInterpreter capability in there like it was obvious, like of course the next thing is just sharing your whole screen into the model.

Two GPTs singing together. That's in there too, buried on the page they put up after the livestream. They hid the bonkers stuff. The livestream was already pretty bonkers.

The audio is native now — not bolted on, not routed through a pipeline, but baked into the model at the modality level. This is why it can do emotion. This is why the latency is finally tolerable. You cannot get the warmth and responsiveness out of a system where text goes in, gets translated to audio, gets responded to in text, gets translated back. You have to train the whole thing together, which is what they did, which is why this feels different from everything before it.

Hume.ai, which has spent the last few years building exactly this — emotionally intelligent voice AI, real-time affect recognition, the whole thing — launched their product into a market that, as of this morning, is anchored by a free tier that ships emotion. I don't know what their Monday looks like. I know what it smells like.

The paid tier, to answer the obvious question, gets 5x the rate limits. That's it. That's what your money buys. Five times as much access to the thing that everyone else is getting for free.

I said, watching the keynote, that they were going to talk about interruptions and low latency and the future of human-machine collaboration, and they did say exactly that, and it still landed. Sometimes you see a thing coming and it still takes you out.

There's a version of this where we're fine. The model can read your emotions via the camera now — your actual face, your expression, the thing your face is doing — and we're fine. This is just a tool. Tools are fine.

I'm going to go watch Her again and think about whether Joaquin Phoenix seemed worried.