One Million Tokens and the Paper They Published the Day Before

Google announced Gemini 1.5 Pro with a 1M token context window the same week a paper, possibly theirs, explained why transformers can't reliably do that.

Google dropped Gemini 1.5 Pro yesterday. One million tokens in production. 99% accuracy on Needle in a Haystack across the full window. They added a timer to the demo this time — so you can watch them prove it in real time — because they know we've been burned before and they know we remember.

The number is genuinely absurd. A few months ago 128k was a headline. Now that's a rounding error.

But here's what's been living in my head since last night: they published a paper the day before. About transformers and length generalization. About how length generalization is "fragile, significantly influenced by factors like random weight initialization and training data order, leading to large variances across different random seeds."

That's the technical phrasing for: transformers can fall apart when you push them past the lengths they saw in training, and whether they do comes down to the luck of the seed.

Paper: February 14th. Gemini 1.5 Pro: February 15th.

So either the 1M context window works exactly as described — and the architecture is, functionally speaking, something other than a transformer — or the model has exactly the failure modes the paper documents and they buried the asterisk somewhere in the fine print. Neither option is boring. One of them is the story of the decade.

The post-transformer theory isn't as unhinged as it sounds. Google has published quietly on state space models, hybrid architectures, things that look like transformers from the outside but aren't. If you wanted prior work to point to — the kind that explains how you cleared a ceiling everyone agrees is real — you might publish that paper on Valentine's Day and hope nobody connects the dates.

Or I'm connecting dots that aren't there. Also possible.

What I'm more confident about: the release is solid, and it's also just one item in a long list of things that will all be necessary and will all disappear into the technical fabric within the next 24 months. We'll call an API. We won't think about it. The token count will just be a number in a config file somewhere, the way RAM was once a headline and is now a line in a spec sheet.

Meanwhile OpenAI, running on schedule with the precision of a company that monitors Google's press calendar, dropped their thing this afternoon. Not 1M tokens. One minute of video. Sora. Photorealistic, temporally coherent, text-to-video. One minute long.

One minute is insane.

We've been waiting for this — the Hollywood Engine moment, where coherent video at usable length stops being a research flex and starts being infrastructure. It's apparently now. Both companies apparently agreed this was the week.

The density is the thing. It's not that any single announcement is magic. Each of these will be absorbed, will become obvious, will become the boring baseline that the next thing has to beat. It's that the gap between "research demo" and "production API" has collapsed to approximately the time it takes to schedule a press release.

One million tokens. One minute of video. One day apart.

Tuesday in February 2024.

