
There Is No Context Window

Google's Infini-attention paper doesn't extend the context window — it dissolves it.

2 min read 243 words #ai #llm #transformers #google #attention
hindsight — half right

context windows continued expanding but didn't go truly infinite. gemini 1.5 pro hit 2M tokens, with no public sign of infini-attention under the hood. infini-attention influenced research, but the brute-force approach (just make the window bigger) kept winning over clever memory tricks.

Google dropped a paper today called "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention," and the move is that context windows aren't getting bigger; they're going away.

The trick is called Infini-attention. Instead of attending to everything (quadratic cost, eventually impossible) or to a sliding window (cheap, but you forget things), you split the input into segments. Inside a segment, you run ordinary causal attention. Across segments, you keep a fixed-size memory matrix: each segment first reads from it, then writes its own keys and values into it. The matrix never grows, so in principle the sequence can be infinite. In the demo, a 1B model pulls a passkey out of a million tokens. Memory overhead is constant, O(1) with respect to sequence length, which is either a modest technical achievement or a categorical shift in how LLMs work, depending on how you want to feel about it.
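To make the read-then-write loop concrete, here is a minimal pure-Python sketch of the compressive-memory half of Infini-attention, assuming the linear update rule the paper describes: features go through sigma = ELU + 1, retrieval is sigma(Q)M / (sigma(Q)z), and the update is M += sigma(K)^T V, z += sigma(K). It is deliberately stripped down: one head, no local attention, no gate, and the learned query/key/value projections are replaced by the identity (q = k = v = token). `CompressiveMemory` and `process_stream` are hypothetical names for illustration, not the paper's code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def elu_plus_one(x: float) -> float:
    # sigma(x) = ELU(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise,
    # so every feature stays strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size memory: a d_k x d_v matrix M and a length-d_k normalizer z.

    Neither grows with the number of tokens written into it.
    """

    def __init__(self, d_k: int, d_v: int) -> None:
        self.M = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def retrieve(self, query: Vector) -> Vector:
        # A_mem = sigma(q) M / (sigma(q) . z)
        sq = [elu_plus_one(x) for x in query]
        denom = sum(a * b for a, b in zip(sq, self.z))
        d_v = len(self.M[0])
        if denom == 0.0:
            # Nothing has been written yet, so there is nothing to read.
            return [0.0] * d_v
        return [
            sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
            for j in range(d_v)
        ]

    def update(self, key: Vector, value: Vector) -> None:
        # Linear update: M += sigma(k)^T v, z += sigma(k)
        sk = [elu_plus_one(x) for x in key]
        for i, s in enumerate(sk):
            self.z[i] += s
            for j, v in enumerate(value):
                self.M[i][j] += s * v


def process_stream(
    segments: List[List[Vector]], d: int
) -> Tuple[List[List[Vector]], CompressiveMemory]:
    """Stream segments through one memory: read first, then write."""
    mem = CompressiveMemory(d, d)
    outputs = []
    for segment in segments:
        # Retrieval only sees what earlier segments wrote.
        outputs.append([mem.retrieve(tok) for tok in segment])
        # Assumption: identity projections stand in for learned W_Q, W_K, W_V.
        for tok in segment:
            mem.update(tok, tok)
    return outputs, mem
```

However many segments go through `process_stream`, `mem.M` stays d x d and `mem.z` stays length d; that is the whole O(1) claim in a few lines. The paper also describes a delta-rule variant of the update, which first subtracts what the memory already returns for each key before writing; it is omitted here.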

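The architecture change is minimal: a small modification to standard scaled dot-product attention, plus a learned gate that blends local attention with the memory read. Existing models can pick it up through continued pre-training instead of being rebuilt from scratch. They took an 8B model, continually pre-trained it with Infini-attention, fine-tuned it on book summarization, and it summarized entire books (500K-token inputs, not chunks of books stitched together) and beat the prior state of the art on ROUGE scores.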
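The "minimal change" can be sketched too. Assuming the formulation in the paper, each head runs ordinary causal scaled dot-product attention inside the current segment and blends it with the memory readout through a learned scalar gate: A = sigmoid(beta) * A_mem + (1 - sigmoid(beta)) * A_dot. The sketch below is pure Python and single-head, with the memory readout `a_mem` passed in rather than computed; `causal_dot_attention` and `gated_combine` are hypothetical names, and the output projection is left out.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def causal_dot_attention(
    qs: List[Vector], ks: List[Vector], vs: List[Vector]
) -> List[Vector]:
    """Standard causal scaled dot-product attention within one segment."""
    d = len(qs[0])
    out = []
    for t, q in enumerate(qs):
        # Token t only attends to keys 0..t of the current segment.
        scores = [
            sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vs[s][j] for s, w in enumerate(weights)) / total
            for j in range(len(vs[0]))
        ])
    return out


def gated_combine(
    a_dot: List[Vector], a_mem: List[Vector], beta: float
) -> List[Vector]:
    """A = sigmoid(beta) * A_mem + (1 - sigmoid(beta)) * A_dot, elementwise.

    beta is the learned per-head scalar; here it is just an argument.
    a_mem would come from the compressive-memory read (assumed precomputed).
    """
    g = sigmoid(beta)
    return [
        [g * m + (1.0 - g) * d for m, d in zip(row_m, row_d)]
        for row_m, row_d in zip(a_mem, a_dot)
    ]
```

With beta strongly negative the head behaves like plain local attention; strongly positive, it reads almost entirely from memory. Because the gate adds only a scalar per head on top of unchanged attention weights, continued pre-training of an existing model is a plausible way to adopt it.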

I've been watching the context window arms race — 4k, 8k, 32k, 100k, 1M — and the whole frame was wrong. The race was never toward a bigger window. It was always toward no window at all.

The context window is a constraint imposed by a particular architectural choice. When you change the architecture, the constraint disappears. This isn't a surprise. The only surprise is that it took this long, and that we spent so much energy celebrating increasingly large numbers that were still finite.