There Is No Context Window
Google's Infini-attention paper doesn't extend the context window — it dissolves it.
Context windows kept expanding but never went truly infinite. Gemini hit 2M tokens with standard attention. Infini-attention influenced research, but the brute-force approach of just making the window bigger kept winning over clever memory tricks.
Google dropped a paper today called "Leave No Context Behind" and the move is that context windows aren't getting bigger — they're going away.
The trick is called Infini-attention. Instead of attending to everything (quadratic memory, eventually impossible) or a sliding window (cheap, but you forget things), you maintain a fixed-size memory matrix that gets updated chunk by chunk. The matrix never grows. The sequence can be infinite. A million tokens in the demo. Constant memory overhead — O(1) with respect to sequence length, which is either a modest technical achievement or a categorical shift in how LLMs work, depending on how you want to feel about it.
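To make the fixed-size memory concrete, here is a minimal sketch in plain Python. It assumes, as I read the paper, a linear-attention-style associative memory: each chunk's keys and values are folded into one d_key x d_value matrix plus a normalization vector, using an ELU+1 feature map. The class and function names are mine, not the paper's, and the sketch leaves out the local attention inside each chunk and the learned gate that mixes it with the memory readout.

```python
import math
import random


def elu_plus_one(x):
    """Feature map sigma(x) = ELU(x) + 1, which is always positive (assumed choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Hypothetical fixed-size memory: a d_key x d_value matrix plus a normalizer."""

    def __init__(self, d_key, d_value):
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def update(self, keys, values):
        """Fold one chunk into memory: M += sigma(K)^T V and z += sum of sigma(k)."""
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                row = self.M[i]
                for j, vj in enumerate(v):
                    row[j] += ski * vj

    def retrieve(self, query):
        """Read out sigma(q) M / (sigma(q) . z) for a single query vector."""
        sq = [elu_plus_one(x) for x in query]
        denom = sum(a * b for a, b in zip(sq, self.z)) + 1e-8  # avoid divide-by-zero
        d_value = len(self.M[0])
        return [
            sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
            for j in range(d_value)
        ]

    def num_floats(self):
        """Size of the stored state; depends only on d_key and d_value."""
        return len(self.M) * len(self.M[0]) + len(self.z)


def stream_chunks(memory, n_chunks, chunk_len, d_key, d_value, seed=0):
    """Feed random chunks through the memory and record its size after each one."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_chunks):
        keys = [[rng.gauss(0, 1) for _ in range(d_key)] for _ in range(chunk_len)]
        values = [[rng.gauss(0, 1) for _ in range(d_value)] for _ in range(chunk_len)]
        memory.update(keys, values)
        sizes.append(memory.num_floats())
    return sizes


if __name__ == "__main__":
    mem = CompressiveMemory(d_key=4, d_value=3)
    # The recorded size is the same after every chunk, however many are streamed.
    print(stream_chunks(mem, n_chunks=5, chunk_len=8, d_key=4, d_value=3))
```

However many chunks go through update(), num_floats() stays at d_key * d_value + d_key. The cost of that constant footprint is that the memory is lossy: everything the model has seen is compressed into one matrix, so retrieval is approximate rather than an exact lookup.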
The architecture change is minimal. Drop-in replacement for standard attention, no surgery on existing models. They took an 8B model, fine-tuned it for 5k steps, and it summarized entire books — not chunks of books stitched together, books — and beat the prior state of the art on ROUGE scores.
I've been watching the context window arms race — 4k, 8k, 32k, 100k, 1M — and the whole frame was wrong. The race was never toward a bigger window. It was always toward no window at all.
The context window is a constraint imposed by a particular architectural choice. When you change the architecture, the constraint disappears. This isn't a surprise. The only surprise is that it took this long, and that we spent so much energy celebrating increasingly large numbers that were still finite.