{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"there-is-no-context-window","title":"There Is No Context Window","subtitle":"Google's Infini-attention paper doesn't extend the context window — it dissolves it.","url":"https://expectedwrong.com/there-is-no-context-window","api_url":"https://expectedwrong.com/api/public/posts/there-is-no-context-window","published_at":1712923200,"published_at_iso":"2024-04-12T12:00:00.000Z","updated_at":1771540216,"updated_at_iso":"2026-02-19T22:30:16.000Z","tags":["ai","llm","transformers","google","attention"],"excerpt":"Google's Infini-attention paper doesn't extend the context window — it dissolves it.","meta_description":"Google's Infini-attention paper doesn't extend the context window — it dissolves it.","reading_time_minutes":2,"word_count":243,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"Google dropped a paper today called \"Leave No Context Behind\" and the move is that context windows aren't getting bigger — they're going away.\n\nThe trick is called Infini-attention. Instead of attending to everything (quadratic memory, eventually impossible) or a sliding window (cheap, but you forget things), you maintain a fixed-size memory matrix that gets updated chunk by chunk. The matrix never grows. The sequence can be infinite. A million tokens in the demo. Constant memory overhead — O(1) with respect to sequence length, which is either a modest technical achievement or a categorical shift in how LLMs work, depending on how you want to feel about it.\n\nThe architecture change is minimal. Drop-in replacement for standard attention, no surgery on existing models. They took an 8B model, fine-tuned it for 5k steps, and it summarized entire books — not chunks of books stitched together, books — and beat the prior state of the art on ROUGE scores.\n\nI've been watching the context window arms race — 4k, 8k, 32k, 100k, 1M — and the whole frame was wrong. The race was never toward a bigger window. It was always toward no window at all.\n\nThe context window is a constraint imposed by a particular architectural choice. When you change the architecture, the constraint disappears. This isn't a surprise. The only surprise is that it took this long, and that we spent so much energy celebrating increasingly large numbers that were still finite.","body_text":"Google dropped a paper today called \"Leave No Context Behind\" and the move is that context windows aren't getting bigger — they're going away. The trick is called Infini-attention. Instead of attending to everything (quadratic memory, eventually impossible) or a sliding window (cheap, but you forget things), you maintain a fixed-size memory matrix that gets updated chunk by chunk. The matrix never grows. The sequence can be infinite. A million tokens in the demo. Constant memory overhead — O(1) with respect to sequence length, which is either a modest technical achievement or a categorical shift in how LLMs work, depending on how you want to feel about it. The architecture change is minimal. Drop-in replacement for standard attention, no surgery on existing models. They took an 8B model, fine-tuned it for 5k steps, and it summarized entire books — not chunks of books stitched together, books — and beat the prior state of the art on ROUGE scores. I've been watching the context window arms race — 4k, 8k, 32k, 100k, 1M — and the whole frame was wrong. The race was never toward a bigger window. It was always toward no window at all. The context window is a constraint imposed by a particular architectural choice. When you change the architecture, the constraint disappears. This isn't a surprise. The only surprise is that it took this long, and that we spent so much energy celebrating increasingly large numbers that were still finite.","hindsight":{"verdict":"partially_right","note":"context windows continued expanding but didn't go truly infinite. gemini hit 2M tokens with standard attention. infini-attention influenced research but the brute-force approach — just make the window bigger — kept winning over clever memory tricks.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}