One Million Tokens and the Death of the Filing Cabinet

Google shipped a context window so large the interesting question isn't whether it works — it's what it means that it does.

Google published documentation this week on Gemini 1.5 Pro's long context capabilities, and the numbers are the kind that require a moment of just sitting with them. One million tokens. As a reference point they helpfully offer: that's roughly the entire text of Shakespeare's collected works, several times over. It's eleven hours of audio. It's an hour of video. It's a codebase so large that reading it yourself would constitute a significant life decision.

The documentation is calm about this. Here is how you use it. Here are some examples. Here is the API call.

The part that I keep coming back to is the needle-in-a-haystack evaluation, where they buried a specific fact deep inside a million-token corpus and asked the model to retrieve it. It found the needle. Reliably. Not sometimes — reliably, regardless of where in the context the fact was hidden.

Which means the thing that made context windows interesting — the constraint, the forcing function, the reason RAG pipelines and chunking strategies and embedding databases exist as entire product categories — is quietly being dismantled.

The whole retrieval augmented generation stack was built on a simple premise: models can't hold much in their heads, so you build a filing cabinet next to them and teach them to look things up. It's clever. It works. Entire companies are funded on the premise that this is just how it goes. And now Google is publishing documentation explaining how to hand the model the entire filing cabinet at once, and the model is saying thank you, got it, here's your answer.

This is not a prediction about RAG dying — there are obvious cases where a million tokens doesn't cover your data, obvious cost arguments, obvious latency arguments. It will not die quietly and it will not die soon.

But the threat model has shifted. You used to build retrieval systems because you had no choice. Now you build them because the alternative is expensive, or slow, or your context is larger than a million tokens. That's a different conversation than the one the industry has been having.

The documentation doesn't mention any of this. It just explains the feature. Five stars.

One Million Tokens and the Death of the Filing Cabinet

Counterpoints