
Gemini Ate My Homework

The context window won, and I spent all morning losing to it.

2 min read 397 words #ai #cloudflare #gemini #agents #llm
hindsight — evolved

Gemini did eat the homework that day. Then Claude ate it back. The leapfrogging hasn't stopped — turns out "who's the best coder" changes quarterly and the real answer is "whichever one has seen your stack before."

I've been hearing the "Gemini 2.5 Pro is the top coder now" thing for a couple weeks and mostly ignored it, because I've tried it plenty and it hadn't beaten Claude in anything I cared about.

Today it beat Claude so badly I kind of just sat there.

The problem: Cloudflare Agents SDK, which came out approximately last week, which means the documentation is whatever Cloudflare has managed to post so far — which is not nothing, but is not enough. The specific sub-problem was persisting an authenticated connection between a user and their agent inside a WebSocket Durable Object. So the agent knows who it's talking to. So Sheldon knows you. Sounds simple. Is not simple.

This is a multilayer stack: Workers, Durable Objects, the Agents SDK, WebSockets, and auth tokens that live in the request context, which is ephemeral and doesn't survive into the agent's session. I spent the morning poring over source code and half-documented internals trying to figure out how you get the auth token to live somewhere it can actually be used.
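For anyone who wants the shape of the problem in code, here is a minimal sketch. It is not what Gemini produced, and it does not use the Agents SDK's real API: `FakeRequest`, `FakeConnection`, `SketchAgent`, and `verifyBearer` are all hypothetical stand-ins, and the `Bearer user:<id>` token format is invented for illustration. The only idea it shows is that the token is visible once, at connect time, so whatever identity it proves has to be copied onto something that outlives the request.

```typescript
// Hypothetical types for illustration only; none of these names come from the Agents SDK.
interface Identity {
  userId: string;
}

// Stand-in for the upgrade request: it exists only while the connection is being set up.
interface FakeRequest {
  headers: Map<string, string>;
}

// Stand-in for a long-lived WebSocket connection that can carry per-connection state.
class FakeConnection {
  state: Identity | null = null;
  readonly sent: string[] = [];

  send(message: string): void {
    this.sent.push(message);
  }
}

// Toy token check (assumption): a real one would verify a signature or call an auth service.
function verifyBearer(header: string | undefined): Identity | null {
  const prefix = "Bearer user:";
  if (header === undefined || !header.startsWith(prefix)) {
    return null;
  }
  const userId = header.slice(prefix.length);
  return userId.length > 0 ? { userId } : null;
}

class SketchAgent {
  // Runs once per connection. This is the only place the request, and its token, is visible.
  onConnect(connection: FakeConnection, request: FakeRequest): boolean {
    const identity = verifyBearer(request.headers.get("authorization"));
    if (identity === null) {
      return false; // reject: nothing to persist
    }
    connection.state = identity; // copy the identity onto the thing that survives
    return true;
  }

  // Runs for every later message. There is no request here, only the connection.
  onMessage(connection: FakeConnection, message: string): void {
    const who = connection.state?.userId ?? "stranger";
    connection.send(`${who} said: ${message}`);
  }
}
```

In this toy version, a connection opened with `authorization: Bearer user:alice` gets `alice said: hi` back when it later sends `hi`, even though `onMessage` never sees a header. The real difficulty in the morning described above was finding where the actual stack lets you do the equivalent of that one assignment.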

Claude wrestled with me. o1 Pro also wrestled with me. We had a good morning of wrestling.

Then I fed Gemini Cloudflare's full custom Workers prompt — the one they publish at /workers/prompt.txt, which is enormous — and the entire Agents SDK repo. Gemini took one look at all of that and apparently decided the hard work I'd been doing was decorative.

Three tries. Three. On the third one it said, verbatim: "This version has a high chance of being fully functional to the core chat flow."

It was right.

The context window is doing real work here. Gemini can hold the entire SDK plus Cloudflare's own mental model of their platform simultaneously, and the answer just falls out. Claude can reason its way toward things — it's excellent at reasoning — but it's guessing at documentation it can't fully hold. That's not a knock on Claude. That's a structural fact about a model trying to work with a library that didn't exist when it was trained.

The thing I keep thinking about is how fast this moved. Cloudflare launched the Agents SDK, Gemini has the context to swallow the thing whole, and now there's a working authenticated agent chat before Cloudflare has even had a chance to write the tutorial.

We're not waiting for documentation anymore. We're racing it.