expectedwrong hindsight

lucidrains shipped four versions in one hour and i watched the whole thing

Google's Titans architecture is a research paper that might be something else by tomorrow.

#ml #architecture #google #titans #open-source
hindsight — still happening

lucidrains is still the gatekeeper. If he can't reproduce it, it might not be real. The four-versions-in-one-hour pace of community reproduction remains the actual peer review for architecture papers.

There's a specific kind of morning where you open your laptop and watch someone else's obsession in real time — version bump, version bump, version bump, version bump — and the thing that's being obsessively iterated on is a Google paper that, as of right now, has no released code, no trained weights, and no confirmed existence outside of a PDF.

That's where we are with Titans.

The paper is good. The architecture is interesting: the basic claim is that you can teach a neural network to memorize at test time, giving it a kind of persistent memory that doesn't disappear between contexts. Keeping memory across contexts is a problem attention has been quietly failing to solve for years without anyone calling it a failure. If it works the way Google says it works, this is the kind of architecture that either gets quietly absorbed into the next generation of pretrained models and becomes infrastructure, or disappears. There is no middle outcome.
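For the concrete-minded, here is roughly what "memorize at test time" can mean, as a toy. This is not the Titans architecture and it is not lucidrains's code. It's a minimal sketch, assuming a plain linear memory that takes one gradient step per observation while it is being used, with no training phase at all. Every name in it (`ToyTestTimeMemory`, `lr`, `decay`) is made up for illustration.

```python
class ToyTestTimeMemory:
    """Toy linear associative memory that learns while it is being used.

    Hypothetical illustration only: a dim x dim weight matrix W is nudged
    by one gradient step on 0.5 * ||W k - v||^2 every time a (key, value)
    pair is observed at inference time.
    """

    def __init__(self, dim, lr=0.5, decay=0.0):
        self.dim = dim
        self.lr = lr        # step size for each write
        self.decay = decay  # fraction of old memory forgotten per write
        self.W = [[0.0] * dim for _ in range(dim)]

    def read(self, key):
        # Recall: matrix-vector product W @ key.
        return [sum(w * k for w, k in zip(row, key)) for row in self.W]

    def write(self, key, value):
        # Prediction error before the update; its squared norm is a crude
        # "how surprising was this input" signal.
        err = [p - v for p, v in zip(self.read(key), value)]
        for i in range(self.dim):
            for j in range(self.dim):
                # Gradient of the loss w.r.t. W[i][j] is err[i] * key[j].
                self.W[i][j] = (1.0 - self.decay) * self.W[i][j] - self.lr * err[i] * key[j]
        return sum(e * e for e in err)


# Usage: no training loop, the memory only changes as inputs arrive.
mem = ToyTestTimeMemory(dim=4)
key = [1.0, 0.0, 0.0, 0.0]
value = [0.2, -0.4, 0.6, 0.8]
surprises = [mem.write(key, value) for _ in range(10)]
print("surprise per write:", surprises)
print("recalled value:", mem.read(key))
```

The surprise shrinks with each repeated write because the memory has already absorbed the pair. The `decay` knob is there because a memory that only ever writes eventually has to forget something; set it above zero and old associations fade.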

lucidrains — Phil Wang, the person who reimplements ML papers in PyTorch as a form of continuous prayer — has been running at this since this morning. Four minor version bumps in under an hour. I've been watching the repo. This is what dedication to a problem looks like when it isn't dressed up as anything else.

The complication is Brian Roemmele, who I found posting that he has anecdotal evidence it's working. Roemmele is a useful signal in the same way a Ouija board is useful: not because you believe the spirit, but because it tells you something about who's in the room and what they're afraid to ask directly. He's not a primary source. He's a metamystic aggregator who runs everything through a particular lens of "this changes everything" and is right often enough that you can't dismiss him and wrong often enough that you definitely can't cite him. Take the kernel. Discard the rest.

The real question isn't whether Titans works. The real question is: even if lucidrains gets it training on a G200 by this afternoon, even if Google releases the code next week as promised, even if the benchmark numbers hold — what do we do with it?

Someone still has to train a base model. That's not a weekend project. That's months of compute and a decision about whether to bet on an architecture before the field has voted on it. The hope is that Google trains the model themselves and releases it, at which point the engineering cost collapses to basically zero and we're just evaluating capabilities. Until then it's a beautiful diagram and a repo that's been updated four times this morning by someone who apparently does not sleep.

Architecture papers arrive like this all the time — promising, specific, peer-reviewed, and then nothing for eighteen months while everyone waits for someone with a thousand GPUs to decide it's worth finding out. This one might be different. Or it might be Mamba.