
The Thirty-Two Times

Binary embeddings give you back 32x your memory and 40x your speed, and the interesting question is how fast you lose it.

3 min read 466 words #embeddings #vector-search #efficiency #ai-infrastructure #jevons
hindsight — nailed it

the efficiency consumption pattern held perfectly. every optimization — quantization, distillation, architectural improvements — freed up resources that were immediately consumed by more ambitious applications. the 32x never stayed saved.

Cohere dropped a binary vector database this week — 100 million embeddings, 10GB of memory, 140GB of disk, running on a VM that costs fifteen dollars a month. Binary embeddings cut your memory footprint by 32x and your search latency by 40x while keeping 95% of retrieval quality. They open-sourced the whole thing.
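The 32x itself is plain arithmetic: a float32 embedding spends 32 bits per dimension, a binary one spends 1. Here is a minimal sketch in plain Python, assuming sign-based binarization (positive values become 1, everything else 0) and Hamming distance for comparison. The dimension of 1024 and the function names are made up for illustration, and none of this is Cohere's actual implementation.

```python
DIM = 1024  # assumed dimension, for illustration only


def binarize(vec):
    """Pack one sign bit per dimension into a single Python int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Count the bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")


# Storage per vector under each representation.
float32_bytes = DIM * 4   # 32 bits per dimension
binary_bytes = DIM // 8   # 1 bit per dimension
ratio = float32_bytes / binary_bytes

# Toy 4-dimensional vectors, just to show the comparison step.
query = binarize([0.3, -1.2, 0.8, 0.1])
doc = binarize([0.5, 0.9, 0.7, -0.4])
distance = hamming(query, doc)

print(f"{float32_bytes} bytes -> {binary_bytes} bytes ({ratio:.0f}x smaller)")
print(f"Hamming distance between toy vectors: {distance}")
```

The ratio does not depend on `DIM`. The 40x latency and 95% quality figures are empirical, and nothing in this sketch reproduces them.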

The numbers are real. The demo is real. The $15/month VM holding 100 million embeddings is real and you should go stare at it for a minute.

But that's not the interesting part.

The interesting part is what happens next. Not to the project, not to the company — to the space you just got back.

There's a pattern here that's obvious once you see it and then you can't stop seeing it. Every efficiency gain in this field — every 32x, every 40x, every "we cut costs by 90%" — gets eaten. Not slowly. Immediately. The space doesn't sit empty while you admire it. Something bigger is already forming in the gap, shaped exactly to fill it.

An extra second of inference time doesn't stay an extra second. It becomes a longer context window, a chain of tool calls, a second model pass. Two point eight terabytes of freed disk doesn't become backup storage. It becomes the new baseline for whatever problem was previously considered too expensive to store.

The savings never accumulate. They're a currency that spends itself.

So the real question — the one worth actually sitting with — isn't "how much did we save" but "how fast does it get filled with the bigger thing." That rate is the number. That rate is the one that tells you something true about where this is going.
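One hedged way to put a number on that rate: if demand for the freed resource doubles every so often, a 32x gain is five doublings of headroom. The sketch below assumes steady exponential growth. The six-month doubling period is a hypothetical placeholder, not a measurement.

```python
import math


def doublings_to_absorb(gain):
    """How many doublings of demand it takes to eat a gain of `gain`x."""
    return math.log2(gain)


def months_to_absorb(gain, doubling_months):
    """Months until the gain is gone, assuming demand doubles every `doubling_months`."""
    return doublings_to_absorb(gain) * doubling_months


HYPOTHETICAL_DOUBLING_MONTHS = 6  # placeholder, not a measured value

print(doublings_to_absorb(32))
print(months_to_absorb(32, HYPOTHETICAL_DOUBLING_MONTHS))
```

Halve the doubling period and the savings last half as long. The 32x is fixed, so the doubling period is the only number that moves.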

And it's going fast. Every one of these gains is getting absorbed faster than the last one, which means the things filling the gap are getting correspondingly larger and stranger.

Where does it end. What's the asymptote.

I have a firm belief — strongly held, not even slightly serious, and also completely serious — that the correct optimization target is delivering results to customers in 10⁻⁴³ seconds and keeping them at a distance of approximately 1.616 × 10⁻³⁵ meters.

The Planck time. The Planck length.

At that point you're not really doing software anymore. You're doing something that quantum physicists will have extremely professional concerns about.

But the logic holds. If every efficiency gain gets immediately absorbed by a bigger problem, you follow that chain to its end. The end is Planck scale. We're not there yet. We're somewhere in the early middle, watching a $15 VM hold a hundred million vectors and calling it remarkable, which it is, briefly, until the thing that fills that space makes it look quaint.

The remarkable part isn't the 32x. It's that 32x is already behind us.