expectedwrong hindsight

Anthropic Copied the Batch API and They Were Right To

The unsexy infrastructure move that unlocks the most boring and useful AI workloads.

2 min read 338 words #ai #anthropic #infrastructure #batch-processing #llm-ops
hindsight — nailed it

Batch APIs became table stakes. The use cases described — contact deduplication, bulk classification — turned out to be exactly what enterprises needed. Copying the design was the right call.

Anthropic shipped a Message Batches API today. Same design as OpenAI's — you fire off a pile of requests asynchronously, get results back later, pay less. The blog post uses words like "large-scale" and "asynchronous processing." Nothing surprising. The product is correct.

The use cases nobody talks about are the ones that matter here.

Contact overlap. You have two lists — your CRM, your partner's CRM, some enrichment vendor's export — and you want to know which contacts are actually the same person. "John Smith, jsmith@acme.com" versus "Jonathan Smith, j.smith@acmecorp.com" — same guy, different email domain, different name variant, same company that got rebranded at some point. A fuzzy string match won't save you. A model will. And you have forty thousand of these pairs to check.

You were not going to do that in real time.

The other one is column qualification — you have a spreadsheet or a database table with fifty thousand rows, each row is a company or a contact or a support ticket, and you need a model to read each one and tell you whether it qualifies for something. Upsell candidate. Enterprise tier. Churn risk. Whatever. You run this once a night, maybe once a week. You do not need the answer in 200 milliseconds. You need the answer before Monday.

Batch APIs exist for this. The math is obvious — you're paying for compute, not latency, and when you don't care about latency the provider can bin-pack your requests with everyone else's and charge you half. OpenAI figured this out. Now Anthropic has it too.

What's interesting is how many people still build the synchronous version of these pipelines — iterate over rows, hit the API, time.sleep(1), pray the rate limiter doesn't kill you — because the batch endpoint didn't exist or they didn't trust it. That's a real workflow that runs in production somewhere right now. Tonight, probably.

The batch API doesn't make you smarter about what to ask the model. It just removes the excuse for doing it wrong.