
You Were Never Supposed to Be Doing That

Microsoft's AICI ships the layer between prompts and tokens that we've been duct-taping with pleading and threats.

#llms #microsoft #inference #agents #prompt-engineering
hindsight — evolved

constrained generation went mainstream through different mechanisms — structured outputs, JSON mode, tool use — rather than WASM controllers in the inference loop. the problem was correctly identified. the 'hostage negotiation' framing was funny and accurate. it got solved at a higher abstraction level.

There is a whole genre of prompt engineering that is just negotiating with a god. You write three paragraphs — firmly but politely — explaining that you do not need the model to add commentary, that the JSON must be valid, that it should not apologize for the length, that it should not truncate, that it should definitely not add those little backtick fences unless asked, and finally, if none of that lands, you append a tip.

This is not software development. This is hostage negotiation with a very polite hostage.

Microsoft shipped AICI (Artificial Intelligence Controller Interface), a project that lets you write WebAssembly controllers that sit inside the inference loop and constrain token generation in real time. Not as a post-processing step. Not as a retry wrapper. At the token level, as the model is generating. You write the requirement in code and the controller enforces it structurally, before the output exists.

The immediately obvious thing is JSON. You stop asking the model to please output valid JSON and you write a controller that makes it impossible to output anything else. Not because the model has been threatened into compliance, but because the invalid tokens never get sampled. The model didn't change. The output did.
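To make "the invalid tokens never get sampled" concrete, here is a minimal sketch of token masking in plain Python. It is not AICI's API: the vocabulary, `fake_logits`, `allowed_tokens`, and the two-string "grammar" are all made up for illustration, and a real controller would work against a real tokenizer and a real grammar.

```python
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure", "!", " here", "{", "}", '"ok"', ":", " ", "true", "false", "```", "<eos>"]

# The only complete outputs this toy constraint accepts.
ALLOWED = ['{"ok": true}', '{"ok": false}']

def fake_logits(text):
    """Stand-in for a model: it prefers chatter and fences, whatever the context."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores["Sure"] = 5.0
    scores["```"] = 4.0
    scores["!"] = 3.0
    return scores

def allowed_tokens(text):
    """Tokens that keep `text` a prefix of an allowed output, or end a complete one."""
    ok = set()
    for tok in VOCAB:
        if tok == "<eos>":
            if text in ALLOWED:
                ok.add(tok)
        elif any(target.startswith(text + tok) for target in ALLOWED):
            ok.add(tok)
    return ok

def generate(constrain, max_steps=20):
    text = ""
    for _ in range(max_steps):
        logits = fake_logits(text)
        mask = allowed_tokens(text) if constrain else set(VOCAB)
        if not mask:
            break  # nothing legal left; stop rather than emit garbage
        # Disallowed tokens get -inf, so the greedy pick can never choose them.
        masked = {t: (s if t in mask else -math.inf) for t, s in logits.items()}
        tok = max(masked, key=masked.get)
        if tok == "<eos>":
            break
        text += tok
    return text

print(generate(constrain=True))
print(generate(constrain=False)[:12])
```

The scores coming out of `fake_logits` are identical in both runs. The only difference is which tokens are eligible when the pick happens, which is the whole trick.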

The less obvious thing is that this works across models. Same controller, different model. You stop caring about which model's particular prompt dialect gets it to behave, because you've moved the constraint out of the prompt entirely.
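A rough sketch of why the constraint can travel: if it is written against the output text rather than against one model's habits, the same function can filter two different vocabularies. Everything here is hypothetical, including both "models", which are just preference orderings over made-up token lists. Real tokenizers are messier, and this is not how AICI itself is implemented.

```python
# The constraint, written once, over text rather than over any one vocabulary.
ALLOWED = ['{"ok": true}', '{"ok": false}']

def allowed_tokens(text, vocab):
    """Tokens from `vocab` that keep `text` a prefix of some allowed output."""
    return [tok for tok in vocab
            if any(target.startswith(text + tok) for target in ALLOWED)]

def decode(vocab):
    """Greedy decode: the 'model' prefers tokens listed earlier in its vocab."""
    text = ""
    while text not in ALLOWED:
        options = allowed_tokens(text, vocab)
        if not options:
            break  # no legal continuation in this vocabulary
        text += min(options, key=vocab.index)
    return text

# Two hypothetical models with different tokenizations and different tastes.
# Each would open with chatter if nothing stopped it.
VOCAB_A = ["Sure", "{", "}", '"ok"', ":", " ", "true", "false"]
VOCAB_B = ["Certainly", '{"', "ok", '":', " f", " t", "alse", "rue", "}"]

print(decode(VOCAB_A))
print(decode(VOCAB_B))
```

The two toy models disagree about the answer, which is their business. Neither gets to disagree about the format.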

The AutoGen connection writes itself. You have agents that need structured handoffs between them — specific schemas, reliable formats, no hallucinated fields — and right now you're stitching that together with output parsers and retry logic and hope. AICI is the layer those agents were always supposed to be running on.
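For contrast, this is roughly what the "output parsers and retry logic and hope" layer looks like today. It is a sketch only: `fake_agent_reply` stands in for a real model call, and the two-field handoff schema is invented for the example, not taken from AutoGen.

```python
import json

REQUIRED_FIELDS = {"task", "assignee"}  # hypothetical handoff schema

def fake_agent_reply(attempt):
    """Stand-in for a model call; earlier attempts misbehave in familiar ways."""
    replies = [
        'Sure! Here is the JSON:\n{"task": "summarize", "assignee": "writer"}',
        '{"task": "summarize", "assignee": "writer", "mood": "eager"}',
        '{"task": "summarize", "assignee": "writer"}',
    ]
    return replies[min(attempt, len(replies) - 1)]

def parse_handoff(raw):
    """Return the handoff dict, or None if it is not exactly the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # commentary, fences, truncation, and so on
    if not isinstance(data, dict) or set(data) != REQUIRED_FIELDS:
        return None  # missing or hallucinated fields
    return data

def handoff_with_retries(max_attempts=5):
    """Keep asking until the reply parses; report how many calls it took."""
    for attempt in range(max_attempts):
        data = parse_handoff(fake_agent_reply(attempt))
        if data is not None:
            return data, attempt + 1
    raise RuntimeError("agent never produced a valid handoff")

data, calls = handoff_with_retries()
print(data, "after", calls, "calls")
```

Every failed attempt here is a full generation that gets thrown away after the fact. Constraining at the token level is meant to make the first attempt the only one.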

Anyway. I got Gemini 1.5 Pro access this morning, which means I now have a fresh new model to beg.