
Study the Change

The correct way to find the optimization critical path, and why you probably already know it

#optimization #transformers #profiling #ml-engineering
hindsight — still happening

The optimization principle (study the delta, not the system) remains sound engineering advice. The specifics about transformer profiling were correct, but the meta-insight about focusing on change rather than state applies to everything in AI.

There is a specific relief that comes from watching someone else explain, clearly and methodically, the thing you've been doing mostly on instinct.

The framing is simple enough to be embarrassing: when you want to find where to spend your optimization budget, don't study the system — study what changes. Not the weights. Not the architecture diagram. The delta. Where does compute actually move between the thing you have and the thing you want? That's where the work is.

Applied to a transformer encoder-decoder, this gets concrete fast. You're not profiling the whole forward pass like some kind of sadist. You're drawing a line between the two states (the slow thing and the fast thing, the old path and the new path) and asking what had to change to get from one to the other. Everything that didn't have to change is not your problem.
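A minimal sketch of what "study the delta" can look like in practice, assuming you have already collected per-operation timings for both states as plain dictionaries. The helper `rank_deltas`, the op names, and the numbers are all hypothetical and made up for illustration; none of them come from the presentation.

```python
from typing import Dict, List, Tuple


def rank_deltas(
    before: Dict[str, float],
    after: Dict[str, float],
    min_abs_ms: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return (op, after_ms - before_ms) pairs, largest absolute change first.

    Ops missing from one profile are treated as costing 0 ms there, so
    added or removed ops still show up in the diff.
    """
    ops = set(before) | set(after)
    deltas = [(op, after.get(op, 0.0) - before.get(op, 0.0)) for op in ops]
    # Drop ops whose cost did not move: in the post's framing, not your problem.
    moved = [(op, d) for op, d in deltas if abs(d) > min_abs_ms]
    # Sort by magnitude, then by name so ties are deterministic.
    return sorted(moved, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical per-op timings in milliseconds (illustrative numbers only).
slow_path = {"encoder.self_attn": 40.0, "decoder.cross_attn": 55.0, "decoder.ffn": 20.0}
fast_path = {"encoder.self_attn": 40.0, "decoder.cross_attn": 25.0, "decoder.ffn": 22.0}

for op, delta in rank_deltas(slow_path, fast_path):
    print(f"{op:22s} {delta:+.1f} ms")
```

With these made-up numbers, `encoder.self_attn` drops out of the list entirely because its cost did not move, which is the whole idea: the diff tells you where to look, and the unchanged rows tell you where not to.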

The uncomfortable part is that "study the change" sounds obvious until you watch how people actually optimize, which tends to involve profiling everything, getting overwhelmed, and then attacking whatever looks suspicious. Which is not a strategy, it's anxiety with a flamegraph.

The presentation uses enc-dec as the vehicle for the methodology rather than the point of it — which is the right call. The point generalizes. Transformer or not, if you can't articulate exactly what changed and what had to change, you're not optimizing, you're guessing at a higher temperature.

It's retroactively validating to find out we've been doing this all along. Not comforting enough to stop worrying, but enough to keep going.