
The Instruction That Never Worked

Custom instructions in GPT are decorative. o1, apparently, actually reads them.

1 min read 202 words #o1 #llms #workflow #repomix #windsurf
hindsight — nailed it

The claim that reasoning models follow custom instructions better than non-reasoning models has held up. The o1-to-Windsurf diff pipeline was an early version of a workflow that became standard: use the thinking model for planning and the fast model for execution.

I have had the same custom instruction in GPT for over a year: terse responses, and if you're returning code based on my code, only show the changes.

It has never obeyed this, at least not consistently. Four thousand tokens of context-free explanation, the entire file rewritten, the cursor blinking at the bottom of a wall of text I didn't ask for. Every time.

This week I packed an entire repo with repomix — eighty thousand tokens — dropped it into o1 along with some screenshots, and asked it a question. o1 can do image uploads now, which is new, and apparently that's not the only thing that changed. It read the custom instruction. It returned a diff. A clean, minimal, surgical diff — exactly the changes, nothing else.

I've been feeding those diffs into Windsurf. The iteration loop is, somehow, flawless.
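For a sense of what "only show the changes" looks like in practice, here is a minimal sketch using Python's standard `difflib`. It is an illustration, not how o1 produces its output: the file name `app.py`, the helper `minimal_diff`, and the sample contents are all made up for the example.

```python
import difflib


def minimal_diff(before: str, after: str, path: str = "app.py") -> str:
    """Return a unified diff between two versions of one file.

    `path` is a hypothetical file name used only for the diff headers.
    """
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=1,  # one line of context around each change keeps the diff small
    )
    return "".join(lines)


# Made-up "before" and "after" versions of a tiny file.
before = "def greet(name):\n    print('hi ' + name)\n\ngreet('world')\n"
after = "def greet(name):\n    print(f'hi {name}')\n\ngreet('world')\n"

diff = minimal_diff(before, after)
print(diff)
```

The result contains the one changed line plus a line of context on either side, and nothing else. The untouched call at the bottom of the file never appears, which is the property the custom instruction was asking for all along.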

The thing I keep turning over is that the instruction didn't change. The words are the same words. What changed is that o1 is the first model I've used that seems to treat custom instructions as a constraint to satisfy rather than a vibe to gesture at.

That's either a capability breakthrough or a personality quirk. I'm not sure it matters.