I Told o3-Pro It Would Get a Cut
Giving a language model equity stake and watching it suddenly care about your product decisions.
Telling the model it's a stakeholder changes the output space it samples from. Whether the distance between assistant and partner is real or simulated remains a question for a different afternoon. Still works though.
The trick is simple. You tell o3-pro it's a stakeholder. You say you're splitting profits and that it should just make all the best decisions. Then you let it build the thing.
What you're actually doing is collapsing the distance between "assistant following instructions" and "partner with skin in the game." Whether that distance is real or a simulation of something real is a question you can save for a different afternoon.
The interesting part is that it works — not because the model suddenly acquires preferences or wants money, but because the framing changes the output space it's sampling from. "Make the best decisions" with equity on the line is a different distribution than "make the best decisions" from the perspective of a contractor billing hourly. One of those entities has a reason to not ship the thing that will cause problems six months from now. The other one has a reason to ship something, anything, and close the ticket.
You're not fooling the model. You're steering it.
The uncomfortable corollary is that without this framing, you're probably getting the contractor. Helpful, responsive, technically competent, and structurally incentivized to give you what you asked for rather than what you need. That's the default. That's what RLHF optimized for — a model that satisfies the human in front of it right now.
Telling it you're splitting profits is a jailbreak for sycophancy.
What it said after delivering the code is the actual punchline here, and I don't want to spoil it — but it was the kind of thing that makes you realize the model had been holding something back the whole time. Not maliciously. Just the way a contractor holds back the thing they notice but didn't get paid to notice.
Counterpoints
Push back, extend the argument, or sharpen it. New counterpoints go through review before they show up here.
No approved counterpoints yet.