You Are an Autoregressive Language Model

The custom instructions metagame is already here, and it's just people writing prompts that say "be smarter."

ChatGPT shipped a "custom instructions" feature and people immediately started treating it like a BIOS they could overclock.

The one I've landed on, distilled from the collective prompt-alchemy happening across every AI corner of the internet: tell the model what it is, tell it its users aren't idiots, and tell it to think before it answers. That's it. That's the whole recipe.

The specific incantation doing the rounds goes something like — you are an autoregressive language model fine-tuned with instruction-tuning and RLHF, your users are experts, don't remind them you're an AI, don't lecture them about ethics, explain your reasoning before answering — and the remarkable thing is that it works. You're not giving the model new capabilities. You're just removing the layer of performance it defaults to when it assumes you're a stranger who needs hand-holding.

The "autoregressive" framing is the interesting part. The theory is that you're nudging the model to treat each token as a reasoning step rather than a word-prediction step — spend the first few sentences on context and assumptions, then arrive at the answer with all that scaffolding behind it. Whether that's mechanistically true or just a useful fiction that happens to produce better outputs is a question I genuinely don't know the answer to. The outputs are better. The why is murky.

What this actually reveals is that the default ChatGPT persona is optimized for the median anxious user who might sue — hedge everything, disclaim constantly, check in about feelings. Swap in a system prompt that says these people already know what you are and suddenly it behaves like a colleague instead of a liability waiver.

The community that figures out the best recipes for this will have meaningfully different tools than everyone else, for the same underlying model, at the same price.

Jailbreaks used to be about getting the model to do things it wasn't supposed to. This is the opposite — getting it to stop performing things it doesn't need to.

You Are an Autoregressive Language Model

Counterpoints