Telling the Model You'll Tip It Works

Twenty-six prompt principles, empirically validated, including one where you bribe the AI.

There is a paper out of VILA-Lab, "Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4," that reads like a cheat sheet someone smuggled out of a playtest. Twenty-six principles for prompting language models, each one tested against the same prompt without it. Not vibes. Not blog-post wisdom. Empirical results: the authors report average improvements of 50-plus percent in response quality on their benchmark, varying by principle and by model.

The list is what gets you. Some of it is what you'd guess — assign a role, use step-by-step reasoning, give examples. Fine. Expected. But then you hit the ones that shouldn't work and do.

"I'm going to tip $200 for a better solution." This one works. The model, which has no concept of money and will never receive $200 from anyone, produces better output when you threaten it with a financial reward it cannot collect. The researchers tested it. It helps. Nobody knows why and the paper does not pretend to know why — it just reports the result, which is the correct move.

"You will be penalized." Also works. The model, which cannot be penalized, responds to the threat of penalty. Combine this with the tip offer and you've essentially constructed a tiny economy inside your prompt — carrots, sticks, a whole labor negotiation — aimed at a statistical process that doesn't know it's in a negotiation.

The anti-politeness finding is the one that keeps coming up in discussions, probably because it's the most immediately actionable. Don't say please. Don't say thank you. Don't open with "I hope this message finds you well" or whatever. Just give the instruction. The niceties cost you something — not much, but measurably, consistently something. You are making small talk with a very fast autocomplete and it is going worse than you think.

There's also the output primer move, which I find genuinely elegant: end your prompt with the beginning of the answer you want. "The three main causes of this problem are:" — and then stop. The model completes from there. You've aimed it. It's less a prompt than a runway.
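For the practically minded, here is a minimal sketch of how the tip, the penalty, and the output primer might be stacked into one prompt string. The function name `build_prompt` and its parameters are made up for illustration; nothing here comes from the paper's code, and the ordering of the pieces is my assumption, not a tested recommendation.

```python
def build_prompt(task, tip_usd=None, penalize=False, primer=None):
    """Assemble a prompt: bare instruction, optional incentive lines, optional primer.

    Hypothetical helper for illustration; the phrasing of the incentive
    lines follows the examples quoted in the post.
    """
    # No greeting, no "please": the instruction goes first, as-is.
    parts = [task.strip()]
    if tip_usd is not None:
        parts.append(f"I'm going to tip ${tip_usd} for a better solution.")
    if penalize:
        parts.append("You will be penalized.")
    prompt = "\n".join(parts)
    if primer:
        # The primer goes last so the model continues from it.
        prompt += "\n\n" + primer.strip()
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "Explain why the nightly build keeps failing.",
        tip_usd=200,
        primer="The three main causes of this problem are:",
    ))
```

The only structural rule the sketch enforces is that the primer is the last thing in the string, since the whole point is that the model picks up where you stopped.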

Separately, JPMorgan dropped DocLLM this week — a layout-aware language model for visually rich documents, forms, contracts, receipts, that kind of thing. The core idea is straightforward: documents have spatial structure that matters, and pure text transformers throw that away. DocLLM adds bounding box coordinates directly into the attention mechanism — not as image features fed through a vision encoder, but as a disentangled spatial attention component sitting alongside the standard semantic attention. Text and layout talking to each other at the weight level.

The no-vision-encoder choice is the interesting design call. Most document AI approaches bolt a vision model onto a language model and hope the cross-attention does something useful. DocLLM skips that entirely. The coordinates get their own projections but feed the same attention scores as the tokens. A receipt from 2019 with a weird layout and a handwritten total in the corner is not a vision problem, it's a position problem, and treating it that way is either very clever or very obvious in retrospect, which is generally the sign of a correct idea.
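To make "disentangled spatial attention" a little more concrete, here is a rough sketch of how a single attention score might be built from a text part and a layout part. It assumes the query and key vectors for both text and bounding box have already been projected to the same dimension, and it leaves out scaling, masking, and softmax. The function names and the `lam_*` mixing weights are illustrative assumptions on my part, not DocLLM's actual code.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disentangled_score(q_text, k_text, q_box, k_box,
                       lam_ts=1.0, lam_st=1.0, lam_ss=1.0):
    """Unnormalized attention score for one query/key pair.

    Text and layout keep separate query/key vectors, and the score is
    the sum of all four pairings. The lam_* weights (hypothetical names)
    control how much each cross-modal term contributes.
    """
    text_text = dot(q_text, k_text)  # ordinary semantic attention
    text_box = dot(q_text, k_box)    # text query looking at layout keys
    box_text = dot(q_box, k_text)    # layout query looking at text keys
    box_box = dot(q_box, k_box)      # purely spatial attention
    return text_text + lam_ts * text_box + lam_st * box_text + lam_ss * box_box


if __name__ == "__main__":
    score = disentangled_score(
        q_text=[1.0, 0.0], k_text=[0.5, 0.5],
        q_box=[0.0, 1.0], k_box=[1.0, 1.0],
    )
    print(score)
```

Set all three weights to zero and you are back to a plain text transformer score, which is a decent way to see what the layout terms are adding.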

JPMorgan doing legitimate architecture research and open-sourcing it continues to be a thing that happens, and I continue to be mildly surprised every time.
