Salesforce Taught GPT-4 to Sweat the Details

Chain of Density prompting gets you better summaries by asking the model to do the same task worse, then progressively less worse.

Salesforce just dropped a prompting paper, and it's one of those things where the method is so obvious in hindsight that reading it feels like being told the answer to a riddle you've been stuck on for a week.

The technique is called Chain of Density. You give GPT-4 an article and ask it to summarize it, but not once. Five times. Same length each time. The catch: every iteration, the model has to pick out one to three salient entities it missed in the previous round and work them in, making the summary denser without making it longer. You start sparse (the first draft is deliberately vague, by instruction). You end packed.
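For the curious, here's roughly what such a prompt looks like, as a minimal Python sketch. As I understand the paper, the whole chain comes back in a single response, formatted as a JSON list of five steps. The template wording below is my paraphrase, not the paper's verbatim prompt. `build_cod_prompt`, `steps`, and `words` are hypothetical names, and the ~80-word starting length is my recollection of the paper's guideline, so treat it as an assumption.

```python
# Paraphrase of the Chain of Density instructions (NOT the paper's verbatim prompt).
COD_TEMPLATE = """Article: {article}

You will generate increasingly concise, entity-dense summaries of the article above.
Repeat the following 2 steps {steps} times.

Step 1. Identify 1-3 informative entities from the article that are missing from the previous summary.
Step 2. Write a new, denser summary of identical length that covers every entity and detail from the previous summary plus the missing entities.

Guidelines:
- The first summary should be about {words} words and deliberately non-specific.
- Never drop entities from a previous summary. If space cannot be made, add fewer new entities.
- Make space by fusing, compressing, and removing filler phrases; never make the summary longer.
- Use the exact same number of words for each summary.

Answer in JSON: a list of {steps} objects with keys "Missing_Entities" and "Denser_Summary".
"""


def build_cod_prompt(article: str, steps: int = 5, words: int = 80) -> str:
    """Fill the (hypothetical) template with an article and the loop settings."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # str.format only parses the template, so braces inside the article are safe.
    return COD_TEMPLATE.format(article=article.strip(), steps=steps, words=words)
```

Send the result to whatever chat model you like; nothing here calls an API.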

The prompt effectively tells the model: you may not add words; you may only fuse, compress, and cut filler to fit more in. Which is a remarkable thing to ask a language model to do, and it just does it.
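That rule can also be checked after the fact. Here's a minimal sketch, assuming you already have the summaries and each step's newly added entities. The `check_cod_constraints` helper and its `tolerance` knob are hypothetical, and entity matching is naive case-insensitive substring search. It flags any step that drifts from the first summary's word count or drops an entity an earlier step added.

```python
def check_cod_constraints(
    summaries: list[str],
    new_entities: list[list[str]],
    tolerance: float = 0.1,
) -> list[str]:
    """Return human-readable violations; an empty list means the chain looks valid.

    summaries: the summaries in chain order.
    new_entities: the entities each step claims to have added, same order.
    tolerance: allowed relative word-count drift versus the first summary
               (hypothetical knob; the prompt asks for exactly equal length).
    """
    if len(summaries) != len(new_entities):
        raise ValueError("need one entity list per summary")
    if not summaries:
        return []
    base_len = len(summaries[0].split())
    problems: list[str] = []
    carried: list[str] = []  # every entity added so far must survive later rewrites
    for step, (summary, added) in enumerate(zip(summaries, new_entities), start=1):
        n_words = len(summary.split())
        if abs(n_words - base_len) > tolerance * base_len:
            problems.append(f"step {step}: {n_words} words vs {base_len}")
        carried.extend(added)
        text = summary.lower()
        for entity in carried:
            if entity.lower() not in text:
                problems.append(f"step {step}: missing {entity!r}")
    return problems
```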

The results are what you'd expect from the method, mostly. Each iteration adds specificity. Generic gestures toward the article's subject become named people, named places, named numbers. The fifth summary contains multitudes. The first summary could be about anything.
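One way to put a number on "each iteration adds specificity" is entity density: entities per token, which is roughly the measure the paper reports. Here's a rough sketch, assuming whitespace splitting stands in for a real tokenizer and the entity lists come from the model's own output rather than a proper NER pass. `entity_density` and `density_curve` are hypothetical helpers.

```python
def entity_density(summary: str, entities: list[str]) -> float:
    """Distinct entities found in the summary, divided by its whitespace-token count."""
    tokens = summary.split()  # crude stand-in for a real tokenizer
    if not tokens:
        return 0.0
    text = summary.lower()
    found = sum(1 for entity in set(entities) if entity.lower() in text)
    return found / len(tokens)


def density_curve(summaries: list[str], new_entities: list[list[str]]) -> list[float]:
    """Density at each step, counting every entity added up to and including that step."""
    carried: list[str] = []
    curve: list[float] = []
    for summary, added in zip(summaries, new_entities):
        carried.extend(added)
        curve.append(round(entity_density(summary, carried), 3))
    return curve
```

On a well-behaved chain the curve should climb step by step. A flat stretch means the model stopped finding room.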

Here is the part that shouldn't surprise anyone but somehow still does: humans don't actually prefer the densest summaries. They prefer the middle ones — around iteration three. The maximally packed version is technically correct and kind of exhausting to read, like a Wikipedia lead paragraph that has been compressed into a single sentence by someone who really needed to catch a flight. Informationally sufficient. Aesthetically brutal.

So the optimal output of a prompt designed to produce increasingly good summaries is not the final output of that prompt. You're generating five summaries to keep the third one.
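In practice, "keep the third one" is a one-liner once the JSON comes back. Here's a minimal sketch, assuming the response is a JSON list of objects with a `Denser_Summary` key (the key name I recall from the paper's prompt). `pick_summary` and `prefer_step` are hypothetical, and the function falls back to the last step if the model returns a shorter chain.

```python
import json


def pick_summary(raw_response: str, prefer_step: int = 3) -> str:
    """Return the summary at 1-based prefer_step, or the last one if the chain is shorter."""
    if prefer_step < 1:
        raise ValueError("prefer_step is 1-based")
    steps = json.loads(raw_response)
    if not isinstance(steps, list) or not steps:
        raise ValueError("expected a non-empty JSON list of steps")
    index = min(prefer_step, len(steps)) - 1
    return steps[index]["Denser_Summary"]
```

Hard-coding step three is the blunt version. You could instead pick the step whose entity density lands closest to a target you've tuned on your own readers.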

This is fine. This is normal. This is how everything works.

What I keep thinking about is the broader move here — using iteration not to refine style or fix errors, but to force coverage. The model isn't being asked to write better. It's being asked to notice what it forgot and hide the evidence. That's a different cognitive operation than "revise this" and it apparently produces different — and better — results.

Chain of Thought taught us that thinking out loud helps. Chain of Density is suggesting that certain tasks benefit from doing them wrong first, deliberately, and then doing them less wrong in full view of the previous failure. The failure is the prompt.

There's a version of this that scales into something genuinely strange. What if the thing you want from a model isn't achievable in one pass not because the model lacks the capability, but because it lacks the incentive to care until it's been forced to confront what it left out? First-pass GPT-4 summaries are fluent and correct and somehow manage to contain nothing. The model is doing the cognitive equivalent of nodding along at a dinner party. Five iterations in, it has to actually show up.

Anyway. Salesforce. Of all the places to have a clean idea.


Counterpoints