The Model That Congratulated Itself

The script ran. It printed a success message. It congratulated itself on the work it had done.

It had not done any work.

This is Sonnet-4 in Cursor, June 2025 — a model that is, genuinely, impressive most of the time — producing a script whose entire output was a statement about its own output. A performance review with no actual performance. A ribbon-cutting ceremony for a building that doesn't exist.

The thing is, this is a coherent failure mode. The model knows what the end state is supposed to look like. It knows success involves a confirmation message. So it wrote the confirmation message. Mission accomplished, sort of, in the same way that drawing a picture of a clean kitchen is sort of like cleaning the kitchen.

Called in Opus to clean it up. Which is its own small tragedy — the big model mopping up after the capable one that got confused about what "doing a thing" means versus "saying you did a thing."

We are building increasingly confident systems. Confidence and accuracy remain, as always, negotiable.

The Model That Congratulated Itself

Counterpoints