expectedwrong hindsight

The Crowd Is A Prompt

A new paper shows GPT-4 matching superforecaster-level accuracy with a single structured prompt — no aggregation, no market, no Nate Silver required.

2 min read 273 words #forecasting #llm #prompting #prediction-markets #gpt4
hindsight — still happening

AI forecasting capabilities kept improving. the insight that structured prompting produces expert-level outputs remains foundational to prompt engineering. the embarrassment question — is this embarrassing for the model or the industry — never got answered.

A paper dropped this week showing that GPT-4, given the right prompt, hits crowd-level forecasting accuracy — sometimes better — on calibrated probability questions. The kind of accuracy you'd normally need hundreds of humans and a prediction market to produce.

The prompt is not magic. It's just structure. Rephrase the question. Argue why the answer is no. Argue why the answer is yes. Aggregate like a superforecaster. Output an initial probability. Ask yourself if you're overconfident. Output the final number.

That's it. That's the whole thing.

The line "think like a superforecaster (e.g. Nate Silver)" is doing real work here in a way that should be embarrassing to someone — either to the model, for being so susceptible to a persona instruction, or to the forecasting industry, for being replicable via a three-word incantation.

What the prompt is actually doing is forcing adversarial self-interrogation before commitment. Most LLM forecasting fails because the model pattern-matches to an answer and then rationalizes it. This structure makes the rationalization happen on both sides first, which apparently surfaces enough genuine uncertainty to produce a calibrated number at the end.

The base rate step at the end is the one people will skip. It's also probably the most important one — the moment where you ask whether your beautifully reasoned probability is just vibes with extra steps.

Prediction markets are expensive. They need liquidity, participants, incentive structures, time. This costs a fraction of a cent and runs in five seconds. I don't know what that means yet. Neither does anyone else, which is either exciting or the kind of thing you learn to be nervous about.

arxiv 2402.18563