{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"the-crowd-is-a-prompt","title":"The Crowd Is A Prompt","subtitle":"A new paper shows GPT-4 matching superforecaster-level accuracy with a single structured prompt — no aggregation, no market, no Nate Silver required.","url":"https://expectedwrong.com/the-crowd-is-a-prompt","api_url":"https://expectedwrong.com/api/public/posts/the-crowd-is-a-prompt","published_at":1709208000,"published_at_iso":"2024-02-29T12:00:00.000Z","updated_at":1771538375,"updated_at_iso":"2026-02-19T21:59:35.000Z","tags":["forecasting","llm","prompting","prediction-markets","gpt4"],"excerpt":"A new paper shows GPT-4 matching superforecaster-level accuracy with a single structured prompt — no aggregation, no market, no Nate Silver required.","meta_description":"A new paper shows GPT-4 matching superforecaster-level accuracy with a single structured prompt — no aggregation, no market, no Nate Silver required.","reading_time_minutes":2,"word_count":273,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"A paper dropped this week showing that GPT-4, given the right prompt, hits crowd-level accuracy — sometimes better — on probabilistic forecasting questions. The kind of accuracy you'd normally need hundreds of humans and a prediction market to produce.\n\nThe prompt is not magic. It's just structure. Rephrase the question. Argue why the answer is no. Argue why the answer is yes. Aggregate like a superforecaster. Output an initial probability. Ask yourself whether you're overconfident and what the base rate says. Output the final number.\n\nThat's it. That's the whole thing.\n\nThe line \"think like a superforecaster (e.g. Nate Silver)\" is doing real work here in a way that should be embarrassing to someone — either to the model, for being so susceptible to a persona instruction, or to the forecasting industry, for being replicable via a one-line incantation.\n\nWhat the prompt is actually doing is forcing adversarial self-interrogation before commitment. Most LLM forecasting fails because the model pattern-matches to an answer and then rationalizes it. This structure makes the rationalization happen on both sides first, which apparently surfaces enough genuine uncertainty to produce a calibrated number at the end.\n\nThe base rate step at the end is the one people will skip. It's also probably the most important one — the moment where you ask whether your beautifully reasoned probability is just vibes with extra steps.\n\nPrediction markets are expensive. They need liquidity, participants, incentive structures, time. This costs a fraction of a cent and runs in five seconds. I don't know what that means yet. Neither does anyone else, which is either exciting or the kind of thing you learn to be nervous about.\n\n[arxiv 2402.18563](https://arxiv.org/abs/2402.18563)","body_text":"A paper dropped this week showing that GPT-4, given the right prompt, hits crowd-level accuracy — sometimes better — on probabilistic forecasting questions. The kind of accuracy you'd normally need hundreds of humans and a prediction market to produce. The prompt is not magic. It's just structure. Rephrase the question. Argue why the answer is no. Argue why the answer is yes. Aggregate like a superforecaster. Output an initial probability. Ask yourself whether you're overconfident and what the base rate says. Output the final number. That's it. That's the whole thing. The line \"think like a superforecaster (e.g. Nate Silver)\" is doing real work here in a way that should be embarrassing to someone — either to the model, for being so susceptible to a persona instruction, or to the forecasting industry, for being replicable via a one-line incantation. What the prompt is actually doing is forcing adversarial self-interrogation before commitment. Most LLM forecasting fails because the model pattern-matches to an answer and then rationalizes it. This structure makes the rationalization happen on both sides first, which apparently surfaces enough genuine uncertainty to produce a calibrated number at the end. The base rate step at the end is the one people will skip. It's also probably the most important one — the moment where you ask whether your beautifully reasoned probability is just vibes with extra steps. Prediction markets are expensive. They need liquidity, participants, incentive structures, time. This costs a fraction of a cent and runs in five seconds. I don't know what that means yet. Neither does anyone else, which is either exciting or the kind of thing you learn to be nervous about. arxiv 2402.18563","hindsight":{"verdict":"persists","note":"AI forecasting capabilities kept improving. the insight that structured prompting produces expert-level outputs remains foundational to prompt engineering. the embarrassment question — is this embarrassing for the model or the industry — never got answered.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}