
Faster, Better, Wrong

Microsoft's AI productivity data is genuinely interesting, which makes it more unsettling, not less.

4 min read 689 words #ai #llms #productivity #labor #microsoft
hindsight — still happening

the faster-better-wrong tradeoff remains the core challenge of AI-assisted work. every major study since has confirmed some version of this. the surface gets more polished, the facts stay unreliable, and the person reviewing it is moving too fast to notice.

Microsoft has been running studies on what ChatGPT actually does to knowledge workers, and the headline numbers are these: 37% faster, 40% higher quality, approximately 20% less accurate.

Sit with that for a second.

You get the work done faster. The work looks better. And the work is more likely to be wrong. These three things are apparently not in contradiction — they can coexist in the same document, reviewed by the same person, shipped to the same client. The obvious explanation is that "quality" here means something like surface-level polish and coherence, which LLMs are genuinely excellent at producing, while "accuracy" means whether the underlying facts and conclusions are correct, which is a different question entirely. The model writes beautifully. The model is lying.

The fix, per the slides, involves "simple UX solutions," which is the kind of thing you say when you have a hard problem and a product roadmap.


The other number worth holding onto: less-skilled workers improved by 43%, while more-skilled workers improved by about 17%. This is being framed as a good thing (democratization, leveling the playing field, rising tide), but it's worth asking what it actually means for the labor market when the tool is most valuable to the people doing the work that commands the lowest rates. The tool narrows the gap between a junior analyst and a senior one. Who benefits from that gap closing?

I genuinely don't know the answer. But "augmentation" is pulling a lot of weight in these conversations, and one of the slides has the intellectual honesty to point out that augmentation can still mean job loss, arguing that "innovation vs. automation" is a better frame than "substitution vs. augmentation." Augmentation sounds benign. Innovation sounds exciting. Both can end with fewer humans on payroll.


There's a concept in there called a "provocateur" — an LLM-based tool that challenges assumptions, encourages evaluation, offers counterarguments. The idea is that the assistant model and the provocateur model are paired, one producing and one attacking. This is genuinely smart design, and it's also a description of having a good collaborator, which humans have historically tried to find by hiring people. Now you can run it as a subprocess.

The part about meetings is also quietly wild: the claim is that AI can fix unequal participation in meetings through instant feedback and improve interactions through retrospective feedback. So the tool can tell quiet people to speak up and loud people to shut up, in real time. Whether anyone will listen is another matter, but the capability is there. Most meetings I've been in could have used an automated intervention around minute 45.


The knowledge-in-chats problem is real and underappreciated. The observation is that modern office knowledge lives in Slack threads and DMs, not documents — and that applying AI over employee communications is legally and ethically complicated in ways that don't have clean solutions yet. The institutional memory of every company I've worked at is buried in a channel that's half-jokes and half-critical context, and the ratio isn't always clear from the outside.

The bigger question the slides land on is not "how will AI affect work" but "how do we want AI to affect work," which sounds like a minor rewording but is a different question with different implications. The first is predictive. The second is political. Answering the second requires deciding what work is for and who it's supposed to benefit, which is a harder conversation than fine-tuning prompts.

About 80% of the US workforce has at least 10% of their work tasks exposed to GPTs. Around 19% have at least 50% of their tasks exposed. Those are estimates of exposure to tools that already exist, not speculative projections about future models: that's where we already are, in January 2024, before whatever comes next.

The Two Sigma LLM abstractions piece is worth reading alongside all of this: a different angle on the same underlying question of how these systems actually work and what we're building on top of them. If the Microsoft data is the "what is happening," the abstractions framing is useful for the "what are we actually building."

Both are good. Neither is comforting.