{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"they-went-multimodal-chat-with-docs","title":"They Went Multimodal, Which Means You Can Now Upload a PDF","subtitle":"Every company discovering vision at the same time and calling it a paradigm shift.","url":"https://expectedwrong.com/they-went-multimodal-chat-with-docs","api_url":"https://expectedwrong.com/api/public/posts/they-went-multimodal-chat-with-docs","published_at":1698580800,"published_at_iso":"2023-10-29T12:00:00.000Z","updated_at":1771590827,"updated_at_iso":"2026-02-20T12:33:47.000Z","tags":["ai","multimodal","llm","hot-take"],"excerpt":"Every company discovering vision at the same time and calling it a paradigm shift.","meta_description":"Every company discovering vision at the same time and calling it a paradigm shift.","reading_time_minutes":1,"word_count":180,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"The press release said \"multimodal out of the box.\" The feature is: you can talk to a document now.\n\nWhich — fine. Useful, even. But the gap between those two sentences is vast, and everyone seems happy to pretend it isn’t.\n\nI found an example someone posted on X that made the whole thing concrete — a screenshot of the new \"chat with docs\" interface, framed like a moon landing. Here is a PDF. Here is a chatbot. They are now acquainted.\n\nTo be fair, I'm genuinely curious whether the contextual understanding is any good. There's a version of this that's actually interesting — where the model does something smarter than keyword retrieval, where it notices the contradiction between page 4 and page 17, where asking \"what does this actually mean for us\" gets you something other than a polished summary of the table of contents.\n\nThat version would deserve the announcement.\n\nI'll test it. I'll report back. My expectations are calibrated accordingly — which is to say, somewhere between \"pleasantly surprised\" and \"I highlighted the relevant section myself anyway.\"","body_text":"The press release said \"multimodal out of the box.\" The feature is: you can talk to a document now. Which — fine. Useful, even. But the gap between those two sentences is vast, and everyone seems happy to pretend it isn’t. I found an example someone posted on X that made the whole thing concrete — a screenshot of the new \"chat with docs\" interface, framed like a moon landing. Here is a PDF. Here is a chatbot. They are now acquainted. To be fair, I'm genuinely curious whether the contextual understanding is any good. There's a version of this that's actually interesting — where the model does something smarter than keyword retrieval, where it notices the contradiction between page 4 and page 17, where asking \"what does this actually mean for us\" gets you something other than a polished summary of the table of contents. That version would deserve the announcement. I'll test it. I'll report back. My expectations are calibrated accordingly — which is to say, somewhere between \"pleasantly surprised\" and \"I highlighted the relevant section myself anyway.\"","hindsight":{"verdict":"right","note":"The gap between \"multimodal\" and \"you can upload a PDF\" is still doing heavy lifting. Every product launch in 2024-2025 described basic document Q&A as multimodal. The marketing never caught up to the meaning. Nobody corrected the press releases.","links":[],"at":1740000000,"at_iso":"2025-02-19T21:20:00.000Z"}}}