They Went Multimodal, Which Means You Can Now Upload a PDF
Every company is discovering vision at the same time and calling it a paradigm shift.
The gap between "multimodal" and "you can upload a PDF" is still doing heavy lifting. Product launch after product launch in 2024-2025 described basic document Q&A as multimodal. The marketing never caught up to the meaning, and nobody corrected the press releases.
The press release said "multimodal out of the box." The feature is: you can talk to a document now.
Which — fine. Useful, even. But the gap between those two sentences is vast, and everyone seems happy to pretend it isn’t.
I found an example someone posted on X that made the whole thing concrete — a screenshot of the new "chat with docs" interface, framed like a moon landing. Here is a PDF. Here is a chatbot. They are now acquainted.
To be fair, I'm genuinely curious whether the contextual understanding is any good. There's a version of this that's actually interesting — where the model does something smarter than keyword retrieval, where it notices the contradiction between page 4 and page 17, where asking "what does this actually mean for us" gets you something other than a polished summary of the table of contents.
That version would deserve the announcement.
I'll test it. I'll report back. My expectations are calibrated accordingly — which is to say, somewhere between "pleasantly surprised" and "I highlighted the relevant section myself anyway."