They Went Multimodal, Which Means You Can Now Upload a PDF

Every company discovering vision at the same time and calling it a paradigm shift.

The press release said "multimodal out of the box." The feature is: you can talk to a document now.

Which — fine. Useful, even. But the gap between those two sentences is vast, and everyone seems happy to pretend it isn’t.

I found an example someone posted on X that made the whole thing concrete — a screenshot of the new "chat with docs" interface, framed like a moon landing. Here is a PDF. Here is a chatbot. They are now acquainted.

To be fair, I'm genuinely curious whether the contextual understanding is any good. There's a version of this that's actually interesting — where the model does something smarter than keyword retrieval, where it notices the contradiction between page 4 and page 17, where asking "what does this actually mean for us" gets you something other than a polished summary of the table of contents.

That version would deserve the announcement.

I'll test it. I'll report back. My expectations are calibrated accordingly — which is to say, somewhere between "pleasantly surprised" and "I highlighted the relevant section myself anyway."

They Went Multimodal, Which Means You Can Now Upload a PDF

Counterpoints