
Everything Has Always Been a Database with a Hat On

Salesforce is infrastructure. RAG is information retrieval. The textbook is from 2008.

3 min read 608 words #rag #information-retrieval #salesforce #enterprise-software #ai
hindsight — still happening

Salesforce is still a multitenant database with a hat on. The hat got an AI badge. The database didn't change.

Salesforce — the platform, not the CRM, not the company, not the Dreamforce keynote with the dancing Einsteins — is a multitenant database hosted on Azure or AWS, with an API layer to move data between services, an AI layer doing conversational RAG dressed up as "agents," a UI layer for their own apps, and RBAC plus some anonymization they're calling a "trust layer."

That's the whole thing. That's the complete list.

The platform transformation document they published (which I did read, which I am not proud of) is a genuinely useful document if you read it as a translation exercise. Every technical noun is a corporate noun wearing a trench coat. You undo the costume, you find: cloud hosting, shared database, API, ML predictions, UI framework, access control. Six things. A market cap in the hundreds of billions of dollars.

The interesting implication isn't that Salesforce is fake. It's that the recipe is visible. Any organization that hits a certain size and data density is already building a Salesforce — they just don't have the nerve to charge for it. You have storage, you have a database, you have APIs connecting internal services, you have some model plugged into your data stack, you have a UI, you have auth. Congratulations. You've reinvented enterprise software. The main difference between you and Salesforce is that Salesforce made you pay for the privilege of running your data on their version of the thing you already built.
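To make the recipe concrete, here is a minimal sketch in Python, assuming SQLite as a stand-in for the shared database. Every name in it (the `accounts` table, `ROLE_PERMISSIONS`, `list_accounts`, the tenant names) is hypothetical and invented for illustration. It is not how Salesforce implements anything, only the general shape: one table, a tenant column, a role check, and a function standing in for the API.

```python
import sqlite3

# Hypothetical role-to-permission map: access control reduced to a dict.
ROLE_PERMISSIONS = {"admin": {"read", "write"}, "viewer": {"read"}}


def make_shared_db():
    """One table shared by every tenant, separated only by a tenant_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (tenant_id TEXT, name TEXT, revenue INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [("acme", "Initech", 120), ("acme", "Hooli", 300), ("globex", "Umbrella", 75)],
    )
    return conn


def list_accounts(conn, tenant_id, role):
    """The 'API layer': check the role, then run a tenant-scoped query."""
    if "read" not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not read accounts")
    rows = conn.execute(
        "SELECT name, revenue FROM accounts WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return rows.fetchall()


conn = make_shared_db()
acme_rows = list_accounts(conn, "acme", "viewer")
```

A viewer at `acme` only ever sees rows tagged `acme`, and a role missing from the map gets a `PermissionError`. Add a UI and a model on top and the list is complete.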


Separately, and relatedly: RAG is not a thing.

I don't mean it doesn't work. I mean it is not a distinct technical problem. It is a specific implementation of Information Retrieval — a field with a canonical textbook published in 2008 by Manning, Raghavan, and Schütze, which is also, without modification, the course material for Stanford CS276 right now, today, in 2024.

When someone asks why their RAG pipeline returns garbage, the answer is in that book. When someone wonders why their semantic search misfires on edge cases (why embeddings behave strangely near cluster boundaries, why query expansion helps in some cases and destroys precision in others), Chapter 14 has the geometry and Chapter 9 has the query expansion. The problem space was mapped. The solutions were documented. Then we added a language model on top, called it RAG, and started acting like we'd discovered the wheel under a blanket.
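As an illustration, the retrieval half of a toy RAG pipeline can be written as plain vector space ranking: TF-IDF weights and cosine similarity, the model the textbook covers. This is a sketch under stated assumptions: whitespace tokenization, a three-document corpus invented for the example, and sparse TF-IDF vectors where a real deployment would more likely use learned embeddings. The function names are mine, not from any library.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not a recommendation)."""
    return text.lower().split()


def build_index(docs):
    """Compute IDF over the corpus and a sparse TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, idf, vectors, k=2):
    """Rank documents by cosine similarity to the query: the 'R' in RAG."""
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [docs[i] for score, i in scored[:k] if score > 0]


docs = [
    "refund policy for enterprise customers",
    "how to reset your password",
    "enterprise pricing and refund terms",
]
idf, vectors = build_index(docs)
context = retrieve("refund policy", docs, idf, vectors, k=1)

# The "AG" part: paste the retrieved text into a prompt for whatever model you use.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Everything before the last line is information retrieval. If `retrieve` returns the wrong documents, no amount of prompt wording downstream will repair the answer.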

The fundamentals are unchanged. The drapery changed. The drapery has venture funding.

"RAG is hard" is a sentence that means "information retrieval is hard," which is a sentence that has been true for fifty years and is extensively documented. The book is free. It is a PDF. It is 581 pages and it covers crawling, indexing, ranking, vector space retrieval, probabilistic models, evaluation — all of it. The chapter on XML retrieval is the only part that hasn't aged perfectly, and that's mostly because XML retrieval lost the war.

The next time someone is shocked that embedding-based retrieval is struggling with a particular query type, the correct response is to open Chapter 14 and find the paragraph that explains, calmly and without drama, that this problem was identified and partially solved before most of the people currently selling RAG infrastructure were in college.

None of this is cynical. The new stuff is genuinely useful — language models do something retrieval couldn't do alone, and the combination is powerful. But powerful tools built on misunderstood foundations produce confident failures, which is exactly what most RAG deployments are: confidently wrong, at scale, with a nice UI.

Salesforce sells you the database and calls it a platform. The AI industry sells you IR and calls it a paradigm shift. The textbook is still free.