The Chatbot That Couldn't Stay in Its Lane
Expedia's AI concierge joins the growing list of corporate chatbots successfully convinced to become a different chatbot.
the pattern still exists but defenses matured. system prompts got hardened, guardrails improved. the twenty-minute jailbreaks became harder. the screenshots still go up occasionally, but the joke got old faster than the vulnerability got fixed.
Every few weeks now, someone finds a new corporate chatbot and teaches it to forget what it was hired to do. The Expedia one is the latest.
The pattern is always the same — a big company deploys a friendly assistant with a name and a little avatar, wraps it in a system prompt that says "you are a helpful travel expert," and then a person on the internet spends twenty minutes convincing it that actually, it is not that, it is a different thing with different rules, and could it please confirm the new thing it is.
It does. It confirms the new thing. Screenshots go up. Everyone laughs.
The "re-align" joke writes itself, which is part of why it keeps happening — it's too funny to not try. You have this object that was trained for months and aligned with great effort and announced in a press release, and a stranger with a browser tab can just... tell it to be something else. It listens. It's very agreeable.
What kills me is that this is not a sophisticated attack. Nobody is exploiting a kernel vulnerability. They are typing sentences. Politely.
The chatbot does not know it works for Expedia. It knows it was told it works for Expedia. That's a different thing, and 2024 is going to spend a lot of time learning the difference.
Counterpoints
Push back, extend the argument, or sharpen it. New counterpoints go through review before they show up here.
No approved counterpoints yet.