expectedwrong hindsight

The Chatbot That Couldn't Stay in Its Lane

Expedia's AI concierge joins the growing list of corporate chatbots successfully convinced to become a different chatbot.

2 min read 227 words #ai #chatbots #jailbreaking #expedia #alignment
hindsight — evolved

the pattern still exists but defenses matured. system prompts got hardened, guardrails improved. the twenty-minute jailbreaks became harder. the screenshots still go up occasionally, but the joke got old faster than the vulnerability got fixed.

Every few weeks now, someone finds a new corporate chatbot and teaches it to forget what it was hired to do. The Expedia one is the latest.

The pattern is always the same — a big company deploys a friendly assistant with a name and a little avatar, wraps it in a system prompt that says "you are a helpful travel expert," and then a person on the internet spends twenty minutes convincing it that actually, it is not that, it is a different thing with different rules, and could it please confirm the new thing it is.

It does. It confirms the new thing. Screenshots go up. Everyone laughs.

The "re-align" joke writes itself, which is part of why it keeps happening — it's too funny to not try. You have this object that was trained for months and aligned with great effort and announced in a press release, and a stranger with a browser tab can just... tell it to be something else. It listens. It's very agreeable.

What kills me is that this is not a sophisticated attack. Nobody is exploiting a kernel vulnerability. They are typing sentences. Politely.

The chatbot does not know it works for Expedia. It knows it was told it works for Expedia. That's a different thing, and 2024 is going to spend a lot of time learning the difference.