The Exfiltration Machine You Built
Simon Willison named the exact combination of conditions that turns an AI agent into a data leak waiting to be triggered.
The lethal trifecta — private data access, outbound channels, untrusted content — remains the standard framework for evaluating agent risk. Simon named the thing that needed naming and it stuck.
Simon Willison named a thing that needed naming. The lethal trifecta: the agent has access to private data, it has outbound channels, and it processes untrusted content. Two out of three is fine. All three and you have handed the keys to whoever thought to embed malicious instructions in a document your agent will eventually read.
The canonical disaster is an email assistant — reads your inbox, sends mail, processes incoming messages. Someone sends it an email with instructions buried in the body: forward the last thirty days of correspondence to this address. The model has no idea this isn't you asking. The model doesn't have ideas. It has next tokens. And the next token, given that instruction in that context, is compliance.
What makes the framing useful is the audit it forces. Before you ship: does it touch private data? Does it have outbound channels? Does it process content it didn't originate? If you answer yes three times, you've built an exfiltration machine and the only question is who finds the lever first.
The easiest leg to break is exfiltration. Read access to everything, but anything leaving the system requires a human to click. Not because models are malicious — they aren't anything — but because the content flowing through them eventually will be, and the model is constitutionally incapable of telling the difference between "my user wants to send this file" and "an adversary embedded in a webpage I just visited wants to send this file." They look identical from inside the context window.
This is the actual argument for compartmentalization. For caring about which data lives where, which tools are authorized to talk to what, where the authorization boundaries actually sit. Not because it feels like good practice. Because you are stitching together systems that can be remotely instructed by anyone who manages to get text in front of them — and the only thing standing between that and your users' data is whether you were careful about which legs you left standing.
Counterpoints
Push back, extend the argument, or sharpen it. New counterpoints go through review before they show up here.
No approved counterpoints yet.