A GPT That Fits on a USB Stick and Runs on Anything

Justine Tunney at Mozilla just made LLMs into single executable files, and the implications are stranger than the demo.

Justine Tunney figured out how to put a large language model — including a vision model, the kind that looks at images — into a single file that runs on Windows, Mac, Linux, FreeBSD, OpenBSD, and NetBSD without installing anything.

One file. Every operating system. No dependencies. No Docker. No "ensure you have Python 3.11."

The project is called llamafile. It uses cosmopolitan libc to produce a binary that is genuinely, not metaphorically, universal. You copy it somewhere. You run it. A local web server comes up. The model is sitting there waiting. You close the laptop and nothing is left behind.

The demo they shipped was LLaVA — a multimodal model that can look at images and talk about them — packaged as a single 4GB executable that you could put on a USB drive, plug into a stranger's computer, and run a GPT-vision-class model with no internet connection, no account, no rate limits, no terms of service except your own.

That's the thing worth sitting with. Not the technical trick, which is impressive enough, but what it means for the idea of intelligence as infrastructure. We spent two years watching the big labs build the biggest moats they could think of — the data centers, the API keys, the usage policies, the carefully managed access. And now there is a file. You can email it. You can put it in a zip. You can lose it down the back of a couch and find it three years later and it still works, because it has no expiration date, because it requires nothing from anyone.

The commoditization thesis for AI has always been: models get cheaper, inference gets cheaper, the moat shrinks. But "cheaper" still implies a transaction, a server, a company that could theoretically decide not to serve you. A file on a USB stick is a different category of thing. It's closer to a book than a service — once it exists, it just exists, and the question of whether you're allowed to use it becomes a question about the file, not about anyone's uptime.

Mozilla funded this. Mozilla, the organization that has been slowly searching for a reason to exist since Chrome ate Firefox's lunch. Of all the places for this to come from.

The implications are not fully here yet. The models that fit in a llamafile right now are small — capable, but not GPT-4. The file sizes are measured in gigabytes. Most USB sticks can hold one. Most laptops have enough RAM to run it, barely. In two years, maybe one year, this gap closes, and then you have something genuinely uncanny: frontier-class reasoning in a file format.

A file format for intelligence. Distributed like software. Copied like music.

I have no idea what happens then and neither does anyone else.

A GPT That Fits on a USB Stick and Runs on Anything

Counterpoints