65,000 Tokens of Open Source, Assuming You Have the RAM

MosaicML ships a 7B model that can read a novel — you just need a server to run it on.

The entire appeal of a 7B model is that you can run it. Your laptop. A cheap GPU. A box under your desk that's technically a gaming rig but you tell people it's a workstation. That's the deal. You give up capability, you get freedom.

MosaicML just released MPT-7B-StoryWriter with a 65,536-token context window, which is not how you keep that deal intact.

65k tokens is a novel. Roughly 50,000 words of English: a whole book, not a chapter. GPT-4 launched with a 32k variant and everyone treated that like a civic event. The open-source ecosystem has been crawling along at 2k, maybe 4k if you were lucky and didn't mind the quality falling off a cliff after the first few pages. This is a genuine leap. Not incremental, not "we fine-tuned it a little," but a 32x jump over 2k, more than an order of magnitude in what an open model can hold in its head at once.

The catch is RAM. Always RAM. The context window is free until you fill it, and when you fill it you need somewhere to put the attention keys and values for every token at every layer, and that somewhere is your memory, and you don't have that memory, and neither do I.
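Here is that complaint as back-of-envelope arithmetic, in a minimal sketch. The model shape (32 layers, hidden size 4,096, 2-byte fp16 values, plain multi-head attention) is an assumption taken from the public MPT-7B config, not something stated in this post, so check it against the model card before trusting the numbers.

```python
# Back-of-envelope KV-cache size for a decoder-only transformer.
# ASSUMPTION: shape values below come from the public MPT-7B config;
# verify them against the model card before relying on the output.
N_LAYERS = 32         # transformer blocks
D_MODEL = 4096        # hidden size (n_heads * head_dim)
BYTES_PER_VALUE = 2   # fp16 / bf16
GIB = 1024 ** 3


def kv_cache_bytes(n_tokens: int,
                   n_layers: int = N_LAYERS,
                   d_model: int = D_MODEL,
                   bytes_per_value: int = BYTES_PER_VALUE,
                   batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for n_tokens.

    Each layer stores one key vector and one value vector of length
    d_model per token (standard multi-head attention, no sharing of
    keys/values across heads).
    """
    per_token = 2 * n_layers * d_model * bytes_per_value  # 2 = keys + values
    return per_token * n_tokens * batch_size


if __name__ == "__main__":
    for n in (2_048, 65_536):
        print(f"{n:>6} tokens -> {kv_cache_bytes(n) / GIB:.1f} GiB of KV cache")
```

Under those assumptions each token costs 0.5 MiB, so the old 2k window needs 1 GiB of cache and the full 65,536-token window needs 32 GiB, for a single sequence, before you load roughly 13 GB of fp16 weights (assuming about 6.7B parameters). It also leaves out the memory for the attention computation itself, which depends heavily on the inference stack.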

So what we have is an open-source model with genuinely impressive capabilities that most people who want open-source models — because they want to run things locally, because they don't have cloud budgets — cannot actually use at full capacity. It's the sports car that fits in a normal parking space but requires premium fuel you can't buy in your city.
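To put a rough number on "full capacity," here is a sketch that flips the question around: given a GPU, how much of the 65k window fits? Every constant is an assumption rather than a figure from this post: about 6.7B parameters, the public MPT-7B shape (32 layers, hidden size 4,096), and a flat 2 GiB set aside for activations and framework overhead. `max_context_tokens` is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: estimate the longest context that fits on one GPU.
# ASSUMPTIONS (not from the article): ~6.7e9 parameters, 32 layers,
# hidden size 4096 (public MPT-7B config), and a flat 2 GiB reserved
# for activations and framework overhead. Real stacks will differ.
GIB = 1024 ** 3
N_PARAMS = 6_700_000_000
N_LAYERS = 32
D_MODEL = 4096


def max_context_tokens(gpu_gib: float,
                       bytes_per_value: int = 2,
                       overhead_gib: float = 2.0) -> int:
    """Tokens of KV cache that fit after weights and overhead are loaded.

    bytes_per_value applies to both weights and cache: 2 for fp16/bf16,
    1 for a hypothetical all-8-bit setup.
    """
    weight_bytes = N_PARAMS * bytes_per_value
    free_bytes = gpu_gib * GIB - weight_bytes - overhead_gib * GIB
    per_token = 2 * N_LAYERS * D_MODEL * bytes_per_value  # keys + values
    return max(0, int(free_bytes // per_token))


if __name__ == "__main__":
    for gpu in (24, 48, 80):
        print(f"{gpu} GiB card: ~{max_context_tokens(gpu):,} tokens")
```

Under these assumptions a 24 GiB consumer card tops out around 19k tokens, and you need something like a 48 GiB workstation card before the full 65,536 fits. Passing `bytes_per_value=1` models the kind of squeeze quantization promises: the budget roughly triples on the same 24 GiB card, which is how "someone will figure out how to make it fit" tends to play out.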

Still. It exists. The weights are there. The context length is real. Someone will figure out how to make it fit, and then someone will fine-tune it, and then six months from now this will feel obvious and everyone will have forgotten that it used to be impossible.

That's how this goes.

Counterpoints