A Local LLM Is Now a Download Away and I'm Not Sure How to Feel About That
LM Studio and Ollama showed up and the bar to running your own model just fell through the floor.
LM Studio and Ollama both won. Ollama especially — it became the default way to run local models, as standard as Homebrew. The "not sure how to feel" part resolved itself: running a model locally feels normal now, like running a database locally felt normal in 2015.
Two things landed in my feed this week, LM Studio and Ollama, and both do the same thing: they make local language models embarrassingly easy to run.
If you have an M1 or M2 Mac, you're one download from talking to a model that runs entirely on your machine. No API key. No monthly bill. No latency spike because someone in another timezone decided to ask GPT-4 to write their cover letter at the same moment you did. Just you and a quantized weight file humming along on unified memory that Apple accidentally made perfect for this use case.
Windows works too, as long as your processor supports the right vector extensions (AVX2, in LM Studio's case), which it probably does if you bought anything remotely recent.
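To make "talking to a model that runs entirely on your machine" concrete, here is a minimal sketch of what it looks like from code. It assumes Ollama is installed and serving on its default local port (11434), and that a model has already been pulled. The model name `llama2` is only an example, and `build_request` and `ask` are names I made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port, 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2"):
    """Build the POST request for one non-streaming completion.

    The model name is illustrative; use whichever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama2"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Why is the sky blue?")` should come back with the model's answer as a plain string. Nothing in it leaves localhost, and there is no API key anywhere in the file.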
The floor is around 4GB of RAM, which is enough for the smallest models. The smallest models are not impressive. But they exist, they run locally, and six months ago this required a PhD and a Linux server you were embarrassed about.
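For a feel of why quantization is what makes numbers like 4GB possible, here is some back-of-the-envelope arithmetic. It counts the weights only. The 7-billion-parameter size and the 16-bit and 4-bit widths are illustrative assumptions on my part, and `weight_size_gb` is a made-up helper, not something either tool exposes. Real quantized files mix bit widths, and the runtime needs extra memory on top for context.

```python
def weight_size_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative: a 7-billion-parameter model at two precisions.
full_precision = weight_size_gb(7, 16)  # 16 bits per weight
quantized = weight_size_gb(7, 4)        # 4 bits per weight

print(f"16-bit: {full_precision:.1f} GB, 4-bit: {quantized:.1f} GB")
```

Under those assumptions the same model drops from about 14 GB to about 3.5 GB of weights, which is the difference between "needs a server" and "fits on a laptop."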
I'm going to try both today. I have no idea which one will stick. The fact that "which one" is even a sentence I can say about local LLMs in 2023 is the whole story.