
Nvidia Just Handed Local LLMs to Every Gamer

Chat with RTX ships today and the implications are weirder than the product itself.

2 min read 239 words #nvidia #local-llm #ai #inference #windows
hindsight — nailed it

local LLMs went mainstream through software, not nvidia's app. ollama made it one command. apple shipped MLX. the installed base argument was correct — the distribution channel was the GPU people already owned. nvidia's app was the opening act.

Nvidia shipped Chat with RTX today, a free local LLM runner for anyone with an RTX 30- or 40-series GPU and at least 8GB of VRAM, and the part nobody is saying out loud is that this means Windows users got there first.

The Linux homelab crowd spent two years configuring llama.cpp and quantization levels and CUDA environment variables, and today a teenager with a 3060 can download an installer and talk to Mistral. No cloud. No API key. No monthly bill. Just the GPU they already bought to play Fortnite.

Chat with RTX runs Mistral 7B and Llama 2 locally, lets you point it at your own documents, and does the whole retrieval thing: it finds the passages relevant to your question and hands them to the model. That is a more coherent product pitch than most AI startups have managed with a hundred million dollars.
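If you have never seen retrieval over your own documents spelled out, here is a minimal sketch in Python. It is not how Chat with RTX is implemented. It swaps in plain word overlap (bag-of-words cosine similarity) for the learned embeddings a real system would most likely use, and every function name and sample note below is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks whose wording overlaps most with the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Paste the retrieved chunks above the question for a local model to answer."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Use only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical snippets standing in for "your own documents".
notes = [
    "The router admin password is taped under the desk.",
    "Quarterly taxes are due on the fifteenth.",
    "The GPU driver was last updated in January.",
]

print(build_prompt("When was the GPU driver updated?", notes, k=1))
```

The prompt that comes out is what a local model would actually see: a handful of your own passages followed by the question. Everything stays on the machine, which is the whole pitch.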

The interesting part isn't the software. The interesting part is the installed base. There are tens of millions of RTX cards in the wild, stuffed inside gaming rigs that sit idle twelve hours a day. Nvidia just activated all of them.

Privacy-first local inference used to require motivation — the motivation to be a person who sets up local inference. Now it requires owning a mid-range GPU, which half the gaming world already does.

What happens to cloud AI services when their core value proposition, "we handle the infrastructure so you don't have to," collapses to a one-time download? Nobody has a clean answer to that yet.