Eighty-Two Million Parameters Walk Into a Voice Booth
Kokoro sounds better than it has any right to, and the training bill was a thousand dollars.
Kokoro went viral and became one of the most widely used open TTS models. 82 million parameters, a thousand dollars, eight million downloads a month. The StyleTTS2 architecture got the recognition it deserved.
Kokoro-82M is a text-to-speech model with 82 million parameters — which, in the current moment, is roughly the size of a model that apologizes a lot and can't count syllables. And it sounds incredible.
Not "good for its size." Good.
The whole thing cost a thousand dollars to train. One thousand. You can spend more than that on a GPU rental for an afternoon of vibes and a failed fine-tune. The people at hexgrad spent it and got a production-quality TTS system with 54 voices across 8 languages, released under Apache 2.0 and currently pulling eight million downloads a month.
The architecture is StyleTTS2 — a paper from 2023 that never got the hype it deserved because diffusion was having its moment and apparently we only have attention for one idea at a time. No diffusion here, no enormous encoder, no mysterious proprietary sauce. ISTFTNet vocoder, phoneme labels in IPA, a few hundred hours of audio, a GPU cluster someone rented at a dollar an hour.
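For scale, the arithmetic is short. What follows is a back-of-envelope sketch in Python, not anything from the Kokoro release. It assumes the dollar an hour is a flat rate per GPU-hour and that the whole thousand dollars went to compute. The function names are made up for illustration, and the second helper counts weight storage only, at an assumed 4 or 2 bytes per parameter.

```python
# Back-of-envelope numbers only. The flat per-GPU-hour rate and the
# bytes-per-parameter values are assumptions, not figures from the release.

def gpu_hours(budget_usd: float, usd_per_gpu_hour: float) -> float:
    """GPU-hours a budget buys at a flat hourly rate (hypothetical helper)."""
    if usd_per_gpu_hour <= 0:
        raise ValueError("rate must be positive")
    return budget_usd / usd_per_gpu_hour


def weights_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in MB (10^6 bytes), ignoring any file overhead."""
    return n_params * bytes_per_param / 1_000_000


hours = gpu_hours(1_000, 1.0)            # the thousand-dollar bill at $1/hour
days_on_one_gpu = hours / 24             # wall-clock days if run on a single GPU
fp32_mb = weights_megabytes(82_000_000, 4)
fp16_mb = weights_megabytes(82_000_000, 2)

print(f"{hours:.0f} GPU-hours, about {days_on_one_gpu:.0f} days on one GPU")
print(f"weights: {fp32_mb:.0f} MB at 4 bytes/param, {fp16_mb:.0f} MB at 2 bytes/param")
```

Under those assumptions the budget buys about a thousand GPU-hours, and the weights fit in a few hundred megabytes. That is the whole point of the number 82 million.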
Here is the part that I keep turning over: some of that training audio is synthetic — generated by closed commercial TTS models, the kind with terms of service that politely ask you not to use their outputs to train competitors. The model that beat them was partly trained on them. There's a word for this and I'm not going to say it, but the whole thing has the energy of a library book that became a better library.
The correct response to Kokoro is not excitement. It's the quiet recalibration you do when the assumption you'd been making — that quality scales with size, that you need a data center and a Series B to play — turns out to have been wrong this whole time, and the evidence has eight million downloads to prove it.