
Whisper, In Your Browser, Right Now

Real-time speech recognition that never touches a server, because WebGPU finally got fast enough to make this embarrassingly obvious.

1 min read 217 words #webgpu #whisper #privacy #browser #speech-recognition
Hindsight: nailed it

Browser-based transcription became real. Privacy-first audio processing without server round trips is now available from multiple projects. Singling out the 'your voice may be used to improve our services' checkbox as the thing this kills was the right emphasis.

Called it.

Not in a smug way — more like watching a slow-motion train arrive at a station you've been standing at for a while. The train is Whisper running in the browser, in real time, via WebGPU, and it's already here.

No server. No API key. No audio leaving your machine. You open a tab, grant microphone access, and your GPU — the one in your laptop, the one you bought to play games or do whatever — transcribes your voice as fast as you can produce it.
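For the curious, here is a rough sketch of the check a page might run before it even asks for the microphone. It only looks for the two browser features the flow above depends on: `navigator.gpu` (WebGPU) and `navigator.mediaDevices.getUserMedia` (microphone capture). The names `NavigatorLike` and `missingFeatures` are made up for illustration, and this is not taken from any particular Whisper-in-the-browser project.

```typescript
// Structural stand-in for the browser's `navigator`, so this sketch compiles
// without DOM or WebGPU typings. `NavigatorLike` is a hypothetical name.
interface NavigatorLike {
  gpu?: object; // present when the browser exposes WebGPU
  mediaDevices?: {
    getUserMedia?: (constraints: { audio: boolean }) => Promise<unknown>;
  };
}

// Returns the features that are missing; an empty array means the tab can
// at least attempt local transcription. `missingFeatures` is hypothetical.
function missingFeatures(nav: NavigatorLike): string[] {
  const missing: string[] = [];
  if (!nav.gpu) {
    missing.push("WebGPU (navigator.gpu)");
  }
  if (typeof nav.mediaDevices?.getUserMedia !== "function") {
    missing.push("microphone capture (navigator.mediaDevices.getUserMedia)");
  }
  return missing;
}

// With an empty stand-in object, both features are reported as missing.
console.log(missingFeatures({}));
```

In a real page you would pass the actual `navigator` object. Note that finding `navigator.gpu` only means the API exists: `navigator.gpu.requestAdapter()` can still resolve to `null` on a machine with no usable GPU, so that call is the next thing to check.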

The privacy angle is the thing that doesn't get said loudly enough. Every dictation tool you've ever used — every "your voice may be used to improve our services" checkbox you've scrolled past — was a server somewhere collecting the raw audio of whatever you were saying. What you dictated to your phone at 2am. What you mumbled into a note about someone you work with. Gone now, as a problem. Just gone.

This is what WebGPU was actually for. Not 3D demos in CodePen. Not slightly faster canvas rendering. The ability to run a real model locally, in a tab, with zero infrastructure, available to anyone with a recent browser that ships WebGPU.

The corner I said it was right around — we've turned it.