Whisper, In Your Browser, Right Now

Real-time speech recognition that never touches a server, because WebGPU finally got fast enough to make this embarrassingly obvious.

Called it.

Not in a smug way — more like watching a slow-motion train arrive at a station you've been standing at for a while. The train is Whisper running in the browser, in real time, via WebGPU, and it's already here.

No server. No API key. No audio leaving your machine. You open a tab, grant microphone access, and your GPU — the one in your laptop, the one you bought to play games or do whatever — transcribes your voice as fast as you can produce it.
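For the curious, the "open a tab, grant microphone access" step can be sketched in a few lines of TypeScript. This is a minimal sketch, not the code behind any particular demo: the function name `prepareLocalTranscription`, the `Readiness` type, and the `*Like` interfaces are made up for illustration. The only real browser calls it relies on are `navigator.gpu.requestAdapter()` and `navigator.mediaDevices.getUserMedia()`. The navigator is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Minimal structural types so the sketch compiles without DOM/WebGPU typings.
// These interfaces are illustrative stand-ins for the real browser objects.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface MediaDevicesLike {
  getUserMedia(constraints: { audio: boolean }): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
  mediaDevices?: MediaDevicesLike;
}

type FailureReason =
  | "no-webgpu"          // browser does not expose navigator.gpu
  | "no-adapter"         // WebGPU exists but no usable GPU adapter was found
  | "no-microphone-api"  // navigator.mediaDevices is unavailable
  | "microphone-denied"; // user refused, or no input device could be opened

type Readiness =
  | { ok: true; stream: unknown }
  | { ok: false; reason: FailureReason };

// Checks for WebGPU first, then asks for the microphone. Nothing here
// sends audio anywhere: the stream stays in the page.
async function prepareLocalTranscription(nav: NavigatorLike): Promise<Readiness> {
  if (!nav.gpu) return { ok: false, reason: "no-webgpu" };

  // requestAdapter() resolves to null when no suitable adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) return { ok: false, reason: "no-adapter" };

  if (!nav.mediaDevices) return { ok: false, reason: "no-microphone-api" };

  try {
    // Triggers the browser's permission prompt on first use.
    const stream = await nav.mediaDevices.getUserMedia({ audio: true });
    return { ok: true, stream };
  } catch {
    return { ok: false, reason: "microphone-denied" };
  }
}
```

In a page, the call would be `prepareLocalTranscription(navigator)`. Checking for the GPU before prompting for the microphone is a deliberate choice in this sketch: there is little point asking for permission if the model cannot run locally anyway.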

The privacy angle is the thing that doesn't get said loudly enough. Nearly every cloud dictation tool you've ever used, every "your voice may be used to improve our services" checkbox you've scrolled past, had a server somewhere receiving the raw audio of whatever you were saying. What you dictated to your phone at 2am. What you mumbled into a note about someone you work with. With the model running locally, that problem is gone. Just gone.

This is what WebGPU was actually for. Not 3D demos in CodePen. Not slightly faster canvas rendering. The ability to run a real model locally, in a tab, with zero infrastructure, available to anyone with a recent WebGPU-capable browser and a reasonably modern GPU.

The corner I said it was right around — we've turned it.

