The Rubber Duck That Takes Notes

Voice mode's first genuinely useful job has nothing to do with any of the demos.

The setup I was dreading: Fly.io with a static IP, Cloudflare DNS with the proxy temporarily disabled so Fly.io can issue the cert, re-enable it after, Cloudflare Zero Trust with a self-hosted app, a Caddyfile with JWT validation pointed at the Cloudflare Access cert endpoint, deploy. Maybe a dozen steps, none of them individually hard, all of them requiring you to remember which cloud dashboard you're in, what you just copied to the clipboard, and whether you already released the shared IPv4 or just thought about it.

Documenting this kind of thing is its own hell — not because it's complicated, it isn't really, but because switching between doing and writing breaks both. You either document perfectly and take twice as long, or you move efficiently and produce nothing.

So this morning I opened GPT voice mode and just started narrating. Not giving instructions. Just talking at it the way you'd talk to a colleague watching over your shoulder. "Okay, allocating static IPv4 — running fly ips allocate-v4, now releasing the shared IP..." All the way through the DNS records, the grey-cloud-to-orange-cloud moment, the cert wait, the Zero Trust app setup, the audience tag, the Caddyfile.

Then at the end: make me the documentation for what I just described.

Perfect markdown. All the steps in order. The note about waiting 30-60 seconds for the cert. The placeholder for the audience tag. The detail about disabling the Cloudflare proxy temporarily — the step that's obvious when you're doing it and invisible when you're writing from memory two days later. It got everything, because I had said everything.

The thing voice mode is good at is not the thing anyone talks about. Nobody demoed "narrate your workflow at me while you work and I'll produce the docs afterward." They showed cooking help, language learning, emotional support, tutoring. Things that feel futuristic and slightly uncomfortable.

Documentation is the thing everyone dreads and nobody does well, and the reason isn't that people can't write — it's that writing while doing requires you to hold two different cognitive modes at once, and most people can only sustain that for about twelve minutes before one of them collapses. Voice removes the problem entirely. You're not writing. You're narrating. Narrating is almost free, especially when there's something listening that clearly cares.

There's also something about the format that keeps you honest. When you're typing docs you skip the obvious steps — the ones that feel too small to mention — because your fingers know they're obvious. When you're talking, the obvious steps just come out. You can't not say them. The voice just carries them along.

The full Fly.io and Cloudflare setup — static IP, DNS-only while the cert issues, Zero Trust app, Caddyfile with JWT auth against the Cloudflare Access cert endpoint, fly deploy — that's the kind of thing I've done enough times that I forget it requires documentation at all. The next person who needs to do it does not share my muscle memory.

Now there's a doc. It took no extra time.

OpenAI also launched the Realtime API today — low-latency voice, interruption handling, audio piped in and out programmatically. Sama posted. The demos look like the future.

I already found the use case, though. It's a stenographer. That's it. The most boring possible answer, and somehow the one that actually saves an afternoon.

The Rubber Duck That Takes Notes

Counterpoints