When 1ms Startup Eliminates My Daemon
I maintain dictate, a voice transcription tool for Linux and macOS. I run it, speak, and the transcribed text lands on my clipboard. The original TypeScript architecture looked like this:
    ┌──────────────┐  Unix socket  ┌──────────────┐  WebSocket  ┌─────────┐
    │  dictatectl  │ ─────────────▶│   dictated   │ ◀──────────▶│ OpenAI  │
    │   (bridge)   │   JSONL IPC   │   (daemon)   │  Realtime   │ RT API  │
    └──────────────┘               └──────────────┘             └─────────┘
                                         ▲ │
                                         │ ▼
                                  Neovim plugin   wl-copy (clipboard)

A persistent daemon (dictated) listened on a Unix socket. A bridge client (dictatectl) sent start/stop commands. The daemon held a WebSocket connection to OpenAI’s GPT-4o Realtime Transcribe API. The daemon managed audio capture, transcription, clipboard — the full lifecycle. It had auto-start, idle timeout, crash recovery, and a JSONL protocol.
This was all written in TypeScript running on Bun.
The hidden tax
Every tool in that diagram existed to compensate for one constraint: cold-start latency. Bun starts in roughly 150–300ms (measured on my hardware). That’s fast for a JavaScript runtime, but perceptible when I want to press a key and immediately start speaking. The daemon amortized that cost — the process was already warm, waiting on a socket.
But the daemon introduced its own costs:
- IPC protocol design and maintenance — message framing, error codes, versioning
- Process lifecycle management — auto-start, idle timeout, PID files, crash recovery
- Socket file management — permissions, stale socket cleanup, filesystem conventions
- State synchronization — “is it recording?”, “is the daemon alive?”, “did my last command arrive?”
- Two binaries instead of one — dictated and dictatectl, with different error surfaces
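To make the first few bullets concrete, here is a rough sketch of the kind of socket-and-JSONL plumbing the daemon implied. The real v0.2 daemon was TypeScript on Bun; the Rust below, the socket path, and the message shapes are invented for illustration, not the actual protocol.

```rust
// Hypothetical daemon-side plumbing: a Unix socket, line-delimited JSON
// commands, and canned replies. Everything here is illustrative only.
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixListener;

fn main() -> std::io::Result<()> {
    // Assumed socket path and stale-socket cleanup: a crashed daemon
    // leaves the file behind, so remove it before binding.
    let socket_path = "/tmp/dictated.sock";
    let _ = std::fs::remove_file(socket_path);
    let listener = UnixListener::bind(socket_path)?;

    for stream in listener.incoming() {
        let mut stream = stream?;
        let reader = BufReader::new(stream.try_clone()?);
        // JSONL framing: one JSON object per line, one reply per command.
        for line in reader.lines() {
            let reply = match line?.trim() {
                r#"{"cmd":"start"}"# => r#"{"ok":true,"state":"recording"}"#,
                r#"{"cmd":"stop"}"# => r#"{"ok":true,"state":"idle"}"#,
                _ => r#"{"ok":false,"error":"unknown_command"}"#,
            };
            writeln!(stream, "{reply}")?;
        }
    }
    Ok(())
}
```

None of this touches audio or transcription; it exists only so one process can talk to another.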
I could solve each of these. But solving them all took time, and I wasn’t sure if the complexity was worth it. And I started wondering: what if a different language didn’t have this startup constraint in the first place?
The rewrite
I rewrote dictate in Rust. Not for performance ideology — for concrete reasons: I wanted a single static binary, proper error types, and cpal for audio capture without JavaScript native module friction.
The architecture collapsed to this:
    ┌──────────┐   HTTPS   ┌───────┐
    │ dictate  │ ─────────▶│ Groq  │
    │  (CLI)   │  Whisper  │  API  │
    └──────────┘           └───────┘
          │
          ▼
    wl-copy (clipboard)

One binary. No daemon. No socket. No protocol.
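For a sense of what that collapse means in code, here is a minimal sketch of the transcribe-and-copy path. It assumes the audio has already landed in a WAV file, the reqwest crate (with the blocking, multipart, and json features) for HTTPS, and Groq's OpenAI-compatible audio transcription endpoint with a Whisper model; the actual dictate implementation may differ on any of these points.

```rust
// Sketch only: WAV file in, transcribed text out, text onto the clipboard.
use std::io::Write;
use std::process::{Command, Stdio};

fn transcribe(wav_path: &str, api_key: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Assumed endpoint and model name for Groq's Whisper transcription API.
    let form = reqwest::blocking::multipart::Form::new()
        .file("file", wav_path)?
        .text("model", "whisper-large-v3");
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("https://api.groq.com/openai/v1/audio/transcriptions")
        .bearer_auth(api_key)
        .multipart(form)
        .send()?
        .error_for_status()?
        .json()?;
    Ok(resp["text"].as_str().unwrap_or_default().to_owned())
}

fn copy_to_clipboard(text: &str) -> std::io::Result<()> {
    // Pipe the text into wl-copy, exactly as the diagram shows.
    let mut child = Command::new("wl-copy").stdin(Stdio::piped()).spawn()?;
    child.stdin.take().expect("piped stdin").write_all(text.as_bytes())?;
    child.wait()?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In the real tool the WAV would come from live capture (cpal/PipeWire).
    let text = transcribe("recording.wav", &std::env::var("GROQ_API_KEY")?)?;
    copy_to_clipboard(&text)?;
    Ok(())
}
```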
Then I measured the cold-start:
    $ time ./target/release/dictate --version
    dictate 0.3.0
    0.00s user 0.00s system 94% cpu 0.001 total

One millisecond for the binary to start. I had to run it a few times to believe it. Even when I tested actual audio capture (not just --version), the full initialization, including PipeWire, happened fast enough that I could press a key and start speaking without noticing any delay.
What this actually means
The 1ms startup didn’t just make dictate faster. It removed my reason for having the daemon in the first place. The daemon existed to hide cold-start latency. Without that latency, I didn’t need it anymore.
Consider the original workflow with the daemon:
- Press global hotkey → dictatectl start (connects to socket, sends command)
- Daemon receives command, starts audio capture
- Press hotkey again → dictatectl stop
- Daemon transcribes, copies to clipboard
And the new workflow:
- Press global hotkey → dictate starts (1ms), begins audio capture immediately
- Press Enter or Ctrl+C → transcribes, copies to clipboard, process exits
The user experience is identical. The implementation is dramatically simpler.
But what about editor integration?
The daemon’s other justification was programmatic control — specifically, Neovim integration. An editor plugin needs to start/stop recording and receive text back without going through the clipboard.
This also doesn’t require a daemon. With a --stdout flag (output transcribed text to stdout instead of clipboard), Neovim’s jobstart() handles it:
    local job = vim.fn.jobstart({'dictate', '--stdout'}, {
      on_stdout = function(_, data)
        -- insert transcribed text at cursor
        vim.api.nvim_put(data, 'c', true, true)
      end,
    })
    -- later, to stop:
    vim.fn.jobstop(job) -- sends SIGTERM, triggers flush + transcribe

The CLI already handles signals correctly — SIGTERM flushes the final audio chunk and transcribes it (a second SIGTERM forces immediate exit). The editor gets text back on stdout. No socket, no protocol, no daemon.
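That two-stage shutdown maps cleanly onto a shared stop flag. A minimal sketch, assuming the signal-hook crate and a hypothetical capture loop; the real CLI may be wired differently:

```rust
// Sketch of two-stage shutdown: the first SIGTERM/SIGINT sets a stop flag so
// the recorder can flush and transcribe; a second one terminates immediately.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use signal_hook::consts::{SIGINT, SIGTERM};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stop = Arc::new(AtomicBool::new(false));
    for &sig in &[SIGINT, SIGTERM] {
        // If the flag is already set when the signal arrives, exit(1) at once.
        signal_hook::flag::register_conditional_shutdown(sig, 1, Arc::clone(&stop))?;
        // Otherwise just set the flag and let the main loop wind down.
        signal_hook::flag::register(sig, Arc::clone(&stop))?;
    }

    // Pressing Enter stops recording too (the interactive workflow earlier);
    // a small thread sets the same flag.
    {
        let stop = Arc::clone(&stop);
        std::thread::spawn(move || {
            let mut line = String::new();
            let _ = std::io::stdin().read_line(&mut line);
            stop.store(true, Ordering::SeqCst);
        });
    }

    while !stop.load(Ordering::SeqCst) {
        // capture_next_audio_chunk();  // hypothetical recording step
        std::thread::sleep(std::time::Duration::from_millis(50));
    }
    // flush_and_transcribe();          // hypothetical: final chunk -> text,
    // then print to stdout (--stdout) or hand the text to wl-copy
    Ok(())
}
```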
When a daemon would be justified
This isn’t an argument against daemons. It’s an argument against daemons that exist to hide startup cost. In my case, I think a daemon would have been worth keeping if I’d needed to:
- Listen continuously — if I wanted Voice Activity Detection (VAD) that transcribes on silence detection, that requires a process that’s always capturing audio, not one that starts on keypress.
- Stream partial results — if I wanted real-time transcription where words appear as I speak, that needs a persistent connection to a streaming API.
- Multiplex clients — if multiple tools needed simultaneous access to the same audio stream with different configurations.
None of these were in the v0.2 daemon’s scope. It existed to be warm. Rust made “cold” indistinguishable from “warm.”
A pattern I’ve started noticing
After this, I started wondering if I’d seen this pattern before. I think I have — choosing SQLite over Postgres because the query isn’t the bottleneck, the network hop is. Or using static binaries instead of containers because deployment is just “copy one file.”
I’m not sure if this generalizes, but it seems like performance improvements don’t just make the same architecture faster. Sometimes they make a simpler architecture viable. The daemon wasn’t slow. It was compensating for slowness elsewhere. When I removed that slowness, the compensation just became overhead.
What I almost did
I almost spent weeks implementing the Rust daemon — proper async IPC with tokio, Unix socket handling, protocol versioning, graceful shutdown coordination between daemon and clients. It would have been solid engineering. But it would have been solving a problem I no longer had.
Running time once told me more about the right architecture than I’d figured out in weeks of design thinking. One measurement — 0.001 seconds — made an entire layer of the system unnecessary.
Next time, I’ll measure first. I spent a month maintaining a daemon to work around a constraint that vanished the moment I changed languages. I just hadn’t checked.