Audio AI

Offline Voice Dictation

System-wide push-to-talk dictation, zero network

Built by Nicholas Falshaw · System-wide push-to-talk on Windows · Fully offline

The problem

Windows Voice Access is cloud-based, unreliable, and loses context. Third-party dictation tools either require a subscription or send every word you speak to a remote server. For confidential work — client notes, medical, legal, security research — neither is acceptable.

What I built

An Electron tray app that registers a global hotkey (F9 by default). Hold F9, speak, release — the transcribed text appears in the active application, wherever the cursor is. No internet required. No telemetry. No subscription.
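The hold-to-talk behaviour can be sketched as a small state machine. This is a minimal illustration, not the app's actual source: `PushToTalk`, `Recorder`, and the `onAudio` callback are hypothetical names, and the key is passed as a plain string rather than whatever key code the real hotkey hook reports.

```typescript
// Hypothetical sketch: PushToTalk, Recorder and onAudio are illustrative
// names, not taken from the app's source.
type Recorder = {
  start: () => void;
  stop: () => Int16Array; // returns the captured 16-bit PCM samples
};

class PushToTalk {
  private recording = false;
  private readonly hotkey: string;
  private readonly recorder: Recorder;
  private readonly onAudio: (samples: Int16Array) => void;

  constructor(
    hotkey: string,
    recorder: Recorder,
    onAudio: (samples: Int16Array) => void,
  ) {
    this.hotkey = hotkey;
    this.recorder = recorder;
    this.onAudio = onAudio;
  }

  // Call from the global key-down hook. Holding a key makes the OS send
  // repeated key-down events, so every event after the first is ignored.
  keyDown(key: string): void {
    if (key !== this.hotkey || this.recording) return;
    this.recording = true;
    this.recorder.start();
  }

  // Call from the global key-up hook. Releasing the hotkey ends the capture
  // and hands the audio to the next step (transcription).
  keyUp(key: string): void {
    if (key !== this.hotkey || !this.recording) return;
    this.recording = false;
    this.onAudio(this.recorder.stop());
  }
}
```

In a setup like the one described, `keyDown` and `keyUp` would presumably be wired to the global hotkey hook, with `onAudio` passing the samples to the transcription step. The `recording` flag is what keeps key auto-repeat from restarting the capture while F9 is held.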

Architecture

  • Tray process

    Electron, minimal UI, persistent tray icon, global config

  • Hotkey hook

    uIOhook for true system-global key capture (works even when no window is focused)

  • Audio capture

    Node.js audio input stream, 16 kHz mono PCM, recorded while hotkey is held

  • Transcription

    whisper.cpp with GPU acceleration (CUDA or Metal) and a CPU fallback; configurable model size (tiny/base/small/medium)

  • Text normalization

    Punctuation restoration, common-phrase corrections, configurable dictionary

  • Output

    Clipboard-paste into the active application, or simulated keystrokes for apps that block paste
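As an illustration of the text normalization step in the list above, here is a minimal sketch of dictionary-based phrase correction. It assumes a plain phrase-to-replacement map as the configurable dictionary; the `Dictionary` type and the function names are hypothetical, and the "add a final period" rule is a deliberately simple stand-in for real punctuation restoration.

```typescript
// Hypothetical sketch: the Dictionary shape and function names are assumptions
// for illustration, not the app's actual configuration format.
type Dictionary = Record<string, string>; // misheard phrase -> preferred spelling

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each dictionary phrase, case-insensitively and on word boundaries.
function applyDictionary(text: string, dictionary: Dictionary): string {
  // Longest phrases first, so a long entry is tried before a shorter one
  // that overlaps with it.
  const entries = Object.entries(dictionary).sort(
    ([a], [b]) => b.length - a.length,
  );
  let result = text;
  for (const [phrase, replacement] of entries) {
    const pattern = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "gi");
    // A replacer function keeps "$" in the replacement from being interpreted.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}

// Collapse whitespace, capitalize the first letter, apply the dictionary,
// then make sure the text ends with terminal punctuation.
function normalize(raw: string, dictionary: Dictionary): string {
  let text = raw.replace(/\s+/g, " ").trim();
  if (text === "") return "";
  text = text.charAt(0).toUpperCase() + text.slice(1);
  // The dictionary runs after capitalization so its spelling wins.
  text = applyDictionary(text, dictionary);
  if (!/[.!?]$/.test(text)) text += ".";
  return text;
}

const corrections: Dictionary = { "cube control": "kubectl" };
console.log(normalize("  open the  cube control config ", corrections));
// prints "Open the kubectl config."
```

Replacements are applied one after another, so a replacement that itself contains another dictionary phrase would be rewritten again. A real implementation might prefer a single pass over the text.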

Tech stack

Electron · Node.js · TypeScript · whisper.cpp · uIOhook · WASAPI

Outcome

Hold F9 in any application (terminal, browser, email client, document editor), speak, release, and the text appears. No network calls. No cloud. Used daily for dictating technical notes and long-form content. Median transcription latency is under one second per spoken phrase on a consumer GPU.
