Speech to Text for Claude Code – Complete Installation Guide

Date: 2026-02-22
Author: Chris Lee (with Claude Code)
Purpose: Enable voice prompting for Claude Code. Speak into the Mac’s microphone, transcribe the audio via a Whisper server on a Raspberry Pi, and paste the result directly into Claude Code’s input.

Architecture Overview

Mac Mic → sox (rec) → WAV file → HTTPS → Whisper Server (Raspberry Pi) → JSON → Parse Text → Clipboard → Paste into Claude Code

Components:

  • Mac: Records audio via sox, runs the automation script
  • Raspberry Pi: Hosts Whisper ASR (Automatic Speech Recognition) server
  • Whisper Server: https://whisper.travel-tube.com/whisper — accepts audio files, returns transcribed text
  • Automator Quick Action: Triggers the script via global keyboard shortcut
  • macOS Notifications: Provides visual feedback during recording, sending, and completion

Step 1: Discover the Whisper Server API

First, we checked what API endpoints the Whisper server exposes.

  • Docs URL: https://whisper.travel-tube.com/docs (Swagger UI)
  • OpenAPI spec: https://whisper.travel-tube.com/openapi.json

API discovered:

  • Endpoint: POST /whisper
  • Content-Type: multipart/form-data
  • Parameter: files — array of binary audio files
  • Response: JSON array with transcribed text (e.g., [{"text": "transcribed words here"}])
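
The request and response shapes listed above can be exercised with two small helpers. This is a minimal sketch, assuming the server returns the array form shown above and that python3 is available for JSON parsing. The names extract_text and transcribe_file are hypothetical helpers for illustration, and the 60-second timeout is an assumed value, not something the server documents.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the /whisper request and response shapes.

# Print the "text" field of the first element of a JSON array read from stdin.
# Prints nothing if the input is not a non-empty array of objects.
extract_text() {
    python3 -c '
import json, sys
try:
    data = json.load(sys.stdin)
except ValueError:
    sys.exit(0)
if isinstance(data, list) and data and isinstance(data[0], dict):
    print(data[0].get("text", ""))
'
}

# POST one audio file as multipart/form-data in the "files" field.
# --fail makes curl exit non-zero on HTTP errors; --max-time is an assumed timeout.
transcribe_file() {
    curl --silent --fail --max-time 60 \
        -X POST "https://whisper.travel-tube.com/whisper" \
        -F "files=@$1" | extract_text
}
```

For example, transcribe_file /tmp/voice_prompt.wav would print the transcript, or nothing if the request or the parsing failed.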

Step 2: Install sox (Audio Recording Tool)

sox provides the rec command for recording audio from the Mac’s microphone.

brew install sox

Important: Without sox installed, the rec command doesn’t exist. Because the script redirects rec’s error output to /dev/null (2>/dev/null), the failure is silent: recording appears to finish instantly and no audio is captured. This was our first issue; the script seemed to skip recording entirely because sox wasn’t installed yet.
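
One way to surface a missing dependency instead of failing silently is to check for the required commands before recording. The following is a sketch only; missing_commands is a hypothetical helper name, and the list of tools in the commented example is taken from the commands this guide uses.

```shell
#!/bin/sh
# Report any required commands that are missing from PATH.
# Prints one missing name per line and returns non-zero if any are missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example guard for the top of the script (commented out in this sketch):
# missing=$(missing_commands rec curl python3 osascript) || {
#     echo "Missing required commands: $missing" >&2
#     exit 1
# }
```

With a guard like this, a missing rec would produce an explicit message rather than an apparently instant recording.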

Step 3: Create the Voice-to-Claude Script

Initial Version (v1) — Silence Detection

The first version used silence detection to automatically stop recording:

rec /tmp/voice_prompt.wav rate 16k channels 1 silence 1 0.1 3% 1 2.0 3% 2>/dev/null

Problem: Recording stopped immediately — sox wasn’t installed yet, so rec didn’t exist.

Version 2 — After Installing sox, Silence Detection Still Problematic

After installing sox, we tried various silence detection thresholds:

  • silence 1 0.1 3% 1 2.0 3% — too aggressive, cut recording instantly
  • silence 1 0.1 1% 1 3.0 1% — worked but cut off during natural speech pauses
  • silence 1 0.1 1% 1 5.0 1% — better (5 second silence timeout) but still made the user anxious about being cut off mid-sentence
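
In sox’s silence effect, the first three values (above-periods, duration, threshold) control when recording starts, and the second three (below-periods, duration, threshold) control when it stops. The sketch below builds the same argument strings as the attempts above from just the two values we were tuning. silence_effect is a hypothetical helper name; the fixed "1 0.1" start condition is simply carried over from the commands in this guide.

```shell
#!/bin/sh
# Build the sox "silence" effect arguments used in the attempts above.
#   $1 = seconds of continuous silence required before recording stops
#   $2 = volume threshold as a percentage (sound below this counts as silence)
# The leading "1 0.1 <threshold>%" starts recording once 0.1 s of sound above
# the threshold is heard; both values are kept from the commands in this guide.
silence_effect() {
    stop_after=$1
    threshold=$2
    printf 'silence 1 0.1 %s%% 1 %s %s%%\n' "$threshold" "$stop_after" "$threshold"
}

# Example (word splitting of the unquoted substitution is intentional):
# rec /tmp/voice_prompt.wav rate 16k channels 1 $(silence_effect 5.0 1)
```

Raising the first argument lengthens the pause that is tolerated before the recording ends, which is the trade-off the three attempts above were exploring.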

Version 3 — Manual Ctrl+C Stop

trap 'echo "" >&2' INT
rec /tmp/voice_prompt.wav rate 16k channels 1 2>/dev/null
trap - INT

Problem: Works from terminal, but not practical when triggered via a global hotkey (no terminal window to press Ctrl+C in).

Version 4 (Final) — Dialog Box with Stop Button

The final approach records in the background and shows a macOS dialog box:

rec /tmp/voice_prompt.wav rate 16k channels 1 2>/dev/null &
REC_PID=$!
osascript -e 'display dialog "Recording... Click Stop when done." buttons {"Stop"} default button "Stop" with title "Voice to Claude"' 2>/dev/null
kill $REC_PID 2>/dev/null
wait $REC_PID 2>/dev/null

The user clicks Stop (or presses Enter/Return) to finish recording. No anxiety about being cut off.

Step 4: Text Output — Getting Text into Claude Code

Attempt 1 — osascript Keystroke Simulation

osascript -e "tell application \"System Events\" to keystroke \"$TEXT\""

Result: Worked when running the script directly from a terminal. But when triggered via the Automator hotkey, the text did not appear in the terminal app (Ghostty). The keystroke simulation doesn’t work reliably with terminal apps.

Attempt 2 (Final) — Clipboard Paste

OLD_CLIPBOARD=$(pbpaste)
# printf '%s' writes the text exactly as-is, even if it starts with a dash
printf '%s' "$TEXT" | pbcopy
osascript -e 'tell application "System Events" to keystroke "v" using command down'
sleep 0.5
printf '%s' "$OLD_CLIPBOARD" | pbcopy

This approach:

  1. Saves the current clipboard contents
  2. Copies the transcribed text to clipboard
  3. Simulates Cmd+V to paste
  4. Restores the previous clipboard contents

Result: Worked reliably in our testing, including in the Ghostty terminal.
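
The four steps above can be wrapped in one function. This is a sketch under stated assumptions: CLIP_READ, CLIP_WRITE and PASTE_KEY are hypothetical variables introduced here so the sequence can be tried with stand-in commands instead of the real clipboard, and paste_via_clipboard and send_cmd_v are hypothetical names. Note that capturing the clipboard with a command substitution keeps only its plain-text form and drops trailing newlines, so the restore step is approximate.

```shell
#!/bin/sh
# Sketch of the save / copy / paste / restore sequence with overridable commands.
# The defaults are the macOS tools used in this guide.
: "${CLIP_READ:=pbpaste}"
: "${CLIP_WRITE:=pbcopy}"
: "${PASTE_KEY:=send_cmd_v}"

# Default paste action: simulate Cmd+V in the focused application.
send_cmd_v() {
    osascript -e 'tell application "System Events" to keystroke "v" using command down'
}

paste_via_clipboard() {
    saved=$($CLIP_READ)                  # 1. save the current (plain-text) clipboard
    printf '%s' "$1" | $CLIP_WRITE       # 2. copy the transcribed text
    $PASTE_KEY                           # 3. paste into the focused window
    sleep 0.5                            #    give the target app time to read it
    printf '%s' "$saved" | $CLIP_WRITE   # 4. restore the previous contents
}
```

In the script this would be called as paste_via_clipboard "$TEXT" after a successful transcription.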

Step 5: macOS Notifications for Visual Feedback

Added notification sounds at each stage so the user knows what’s happening (especially important since there’s no terminal output when triggered via hotkey):

notify() {
    osascript -e "display notification \"$1\" with title \"Voice to Claude\" sound name \"$2\""
}
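
Because notify interpolates its message directly into an AppleScript string, a message containing a double quote or backslash would break the osascript command. The messages in this guide are fixed strings, so this has not been a problem in practice. If the function were ever given arbitrary text, an escaping step along these lines might help. applescript_escape and notify_safe are hypothetical names used only for this sketch.

```shell
#!/bin/sh
# Escape backslashes and double quotes so arbitrary text can sit inside an
# AppleScript string literal. applescript_escape is a hypothetical helper name.
applescript_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Variant of notify that escapes the message before interpolating it.
notify_safe() {
    message=$(applescript_escape "$1")
    osascript -e "display notification \"$message\" with title \"Voice to Claude\" sound name \"$2\""
}
```

The backslash substitution runs first so that the backslashes added for quotes are not escaped a second time.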

Notification stages:

  1. Recording started: Dialog box with Stop button + “Blow” sound
  2. Sending to server: “Sending to Whisper server…” notification
  3. Success: “Transcribed!” with “Glass” sound
  4. Failure: “No transcription received” with “Basso” sound

Step 6: Global Hotkey Setup

Attempt 1 — skhd (Hotkey Daemon) — FAILED

Installed skhd:

brew install koekeishiya/formulae/skhd
echo 'ctrl + shift - space : ~/.claude/voice-to-claude.sh' >> ~/.skhdrc
skhd --start-service

Problem: macOS requires Accessibility permissions for skhd to work. Multiple attempts to add it failed:

6a. skhd requested Accessibility access — macOS showed the permission dialog: “skhd would like to control your computer using accessibility features”. Clicked “Open System Settings” but skhd did not appear in the Accessibility list.

6b. Tried adding skhd manually via “+” button — Navigated to /opt/homebrew/bin/skhd but it was a symlink (Alias – 29 bytes). macOS wouldn’t accept it as a valid Accessibility app.

6c. Found real skhd binary via readlink -f /usr/local/bin/skhd → /usr/local/Cellar/skhd/0.3.9/bin/skhd (Unix Executable File – 82 KB). Added it via the “+” button and clicked Open.

6d. Still not added — Despite selecting the real binary, skhd still did not appear in the Accessibility list.

6e. Confirmed via logs:

/tmp/skhd_chrislee.err.log: skhd: must be run with accessibility access! abort..

6f. Tried tccutil reset: tccutil reset Accessibility com.koekeishiya.skhd failed because skhd has no bundle identifier.

Conclusion: macOS has difficulty granting Accessibility permissions to command-line binaries without proper app bundle identifiers. skhd approach abandoned.

Attempt 2 (Final) — Automator Quick Action — SUCCESS

  1. Open Automator → Click New Document → Select Quick Action
  2. Set “Workflow receives” to “no input” in “any application”
  3. Search for “Run Shell Script” in the actions list
  4. Drag “Run Shell Script” into the workflow area
  5. Ensure Shell is set to /bin/zsh
  6. Replace cat with: /Users/chrislee/.claude/voice-to-claude.sh
  7. Save as “Voice to Claude” (Cmd+S)

Important: The script needed export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH" at the top because Automator doesn’t load the user’s shell profile, so rec and curl weren’t found without the full PATH.
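
A slightly more defensive variant of the PATH line adds each directory only if it is not already present, which keeps PATH tidy when the script is also run from a normal terminal. This is a sketch; prepend_path is a hypothetical helper, and the two directories are the default Homebrew locations on Intel and Apple silicon Macs.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already listed.
# prepend_path is a hypothetical helper; the directories are the ones from this guide.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

# In the real script (commented out here so this sketch has no side effects):
# prepend_path /usr/local/bin      # Homebrew on Intel Macs
# prepend_path /opt/homebrew/bin   # Homebrew on Apple silicon
# export PATH
```

The plain export line used in the final script works just as well; this only avoids duplicate entries.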

Step 7: Assign Keyboard Shortcut

  1. System Settings → Keyboard → Keyboard Shortcuts…
  2. Click Services in the left sidebar
  3. Expand General section
  4. Find “Voice to Claude” in the list
  5. Double-click “none” next to it
  6. Press Ctrl+Shift+Space
  7. Click Done

Verification: The shortcut appears in Ghostty menu → Services → Voice to Claude showing ^ ⇧ Space.

Step 8: Testing and Verification

Test 1 — Direct Script Execution

"/Volumes/Programming HD/Study/claude_code/voice-to-claude.sh"

Result:

Recording... (speak now, press Ctrl+C when done)
Sending to Whisper server...
Transcribed: This is Chris Lee. This is first time trying to talk to the cloud code. Just test, test, test.

Text was typed into the terminal successfully.

Test 2 — Via Automator Services Menu

Triggered from Ghostty → Services → Voice to Claude.

Result: macOS notification appeared “Sending to Whisper server…” → Text pasted into terminal successfully.

Test 3 — Via Keyboard Shortcut

Pressed Ctrl+Shift+Space while Claude Code was focused.

Result: Dialog box appeared → Spoke into mic → Clicked Stop → Notification “Sending to Whisper server…” → Text pasted: “This is my test to the cloud code, so let’s translate.”

All three test methods confirmed working. Speech to text successfully pastes transcribed text into Claude Code.

Step 9: Fix TTS Hook (SubAgent Speech)

An existing notification_sound.py hook was reading sub-agent responses aloud via text-to-speech. It was picking up suggestion text (like “publish this to notion”) instead of just the work summary.

File: ~/.claude/hooks/notification_sound.py

Fix: Added a filter to stop reading before suggestion/recommendation lines:

# Phrases that mark the start of follow-up suggestions rather than the work summary.
SUGGESTION_PHRASES = (
    "let me know", "want me to", "shall i", "you can",
    "next step", "try it", "publish this", "review it",
    "if you want", "want to", "should i",
    "before publishing", "after that",
)

if event_name == "SubagentStop" and TTS_ENABLED:
    last_message = input_data.get("last_assistant_message", "")
    if last_message:
        summary_lines = []
        for line in last_message.strip().split("\n"):
            lowered = line.strip().lower()
            # Stop at the first suggestion line; everything after it is dropped too.
            if any(phrase in lowered for phrase in SUGGESTION_PHRASES):
                break
            summary_lines.append(line)
        filtered = "\n".join(summary_lines).strip()
        if filtered:
            speak_summary(filtered)

Final Script

Location: ~/.claude/voice-to-claude.sh

#!/bin/bash
# Automator doesn't load the user's shell profile, so add the Homebrew paths explicitly.
export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"

WHISPER_URL="https://whisper.travel-tube.com/whisper"
WAV_FILE="/tmp/voice_prompt.wav"

# Show a macOS notification. $1 = message, $2 = sound name (may be empty).
notify() {
    osascript -e "display notification \"$1\" with title \"Voice to Claude\" sound name \"$2\""
}

# Record 16 kHz mono audio in the background until the dialog is dismissed.
rec "$WAV_FILE" rate 16k channels 1 2>/dev/null &
REC_PID=$!

osascript -e 'display dialog "Recording... Click Stop when done." buttons {"Stop"} default button "Stop" with title "Voice to Claude"' 2>/dev/null

# Stop the recorder and wait for it to exit before reading the file.
kill "$REC_PID" 2>/dev/null
wait "$REC_PID" 2>/dev/null

notify "Sending to Whisper server..." ""

# Upload the recording as multipart/form-data in the "files" field.
RESPONSE=$(curl -s -X POST "$WHISPER_URL" -F "files=@$WAV_FILE")

# Extract the transcript. Any parsing error leaves TEXT empty.
TEXT=$(printf '%s' "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
if isinstance(data, list):
    print(data[0].get('text', '') if data else '')
elif isinstance(data, dict):
    print(data.get('text', data.get('results', [{}])[0].get('transcript', '')))
else:
    print(data)
" 2>/dev/null)

if [ -z "$TEXT" ]; then
    notify "No transcription received" "Basso"
    exit 1
fi

notify "Transcribed!" "Glass"

# Paste via the clipboard, then restore what was there before (plain text only).
OLD_CLIPBOARD=$(pbpaste)
printf '%s' "$TEXT" | pbcopy
osascript -e 'tell application "System Events" to keystroke "v" using command down'
sleep 0.5
printf '%s' "$OLD_CLIPBOARD" | pbcopy
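
The script always writes to the same fixed path in /tmp. If that ever becomes a concern (for example, two recordings overlapping), one option is to record into a private temporary directory instead. This is a sketch only; new_recording_path is a hypothetical helper name and recording.wav is an arbitrary file name chosen for the example.

```shell
#!/bin/sh
# Create a private temporary directory and print the path of a WAV file inside it.
# The file keeps a .wav extension so sox can infer the output format from the name.
new_recording_path() {
    dir=$(mktemp -d "${TMPDIR:-/tmp}/voice_prompt.XXXXXX") || return 1
    printf '%s\n' "$dir/recording.wav"
}

# Example use in the script (commented out so the sketch has no side effects):
# WAV_FILE=$(new_recording_path) || exit 1
# trap 'rm -rf "$(dirname "$WAV_FILE")"' EXIT
# rec "$WAV_FILE" rate 16k channels 1 2>/dev/null &
```

The trap in the commented example removes the directory when the script exits, so recordings are not left behind.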

File Locations (All User Scope)

All files are installed at user level, so the setup works across all projects and is not tied to any specific project directory.

File                    Location                                         Scope
Voice script            ~/.claude/voice-to-claude.sh                     User-level, any project
Automator Quick Action  ~/Library/Services/Voice to Claude.workflow      System-wide, any app
Keyboard shortcut       System Settings → Keyboard Shortcuts → Services  Global
TTS hook fix            ~/.claude/hooks/notification_sound.py            User-level hook

How It Works (Final Flow)

  1. Press Ctrl+Shift+Space from any application
  2. macOS dialog box appears: “Recording… Click Stop when done.”
  3. Speak your prompt into the Mac microphone
  4. Click Stop button (or press Enter/Return) when finished speaking
  5. macOS notification: “Sending to Whisper server…”
  6. Audio WAV file is sent via HTTPS POST to Raspberry Pi Whisper server
  7. Server returns JSON with transcribed text
  8. Python parses the JSON response
  9. macOS notification: “Transcribed!” with Glass sound
  10. Transcribed text is copied to clipboard
  11. Simulated Cmd+V pastes text into the focused window (Claude Code)
  12. Previous clipboard contents are restored

Troubleshooting

Recording stops immediately / No audio captured

  • Ensure sox is installed: brew install sox
  • Check microphone permissions: System Settings → Privacy & Security → Microphone
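
To tell an empty recording apart from a server problem, the script could check the WAV file before uploading it. The sketch below treats a file no larger than 44 bytes (the size of the canonical PCM WAV header) as containing no audio. That cutoff is an assumption of this example, and has_audio_data is a hypothetical helper name.

```shell
#!/bin/sh
# Return success if the file exists and is larger than a bare WAV header.
# 44 bytes is the size of the canonical PCM WAV header; treating anything at or
# below that as "no audio" is an assumption of this sketch.
has_audio_data() {
    [ -f "$1" ] || return 1
    # tr strips the padding some wc implementations put around the number
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt 44 ]
}
```

Called right after the recorder exits, for example has_audio_data /tmp/voice_prompt.wav || notify "No audio captured" "Basso", this would report the problem before any request is sent.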

No transcription received

Test the Whisper server manually:

rec /tmp/test.wav rate 16k channels 1 trim 0 5
curl -v -X POST "https://whisper.travel-tube.com/whisper" -F "files=@/tmp/test.wav"

Hotkey not triggering

  • Verify Quick Action exists: Ghostty menu bar → Services → Voice to Claude
  • Re-assign shortcut: System Settings → Keyboard → Keyboard Shortcuts → Services → General
  • Some key combos don’t work in certain apps — try a different shortcut

Text not appearing in the focused window

  • The script uses clipboard paste (Cmd+V) — ensure the target window has focus
  • Check that Automator/the terminal has Accessibility permissions

PATH issues when running via Automator

The script must include export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH" because Automator doesn’t load shell profiles.

Lessons Learned

  1. skhd doesn’t work easily on macOS — CLI binaries without app bundle identifiers can’t be added to the Accessibility permissions list reliably. Automator Quick Actions are a more reliable alternative for global hotkeys.
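  2. osascript keystroke was unreliable with our terminal app: typing the text via simulated keystrokes worked when the script ran directly in a terminal, but nothing appeared in Ghostty when the script was triggered by the Automator hotkey. Clipboard-based paste (pbcopy + Cmd+V) is the reliable solution.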
  3. Silence detection is unreliable for speech — Natural speech has pauses that trigger silence detection. A manual stop button (via macOS dialog) gives the user full control.
  4. Automator needs full PATH — When scripts run via Automator, they don’t inherit the user’s shell PATH. Always export the full PATH at the top of the script.
  5. Always install sox first — Without sox, the rec command silently fails, making it look like a script logic issue when it’s actually a missing dependency.
