Date: 2026-02-22
Author: Chris Lee (with Claude Code)
Purpose: Enable voice prompting for Claude Code — speak into Mac microphone, transcribe via Whisper server on Raspberry Pi, and paste the result directly into Claude Code’s input.
Architecture Overview
Mac Mic → sox (rec) → WAV file → HTTPS → Whisper Server (Raspberry Pi) → JSON → Parse Text → Clipboard → Paste into Claude Code
Components:
- Mac: Records audio via sox and runs the automation script
- Raspberry Pi: Hosts the Whisper ASR (Automatic Speech Recognition) server
- Whisper Server: https://whisper.travel-tube.com/whisper (accepts audio files, returns transcribed text)
- Automator Quick Action: Triggers the script via a global keyboard shortcut
- macOS Notifications: Provides visual feedback during recording, sending, and completion
Step 1: Discover the Whisper Server API
First, we checked what API endpoints the Whisper server exposes.
- Docs URL: https://whisper.travel-tube.com/docs (Swagger UI)
- OpenAPI spec: https://whisper.travel-tube.com/openapi.json
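The same discovery can be scripted. A minimal Python sketch, assuming the spec follows the standard OpenAPI layout (a top-level "paths" object keyed by route, then by HTTP method); fetch_spec and list_endpoints are hypothetical helpers, not part of the original setup:

```python
import json
import urllib.request

# Standard OpenAPI method keys; other path-item keys (e.g. "parameters") are skipped
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(url: str) -> dict:
    """Hypothetical helper: download and decode an OpenAPI JSON document."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> list:
    """Hypothetical helper: return sorted (METHOD, path) pairs from spec["paths"]."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)
```

Calling list_endpoints(fetch_spec("https://whisper.travel-tube.com/openapi.json")) should list the same routes the Swagger UI shows, including POST /whisper if the spec matches.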
API discovered:
- Endpoint: POST /whisper
- Content-Type: multipart/form-data
- Parameter: files (array of binary audio files)
- Response: JSON array with transcribed text (e.g., [{"text": "transcribed words here"}])
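For reference, this is roughly what the curl -F "files=@..." upload puts on the wire. A stdlib-only Python sketch, assuming the server accepts a single part named files; build_multipart and post_audio are hypothetical names, and the audio/wav part type is an assumption (the server may not check it):

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav") -> tuple:
    """Hypothetical helper: encode one file part as multipart/form-data.

    Returns (body, content_type_header), similar to curl -F "field=@file".
    """
    boundary = uuid.uuid4().hex  # random, so very unlikely to occur inside the audio
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_audio(url: str, wav_path: str) -> bytes:
    """Hypothetical helper: POST one WAV under the "files" field, return the raw body."""
    with open(wav_path, "rb") as f:
        body, header = build_multipart("files", os.path.basename(wav_path), f.read())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": header})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

The returned bytes are the JSON array described above; decoding them is a separate step.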
Step 2: Install sox (Audio Recording Tool)
sox provides the rec command for recording audio from the Mac’s microphone.
```bash
brew install sox
```
Important: Without sox installed, the rec command doesn’t exist and the script fails silently (recording appears instant with no audio captured). This was our first issue — the script seemed to skip recording entirely because sox wasn’t installed yet.
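One way to turn that silent failure into a loud one is to check dependencies before recording. In bash, command -v rec does this; the same idea as a Python sketch, where missing_commands and require are hypothetical helpers:

```python
import shutil
from typing import List, Optional


def missing_commands(commands: List[str], path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: commands not found on PATH (or on `path` if given)."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


def require(commands: List[str], path: Optional[str] = None) -> None:
    """Hypothetical helper: exit with a clear message if anything is missing."""
    missing = missing_commands(commands, path)
    if missing:
        raise SystemExit("Missing required commands: " + ", ".join(missing))
```

For example, require(["rec", "curl", "pbcopy", "osascript"]) at startup would report a missing rec immediately instead of producing an empty recording.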
Step 3: Create the Voice-to-Claude Script
Initial Version (v1) — Silence Detection
The first version used silence detection to automatically stop recording:
```bash
rec /tmp/voice_prompt.wav rate 16k channels 1 silence 1 0.1 3% 1 2.0 3% 2>/dev/null
```
Problem: Recording stopped immediately — sox wasn’t installed yet, so rec didn’t exist.
Version 2 — After Installing sox, Silence Detection Still Problematic
After installing sox, we tried various silence detection thresholds:
- silence 1 0.1 3% 1 2.0 3%: too aggressive, cut the recording off instantly
- silence 1 0.1 1% 1 3.0 1%: worked, but cut off during natural speech pauses
- silence 1 0.1 1% 1 5.0 1%: better (5-second silence timeout), but still made the user anxious about being cut off mid-sentence
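In sox's silence effect, the first triple (1 0.1 1%) waits for sound above 1% lasting 0.1 s, and the second (1 5.0 1%) stops once the level stays below 1% for 5.0 s. The deliberately simplified Python model below (not sox's actual algorithm; stop_index and the per-frame levels list are hypothetical) shows why any pause longer than the timeout ends the recording:

```python
from typing import List, Optional


def stop_index(levels: List[float], frame_s: float,
               threshold: float, silence_s: float) -> Optional[int]:
    """Hypothetical model: frame where recording would stop, or None.

    levels: per-frame peak amplitude as a fraction of full scale (0.01 = 1%).
    Recording stops once the level stays below `threshold` for `silence_s`
    seconds in a row, counted only after speech has started.
    """
    needed = max(1, round(silence_s / frame_s))
    heard_speech = False
    quiet_frames = 0
    for i, level in enumerate(levels):
        if level >= threshold:
            heard_speech = True
            quiet_frames = 0  # any sound resets the silence timer
        elif heard_speech:
            quiet_frames += 1
            if quiet_frames >= needed:
                return i
    return None
```

With half-second frames, a 2.5-second pause between phrases stops a 2.0 s timeout but survives a 3.0 s or 5.0 s one; a longer thinking pause would still cut off, which is why the manual stop replaced this approach.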
Version 3 — Manual Ctrl+C Stop
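```bash
# Ctrl+C stops rec; the trap keeps the script itself from exiting
trap 'echo "" >&2' INT
rec /tmp/voice_prompt.wav rate 16k channels 1 2>/dev/null
# Restore default Ctrl+C handling
trap - INT
```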
Problem: Works from terminal, but not practical when triggered via a global hotkey (no terminal window to press Ctrl+C in).
Version 4 (Final) — Dialog Box with Stop Button
The final approach records in the background and shows a macOS dialog box:
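```bash
# Start recording in the background and remember its PID
rec /tmp/voice_prompt.wav rate 16k channels 1 2>/dev/null &
REC_PID=$!
# Block until the user clicks Stop (or presses Return)
osascript -e 'display dialog "Recording... Click Stop when done." buttons {"Stop"} default button "Stop" with title "Voice to Claude"' 2>/dev/null
# Stop the recorder and wait for it to exit
kill $REC_PID 2>/dev/null
wait $REC_PID 2>/dev/null
```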
The user clicks Stop (or presses Enter/Return) to finish recording. No anxiety about being cut off.
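The same background-record-then-stop pattern as a Python sketch. record_until_stopped and its wait_for_stop callback are hypothetical; the sketch sends SIGINT (what Ctrl+C sent in Version 3) rather than the bash script's kill (SIGTERM), assuming the recorder shuts down cleanly on SIGINT, and falls back to a hard kill after a grace period:

```python
import signal
import subprocess
from typing import Callable, List


def record_until_stopped(cmd: List[str], wait_for_stop: Callable[[], None],
                         grace_s: float = 5.0) -> int:
    """Hypothetical helper: run a recorder until wait_for_stop() returns.

    Same shape as the bash version: start the recorder in the background,
    block on the Stop dialog, then signal the recorder and reap it.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.DEVNULL)
    try:
        wait_for_stop()  # e.g. a blocking "Stop" dialog
    finally:
        proc.send_signal(signal.SIGINT)  # same signal Ctrl+C sends
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort if the recorder ignores SIGINT
            proc.wait()
    return proc.returncode
```

On the Mac, cmd would be the rec command split into a list, and wait_for_stop could run the same osascript display dialog command via subprocess.run. The try/finally ensures the recorder is stopped even if the dialog call fails.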
Step 4: Text Output — Getting Text into Claude Code
Attempt 1 — osascript Keystroke Simulation
```bash
osascript -e "tell application \"System Events\" to keystroke \"$TEXT\""
```
Result: Worked when running the script directly from a terminal. But when triggered via the Automator hotkey, the text did not appear in the terminal app (Ghostty). The keystroke simulation doesn’t work reliably with terminal apps.
Attempt 2 (Final) — Clipboard Paste
```bash
OLD_CLIPBOARD=$(pbpaste)
echo -n "$TEXT" | pbcopy
osascript -e 'tell application "System Events" to keystroke "v" using command down'
sleep 0.5
echo -n "$OLD_CLIPBOARD" | pbcopy
```
This approach:
- Saves the current clipboard contents
- Copies the transcribed text to clipboard
- Simulates Cmd+V to paste
- Restores the previous clipboard contents
Result: Worked reliably in every app tested, including the Ghostty terminal.
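The save/paste/restore steps can also be written so the restore always runs. A Python sketch under stated assumptions: paste_preserving_clipboard is a hypothetical helper, and read/write are injectable so it can be tested without a Mac. Capturing pbpaste output directly also keeps trailing newlines, which bash command substitution ($(pbpaste)) strips. Like the bash version, it only round-trips plain text:

```python
import subprocess
import time
from typing import Callable


def pbpaste() -> str:
    """Read the clipboard as text via macOS pbpaste."""
    return subprocess.run(["pbpaste"], capture_output=True, check=True).stdout.decode()


def pbcopy(text: str) -> None:
    """Replace the clipboard contents via macOS pbcopy."""
    subprocess.run(["pbcopy"], input=text.encode(), check=True)


def paste_preserving_clipboard(text: str, paste: Callable[[], None],
                               read: Callable[[], str] = pbpaste,
                               write: Callable[[str], None] = pbcopy,
                               settle_s: float = 0.5) -> None:
    """Hypothetical helper: put text on the clipboard, paste, restore old contents."""
    old = read()
    write(text)
    try:
        paste()
        time.sleep(settle_s)  # let the target app read the clipboard first
    finally:
        write(old)  # restore even if paste() raises
```

On the Mac, paste would run the same osascript keystroke "v" using command down call through subprocess.run.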
Step 5: macOS Notifications for Visual Feedback
Added notification sounds at each stage so the user knows what’s happening (especially important since there’s no terminal output when triggered via hotkey):
```bash
notify() {
  osascript -e "display notification \"$1\" with title \"Voice to Claude\" sound name \"$2\""
}
```
Notification stages:
- Recording started: Dialog box with Stop button + “Blow” sound
- Sending to server: “Sending to Whisper server…” notification
- Success: “Transcribed!” with “Glass” sound
- Failure: “No transcription received” with “Basso” sound
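The notify function splices $1 straight into an AppleScript string literal, which is fine for the fixed messages used here but would break if a message ever contained a double quote or backslash (for example, if the transcript itself were shown). A Python sketch of the escaping, with applescript_quote and notify as hypothetical names; unlike the bash version, it omits the sound clause when no sound is given:

```python
import subprocess


def applescript_quote(s: str) -> str:
    """Hypothetical helper: return s as an AppleScript string literal."""
    # AppleScript string literals escape backslash and double quote with a backslash
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def notify(message: str, sound: str = "") -> None:
    """Hypothetical helper: show a macOS notification titled "Voice to Claude"."""
    script = ("display notification " + applescript_quote(message)
              + ' with title "Voice to Claude"')
    if sound:
        script += " sound name " + applescript_quote(sound)
    subprocess.run(["osascript", "-e", script], check=False)
```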
Step 6: Global Hotkey Setup
Attempt 1 — skhd (Hotkey Daemon) — FAILED
Installed skhd:
```bash
brew install koekeishiya/formulae/skhd
echo 'ctrl + shift - space : ~/.claude/voice-to-claude.sh' >> ~/.skhdrc
skhd --start-service
```
Problem: macOS requires Accessibility permissions for skhd to work. Multiple attempts to add it failed:
6a. skhd requested Accessibility access — macOS showed the permission dialog: “skhd would like to control your computer using accessibility features”. Clicked “Open System Settings” but skhd did not appear in the Accessibility list.
6b. Tried adding skhd manually via “+” button — Navigated to /opt/homebrew/bin/skhd but it was a symlink (Alias – 29 bytes). macOS wouldn’t accept it as a valid Accessibility app.
6c. Found real skhd binary via readlink -f /usr/local/bin/skhd → /usr/local/Cellar/skhd/0.3.9/bin/skhd (Unix Executable File – 82 KB). Added it via the “+” button and clicked Open.
6d. Still not added — Despite selecting the real binary, skhd still did not appear in the Accessibility list.
6e. Confirmed via logs:
/tmp/skhd_chrislee.err.log: skhd: must be run with accessibility access! abort..
6f. Tried tccutil reset — tccutil reset Accessibility com.koekeishiya.skhd failed because skhd has no bundle identifier.
Conclusion: macOS has difficulty granting Accessibility permissions to command-line binaries without proper app bundle identifiers. skhd approach abandoned.
Attempt 2 (Final) — Automator Quick Action — SUCCESS
- Open Automator → Click New Document → Select Quick Action
- Set “Workflow receives” to “no input” in “any application”
- Search for “Run Shell Script” in the actions list
- Drag “Run Shell Script” into the workflow area
- Ensure Shell is set to /bin/zsh
- Replace the default cat with: /Users/chrislee/.claude/voice-to-claude.sh
- Save as "Voice to Claude" (Cmd+S)
Important: The script needed export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH" at the top because Automator doesn't load the user's shell profile, so Homebrew-installed commands such as rec weren't found without the full PATH.
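If any step were launched from a Python wrapper instead, the same PATH repair applies to its child processes. A sketch assuming Homebrew's default prefixes (/usr/local/bin on Intel Macs, /opt/homebrew/bin on Apple Silicon); with_homebrew_path is a hypothetical helper:

```python
import os
from typing import Dict

# Homebrew default bin directories: Intel first, then Apple Silicon
HOMEBREW_DIRS = ["/usr/local/bin", "/opt/homebrew/bin"]


def with_homebrew_path(env: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: copy of env whose PATH starts with the Homebrew dirs."""
    current = [p for p in env.get("PATH", os.defpath).split(os.pathsep) if p]
    merged = HOMEBREW_DIRS + [p for p in current if p not in HOMEBREW_DIRS]
    return {**env, "PATH": os.pathsep.join(merged)}
```

Usage: env = with_homebrew_path(dict(os.environ)), then resolve tools with shutil.which("rec", path=env["PATH"]). Deduplicating keeps PATH tidy if the script runs from a shell that already has these directories.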
Step 7: Assign Keyboard Shortcut
- System Settings → Keyboard → Keyboard Shortcuts…
- Click Services in the left sidebar
- Expand General section
- Find “Voice to Claude” in the list
- Double-click “none” next to it
- Press Ctrl+Shift+Space
- Click Done
Verification: The shortcut appears in Ghostty menu → Services → Voice to Claude showing ^ ⇧ Space.
Step 8: Testing and Verification
Test 1 — Direct Script Execution
```bash
"/Volumes/Programming HD/Study/claude_code/voice-to-claude.sh"
```
Result:
```text
Recording... (speak now, press Ctrl+C when done)
Sending to Whisper server...
Transcribed: This is Chris Lee. This is first time trying to talk to the cloud code. Just test, test, test.
```
Text was typed into the terminal successfully.
Test 2 — Via Automator Services Menu
Triggered from Ghostty → Services → Voice to Claude.
Result: macOS notification appeared “Sending to Whisper server…” → Text pasted into terminal successfully.
Test 3 — Via Keyboard Shortcut
Pressed Ctrl+Shift+Space while Claude Code was focused.
Result: Dialog box appeared → Spoke into mic → Clicked Stop → Notification “Sending to Whisper server…” → Text pasted: “This is my test to the cloud code, so let’s translate.”
All three trigger methods worked: speech is transcribed and the text is pasted into Claude Code.
Step 9: Fix TTS Hook (SubAgent Speech)
An existing notification_sound.py hook was reading sub-agent responses aloud via text-to-speech. It was picking up suggestion text (like “publish this to notion”) instead of just the work summary.
File: ~/.claude/hooks/notification_sound.py
Fix: Added a filter to stop reading before suggestion/recommendation lines:
```python
# Excerpt from the hook: event_name, input_data, TTS_ENABLED and
# speak_summary() are defined elsewhere in notification_sound.py.
if event_name == "SubagentStop" and TTS_ENABLED:
    last_message = input_data.get("last_assistant_message", "")
    if last_message:
        summary_lines = []
        for line in last_message.strip().split("\n"):
            stripped = line.strip().lower()
            # Stop at the first line that reads like a suggestion or offer
            if any(phrase in stripped for phrase in [
                "let me know", "want me to", "shall i", "you can",
                "next step", "try it", "publish this", "review it",
                "if you want", "want to", "should i",
                "before publishing", "after that",
            ]):
                break
            summary_lines.append(line)
        # Speak only the work summary that came before any suggestion
        filtered = "\n".join(summary_lines).strip()
        if filtered:
            speak_summary(filtered)
```
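Pulling the filter into a pure function makes it easy to test outside the hook. A sketch with a hypothetical name (summary_before_suggestions) and the same phrase list. Matching is by substring, so it is coarse: a summary whose first line contains "you can" is dropped entirely:

```python
SUGGESTION_PHRASES = (
    "let me know", "want me to", "shall i", "you can",
    "next step", "try it", "publish this", "review it",
    "if you want", "want to", "should i",
    "before publishing", "after that",
)


def summary_before_suggestions(message: str) -> str:
    """Hypothetical helper: keep lines up to (not including) the first suggestion."""
    kept = []
    for line in message.strip().split("\n"):
        if any(phrase in line.strip().lower() for phrase in SUGGESTION_PHRASES):
            break
        kept.append(line)
    return "\n".join(kept).strip()
```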
Final Script
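Location: ~/.claude/voice-to-claude.sh (make it executable with chmod +x ~/.claude/voice-to-claude.sh so the Quick Action can run it by path)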
```bash
#!/bin/bash
# Automator doesn't load shell profiles, so add the Homebrew paths explicitly
export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"
WHISPER_URL="https://whisper.travel-tube.com/whisper"

# Show a macOS notification: $1 = message, $2 = sound name
notify() {
  osascript -e "display notification \"$1\" with title \"Voice to Claude\" sound name \"$2\""
}

# Record 16 kHz mono audio in the background until the user clicks Stop
rec /tmp/voice_prompt.wav rate 16k channels 1 2>/dev/null &
REC_PID=$!
osascript -e 'display dialog "Recording... Click Stop when done." buttons {"Stop"} default button "Stop" with title "Voice to Claude"' 2>/dev/null
kill $REC_PID 2>/dev/null
wait $REC_PID 2>/dev/null

# Upload the WAV to the Whisper server as multipart/form-data
notify "Sending to Whisper server..." ""
RESPONSE=$(curl -s -X POST "$WHISPER_URL" -F "files=@/tmp/voice_prompt.wav")

# Extract the transcript from the JSON response (list or dict shape)
TEXT=$(echo "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
if isinstance(data, list):
    print(data[0].get('text', '') if data else '')
elif isinstance(data, dict):
    print(data.get('text', data.get('results', [{}])[0].get('transcript', '')))
else:
    print(data)
" 2>/dev/null)

if [ -z "$TEXT" ]; then
  notify "No transcription received" "Basso"
  exit 1
fi
notify "Transcribed!" "Glass"

# Paste via the clipboard, then restore the previous contents
OLD_CLIPBOARD=$(pbpaste)
echo -n "$TEXT" | pbcopy
osascript -e 'tell application "System Events" to keystroke "v" using command down'
sleep 0.5
echo -n "$OLD_CLIPBOARD" | pbcopy
```
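The inline parser relies on 2>/dev/null to hide errors: for example, {"results": []} raises IndexError, which leaves TEXT empty and triggers "No transcription received". A more explicit Python sketch, where extract_text is a hypothetical name and the "results"/"transcript" shape is taken from the script's own fallback. It joins every list entry (equivalent to data[0] when one file is uploaded), strips surrounding whitespace, and returns "" for anything unrecognized instead of printing the raw value:

```python
import json


def extract_text(raw: str) -> str:
    """Hypothetical helper: pull the transcript from a Whisper response, '' on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ""
    if isinstance(data, list):
        # One entry per uploaded file; join non-empty texts in order
        texts = [str(item.get("text", "")).strip()
                 for item in data if isinstance(item, dict)]
        return " ".join(t for t in texts if t)
    if isinstance(data, dict):
        if data.get("text"):
            return str(data["text"]).strip()
        results = data.get("results")
        if isinstance(results, list) and results and isinstance(results[0], dict):
            return str(results[0].get("transcript", "")).strip()
    return ""
```

Saved as a standalone script that prints extract_text(sys.stdin.read()), it could replace the python3 -c block without changing the rest of the shell script.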
File Locations (All User Scope)
All files are installed at user level — works across all projects, not tied to any specific project directory.
| File | Location | Scope |
|---|---|---|
| Voice script | ~/.claude/voice-to-claude.sh | User-level, any project |
| Automator Quick Action | ~/Library/Services/Voice to Claude.workflow | System-wide, any app |
| Keyboard shortcut | System Settings → Keyboard Shortcuts → Services | Global |
| TTS hook fix | ~/.claude/hooks/notification_sound.py | User-level hook |
How It Works (Final Flow)
- Press Ctrl+Shift+Space from any application
- macOS dialog box appears: “Recording… Click Stop when done.”
- Speak your prompt into the Mac microphone
- Click Stop button (or press Enter/Return) when finished speaking
- macOS notification: “Sending to Whisper server…”
- Audio WAV file is sent via HTTPS POST to the Whisper server on the Raspberry Pi
- Server returns JSON with the transcribed text
- Python parses the JSON response
- macOS notification: "Transcribed!" with Glass sound
- Transcribed text is copied to the clipboard
- Simulated Cmd+V pastes the text into the focused window (Claude Code)
- Previous clipboard contents are restored
Troubleshooting
Recording stops immediately / No audio captured
- Ensure sox is installed: brew install sox
- Check microphone permissions: System Settings → Privacy & Security → Microphone
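To confirm whether audio was captured at all, check the length of the WAV file. A Python sketch using the standard wave module, assuming the file is plain PCM WAV (older Python versions' wave module rejects some WAV variants; soxi -D /tmp/voice_prompt.wav reports the duration too). wav_duration_seconds, looks_empty and the 0.5 s threshold are hypothetical:

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Hypothetical helper: length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def looks_empty(path: str, min_seconds: float = 0.5) -> bool:
    """Hypothetical heuristic: treat very short recordings as 'no audio captured'."""
    return wav_duration_seconds(path) < min_seconds
```

A near-zero duration points at recording (sox or microphone permissions) rather than at the Whisper server.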
No transcription received
Test the Whisper server manually:
```bash
# Record a 5-second test clip, then send it with verbose HTTP output
rec /tmp/test.wav rate 16k channels 1 trim 0 5
curl -v -X POST "https://whisper.travel-tube.com/whisper" -F "files=@/tmp/test.wav"
```
Hotkey not triggering
- Verify Quick Action exists: Ghostty menu bar → Services → Voice to Claude
- Re-assign shortcut: System Settings → Keyboard → Keyboard Shortcuts → Services → General
- Some key combos don’t work in certain apps — try a different shortcut
Text not appearing in the focused window
- The script uses clipboard paste (Cmd+V) — ensure the target window has focus
- Check that Automator/the terminal has Accessibility permissions
PATH issues when running via Automator
The script must include export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH" because Automator doesn’t load shell profiles.
Lessons Learned
- skhd doesn’t work easily on macOS — CLI binaries without app bundle identifiers can’t be added to the Accessibility permissions list reliably. Automator Quick Actions are a more reliable alternative for global hotkeys.
- osascript keystroke didn't reach the terminal: When the script was triggered from the Automator hotkey, simulated keystrokes never appeared in Ghostty, and terminal emulators in general may handle them poorly. Clipboard-based paste (pbcopy + Cmd+V) is the reliable solution.
- Silence detection is unreliable for speech — Natural speech has pauses that trigger silence detection. A manual stop button (via macOS dialog) gives the user full control.
- Automator needs full PATH — When scripts run via Automator, they don’t inherit the user’s shell PATH. Always export the full PATH at the top of the script.
- Always install sox first: Without sox, the rec command silently fails, making it look like a script logic issue when it's actually a missing dependency.
