Local STT Bridge (faster-whisper)

Self-hosted Speech-to-Text bridge for the Chrome extension.

Primary project documentation lives in README.md.

1) Install

Use Python 3.11 or 3.12 (recommended). On Python 3.13, pip may have to build some audio dependencies from source.

cd local_stt_bridge
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt

macOS build prerequisites (needed only if pip tries to build `av`/PyAV from source)

brew install pkg-config ffmpeg

If the install still fails on PyAV, recreate the venv with Python 3.11 and retry.

2) Run

cd local_stt_bridge
source .venv/bin/activate
export STT_MODEL=small
export STT_DEVICE=auto
export STT_COMPUTE_TYPE=int8
# Optional auth key:
# export STT_API_KEY=your_local_key
uvicorn server:app --host 0.0.0.0 --port 8790

3) Verify

curl http://localhost:8790/health
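
The same check can be scripted. Below is a minimal sketch using only the Python standard library. It assumes only that /health answers with HTTP 200 when the bridge is up; the response body format is not specified here, so it is returned as raw text. The function name `check_health` is illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def check_health(base_url="http://localhost:8790", timeout=5.0):
    """Return (ok, detail) for GET <base_url>/health.

    ok is True only for an HTTP 200 response. detail is the raw body text
    on success, or a short error description otherwise.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401 or 404).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return False, f"unreachable: {exc}"
```

Calling `check_health()` with no arguments targets the default port used in the Run step.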

4) Extension Setup

In the side panel:

Assistant Setup -> Speech-to-Text Provider: Local faster-whisper bridge
STT Model: small (start here)
Local STT endpoint: http://localhost:8790/transcribe
Local STT API key: optional, needed only if STT_API_KEY is set on the server
Optional quality/language controls:
- Language Mode: Auto-detect or Force language
- Forced language: e.g. en, fr, de, ar
- Task: transcribe or translate
- VAD filter: on/off
- Beam size: integer (default 5)
Click Test STT Connection in the extension to check that the endpoint is reachable.

API contract expected by the extension

POST /transcribe with multipart/form-data:

file (required): uploaded audio chunk (webm/mp4/wav)
task (optional): transcribe or translate
vad_filter (optional): true/false
beam_size (optional): integer
language (optional): language code
model (optional): model hint
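
To make the field list concrete, here is a hedged sketch of a client that builds such a request with only the Python standard library. The helper names (`encode_multipart`, `build_transcribe_request`), the default file name `chunk.webm`, and the `audio/webm` part type are assumptions for illustration. The response format is not described in this document, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes, file_type="audio/webm"):
    """Encode text fields plus one 'file' part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio chunk goes in the required "file" field.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(endpoint, audio_bytes, file_name="chunk.webm", **options):
    """Build a POST request for /transcribe; options set to None are omitted."""
    fields = {}
    for key, value in options.items():
        if value is None:
            continue
        # Booleans are sent as "true"/"false", matching the field list above.
        fields[key] = str(value).lower() if isinstance(value, bool) else str(value)
    body, content_type = encode_multipart(fields, file_name, audio_bytes)
    return urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

A request built with `build_transcribe_request("http://localhost:8790/transcribe", audio, task="transcribe", vad_filter=True, beam_size=5)` can then be sent with `urllib.request.urlopen`.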

Optional auth headers when enabled:

Authorization: Bearer <token>
x-api-key: <token>
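
A small sketch of how a client might attach these headers. The helper name `auth_headers` is hypothetical, and sending both header forms at once is an assumption: this document does not say whether the server expects one of them or accepts either.

```python
import os


def auth_headers(api_key=None):
    """Return auth headers for the bridge, or an empty dict when no key is set."""
    if not api_key:
        return {}
    # Assumption: sending both forms is harmless; adjust if the server
    # accepts only one of them.
    return {"Authorization": f"Bearer {api_key}", "x-api-key": api_key}


# Example: reuse the same key the server was started with, if any.
headers = auth_headers(os.environ.get("STT_API_KEY"))
```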

GET /health is used by the extension's Test STT Connection action.

Public domain + HTTPS note

If you expose this service on a public domain, put it behind a reverse proxy that terminates HTTPS. Chrome may auto-upgrade http:// to https:// on HSTS domains; the TLS handshake then reaches the plain-HTTP Uvicorn port, and the request fails with "Invalid HTTP request received."
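
As a rough illustration of this rule, a client-side check could flag endpoints that use plain HTTP on a non-local host. The function name `endpoint_warning` and the set of hosts treated as local are assumptions for this sketch, not part of the extension.

```python
from urllib.parse import urlsplit

# Hosts treated as local, where plain HTTP is expected to work.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def endpoint_warning(endpoint):
    """Return a warning string for plain-HTTP non-local endpoints, else None."""
    parts = urlsplit(endpoint)
    host = parts.hostname or ""
    if parts.scheme == "http" and host not in LOCAL_HOSTS:
        return (
            f"{host} is served over plain HTTP; on an HSTS domain Chrome may "
            "upgrade the request to HTTPS and the connection will fail."
        )
    return None
```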

Notes

faster-whisper decodes audio through PyAV, which relies on the FFmpeg libraries for many input formats.
For best CPU cost/performance, use small or medium.
large-v3 improves quality but uses significantly more compute.
