Local STT Bridge (faster-whisper)
Self-hosted Speech-to-Text bridge for the Chrome extension.
Primary project documentation lives in README.md.
1) Install
Use Python 3.11 or 3.12 (recommended). Python 3.13 may force source builds for audio deps.
cd local_stt_bridge
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
macOS build prerequisites (required if av/PyAV tries to build)
brew install pkg-config ffmpeg
If install still fails on PyAV, recreate the venv with Python 3.11 and retry.
2) Run
cd local_stt_bridge
source .venv/bin/activate
export STT_MODEL=small
export STT_DEVICE=auto
export STT_COMPUTE_TYPE=int8
# Optional auth key:
# export STT_API_KEY=your_local_key
uvicorn server:app --host 0.0.0.0 --port 8790
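The exported variables above are presumably read by the server at startup. The following is a minimal sketch of how that could look, not the actual contents of server.py. It assumes the fallback values match the exports shown here (the real defaults are not confirmed) and that the model is built with faster-whisper's `WhisperModel`. The names `read_stt_config` and `load_model` are hypothetical.

```python
import os

# Assumed fallbacks, copied from the exports in the run step; the real
# defaults inside server.py may differ.
DEFAULTS = {
    "STT_MODEL": "small",
    "STT_DEVICE": "auto",
    "STT_COMPUTE_TYPE": "int8",
}


def read_stt_config(env=None):
    """Collect bridge settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # A missing or empty STT_API_KEY is treated as "auth disabled".
    config["STT_API_KEY"] = env.get("STT_API_KEY") or None
    return config


def load_model(config):
    """Build a faster-whisper model from the config (imported lazily)."""
    from faster_whisper import WhisperModel

    return WhisperModel(
        config["STT_MODEL"],
        device=config["STT_DEVICE"],
        compute_type=config["STT_COMPUTE_TYPE"],
    )
```

Calling `read_stt_config()` with no argument reads the live process environment; passing a plain dict makes the helper easy to try without exporting anything.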
3) Verify
curl http://localhost:8790/health
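The same check can be scripted. This is a rough sketch using only the standard library; it assumes a healthy bridge answers `GET /health` with HTTP 200 and does not inspect the response body, whose format is not documented here. The helper name `check_health` is hypothetical.

```python
import urllib.error
import urllib.request


def check_health(base_url="http://localhost:8790", timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `check_health()` probes the default local port, and `check_health("http://localhost:9000")` probes a bridge started on a different port.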
4) Extension Setup
In side panel:
- Assistant Setup -> Speech-to-Text Provider: Local faster-whisper bridge
- STT Model: small (start here)
- Local STT endpoint: http://localhost:8790/transcribe
- Optional Local STT API key if STT_API_KEY is set on server
- Optional quality/language controls:
  - Language Mode: Auto-detect or Force language
  - Forced language: e.g. en, fr, de, ar
  - Task: transcribe or translate
  - VAD filter: on/off
  - Beam size: integer (default 5)
- Click Test STT Connection from the extension to validate endpoint reachability.
API contract expected by the extension
POST /transcribe with multipart/form-data:
- file (required): uploaded audio chunk (webm/mp4/wav)
- task (optional): transcribe or translate
- vad_filter (optional): true/false
- beam_size (optional): integer
- language (optional): language code
- model (optional): model hint
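To try the contract outside the extension, the form fields above can be sent from a short script. The sketch below uses only the standard library and builds the multipart body by hand. The helper names `build_multipart` and `transcribe` are hypothetical, the `audio/webm` content type is an assumed default, and the response is returned as raw text because its format is not specified here.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes, file_type="audio/webm"):
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, endpoint="http://localhost:8790/transcribe", **options):
    """POST one audio file; optional form fields are passed as keyword arguments."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = build_multipart(options, os.path.basename(path), audio)
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # Response format is not documented here, so return the raw text.
        return response.read().decode("utf-8")
```

A call such as `transcribe("chunk.webm", task="transcribe", beam_size=5, vad_filter="true")` sends the optional fields as strings; pass booleans as `"true"` or `"false"` so they are not rendered as Python's `True`/`False`.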
Optional auth headers when enabled:
- Authorization: Bearer <token>
- x-api-key: <token>
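A small helper can produce either header form for a client script. This sketch assumes the server accepts either header on its own, which is not confirmed here; `auth_headers` and its `scheme` parameter are hypothetical names.

```python
def auth_headers(api_key=None, scheme="bearer"):
    """Return the auth header dict for the bridge, or {} when no key is set."""
    if not api_key:
        # Auth is optional: send no header when the server has no STT_API_KEY.
        return {}
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth scheme: {scheme!r}")
```

The returned dict can be merged into the headers of any HTTP request sent to the bridge.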
GET /health is used by the extension's Test STT Connection check.
Public domain + HTTPS note
If you expose this service on a public domain, use HTTPS via reverse proxy.
Chrome may auto-upgrade http:// to https:// on HSTS domains. The browser then sends a TLS handshake to the plain-HTTP Uvicorn port, and the request fails with Invalid HTTP request received.
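A client-side guard can flag endpoints that are likely to hit this problem before any audio is sent. The sketch below is an illustration only: it treats any plain-HTTP endpoint on a non-loopback host as a candidate for HTTPS, which is stricter than the HSTS case described above. The name `should_use_https` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def should_use_https(endpoint):
    """Return True when a plain-HTTP endpoint points at a non-loopback host."""
    parts = urlsplit(endpoint)
    if parts.scheme != "http":
        return False
    host = parts.hostname or ""
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a domain name.
        return True
```

With this check, `http://localhost:8790/transcribe` passes untouched, while a plain-HTTP URL on a domain name is flagged so it can be moved behind a reverse proxy.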
Notes
- faster-whisper relies on FFmpeg for many input formats.
- For best CPU cost/performance, use small or medium.
- large-v3 improves quality but uses significantly more compute.