# Local STT Bridge (faster-whisper)

Self-hosted Speech-to-Text bridge for the Chrome extension.

Primary project documentation lives in `README.md`.

## 1) Install

Use Python 3.11 or 3.12 (recommended). Python 3.13 may force source builds of the audio dependencies.

```bash
cd local_stt_bridge
# Create and activate an isolated virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Upgrade the packaging tools first, then install the bridge dependencies
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```

### macOS build prerequisites (required if `av`/PyAV builds from source)

```bash
brew install pkg-config ffmpeg
```

If the install still fails on `PyAV`, recreate the venv with Python 3.11 and retry.

## 2) Run

```bash
cd local_stt_bridge
source .venv/bin/activate
# Model settings for the bridge
export STT_MODEL=small
export STT_DEVICE=auto
export STT_COMPUTE_TYPE=int8
# Optional auth key:
# export STT_API_KEY=your_local_key
# Listen on all interfaces, port 8790
uvicorn server:app --host 0.0.0.0 --port 8790
```
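
How `server.py` consumes these variables is not shown here. As a minimal sketch, assuming the bridge reads them once at startup and falls back to the values exported above when a variable is unset, the configuration could be gathered as follows. The `SttSettings` and `load_settings` names are hypothetical and only serve the illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SttSettings:
    """Hypothetical container for the STT_* environment variables."""
    model: str
    device: str
    compute_type: str
    api_key: Optional[str]  # None means auth is disabled


def load_settings(env: Mapping[str, str] = os.environ) -> SttSettings:
    # Assumption: unset variables fall back to the values used in this README.
    return SttSettings(
        model=env.get("STT_MODEL", "small"),
        device=env.get("STT_DEVICE", "auto"),
        compute_type=env.get("STT_COMPUTE_TYPE", "int8"),
        # Treat an empty STT_API_KEY the same as an unset one.
        api_key=env.get("STT_API_KEY") or None,
    )
```

Under these assumptions, the three model values map directly onto faster-whisper's `WhisperModel(model, device=..., compute_type=...)` constructor.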

## 3) Verify

```bash
curl http://localhost:8790/health
```
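
The same check can be scripted. The sketch below uses only the Python standard library and treats any HTTP 200 from `/health` as healthy; the response body format is not specified in this document, so it is ignored. The `check_health` name is illustrative.

```python
import http.client
import urllib.request


def check_health(base_url: str = "http://localhost:8790", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, DNS failures, and HTTP error
        # statuses (urllib's URLError and HTTPError are OSError subclasses).
        return False
```

A `False` result only says the endpoint was unreachable or unhealthy; the Uvicorn log is the place to look for the reason.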

## 4) Extension Setup

In the side panel:

- Assistant Setup -> Speech-to-Text Provider: `Local faster-whisper bridge`
- STT Model: `small` (start here)
- Local STT endpoint: `http://localhost:8790/transcribe`
- Local STT API key: optional, needed only if `STT_API_KEY` is set on the server
- Optional quality/language controls:
  - Language Mode: `Auto-detect` or `Force language`
  - Forced language: e.g. `en`, `fr`, `de`, `ar`
  - Task: `transcribe` or `translate`
  - VAD filter: on/off
  - Beam size: integer (default `5`)
- Click `Test STT Connection` in the extension to validate endpoint reachability.

## API contract expected by the extension

`POST /transcribe` with `multipart/form-data`:

- `file` (required): uploaded audio chunk (`webm`/`mp4`/`wav`)
- `task` (optional): `transcribe` or `translate`
- `vad_filter` (optional): `true`/`false`
- `beam_size` (optional): integer
- `language` (optional): language code
- `model` (optional): model hint

Optional auth headers, sent when auth is enabled:

- `Authorization: Bearer <token>`
- `x-api-key: <token>`

`GET /health` is used by the extension's `Test STT Connection` action.
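
To exercise the `POST /transcribe` contract without the extension, a request can be assembled by hand. The following is a minimal sketch using only the Python standard library; the `build_transcribe_request` helper, its default filename, and its default content type are assumptions made for illustration. It sends the key as `x-api-key`; the `Authorization: Bearer` form would be set the same way.

```python
import urllib.request
import uuid
from typing import Dict, Optional


def build_transcribe_request(
    audio: bytes,
    filename: str = "chunk.webm",
    content_type: str = "audio/webm",
    endpoint: str = "http://localhost:8790/transcribe",
    fields: Optional[Dict[str, str]] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a multipart/form-data POST using the field names listed above."""
    boundary = uuid.uuid4().hex
    parts = []
    # Optional text fields: task, vad_filter, beam_size, language, model.
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # Required audio part, sent under the field name `file`.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    parts.append(head + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        endpoint, data=b"".join(parts), headers=headers, method="POST"
    )


# Example usage (needs a running bridge and a local audio file):
#   with open("sample.wav", "rb") as f:
#       req = build_transcribe_request(f.read(), "sample.wav", "audio/wav",
#                                      fields={"task": "transcribe", "beam_size": "5"})
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       print(resp.read().decode())
```

The response is printed raw in the usage example because its schema is not described in this document.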

## Public domain + HTTPS note

If you expose this service on a public domain, use HTTPS via a reverse proxy.
Chrome may auto-upgrade `http://` URLs to `https://` on HSTS domains; the plain-HTTP Uvicorn port then receives TLS traffic and fails with `Invalid HTTP request received`.
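
One way to catch this early is to flag endpoints that use plain HTTP on anything other than a loopback host before saving them. The sketch below is a deliberately conservative illustration, not part of the bridge or the extension: it cannot know whether a domain is actually on an HSTS list, so it flags every non-loopback `http://` endpoint. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_host(host: str) -> bool:
    """True for `localhost` and loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost


def is_plain_http_remote(endpoint: str) -> bool:
    """True when the endpoint uses http:// on a non-loopback host."""
    parts = urlsplit(endpoint)
    host = parts.hostname or ""
    return parts.scheme == "http" and not is_loopback_host(host)
```

For example, `http://localhost:8790/transcribe` passes, while the same path on a public hostname over `http://` is flagged as a candidate for a reverse proxy with HTTPS.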

## Notes

- `faster-whisper` relies on FFmpeg for many input formats.
- For best CPU cost/performance, use `small` or `medium`.
- `large-v3` improves quality but uses significantly more compute.