V2
88
local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md
Normal file
@@ -0,0 +1,88 @@
# Local STT Bridge (faster-whisper)

Self-hosted Speech-to-Text bridge for the Chrome extension.

Primary project documentation lives in `README.md`.

## 1) Install

Use Python 3.11 or 3.12 (recommended). Python 3.13 may force source builds of audio dependencies such as PyAV (`av`).

```bash
cd local_stt_bridge
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```

### macOS build prerequisites (required if `av`/PyAV tries to build)

```bash
brew install pkg-config ffmpeg
```

If the install still fails on `PyAV`, recreate the venv with Python 3.11 and retry.

## 2) Run

```bash
cd local_stt_bridge
source .venv/bin/activate
export STT_MODEL=small
export STT_DEVICE=auto
export STT_COMPUTE_TYPE=int8
# Optional auth key:
# export STT_API_KEY=your_local_key
# 0.0.0.0 listens on all interfaces; use 127.0.0.1 to accept local connections only.
uvicorn server:app --host 0.0.0.0 --port 8790
```

## 3) Verify

```bash
curl http://localhost:8790/health
```
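
The same check can be scripted. The sketch below assumes the response shape returned by the `/health` handler in `server.py` (`ok`, `engine`, `model`, `device`, `computeType`). The helper names `parse_health` and `check_health` are hypothetical and not part of the bridge.

```python
import json
import urllib.request


def parse_health(body: bytes) -> dict:
    """Decode a /health response body and confirm the bridge reports ok."""
    data = json.loads(body.decode("utf-8"))
    if data.get("ok") is not True:
        raise RuntimeError(f"bridge reported an unhealthy state: {data!r}")
    return data


def check_health(base_url: str = "http://localhost:8790", timeout: float = 5.0) -> dict:
    """Fetch /health from a running bridge and return the parsed payload."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read())
```

With the server running, `check_health()["model"]` should report the value of `STT_MODEL` that the server was started with.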
## 4) Extension Setup

In the side panel:

- Assistant Setup -> Speech-to-Text Provider: `Local faster-whisper bridge`
- STT Model: `small` (start here)
- Local STT endpoint: `http://localhost:8790/transcribe`
- Local STT API key: optional, needed only if `STT_API_KEY` is set on the server
- Optional quality/language controls:
  - Language Mode: `Auto-detect` or `Force language`
  - Forced language: e.g. `en`, `fr`, `de`, `ar`
  - Task: `transcribe` or `translate`
  - VAD filter: on/off
  - Beam size: integer (default `5`)
- Click `Test STT Connection` in the extension to validate endpoint reachability.

## API contract expected by the extension

`POST /transcribe` with `multipart/form-data`:

- `file` (required): uploaded audio chunk (`webm`/`mp4`/`wav`)
- `task` (optional): `transcribe` or `translate`
- `vad_filter` (optional): `true`/`false`
- `beam_size` (optional): integer
- `language` (optional): language code
- `model` (optional): model hint

Optional auth headers when enabled:

- `Authorization: Bearer <token>`
- `x-api-key: <token>`

`GET /health` is used by the extension's `Test STT Connection` action.
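
As an illustration of the multipart contract listed above, here is a minimal sketch that encodes such a request with only the standard library. It assumes the field names from that list and the `Authorization: Bearer <token>` header. The helper names, the default filename `chunk.webm`, and the default content type are hypothetical.

```python
import urllib.request
import uuid


def build_transcribe_form(audio: bytes, filename: str = "chunk.webm",
                          content_type: str = "audio/webm", **fields: str):
    """Encode an audio chunk plus optional text fields as multipart/form-data.

    Returns a (body, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields such as task, vad_filter, beam_size, language, model.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The required audio part, sent under the field name "file".
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(url: str, audio: bytes, api_key: str = "",
                             **fields: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /transcribe endpoint."""
    body, ctype = build_transcribe_form(audio, **fields)
    headers = {"Content-Type": ctype}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

A request built with `build_transcribe_request("http://localhost:8790/transcribe", audio, task="transcribe", beam_size="5")` can then be sent with `urllib.request.urlopen`.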
## Public domain + HTTPS note

If you expose this service on a public domain, serve it over HTTPS through a reverse proxy.
Chrome may auto-upgrade `http://` requests to `https://` on HSTS domains. A plain-HTTP Uvicorn port then receives TLS traffic and fails with `Invalid HTTP request received`.

## Notes

- `faster-whisper` relies on FFmpeg for many input formats.
- For the best cost/performance balance on CPU, use `small` or `medium`.
- `large-v3` improves quality but uses significantly more compute.
BIN
local_stt_bridge/__pycache__/server.cpython-313.pyc
Normal file
Binary file not shown.
3
local_stt_bridge/requirements.txt
Normal file
@@ -0,0 +1,3 @@
fastapi==0.115.0
uvicorn[standard]==0.30.6
faster-whisper==1.0.3
92
local_stt_bridge/server.py
Normal file
@@ -0,0 +1,92 @@
import base64
import os
import tempfile
from typing import Optional

from fastapi import FastAPI, Header, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

try:
    from faster_whisper import WhisperModel
except ImportError as exc:  # pragma: no cover
    raise RuntimeError("faster-whisper is required. Install dependencies from requirements.txt") from exc


STT_MODEL = os.getenv("STT_MODEL", "small")
STT_DEVICE = os.getenv("STT_DEVICE", "auto")
STT_COMPUTE_TYPE = os.getenv("STT_COMPUTE_TYPE", "int8")
STT_API_KEY = os.getenv("STT_API_KEY", "").strip()

app = FastAPI(title="Local STT Bridge", version="1.0.0")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)

model = WhisperModel(STT_MODEL, device=STT_DEVICE, compute_type=STT_COMPUTE_TYPE)


class TranscribeRequest(BaseModel):
    audioBase64: str
    mimeType: Optional[str] = "audio/webm"
    captureMode: Optional[str] = "tab"
    model: Optional[str] = None


@app.get("/health")
def health():
    return {
        "ok": True,
        "engine": "faster-whisper",
        "model": STT_MODEL,
        "device": STT_DEVICE,
        "computeType": STT_COMPUTE_TYPE,
    }


@app.post("/transcribe")
def transcribe(payload: TranscribeRequest, x_stt_api_key: Optional[str] = Header(default=None)):
    if STT_API_KEY and x_stt_api_key != STT_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid STT API key")

    try:
        audio_bytes = base64.b64decode(payload.audioBase64)
    except Exception as exc:
        raise HTTPException(status_code=400, detail=f"Invalid base64 audio payload: {exc}") from exc

    suffix = ".webm"
    if payload.mimeType and "mp4" in payload.mimeType:
        suffix = ".mp4"
    elif payload.mimeType and "wav" in payload.mimeType:
        suffix = ".wav"

    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        tmp_path = tmp.name

    try:
        segments, info = model.transcribe(
            tmp_path,
            vad_filter=True,
            beam_size=1,
            language=None,
        )
        text = " ".join(segment.text.strip() for segment in segments).strip()
        return {
            "success": True,
            "text": text,
            "language": info.language,
            "duration": info.duration,
        }
    except Exception as exc:
        raise HTTPException(status_code=500, detail=f"Transcription failed: {exc}") from exc
    finally:
        try:
            os.remove(tmp_path)
        except OSError:
            pass
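
For reference, a client for the handler as written in this file might look like the sketch below. It assumes the JSON body defined by `TranscribeRequest` (`audioBase64`, `mimeType`) and the `x-stt-api-key` header that FastAPI derives from the `x_stt_api_key` parameter. This differs from the multipart contract described in the guide. The helper name `build_json_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_json_request(url: str, audio: bytes, mime_type: str = "audio/webm",
                       api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON POST matching TranscribeRequest."""
    payload = {
        # The handler base64-decodes this field before writing a temp file.
        "audioBase64": base64.b64encode(audio).decode("ascii"),
        # The handler picks the temp-file suffix (.webm/.mp4/.wav) from this value.
        "mimeType": mime_type,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only checked by the server when STT_API_KEY is set.
        headers["x-stt-api-key"] = api_key
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` against a running server should return a JSON object with the `success`, `text`, `language`, and `duration` keys produced by the handler above.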