2026-02-13 19:24:20 +01:00
parent 56d56395ee
commit 9dc2dff84a
22 changed files with 5897 additions and 528 deletions

View File

@@ -1,84 +1,66 @@
# AI Providers Guide
## Supported AI Providers
## Scope
Your AI Interview Assistant now supports multiple AI providers! Here's how to set up and use each one:
This guide covers **chat/response providers** used by the extension after transcription.
## 🤖 **OpenAI (GPT)**
- **Models Available**: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo
- **API Key**: Get from [OpenAI Platform](https://platform.openai.com/account/api-keys)
- **Recommended Model**: GPT-4o-mini (good balance of speed and quality)
- **Cost**: Pay per token usage
Note: Speech-to-text is configured separately in Assistant Setup (`STT Provider`, `STT Model`, language/task/VAD/beam settings).
## 🧠 **Anthropic (Claude)**
- **Models Available**: Claude-3.5-Sonnet, Claude-3.5-Haiku, Claude-3-Opus
- **API Key**: Get from [Anthropic Console](https://console.anthropic.com/)
- **Recommended Model**: Claude-3.5-Sonnet (excellent reasoning)
- **Cost**: Pay per token usage
## Supported Chat Providers
## 🔍 **Google (Gemini)**
- **Models Available**: Gemini-1.5-Pro, Gemini-1.5-Flash, Gemini-Pro
- **API Key**: Get from [Google AI Studio](https://aistudio.google.com/app/apikey)
- **Recommended Model**: Gemini-1.5-Flash (fast and efficient)
- **Cost**: Free tier available, then pay per token
### OpenAI
- Default models in UI: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`
- API key: https://platform.openai.com/account/api-keys
- Good default: `gpt-4o-mini` (speed/cost balance)
## 🌊 **DeepSeek**
- **Models Available**: DeepSeek-Chat, DeepSeek-Reasoner
- **API Key**: Get from [DeepSeek Platform](https://platform.deepseek.com/)
- **Recommended Model**: DeepSeek-Chat (general use)
- **Cost**: Pay per token usage
### Anthropic
- Default models in UI: `claude-3-5-sonnet-20241022`, `claude-3-5-haiku-20241022`, `claude-3-opus-20240229`
- API key: https://console.anthropic.com/
- Good default: `claude-3-5-sonnet-20241022`
## 🏠 **Ollama (Local)**
- **Models Available**: Llama3.2, Llama3.1, Mistral, CodeLlama, Phi3
- **Setup**: Install [Ollama](https://ollama.ai/) locally
- **No API Key Required**: Runs completely on your machine
- **Cost**: Free (uses your computer's resources)
### Google Gemini
- Default models in UI: `gemini-1.5-pro`, `gemini-1.5-flash`, `gemini-pro`
- API key: https://aistudio.google.com/app/apikey
- Good default: `gemini-1.5-flash`
## 🚀 **How to Setup**
### DeepSeek
- Default models in UI: `deepseek-chat`, `deepseek-reasoner`
- API key: https://platform.deepseek.com/
- Good default: `deepseek-chat`
### 1. **Choose Your Provider**
- Open the extension side panel
- Select your preferred AI provider from the dropdown
### Ollama (local)
- Default models in UI: `llama3.2`, `llama3.1`, `mistral`, `codellama`, `phi3`
- API key: not required
- Endpoint used by extension: `http://localhost:11434`
### 2. **Select Model**
- Choose the specific model you want to use
- Different models have different capabilities and speeds
## Model List Behavior
### 3. **Add API Key** (if required)
- Enter your API key for the selected provider
- Ollama doesn't require an API key
- Keys are stored securely in Chrome's storage
- For cloud providers, if an API key is saved, the extension attempts to fetch live model lists.
- If model fetch fails, the extension falls back to the built-in default model list above.
- For Ollama, the extension reads models from `/api/tags`.
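A minimal sketch of this fetch-with-fallback flow (the `DEFAULT_MODELS` object and `listModels` helper are illustrative names, not the extension's actual code; OpenAI's `/v1/models` and Ollama's `/api/tags` are the real listing endpoints):

```javascript
// Illustrative fetch-with-fallback for model lists.
const DEFAULT_MODELS = {
  openai: ['gpt-4o', 'gpt-4o-mini', 'gpt-4-turbo', 'gpt-3.5-turbo'],
  ollama: ['llama3.2', 'llama3.1', 'mistral', 'codellama', 'phi3']
};

async function listModels(provider, apiKey) {
  try {
    if (provider === 'ollama') {
      // Ollama lists installed models at /api/tags (no key required).
      const res = await fetch('http://localhost:11434/api/tags');
      return (await res.json()).models.map((m) => m.name);
    }
    if (provider === 'openai' && apiKey) {
      // Cloud providers need a saved key; OpenAI lists models at /v1/models.
      const res = await fetch('https://api.openai.com/v1/models', {
        headers: { Authorization: `Bearer ${apiKey}` }
      });
      return (await res.json()).data.map((m) => m.id);
    }
  } catch (err) {
    console.warn('Live model fetch failed, using built-in defaults:', err);
  }
  return DEFAULT_MODELS[provider] || [];
}
```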
### 4. **Start Using**
- Click "Start Listening" to begin audio capture
- The extension will use your selected AI provider for responses
## Setup Steps
## 💡 **Tips**
1. Open side panel -> `Assistant Setup`.
2. Choose `AI Provider`.
3. Save provider API key (not needed for Ollama).
4. Select model.
5. Start listening.
- **For Speed**: Use GPT-4o-mini, Gemini-1.5-Flash, or Claude-3.5-Haiku
- **For Quality**: Use GPT-4o, Claude-3.5-Sonnet, or Gemini-1.5-Pro
- **For Privacy**: Use Ollama (runs locally, no data sent to servers)
- **For Free Usage**: Try Google Gemini's free tier or set up Ollama
## Recommended Defaults
## 🔧 **Ollama Setup**
- Fastest general: `gpt-4o-mini` / `gemini-1.5-flash` / `claude-3-5-haiku-20241022`
- Highest quality: `gpt-4o` / `claude-3-5-sonnet-20241022` / `gemini-1.5-pro`
- Local-only privacy: `ollama` + local STT
If you want to use Ollama (local AI):
## Troubleshooting
1. Install Ollama from [ollama.ai](https://ollama.ai/)
2. Run: `ollama pull llama3.2` (or your preferred model)
3. Make sure Ollama is running: `ollama serve`
4. Select "Ollama (Local)" in the extension
- `API key not set`: save the provider key in Assistant Setup.
- `Failed to fetch models`: the key may be invalid, the provider API may be unavailable, or the network may be blocked. The default model list is used as a fallback.
- `Ollama connection failed`: ensure `ollama serve` is running and the model is pulled.
- Slow or expensive responses: switch to a smaller/faster model and enable Speed mode.
## 🆘 **Troubleshooting**
## Storage Note
- **"API key not set"**: Make sure you've entered a valid API key
- **"Failed to connect"**: Check your internet connection (or Ollama service for local)
- **"Invalid API key"**: Verify your API key is correct and has sufficient credits
- **Slow responses**: Try switching to a faster model like GPT-4o-mini or Gemini-1.5-Flash
## 🔒 **Privacy & Security**
- API keys are stored locally in Chrome's secure storage
- Only the selected provider receives your audio transcriptions
- Ollama option keeps everything completely local
- No audio data is stored permanently
- Provider API keys are stored in extension sync storage (`chrome.storage.sync`).
- Keep least-privilege keys where possible and rotate keys regularly.
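As a reference, a minimal sketch of saving and reading a key with `chrome.storage.sync` (the `openaiApiKey` name is illustrative, not necessarily the key the extension uses):

```javascript
// Save a provider API key (illustrative key name).
chrome.storage.sync.set({ openaiApiKey: 'sk-...' });

// Read it back later.
chrome.storage.sync.get(['openaiApiKey'], (items) => {
  const key = items.openaiApiKey;
  // pass the key to the provider client ...
});
```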

View File

@@ -10,7 +10,7 @@ Context management allows you to provide additional information (like your CV, j
#### 1. **Upload Files**
- Click the "Upload Files" tab in the Context Management section
- Click "📁 Upload CV/Job Description"
- Select your files (supports TXT, PDF, DOC, DOCX)
- Select your files (supports TXT, PDF, DOCX)
- Files will be automatically processed and saved
#### 2. **Add Text Directly**
@@ -44,7 +44,7 @@ AI Response: *"Based on your background, you have 3 years of Python experience a
## 📱 Multi-Device Listening
### What is Multi-Device Listening?
This feature allows you to use the AI Interview Assistant from other devices (phones, tablets, other computers) while keeping the main processing on your primary Chrome browser.
This feature allows you to use the AI Assistant from other devices (phones, tablets, other computers) while keeping the main processing on your primary Chrome browser.
### How to Enable Multi-Device Access
@@ -94,7 +94,7 @@ This feature allows you to use the AI Interview Assistant from other devices (ph
### 1. **Reload the Extension**
After the updates, reload the extension in Chrome:
- Go to `chrome://extensions/`
- Find "AI Interview Assistant"
- Find "AI Assistant"
- Click the reload button 🔄
### 2. **Configure Context**
@@ -177,4 +177,4 @@ After the updates, reload the extension in Chrome:
4. **Practice** - Use the enhanced features in mock interviews
5. **Customize** - Adjust context for different types of interviews
The AI Interview Assistant is now much more powerful and flexible. Use these features to get more personalized, relevant responses that truly reflect your background and the specific role you're interviewing for!
The AI Assistant is now much more powerful and flexible. Use these features to get more personalized, relevant responses that truly reflect your background and the specific role you're interviewing for!

View File

@@ -1,78 +1,111 @@
# Personal Browser Companion - Plans & To-Do
## Classification
- Core = works in extension-only mode (no extra server required).
- Optional = requires extra server/services (MCP, cloud sync, external APIs) and is opt-in per user.
## Goals
- Start local-first with an option to sync to cloud.
- Online-only operation (LLM required for decisions).
- Auto-start mode during meetings.
- Integrations: calendar, email, Discord, Nextcloud.
- [ ] [Core] Start local-first with an option to sync to cloud.
- [ ] [Core] Online-only operation (LLM required for decisions).
- [ ] [Core] Auto-start mode during meetings.
- [ ] [Optional] Integrations: calendar, email, Discord, Nextcloud.
## Phase Plan
### Phase 1: Local MVP (Foundation)
- Local storage for sessions, summaries, and user profile.
- Meeting/interview modes with manual start and overlay UI.
- Basic memory retrieval: recent session summaries + user profile.
- Audio capture + STT pipeline (mic + tab) and transcript display.
- Privacy controls: store/forget, per-session toggle.
### Phase 1: Local MVP (Foundation) [Core]
- [x] [Core] Local storage for sessions, summaries, and user profile.
- [x] [Core] Meeting/interview modes with manual start and overlay UI.
- [x] [Core] Basic memory retrieval: recent session summaries + user profile.
- [ ] [Core] Audio capture + STT pipeline (mic + tab) and transcript display.
- [x] [Core] Privacy controls: store/forget, per-session toggle.
### Phase 2: Smart Auto-Start
- Detect meeting tabs (Google Meet, Zoom, Teams) and prompt to start.
- Auto-start rules (domain allowlist, time-based, calendar hints).
- Lightweight on-device heuristics for meeting detection.
### Phase 2: Smart Auto-Start [Core]
- [ ] [Core] Detect meeting tabs (Google Meet, Zoom, Teams) and prompt to start.
- [ ] [Core] Auto-start rules (domain allowlist, time-based, calendar hints).
- [ ] [Core] Lightweight on-device heuristics for meeting detection.
### Phase 3: Cloud Sync (Optional)
- Opt-in cloud sync for memory + settings.
- Conflict resolution strategy (last-write wins + merge for summaries).
- Encryption at rest, user-controlled delete/export.
### Phase 3: Cloud Sync (Optional) [Optional]
- [ ] [Optional] Opt-in cloud sync for memory + settings.
- [ ] [Optional] Conflict resolution strategy (last-write wins + merge for summaries).
- [ ] [Optional] Encryption at rest, user-controlled delete/export.
### Phase 4: Integrations (MCP)
- Calendar: read upcoming meetings, attach context.
- Email: draft follow-ups, summaries.
- Discord: post meeting summary or action items to a channel.
- Nextcloud: store meeting notes, transcripts, and attachments.
### Phase 4: Integrations (MCP) [Optional]
- [ ] [Optional] Calendar: read upcoming meetings, attach context.
- [ ] [Optional] Email: draft follow-ups, summaries.
- [ ] [Optional] Discord: post meeting summary or action items to a channel.
- [ ] [Optional] Nextcloud: store meeting notes, transcripts, and attachments.
## MVP To-Do (Local)
### Core
- Define memory schema (profile, session, summary, action items).
- Implement local RAG: index summaries + profile into embeddings.
- Add session lifecycle: start, pause, end, summarize.
- [x] [Core] Define memory schema (profile, session, summary, action items).
- [x] [Core] Implement local RAG: index summaries + profile into embeddings.
- [x] [Core] Add session lifecycle: start, pause, end, summarize.
### Audio + STT
- Implement reliable STT for tab audio (server-side if needed).
- Keep mic-only STT as fallback.
- Add device selection + live mic monitor.
- [x] [Core] Implement reliable STT for tab audio (OpenAI Whisper chunk transcription from tab/mixed audio).
- [x] [Core] Keep mic-only STT as fallback.
- [x] [Core] Add device selection + live mic monitor.
- [x] [Core] Add separate STT settings (provider/model/API key) independent from chat provider.
- [x] [Optional] Add local STT bridge support (self-hosted faster-whisper endpoint).
- [x] [Core] Add STT "Test Connection" action in Assistant Setup.
- [x] [Core] Add multilingual STT controls (auto/forced language, task, VAD, beam size) with session language lock in auto mode.
### UI/UX
- Overlay controls: resize, hide/show, minimize.
- Auto-start toggle in side panel.
- Session summary view with “save to memory” toggle.
- [x] [Core] Overlay controls: resize, hide/show, minimize.
- [x] [Core] Auto-start toggle in side panel.
- [x] [Core] Session summary view with “save to memory” toggle.
- [x] [Core] Sidebar automation preset selector before Start Listening.
- [x] [Core] One-click session context selector before Start Listening.
- [x] [Core] Profile-scoped context loading to reduce cross-session prompt leakage.
- [x] [Core] Profile manager UI (create/edit/delete profile with mode + prompt).
- [ ] [Core] Import/export context profiles.
### Privacy
- Per-session storage consent prompt.
- “Forget session” button.
- [x] [Core] Per-session storage consent prompt.
- [x] [Core] “Forget session” button.
### Advanced Settings (Core)
- [x] [Core] Open full settings window from side panel (⚙️).
- [x] [Core] Webhook test: send sample payload and show status.
- [x] [Core] MCP connection test (basic reachability).
- [x] [Core] Cloud endpoint validation (basic reachability).
- [ ] [Core] Automation framework: triggers + actions + approval flow.
## Integration To-Do (MCP)
### MCP Server Options
- Build a local MCP server as a bridge for integrations.
- Use MCP tool registry for calendar/email/Discord/Nextcloud.
- [ ] [Optional] Build a local MCP server as a bridge for integrations.
- [ ] [Optional] Use MCP tool registry for calendar/email/Discord/Nextcloud.
### Automation (Rules Engine)
- [ ] [Core] Configure triggers (session start/end/manual, meeting domain filters).
- [ ] [Core] Configure actions per trigger (MCP tool + args).
- [ ] [Core] Approval mode: auto-send or review before send.
- [ ] [Core] Run actions on session end (hook into session lifecycle).
- [ ] [Core] Manual “Run Actions” button.
### Calendar
- Read upcoming meetings and titles.
- Auto-attach relevant context packs.
- [ ] [Optional] Read upcoming meetings and titles.
- [ ] [Optional] Auto-attach relevant context packs.
### Email
- Generate follow-up drafts from summary + action items.
- [ ] [Optional] Generate follow-up drafts from summary + action items.
### Discord
- Post meeting summary/action items to a selected channel.
- [ ] [Optional] Post meeting summary/action items to a selected channel.
### Nextcloud
- Upload meeting notes and transcripts.
- [ ] [Optional] Upload meeting notes and transcripts.
## Open Questions
- Preferred cloud provider for sync?
- How long should session memories persist by default?
- Should auto-start be opt-in per domain or global?
- What data should be redacted before sync?
- [Core] How do we isolate interview vs meeting prompts/contexts safely?
- Best solution: Use explicit context profiles (e.g., Interview, Standup, Sales) with separate prompt + context store per profile, and require users to pick one profile before Start Listening.
- [Optional] Preferred cloud provider for sync?
- Best solution: Start with Supabase (Postgres + Auth + Storage) for fastest MVP, then add S3-compatible storage as an optional backend for enterprise/self-hosting.
- [Core] How long should session memories persist by default?
- Best solution: 90 days by default with per-session “keep forever” and a global retention slider (7/30/90/365 days).
- [Core] Should auto-start be opt-in per domain or global?
- Best solution: Opt-in per domain, with a one-click “trust this site” prompt on first detection.
- [Optional] What data should be redacted before sync?
- Best solution: Default to redacting emails, phone numbers, calendar IDs, and detected secrets (API keys/tokens) while letting users add custom redaction rules.
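A rough sketch of what that default redaction pass could look like (patterns and function name are placeholders, not a finalized rule set):

```javascript
// Placeholder redaction pass applied before sync (patterns are illustrative).
function redactForSync(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')                   // email addresses
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[phone]')                     // phone-like numbers
    .replace(/\b(sk|ghp|xox[bap])-[A-Za-z0-9_-]{10,}\b/g, '[secret]'); // common token prefixes
}
```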

README.md
View File

@@ -1,13 +1,66 @@
# AI Interview Assistant Chrome Extension
# AI Assistant Chrome Extension
## Overview
The AI Interview Assistant is a Chrome extension designed to help users during interviews or meetings by providing real-time AI-powered responses to questions. It listens to the audio from the current tab, transcribes the speech, identifies questions, and generates concise answers using OpenAI's GPT model.
AI Assistant is a Chrome extension for live meeting/interview support. It captures audio, transcribes speech, and generates concise AI responses with configurable chat and STT providers.
Current extension version: `1.1.0`
<div align="center">
<img src="screenshot.png">
<img src="Screenshot.png" alt="AI Assistant side panel">
</div>
## Screenshots
### Main side panel
<div align="center">
<img src="Screenshot.png" alt="Main side panel">
</div>
### Advanced setup
<div align="center">
<img src="Screenshot-advanced.png" alt="Advanced settings">
</div>
## Table of Contents
- [Documentation Index](#documentation-index)
- [Quick Start (2 Minutes)](#quick-start-2-minutes)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Custom Sessions (Context Profiles)](#custom-sessions-context-profiles)
- [Automation in Side Panel](#automation-in-side-panel)
- [Plans & Roadmap](#plans--roadmap)
- [Recent Improvements](#recent-improvements)
- [Privacy and Security](#privacy-and-security)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)
- [Disclaimer](#disclaimer)
## Documentation Index
Use this `README.md` as the main entrypoint. Additional docs:
- Product roadmap and task tracking: `Plans_and_Todo.md`
- AI provider setup/details: `AI_PROVIDERS_GUIDE.md`
- New features and updates: `NEW_FEATURES_GUIDE.md`
- Local self-hosted STT bridge: `local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md`
## Quick Start (2 Minutes)
1. Load the extension in `chrome://extensions` (Developer Mode → Load unpacked).
2. Open the side panel and set **AI Provider**, **Model**, and **API key**.
3. In **Assistant Setup**, choose **Speech-to-Text Provider** (`OpenAI`, `Local faster-whisper`, or `Browser`).
4. Configure STT quality controls (`Language Mode`, optional `Forced language`, `Task`, `VAD`, `Beam size`).
5. Use **Test STT Connection** to validate STT endpoint/key.
6. In **Session Context**, pick a profile (or create one in **Context → Manage Profiles**).
7. (Optional) Pick an **Automation Preset**.
8. Click **Start Listening**.
## Features
- Real-time audio capture (tab, mic, or mixed mode)
@@ -15,8 +68,12 @@ The AI Interview Assistant is a Chrome extension designed to help users during i
- AI-powered responses with multiple providers (OpenAI, Anthropic, Google, DeepSeek, Ollama)
- Persistent side panel interface
- Secure API key storage
- Context management (upload or paste documents for better answers)
- Context profiles (prebuilt + custom) with profile-scoped context isolation
- Context management (upload or paste documents per profile)
- Speed mode (faster, shorter responses)
- Automation preset selector in side panel (automatic or one selected automation)
- Separate STT settings (OpenAI Whisper, Browser STT, or local faster-whisper bridge)
- Multilingual STT controls (auto/forced language, task mode, VAD, beam size)
- Multi-device demo mode for remote access
- Overlay controls: drag, resize, minimize, detach, hide/show
- Mic monitor with input device selection and live level meter
@@ -38,21 +95,56 @@ The AI Interview Assistant is a Chrome extension designed to help users during i
4. Click on "Load unpacked" and select the directory containing the extension files.
5. The AI Interview Assistant extension should now appear in your list of installed extensions.
5. The AI Assistant extension should now appear in your list of installed extensions.
## Usage
1. Click on the AI Interview Assistant icon in the Chrome toolbar to open the side panel.
1. Click on the AI Assistant icon in the Chrome toolbar to open the side panel.
2. Enter your OpenAI API key in the provided input field and click "Save API Key".
2. Select your provider/model and save the provider API key.
3. Click "Start Listening" to begin capturing audio from the current tab.
3. In **Assistant Setup**, configure **Speech-to-Text Provider**:
- `OpenAI Whisper` for hosted tab/mixed transcription
- `Local faster-whisper bridge` for self-hosted STT (`local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md`)
- `Browser SpeechRecognition` for mic-oriented local recognition
- Tune multilingual/quality options:
- `Language Mode`: `Auto-detect` or `Force language`
- `Forced language`: language code (for example `en`, `fr`, `de`, `ar`)
- `Task`: `Transcribe` or `Translate to English`
- `VAD`: enable/disable silence filtering
- `Beam size`: decoding quality/performance tradeoff (default `5`)
- Click **Test STT Connection** before starting live capture
4. As questions are detected in the audio, they will appear in the "Transcript" section.
4. In **Session Context**, choose a profile (Interview/Standup/Sales or your custom profile).
5. AI-generated responses will appear in the "AI Response" section.
5. (Optional) In **Automation Preset**, choose:
- `Automatic` to run all enabled automations that match each trigger, or
- a single automation to run only that one for session start/end.
6. Click "Stop Listening" to end the audio capture.
6. Click **Start Listening** to begin capturing audio from the current tab.
7. Click **Stop Listening** to end the audio capture.
## Custom Sessions (Context Profiles)
Custom session behavior is configured through **profiles**.
1. Open side panel → **Context** → **Manage Profiles**.
2. Click **New Profile**.
3. Set:
- Profile name (for example: `Interview (Backend)` or `Meeting (Sales Discovery)`)
- Mode (`interview`, `meeting`, `standup`, or `custom`)
- System prompt (instructions specific to this profile)
4. Click **Save Profile**.
5. Back in **Session**, select that profile in **Session Context** before clicking **Start Listening**.
Each profile uses its own scoped context store to reduce prompt/context leakage between use cases.
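A minimal sketch of how profile-scoped context storage can work (the key naming and object shape below are illustrative assumptions, not the extension's exact schema):

```javascript
// Illustrative profile-scoped context store (names are assumptions).
const profile = {
  id: 'interview-backend',
  name: 'Interview (Backend)',
  mode: 'interview',
  systemPrompt: 'Focus on backend engineering interview answers.'
};

// Context items live under a key namespaced by profile id, so one
// profile's documents never leak into another profile's prompts.
const contextKey = `context_${profile.id}`;
chrome.storage.local.set({
  [contextKey]: [{ type: 'cv', title: 'My CV', text: '...' }]
});
```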
## Automation in Side Panel
- Use **Automation Preset** to choose how automations run for the current session.
- Use **Run Selected Automation Now** to manually test from the side panel.
- Use **Advanced Settings (⚙️)** for full automation editing (actions, MCP tools, webhook args, triggers, approval behavior).
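For orientation, a sketch of the shape an automation takes, based on the fields in the Automation Manager (property names are illustrative, not the stored schema):

```javascript
// Illustrative automation definition mirroring the Automation Manager fields.
const automation = {
  name: 'Post meeting summary',
  type: 'actions',                  // 'actions' (MCP/webhook) or 'standup'
  enabled: true,
  triggers: { onStart: false, onEnd: true, manualOnly: false },
  requireApproval: true,            // review actions before they are sent
  actions: [
    {
      label: 'Post to Discord',
      type: 'mcp',                  // 'mcp' or 'webhook'
      tool: 'discord.postMessage',
      args: { channelId: '123', message: '{{summary}}' }
    }
  ]
};
```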
## Plans & Roadmap
@@ -76,11 +168,14 @@ The AI Interview Assistant is a Chrome extension designed to help users during i
- Ensure you have granted the necessary permissions for the extension to access tab audio.
- If you're not seeing responses, check that your API key is entered correctly and that you have sufficient credits on your OpenAI account.
- If local STT on a public domain keeps failing with `Invalid HTTP request received`, check for a protocol mismatch:
- `http://` endpoints on HSTS domains may be auto-upgraded to `https://` by Chrome.
- Use a proper HTTPS reverse proxy in front of the STT service, or use localhost/IP for plain HTTP testing.
- For any issues, please check the Chrome developer console for error messages.
## Contributing
Contributions to the AI Interview Assistant are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.
Contributions to the AI Assistant are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.
## License

BIN
Screenshot-advanced.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 398 KiB

View File

@@ -3,12 +3,12 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Interview Assistant</title>
<title>AI Assistant</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div id="app">
<h3>AI Interview Assistant</h3>
<h3>AI Assistant</h3>
<div class="status-message">Detached view</div>
<div id="transcript"></div>
<div id="aiResponse"></div>

File diff suppressed because it is too large

View File

@@ -9,6 +9,12 @@ let overlayHidden = false;
let analyserNode = null;
let meterSource = null;
let meterRaf = null;
let transcriptionRecorder = null;
let mixedTabStream = null;
let mixedMicStream = null;
let mixedOutputStream = null;
let lastTranscriptionErrorAt = 0;
let transcriptionWindowTimer = null;
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
if (request.action === 'startCapture') {
@@ -80,10 +86,7 @@ function startCapture(streamId) {
overlayListening = true;
ensureOverlay();
updateOverlayIndicator();
updateOverlay(
'response',
'Tab audio is captured, but speech recognition uses the microphone. Use mic or mixed mode if you want transcription.'
);
updateOverlay('response', 'Capturing tab audio and transcribing meeting audio...');
navigator.mediaDevices.getUserMedia({
audio: {
chromeMediaSource: 'tab',
@@ -93,9 +96,7 @@ function startCapture(streamId) {
mediaStream = stream;
audioContext = new AudioContext();
createAudioMeter(stream);
if (ensureSpeechRecognitionAvailable()) {
startRecognition();
}
startTranscriptionRecorder(stream, 'tab');
}).catch((error) => {
console.error('Error starting capture:', error);
let errorMessage = 'Failed to start audio capture. ';
@@ -147,18 +148,39 @@ function startMixedCapture(streamId) {
overlayListening = true;
ensureOverlay();
updateOverlayIndicator();
updateOverlay('response', 'Capturing mixed audio (tab + mic) and transcribing...');
navigator.mediaDevices.getUserMedia({
audio: {
chromeMediaSource: 'tab',
chromeMediaSourceId: streamId
}
}).then((stream) => {
mediaStream = stream;
}).then(async (tabStream) => {
mixedTabStream = tabStream;
audioContext = new AudioContext();
createAudioMeter(stream);
if (ensureSpeechRecognitionAvailable()) {
startRecognition();
try {
mixedMicStream = await navigator.mediaDevices.getUserMedia({ audio: true });
} catch (error) {
console.warn('Mixed mode mic unavailable, falling back to tab-only capture:', error);
mixedMicStream = null;
chrome.runtime.sendMessage({
action: 'updateAIResponse',
response: 'Mic permission denied in mixed mode. Continuing with tab audio only.'
});
}
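// Mix tab audio (and mic audio, when available) into one output stream via the Web Audio graph.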
const destination = audioContext.createMediaStreamDestination();
const tabSource = audioContext.createMediaStreamSource(tabStream);
tabSource.connect(destination);
if (mixedMicStream) {
const micSource = audioContext.createMediaStreamSource(mixedMicStream);
micSource.connect(destination);
}
mixedOutputStream = destination.stream;
mediaStream = mixedOutputStream;
createAudioMeter(mixedOutputStream);
startTranscriptionRecorder(mixedOutputStream, 'mixed');
}).catch((error) => {
console.error('Error starting mixed capture:', error);
chrome.runtime.sendMessage({action: 'updateAIResponse', response: 'Failed to start mixed capture.'});
@@ -235,20 +257,148 @@ function ensureSpeechRecognitionAvailable() {
return true;
}
function stopTranscriptionRecorder() {
if (transcriptionWindowTimer) {
clearTimeout(transcriptionWindowTimer);
transcriptionWindowTimer = null;
}
if (transcriptionRecorder && transcriptionRecorder.state !== 'inactive') {
try {
transcriptionRecorder.stop();
} catch (error) {
console.warn('Failed to stop transcription recorder:', error);
}
}
transcriptionRecorder = null;
}
function blobToBase64(blob) {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => {
const result = reader.result || '';
const base64 = String(result).split(',')[1] || '';
resolve(base64);
};
reader.onerror = () => reject(new Error('Failed to read recorded audio chunk.'));
reader.readAsDataURL(blob);
});
}
function startTranscriptionRecorder(stream, mode) {
stopTranscriptionRecorder();
const mimeType = MediaRecorder.isTypeSupported('audio/webm;codecs=opus')
? 'audio/webm;codecs=opus'
: 'audio/webm';
const WINDOW_MS = 6000;
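// Capture in fixed ~6s windows: each window's recording is stopped, its blob is
// sent to the background script for transcription, then a new window starts.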
const sendBlobForTranscription = async (blob, currentMimeType) => {
if (!isCapturing || !blob || blob.size < 1024) return;
try {
const base64Audio = await blobToBase64(blob);
chrome.runtime.sendMessage(
{
action: 'transcribeAudioChunk',
audioBase64: base64Audio,
mimeType: currentMimeType || mimeType,
captureMode: mode
},
(response) => {
if (chrome.runtime.lastError) return;
if (!response || !response.success) {
const now = Date.now();
if (response && response.error && now - lastTranscriptionErrorAt > 6000) {
lastTranscriptionErrorAt = now;
chrome.runtime.sendMessage({ action: 'updateAIResponse', response: response.error });
updateOverlay('response', response.error);
}
return;
}
if (!response.transcript) return;
updateOverlay('transcript', response.transcript);
}
);
} catch (error) {
console.warn('Audio chunk transcription failed:', error);
}
};
const startWindow = () => {
if (!isCapturing) return;
const recorder = new MediaRecorder(stream, { mimeType });
transcriptionRecorder = recorder;
const chunks = [];
recorder.ondataavailable = (event) => {
if (event.data && event.data.size > 0) {
chunks.push(event.data);
}
};
recorder.onerror = (event) => {
const message = `Audio recorder error: ${event.error ? event.error.message : 'unknown'}`;
chrome.runtime.sendMessage({ action: 'updateAIResponse', response: message });
updateOverlay('response', message);
};
recorder.onstop = async () => {
transcriptionRecorder = null;
if (!chunks.length) {
if (isCapturing) startWindow();
return;
}
const blob = new Blob(chunks, { type: recorder.mimeType || mimeType });
await sendBlobForTranscription(blob, recorder.mimeType || mimeType);
if (isCapturing) {
startWindow();
}
};
recorder.start();
transcriptionWindowTimer = setTimeout(() => {
transcriptionWindowTimer = null;
if (recorder.state !== 'inactive') {
recorder.stop();
}
}, WINDOW_MS);
};
startWindow();
}
function stopCapture() {
isCapturing = false;
overlayListening = false;
updateOverlayIndicator();
stopTranscriptionRecorder();
stopAudioMeter();
if (mediaStream) {
mediaStream.getTracks().forEach(track => track.stop());
mediaStream = null;
}
if (mixedTabStream) {
mixedTabStream.getTracks().forEach(track => track.stop());
mixedTabStream = null;
}
if (mixedMicStream) {
mixedMicStream.getTracks().forEach(track => track.stop());
mixedMicStream = null;
}
if (mixedOutputStream) {
mixedOutputStream.getTracks().forEach(track => track.stop());
mixedOutputStream = null;
}
if (audioContext) {
audioContext.close();
audioContext = null;
}
if (recognition) {
try {
recognition.stop();
} catch (error) {
console.warn('Failed to stop recognition:', error);
}
recognition = null;
}
}
@@ -385,7 +535,7 @@ function ensureOverlay() {
<div id="ai-interview-header">
<div id="ai-interview-title">
<span id="ai-interview-indicator"></span>
<span>AI Interview Assistant</span>
<span>AI Assistant</span>
</div>
<div id="ai-interview-controls">
<button class="ai-interview-btn" id="ai-interview-detach">Detach</button>

View File

@@ -1,7 +1,7 @@
function createDraggableUI() {
const uiHTML = `
<div id="ai-assistant-ui" class="ai-assistant-container">
<div id="ai-assistant-header">AI Interview Assistant</div>
<div id="ai-assistant-header">AI Assistant</div>
<div id="ai-assistant-content">
<input type="password" id="apiKeyInput" placeholder="Enter your OpenAI API Key here">
<button id="saveApiKey">Save API Key</button>

View File

@@ -0,0 +1,88 @@
# Local STT Bridge (faster-whisper)
Self-hosted Speech-to-Text bridge for the Chrome extension.
Primary project documentation lives in `README.md`.
## 1) Install
Use Python 3.11 or 3.12 (recommended). Python 3.13 may force source builds for audio deps.
```bash
cd local_stt_bridge
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```
### macOS build prerequisites (required if `av`/PyAV tries to build)
```bash
brew install pkg-config ffmpeg
```
If install still fails on `PyAV`, recreate the venv with Python 3.11 and retry.
## 2) Run
```bash
cd local_stt_bridge
source .venv/bin/activate
export STT_MODEL=small
export STT_DEVICE=auto
export STT_COMPUTE_TYPE=int8
# Optional auth key:
# export STT_API_KEY=your_local_key
uvicorn server:app --host 0.0.0.0 --port 8790
```
## 3) Verify
```bash
curl http://localhost:8790/health
```
## 4) Extension Setup
In side panel:
- Assistant Setup -> Speech-to-Text Provider: `Local faster-whisper bridge`
- STT Model: `small` (start here)
- Local STT endpoint: `http://localhost:8790/transcribe`
- Optional Local STT API key if `STT_API_KEY` is set on server
- Optional quality/language controls:
- Language Mode: `Auto-detect` or `Force language`
- Forced language: e.g. `en`, `fr`, `de`, `ar`
- Task: `transcribe` or `translate`
- VAD filter: on/off
- Beam size: integer (default `5`)
- Click `Test STT Connection` from the extension to validate endpoint reachability.
## API contract expected by the extension
`POST /transcribe` with `multipart/form-data`:
- `file` (required): uploaded audio chunk (`webm`/`mp4`/`wav`)
- `task` (optional): `transcribe` or `translate`
- `vad_filter` (optional): `true`/`false`
- `beam_size` (optional): integer
- `language` (optional): language code
- `model` (optional): model hint
Optional auth headers when enabled:
- `Authorization: Bearer <token>`
- `x-api-key: <token>`
`GET /health` is used by extension `Test STT Connection`.
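A minimal client-side sketch of that multipart call (field values and the optional `x-api-key` header are examples, assuming the server implements the contract above):

```javascript
// Illustrative client call for the /transcribe contract described above.
async function transcribeChunk(blob) {
  const form = new FormData();
  form.append('file', blob, 'chunk.webm'); // required audio chunk
  form.append('task', 'transcribe');       // or 'translate'
  form.append('vad_filter', 'true');
  form.append('beam_size', '5');
  form.append('language', 'en');           // omit to auto-detect

  const res = await fetch('http://localhost:8790/transcribe', {
    method: 'POST',
    headers: { 'x-api-key': 'your_local_key' }, // only when STT_API_KEY is set
    body: form
  });
  return res.json();
}
```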
## Public domain + HTTPS note
If you expose this service on a public domain, use HTTPS via a reverse proxy.
Chrome may auto-upgrade `http://` requests to `https://` on HSTS domains, which causes plain-HTTP Uvicorn ports to fail with `Invalid HTTP request received`.
## Notes
- `faster-whisper` relies on FFmpeg for many input formats.
- For best CPU cost/performance, use `small` or `medium`.
- `large-v3` improves quality but uses significantly more compute.

Binary file not shown.

View File

@@ -0,0 +1,3 @@
fastapi==0.115.0
uvicorn[standard]==0.30.6
faster-whisper==1.0.3

View File

@@ -0,0 +1,92 @@
import base64
import os
import tempfile
from typing import Optional
from fastapi import FastAPI, Header, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
try:
from faster_whisper import WhisperModel
except ImportError as exc: # pragma: no cover
raise RuntimeError("faster-whisper is required. Install dependencies from requirements.txt") from exc
STT_MODEL = os.getenv("STT_MODEL", "small")
STT_DEVICE = os.getenv("STT_DEVICE", "auto")
STT_COMPUTE_TYPE = os.getenv("STT_COMPUTE_TYPE", "int8")
STT_API_KEY = os.getenv("STT_API_KEY", "").strip()
app = FastAPI(title="Local STT Bridge", version="1.0.0")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=False,
allow_methods=["*"],
allow_headers=["*"],
)
model = WhisperModel(STT_MODEL, device=STT_DEVICE, compute_type=STT_COMPUTE_TYPE)
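# The Whisper model is loaded once at startup; size, device, and precision come from the env vars above.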
class TranscribeRequest(BaseModel):
audioBase64: str
mimeType: Optional[str] = "audio/webm"
captureMode: Optional[str] = "tab"
model: Optional[str] = None
@app.get("/health")
def health():
return {
"ok": True,
"engine": "faster-whisper",
"model": STT_MODEL,
"device": STT_DEVICE,
"computeType": STT_COMPUTE_TYPE,
}
@app.post("/transcribe")
def transcribe(payload: TranscribeRequest, x_stt_api_key: Optional[str] = Header(default=None)):
if STT_API_KEY and x_stt_api_key != STT_API_KEY:
raise HTTPException(status_code=401, detail="Invalid STT API key")
try:
audio_bytes = base64.b64decode(payload.audioBase64)
except Exception as exc:
raise HTTPException(status_code=400, detail=f"Invalid base64 audio payload: {exc}") from exc
suffix = ".webm"
if payload.mimeType and "mp4" in payload.mimeType:
suffix = ".mp4"
elif payload.mimeType and "wav" in payload.mimeType:
suffix = ".wav"
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
tmp.write(audio_bytes)
tmp_path = tmp.name
try:
segments, info = model.transcribe(
tmp_path,
vad_filter=True,
beam_size=1,
language=None,
)
text = " ".join(segment.text.strip() for segment in segments).strip()
return {
"success": True,
"text": text,
"language": info.language,
"duration": info.duration,
}
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Transcription failed: {exc}") from exc
finally:
try:
os.remove(tmp_path)
except OSError:
pass

View File

@@ -1,11 +1,10 @@
{
"manifest_version": 3,
"name": "AI Interview Assistant",
"version": "1.0",
"name": "AI Assistant",
"version": "1.1.0",
"description": "Monitors audio and answers questions in real-time using AI",
"permissions": [
"tabCapture",
"audioCapture",
"storage",
"activeTab",
"scripting",

View File

@@ -3,12 +3,12 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Interview Assistant</title>
<title>AI Assistant</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div id="app">
<h3>AI Interview Assistant</h3>
<h3>AI Assistant</h3>
<input type="password" id="apiKeyInput" placeholder="Enter your OpenAI API Key here">
<button id="saveApiKey">Save API Key</button>
<button id="toggleListening">Start Listening</button>

View File

@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Interview Assistant - Remote Access</title>
<title>AI Assistant - Remote Access</title>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
@@ -160,7 +160,7 @@
<body>
<div class="container">
<div class="logo">🤖</div>
<h1>AI Interview Assistant</h1>
<h1>AI Assistant</h1>
<div class="subtitle">Remote Access Portal</div>
<div id="status" class="status disconnected">

Binary file not shown.

Before

Width:  |  Height:  |  Size: 36 KiB

After

Width:  |  Height:  |  Size: 182 KiB

settings.html Normal file
View File

@@ -0,0 +1,222 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Assistant - Advanced Settings</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div id="app">
<div class="app-header">
<h3>Advanced Settings</h3>
<button id="closeSettings" class="icon-button" title="Close"></button>
</div>
<details class="panel-group" open>
<summary>General</summary>
<div class="active-state-section">
<label>
<input type="checkbox" id="advancedModeToggle">
Enable advanced mode
</label>
<div class="status-message">
Advanced mode unlocks integrations, webhook tests, and developer tooling.
</div>
</div>
</details>
<details class="panel-group">
<summary>Webhooks</summary>
<div class="context-section">
<input type="text" id="webhookUrl" placeholder="https://example.com/webhook">
<textarea id="webhookPayload" class="code-input" placeholder='Custom payload JSON (optional)'></textarea>
<label class="summary-save-toggle">
<input type="checkbox" id="debugModeToggle">
Enable debug output
</label>
<div class="status-message" id="webhookStatus"></div>
<button id="testWebhook">Send Test Payload</button>
</div>
</details>
<details class="panel-group">
<summary>MCP Connection</summary>
<div class="context-section">
<input type="text" id="mcpServerUrl" placeholder="http://localhost:3000">
<div class="mcp-key-row">
<input type="password" id="mcpApiKey" placeholder="MCP API key (optional)">
<button id="toggleMcpKey" type="button">Show</button>
</div>
<div class="status-message" id="mcpStatus"></div>
<button id="testMcp">Test MCP Connection</button>
</div>
</details>
<details class="panel-group">
<summary>Cloud Sync (Optional)</summary>
<div class="context-section">
<input type="text" id="cloudProvider" placeholder="Supabase / S3 / Custom">
<input type="text" id="cloudEndpoint" placeholder="https://your-cloud-endpoint">
<div class="status-message" id="cloudStatus"></div>
<button id="testCloud">Validate Cloud Config</button>
</div>
</details>
<details class="panel-group">
<summary>Google Calendar (via MCP)</summary>
<div class="context-section">
<input type="text" id="gcalToolName" placeholder="Tool name (e.g. google.calendar.listEvents)">
<input type="text" id="gcalCalendarId" placeholder="Calendar ID (default: primary)">
<input type="text" id="gcalMaxResults" placeholder="Max results (default: 5)">
<div class="status-message" id="gcalStatus"></div>
<button id="fetchCalendar">Fetch Upcoming Meetings</button>
<pre id="gcalOutput"></pre>
</div>
</details>
<details class="panel-group">
<summary>MCP Tools (Preview)</summary>
<div class="context-section">
<button id="refreshMcpTools">Fetch Tools</button>
<span id="mcpToolsStatus" class="status-message"></span>
<select id="mcpToolSelect"></select>
<textarea id="mcpToolArgs" class="code-input" placeholder='Tool arguments as JSON. Example: {"maxResults": 5}'></textarea>
<button id="runMcpTool">Run Tool</button>
<pre id="mcpToolOutput"></pre>
</div>
</details>
<details class="panel-group">
<summary>Custom Tool Presets</summary>
<div class="context-section">
<input type="text" id="presetName" placeholder="Preset name (e.g. Weekly Standup Notes)">
<input type="text" id="presetDescription" placeholder="Short description (optional)">
<input type="text" id="presetTags" placeholder="Tags (comma-separated)">
<input type="text" id="presetToolName" placeholder="Tool name (e.g. discord.postMessage)">
<textarea id="presetArgs" class="code-input" placeholder='Arguments JSON (e.g. {"channelId":"abc","message":"Hello"})'></textarea>
<button id="savePreset">Save Preset</button>
<button id="clearPreset">Clear</button>
<div id="presetStatus" class="status-message"></div>
<select id="presetSelect"></select>
<button id="runPreset">Run Preset</button>
<button id="deletePreset" class="danger-btn">Delete Preset</button>
<pre id="presetOutput"></pre>
</div>
</details>
<details class="panel-group" open>
<summary>Automation Manager</summary>
<div class="context-section automation-manager">
<div class="automation-layout">
<div class="automation-list">
<button id="addAutomation" type="button">New Automation</button>
<div id="automationList"></div>
</div>
<div class="automation-editor">
<div class="status-message" id="automationEditorStatus">Select an automation to edit.</div>
<div id="automationEditor" class="automation-editor-fields">
<input type="text" id="automationName" placeholder="Automation name">
<select id="automationType">
<option value="actions">MCP Actions</option>
<option value="standup">Daily Standup</option>
</select>
<label class="summary-save-toggle">
<input type="checkbox" id="automationEnabled">
Enabled
</label>
<div class="status-message">Triggers</div>
<label class="summary-save-toggle">
<input type="checkbox" id="automationTriggerStart">
On session start
</label>
<label class="summary-save-toggle">
<input type="checkbox" id="automationTriggerEnd">
On session end
</label>
<label class="summary-save-toggle">
<input type="checkbox" id="automationTriggerManual">
Manual only
</label>
<label class="summary-save-toggle">
<input type="checkbox" id="automationRequireApproval">
Require confirmation before running actions
</label>
<div class="status-message">Changes are saved automatically.</div>
<div id="automationActionsSection">
<div class="status-message">Actions</div>
<input type="text" id="automationActionLabel" placeholder="Action label">
<select id="automationActionType">
<option value="mcp">MCP Tool</option>
<option value="webhook">Webhook</option>
</select>
<div id="automationActionMcpFields">
<input type="text" id="automationActionTool" placeholder="MCP tool name">
<textarea id="automationActionArgs" class="code-input" placeholder='Args JSON (e.g. {"channelId":"abc","message":"Hello"})'></textarea>
</div>
<div id="automationActionWebhookFields" style="display:none;">
<input type="text" id="automationActionWebhookUrl" placeholder="Webhook URL (optional, uses global webhook if empty)">
<select id="automationActionWebhookMethod">
<option value="POST">POST</option>
<option value="PUT">PUT</option>
<option value="PATCH">PATCH</option>
</select>
<textarea id="automationActionWebhookHeaders" class="code-input" placeholder='Headers JSON (optional, e.g. {"Authorization":"Bearer ...","X-Trace":"1"})'></textarea>
<textarea id="automationActionWebhookBody" class="code-input" placeholder='Body template (JSON string with placeholders, e.g. {"message":"{{summary}}","date":"{{date}}"} )'></textarea>
<input type="number" id="automationActionWebhookRetryCount" min="0" max="5" step="1" placeholder="Retry count (0-5)">
</div>
<button id="addAutomationAction" type="button">Add Action</button>
<select id="automationActionSelect"></select>
<button id="removeAutomationAction" class="danger-btn" type="button">Remove Action</button>
</div>
<div id="automationStandupSection">
<div class="status-message">Discord</div>
<input type="text" id="standupDiscordTool" placeholder="Tool name (e.g. discord.postMessage)">
<textarea id="standupDiscordArgs" class="code-input" placeholder='Args JSON with placeholders. Example: {"channelId":"123","message":"{{summary}}"}'></textarea>
<div class="status-message">Nextcloud</div>
<input type="text" id="standupNextcloudTool" placeholder="Tool name (e.g. nextcloud.createNote)">
<textarea id="standupNextcloudArgs" class="code-input" placeholder='Args JSON with placeholders. Default if empty uses notes/daily/standup-{{date}}.txt'></textarea>
<div class="placeholder-help">
<div class="status-message">Available placeholders</div>
<div class="placeholder-list">
<button type="button" class="placeholder-chip" data-placeholder="{{summary}}">{{summary}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{summary_brief}}">{{summary_brief}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{summary_full}}">{{summary_full}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{action_items}}">{{action_items}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{action_items_json}}">{{action_items_json}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{blockers}}">{{blockers}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{decisions}}">{{decisions}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{date}}">{{date}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{date_human}}">{{date_human}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{date_compact}}">{{date_compact}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{datetime}}">{{datetime}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{time}}">{{time}}</button>
<button type="button" class="placeholder-chip" data-placeholder="{{weekday}}">{{weekday}}</button>
</div>
<div class="status-message" id="placeholderStatus"></div>
</div>
</div>
<button id="saveAutomation" type="button">Save Automation</button>
<button id="testAutomation" type="button">Test Automation</button>
<button id="runAutomationNow" type="button">Run Selected Automation</button>
<button id="deleteAutomation" class="danger-btn" type="button">Delete Automation</button>
<div id="automationRunStatus" class="status-message"></div>
<pre id="automationOutput"></pre>
</div>
</div>
</div>
</div>
</details>
</div>
<script src="settings.js"></script>
</body>
</html>

settings.js Normal file

File diff suppressed because it is too large

View File

@@ -3,13 +3,68 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Interview Assistant</title>
<title>AI Assistant</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div id="app">
<h3>AI Interview Assistant</h3>
<div class="app-header">
<h3>AI Assistant</h3>
<button id="openSettings" class="icon-button" title="Open Advanced Settings">⚙️</button>
</div>
<details class="panel-group" open>
<summary>Session</summary>
<div class="session-context-section">
<label for="sessionContextProfile">Session Context:</label>
<select id="sessionContextProfile"></select>
<div id="sessionContextHint" class="status-message"></div>
</div>
<div class="capture-mode-section">
<label for="captureModeSelect">Audio Input Mode:</label>
<select id="captureModeSelect">
<option value="tab">Tab-only (all speakers in the meeting tab)</option>
<option value="mic">Mic-only (your voice)</option>
<option value="mixed">Mixed (tab + mic)</option>
</select>
<div class="status-message">Use Tab-only or Mixed if you need other speakers in the meeting.</div>
</div>
<div class="session-automation-section">
<label for="sessionAutomationSelect">Automation Preset:</label>
<select id="sessionAutomationSelect"></select>
<div id="sessionAutomationHint" class="status-message"></div>
</div>
<button id="toggleListening">Start Listening</button>
<button id="pauseListening" disabled>Pause Listening</button>
<details class="session-advanced">
<summary>Advanced Session Options</summary>
<div class="active-state-section">
<label>
<input type="checkbox" id="autoOpenAssistantWindow">
Auto-open assistant window after Start Listening
</label>
</div>
<div class="active-state-section">
<label>
<input type="checkbox" id="autoStartToggle">
Auto-start listening on meeting sites
</label>
</div>
<div class="active-state-section">
<label>
<input type="checkbox" id="storeSessionToggle" disabled>
Store this session to memory
</label>
<button id="forgetSession" class="danger-btn" disabled>Forget Session</button>
</div>
<div class="session-automation-section">
<button id="runSelectedAutomationNow" type="button">Run Selected Automation Now</button>
</div>
</details>
</details>
<details class="panel-group">
<summary>Assistant Setup</summary>
<div class="ai-provider-section">
<label for="aiProvider">AI Provider:</label>
<select id="aiProvider">
@@ -28,58 +83,100 @@
</select>
</div>
<div class="stt-settings-section">
<label for="sttProvider">Speech-to-Text Provider:</label>
<select id="sttProvider">
<option value="openai">OpenAI Whisper (recommended for Tab/Mixed)</option>
<option value="local">Local faster-whisper bridge (self-hosted)</option>
<option value="browser">Browser SpeechRecognition (mic-oriented)</option>
</select>
<label for="sttModel">STT Model:</label>
<select id="sttModel">
<option value="whisper-1">whisper-1</option>
<option value="small">small</option>
<option value="medium">medium</option>
<option value="large-v3">large-v3</option>
</select>
<label for="sttLanguageMode">Language Mode:</label>
<select id="sttLanguageMode">
<option value="auto">Auto-detect (recommended)</option>
<option value="forced">Force language</option>
</select>
<input type="text" id="sttForcedLanguage" placeholder="Forced language code (e.g. en, fr, de, ar)">
<label for="sttTask">STT Task:</label>
<select id="sttTask">
<option value="transcribe">Transcribe</option>
<option value="translate">Translate to English</option>
</select>
<label class="inline-toggle" for="sttVadFilter">
<input type="checkbox" id="sttVadFilter" checked>
Enable VAD filter
</label>
<label for="sttBeamSize">Beam Size:</label>
<input type="number" id="sttBeamSize" min="1" max="10" step="1" value="5">
<input type="url" id="sttEndpointInput" placeholder="Local STT endpoint (e.g. http://localhost:8790/transcribe)">
<input type="password" id="sttApiKeyInput" placeholder="Enter STT API key">
<button id="saveSttApiKey" type="button">Save STT API Key</button>
<button id="testSttConnection" type="button">Test STT Connection</button>
<div id="sttStatus" class="status-message"></div>
</div>
<div class="api-key-section">
<input type="password" id="apiKeyInput" placeholder="Enter your API Key here">
<button id="saveApiKey">Save API Key</button>
<div id="apiKeyStatus" class="status-message"></div>
</div>
<div class="capture-mode-section">
<label for="captureModeSelect">Capture mode:</label>
<select id="captureModeSelect">
<option value="tab">Tab-only (default)</option>
<option value="mic">Mic-only</option>
<option value="mixed">Mixed (experimental)</option>
</select>
</div>
<div class="active-state-section">
<label>
<input type="checkbox" id="extensionActiveToggle" checked>
Extension Active
</label>
</div>
<div class="overlay-visibility-section">
<label>
<input type="checkbox" id="autoOpenAssistantWindow">
Auto-open assistant window after Start Listening
</label>
<div class="status-message" id="sidepanelTip">
Tip: You can close this side panel while listening; the in-tab overlay will keep running.
</div>
</div>
<div class="mic-monitor-section">
<h4>🎙️ Mic Monitor</h4>
<label for="inputDeviceSelect">Input device:</label>
<select id="inputDeviceSelect"></select>
<div id="inputDeviceStatus" class="status-message"></div>
<div class="mic-level">
<div class="mic-level-bar" id="micLevelBar"></div>
</div>
<button id="startMicMonitor">Enable Mic Monitor</button>
</div>
<div class="performance-section">
<label>
<input type="checkbox" id="speedModeToggle">
Optimize for speed (faster, shorter answers)
</label>
</div>
</details>
<details class="panel-group">
<summary>Inputs</summary>
<div class="mic-monitor-section">
<h4>Mic Monitor</h4>
<label for="inputDeviceSelect">Input device:</label>
<select id="inputDeviceSelect"></select>
<div id="inputDeviceStatus" class="status-message"></div>
<div class="mic-level">
<div class="mic-level-bar" id="micLevelBar"></div>
</div>
<button id="startMicMonitor">Enable Mic Monitor</button>
</div>
</details>
<details class="panel-group">
<summary>Context</summary>
<div class="context-section">
<h4>📄 Context Management</h4>
<h4>Context Management</h4>
<div class="session-context-section">
<label for="contextProfileSelect">Editing Profile:</label>
<select id="contextProfileSelect"></select>
<div id="contextProfileHint" class="status-message"></div>
</div>
<details class="profile-manager">
<summary>Manage Profiles</summary>
<div class="profile-manager-content">
<input type="text" id="profileNameInput" placeholder="Profile name (e.g., Interview - Backend)">
<select id="profileModeSelect">
<option value="interview">Interview</option>
<option value="meeting">Meeting</option>
<option value="standup">Standup</option>
<option value="custom">Custom</option>
</select>
<textarea id="profilePromptInput" placeholder="System prompt for this profile..."></textarea>
<div class="profile-actions-row">
<button id="newProfileBtn" type="button">New Profile</button>
<button id="saveProfileBtn" type="button">Save Profile</button>
<button id="deleteProfileBtn" type="button" class="danger-btn">Delete Profile</button>
</div>
<div id="profileManagerStatus" class="status-message"></div>
</div>
</details>
<div class="context-tabs">
<button class="tab-button active" data-tab="upload">Upload Files</button>
<button class="tab-button" data-tab="text">Add Text</button>
@@ -87,9 +184,9 @@
</div>
<div id="uploadTab" class="tab-content active">
<input type="file" id="contextFileInput" multiple accept=".txt,.pdf,.doc,.docx" style="display: none;">
<button id="uploadContextBtn">📁 Upload CV/Job Description</button>
<div class="upload-info">Supports: PDF, DOC, DOCX, TXT</div>
<input type="file" id="contextFileInput" multiple accept=".txt,.pdf,.docx" style="display: none;">
<button id="uploadContextBtn">Upload CV/Job Description</button>
<div class="upload-info">Supports: PDF, DOCX, TXT (.doc is not supported)</div>
</div>
<div id="textTab" class="tab-content">
@@ -101,37 +198,42 @@
<option value="job_description">Job description</option>
</select>
<input type="text" id="contextTitleInput" placeholder="Context title (e.g., 'My CV', 'Job Description')">
<button id="addContextBtn">💾 Save Context</button>
<button id="addContextBtn">Save Context</button>
</div>
<div id="manageTab" class="tab-content">
<div id="contextList"></div>
<button id="clearAllContextBtn" class="danger-btn">🗑️ Clear All Context</button>
<button id="clearAllContextBtn" class="danger-btn">Clear All Context</button>
</div>
</div>
</details>
<details class="panel-group">
<summary>Multi-Device</summary>
<div class="device-section">
<h4>📱 Multi-Device Listening</h4>
<h4>Remote Listening</h4>
<div class="device-options">
<button id="enableRemoteListening">🌐 Enable Remote Access</button>
<button id="enableRemoteListening">Enable Remote Access</button>
<div id="remoteStatus" class="status-message"></div>
<div id="deviceInfo" class="device-info" style="display: none;">
<div>📱 <strong>Access from any device:</strong></div>
<div><strong>Access from any device:</strong></div>
<div class="access-url" id="accessUrl"></div>
<button id="copyUrlBtn">📋 Copy Link</button>
<button id="copyUrlBtn">Copy Link</button>
<div class="qr-code" id="qrCode"></div>
</div>
</div>
</div>
</details>
<button id="toggleListening">Start Listening</button>
<button id="showOverlay">Show Overlay</button>
<details class="panel-group">
<summary>Permissions</summary>
<button id="showOverlay" type="button">Show Transcription Overlay</button>
<button id="requestMicPermission">Request Microphone Permission</button>
<button id="grantTabAccess">Grant Tab Access</button>
<div id="overlayStatus" class="status-message"></div>
<div id="micPermissionStatus" class="status-message"></div>
<div id="tabAccessStatus" class="status-message"></div>
<div id="transcript"></div>
<div id="aiResponse"></div>
</details>
</div>
<script src="sidepanel.js"></script>
</body>

File diff suppressed because it is too large

style.css
View File

@@ -1,51 +1,348 @@
:root {
--ink: #0f172a;
--muted: #64748b;
--panel: #ffffff;
--panel-soft: #f8fafc;
--stroke: rgba(148, 163, 184, 0.28);
--accent: #0ea5e9;
--accent-2: #22d3ee;
--success: #22c55e;
--danger: #ef4444;
}
body {
width: 100%;
height: 100%;
padding: 20px;
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
background-color: #f0f4f8;
color: #333;
padding: 18px;
font-family: "Space Grotesk", "IBM Plex Sans", "Helvetica Neue", sans-serif;
background: radial-gradient(circle at top, #f0f9ff 0%, #e2e8f0 45%, #e8ecf3 100%);
color: var(--ink);
margin: 0;
box-sizing: border-box;
}
#app {
background-color: white;
border-radius: 8px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
background: linear-gradient(145deg, rgba(255, 255, 255, 0.96), rgba(248, 250, 252, 0.92));
border-radius: 16px;
border: 1px solid var(--stroke);
box-shadow: 0 18px 40px rgba(15, 23, 42, 0.12);
padding: 20px;
height: calc(100% - 40px);
min-width: 76vw;
height: calc(100% - 36px);
min-width: 0;
display: flex;
flex-direction: column;
gap: 16px;
}
.panel-group {
border: 1px solid var(--stroke);
border-radius: 14px;
background: var(--panel);
box-shadow: 0 10px 20px rgba(15, 23, 42, 0.06);
overflow: hidden;
}
.panel-group > summary {
list-style: none;
cursor: pointer;
padding: 12px 14px;
font-weight: 600;
color: var(--ink);
background: rgba(14, 165, 233, 0.06);
border-bottom: 1px solid transparent;
}
.panel-group > summary::-webkit-details-marker {
display: none;
}
.panel-group[open] > summary {
border-bottom-color: var(--stroke);
}
.panel-group > :not(summary) {
padding: 14px;
}
.app-header {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
margin-bottom: 4px;
}
h3 {
font-size: 22px;
margin-bottom: 20px;
color: #2c3e50;
text-align: center;
margin: 0;
color: var(--ink);
text-align: left;
}
input[type="password"], select {
input[type="password"],
input[type="text"],
input[type="number"],
input[type="url"],
select {
width: 100%;
padding: 10px;
margin-bottom: 15px;
border: 1px solid #ddd;
border-radius: 4px;
padding: 10px 12px;
margin-bottom: 12px;
border: 1px solid var(--stroke);
border-radius: 10px;
font-size: 14px;
background-color: white;
background-color: var(--panel);
color: var(--ink);
box-sizing: border-box;
}
.ai-provider-section, .model-selection, .api-key-section, .capture-mode-section, .performance-section, .overlay-visibility-section, .mic-monitor-section, .active-state-section {
margin-bottom: 20px;
input[type="password"]:focus,
input[type="text"]:focus,
input[type="number"]:focus,
input[type="url"]:focus,
select:focus,
textarea:focus {
outline: none;
border-color: rgba(14, 165, 233, 0.6);
box-shadow: 0 0 0 3px rgba(14, 165, 233, 0.12);
}
.ai-provider-section label, .model-selection label, .capture-mode-section label {
textarea {
width: 100%;
min-height: 90px;
padding: 10px;
border: 1px solid var(--stroke);
border-radius: 10px;
resize: vertical;
font-family: inherit;
margin-bottom: 10px;
background: var(--panel);
color: var(--ink);
box-sizing: border-box;
}
textarea.code-input {
font-family: "JetBrains Mono", "SFMono-Regular", Consolas, monospace;
}
.mcp-key-row {
display: flex;
gap: 8px;
align-items: center;
}
.automation-manager {
padding: 20px;
}
.automation-layout {
display: grid;
grid-template-columns: minmax(160px, 1fr) minmax(0, 2fr);
gap: 16px;
}
.automation-list {
display: flex;
flex-direction: column;
gap: 10px;
}
.automation-list button {
margin-bottom: 0;
}
.automation-item {
padding: 10px 12px;
border-radius: 10px;
border: 1px solid var(--stroke);
background: var(--panel-soft);
cursor: pointer;
font-size: 14px;
text-align: left;
}
.automation-item.active {
border-color: rgba(14, 165, 233, 0.6);
box-shadow: 0 0 0 2px rgba(14, 165, 233, 0.12);
}
.automation-editor {
display: flex;
flex-direction: column;
gap: 12px;
}
.automation-editor-fields {
display: flex;
flex-direction: column;
gap: 10px;
}
.placeholder-help {
padding: 10px;
border-radius: 10px;
border: 1px dashed var(--stroke);
background: var(--panel-soft);
}
.placeholder-list {
display: flex;
flex-wrap: wrap;
gap: 6px;
}
.placeholder-chip {
background: rgba(14, 165, 233, 0.1);
border: 1px solid rgba(14, 165, 233, 0.25);
padding: 4px 8px;
border-radius: 999px;
font-size: 12px;
color: var(--ink);
width: auto;
margin: 0;
box-shadow: none;
}
.placeholder-chip:hover {
transform: none;
box-shadow: none;
background: rgba(14, 165, 233, 0.2);
}
.profile-manager {
border: 1px solid var(--stroke);
border-radius: 10px;
background: var(--panel-soft);
margin-bottom: 12px;
}
.profile-manager > summary {
cursor: pointer;
list-style: none;
padding: 10px 12px;
font-weight: 600;
}
.profile-manager > summary::-webkit-details-marker {
display: none;
}
.profile-manager-content {
padding: 0 12px 12px 12px;
}
.profile-actions-row {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 8px;
}
.profile-actions-row button {
margin-bottom: 0;
font-size: 14px;
padding: 9px 10px;
}
.session-advanced {
margin-top: 6px;
border: 1px dashed var(--stroke);
border-radius: 10px;
background: var(--panel-soft);
}
.session-advanced > summary {
cursor: pointer;
list-style: none;
padding: 10px 12px;
font-weight: 600;
color: var(--ink);
}
.session-advanced > summary::-webkit-details-marker {
display: none;
}
.session-advanced[open] {
border-style: solid;
}
.session-advanced > :not(summary) {
padding: 0 12px 12px 12px;
}
@media (max-width: 720px) {
.automation-layout {
grid-template-columns: 1fr;
}
.profile-actions-row {
grid-template-columns: 1fr;
}
}
.mcp-key-row input[type="password"],
.mcp-key-row input[type="text"] {
margin-bottom: 0;
}
.mcp-key-row button {
width: auto;
padding: 8px 12px;
margin-bottom: 0;
font-size: 14px;
}
pre {
background: var(--panel-soft);
border: 1px solid var(--stroke);
border-radius: 10px;
padding: 12px;
max-height: 220px;
overflow: auto;
font-size: 12px;
color: var(--ink);
white-space: pre-wrap;
}
.ai-provider-section,
.model-selection,
.stt-settings-section,
.api-key-section,
.capture-mode-section,
.session-context-section,
.session-automation-section,
.performance-section,
.overlay-visibility-section,
.mic-monitor-section,
.active-state-section,
.session-summary-section,
.context-section,
.device-section {
margin: 0 0 14px 0;
padding: 0;
border-radius: 0;
border: none;
background: transparent;
box-shadow: none;
}
.panel-group > :not(summary) > div:last-child {
margin-bottom: 0;
}
.ai-provider-section label,
.model-selection label,
.stt-settings-section label,
.capture-mode-section label,
.session-context-section label,
.session-automation-section label {
display: block;
margin-bottom: 5px;
font-weight: 600;
color: #2c3e50;
color: var(--ink);
}
.stt-settings-section .inline-toggle {
display: flex;
align-items: center;
gap: 8px;
}
.performance-section label,
@@ -55,53 +352,69 @@ input[type="password"], select {
align-items: center;
gap: 8px;
font-weight: 600;
color: #2c3e50;
color: var(--ink);
}
.status-message {
font-size: 12px;
margin-top: 5px;
padding: 5px;
border-radius: 3px;
padding: 6px 8px;
border-radius: 8px;
background: var(--panel-soft);
color: var(--muted);
}
.status-message.success {
color: #27ae60;
background-color: #d5f4e6;
color: var(--success);
background-color: rgba(34, 197, 94, 0.12);
}
.status-message.error {
color: #e74c3c;
background-color: #fdf2f2;
color: var(--danger);
background-color: rgba(239, 68, 68, 0.12);
}
/* Context Management Styles */
.context-section, .device-section {
margin-bottom: 20px;
border: 1px solid #e0e6ed;
border-radius: 6px;
padding: 15px;
background-color: #f8fafc;
}
.mic-monitor-section {
border: 1px solid #e0e6ed;
border-radius: 6px;
padding: 15px;
background-color: #f8fafc;
}
.mic-monitor-section h4 {
.session-summary-section h4,
.mic-monitor-section h4,
.context-section h4,
.device-section h4 {
margin: 0 0 12px 0;
color: #2c3e50;
color: var(--ink);
font-size: 16px;
}
#sessionSummaryInput,
#contextTextInput {
width: 100%;
min-height: 90px;
padding: 10px;
border: 1px solid var(--stroke);
border-radius: 10px;
resize: vertical;
font-family: inherit;
margin-bottom: 10px;
background: var(--panel);
color: var(--ink);
}
#contextTextInput {
min-height: 100px;
}
.summary-save-toggle {
display: flex;
align-items: center;
gap: 8px;
font-weight: 600;
color: var(--ink);
margin-bottom: 10px;
}
.mic-level {
height: 8px;
width: 100%;
border-radius: 999px;
background-color: #e8eef5;
background-color: #e2e8f0;
overflow: hidden;
margin: 6px 0 12px 0;
}
@@ -109,20 +422,14 @@ input[type="password"], select {
.mic-level-bar {
height: 100%;
width: 0%;
background: linear-gradient(90deg, #2ecc71, #1abc9c);
background: linear-gradient(90deg, #22c55e, #0ea5e9);
transition: width 80ms linear;
}
.context-section h4, .device-section h4 {
margin: 0 0 15px 0;
color: #2c3e50;
font-size: 16px;
}
.context-tabs {
display: flex;
margin-bottom: 15px;
border-bottom: 1px solid #ddd;
border-bottom: 1px solid var(--stroke);
}
.tab-button {
@@ -132,18 +439,18 @@ input[type="password"], select {
cursor: pointer;
border-bottom: 2px solid transparent;
font-size: 14px;
color: #666;
color: var(--muted);
margin: 0;
width: auto;
}
.tab-button.active {
color: #3498db;
border-bottom-color: #3498db;
color: var(--accent);
border-bottom-color: var(--accent);
}
.tab-button:hover {
background-color: #f0f4f8;
background-color: rgba(14, 165, 233, 0.08);
}
.tab-content {
@@ -154,37 +461,24 @@ input[type="password"], select {
display: block;
}
#contextTextInput {
width: 100%;
min-height: 100px;
padding: 10px;
border: 1px solid #ddd;
border-radius: 4px;
resize: vertical;
font-family: inherit;
margin-bottom: 10px;
}
#contextTypeSelect {
width: 100%;
padding: 8px;
margin-bottom: 10px;
border: 1px solid #ddd;
border-radius: 4px;
background-color: white;
#contextTypeSelect,
#contextTitleInput,
#sessionSummaryInput {
background-color: var(--panel);
border: 1px solid var(--stroke);
border-radius: 10px;
}
#contextTypeSelect,
#contextTitleInput {
width: 100%;
padding: 8px;
margin-bottom: 10px;
border: 1px solid #ddd;
border-radius: 4px;
}
.upload-info {
font-size: 12px;
color: #666;
color: var(--muted);
margin-top: 5px;
}
@@ -193,10 +487,10 @@ input[type="password"], select {
justify-content: space-between;
align-items: center;
padding: 10px;
border: 1px solid #ddd;
border-radius: 4px;
border: 1px solid var(--stroke);
border-radius: 10px;
margin-bottom: 8px;
background-color: white;
background-color: var(--panel);
}
.context-item-info {
@@ -205,12 +499,12 @@ input[type="password"], select {
.context-item-title {
font-weight: 600;
color: #2c3e50;
color: var(--ink);
}
.context-item-preview {
font-size: 12px;
color: #666;
color: var(--muted);
margin-top: 2px;
}
@@ -227,14 +521,14 @@ input[type="password"], select {
}
.danger-btn {
background-color: #e74c3c !important;
background-color: var(--danger) !important;
color: #fff !important;
}
.danger-btn:hover {
background-color: #c0392b !important;
background-color: #dc2626 !important;
}
/* Device Section Styles */
.device-options {
display: flex;
flex-direction: column;
@@ -242,14 +536,14 @@ input[type="password"], select {
}
.access-url {
background-color: #f0f4f8;
background-color: var(--panel-soft);
padding: 10px;
border-radius: 4px;
border-radius: 10px;
font-family: monospace;
font-size: 12px;
word-break: break-all;
margin: 10px 0;
border: 1px solid #ddd;
border: 1px solid var(--stroke);
}
.qr-code {
@@ -258,81 +552,103 @@ input[type="password"], select {
}
.device-info {
background-color: white;
background-color: var(--panel);
padding: 15px;
border-radius: 4px;
border: 1px solid #ddd;
border-radius: 10px;
border: 1px solid var(--stroke);
margin-top: 10px;
}
button {
width: 100%;
padding: 10px;
margin-bottom: 15px;
background-color: #3498db;
color: white;
margin-bottom: 12px;
background: linear-gradient(135deg, var(--accent), var(--accent-2));
color: #0f172a;
border: none;
border-radius: 4px;
border-radius: 10px;
cursor: pointer;
font-size: 16px;
transition: background-color 0.3s ease;
transition: transform 0.2s ease, box-shadow 0.2s ease;
box-shadow: 0 10px 18px rgba(14, 165, 233, 0.18);
}
.icon-button {
width: auto;
padding: 6px 10px;
margin: 0;
font-size: 16px;
background-color: rgba(14, 165, 233, 0.1);
color: var(--ink);
border: 1px solid rgba(14, 165, 233, 0.3);
box-shadow: none;
}
.icon-button:hover {
background-color: rgba(14, 165, 233, 0.2);
}
button:hover {
background-color: #2980b9;
transform: translateY(-1px);
box-shadow: 0 12px 24px rgba(14, 165, 233, 0.24);
}
#saveApiKey {
background-color: #2ecc71;
background: linear-gradient(135deg, #22c55e, #16a34a);
color: #052e16;
}
#saveApiKey:hover {
background-color: #27ae60;
box-shadow: 0 12px 24px rgba(34, 197, 94, 0.28);
}
#transcript, #aiResponse {
#transcript,
#aiResponse {
margin-top: 15px;
border: 1px solid #ddd;
border: 1px solid var(--stroke);
padding: 15px;
min-height: 60px;
max-height: 150px;
overflow-y: auto;
border-radius: 4px;
border-radius: 10px;
font-size: 14px;
line-height: 1.5;
background: var(--panel);
}
#transcript {
background-color: #ecf0f1;
background: #eef6ff;
}
#aiResponse {
background-color: #e8f6fd;
background: #f0fdfa;
}
/* Scrollbar styling */
::-webkit-scrollbar {
width: 8px;
}
::-webkit-scrollbar-track {
background: #f1f1f1;
background: #e2e8f0;
}
::-webkit-scrollbar-thumb {
background: #888;
background: #94a3b8;
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: #555;
background: #64748b;
}
button:disabled {
background-color: #95a5a6;
background: #cbd5f5;
color: #475569;
box-shadow: none;
cursor: not-allowed;
}
button:disabled:hover {
background-color: #95a5a6;
transform: none;
box-shadow: none;
}