diff --git a/AI_PROVIDERS_GUIDE.md b/AI_PROVIDERS_GUIDE.md index 482943c..c780dc1 100644 --- a/AI_PROVIDERS_GUIDE.md +++ b/AI_PROVIDERS_GUIDE.md @@ -1,84 +1,66 @@ # AI Providers Guide -## Supported AI Providers +## Scope -Your AI Interview Assistant now supports multiple AI providers! Here's how to set up and use each one: +This guide covers **chat/response providers** used by the extension after transcription. -## 🤖 **OpenAI (GPT)** -- **Models Available**: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo -- **API Key**: Get from [OpenAI Platform](https://platform.openai.com/account/api-keys) -- **Recommended Model**: GPT-4o-mini (good balance of speed and quality) -- **Cost**: Pay per token usage +Note: Speech-to-text is configured separately in Assistant Setup (`STT Provider`, `STT Model`, language/task/VAD/beam settings). -## 🧠 **Anthropic (Claude)** -- **Models Available**: Claude-3.5-Sonnet, Claude-3.5-Haiku, Claude-3-Opus -- **API Key**: Get from [Anthropic Console](https://console.anthropic.com/) -- **Recommended Model**: Claude-3.5-Sonnet (excellent reasoning) -- **Cost**: Pay per token usage +## Supported Chat Providers -## 🔍 **Google (Gemini)** -- **Models Available**: Gemini-1.5-Pro, Gemini-1.5-Flash, Gemini-Pro -- **API Key**: Get from [Google AI Studio](https://aistudio.google.com/app/apikey) -- **Recommended Model**: Gemini-1.5-Flash (fast and efficient) -- **Cost**: Free tier available, then pay per token +### OpenAI +- Default models in UI: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo` +- API key: https://platform.openai.com/account/api-keys +- Good default: `gpt-4o-mini` (speed/cost balance) -## 🌊 **DeepSeek** -- **Models Available**: DeepSeek-Chat, DeepSeek-Reasoner -- **API Key**: Get from [DeepSeek Platform](https://platform.deepseek.com/) -- **Recommended Model**: DeepSeek-Chat (general use) -- **Cost**: Pay per token usage +### Anthropic +- Default models in UI: `claude-3-5-sonnet-20241022`, `claude-3-5-haiku-20241022`, `claude-3-opus-20240229` +- API key: https://console.anthropic.com/ +- Good default: `claude-3-5-sonnet-20241022` -## 🏠 **Ollama (Local)** -- **Models Available**: Llama3.2, Llama3.1, Mistral, CodeLlama, Phi3 -- **Setup**: Install [Ollama](https://ollama.ai/) locally -- **No API Key Required**: Runs completely on your machine -- **Cost**: Free (uses your computer's resources) +### Google Gemini +- Default models in UI: `gemini-1.5-pro`, `gemini-1.5-flash`, `gemini-pro` +- API key: https://aistudio.google.com/app/apikey +- Good default: `gemini-1.5-flash` -## 🚀 **How to Setup** +### DeepSeek +- Default models in UI: `deepseek-chat`, `deepseek-reasoner` +- API key: https://platform.deepseek.com/ +- Good default: `deepseek-chat` -### 1. **Choose Your Provider** -- Open the extension side panel -- Select your preferred AI provider from the dropdown +### Ollama (local) +- Default models in UI: `llama3.2`, `llama3.1`, `mistral`, `codellama`, `phi3` +- API key: not required +- Endpoint used by extension: `http://localhost:11434` -### 2. **Select Model** -- Choose the specific model you want to use -- Different models have different capabilities and speeds +## Model List Behavior -### 3. **Add API Key** (if required) -- Enter your API key for the selected provider -- Ollama doesn't require an API key -- Keys are stored securely in Chrome's storage +- For cloud providers, if an API key is saved, the extension attempts to fetch live model lists. +- If model fetch fails, the extension falls back to the built-in default model list above. 
+- For Ollama, the extension reads models from `/api/tags`. -### 4. **Start Using** -- Click "Start Listening" to begin audio capture -- The extension will use your selected AI provider for responses +## Setup Steps -## 💡 **Tips** +1. Open side panel -> `Assistant Setup`. +2. Choose `AI Provider`. +3. Save provider API key (not needed for Ollama). +4. Select model. +5. Start listening. -- **For Speed**: Use GPT-4o-mini, Gemini-1.5-Flash, or Claude-3.5-Haiku -- **For Quality**: Use GPT-4o, Claude-3.5-Sonnet, or Gemini-1.5-Pro -- **For Privacy**: Use Ollama (runs locally, no data sent to servers) -- **For Free Usage**: Try Google Gemini's free tier or set up Ollama +## Recommended Defaults -## 🔧 **Ollama Setup** +- Fastest general: `gpt-4o-mini` / `gemini-1.5-flash` / `claude-3-5-haiku-20241022` +- Highest quality: `gpt-4o` / `claude-3-5-sonnet-20241022` / `gemini-1.5-pro` +- Local-only privacy: `ollama` + local STT -If you want to use Ollama (local AI): +## Troubleshooting -1. Install Ollama from [ollama.ai](https://ollama.ai/) -2. Run: `ollama pull llama3.2` (or your preferred model) -3. Make sure Ollama is running: `ollama serve` -4. Select "Ollama (Local)" in the extension +- `API key not set`: save provider key in Assistant Setup. +- `Failed to fetch models`: key may be invalid, provider API unavailable, or network blocked. Default model list is used as fallback. +- `Ollama connection failed`: ensure `ollama serve` is running and model is pulled. +- Slow or expensive responses: switch to smaller/faster model and enable Speed mode. -## 🆘 **Troubleshooting** +## Storage Note -- **"API key not set"**: Make sure you've entered a valid API key -- **"Failed to connect"**: Check your internet connection (or Ollama service for local) -- **"Invalid API key"**: Verify your API key is correct and has sufficient credits -- **Slow responses**: Try switching to a faster model like GPT-4o-mini or Gemini-1.5-Flash - -## 🔒 **Privacy & Security** - -- API keys are stored locally in Chrome's secure storage -- Only the selected provider receives your audio transcriptions -- Ollama option keeps everything completely local -- No audio data is stored permanently +- Provider API keys are stored in extension sync storage (`chrome.storage.sync`). +- Keep least-privilege keys where possible and rotate keys regularly. diff --git a/NEW_FEATURES_GUIDE.md b/NEW_FEATURES_GUIDE.md index ea36f52..e39998d 100644 --- a/NEW_FEATURES_GUIDE.md +++ b/NEW_FEATURES_GUIDE.md @@ -10,7 +10,7 @@ Context management allows you to provide additional information (like your CV, j #### 1. **Upload Files** - Click the "Upload Files" tab in the Context Management section - Click "📁 Upload CV/Job Description" -- Select your files (supports TXT, PDF, DOC, DOCX) +- Select your files (supports TXT, PDF, DOCX) - Files will be automatically processed and saved #### 2. **Add Text Directly** @@ -44,7 +44,7 @@ AI Response: *"Based on your background, you have 3 years of Python experience a ## 📱 Multi-Device Listening ### What is Multi-Device Listening? -This feature allows you to use the AI Interview Assistant from other devices (phones, tablets, other computers) while keeping the main processing on your primary Chrome browser. +This feature allows you to use the AI Assistant from other devices (phones, tablets, other computers) while keeping the main processing on your primary Chrome browser. ### How to Enable Multi-Device Access @@ -94,7 +94,7 @@ This feature allows you to use the AI Interview Assistant from other devices (ph ### 1. 
**Reload the Extension** After the updates, reload the extension in Chrome: - Go to `chrome://extensions/` -- Find "AI Interview Assistant" +- Find "AI Assistant" - Click the reload button 🔄 ### 2. **Configure Context** @@ -177,4 +177,4 @@ After the updates, reload the extension in Chrome: 4. **Practice** - Use the enhanced features in mock interviews 5. **Customize** - Adjust context for different types of interviews -The AI Interview Assistant is now much more powerful and flexible. Use these features to get more personalized, relevant responses that truly reflect your background and the specific role you're interviewing for! +The AI Assistant is now much more powerful and flexible. Use these features to get more personalized, relevant responses that truly reflect your background and the specific role you're interviewing for! diff --git a/Plans_and_Todo.md b/Plans_and_Todo.md index 9f933b4..29d921d 100644 --- a/Plans_and_Todo.md +++ b/Plans_and_Todo.md @@ -1,78 +1,111 @@ # Personal Browser Companion - Plans & To-Do +## Classification +- Core = works in extension-only mode (no extra server required). +- Optional = requires extra server/services (MCP, cloud sync, external APIs) and is opt-in per user. + ## Goals -- Start local-first with an option to sync to cloud. -- Online-only operation (LLM required for decisions). -- Auto-start mode during meetings. -- Integrations: calendar, email, Discord, Nextcloud. +- [ ] [Core] Start local-first with an option to sync to cloud. +- [ ] [Core] Online-only operation (LLM required for decisions). +- [ ] [Core] Auto-start mode during meetings. +- [ ] [Optional] Integrations: calendar, email, Discord, Nextcloud. ## Phase Plan -### Phase 1: Local MVP (Foundation) -- Local storage for sessions, summaries, and user profile. -- Meeting/interview modes with manual start and overlay UI. -- Basic memory retrieval: recent session summaries + user profile. -- Audio capture + STT pipeline (mic + tab) and transcript display. -- Privacy controls: store/forget, per-session toggle. +### Phase 1: Local MVP (Foundation) [Core] +- [x] [Core] Local storage for sessions, summaries, and user profile. +- [x] [Core] Meeting/interview modes with manual start and overlay UI. +- [x] [Core] Basic memory retrieval: recent session summaries + user profile. +- [ ] [Core] Audio capture + STT pipeline (mic + tab) and transcript display. +- [x] [Core] Privacy controls: store/forget, per-session toggle. -### Phase 2: Smart Auto-Start -- Detect meeting tabs (Google Meet, Zoom, Teams) and prompt to start. -- Auto-start rules (domain allowlist, time-based, calendar hints). -- Lightweight on-device heuristics for meeting detection. +### Phase 2: Smart Auto-Start [Core] +- [ ] [Core] Detect meeting tabs (Google Meet, Zoom, Teams) and prompt to start. +- [ ] [Core] Auto-start rules (domain allowlist, time-based, calendar hints). +- [ ] [Core] Lightweight on-device heuristics for meeting detection. -### Phase 3: Cloud Sync (Optional) -- Opt-in cloud sync for memory + settings. -- Conflict resolution strategy (last-write wins + merge for summaries). -- Encryption at rest, user-controlled delete/export. +### Phase 3: Cloud Sync (Optional) [Optional] +- [ ] [Optional] Opt-in cloud sync for memory + settings. +- [ ] [Optional] Conflict resolution strategy (last-write wins + merge for summaries). +- [ ] [Optional] Encryption at rest, user-controlled delete/export. -### Phase 4: Integrations (MCP) -- Calendar: read upcoming meetings, attach context. -- Email: draft follow-ups, summaries. 
-- Discord: post meeting summary or action items to a channel. -- Nextcloud: store meeting notes, transcripts, and attachments. +### Phase 4: Integrations (MCP) [Optional] +- [ ] [Optional] Calendar: read upcoming meetings, attach context. +- [ ] [Optional] Email: draft follow-ups, summaries. +- [ ] [Optional] Discord: post meeting summary or action items to a channel. +- [ ] [Optional] Nextcloud: store meeting notes, transcripts, and attachments. ## MVP To-Do (Local) ### Core -- Define memory schema (profile, session, summary, action items). -- Implement local RAG: index summaries + profile into embeddings. -- Add session lifecycle: start, pause, end, summarize. +- [x] [Core] Define memory schema (profile, session, summary, action items). +- [x] [Core] Implement local RAG: index summaries + profile into embeddings. +- [x] [Core] Add session lifecycle: start, pause, end, summarize. ### Audio + STT -- Implement reliable STT for tab audio (server-side if needed). -- Keep mic-only STT as fallback. -- Add device selection + live mic monitor. +- [x] [Core] Implement reliable STT for tab audio (OpenAI Whisper chunk transcription from tab/mixed audio). +- [x] [Core] Keep mic-only STT as fallback. +- [x] [Core] Add device selection + live mic monitor. +- [x] [Core] Add separate STT settings (provider/model/API key) independent from chat provider. +- [x] [Optional] Add local STT bridge support (self-hosted faster-whisper endpoint). +- [x] [Core] Add STT "Test Connection" action in Assistant Setup. +- [x] [Core] Add multilingual STT controls (auto/forced language, task, VAD, beam size) with session language lock in auto mode. ### UI/UX -- Overlay controls: resize, hide/show, minimize. -- Auto-start toggle in side panel. -- Session summary view with “save to memory” toggle. +- [x] [Core] Overlay controls: resize, hide/show, minimize. +- [x] [Core] Auto-start toggle in side panel. +- [x] [Core] Session summary view with “save to memory” toggle. +- [x] [Core] Sidebar automation preset selector before Start Listening. +- [x] [Core] One-click session context selector before Start Listening. +- [x] [Core] Profile-scoped context loading to reduce cross-session prompt leakage. +- [x] [Core] Profile manager UI (create/edit/delete profile with mode + prompt). +- [ ] [Core] Import/export context profiles. ### Privacy -- Per-session storage consent prompt. -- “Forget session” button. +- [x] [Core] Per-session storage consent prompt. +- [x] [Core] “Forget session” button. + +### Advanced Settings (Core) +- [x] [Core] Open full settings window from side panel (⚙️). +- [x] [Core] Webhook test: send sample payload and show status. +- [x] [Core] MCP connection test (basic reachability). +- [x] [Core] Cloud endpoint validation (basic reachability). +- [ ] [Core] Automation framework: triggers + actions + approval flow. ## Integration To-Do (MCP) ### MCP Server Options -- Build a local MCP server as a bridge for integrations. -- Use MCP tool registry for calendar/email/Discord/Nextcloud. +- [ ] [Optional] Build a local MCP server as a bridge for integrations. +- [ ] [Optional] Use MCP tool registry for calendar/email/Discord/Nextcloud. + +### Automation (Rules Engine) +- [ ] [Core] Configure triggers (session start/end/manual, meeting domain filters). +- [ ] [Core] Configure actions per trigger (MCP tool + args). +- [ ] [Core] Approval mode: auto-send or review before send. +- [ ] [Core] Run actions on session end (hook into session lifecycle). +- [ ] [Core] Manual “Run Actions” button. 
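+
+A minimal sketch of one stored automation entry, using the field names from `getAutomationsWithMigration()` and `runAutomationByType()` in `background.js` (the webhook URL, MCP tool name, and args below are illustrative placeholders, not shipped defaults):
+
+```js
+// Example entry in advancedSettings.automations (chrome.storage.sync).
+const exampleAutomation = {
+  id: 'post-summary-on-end',
+  name: 'Post summary on session end',
+  kind: 'actions',                       // 'actions' runs the list below; 'standup' uses the standup flow
+  enabled: true,
+  triggers: { sessionStart: false, sessionEnd: true, manual: true },
+  requireApproval: true,                 // non-manual triggers prompt for approval before sending
+  actions: [
+    {
+      label: 'Send summary webhook',
+      type: 'webhook',                   // webhook actions render {{placeholders}} from the template context
+      webhookUrl: 'https://example.com/hooks/meeting-summary', // hypothetical endpoint
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      bodyTemplate: '{"message":"{{summary}}","date":"{{date}}"}',
+      retryCount: 1
+    },
+    {
+      label: 'Archive notes via MCP',
+      toolName: 'nextcloud_upload_note', // hypothetical MCP tool name; non-webhook actions call the MCP server directly
+      args: { path: 'notes/meeting.md' } // args are passed as-is (no template rendering) for MCP actions
+    }
+  ]
+};
+```
+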
### Calendar -- Read upcoming meetings and titles. -- Auto-attach relevant context packs. +- [ ] [Optional] Read upcoming meetings and titles. +- [ ] [Optional] Auto-attach relevant context packs. ### Email -- Generate follow-up drafts from summary + action items. +- [ ] [Optional] Generate follow-up drafts from summary + action items. ### Discord -- Post meeting summary/action items to a selected channel. +- [ ] [Optional] Post meeting summary/action items to a selected channel. ### Nextcloud -- Upload meeting notes and transcripts. +- [ ] [Optional] Upload meeting notes and transcripts. ## Open Questions -- Preferred cloud provider for sync? -- How long should session memories persist by default? -- Should auto-start be opt-in per domain or global? -- What data should be redacted before sync? +- [Core] How do we isolate interview vs meeting prompts/contexts safely? + - Best solution: Use explicit context profiles (e.g., Interview, Standup, Sales) with separate prompt + context store per profile, and require users to pick one profile before Start Listening. +- [Optional] Preferred cloud provider for sync? + - Best solution: Start with Supabase (Postgres + Auth + Storage) for fastest MVP, then add S3-compatible storage as an optional backend for enterprise/self-hosting. +- [Core] How long should session memories persist by default? + - Best solution: 90 days by default with per-session “keep forever” and a global retention slider (7/30/90/365 days). +- [Core] Should auto-start be opt-in per domain or global? + - Best solution: Opt-in per domain, with a one-click “trust this site” prompt on first detection. +- [Optional] What data should be redacted before sync? + - Best solution: Default to redacting emails, phone numbers, calendar IDs, and detected secrets (API keys/tokens) while letting users add custom redaction rules. diff --git a/README.md b/README.md index 2b7f40a..5c79810 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,66 @@ -# AI Interview Assistant Chrome Extension +# AI Assistant Chrome Extension ## Overview -The AI Interview Assistant is a Chrome extension designed to help users during interviews or meetings by providing real-time AI-powered responses to questions. It listens to the audio from the current tab, transcribes the speech, identifies questions, and generates concise answers using OpenAI's GPT model. +AI Assistant is a Chrome extension for live meeting/interview support. It captures audio, transcribes speech, and generates concise AI responses with configurable chat and STT providers. + +Current extension version: `1.1.0`
-
+[Screenshot: AI Assistant side panel]
+
+## Screenshots
+
+### Main side panel
+
+[Screenshot: Main side panel]
+
+### Advanced setup
+
+[Screenshot: Advanced settings]
+ +## Table of Contents + +- [Documentation Index](#documentation-index) +- [Quick Start (2 Minutes)](#quick-start-2-minutes) +- [Features](#features) +- [Installation](#installation) +- [Usage](#usage) +- [Custom Sessions (Context Profiles)](#custom-sessions-context-profiles) +- [Automation in Side Panel](#automation-in-side-panel) +- [Plans & Roadmap](#plans--roadmap) +- [Recent Improvements](#recent-improvements) +- [Privacy and Security](#privacy-and-security) +- [Troubleshooting](#troubleshooting) +- [Contributing](#contributing) +- [License](#license) +- [Disclaimer](#disclaimer) + +## Documentation Index + +Use this `README.md` as the main entrypoint. Additional docs: + +- Product roadmap and task tracking: `Plans_and_Todo.md` +- AI provider setup/details: `AI_PROVIDERS_GUIDE.md` +- New features and updates: `NEW_FEATURES_GUIDE.md` +- Local self-hosted STT bridge: `local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md` + +## Quick Start (2 Minutes) + +1. Load the extension in `chrome://extensions` (Developer Mode → Load unpacked). +2. Open the side panel and set **AI Provider**, **Model**, and **API key**. +3. In **Assistant Setup**, choose **Speech-to-Text Provider** (`OpenAI`, `Local faster-whisper`, or `Browser`). +4. Configure STT quality controls (`Language Mode`, optional `Forced language`, `Task`, `VAD`, `Beam size`). +5. Use **Test STT Connection** to validate STT endpoint/key. +6. In **Session Context**, pick a profile (or create one in **Context → Manage Profiles**). +7. (Optional) Pick an **Automation Preset**. +8. Click **Start Listening**. + ## Features - Real-time audio capture (tab, mic, or mixed mode) @@ -15,8 +68,12 @@ The AI Interview Assistant is a Chrome extension designed to help users during i - AI-powered responses with multiple providers (OpenAI, Anthropic, Google, DeepSeek, Ollama) - Persistent side panel interface - Secure API key storage -- Context management (upload or paste documents for better answers) +- Context profiles (prebuilt + custom) with profile-scoped context isolation +- Context management (upload or paste documents per profile) - Speed mode (faster, shorter responses) +- Automation preset selector in side panel (automatic or one selected automation) +- Separate STT settings (OpenAI Whisper, Browser STT, or local faster-whisper bridge) +- Multilingual STT controls (auto/forced language, task mode, VAD, beam size) - Multi-device demo mode for remote access - Overlay controls: drag, resize, minimize, detach, hide/show - Mic monitor with input device selection and live level meter @@ -38,21 +95,56 @@ The AI Interview Assistant is a Chrome extension designed to help users during i 4. Click on "Load unpacked" and select the directory containing the extension files. -5. The AI Interview Assistant extension should now appear in your list of installed extensions. +5. The AI Assistant extension should now appear in your list of installed extensions. ## Usage -1. Click on the AI Interview Assistant icon in the Chrome toolbar to open the side panel. +1. Click on the AI Assistant icon in the Chrome toolbar to open the side panel. -2. Enter your OpenAI API key in the provided input field and click "Save API Key". +2. Select your provider/model and save the provider API key. -3. Click "Start Listening" to begin capturing audio from the current tab. +3. 
In **Assistant Setup**, configure **Speech-to-Text Provider**: + - `OpenAI Whisper` for hosted tab/mixed transcription + - `Local faster-whisper bridge` for self-hosted STT (`local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md`) + - `Browser SpeechRecognition` for mic-oriented local recognition + - Tune multilingual/quality options: + - `Language Mode`: `Auto-detect` or `Force language` + - `Forced language`: language code (for example `en`, `fr`, `de`, `ar`) + - `Task`: `Transcribe` or `Translate to English` + - `VAD`: enable/disable silence filtering + - `Beam size`: decoding quality/performance tradeoff (default `5`) + - Click **Test STT Connection** before starting live capture -4. As questions are detected in the audio, they will appear in the "Transcript" section. +4. In **Session Context**, choose a profile (Interview/Standup/Sales or your custom profile). -5. AI-generated responses will appear in the "AI Response" section. +5. (Optional) In **Automation Preset**, choose: + - `Automatic` to run all enabled automations that match each trigger, or + - a single automation to run only that one for session start/end. -6. Click "Stop Listening" to end the audio capture. +6. Click **Start Listening** to begin capturing audio from the current tab. + +7. Click **Stop Listening** to end the audio capture. + +## Custom Sessions (Context Profiles) + +Custom session behavior is configured through **profiles**. + +1. Open side panel → **Context** → **Manage Profiles**. +2. Click **New Profile**. +3. Set: + - Profile name (for example: `Interview (Backend)` or `Meeting (Sales Discovery)`) + - Mode (`interview`, `meeting`, `standup`, or `custom`) + - System prompt (instructions specific to this profile) +4. Click **Save Profile**. +5. Back in **Session**, select that profile in **Session Context** before clicking **Start Listening**. + +Each profile uses its own scoped context store to reduce prompt/context leakage between use cases. + +## Automation in Side Panel + +- Use **Automation Preset** to choose how automations run for the current session. +- Use **Run Selected Automation Now** to manually test from the side panel. +- Use **Advanced Settings (⚙️)** for full automation editing (actions, MCP tools, webhook args, triggers, approval behavior). ## Plans & Roadmap @@ -76,11 +168,14 @@ The AI Interview Assistant is a Chrome extension designed to help users during i - Ensure you have granted the necessary permissions for the extension to access tab audio. - If you're not seeing responses, check that your API key is entered correctly and that you have sufficient credits on your OpenAI account. +- If local STT on a public domain keeps failing with `Invalid HTTP request received`, check protocol mismatch: + - `http://` endpoints on HSTS domains may be auto-upgraded to `https://` by Chrome. + - Use a proper HTTPS reverse proxy in front of the STT service, or use localhost/IP for plain HTTP testing. - For any issues, please check the Chrome developer console for error messages. ## Contributing -Contributions to the AI Interview Assistant are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests. +Contributions to the AI Assistant are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests. 
## License diff --git a/Screenshot-advanced.png b/Screenshot-advanced.png new file mode 100644 index 0000000..e289ae6 Binary files /dev/null and b/Screenshot-advanced.png differ diff --git a/assistant.html b/assistant.html index 35261d1..addd1f0 100644 --- a/assistant.html +++ b/assistant.html @@ -3,12 +3,12 @@ - AI Interview Assistant + AI Assistant
-

AI Interview Assistant

+

AI Assistant

Detached view
diff --git a/background.js b/background.js index 6bb2d79..7f89fad 100644 --- a/background.js +++ b/background.js @@ -2,7 +2,86 @@ const DEFAULT_AI_CONFIG = { provider: 'openai', model: 'gpt-4o-mini' }; const DEFAULT_CAPTURE_MODE = 'tab'; -const LISTENING_PROMPT = 'You are a helpful assistant that answers questions briefly and concisely during interviews. Provide clear, professional responses.'; +const DEFAULT_LISTENING_PROMPT = 'You are a helpful assistant that answers questions briefly and concisely during interviews. Provide clear, professional responses.'; +const CONTEXT_PROFILES_STORAGE_KEY = 'contextProfiles'; +const ACTIVE_CONTEXT_PROFILE_STORAGE_KEY = 'activeContextProfileId'; +const CONTEXTS_BY_PROFILE_STORAGE_KEY = 'contextsByProfile'; +const DEFAULT_CONTEXT_PROFILE_ID = 'interview_software'; +const DEFAULT_CONTEXT_PROFILES = [ + { + id: 'interview_software', + name: 'Interview (Software Development)', + mode: 'interview', + systemPrompt: 'You are an interview assistant for software development. Keep responses concise, technically correct, and structured.' + }, + { + id: 'meeting_standup', + name: 'Meeting (Daily Standup)', + mode: 'standup', + systemPrompt: 'You are a standup meeting assistant. Focus on updates, blockers, owners, and next steps.' + }, + { + id: 'meeting_sales', + name: 'Meeting (Sales Call)', + mode: 'meeting', + systemPrompt: 'You are a sales call assistant. Focus on customer needs, objections, commitments, and clear follow-up actions.' + } +]; +const DEFAULT_MODE_POLICIES = { + interview: { + systemPrompt: 'You are an interview assistant. Prioritize concise, high-signal answers tailored to technical interviews.', + maxGeneralItems: 4, + maxSystemItems: 2 + }, + meeting: { + systemPrompt: 'You are a meeting assistant. Prioritize clarity, decisions, risks, and concrete next steps.', + maxGeneralItems: 5, + maxSystemItems: 2 + }, + standup: { + systemPrompt: 'You are a standup assistant. Keep updates concise and action-oriented.', + maxGeneralItems: 4, + maxSystemItems: 2 + }, + custom: { + systemPrompt: DEFAULT_LISTENING_PROMPT, + maxGeneralItems: 4, + maxSystemItems: 2 + } +}; +const MEMORY_STORAGE_KEY = 'memoryStore'; +const MEMORY_SCHEMA_VERSION = 1; +const SESSION_STATUS = { + IDLE: 'idle', + ACTIVE: 'active', + PAUSED: 'paused', + ENDED: 'ended' +}; +const RAG_MIN_SCORE = 0.05; +const RAG_MAX_ITEMS = 3; +const STANDUP_PROMPT = `You are an assistant that produces daily standup summaries.\nReturn JSON only with keys:\nsummary (string), action_items (array of {text, assignee?}), blockers (array of strings), decisions (array of strings).\nKeep summary concise and action items clear.`; +const TRANSCRIPT_NOISE_PATTERNS = [ + /^:\w+:$/i, + /^click to react$/i, + /^add reaction$/i, + /^reply$/i, + /^forward$/i, + /^more$/i, + /^message\s+#/i +]; + +const createDefaultMemoryStore = () => ({ + version: MEMORY_SCHEMA_VERSION, + profile: { + name: '', + role: '', + notes: '', + updatedAt: null + }, + sessions: [], + summaries: [], + actionItems: [] +}); const AI_SERVICES = { openai: { @@ -16,7 +95,7 @@ const AI_SERVICES = { messages: [ { role: 'system', - content: `${LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` + content: `${options.systemPrompt || DEFAULT_LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` }, { role: 'user', content: question } ], @@ -38,7 +117,7 @@ const AI_SERVICES = { messages: [ { role: 'user', - content: `${LISTENING_PROMPT}${context ? 
`\n\nContext Information:\n${context}` : ''}\n\nQuestion: ${question}` + content: `${options.systemPrompt || DEFAULT_LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}\n\nQuestion: ${question}` } ] }), @@ -54,7 +133,7 @@ const AI_SERVICES = { role: 'system', parts: [ { - text: `${LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` + text: `${options.systemPrompt || DEFAULT_LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` } ] }, @@ -82,7 +161,7 @@ const AI_SERVICES = { messages: [ { role: 'system', - content: `${LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` + content: `${options.systemPrompt || DEFAULT_LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}` }, { role: 'user', content: question } ], @@ -98,7 +177,7 @@ const AI_SERVICES = { }), formatRequest: (model, question, context = '', options = {}) => ({ model, - prompt: `${LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}\n\nQuestion: ${question}\n\nAnswer:`, + prompt: `${options.systemPrompt || DEFAULT_LISTENING_PROMPT}${context ? `\n\nContext Information:\n${context}` : ''}\n\nQuestion: ${question}\n\nAnswer:`, stream: false, options: { temperature: options.temperature ?? 0.7, @@ -117,7 +196,16 @@ const state = { remoteServer: null, remoteServerPort: null, activeConnections: new Set(), - isActive: true + currentSessionId: null, + currentSessionStatus: SESSION_STATUS.IDLE, + currentSessionConsent: false, + pendingSessionConsent: null, + lastSessionId: null, + mcpInitialized: false, + pendingAutomation: null, + activeContextProfileId: DEFAULT_CONTEXT_PROFILE_ID, + activeAutomationId: null, + sttSessionLanguage: '' }; chrome.runtime.onMessage.addListener((request, sender, sendResponse) => { @@ -135,31 +223,42 @@ chrome.windows.onRemoved.addListener((windowId) => { }); initializeActiveState(); +initializeMemoryStore(); +initializeContextProfiles(); function handleMessage(request, _sender, sendResponse) { switch (request.action) { case 'startListening': - if (!state.isActive) { - chrome.runtime.sendMessage({ - action: 'updateAIResponse', - response: 'Extension is inactive. Turn it on in the side panel to start listening.' 
- }); - return false; - } if (request.aiProvider && request.model) { state.currentAIConfig = { provider: request.aiProvider, model: request.model }; } if (request.captureMode) { state.currentCaptureMode = request.captureMode; } + if (request.contextProfileId) { + state.activeContextProfileId = request.contextProfileId; + chrome.storage.sync.set({ [ACTIVE_CONTEXT_PROFILE_STORAGE_KEY]: request.contextProfileId }); + } + state.activeAutomationId = request.automationId || null; startListening(); return false; case 'stopListening': stopListening(); return false; + case 'pauseListening': + pauseListening(); + return false; + case 'resumeListening': + startListening(); + return false; case 'getAIResponse': getAIResponse(request.question); return false; + case 'transcribeAudioChunk': + transcribeAudioChunk(request.audioBase64, request.mimeType, request.captureMode) + .then((result) => sendResponse(result)) + .catch((error) => sendResponse({ success: false, error: error.message })); + return true; case 'startRemoteServer': startRemoteServer(request.sessionId, request.port, sendResponse); return true; @@ -175,8 +274,75 @@ function handleMessage(request, _sender, sendResponse) { case 'openAssistantWindow': openAssistantWindow(sendResponse); return true; - case 'setActiveState': - setActiveState(Boolean(request.isActive), sendResponse); + case 'openSettingsWindow': + openSettingsWindow(sendResponse); + return true; + case 'mcp:listTools': + listMcpTools(sendResponse); + return true; + case 'mcp:callTool': + callMcpTool(request.toolName, request.args, sendResponse); + return true; + case 'stt:testConnection': + testSttConnection() + .then((result) => sendResponse(result)) + .catch((error) => sendResponse({ success: false, error: error.message })); + return true; + case 'automation:run': + runAutomation(request.trigger || 'manual', request.automationId || null, { testMode: Boolean(request.testMode) }, sendResponse); + return true; + case 'automation:list': + getAutomationsWithMigration() + .then((automations) => sendResponse({ success: true, automations })) + .catch((error) => sendResponse({ success: false, error: error.message })); + return true; + case 'automation:approve': + approveAutomation(sendResponse); + return true; + case 'automation:reject': + rejectAutomation(sendResponse); + return true; + case 'updateTranscript': + appendTranscriptToCurrentSession(request.transcript); + return false; + case 'session:setConsent': + setCurrentSessionConsent(Boolean(request.consent)); + return false; + case 'session:forgetCurrent': + forgetCurrentSession(); + return false; + case 'session:getState': + sendResponse({ + sessionId: state.currentSessionId, + status: state.currentSessionStatus, + consent: state.currentSessionConsent, + lastSessionId: state.lastSessionId + }); + return true; + case 'session:saveSummary': + saveCurrentSessionSummary(request.content || '', Boolean(request.saveToMemory), request.sessionId).then((result) => + sendResponse(result) + ); + return true; + case 'memory:get': + getMemoryStore().then((store) => sendResponse({ success: true, store })); + return true; + case 'memory:updateProfile': + updateMemoryProfile(request.profile || {}).then((store) => sendResponse({ success: true, store })); + return true; + case 'memory:addSession': + addMemorySession(request.session || {}).then((session) => sendResponse({ success: true, session })); + return true; + case 'memory:addSummary': + addMemorySummary(request.summary || {}).then((summary) => sendResponse({ success: true, summary })); + return 
true; + case 'memory:addActionItems': + addMemoryActionItems(request.items || [], request.sessionId).then((items) => + sendResponse({ success: true, items }) + ); + return true; + case 'memory:clear': + clearMemoryStore().then((store) => sendResponse({ success: true, store })); return true; default: return false; @@ -184,6 +350,9 @@ function handleMessage(request, _sender, sendResponse) { } function startListening() { + state.sttSessionLanguage = ''; + ensureActiveSession(); + runAutomation('sessionStart', state.activeAutomationId, { testMode: false }, () => {}); if (state.currentCaptureMode === 'mic') { startMicListening(); return; @@ -312,6 +481,8 @@ function injectContentScriptAndStartCapture(tabId, streamId) { } function stopListening() { + endCurrentSession(); + runAutomation('sessionEnd', state.activeAutomationId, { testMode: false }, () => {}); chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => { if (chrome.runtime.lastError || tabs.length === 0) { console.error('Error querying tabs for stop:', chrome.runtime.lastError); @@ -329,6 +500,24 @@ function stopListening() { }); } +function pauseListening() { + pauseCurrentSession(); + chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => { + if (chrome.runtime.lastError || tabs.length === 0) { + console.error('Error querying tabs for pause:', chrome.runtime.lastError); + return; + } + + chrome.tabs.sendMessage(tabs[0].id, { action: 'stopCapture' }, () => { + if (chrome.runtime.lastError) { + console.error('Error pausing capture:', chrome.runtime.lastError); + return; + } + chrome.runtime.sendMessage({ action: 'updateAIResponse', response: 'Paused listening.' }); + }); + }); +} + function isQuestion(text) { const questionWords = ['what', 'when', 'where', 'who', 'why', 'how']; const lowerText = text.toLowerCase(); @@ -350,8 +539,12 @@ async function getAIResponse(question) { throw new Error(`Unsupported AI provider: ${provider}`); } - const contextData = await getStoredContexts(); - const { systemContexts, generalContexts } = selectContextsForRequest(contextData, speedMode); + const activeProfile = await getActiveContextProfile(); + const modePolicy = getModePolicy(activeProfile); + const contextData = await getStoredContexts(activeProfile.id); + const { systemContexts, generalContexts } = selectContextsForRequest(contextData, speedMode, modePolicy); + const memoryStore = await getMemoryStore(); + const memoryContext = buildMemoryContext(question, memoryStore, speedMode); const systemPromptExtra = systemContexts.length ? systemContexts.map((ctx) => `${ctx.title}:\n${ctx.content}`).join('\n\n---\n\n') @@ -382,11 +575,11 @@ async function getAIResponse(question) { } const mergedContextRaw = systemPromptExtra - ? `${systemPromptExtra}${contextString ? `\n\n---\n\n${contextString}` : ''}` - : contextString; + ? `${systemPromptExtra}${contextString ? `\n\n---\n\n${contextString}` : ''}${memoryContext ? `\n\n---\n\n${memoryContext}` : ''}` + : `${contextString}${memoryContext ? 
`\n\n---\n\n${memoryContext}` : ''}`; const mergedContext = truncateContext(mergedContextRaw, provider, speedMode); - const requestOptions = buildRequestOptions(speedMode); + const requestOptions = buildRequestOptions(speedMode, modePolicy); const body = JSON.stringify(service.formatRequest(model, question, mergedContext, requestOptions)); const controller = new AbortController(); @@ -416,7 +609,7 @@ async function getAIResponse(question) { } const data = await response.json(); - const answer = service.parseResponse(data); + const answer = normalizeGeneratedText(service.parseResponse(data)); chrome.runtime.sendMessage({ action: 'updateAIResponse', response: answer }); broadcastToRemoteDevices('aiResponse', { response: answer, question }); @@ -456,13 +649,15 @@ function truncateContext(context, provider, speedMode) { return `${context.slice(0, maxChars)}\n\n[Context truncated to fit model limits.]`; } -function selectContextsForRequest(contexts, speedMode) { +function selectContextsForRequest(contexts, speedMode, modePolicy = DEFAULT_MODE_POLICIES.custom) { const sorted = [...contexts].sort((a, b) => (b.createdAt || '').localeCompare(a.createdAt || '')); const systemContexts = sorted.filter((ctx) => ctx.type === 'system'); const generalContexts = sorted.filter((ctx) => ctx.type !== 'system'); - const maxGeneralItems = speedMode ? 2 : 4; - const maxSystemItems = speedMode ? 1 : 2; + const baseGeneralItems = Number(modePolicy.maxGeneralItems) || 4; + const baseSystemItems = Number(modePolicy.maxSystemItems) || 2; + const maxGeneralItems = speedMode ? Math.max(1, baseGeneralItems - 2) : baseGeneralItems; + const maxSystemItems = speedMode ? 1 : baseSystemItems; const maxItemChars = speedMode ? 4000 : 8000; const trimItem = (ctx) => ({ @@ -476,11 +671,78 @@ function selectContextsForRequest(contexts, speedMode) { }; } -function buildRequestOptions(speedMode) { +function buildRequestOptions(speedMode, modePolicy = DEFAULT_MODE_POLICIES.custom) { if (!speedMode) { - return { maxTokens: 200, temperature: 0.7 }; + return { + maxTokens: 200, + temperature: 0.7, + systemPrompt: modePolicy.systemPrompt || DEFAULT_LISTENING_PROMPT + }; + } + return { + maxTokens: 120, + temperature: 0.4, + systemPrompt: modePolicy.systemPrompt || DEFAULT_LISTENING_PROMPT + }; +} + +function normalizeGeneratedText(text) { + if (typeof text !== 'string') return text; + return text + .replace(/\r\n/g, '\n') + .replace(/\\r\\n/g, '\n') + .replace(/\\n/g, '\n') + .replace(/\\t/g, '\t'); +} + +function buildStandupRequest(provider, model, transcriptText, options, apiKey) { + switch (provider) { + case 'openai': + case 'deepseek': + return { + model, + messages: [ + { role: 'system', content: STANDUP_PROMPT }, + { role: 'user', content: transcriptText } + ], + max_tokens: options.maxTokens || 200, + temperature: options.temperature ?? 0.4 + }; + case 'anthropic': + return { + model, + max_tokens: options.maxTokens || 200, + messages: [ + { role: 'user', content: `${STANDUP_PROMPT}\n\nTranscript:\n${transcriptText}` } + ] + }; + case 'google': + return { + systemInstruction: { + role: 'system', + parts: [{ text: STANDUP_PROMPT }] + }, + contents: [ + { role: 'user', parts: [{ text: transcriptText }] } + ], + generationConfig: { + maxOutputTokens: options.maxTokens || 200, + temperature: options.temperature ?? 0.4 + } + }; + case 'ollama': + return { + model, + prompt: `${STANDUP_PROMPT}\n\nTranscript:\n${transcriptText}\n\nJSON:`, + stream: false, + options: { + temperature: options.temperature ?? 
0.4, + num_predict: options.maxTokens || 200 + } + }; + default: + return AI_SERVICES.openai.formatRequest(model, transcriptText, STANDUP_PROMPT, options); } - return { maxTokens: 120, temperature: 0.4 }; } function getSpeedModeFromStorage() { @@ -546,6 +808,824 @@ function openAssistantWindow(sendResponse) { ); } +function openSettingsWindow(sendResponse) { + chrome.windows.create( + { + url: chrome.runtime.getURL('settings.html'), + type: 'popup', + width: 900, + height: 720 + }, + (win) => { + if (chrome.runtime.lastError || !win) { + sendResponse({ success: false, error: 'Failed to open settings window.' }); + return; + } + sendResponse({ success: true }); + } + ); +} + +function getAdvancedSettings() { + return new Promise((resolve) => { + chrome.storage.sync.get(['advancedSettings'], (result) => { + resolve(result.advancedSettings || {}); + }); + }); +} + +function mcpRequest(serverUrl, apiKey, method, params = {}) { + const body = { + jsonrpc: '2.0', + id: Date.now(), + method, + params + }; + const headers = { 'Content-Type': 'application/json' }; + if (apiKey) { + headers['x-mcp-api-key'] = apiKey; + headers.Authorization = `Bearer ${apiKey}`; + } + return fetch(serverUrl, { + method: 'POST', + headers, + body: JSON.stringify(body) + }).then(async (response) => { + const data = await response.json().catch(() => ({})); + if (!response.ok) { + throw new Error(data.error?.message || `MCP request failed (${response.status}).`); + } + if (data.error) { + throw new Error(data.error.message || 'MCP error.'); + } + return data.result; + }); +} + +async function ensureMcpInitialized(serverUrl, apiKey) { + if (state.mcpInitialized) return; + try { + await mcpRequest(serverUrl, apiKey, 'initialize', { + clientInfo: { name: 'AI Assistant', version: '1.0' } + }); + state.mcpInitialized = true; + } catch (error) { + // MCP servers may not require initialize; ignore failures. + } +} + +async function listMcpTools(sendResponse) { + try { + const { mcpServerUrl, mcpApiKey } = await getAdvancedSettings(); + if (!mcpServerUrl) { + sendResponse({ success: false, error: 'MCP server URL is not set.' }); + return; + } + const endpoint = resolveMcpEndpoint(mcpServerUrl); + await ensureMcpInitialized(endpoint, mcpApiKey); + const result = await mcpRequest(endpoint, mcpApiKey, 'tools/list'); + sendResponse({ success: true, tools: result?.tools || [] }); + } catch (error) { + sendResponse({ success: false, error: error.message || 'Failed to list tools.' }); + } +} + +async function callMcpTool(toolName, args, sendResponse) { + try { + const { mcpServerUrl, mcpApiKey } = await getAdvancedSettings(); + if (!mcpServerUrl) { + sendResponse({ success: false, error: 'MCP server URL is not set.' }); + return; + } + if (!toolName) { + sendResponse({ success: false, error: 'Select a tool first.' }); + return; + } + const endpoint = resolveMcpEndpoint(mcpServerUrl); + await ensureMcpInitialized(endpoint, mcpApiKey); + const result = await mcpRequest(endpoint, mcpApiKey, 'tools/call', { + name: toolName, + arguments: args || {} + }); + sendResponse({ success: true, result }); + } catch (error) { + sendResponse({ success: false, error: error.message || 'Failed to call tool.' 
}); + } +} + +async function runAutomation(trigger, automationId, options, sendResponse) { + try { + const runOptions = options || { testMode: false }; + const automations = await getAutomationsWithMigration(); + const eligible = automations.filter((automation) => { + if (automationId && automation.id !== automationId) return false; + if (!automation.enabled) return false; + const triggers = automation.triggers || {}; + return Boolean(triggers[trigger]); + }); + + if (!eligible.length) { + sendResponse({ success: false, error: 'No automations match this trigger.' }); + return; + } + + const results = []; + for (const automation of eligible) { + if (automation.requireApproval && trigger !== 'manual' && !runOptions.testMode) { + if (state.pendingAutomation) continue; + state.pendingAutomation = { trigger, automation, options: runOptions }; + chrome.runtime.sendMessage({ + action: 'automation:needsApproval', + trigger, + automationName: automation.name || 'Automation', + actions: describeAutomationTargets(automation) + }); + continue; + } + const automationResult = await runAutomationByType(automation, runOptions); + results.push({ automationId: automation.id, ...automationResult }); + } + + sendResponse({ success: true, results }); + } catch (error) { + sendResponse({ success: false, error: error.message || 'Automation failed.' }); + } +} + +async function runStandupAutomationFor(automation, options) { + const standup = automation.standup || {}; + let aiResult; + let summaryText; + let session = null; + const now = new Date(); + const dateIso = now.toISOString().slice(0, 10); + const timeIso = now.toISOString().slice(11, 19); + const dateCompact = dateIso.replace(/-/g, ''); + const dateTimeIso = `${dateIso}_${timeIso.replace(/:/g, '-')}`; + const weekday = now.toLocaleDateString('en-US', { weekday: 'long' }); + const monthName = now.toLocaleDateString('en-US', { month: 'long' }); + const humanDate = now.toLocaleDateString('en-US', { + weekday: 'long', + year: 'numeric', + month: 'long', + day: 'numeric' + }); + if (options?.testMode) { + aiResult = buildStandupTestResult(); + summaryText = formatStandupText(aiResult, humanDate); + } else { + session = await getLatestSessionWithTranscript(); + if (!session || !session.transcript || session.transcript.length === 0) { + throw new Error('No transcript available for standup summary.'); + } + + const transcriptText = session.transcript.map((entry) => entry.text).join('\n'); + aiResult = await generateStandupSummary(transcriptText); + summaryText = formatStandupText(aiResult, humanDate); + } + + if (session && session.storeConsent) { + await addMemorySummary({ sessionId: session.id, content: summaryText }); + } + + const templateContext = { + summary: summaryText, + summary_brief: aiResult.summary || '', + summary_full: summaryText, + summary_json: JSON.stringify(aiResult, null, 2), + action_items: (aiResult.action_items || []).map((item) => `- ${item.text}${item.assignee ? 
` (owner: ${item.assignee})` : ''}`).join('\n'), + action_items_json: JSON.stringify(aiResult.action_items || [], null, 2), + blockers: (aiResult.blockers || []).map((item) => `- ${item}`).join('\n'), + decisions: (aiResult.decisions || []).map((item) => `- ${item}`).join('\n'), + date: dateIso, + date_compact: dateCompact, + datetime: dateTimeIso, + time: timeIso, + weekday, + month: monthName, + date_human: humanDate + }; + + const results = []; + + if (standup.discordToolName && standup.discordArgsTemplate) { + const args = buildTemplateArgs(standup.discordArgsTemplate, templateContext); + const result = await callMcpToolInternal(standup.discordToolName, args); + results.push({ target: 'discord', success: true, result }); + } + + if (standup.nextcloudToolName) { + const nextcloudTemplate = standup.nextcloudArgsTemplate || buildDefaultNextcloudTemplate(); + const args = buildTemplateArgs(nextcloudTemplate, templateContext); + const result = await callMcpToolInternal(standup.nextcloudToolName, args); + results.push({ target: 'nextcloud', success: true, result }); + } + + return { summary: aiResult, results }; +} + +function buildStandupTestResult() { + return { + summary: 'Test standup summary: Worked on automation UX, reviewed MCP integration, and fixed styling issues.', + action_items: [ + { text: 'Post standup summary to Discord', assignee: 'Automation' }, + { text: 'Save notes to Nextcloud', assignee: 'Automation' } + ], + blockers: [], + decisions: ['Proceed with automation manager list + editor layout.'] + }; +} + +async function getLatestSessionWithTranscript() { + const store = await getMemoryStore(); + const sessions = Array.isArray(store.sessions) ? store.sessions : []; + if (!sessions.length) return null; + const sorted = [...sessions].sort((a, b) => (b.startedAt || '').localeCompare(a.startedAt || '')); + return sorted.find((session) => Array.isArray(session.transcript) && session.transcript.length > 0) || sorted[0]; +} + +async function generateStandupSummary(transcriptText) { + const storedConfig = await getAIConfigFromStorage(); + if (storedConfig) { + state.currentAIConfig = storedConfig; + } + const { provider, model } = state.currentAIConfig; + const service = AI_SERVICES[provider]; + if (!service) { + throw new Error(`Unsupported AI provider: ${provider}`); + } + + let apiKey = null; + if (provider !== 'ollama') { + apiKey = await getApiKey(provider); + if (!apiKey) { + throw new Error(`${provider.charAt(0).toUpperCase() + provider.slice(1)} API key not set`); + } + } + + let url; + let headers; + if (provider === 'google') { + url = service.baseUrl(apiKey, model); + headers = service.headers(); + } else { + url = service.baseUrl; + headers = service.headers(apiKey); + } + + const sanitizedTranscript = sanitizeTranscriptForSummary(transcriptText); + const requestOptions = { + ...buildRequestOptions(false), + maxTokens: 600, + temperature: 0.2 + }; + const body = JSON.stringify(buildStandupRequest(provider, model, sanitizedTranscript, requestOptions, apiKey)); + + const response = await fetch(url, { + method: 'POST', + headers, + body + }); + + if (!response.ok) { + const errorText = await response.text(); + throw new Error(`Standup summary failed: ${errorText}`); + } + + const data = await response.json(); + const raw = normalizeGeneratedText(service.parseResponse(data)); + const parsed = parseJsonFromText(raw); + if (!parsed) { + return { + summary: sanitizeSummaryField(normalizeGeneratedText(raw).trim()), + action_items: [], + blockers: [], + decisions: [] + }; + } + 
const normalized = normalizeStandupResult(parsed); + return normalized; +} + +function sanitizeTranscriptForSummary(transcriptText) { + if (!transcriptText) return ''; + const lines = transcriptText + .split('\n') + .map((line) => normalizeGeneratedText(line).trim()) + .filter(Boolean) + .filter((line) => !TRANSCRIPT_NOISE_PATTERNS.some((pattern) => pattern.test(line))); + + return lines.join('\n').trim(); +} + +function sanitizeSummaryField(text) { + if (!text) return ''; + return String(text) + .replace(/\n{3,}/g, '\n\n') + .replace(/[ \t]{2,}/g, ' ') + .trim(); +} + +function normalizeStandupResult(parsed) { + const summary = sanitizeSummaryField(normalizeGeneratedText(parsed.summary || '')); + const actionItems = Array.isArray(parsed.action_items) + ? parsed.action_items + .map((item) => ({ + text: sanitizeSummaryField(normalizeGeneratedText(item?.text || '')), + assignee: sanitizeSummaryField(normalizeGeneratedText(item?.assignee || '')) + })) + .filter((item) => Boolean(item.text)) + : []; + const blockers = Array.isArray(parsed.blockers) + ? parsed.blockers + .map((item) => sanitizeSummaryField(normalizeGeneratedText(item))) + .filter(Boolean) + : []; + const decisions = Array.isArray(parsed.decisions) + ? parsed.decisions + .map((item) => sanitizeSummaryField(normalizeGeneratedText(item))) + .filter(Boolean) + : []; + + return { + summary, + action_items: actionItems, + blockers, + decisions + }; +} + +function parseJsonFromText(text) { + if (!text) return null; + try { + return JSON.parse(text); + } catch (error) { + const match = text.match(/\{[\s\S]*\}/); + if (!match) return null; + try { + return JSON.parse(match[0]); + } catch (inner) { + return null; + } + } +} + +function formatStandupText(result, humanDate = '') { + const summary = sanitizeSummaryField(result.summary || ''); + const actionItems = (result.action_items || []).map((item) => `- ${item.text}${item.assignee ? ` (owner: ${item.assignee})` : ''}`).join('\n'); + const blockers = (result.blockers || []).map((item) => `- ${item}`).join('\n'); + const decisions = (result.decisions || []).map((item) => `- ${item}`).join('\n'); + return [ + '## Meeting Summary', + humanDate ? `Date: ${humanDate}` : '', + '', + summary || '- None', + '', + '### Action Items', + actionItems || '- None', + '', + '### Blockers', + blockers || '- None', + '', + '### Decisions', + decisions || '- None' + ].join('\n'); +} + +function buildDefaultNextcloudTemplate() { + return JSON.stringify( + { + path: 'notes/daily/standup-{{date}}.txt', + content: + 'Daily Standup - {{date_human}}\n\nSummary\n{{summary_full}}\n\nAction Items\n{{action_items}}\n\nBlockers\n{{blockers}}\n\nDecisions\n{{decisions}}' + }, + null, + 2 + ); +} + +function buildTemplateArgs(template, context) { + try { + const parsedTemplate = JSON.parse(template); + return applyTemplateToValue(parsedTemplate, context); + } catch (error) { + throw new Error('Standup args template must be valid JSON.'); + } +} + +function applyTemplateToValue(value, context) { + if (typeof value === 'string') { + return renderTemplateString(value, context); + } + if (Array.isArray(value)) { + return value.map((item) => applyTemplateToValue(item, context)); + } + if (value && typeof value === 'object') { + const next = {}; + Object.keys(value).forEach((key) => { + next[key] = applyTemplateToValue(value[key], context); + }); + return next; + } + return value; +} + +function renderTemplateString(template, context) { + return Object.keys(context).reduce((result, key) => { + const value = context[key] ?? 
''; + return result.split(`{{${key}}}`).join(String(value)); + }, template); +} +async function approveAutomation(sendResponse) { + if (!state.pendingAutomation) { + sendResponse({ success: false, error: 'No pending automation.' }); + return; + } + const { trigger, automation, options } = state.pendingAutomation; + state.pendingAutomation = null; + const result = await runAutomationByType(automation, options || { testMode: false }); + sendResponse({ success: true, trigger, result }); +} + +function rejectAutomation(sendResponse) { + if (!state.pendingAutomation) { + sendResponse({ success: false, error: 'No pending automation.' }); + return; + } + const trigger = state.pendingAutomation.trigger; + state.pendingAutomation = null; + sendResponse({ success: true, trigger }); +} + +async function runAutomationByType(automation, options) { + if (automation.kind === 'standup') { + const result = await runStandupAutomationFor(automation, options || { testMode: false }); + return { success: true, kind: 'standup', result }; + } + const actions = Array.isArray(automation.actions) ? automation.actions : []; + if (!actions.length) { + return { success: false, error: 'No actions configured.' }; + } + const templateContext = await buildAutomationTemplateContext(); + const results = []; + for (const action of actions) { + try { + let result; + if (action.type === 'webhook') { + result = await executeWebhookAction(action, templateContext); + } else { + result = await callMcpToolInternal(action.toolName, action.args || {}); + } + results.push({ action: action.label, success: true, result }); + } catch (error) { + results.push({ action: action.label, success: false, error: error.message }); + } + } + return { success: true, kind: 'actions', results }; +} + +function describeAutomationTargets(automation) { + if (automation.kind === 'standup') { + const targets = []; + if (automation.standup?.discordToolName) targets.push({ label: 'Discord', toolName: automation.standup.discordToolName }); + if (automation.standup?.nextcloudToolName) targets.push({ label: 'Nextcloud', toolName: automation.standup.nextcloudToolName }); + return targets; + } + return (automation.actions || []).map((action) => ({ + label: action.label, + toolName: action.type === 'webhook' ? 
'webhook' : action.toolName + })); +} + +async function getAutomationsWithMigration() { + const settings = await getAdvancedSettings(); + if (Array.isArray(settings.automations) && settings.automations.length > 0) { + return settings.automations; + } + const automations = []; + if (settings.automation) { + automations.push({ + id: 'legacy-actions', + name: 'Automation Actions', + kind: 'actions', + enabled: Boolean(settings.automation.enabled), + triggers: settings.automation.triggers || { sessionStart: false, sessionEnd: true, manual: true }, + requireApproval: settings.automation.requireApproval !== false, + actions: settings.automation.actions || [] + }); + } + if (settings.standupAutomation) { + automations.push({ + id: 'legacy-standup', + name: 'Daily Standup', + kind: 'standup', + enabled: Boolean(settings.standupAutomation.enabled), + triggers: settings.standupAutomation.triggers || { sessionEnd: true, manual: true }, + requireApproval: false, + standup: { + discordToolName: settings.standupAutomation.discordToolName || '', + discordArgsTemplate: settings.standupAutomation.discordArgsTemplate || '', + nextcloudToolName: settings.standupAutomation.nextcloudToolName || '', + nextcloudArgsTemplate: settings.standupAutomation.nextcloudArgsTemplate || '' + } + }); + } + if (automations.length) { + const nextSettings = { ...settings, automations }; + chrome.storage.sync.set({ advancedSettings: nextSettings }); + } + return automations; +} + +async function callMcpToolInternal(toolName, args) { + const { mcpServerUrl, mcpApiKey } = await getAdvancedSettings(); + if (!mcpServerUrl) { + throw new Error('MCP server URL is not set.'); + } + const endpoint = resolveMcpEndpoint(mcpServerUrl); + await ensureMcpInitialized(endpoint, mcpApiKey); + return mcpRequest(endpoint, mcpApiKey, 'tools/call', { + name: toolName, + arguments: args || {} + }); +} + +async function buildAutomationTemplateContext() { + const now = new Date(); + const dateIso = now.toISOString().slice(0, 10); + const timeIso = now.toISOString().slice(11, 19); + const dateTimeIso = `${dateIso}_${timeIso.replace(/:/g, '-')}`; + const humanDate = now.toLocaleDateString('en-US', { + weekday: 'long', + year: 'numeric', + month: 'long', + day: 'numeric' + }); + const latestSummary = await getLatestSummaryText(); + const structuredSummary = ensureStructuredReport(latestSummary.text, latestSummary.excerpt || '', humanDate); + return { + date: dateIso, + time: timeIso, + datetime: dateTimeIso, + date_human: humanDate, + summary: structuredSummary, + summary_full: structuredSummary, + summary_plain: latestSummary.text, + summary_source: latestSummary.source, + transcript_excerpt: latestSummary.excerpt || '', + session_id: latestSummary.sessionId || '' + }; +} + +async function getLatestSummaryText() { + const store = await getMemoryStore(); + const summaries = Array.isArray(store.summaries) ? store.summaries : []; + if (summaries.length) { + const sorted = [...summaries].sort((a, b) => (b.createdAt || '').localeCompare(a.createdAt || '')); + const latest = sorted.find((item) => sanitizeSummaryField(item?.content || '')); + if (latest) { + return { + text: sanitizeSummaryField(latest.content || ''), + source: 'memory_summary', + sessionId: latest.sessionId || '' + }; + } + } + + const sessions = Array.isArray(store.sessions) ? 
store.sessions : []; + if (sessions.length) { + const preferredId = state.lastSessionId || state.currentSessionId || null; + const byRecent = [...sessions].sort((a, b) => { + const aDate = a.endedAt || a.startedAt || a.createdAt || ''; + const bDate = b.endedAt || b.startedAt || b.createdAt || ''; + return bDate.localeCompare(aDate); + }); + const preferred = preferredId ? sessions.find((session) => session.id === preferredId) : null; + const session = preferred || byRecent[0]; + + if (session?.summaryId) { + const linked = summaries.find((item) => item.id === session.summaryId); + const linkedText = sanitizeSummaryField(linked?.content || ''); + if (linkedText) { + return { + text: linkedText, + source: 'session_summary', + sessionId: session.id || '' + }; + } + } + + const transcriptEntries = Array.isArray(session?.transcript) ? session.transcript : []; + const transcriptLines = transcriptEntries + .map((entry) => sanitizeSummaryField(normalizeGeneratedText(entry?.text || ''))) + .filter(Boolean) + .filter((line) => !TRANSCRIPT_NOISE_PATTERNS.some((pattern) => pattern.test(line))); + + if (transcriptLines.length) { + const excerptLines = transcriptLines.slice(-8); + const excerpt = excerptLines.join('\n'); + const summary = sanitizeSummaryField(excerptLines.join(' ')).slice(0, 1600); + return { + text: summary || 'Session transcript is available, but could not be summarized.', + source: 'session_transcript', + excerpt, + sessionId: session.id || '' + }; + } + } + + return { + text: 'Session ended. No summary content was captured yet.', + source: 'fallback', + sessionId: state.lastSessionId || '' + }; +} + +function ensureStructuredReport(summaryText, transcriptExcerpt = '', humanDate = '') { + const text = sanitizeSummaryField(normalizeGeneratedText(summaryText || '')); + if (text.startsWith('## Standup Summary') || text.startsWith('## Session Summary') || text.startsWith('## Meeting Summary')) { + return normalizeMarkdownReport(text, humanDate); + } + + const excerptLines = sanitizeSummaryField(normalizeGeneratedText(transcriptExcerpt || '')) + .split('\n') + .map((line) => line.trim()) + .filter(Boolean) + .slice(-6); + const keyNotes = excerptLines.length + ? excerptLines.map((line) => `- ${line}`).join('\n') + : '- None captured.'; + + const summaryBody = text || 'No summary content was captured yet.'; + return [ + '## Meeting Summary', + humanDate ? `Date: ${humanDate}` : '', + '', + summaryBody, + '', + '### Action Items', + '- None reported.', + '', + '### Blockers', + '- None reported.', + '', + '### Decisions', + '- None reported.', + '', + '### Key Notes', + keyNotes + ].join('\n'); +} + +function normalizeMarkdownReport(reportText, humanDate = '') { + if (!reportText) return ''; + let text = String(reportText).replace(/\r\n/g, '\n'); + + // Recover common flattened markdown patterns. 
+ text = text + .replace(/(##\s+[^\n#]+?)\s+(?=###\s+)/g, '$1\n\n') + .replace(/(##\s+[^\n#]+?)\s+(?=-\s+)/g, '$1\n\n') + .replace(/\s+(###\s+)/g, '\n\n$1') + .replace(/\s+-\s+/g, '\n- '); + + return canonicalizeReportSections(text, humanDate); +} + +function canonicalizeReportSections(text, humanDate = '') { + const lines = String(text || '') + .split('\n') + .map((line) => line.trim()) + .filter((line) => Boolean(line)); + + // Repair a common split artifact: "Standup Summar," + "y" + for (let i = 0; i < lines.length - 1; i += 1) { + if (/^standup summar,?$/i.test(lines[i]) && /^y$/i.test(lines[i + 1])) { + lines[i] = 'Meeting Summary'; + lines.splice(i + 1, 1); + break; + } + } + + const output = []; + let currentSection = 'summary'; + + const pushSection = (title, level = 3) => { + if (output.length) output.push(''); + output.push(`${'#'.repeat(level)} ${title}`); + output.push(''); + }; + + const sectionForLine = (line) => { + const normalized = line.replace(/[,:;\-]+$/g, '').toLowerCase(); + if (normalized === 'standup summary' || normalized === 'session summary' || normalized === 'meeting summary') { + return { key: 'summary', title: 'Meeting Summary', level: 2 }; + } + if (normalized === 'action items') return { key: 'actions', title: 'Action Items', level: 3 }; + if (normalized === 'blockers') return { key: 'blockers', title: 'Blockers', level: 3 }; + if (normalized === 'decisions') return { key: 'decisions', title: 'Decisions', level: 3 }; + if (normalized === 'key notes') return { key: 'notes', title: 'Key Notes', level: 3 }; + return null; + }; + + // Ensure report always starts with top heading. + const firstSection = sectionForLine(lines[0] || ''); + if (!firstSection || firstSection.key !== 'summary') { + pushSection('Meeting Summary', 2); + } + if (humanDate) { + output.push(`Date: ${humanDate}`); + output.push(''); + } + + for (const rawLine of lines) { + const line = rawLine.replace(/\s+/g, ' ').replace(/[,:;]+$/g, '').trim(); + if (!line) continue; + + const nextSection = sectionForLine(line); + if (nextSection) { + currentSection = nextSection.key; + pushSection(nextSection.title, nextSection.level); + continue; + } + + if (currentSection === 'summary') { + output.push(line); + } else if (/^- /.test(line)) { + output.push(line); + } else { + output.push(`- ${line}`); + } + } + + return output + .join('\n') + .replace(/\n{3,}/g, '\n\n') + .trim(); +} + +async function executeWebhookAction(action, context) { + const settings = await getAdvancedSettings(); + const url = action.webhookUrl || settings.webhookUrl; + if (!url) { + throw new Error('Webhook action missing URL and no global webhook URL is configured.'); + } + + const method = action.method || 'POST'; + const headers = { ...(action.headers || {}) }; + const hasExplicitContentType = Boolean(headers['Content-Type'] || headers['content-type']); + const retryValue = Number(action.retryCount); + const retryCount = Number.isFinite(retryValue) ? Math.max(0, Math.min(5, Math.floor(retryValue))) : 0; + + const templateSource = action.bodyTemplate || settings.webhookPayload || '{"message":"{{summary}}","date":"{{date}}"}'; + let bodyToSend = templateSource; + + try { + // Parse template JSON first, then apply placeholders to values to avoid broken escaping. 
+ const parsedTemplate = JSON.parse(templateSource); + const templatedObject = applyTemplateToValue(parsedTemplate, context || {}); + bodyToSend = JSON.stringify(templatedObject); + if (!hasExplicitContentType) { + headers['Content-Type'] = 'application/json'; + } + } catch (error) { + // Non-JSON template: render as plain text payload. + bodyToSend = renderTemplateString(templateSource, context || {}); + if (!hasExplicitContentType) { + headers['Content-Type'] = 'text/plain'; + } + } + + let lastError = null; + for (let attempt = 0; attempt <= retryCount; attempt += 1) { + try { + const response = await fetch(url, { + method, + headers, + body: bodyToSend + }); + const text = await response.text().catch(() => ''); + if (!response.ok) { + throw new Error(`Webhook responded with ${response.status}${text ? `: ${text}` : ''}`); + } + return { + status: response.status, + response: text + }; + } catch (error) { + lastError = error; + } + } + + throw lastError || new Error('Webhook action failed.'); +} + +function resolveMcpEndpoint(rawUrl) { + if (!rawUrl) return ''; + const trimmed = rawUrl.replace(/\/$/, ''); + if (trimmed.endsWith('/mcp')) return trimmed; + return `${trimmed}/mcp`; +} + function getAIConfigFromStorage() { return new Promise((resolve) => { chrome.storage.sync.get(['aiProvider', 'selectedModel'], (result) => { @@ -573,9 +1653,265 @@ function getApiKey(provider) { }); } -function getStoredContexts() { +function getSttConfigFromStorage() { return new Promise((resolve) => { - chrome.storage.local.get('contexts', (result) => { + chrome.storage.sync.get( + ['sttProvider', 'sttModel', 'sttApiKeys', 'apiKeys', 'sttEndpoint', 'sttLanguageMode', 'sttForcedLanguage', 'sttTask', 'sttVadFilter', 'sttBeamSize'], + (result) => { + const provider = result.sttProvider || 'openai'; + const model = result.sttModel || 'whisper-1'; + const sttApiKeys = result.sttApiKeys || {}; + const apiKeys = result.apiKeys || {}; + const apiKey = sttApiKeys[provider] || (provider === 'openai' ? apiKeys.openai : ''); + const endpoint = result.sttEndpoint || 'http://localhost:8790/transcribe'; + const languageMode = result.sttLanguageMode || 'auto'; + const forcedLanguage = String(result.sttForcedLanguage || '').trim().toLowerCase(); + const task = result.sttTask || 'transcribe'; + const vadFilter = result.sttVadFilter !== false; + const beamSize = Math.min(10, Math.max(1, Number(result.sttBeamSize) || 5)); + const language = languageMode === 'forced' && forcedLanguage + ? forcedLanguage + : (languageMode === 'auto' && state.sttSessionLanguage ? state.sttSessionLanguage : ''); + + resolve({ provider, model, apiKey, endpoint, languageMode, forcedLanguage, language, task, vadFilter, beamSize }); + } + ); + }); +} + +function normalizeLocalSttEndpoint(rawEndpoint) { + if (!rawEndpoint) return 'http://localhost:8790/transcribe'; + const trimmed = rawEndpoint.replace(/\/$/, ''); + if (trimmed.endsWith('/transcribe')) return trimmed; + return `${trimmed}/transcribe`; +} + +async function transcribeWithLocalBridge(sttConfig, audioBase64, mimeType, captureMode) { + const endpoint = normalizeLocalSttEndpoint(sttConfig.endpoint); + const headers = {}; + if (sttConfig.apiKey) { + headers.Authorization = `Bearer ${sttConfig.apiKey}`; + headers['x-api-key'] = sttConfig.apiKey; + } + + const blob = decodeBase64ToBlob(audioBase64, mimeType || 'audio/webm'); + const extension = (mimeType || 'audio/webm').includes('mp4') ? 
'mp4' : 'webm'; + const formData = new FormData(); + formData.append('file', blob, `chunk.${extension}`); + formData.append('task', sttConfig.task || 'transcribe'); + formData.append('vad_filter', String(Boolean(sttConfig.vadFilter))); + formData.append('beam_size', String(sttConfig.beamSize || 5)); + if (sttConfig.language) { + formData.append('language', sttConfig.language); + } + if (sttConfig.model) { + formData.append('model', sttConfig.model); + } + + const response = await fetch(endpoint, { + method: 'POST', + headers, + body: formData + }); + + if (!response.ok) { + const errorText = await response.text(); + return { success: false, error: `Local STT bridge failed (${response.status}): ${errorText}` }; + } + + const data = await response.json(); + const transcript = normalizeGeneratedText((data.text || data.transcript || '').trim()); + return { success: true, transcript, language: data.language || '' }; +} + +async function testSttConnection() { + const sttConfig = await getSttConfigFromStorage(); + + if (sttConfig.provider === 'browser') { + return { success: true, message: 'Browser STT selected. No remote connection required.' }; + } + + if (sttConfig.provider === 'local') { + const endpoint = normalizeLocalSttEndpoint(sttConfig.endpoint); + const healthEndpoint = endpoint.replace(/\/transcribe$/, '/health'); + const response = await fetch(healthEndpoint, { method: 'GET' }); + if (!response.ok) { + const text = await response.text(); + return { success: false, error: `Local STT health check failed (${response.status}): ${text}` }; + } + return { success: true, message: `Local STT bridge reachable at ${healthEndpoint}` }; + } + + if (sttConfig.provider === 'openai') { + if (!sttConfig.apiKey) { + return { success: false, error: 'Missing OpenAI STT API key.' }; + } + const response = await fetch('https://api.openai.com/v1/models', { + method: 'GET', + headers: { + Authorization: `Bearer ${sttConfig.apiKey}` + } + }); + if (!response.ok) { + const text = await response.text(); + return { success: false, error: `OpenAI STT check failed (${response.status}): ${text}` }; + } + return { success: true, message: 'OpenAI STT connection successful.' }; + } + + return { success: false, error: `Unsupported STT provider: ${sttConfig.provider}` }; +} + +function decodeBase64ToBlob(base64Audio, mimeType = 'audio/webm') { + const binaryString = atob(base64Audio || ''); + const bytes = new Uint8Array(binaryString.length); + for (let i = 0; i < binaryString.length; i += 1) { + bytes[i] = binaryString.charCodeAt(i); + } + return new Blob([bytes], { type: mimeType }); +} + +async function transcribeAudioChunk(audioBase64, mimeType, captureMode) { + if (!audioBase64) { + return { success: false, error: 'No audio chunk provided.' }; + } + + const sttConfig = await getSttConfigFromStorage(); + if (sttConfig.provider === 'browser') { + return { success: false, error: 'Browser STT is selected. Switch STT provider to OpenAI for true tab/mixed transcription.' 
}; + } + if (sttConfig.provider === 'local') { + const localResult = await transcribeWithLocalBridge(sttConfig, audioBase64, mimeType, captureMode); + if (!localResult.success) return localResult; + const localTranscript = localResult.transcript; + const localLanguage = String(localResult.language || '').trim().toLowerCase(); + if (!localTranscript) { + return { success: true, transcript: '' }; + } + if (sttConfig.languageMode === 'auto' && !state.sttSessionLanguage && localLanguage) { + state.sttSessionLanguage = localLanguage; + } + chrome.runtime.sendMessage({ action: 'updateTranscript', transcript: localTranscript }); + appendTranscriptToCurrentSession(localTranscript); + if (isQuestion(localTranscript)) { + getAIResponse(localTranscript); + } + return { success: true, transcript: localTranscript }; + } + if (sttConfig.provider !== 'openai') { + return { success: false, error: `Unsupported STT provider: ${sttConfig.provider}` }; + } + if (!sttConfig.apiKey) { + return { success: false, error: 'STT API key missing. Save OpenAI STT key in Assistant Setup.' }; + } + + const blob = decodeBase64ToBlob(audioBase64, mimeType || 'audio/webm'); + const formData = new FormData(); + formData.append('file', blob, `chunk.${(mimeType || 'audio/webm').includes('mp4') ? 'mp4' : 'webm'}`); + formData.append('model', sttConfig.model || 'whisper-1'); + formData.append('response_format', 'verbose_json'); + if (sttConfig.language) { + formData.append('language', sttConfig.language); + } + + const sttPath = sttConfig.task === 'translate' + ? 'https://api.openai.com/v1/audio/translations' + : 'https://api.openai.com/v1/audio/transcriptions'; + const response = await fetch(sttPath, { + method: 'POST', + headers: { + Authorization: `Bearer ${sttConfig.apiKey}` + }, + body: formData + }); + + if (!response.ok) { + const errorText = await response.text(); + return { success: false, error: `Transcription failed (${response.status}): ${errorText}` }; + } + + const data = await response.json(); + const transcript = normalizeGeneratedText((data.text || data.transcript || '').trim()); + const detectedLanguage = String(data.language || '').trim().toLowerCase(); + if (!transcript) { + return { success: true, transcript: '' }; + } + + if (sttConfig.languageMode === 'auto' && !state.sttSessionLanguage && detectedLanguage) { + state.sttSessionLanguage = detectedLanguage; + } + + chrome.runtime.sendMessage({ action: 'updateTranscript', transcript }); + appendTranscriptToCurrentSession(transcript); + + if (isQuestion(transcript)) { + getAIResponse(transcript); + } else if (captureMode === 'tab' || captureMode === 'mixed') { + chrome.runtime.sendMessage({ + action: 'updateAIResponse', + response: 'Listening... (ask a question to get a response)' + }); + } + + return { success: true, transcript }; +} + +function initializeContextProfiles() { + chrome.storage.sync.get([CONTEXT_PROFILES_STORAGE_KEY, ACTIVE_CONTEXT_PROFILE_STORAGE_KEY], (result) => { + const existingProfiles = Array.isArray(result[CONTEXT_PROFILES_STORAGE_KEY]) ? 
result[CONTEXT_PROFILES_STORAGE_KEY] : []; + const existingActive = result[ACTIVE_CONTEXT_PROFILE_STORAGE_KEY]; + + if (!existingProfiles.length) { + state.activeContextProfileId = existingActive || DEFAULT_CONTEXT_PROFILE_ID; + chrome.storage.sync.set({ + [CONTEXT_PROFILES_STORAGE_KEY]: DEFAULT_CONTEXT_PROFILES, + [ACTIVE_CONTEXT_PROFILE_STORAGE_KEY]: existingActive || DEFAULT_CONTEXT_PROFILE_ID + }); + return; + } + + if (!existingActive) { + chrome.storage.sync.set({ [ACTIVE_CONTEXT_PROFILE_STORAGE_KEY]: existingProfiles[0].id || DEFAULT_CONTEXT_PROFILE_ID }); + state.activeContextProfileId = existingProfiles[0].id || DEFAULT_CONTEXT_PROFILE_ID; + return; + } + state.activeContextProfileId = existingActive; + }); +} + +function getActiveContextProfile() { + return new Promise((resolve) => { + chrome.storage.sync.get([CONTEXT_PROFILES_STORAGE_KEY, ACTIVE_CONTEXT_PROFILE_STORAGE_KEY], (result) => { + const profiles = Array.isArray(result[CONTEXT_PROFILES_STORAGE_KEY]) && result[CONTEXT_PROFILES_STORAGE_KEY].length + ? result[CONTEXT_PROFILES_STORAGE_KEY] + : DEFAULT_CONTEXT_PROFILES; + + const requestedId = state.activeContextProfileId || result[ACTIVE_CONTEXT_PROFILE_STORAGE_KEY] || DEFAULT_CONTEXT_PROFILE_ID; + const activeProfile = profiles.find((profile) => profile.id === requestedId) || profiles[0] || DEFAULT_CONTEXT_PROFILES[0]; + resolve(activeProfile); + }); + }); +} + +function getModePolicy(profile) { + const mode = profile && profile.mode ? profile.mode : 'interview'; + const basePolicy = DEFAULT_MODE_POLICIES[mode] || DEFAULT_MODE_POLICIES.custom; + const customPrompt = profile && typeof profile.systemPrompt === 'string' ? profile.systemPrompt.trim() : ''; + return { + ...basePolicy, + systemPrompt: customPrompt || basePolicy.systemPrompt || DEFAULT_LISTENING_PROMPT + }; +} + +function getStoredContexts(profileId) { + return new Promise((resolve) => { + chrome.storage.local.get([CONTEXTS_BY_PROFILE_STORAGE_KEY, 'contexts'], (result) => { + const byProfile = result[CONTEXTS_BY_PROFILE_STORAGE_KEY] || {}; + const profileContexts = Array.isArray(byProfile[profileId]) ? byProfile[profileId] : null; + if (profileContexts) { + resolve(profileContexts); + return; + } resolve(result.contexts || []); }); }); @@ -632,42 +1968,373 @@ function buildTabCaptureErrorMessage(errorMsg) { } function initializeActiveState() { - chrome.storage.sync.get(['extensionActive'], (result) => { - if (chrome.runtime.lastError) { - state.isActive = true; - updateActionBadge(); - return; - } - state.isActive = result.extensionActive !== false; + // Extension is always active now; Start/Stop listening is the only user control. 
+ chrome.storage.sync.set({ extensionActive: true }, () => { updateActionBadge(); }); } -function setActiveState(isActive, sendResponse) { - state.isActive = isActive; - chrome.storage.sync.set({ extensionActive: isActive }, () => { - updateActionBadge(); - if (!isActive) { - stopListeningAcrossTabs(); +function initializeMemoryStore() { + chrome.storage.local.get([MEMORY_STORAGE_KEY], (result) => { + if (chrome.runtime.lastError) { + return; } - sendResponse({ success: true, isActive }); + if (!result[MEMORY_STORAGE_KEY]) { + chrome.storage.local.set({ [MEMORY_STORAGE_KEY]: createDefaultMemoryStore() }); + } + }); +} + +function getMemoryStore() { + return new Promise((resolve) => { + chrome.storage.local.get([MEMORY_STORAGE_KEY], (result) => { + if (chrome.runtime.lastError) { + resolve(createDefaultMemoryStore()); + return; + } + const store = result[MEMORY_STORAGE_KEY] || createDefaultMemoryStore(); + if (store.version !== MEMORY_SCHEMA_VERSION) { + const migrated = { ...createDefaultMemoryStore(), ...store, version: MEMORY_SCHEMA_VERSION }; + chrome.storage.local.set({ [MEMORY_STORAGE_KEY]: migrated }, () => resolve(migrated)); + return; + } + resolve(store); + }); + }); +} + +function setMemoryStore(store) { + return new Promise((resolve) => { + chrome.storage.local.set({ [MEMORY_STORAGE_KEY]: store }, () => resolve(store)); + }); +} + +function updateMemoryProfile(profileUpdates) { + return getMemoryStore().then((store) => { + const updatedProfile = { + ...store.profile, + ...profileUpdates, + updatedAt: new Date().toISOString() + }; + const nextStore = { ...store, profile: updatedProfile }; + return setMemoryStore(nextStore); + }); +} + +function addMemorySession(sessionInput) { + return getMemoryStore().then((store) => { + const session = { + id: sessionInput.id || createMemoryId('session'), + title: sessionInput.title || 'Session', + createdAt: sessionInput.createdAt || new Date().toISOString(), + status: sessionInput.status || SESSION_STATUS.ACTIVE, + startedAt: sessionInput.startedAt || null, + pausedAt: sessionInput.pausedAt || null, + endedAt: sessionInput.endedAt || null, + notes: sessionInput.notes || '', + storeConsent: Boolean(sessionInput.storeConsent), + transcript: Array.isArray(sessionInput.transcript) ? sessionInput.transcript : [], + summaryId: sessionInput.summaryId || null + }; + const nextStore = { ...store, sessions: [...store.sessions, session] }; + return setMemoryStore(nextStore).then(() => session); + }); +} + +function addMemorySummary(summaryInput) { + return getMemoryStore().then((store) => { + const summary = { + id: summaryInput.id || createMemoryId('summary'), + sessionId: summaryInput.sessionId || null, + createdAt: summaryInput.createdAt || new Date().toISOString(), + content: summaryInput.content || '', + highlights: Array.isArray(summaryInput.highlights) ? summaryInput.highlights : [] + }; + const nextStore = { ...store, summaries: [...store.summaries, summary] }; + return setMemoryStore(nextStore).then(() => summary); + }); +} + +function addMemoryActionItems(itemsInput, sessionId) { + return getMemoryStore().then((store) => { + const items = (Array.isArray(itemsInput) ? 
itemsInput : []).map((item) => ({ + id: item.id || createMemoryId('action'), + sessionId: item.sessionId || sessionId || null, + createdAt: item.createdAt || new Date().toISOString(), + text: item.text || '', + owner: item.owner || '', + dueAt: item.dueAt || null, + done: Boolean(item.done) + })); + const nextStore = { ...store, actionItems: [...store.actionItems, ...items] }; + return setMemoryStore(nextStore).then(() => items); + }); +} + +function clearMemoryStore() { + const store = createDefaultMemoryStore(); + return setMemoryStore(store); +} + +function createMemoryId(prefix) { + if (crypto && typeof crypto.randomUUID === 'function') { + return `${prefix}_${crypto.randomUUID()}`; + } + return `${prefix}_${Math.random().toString(36).slice(2, 10)}${Date.now().toString(36)}`; +} + +function buildMemoryContext(question, store, speedMode) { + if (!store) return ''; + const docs = buildMemoryDocuments(store); + if (!docs.length) return ''; + + const ranked = rankDocuments(question, docs); + const maxItems = speedMode ? Math.max(1, Math.floor(RAG_MAX_ITEMS / 2)) : RAG_MAX_ITEMS; + const selected = ranked.filter((item) => item.score >= RAG_MIN_SCORE).slice(0, maxItems); + if (!selected.length) return ''; + + return selected + .map((item) => `Memory (${item.type}${item.title ? `: ${item.title}` : ''}):\n${item.content}`) + .join('\n\n---\n\n'); +} + +function buildMemoryDocuments(store) { + const docs = []; + const profile = store.profile || {}; + const profileContent = [profile.name, profile.role, profile.notes].filter(Boolean).join('\n'); + if (profileContent.trim()) { + docs.push({ + id: 'profile', + type: 'profile', + title: profile.name || profile.role || 'Profile', + content: profileContent + }); + } + + const summaries = Array.isArray(store.summaries) ? store.summaries : []; + summaries.forEach((summary) => { + if (!summary || !summary.content) return; + docs.push({ + id: summary.id, + type: 'summary', + title: summary.sessionId ? 
`Session ${summary.sessionId}` : 'Session summary', + content: summary.content + }); + }); + + return docs; +} + +function rankDocuments(query, docs) { + const queryTokens = tokenize(query); + if (!queryTokens.length) return []; + + const docTokens = docs.map((doc) => tokenize(doc.content)); + const idf = buildIdf(docTokens); + const queryVector = buildTfIdfVector(queryTokens, idf); + + return docs + .map((doc, index) => { + const vector = buildTfIdfVector(docTokens[index], idf); + return { ...doc, score: cosineSimilarity(queryVector, vector) }; + }) + .sort((a, b) => b.score - a.score); +} + +function tokenize(text) { + return String(text || '') + .toLowerCase() + .replace(/[^a-z0-9\s]/g, ' ') + .split(/\s+/) + .filter((token) => token.length > 2); +} + +function buildIdf(docTokens) { + const docCount = docTokens.length; + const docFreq = {}; + docTokens.forEach((tokens) => { + const seen = new Set(tokens); + seen.forEach((token) => { + docFreq[token] = (docFreq[token] || 0) + 1; + }); + }); + + const idf = {}; + Object.keys(docFreq).forEach((token) => { + idf[token] = Math.log((docCount + 1) / (docFreq[token] + 1)) + 1; + }); + return idf; +} + +function buildTfIdfVector(tokens, idf) { + const tf = {}; + tokens.forEach((token) => { + tf[token] = (tf[token] || 0) + 1; + }); + const vector = {}; + Object.keys(tf).forEach((token) => { + vector[token] = tf[token] * (idf[token] || 0); + }); + return vector; +} + +function cosineSimilarity(a, b) { + const aKeys = Object.keys(a); + const bKeys = Object.keys(b); + if (!aKeys.length || !bKeys.length) return 0; + + let dot = 0; + let aMag = 0; + let bMag = 0; + + aKeys.forEach((key) => { + const value = a[key]; + aMag += value * value; + if (b[key]) { + dot += value * b[key]; + } + }); + + bKeys.forEach((key) => { + const value = b[key]; + bMag += value * value; + }); + + if (aMag === 0 || bMag === 0) return 0; + return dot / (Math.sqrt(aMag) * Math.sqrt(bMag)); +} + +function updateMemorySession(sessionId, updates) { + return getMemoryStore().then((store) => { + const index = store.sessions.findIndex((session) => session.id === sessionId); + if (index === -1) { + return null; + } + const updated = { ...store.sessions[index], ...updates }; + const nextSessions = [...store.sessions]; + nextSessions[index] = updated; + const nextStore = { ...store, sessions: nextSessions }; + return setMemoryStore(nextStore).then(() => updated); + }); +} + +function ensureActiveSession() { + if (state.currentSessionStatus === SESSION_STATUS.PAUSED && state.currentSessionId) { + state.currentSessionStatus = SESSION_STATUS.ACTIVE; + updateMemorySession(state.currentSessionId, { status: SESSION_STATUS.ACTIVE, pausedAt: null }); + return; + } + + if (state.currentSessionStatus === SESSION_STATUS.ACTIVE && state.currentSessionId) { + return; + } + + addMemorySession({ + status: SESSION_STATUS.ACTIVE, + startedAt: new Date().toISOString(), + storeConsent: Boolean(state.pendingSessionConsent) + }).then((session) => { + state.currentSessionId = session.id; + state.currentSessionStatus = SESSION_STATUS.ACTIVE; + state.currentSessionConsent = Boolean(state.pendingSessionConsent); + state.pendingSessionConsent = null; + }); +} + +function pauseCurrentSession() { + if (!state.currentSessionId || state.currentSessionStatus !== SESSION_STATUS.ACTIVE) { + return; + } + state.currentSessionStatus = SESSION_STATUS.PAUSED; + updateMemorySession(state.currentSessionId, { + status: SESSION_STATUS.PAUSED, + pausedAt: new Date().toISOString() + }); +} + +function endCurrentSession() { + 
if (!state.currentSessionId || state.currentSessionStatus === SESSION_STATUS.IDLE) { + return; + } + const sessionId = state.currentSessionId; + state.currentSessionId = null; + state.currentSessionStatus = SESSION_STATUS.ENDED; + state.lastSessionId = sessionId; + updateMemorySession(sessionId, { + status: SESSION_STATUS.ENDED, + endedAt: new Date().toISOString() + }); +} + +function appendTranscriptToCurrentSession(text) { + if (!text || !state.currentSessionId) { + return; + } + if (!state.currentSessionConsent) { + return; + } + const entry = { + text: String(text), + createdAt: new Date().toISOString() + }; + getMemoryStore().then((store) => { + const index = store.sessions.findIndex((session) => session.id === state.currentSessionId); + if (index === -1) return; + const session = store.sessions[index]; + const transcript = Array.isArray(session.transcript) ? session.transcript : []; + const updated = { ...session, transcript: [...transcript, entry] }; + const nextSessions = [...store.sessions]; + nextSessions[index] = updated; + setMemoryStore({ ...store, sessions: nextSessions }); + }); +} + +function setCurrentSessionConsent(consent) { + state.currentSessionConsent = consent; + state.pendingSessionConsent = consent; + if (!state.currentSessionId) return; + updateMemorySession(state.currentSessionId, { storeConsent: consent }); +} + +function forgetCurrentSession() { + const sessionId = state.currentSessionId || state.lastSessionId; + if (!sessionId) return; + getMemoryStore().then((store) => { + const nextStore = { + ...store, + sessions: store.sessions.filter((session) => session.id !== sessionId), + summaries: store.summaries.filter((summary) => summary.sessionId !== sessionId), + actionItems: store.actionItems.filter((item) => item.sessionId !== sessionId) + }; + setMemoryStore(nextStore); + }); + state.currentSessionId = null; + state.currentSessionStatus = SESSION_STATUS.IDLE; + state.currentSessionConsent = false; + state.pendingSessionConsent = null; + if (state.lastSessionId === sessionId) { + state.lastSessionId = null; + } +} + +function saveCurrentSessionSummary(content, saveToMemory, sessionIdOverride) { + const sessionId = sessionIdOverride || state.currentSessionId || state.lastSessionId; + if (!sessionId) { + return Promise.resolve({ success: false, error: 'No active session to save.' }); + } + if (!content.trim()) { + return Promise.resolve({ success: false, error: 'Summary is empty.' }); + } + if (!saveToMemory) { + return Promise.resolve({ success: true, saved: false }); + } + return addMemorySummary({ sessionId, content }).then((summary) => { + updateMemorySession(sessionId, { summaryId: summary.id }); + return { success: true, saved: true, summaryId: summary.id }; }); } function updateActionBadge() { if (!chrome.action || !chrome.action.setBadgeText) return; - chrome.action.setBadgeText({ text: state.isActive ? 'ON' : 'OFF' }); - chrome.action.setBadgeBackgroundColor({ color: state.isActive ? '#2ecc71' : '#e74c3c' }); -} - -function stopListeningAcrossTabs() { - chrome.tabs.query({}, (tabs) => { - if (chrome.runtime.lastError || !tabs.length) return; - tabs.forEach((tab) => { - if (!tab.id) return; - chrome.tabs.sendMessage(tab.id, { action: 'stopCapture' }, () => { - // Ignore errors for tabs without the content script. 
- }); - }); - }); + chrome.action.setBadgeText({ text: '' }); } diff --git a/content.js b/content.js index 3da0f86..efda8c2 100644 --- a/content.js +++ b/content.js @@ -9,6 +9,12 @@ let overlayHidden = false; let analyserNode = null; let meterSource = null; let meterRaf = null; +let transcriptionRecorder = null; +let mixedTabStream = null; +let mixedMicStream = null; +let mixedOutputStream = null; +let lastTranscriptionErrorAt = 0; +let transcriptionWindowTimer = null; chrome.runtime.onMessage.addListener((request, sender, sendResponse) => { if (request.action === 'startCapture') { @@ -80,10 +86,7 @@ function startCapture(streamId) { overlayListening = true; ensureOverlay(); updateOverlayIndicator(); - updateOverlay( - 'response', - 'Tab audio is captured, but speech recognition uses the microphone. Use mic or mixed mode if you want transcription.' - ); + updateOverlay('response', 'Capturing tab audio and transcribing meeting audio...'); navigator.mediaDevices.getUserMedia({ audio: { chromeMediaSource: 'tab', @@ -93,9 +96,7 @@ function startCapture(streamId) { mediaStream = stream; audioContext = new AudioContext(); createAudioMeter(stream); - if (ensureSpeechRecognitionAvailable()) { - startRecognition(); - } + startTranscriptionRecorder(stream, 'tab'); }).catch((error) => { console.error('Error starting capture:', error); let errorMessage = 'Failed to start audio capture. '; @@ -147,18 +148,39 @@ function startMixedCapture(streamId) { overlayListening = true; ensureOverlay(); updateOverlayIndicator(); + updateOverlay('response', 'Capturing mixed audio (tab + mic) and transcribing...'); navigator.mediaDevices.getUserMedia({ audio: { chromeMediaSource: 'tab', chromeMediaSourceId: streamId } - }).then((stream) => { - mediaStream = stream; + }).then(async (tabStream) => { + mixedTabStream = tabStream; audioContext = new AudioContext(); - createAudioMeter(stream); - if (ensureSpeechRecognitionAvailable()) { - startRecognition(); + try { + mixedMicStream = await navigator.mediaDevices.getUserMedia({ audio: true }); + } catch (error) { + console.warn('Mixed mode mic unavailable, falling back to tab-only capture:', error); + mixedMicStream = null; + chrome.runtime.sendMessage({ + action: 'updateAIResponse', + response: 'Mic permission denied in mixed mode. Continuing with tab audio only.' 
+ }); } + + const destination = audioContext.createMediaStreamDestination(); + const tabSource = audioContext.createMediaStreamSource(tabStream); + tabSource.connect(destination); + + if (mixedMicStream) { + const micSource = audioContext.createMediaStreamSource(mixedMicStream); + micSource.connect(destination); + } + + mixedOutputStream = destination.stream; + mediaStream = mixedOutputStream; + createAudioMeter(mixedOutputStream); + startTranscriptionRecorder(mixedOutputStream, 'mixed'); }).catch((error) => { console.error('Error starting mixed capture:', error); chrome.runtime.sendMessage({action: 'updateAIResponse', response: 'Failed to start mixed capture.'}); @@ -235,20 +257,148 @@ function ensureSpeechRecognitionAvailable() { return true; } +function stopTranscriptionRecorder() { + if (transcriptionWindowTimer) { + clearTimeout(transcriptionWindowTimer); + transcriptionWindowTimer = null; + } + if (transcriptionRecorder && transcriptionRecorder.state !== 'inactive') { + try { + transcriptionRecorder.stop(); + } catch (error) { + console.warn('Failed to stop transcription recorder:', error); + } + } + transcriptionRecorder = null; +} + +function blobToBase64(blob) { + return new Promise((resolve, reject) => { + const reader = new FileReader(); + reader.onloadend = () => { + const result = reader.result || ''; + const base64 = String(result).split(',')[1] || ''; + resolve(base64); + }; + reader.onerror = () => reject(new Error('Failed to read recorded audio chunk.')); + reader.readAsDataURL(blob); + }); +} + +function startTranscriptionRecorder(stream, mode) { + stopTranscriptionRecorder(); + const mimeType = MediaRecorder.isTypeSupported('audio/webm;codecs=opus') + ? 'audio/webm;codecs=opus' + : 'audio/webm'; + const WINDOW_MS = 6000; + + const sendBlobForTranscription = async (blob, currentMimeType) => { + if (!isCapturing || !blob || blob.size < 1024) return; + try { + const base64Audio = await blobToBase64(blob); + chrome.runtime.sendMessage( + { + action: 'transcribeAudioChunk', + audioBase64: base64Audio, + mimeType: currentMimeType || mimeType, + captureMode: mode + }, + (response) => { + if (chrome.runtime.lastError) return; + if (!response || !response.success) { + const now = Date.now(); + if (response && response.error && now - lastTranscriptionErrorAt > 6000) { + lastTranscriptionErrorAt = now; + chrome.runtime.sendMessage({ action: 'updateAIResponse', response: response.error }); + updateOverlay('response', response.error); + } + return; + } + if (!response.transcript) return; + updateOverlay('transcript', response.transcript); + } + ); + } catch (error) { + console.warn('Audio chunk transcription failed:', error); + } + }; + + const startWindow = () => { + if (!isCapturing) return; + const recorder = new MediaRecorder(stream, { mimeType }); + transcriptionRecorder = recorder; + const chunks = []; + + recorder.ondataavailable = (event) => { + if (event.data && event.data.size > 0) { + chunks.push(event.data); + } + }; + + recorder.onerror = (event) => { + const message = `Audio recorder error: ${event.error ? 
event.error.message : 'unknown'}`; + chrome.runtime.sendMessage({ action: 'updateAIResponse', response: message }); + updateOverlay('response', message); + }; + + recorder.onstop = async () => { + transcriptionRecorder = null; + if (!chunks.length) { + if (isCapturing) startWindow(); + return; + } + const blob = new Blob(chunks, { type: recorder.mimeType || mimeType }); + await sendBlobForTranscription(blob, recorder.mimeType || mimeType); + if (isCapturing) { + startWindow(); + } + }; + + recorder.start(); + transcriptionWindowTimer = setTimeout(() => { + transcriptionWindowTimer = null; + if (recorder.state !== 'inactive') { + recorder.stop(); + } + }, WINDOW_MS); + }; + + startWindow(); +} + function stopCapture() { isCapturing = false; overlayListening = false; updateOverlayIndicator(); + stopTranscriptionRecorder(); stopAudioMeter(); if (mediaStream) { mediaStream.getTracks().forEach(track => track.stop()); + mediaStream = null; + } + if (mixedTabStream) { + mixedTabStream.getTracks().forEach(track => track.stop()); + mixedTabStream = null; + } + if (mixedMicStream) { + mixedMicStream.getTracks().forEach(track => track.stop()); + mixedMicStream = null; + } + if (mixedOutputStream) { + mixedOutputStream.getTracks().forEach(track => track.stop()); + mixedOutputStream = null; } if (audioContext) { audioContext.close(); audioContext = null; } if (recognition) { - recognition.stop(); + try { + recognition.stop(); + } catch (error) { + console.warn('Failed to stop recognition:', error); + } + recognition = null; } } @@ -385,7 +535,7 @@ function ensureOverlay() {
- AI Interview Assistant
+ AI Assistant
diff --git a/contentScript.js b/contentScript.js index a6aba9f..b73d732 100644 --- a/contentScript.js +++ b/contentScript.js @@ -1,7 +1,7 @@ function createDraggableUI() { const uiHTML = `
- AI Interview Assistant
+ AI Assistant
diff --git a/local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md b/local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md new file mode 100644 index 0000000..f58bde0 --- /dev/null +++ b/local_stt_bridge/LOCAL_STT_BRIDGE_GUIDE.md @@ -0,0 +1,88 @@ +# Local STT Bridge (faster-whisper) + +Self-hosted Speech-to-Text bridge for the Chrome extension. + +Primary project documentation lives in `README.md`. + +## 1) Install + +Use Python 3.11 or 3.12 (recommended). Python 3.13 may force source builds for audio deps. + +```bash +cd local_stt_bridge +python3.11 -m venv .venv +source .venv/bin/activate +pip install --upgrade pip setuptools wheel +pip install -r requirements.txt +``` + +### macOS build prerequisites (required if `av`/PyAV tries to build) + +```bash +brew install pkg-config ffmpeg +``` + +If install still fails on `PyAV`, recreate the venv with Python 3.11 and retry. + +## 2) Run + +```bash +cd local_stt_bridge +source .venv/bin/activate +export STT_MODEL=small +export STT_DEVICE=auto +export STT_COMPUTE_TYPE=int8 +# Optional auth key: +# export STT_API_KEY=your_local_key +uvicorn server:app --host 0.0.0.0 --port 8790 +``` + +## 3) Verify + +```bash +curl http://localhost:8790/health +``` + +## 4) Extension Setup + +In side panel: +- Assistant Setup -> Speech-to-Text Provider: `Local faster-whisper bridge` +- STT Model: `small` (start here) +- Local STT endpoint: `http://localhost:8790/transcribe` +- Optional Local STT API key if `STT_API_KEY` is set on server +- Optional quality/language controls: + - Language Mode: `Auto-detect` or `Force language` + - Forced language: e.g. `en`, `fr`, `de`, `ar` + - Task: `transcribe` or `translate` + - VAD filter: on/off + - Beam size: integer (default `5`) +- Click `Test STT Connection` from the extension to validate endpoint reachability. + +## API contract expected by the extension + +`POST /transcribe` with `multipart/form-data`: + +- `file` (required): uploaded audio chunk (`webm`/`mp4`/`wav`) +- `task` (optional): `transcribe` or `translate` +- `vad_filter` (optional): `true`/`false` +- `beam_size` (optional): integer +- `language` (optional): language code +- `model` (optional): model hint + +Optional auth headers when enabled: + +- `Authorization: Bearer ` +- `x-api-key: ` + +`GET /health` is used by extension `Test STT Connection`. + +## Public domain + HTTPS note + +If you expose this service on a public domain, use HTTPS via reverse proxy. +Chrome may auto-upgrade `http://` on HSTS domains to `https://`, which causes plain HTTP Uvicorn ports to fail with `Invalid HTTP request received`. + +## Notes + +- `faster-whisper` relies on FFmpeg for many input formats. +- For best CPU cost/performance, use `small` or `medium`. +- `large-v3` improves quality but uses significantly more compute. 
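+
+## Quick manual check of `/transcribe`
+
+A minimal sketch of the multipart contract described above, for manual testing only. `chunk.webm` and `your_local_key` are placeholders, and the form values shown (`task`, `beam_size`, `language`, `model`) are example settings; drop the auth headers if `STT_API_KEY` is not set on the server.
+
+```bash
+# Send one recorded audio chunk the same way the extension does (curl -F implies POST).
+curl http://localhost:8790/transcribe \
+  -H "Authorization: Bearer your_local_key" \
+  -H "x-api-key: your_local_key" \
+  -F "file=@chunk.webm;type=audio/webm" \
+  -F "task=transcribe" \
+  -F "vad_filter=true" \
+  -F "beam_size=5" \
+  -F "language=en" \
+  -F "model=small"
+```
+
+If the bridge implements this contract, the JSON response is expected to include `text` (or `transcript`) and `language`, which are the fields the extension's `transcribeWithLocalBridge` reads.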
diff --git a/local_stt_bridge/__pycache__/server.cpython-313.pyc b/local_stt_bridge/__pycache__/server.cpython-313.pyc new file mode 100644 index 0000000..8b15a46 Binary files /dev/null and b/local_stt_bridge/__pycache__/server.cpython-313.pyc differ diff --git a/local_stt_bridge/requirements.txt b/local_stt_bridge/requirements.txt new file mode 100644 index 0000000..b836de2 --- /dev/null +++ b/local_stt_bridge/requirements.txt @@ -0,0 +1,3 @@ +fastapi==0.115.0 +uvicorn[standard]==0.30.6 +faster-whisper==1.0.3 diff --git a/local_stt_bridge/server.py b/local_stt_bridge/server.py new file mode 100644 index 0000000..7da6c94 --- /dev/null +++ b/local_stt_bridge/server.py @@ -0,0 +1,92 @@ +import base64 +import os +import tempfile +from typing import Optional + +from fastapi import FastAPI, Header, HTTPException +from fastapi.middleware.cors import CORSMiddleware +from pydantic import BaseModel + +try: + from faster_whisper import WhisperModel +except ImportError as exc: # pragma: no cover + raise RuntimeError("faster-whisper is required. Install dependencies from requirements.txt") from exc + + +STT_MODEL = os.getenv("STT_MODEL", "small") +STT_DEVICE = os.getenv("STT_DEVICE", "auto") +STT_COMPUTE_TYPE = os.getenv("STT_COMPUTE_TYPE", "int8") +STT_API_KEY = os.getenv("STT_API_KEY", "").strip() + +app = FastAPI(title="Local STT Bridge", version="1.0.0") +app.add_middleware( + CORSMiddleware, + allow_origins=["*"], + allow_credentials=False, + allow_methods=["*"], + allow_headers=["*"], +) + +model = WhisperModel(STT_MODEL, device=STT_DEVICE, compute_type=STT_COMPUTE_TYPE) + + +class TranscribeRequest(BaseModel): + audioBase64: str + mimeType: Optional[str] = "audio/webm" + captureMode: Optional[str] = "tab" + model: Optional[str] = None + + +@app.get("/health") +def health(): + return { + "ok": True, + "engine": "faster-whisper", + "model": STT_MODEL, + "device": STT_DEVICE, + "computeType": STT_COMPUTE_TYPE, + } + + +@app.post("/transcribe") +def transcribe(payload: TranscribeRequest, x_stt_api_key: Optional[str] = Header(default=None)): + if STT_API_KEY and x_stt_api_key != STT_API_KEY: + raise HTTPException(status_code=401, detail="Invalid STT API key") + + try: + audio_bytes = base64.b64decode(payload.audioBase64) + except Exception as exc: + raise HTTPException(status_code=400, detail=f"Invalid base64 audio payload: {exc}") from exc + + suffix = ".webm" + if payload.mimeType and "mp4" in payload.mimeType: + suffix = ".mp4" + elif payload.mimeType and "wav" in payload.mimeType: + suffix = ".wav" + + with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp: + tmp.write(audio_bytes) + tmp_path = tmp.name + + try: + segments, info = model.transcribe( + tmp_path, + vad_filter=True, + beam_size=1, + language=None, + ) + text = " ".join(segment.text.strip() for segment in segments).strip() + return { + "success": True, + "text": text, + "language": info.language, + "duration": info.duration, + } + except Exception as exc: + raise HTTPException(status_code=500, detail=f"Transcription failed: {exc}") from exc + finally: + try: + os.remove(tmp_path) + except OSError: + pass + diff --git a/manifest.json b/manifest.json index ca2733d..3eeb24f 100644 --- a/manifest.json +++ b/manifest.json @@ -1,11 +1,10 @@ { "manifest_version": 3, - "name": "AI Interview Assistant", - "version": "1.0", + "name": "AI Assistant", + "version": "1.1.0", "description": "Monitors audio and answers questions in real-time using AI", "permissions": [ "tabCapture", - "audioCapture", "storage", "activeTab", 
"scripting", diff --git a/popup.html b/popup.html index 9fff25c..fcb6095 100644 --- a/popup.html +++ b/popup.html @@ -3,12 +3,12 @@ - AI Interview Assistant + AI Assistant
- AI Interview Assistant
+ AI Assistant
diff --git a/remote-access.html b/remote-access.html index 4e6174b..5f74690 100644 --- a/remote-access.html +++ b/remote-access.html @@ -3,7 +3,7 @@ - AI Interview Assistant - Remote Access + AI Assistant - Remote Access