# CustomGPT Widget - Voice-Enabled AI Assistant
A dual-mode AI assistant with voice and chat interfaces. Interact naturally through voice with real-time particle animations or 3D avatar, or use the text-based chat interface with message reactions and citations.
Get your CustomGPT.ai RAG API key here; it's needed to use this integration.
## Features

### Voice Mode
- **Voice Activity Detection** - Automatic speech detection, no button pressing required
- **3D Avatar with Lip-Sync** - Realistic avatar with facial expressions and gestures (NEW!)
- **Particle Animation Interface** - Visual feedback with dynamic particle effects
- **Multiple TTS Providers** - Choose from OpenAI TTS, gTTS, ElevenLabs, Edge TTS, or StreamElements
- **Avatar States** - Dynamic mood changes and gestures during conversation (listening, thinking, speaking)
### Chat Mode
- **Text-Based Chat** - Full-featured chat interface with markdown support
- **Message Reactions** - Like/dislike AI responses for feedback
- **Citations** - Hover over citations to see source details
- **Per-Message TTS** - Play any message as audio with the speaker button
- **Speech Input** - Use the microphone button for voice-to-text input
### Shared Features
- **AI-Powered Responses** - Powered by CustomGPT for RAG-based responses or OpenAI for general chat
- **Conversation Memory** - Maintains context across the conversation
- **Multi-language Support** - Configure language for both speech recognition and synthesis
- **Docker Ready** - Easy deployment with Docker
## Prerequisites

**For Docker Deployment:**
- Docker
- OpenAI API key

**For Local Development:**
- Python 3.10+
- Node.js 18+
- FFmpeg (see install guide)
- Yarn (`npm install -g yarn`)
- OpenAI API key
## Table of Contents
- Quick Start (Docker)
- Docker Hub Deployment
- Local Development
- Website Integration
- Avatar Mode
- Configuration
- TTS Provider Options
- CustomGPT Integration
- AI Model Configuration
- Troubleshooting
## Quick Start (Docker)

**1. Create `.env` file**

```env
# Required
OPENAI_API_KEY=sk-your-key-here

# Recommended
AI_COMPLETION_MODEL=gpt-4o-mini
TTS_PROVIDER=OPENAI
OPENAI_TTS_VOICE=nova
LANGUAGE=en
```

**2. Run container**

```bash
chmod +x run.sh  # First time only
./run.sh
```

**3. Open browser**
Visit http://localhost:8000 and allow microphone access.
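To confirm the container is up, you can hit the root endpoint (the same check used under Troubleshooting below):

```bash
curl http://localhost:8000/
```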
## Docker Hub Deployment

### Using Pre-built Image

The fastest way to deploy is using the pre-built Docker image:

```bash
docker pull zriyansh/customgpt-widget:latest
```

**Supported Architectures:** AMD64, ARM64 (Mac M1/M2, Raspberry Pi)
### Method 1: Docker Run (Simple)

```bash
docker run -d \
  --name customgpt-widget \
  -p 8000:8000 \
  -e OPENAI_API_KEY=your_key_here \
  -e AI_COMPLETION_MODEL=gpt-4o-mini \
  -e TTS_PROVIDER=OPENAI \
  -e LANGUAGE=en \
  zriyansh/customgpt-widget:latest
```

### Method 2: Docker Compose (Recommended)
Create `docker-compose.yml`:

```yaml
version: '3.8'
services:
  widget:
    image: zriyansh/customgpt-widget:latest
    container_name: customgpt-widget
    ports:
      - "8000:8000"
    env_file:
      - .env
    restart: unless-stopped
```

Run:

```bash
docker-compose up -d
```

### Production Deployment
**With Nginx Reverse Proxy:**

```nginx
server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
```

**With SSL (Let's Encrypt):**

```bash
# Install certbot
sudo apt install certbot python3-certbot-nginx

# Get certificate
sudo certbot --nginx -d yourdomain.com

# Auto-renewal
sudo certbot renew --dry-run
```

## Website Integration
### Script Tag Embed (2 Steps)

**Step 1:** Add to your HTML (before `</body>`):

```html
<script>
  window.customGPTConfig = {
    serverUrl: 'https://your-server.com', // Your backend URL
    position: 'bottom-right',             // bottom-right, bottom-left
    theme: 'dark',                        // dark or light
    initialMode: 'chat',                  // chat or voice
    showBranding: true                    // Show "Powered by CustomGPT"
  };
</script>
<script src="https://your-server.com/widget.js" defer></script>
```

**Step 2:** Deploy your backend with Docker (see above)
### Customization Options

```js
window.customGPTConfig = {
  // Required
  serverUrl: 'https://your-server.com',

  // Optional UI
  position: 'bottom-right',        // bottom-right, bottom-left, bottom-center
  theme: 'dark',                   // dark, light
  primaryColor: '#8b5cf6',         // Brand color
  initialMode: 'chat',             // chat, voice
  showBranding: true,              // Show branding

  // Optional Behavior
  autoOpen: false,                 // Auto-open on page load
  openDelay: 3000,                 // Delay before auto-open (ms)
  greeting: 'Hi! How can I help?', // Initial greeting message

  // Optional Features
  enableVoiceMode: true,           // Show voice mode button
  enableSTT: true,                 // Enable speech input
  enableTTS: true,                 // Enable audio output
  enableAvatar: true               // Enable 3D avatar mode
};
```

### Platform-Specific Integration
**WordPress:**
- Install the "Insert Headers and Footers" plugin
- Paste the script in the footer section
- Save and clear cache

**Shopify:**
- Go to Online Store → Themes → Edit Code
- Open `theme.liquid`
- Add the script before `</body>`
- Save

**Wix:**
- Add a "Custom Code" element
- Paste the script in the HTML iframe
- Position the element
**Next.js / React:**

```tsx
import { useEffect } from 'react';

export default function MyApp() {
  useEffect(() => {
    window.customGPTConfig = {
      serverUrl: process.env.NEXT_PUBLIC_WIDGET_URL,
      theme: 'dark'
    };
    const script = document.createElement('script');
    script.src = `${process.env.NEXT_PUBLIC_WIDGET_URL}/widget.js`;
    script.defer = true;
    document.body.appendChild(script);
    // Cleanup must return void, so wrap removeChild in a block
    return () => { document.body.removeChild(script); };
  }, []);
  return <YourApp />;
}
```

### Working Examples
For complete integration examples and step-by-step guides, see the `examples/` directory:
- Integration Guide - Complete documentation for website integration
- Floating Widget Example - Test floating chatbot interface
- Inline Embed Example - Test inline page embedding
The examples directory includes:
- Platform-specific integration instructions (WordPress, Shopify, Wix, etc.)
- Framework integration examples (Next.js, React, Vue)
- Customization options and CSS examples
- Analytics tracking setup
- Troubleshooting common issues
## Avatar Mode (3D Talking Avatar)

### What is Avatar Mode?
Avatar Mode replaces the particle animation interface with a realistic 3D animated avatar that provides visual feedback through facial expressions, gestures, and lip-sync animation during your conversation. The avatar creates a more human-like and engaging interaction experience.
### Browser Requirements

Avatar Mode requires a modern browser with WebGL support:

**Fully Supported:**
- Chrome/Edge 90+ (Desktop & Mobile)
- Firefox 88+ (Desktop & Mobile)
- Safari 14+ (Desktop & Mobile)
- Opera 76+
**System Requirements:**
- WebGL 2.0 capable GPU
- Minimum 4GB RAM recommended
- Stable internet connection (avatar loads from CDN)
**Check WebGL Support:** Visit https://get.webgl.org/ to verify your browser supports WebGL.
### Installation & Hosting

**No additional installation required!** Avatar Mode is built-in and enabled by default.

**How It Works:**
- **3D Avatar Library**: Loaded from CDN (`@met4citizen/[email protected]`)
- **Avatar Models**: Streamed from the Ready Player Me CDN
- **Zero Configuration**: Works out-of-the-box with standard deployment
- **No Special Hosting**: Same hosting as the rest of the application (Docker or local)
The avatar system automatically:
- Detects WebGL support on startup
- Loads the TalkingHead library from CDN (one-time ~200KB)
- Streams the 3D avatar model (~2-3MB, cached by browser)
- Falls back to particle mode if WebGL is unavailable (see the sketch below)
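As a rough illustration of that detect-and-fall-back behavior, the mode decision boils down to a WebGL probe (a minimal sketch; the function and mode names are illustrative, not the widget's actual internals):

```ts
// Probe for WebGL before choosing a rendering mode (illustrative sketch)
function supportsWebGL(): boolean {
  const canvas = document.createElement('canvas');
  // Prefer WebGL 2, then fall back to probing WebGL 1
  return !!(canvas.getContext('webgl2') ?? canvas.getContext('webgl'));
}

// Hypothetical mode selection mirroring the fallback described above
const renderMode = supportsWebGL() ? 'avatar' : 'particles';
console.log(`Rendering mode: ${renderMode}`);
```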
### Customization Options

**Avatar Model** - Configure via environment variable:

```env
# Optional: Custom Ready Player Me avatar URL
VITE_AVATAR_GLB_URL=https://models.readyplayer.me/YOUR_AVATAR_ID.glb
```

**Default Avatar:** High-quality female avatar with optimized morph targets for lip-sync

**Performance Tuning** (`frontend/src/utils/avatarConfig.ts`; see the sketch after this list):
- Desktop: 60 FPS target
- Mobile: 30 FPS target
- Auto-detection of device tier (high/medium/low)
- Configurable load timeout and retry settings
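A hypothetical sketch of what that config might contain, based on the settings listed above and the timeout/retry defaults mentioned under Troubleshooting (field names are illustrative, not the actual file contents):

```ts
// Illustrative shape of frontend/src/utils/avatarConfig.ts (not the real file)
type DeviceTier = 'high' | 'medium' | 'low';

interface AvatarConfig {
  desktopFps: number;    // rendering target on desktop (60)
  mobileFps: number;     // rendering target on mobile (30)
  loadTimeoutMs: number; // abort avatar load after this long
  maxRetries: number;    // automatic retries on load failure
}

export const avatarConfig: AvatarConfig = {
  desktopFps: 60,
  mobileFps: 30,
  loadTimeoutMs: 10_000, // 10 s default noted in Troubleshooting
  maxRetries: 2,
};

// Naive device-tier heuristic (illustrative; the widget's detection may differ)
export function detectTier(): DeviceTier {
  const cores = navigator.hardwareConcurrency ?? 2;
  return cores >= 8 ? 'high' : cores >= 4 ? 'medium' : 'low';
}
```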
### Technical Details

**Architecture:**
- **Library**: TalkingHead v1.6.0 by @met4citizen
- **3D Models**: Ready Player Me GLB format with ARKit/Oculus visemes
- **Rendering**: Three.js WebGL renderer (managed by TalkingHead)
- **Lip-Sync**: Word-based phoneme animation with English language support
- **State Management**: React hooks with global method exposure

**CDN Dependencies:**
- TalkingHead: `https://cdn.jsdelivr.net/npm/@met4citizen/[email protected]/modules/talkinghead.mjs`
- Avatar Model: `https://models.readyplayer.me/*.glb`
- Three.js: Bundled with the TalkingHead library
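Since TalkingHead ships as an ES module on jsDelivr, loading it is a plain dynamic import. A minimal sketch follows; the constructor and `showAvatar` options reflect the library's documented API, but verify against its README for your version, and note that the element ID, TTS endpoint, and avatar URL here are placeholders:

```ts
// Load TalkingHead directly from the jsDelivr CDN listed above
const { TalkingHead } = await import(
  'https://cdn.jsdelivr.net/npm/@met4citizen/[email protected]/modules/talkinghead.mjs'
);

// Mount the avatar into a container element
const head = new TalkingHead(document.getElementById('avatar')!, {
  ttsEndpoint: '/tts/', // the library expects a TTS endpoint option (placeholder)
});

// Stream a Ready Player Me GLB model (placeholder URL)
await head.showAvatar({
  url: 'https://models.readyplayer.me/YOUR_AVATAR_ID.glb',
});
```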
## TTS Provider Options

### OpenAI TTS (Recommended)
- High quality, natural-sounding voices
- Streaming support for low latency
- Requires OpenAI API key
- Multiple voice options: nova, alloy, echo, fable, onyx, shimmer
```env
TTS_PROVIDER=OPENAI
OPENAI_TTS_MODEL=tts-1  # or tts-1-hd for higher quality
OPENAI_TTS_VOICE=nova
```

### gTTS (Google Text-to-Speech)
- Free, no API key required
- Good quality, supports many languages
- Slight latency due to online generation
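Since gTTS needs no key, switching the provider is the whole configuration (value as listed in the Environment Variables table below):

```env
TTS_PROVIDER=gTTS
```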
### EDGETTS (Microsoft Edge TTS)
- Free, no API key required
- High quality, fast, supports many voices
- Good alternative to OpenAI TTS
```env
TTS_PROVIDER=EDGETTS
EDGETTS_VOICE=en-US-EricNeural  # or en-US-JennyNeural, en-GB-SoniaNeural, etc.
```

### ELEVENLABS
- Premium, requires API key
- Highest quality, most natural-sounding
- Get API key at elevenlabs.io
```env
TTS_PROVIDER=ELEVENLABS
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE=EXAVITQu4vr4xnSDxMaL
```

### STREAMELEMENTS
- Free, no API key required
- Basic quality
- Good for testing
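As with gTTS, no key is needed; selecting the provider is the only step (value as listed in the Environment Variables table below):

```env
TTS_PROVIDER=STREAMELEMENTS
```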
## Local Development

### Backend Setup
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
```

The backend will be available at http://localhost:8000
### Frontend Setup

```bash
cd frontend
yarn install
yarn dev
```

The frontend will be available at http://localhost:5173
### Development Commands

**Backend:**
- `uvicorn main:app --reload` - Run with hot reload
- `python -m pytest` - Run tests (if added)

**Frontend:**
- `yarn dev` - Development server with hot reload
- `yarn build` - Build for production
- `yarn lint` - Run ESLint
- `yarn preview` - Preview production build
## AI Model Configuration
When using standard OpenAI (not CustomGPT), you can choose from several models:
| Model | Best For | Speed | Context |
|---|---|---|---|
| gpt-4o-mini | Most use cases, voice assistants | Very Fast | 128k |
| gpt-4o | High-quality responses | Fast | 128k |
| gpt-4-turbo | Maximum quality | Medium | 128k |
| gpt-3.5-turbo | Budget option | Very Fast | 16k |
**Recommended for voice assistants:** `gpt-4o-mini` (best balance of speed, quality, and cost)
Update in `.env`:

```env
AI_COMPLETION_MODEL=gpt-4o-mini
```

### Speech-to-Text Models
OpenAI's latest speech-to-text models (Released March 2025):
| Model | Speed | Accuracy | Best For |
|---|---|---|---|
| gpt-4o-mini-transcribe | Very Fast | Excellent | Voice assistants (Default) |
| gpt-4o-transcribe | Fast | Best | Maximum accuracy, challenging audio |
| whisper-1 | Fast | Good | Whisper v2 model |
**Default:** `gpt-4o-mini-transcribe` (best balance for voice assistants)
Update in `.env`:

```env
STT_MODEL=gpt-4o-mini-transcribe  # or gpt-4o-transcribe, whisper-1
```

## CustomGPT Integration
This project supports CustomGPT for RAG-based AI responses using your custom knowledge base.
**Note:** An OpenAI API key is always required for speech-to-text; CustomGPT is used only for AI completions.
### Using CustomGPT for AI Responses

Set environment variables in `.env`:

```env
# Required for speech-to-text
OPENAI_API_KEY=your_openai_api_key

# Enable CustomGPT for AI responses
USE_CUSTOMGPT=true
CUSTOMGPT_PROJECT_ID=your_project_id_here
CUSTOMGPT_API_KEY=your_customgpt_api_key_here
CUSTOMGPT_STREAM=true  # Enable streaming for faster responses
```

The system will use:
- OpenAI for speech-to-text
- CustomGPT for AI completions with your RAG data
- Your selected TTS provider for text-to-speech

The CustomGPT endpoint is automatically configured as:
`https://app.customgpt.ai/api/v1/projects/{YOUR_PROJECT_ID}`
### Using Standard OpenAI

To use standard OpenAI GPT models instead of CustomGPT:

```env
USE_CUSTOMGPT=false  # or omit this line
OPENAI_API_KEY=your_openai_api_key
AI_COMPLETION_MODEL=gpt-4o-mini
```

## Configuration
### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | - | Your OpenAI API key (required for STT, and for TTS if using OpenAI) |
| `STT_MODEL` | No | `gpt-4o-mini-transcribe` | Speech-to-text model |
| `USE_CUSTOMGPT` | No | `false` | Enable CustomGPT for AI completions |
| `CUSTOMGPT_PROJECT_ID` | Conditional | - | Required if `USE_CUSTOMGPT=true` |
| `CUSTOMGPT_API_KEY` | Conditional | - | Required if `USE_CUSTOMGPT=true` |
| `CUSTOMGPT_STREAM` | No | `true` | Enable streaming for faster responses |
| `AI_COMPLETION_MODEL` | No | `gpt-3.5-turbo` | Model to use (only for OpenAI, not CustomGPT) |
| `LANGUAGE` | No | `en` | ISO-639-1 language code for STT/TTS |
| `TTS_PROVIDER` | No | `OPENAI` | TTS provider: OPENAI, gTTS, ELEVENLABS, STREAMELEMENTS, EDGETTS |
| `OPENAI_TTS_MODEL` | No | `tts-1` | OpenAI TTS model: `tts-1` (fast) or `tts-1-hd` (quality) |
| `OPENAI_TTS_VOICE` | No | `nova` | OpenAI TTS voice: alloy, echo, fable, onyx, nova, shimmer |
| `EDGETTS_VOICE` | No | `en-US-EricNeural` | Voice for Edge TTS |
| `ELEVENLABS_API_KEY` | Conditional | - | Required if using ELEVENLABS |
| `ELEVENLABS_VOICE` | No | `EXAVITQu4vr4xnSDxMaL` | Voice ID for ElevenLabs |
| `VITE_UI_THEME` | No | `dark` | UI theme: dark or light |
| `VITE_ENABLE_VOICE_MODE` | No | `true` | Show voice mode button |
| `VITE_ENABLE_STT` | No | `true` | Show microphone button for STT |
| `VITE_ENABLE_TTS` | No | `true` | Show speaker button for TTS |
| `VITE_AVATAR_GLB_URL` | No | Default avatar | Custom Ready Player Me avatar GLB URL |
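Putting the table together, a complete `.env` for a CustomGPT-backed voice deployment might look like this (illustrative values only):

```env
# Speech-to-text (always required)
OPENAI_API_KEY=sk-your-key-here
STT_MODEL=gpt-4o-mini-transcribe

# CustomGPT for RAG-based completions
USE_CUSTOMGPT=true
CUSTOMGPT_PROJECT_ID=your_project_id_here
CUSTOMGPT_API_KEY=your_customgpt_api_key_here
CUSTOMGPT_STREAM=true

# Text-to-speech
TTS_PROVIDER=OPENAI
OPENAI_TTS_MODEL=tts-1
OPENAI_TTS_VOICE=nova
LANGUAGE=en
```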
### Customizing the AI Personality

Edit the system prompt in `backend/ai.py`:

```python
INITIAL_PROMPT = f"You are CustomGPT Widget - a helpful assistant with a voice interface..."
```

## Architecture
### Tech Stack

**Backend:**
- FastAPI - Web framework
- OpenAI Whisper - Speech-to-text
- OpenAI GPT / CustomGPT - AI completions
- Multiple TTS libraries - Text-to-speech
- FFmpeg - Audio processing
**Frontend:**
- React + TypeScript
- Vite - Build tool
- @ricky0123/vad-react - Voice activity detection
- Canvas API - Particle animations
- @met4citizen/talkinghead - 3D avatar with lip-sync
- Three.js - WebGL 3D rendering (via TalkingHead)
- Ready Player Me - Avatar 3D models
## Troubleshooting

### Microphone not working
- Ensure you've granted microphone permissions in your browser
- Check browser console for errors
- Try HTTPS (VAD requires secure context)
### No audio response
- Check browser console for fetch errors
- Verify the backend is running: `curl http://localhost:8000/`
- Check backend logs for TTS errors
- Ensure FFmpeg is installed
### Poor voice recognition
- Speak clearly and at a normal pace
- Reduce background noise
- Minimum speech duration is 0.4 seconds
- Check that the `LANGUAGE` env var matches your spoken language
### Docker build fails
- ARM64 Mac: ensure Docker Desktop supports ARM
- Try `docker buildx build --platform linux/amd64 -t customgpt-widget .`
- Check disk space: `df -h`
### API errors
- Verify `OPENAI_API_KEY` is set correctly
- Check that the API key has sufficient credits
- For CustomGPT, verify the project ID and API key
### Avatar Mode issues

**Avatar not loading / stuck on loading screen:**
- Check browser console for WebGL errors: press F12 → Console tab
- Verify WebGL support: Visit https://get.webgl.org/
- Clear browser cache and reload the page
- Check network connectivity (avatar loads from CDN)
- Look for `[Avatar]`-prefixed logs in the console for detailed diagnostics
**Avatar shows black screen or renders incorrectly:**
- Update your graphics drivers to the latest version
- Try a different browser (Chrome/Firefox recommended)
- Disable browser extensions that might interfere with WebGL
- Check that GPU hardware acceleration is enabled in browser settings
- Fallback: Switch to Particle mode using the mode toggle
**Avatar lip-sync not working / mouth not moving:**
- Check browser console for `[Avatar] speakAudio` logs
- Verify TTS is working properly (test in particle mode first)
- Audio format must be compatible (MP3/WAV with proper encoding)
- Check that `[Avatar] Audio decoded` logs show a valid duration and channel count
- Ensure the network is not blocking CDN resources
**Avatar gestures/moods not changing:**
- Look for `[Avatar] setListening/setProcessing/setIdle` logs in the console
- Verify `hasSetMood` and `hasPlayGesture` are both `true` in the logs
- Check that avatar initialization completed successfully
- Try reloading the page to reinitialize the avatar
- Report any `[Avatar] Failed to set` errors
**Performance issues / choppy animation:**
- Reduce your browser window size (fewer pixels to render)
- Close other tabs/applications to free up GPU resources
- The avatar auto-adjusts to 30 FPS on mobile devices
- Check `[Avatar] TalkingHead instance created` logs for initialization time
- Consider using particle mode on low-end devices
**Avatar timeout errors:**
- The default timeout is 10 seconds, configurable in `avatarConfig.ts`
- A slow network may cause a timeout during model download
- Loading is retried up to 2 times automatically
- Check your network speed: the avatar model is ~2-3MB
- CDN issues: try again later or check jsdelivr.com status
**Browser compatibility check:**

```js
// Open browser console (F12) and run:
console.log('WebGL:', !!document.createElement('canvas').getContext('webgl'));
console.log('TalkingHead loaded:', !!window.TalkingHead);
```

**Debugging tips:**
- Enable verbose logging: look for `[Avatar]`-prefixed console logs
- Check the initialization sequence: library load → instance create → model load
- Verify global methods: check that `window.avatarSetListening` exists
- Test WebGL independently: visit https://threejs.org/examples/
- Report issues with full browser console logs
## License
This project is licensed under the MIT License.
