Voice Integration

Deploy ChatRAG's real-time voice agent for conversational AI powered by your RAG knowledge base. Built with LiveKit, AssemblyAI (STT), and Resemble AI (TTS).

Included in ChatRAG Monorepo

The voice agent is included in your ChatRAG fork at voice-agent/. Deploy it alongside your main app for a complete voice-enabled AI assistant.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      User's Browser                         │
│  ┌─────────────────────────────────────────────────────┐   │
│  │   ChatRAG Next.js App (Vercel)                      │   │
│  │   - Voice overlay UI                                │   │
│  │   - LiveKit room connection                         │   │
│  └────────────────────────┬────────────────────────────┘   │
└───────────────────────────│─────────────────────────────────┘
                            │ WebRTC
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    LiveKit Cloud                             │
│   - Real-time audio streaming                               │
│   - Room management                                         │
└────────────────────────────┬────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│          Voice Agent (Fly.io / Railway / Render)            │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐    │
│  │ AssemblyAI   │ │   LLM +      │ │   Resemble AI    │    │
│  │ (STT)        │→│   RAG API    │→│   (TTS)          │    │
│  └──────────────┘ └──────────────┘ └──────────────────┘    │
└─────────────────────────────────────────────────────────────┘
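The agent-side flow in the diagram (speech-to-text, then LLM with RAG, then text-to-speech) can be sketched as a chain of three async stages. This is an illustration only: `VoiceStages`, `handleTurn`, and the stub stages are hypothetical names, not the voice agent's actual code or any SDK's API.

```typescript
// Hypothetical stage signatures. In the real agent these would be backed by
// AssemblyAI (STT), the ChatRAG RAG API plus an LLM, and Resemble AI (TTS).
interface VoiceStages {
  transcribe(audio: Uint8Array): Promise<string>; // STT
  answer(question: string): Promise<string>; // LLM + RAG
  synthesize(text: string): Promise<Uint8Array>; // TTS
}

// One conversational turn: user audio in, agent audio out.
async function handleTurn(stages: VoiceStages, audio: Uint8Array): Promise<Uint8Array> {
  const question = await stages.transcribe(audio);
  if (question.trim() === "") {
    return new Uint8Array(0); // nothing was said, so stay silent
  }
  const reply = await stages.answer(question);
  return stages.synthesize(reply);
}

// Stub stages so the sketch runs without any external service.
const stubStages: VoiceStages = {
  transcribe: async (audio) => (audio.length > 0 ? "what are your opening hours" : ""),
  answer: async (question) => `You asked: ${question}`,
  // Fake "audio": a silent buffer with one byte per character of the reply.
  synthesize: async (text) => new Uint8Array(text.length),
};
```

A production agent streams audio through these stages incrementally rather than waiting for each one to finish, which is what keeps voice latency low; the sequential version above only shows the order of the hops.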

Prerequisites

1. LiveKit Cloud Account

You'll need: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET

2. AssemblyAI API Key

You'll need: ASSEMBLYAI_API_KEY

3. Resemble AI Account

You'll need: RESEMBLE_API_KEY, RESEMBLE_VOICE_ID

4. LLM Provider (OpenAI, Groq, or OpenRouter)

The voice agent uses the same LLM providers as ChatRAG. Configure one of: OPENAI_API_KEY, GROQ_API_KEY, or OPENROUTER_API_KEY
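As a rough sketch of what "configure one of" means in practice, the function below resolves a provider and its key from an environment map. The fallback order (OpenAI, then Groq, then OpenRouter when `VOICE_AGENT_LLM_PROVIDER` is unset) and the name `resolveProvider` are assumptions made for illustration; the agent's real selection logic may differ.

```typescript
type Env = Record<string, string | undefined>;
type Provider = "openai" | "groq" | "openrouter";

const PROVIDER_KEYS: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  groq: "GROQ_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
};

// Use the provider named by VOICE_AGENT_LLM_PROVIDER if set; otherwise pick
// the first provider whose API key is present (assumed order, see above).
function resolveProvider(env: Env): { provider: Provider; apiKey: string } {
  const all: Provider[] = ["openai", "groq", "openrouter"];
  const requested = (env["VOICE_AGENT_LLM_PROVIDER"] ?? "").trim().toLowerCase();
  if (requested !== "" && all.indexOf(requested as Provider) === -1) {
    throw new Error(`Unknown VOICE_AGENT_LLM_PROVIDER: ${requested}`);
  }
  const candidates = requested !== "" ? [requested as Provider] : all;
  for (const provider of candidates) {
    const apiKey = env[PROVIDER_KEYS[provider]];
    if (apiKey !== undefined && apiKey !== "") {
      return { provider, apiKey };
    }
  }
  const needed = candidates.map((p) => PROVIDER_KEYS[p]).join(" or ");
  throw new Error(`Missing LLM API key: set ${needed}`);
}
```

In a Node process you would call `resolveProvider(process.env)` once at startup so a missing key fails fast instead of surfacing mid-conversation.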

Local Development

Step 1: Configure Environment

cd voice-agent
cp .env.example .env
# Edit .env with your API keys

Step 2: Install Dependencies

npm install

Step 3: Start the Agent

npm run dev

The agent will connect to LiveKit and wait for users to join voice rooms.

Step 4: Configure ChatRAG

In your ChatRAG .env.local:

NEXT_PUBLIC_LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
NEXT_PUBLIC_VOICE_ENABLED=true
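A small guard like the following can catch a mistyped URL before the voice overlay tries to connect. It is a sketch, not ChatRAG's own code: `readVoiceClientConfig` is a hypothetical helper, and the rule that the URL must start with `ws://` or `wss://` is an assumption based on the example value above.

```typescript
type PublicEnv = Record<string, string | undefined>;

interface VoiceClientConfig {
  enabled: boolean;
  livekitUrl: string | null;
}

// Read the two public settings and validate the URL only when voice is on.
function readVoiceClientConfig(env: PublicEnv): VoiceClientConfig {
  const enabled = env["NEXT_PUBLIC_VOICE_ENABLED"] === "true";
  if (!enabled) {
    return { enabled: false, livekitUrl: null };
  }
  const url = (env["NEXT_PUBLIC_LIVEKIT_URL"] ?? "").trim();
  if (!/^wss?:\/\/\S+$/.test(url)) {
    throw new Error("NEXT_PUBLIC_LIVEKIT_URL must be a ws:// or wss:// URL when voice is enabled");
  }
  return { enabled: true, livekitUrl: url };
}
```

Because Next.js inlines `NEXT_PUBLIC_` variables only where they are referenced by their full name, browser code should pass an explicit object such as `{ NEXT_PUBLIC_VOICE_ENABLED: process.env.NEXT_PUBLIC_VOICE_ENABLED, NEXT_PUBLIC_LIVEKIT_URL: process.env.NEXT_PUBLIC_LIVEKIT_URL }` rather than `process.env` itself.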

Deploying to Fly.io

Fly.io is the recommended host for the voice agent: it runs close to users at the edge, supports persistent connections, and is inexpensive (roughly $3-4/month).

A separate, complete Fly.io deployment guide covers Dockerfile setup, fly.toml configuration, environment secrets, troubleshooting, and a cost breakdown.

Quick Start

cd voice-agent
fly auth login
fly apps create --name your-app-name
fly secrets set LIVEKIT_URL=... LIVEKIT_API_KEY=... ...
fly deploy
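Typing every secret by hand is error-prone, so one option is to derive the `fly secrets set` arguments from the `.env` file you already filled in. The helpers below are a hypothetical sketch (`parseEnvText` and `flySecretsArgs` are not part of flyctl or ChatRAG) and handle only simple `KEY=VALUE` lines.

```typescript
// Parse KEY=VALUE lines from .env-style text, skipping blanks, comments,
// and keys with empty values.
function parseEnvText(text: string): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    if (value !== "") pairs.push([key, value]);
  }
  return pairs;
}

// Build the argument list for `fly secrets set KEY=VALUE ...`.
function flySecretsArgs(envText: string): string[] {
  return ["secrets", "set", ...parseEnvText(envText).map(([k, v]) => `${k}=${v}`)];
}
```

Passing the resulting array to a process spawner (rather than joining it into a shell string) avoids quoting problems with values that contain spaces or symbols. flyctl also offers `fly secrets import`, which reads `NAME=VALUE` pairs from standard input and may be simpler if your `.env` needs no filtering.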

Deploying to Railway

Railway offers simple monorepo deployments with automatic builds.

Step 1: Connect Repository

1. Go to railway.app and create a new project
2. Connect your ChatRAG GitHub repository
3. Select "Deploy from a subdirectory"

Step 2: Configure Service

Create voice-agent/railway.toml:

[build]
builder = "nixpacks"

[deploy]
startCommand = "npm start"
healthcheckPath = "/health"
restartPolicyType = "always"
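The `healthcheckPath` setting only helps if the agent process answers HTTP requests on that path with a 200 status. If your build of the agent does not already do so, a handler along these lines could be added; `handleHealthRequest` and the `agentReady` flag are hypothetical names for this sketch.

```typescript
interface HealthResult {
  status: number;
  body: string;
}

// Decide the response for a health probe. `agentReady` is assumed to be a
// flag your agent flips to true once it has connected to LiveKit.
function handleHealthRequest(method: string, path: string, agentReady: boolean): HealthResult {
  const pathname = path.split("?")[0]; // ignore any query string
  if (pathname !== "/health") {
    return { status: 404, body: "not found" };
  }
  if (method !== "GET" && method !== "HEAD") {
    return { status: 405, body: "method not allowed" };
  }
  return agentReady
    ? { status: 200, body: '{"status":"ok"}' }
    : { status: 503, body: '{"status":"starting"}' };
}
```

To use it, call the function from an HTTP server (for example Node's built-in `http` module) listening on the port the platform provides, and write `status` and `body` to the response.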

Step 3: Set Root Directory

In Railway dashboard → Service Settings → Set Root Directory to voice-agent

Step 4: Add Environment Variables

In the Railway dashboard → Variables, add all the required environment variables listed in the Environment Variables Reference (the same values you would set as Fly.io secrets).

Deploying to Render

Render supports monorepo deployments with YAML configuration.

Option A: render.yaml (Infrastructure as Code)

Add to your repository root:

services:
  - type: web
    name: chatrag-voice-agent
    env: node
    rootDir: voice-agent
    buildCommand: npm install && npm run build
    startCommand: npm start
    envVars:
      - key: NODE_ENV
        value: production
      - key: LIVEKIT_URL
        sync: false
      - key: LIVEKIT_API_KEY
        sync: false
      - key: LIVEKIT_API_SECRET
        sync: false
      - key: ASSEMBLYAI_API_KEY
        sync: false
      - key: RESEMBLE_API_KEY
        sync: false
      - key: RESEMBLE_VOICE_ID
        sync: false
      - key: OPENAI_API_KEY
        sync: false
      - key: CHATRAG_INTERNAL_API_URL
        sync: false
      - key: CHATRAG_INTERNAL_API_KEY
        sync: false

Option B: Manual Setup

1. Go to render.com → New → Web Service
2. Connect your ChatRAG repository
3. Set Root Directory to voice-agent
4. Build Command: npm install && npm run build
5. Start Command: npm start
6. Add environment variables in the dashboard

Environment Variables Reference

Variable	Required	Description
`LIVEKIT_URL`	✓	WebSocket URL from LiveKit Cloud
`LIVEKIT_API_KEY`	✓	API key from LiveKit Cloud
`LIVEKIT_API_SECRET`	✓	API secret from LiveKit Cloud
`ASSEMBLYAI_API_KEY`	✓	For real-time speech-to-text
`RESEMBLE_API_KEY`	✓	For streaming text-to-speech
`RESEMBLE_VOICE_ID`	✓	Voice ID from Resemble AI
`CHATRAG_INTERNAL_API_URL`	✓	Your ChatRAG RAG API endpoint
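`CHATRAG_INTERNAL_API_KEY`	✓	Internal API key for RAG access
`OPENAI_API_KEY`, `GROQ_API_KEY`, or `OPENROUTER_API_KEY`	✓ (one)	API key for your chosen LLM provider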
`VOICE_AGENT_MODEL`		LLM model (default: gpt-4o)
`VOICE_AGENT_LLM_PROVIDER`		openai, groq, or openrouter
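The table translates directly into a startup check. The sketch below lists the variables marked required and applies the documented `gpt-4o` default; the function names are hypothetical and the agent may already perform its own validation.

```typescript
// Variables marked as required in the reference table above.
const REQUIRED_VARS = [
  "LIVEKIT_URL",
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
  "ASSEMBLYAI_API_KEY",
  "RESEMBLE_API_KEY",
  "RESEMBLE_VOICE_ID",
  "CHATRAG_INTERNAL_API_URL",
  "CHATRAG_INTERNAL_API_KEY",
] as const;

// Return the names of required variables that are unset or blank.
function missingRequiredVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Optional setting with the default documented in the table.
function voiceModel(env: Record<string, string | undefined>): string {
  const model = (env["VOICE_AGENT_MODEL"] ?? "").trim();
  return model !== "" ? model : "gpt-4o";
}
```

Calling `missingRequiredVars(process.env)` before the agent connects lets you log every missing name at once instead of discovering them one failure at a time.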

Troubleshooting

Agent not connecting to LiveKit

- Verify that LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are correct
- Check the LiveKit Cloud dashboard for connection logs
- Ensure the agent machine has outbound WebSocket access

No audio response from agent

- Verify that RESEMBLE_API_KEY and RESEMBLE_VOICE_ID are set
- Check the Resemble AI dashboard for API usage and errors
- Ensure CHATRAG_INTERNAL_API_URL is reachable from the agent

RAG context not working

- Verify that CHATRAG_INTERNAL_API_URL points to your deployed ChatRAG
- Check that CHATRAG_INTERNAL_API_KEY matches your ChatRAG config
- Test the RAG API endpoint directly with curl
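When testing the endpoint directly, the HTTP status code usually points at which of the checks above applies. The mapping below is a sketch built on general HTTP conventions; that ChatRAG answers a wrong key with 401 or 403 is an assumption, and `diagnoseRagResponse` is a hypothetical helper.

```typescript
type RagCheck = { ok: true } | { ok: false; hint: string };

// `status` is the HTTP status code of the test request, or null when the
// request failed before any response arrived (DNS, TLS, timeout).
function diagnoseRagResponse(status: number | null): RagCheck {
  if (status === null) {
    return { ok: false, hint: "No response: check that CHATRAG_INTERNAL_API_URL is reachable from the agent" };
  }
  if (status >= 200 && status < 300) {
    return { ok: true };
  }
  if (status === 401 || status === 403) {
    return { ok: false, hint: "Rejected: check that CHATRAG_INTERNAL_API_KEY matches your ChatRAG config" };
  }
  if (status === 404) {
    return { ok: false, hint: "Not found: check that CHATRAG_INTERNAL_API_URL points to the RAG endpoint of your deployed ChatRAG" };
  }
  if (status >= 500) {
    return { ok: false, hint: "Server error: check the ChatRAG deployment logs" };
  }
  return { ok: false, hint: `Unexpected status ${status}` };
}
```

Send whatever request your ChatRAG deployment expects (for example with `curl -i`, which prints the status line) and pass the resulting code to the function, or simply read the mapping as a checklist.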

Voice Agent Features

- Real-time RAG: Access your knowledge base during voice conversations
- Image RAG: Display relevant images from your knowledge base
- Streaming TTS: Natural, low-latency voice responses
- Multi-language STT: Support for English, Spanish, French, and more
- Visual feedback: Animated orb and text transcript in the UI
