RAG System
ChatRAG's Retrieval-Augmented Generation (RAG) system answers questions from your own documents, using HNSW-indexed vector search to find the relevant passages.
15-28x Faster Performance
How RAG Works
Retrieval-Augmented Generation combines a large language model with knowledge from your own documents: relevant passages are looked up at query time and passed to the model as context.
Document Upload
Upload documents (PDF, DOCX, TXT, etc.) through the interface
Processing & Chunking
LlamaCloud parses each document and splits it into semantically coherent chunks
Embedding Generation
OpenAI's text-embedding-3-small model generates a 1536-dimensional vector embedding for each chunk
Vector Storage
Embeddings are stored in Supabase (PostgreSQL with pgvector) behind HNSW indexes for fast retrieval
Query & Retrieval
The user's query is embedded the same way and matched against the stored embeddings
Context Injection
The most relevant chunks are injected into the {{context}} placeholder of the system prompt
AI Response
The LLM generates a response based on the system prompt and the retrieved context
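The Query & Retrieval step can be illustrated with a toy example. This is a minimal sketch, not ChatRAG's code: the three-dimensional vectors are hypothetical stand-ins for real 1536-dimensional embeddings, and `retrieveTopK` is a name chosen for illustration. It scores every chunk exactly, which is the ranking an HNSW index approximates without a full scan.

```javascript
// Toy sketch of query-time retrieval. The 3-dimensional vectors are
// hypothetical stand-ins for 1536-dimensional text-embedding-3-small vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search: compares the query with every stored chunk.
// An HNSW index returns (approximately) the same ranking without a full scan.
function retrieveTopK(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const chunks = [
  { id: 1, text: "Refunds are issued within 14 days.", embedding: [0.9, 0.1, 0.0] },
  { id: 2, text: "Our office is in Berlin.", embedding: [0.0, 0.2, 0.9] },
  { id: 3, text: "Refund requests need an order number.", embedding: [0.8, 0.3, 0.1] },
];

const queryVector = [1.0, 0.2, 0.0]; // hypothetical embedding of "How do refunds work?"
const top = retrieveTopK(queryVector, chunks, 2);
console.log(top.map((c) => c.text));
```

The two refund chunks win because their vectors point in nearly the same direction as the query; in ChatRAG the vectors come from OpenAI and the search runs inside Supabase.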
HNSW Vector Search
ChatRAG indexes embeddings with HNSW (Hierarchical Navigable Small World), a graph-based approximate nearest-neighbor index, for fast semantic search.
Performance Comparison
| Metric | Traditional RAG (IVFFlat) | ChatRAG (HNSW) | Improvement |
|---|---|---|---|
| Single Query | 100-500ms | <50ms | 15-28x faster |
| 10 Concurrent Users | 800-2000ms | <100ms | 20x faster |
| 100k Documents | 1-3 seconds | <200ms | 15x faster |
| Accuracy | 95% | 98% | +3 percentage points |
Index Parameters
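- m=64: Maximum number of connections per node in each graph layer (higher values improve recall at the cost of memory and build time)
- ef_construction=200: Size of the dynamic candidate list used while building the graph (higher values improve index quality but slow down builds)
- Dimensions: 1536 (OpenAI text-embedding-3-small)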
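For reference, these parameters map onto a pgvector index definition. The helper below is a hypothetical sketch: the `buildHnswIndexSql` name, the `document_chunks` table and `embedding` column in the example, and the choice of cosine distance (`vector_cosine_ops`) are assumptions, not ChatRAG's schema. The range checks follow pgvector's limits (m from 2 to 100, ef_construction from 4 to 1000 and at least 2 × m, at most 2,000 dimensions for an HNSW index on `vector`).

```javascript
// Hypothetical helper, not part of ChatRAG: emits a pgvector HNSW index
// statement for the parameters listed above.
// table and column are interpolated as-is; pass trusted identifiers only.
function buildHnswIndexSql({ table, column, m = 64, efConstruction = 200, dimensions = 1536 }) {
  // pgvector limits: 2 <= m <= 100 and 4 <= ef_construction <= 1000.
  if (m < 2 || m > 100) {
    throw new RangeError(`m must be between 2 and 100, got ${m}`);
  }
  if (efConstruction < 4 || efConstruction > 1000) {
    throw new RangeError(`ef_construction must be between 4 and 1000, got ${efConstruction}`);
  }
  // pgvector also rejects ef_construction values below 2 * m.
  if (efConstruction < 2 * m) {
    throw new RangeError("ef_construction must be greater than or equal to 2 * m");
  }
  // HNSW indexes on the vector type support at most 2,000 dimensions.
  if (dimensions > 2000) {
    throw new RangeError(`HNSW on vector supports up to 2000 dimensions, got ${dimensions}`);
  }
  // Assumption: cosine distance, a common choice for OpenAI embeddings.
  return (
    `CREATE INDEX ON ${table} USING hnsw (${column} vector_cosine_ops) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}

// Example with an assumed table and column name.
console.log(buildHnswIndexSql({ table: "document_chunks", column: "embedding" }));
```

With m=64 the 2 × m rule requires ef_construction of at least 128, so the documented 200 leaves headroom.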
RAG Configuration
ChatRAG provides 60+ configuration settings to fine-tune RAG performance.
Essential Settings
# System prompt MUST include {{context}} (quote multi-line values)
RAG_SYSTEM_PROMPT="You are an AI assistant.
Context:
{{context}}
Answer based on the context above..."
# Performance settings
RAG_ADAPTIVE_RETRIEVAL=true
RAG_MULTI_PASS=true
RAG_FINAL_RESULT_COUNT=25
Retrieval Parameters
RAG_INITIAL_MATCH_COUNT=60
RAG_SIMILARITY_THRESHOLD=0.45
RAG_MIN_CONFIDENCE=0.7
RAG_ADJACENT_CHUNKS=true
RAG_ADJACENCY_WINDOW=2
Advanced Features
RAG_QUERY_ENHANCEMENT=false
RAG_RERANKING=true
RAG_RERANKING_STRATEGY=hybrid
RAG_MMR_LAMBDA=0.85
RAG_DIVERSITY_WEIGHT=0.15
RAG_CACHE_ENABLED=true
Performance Modes
RAG_PERFORMANCE_MODE=accurate # or "fast" or "balanced"
RAG_MAX_RETRIEVAL_PASSES=2
RAG_COMPLETENESS_CONFIDENCE=0.7
Supported Document Types
- PDF: Portable Document Format
- DOCX: Microsoft Word
- TXT: Plain Text
- HTML: Web Pages
- RTF: Rich Text Format
- EPUB: E-books
Testing & Diagnostics
ChatRAG includes diagnostic scripts to verify and troubleshoot your RAG system.
Check RAG Flow
Verify the complete RAG pipeline is working correctly
node scripts/rag/check-rag-flow.js
Decode RAG Prompt
Inspect what's stored in your system prompt configuration
node scripts/rag/decode-rag-prompt.js
Test RAG System
End-to-end testing of document retrieval
node scripts/rag/test-rag-system.js
Reprocess Documents
Rebuild document index with updated settings
node scripts/rag/reprocess-documents.js
Key RAG Features
Adaptive Retrieval
Adjusts the retrieval strategy to the complexity of each query
Multi-Pass Search
Runs multiple retrieval passes for better coverage and accuracy
Adjacent Chunks
Also retrieves the chunks surrounding each match, preserving continuity
Semantic Chunking
Splits documents along semantic boundaries
Hybrid Reranking
Reranks results by combining multiple scoring methods, such as vector similarity and BM25 keyword matching
Result Caching
Caches results so repeated queries return faster
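Adjacent-chunk retrieval can be sketched as follows. The function and field names (`expandWithAdjacent`, `documentId`, `chunkIndex`) are hypothetical, and the defaults only mirror `RAG_SIMILARITY_THRESHOLD=0.45`, `RAG_ADJACENCY_WINDOW=2` and `RAG_FINAL_RESULT_COUNT=25`; the order in which ChatRAG applies these steps is an assumption. Matches below the threshold are dropped, the best `finalCount` are kept, and up to `window` neighbours on each side of every kept match are added.

```javascript
// Hypothetical sketch of threshold filtering plus adjacent-chunk expansion.
// Defaults mirror RAG_SIMILARITY_THRESHOLD, RAG_ADJACENCY_WINDOW and
// RAG_FINAL_RESULT_COUNT; the exact ChatRAG semantics are assumed.
function expandWithAdjacent(matches, allChunks, { threshold = 0.45, window = 2, finalCount = 25 } = {}) {
  // Look up any chunk by "documentId:chunkIndex".
  const byKey = new Map(allChunks.map((c) => [`${c.documentId}:${c.chunkIndex}`, c]));

  // Drop weak matches and keep the best `finalCount` of the rest.
  const kept = matches
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, finalCount);

  // Add up to `window` neighbours on each side of every kept match (no duplicates).
  const selected = new Map();
  for (const match of kept) {
    for (let offset = -window; offset <= window; offset++) {
      const key = `${match.documentId}:${match.chunkIndex + offset}`;
      if (byKey.has(key)) selected.set(key, byKey.get(key));
    }
  }

  // Return chunks in reading order so neighbouring text stays contiguous.
  return [...selected.values()].sort(
    (a, b) => String(a.documentId).localeCompare(String(b.documentId)) || a.chunkIndex - b.chunkIndex
  );
}

const doc = [0, 1, 2, 3, 4, 5].map((i) => ({ documentId: "a", chunkIndex: i, text: `chunk ${i}` }));
const hits = [{ documentId: "a", chunkIndex: 3, score: 0.8 }];
console.log(expandWithAdjacent(hits, doc, { window: 1 }).map((c) => c.text));
```

Returning neighbours in document order, rather than by score, is what lets the model read a passage that spans a chunk boundary as continuous text.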
Best Practices
Always Include {{context}}
Your system prompt must contain this placeholder; it is where the retrieved chunks are inserted, and RAG does not function without it
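A sketch of the check and the substitution, assuming the prompt text comes from `RAG_SYSTEM_PROMPT` and chunks are joined with blank lines. `renderRagPrompt` is a hypothetical name, not ChatRAG's API.

```javascript
const CONTEXT_PLACEHOLDER = "{{context}}";

// Hypothetical helper: rejects a prompt that cannot receive retrieved context,
// then fills every {{context}} occurrence with the chunk texts.
function renderRagPrompt(systemPrompt, chunkTexts) {
  if (!systemPrompt.includes(CONTEXT_PLACEHOLDER)) {
    throw new Error("RAG_SYSTEM_PROMPT must include {{context}}");
  }
  const context = chunkTexts.join("\n\n");
  // split/join avoids String.prototype.replace's special "$" patterns,
  // so chunk text such as "$&" is inserted literally.
  return systemPrompt.split(CONTEXT_PLACEHOLDER).join(context);
}

const prompt = renderRagPrompt(
  "You are an AI assistant.\nContext:\n{{context}}\nAnswer based on the context above.",
  ["Refunds are issued within 14 days.", "Refund requests need an order number."]
);
console.log(prompt);
```

Failing fast on a missing placeholder surfaces the misconfiguration immediately instead of letting the model answer without any document context.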
Use text-embedding-3-small
It produces the 1536-dimensional vectors the HNSW index expects and offers the best balance of speed, cost, and accuracy
Start with Default Settings
The default RAG configuration is optimized for most use cases
Monitor Chunk Sizes
The default chunk size of 2,500 characters with a 992-character overlap works well for most documents
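To see what these defaults mean, here is a plain sliding-window chunker. It is a simplification: ChatRAG chunks semantically via LlamaCloud, whereas this sketch cuts at fixed character offsets. With a 2,500-character window and 992 characters of overlap, each new chunk starts 1,508 characters after the previous one. `chunkText` is a hypothetical name.

```javascript
// Simplified fixed-size chunker with overlap (not semantic chunking).
// Defaults mirror the 2,500-character chunks and 992-character overlap above.
function chunkText(text, chunkSize = 2500, overlap = 992) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const step = chunkSize - overlap; // 1,508 characters with the defaults
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text to avoid a redundant tail chunk.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("x".repeat(5000)).map((c) => c.length)); // → [ 2500, 2500, 1984 ]
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of storing and embedding roughly 1.66 times as many characters.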
Test After Changes
After changing any RAG setting, run the diagnostic scripts to confirm that retrieval still works
RAG Architecture Components
- Enhanced RAG Retrieval: Multi-stage document search (13KB)
- Adaptive Retrieval: Intelligent strategy selection (25KB)
- Optimized Search: HNSW vector search (11KB)
- Semantic Chunker: Smart document splitting (18KB)
- Query Enhancer: Query optimization (11KB)
- Reranker: Result scoring and ranking (13KB)
- MMR Scorer: Maximal Marginal Relevance (6KB)
- BM25 Scorer: Traditional keyword matching (5KB)
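The idea behind the MMR Scorer, Maximal Marginal Relevance, is to pick results greedily, trading relevance to the query against similarity to results already chosen: score = λ · sim(query, d) − (1 − λ) · max over selected s of sim(d, s). The sketch below assumes `RAG_MMR_LAMBDA=0.85` plays the role of λ and uses cosine similarity; `mmrSelect` and the sample vectors are hypothetical. With λ = 0.85, relevance carries most of the weight and the redundancy penalty is a smaller correction.

```javascript
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy MMR: repeatedly pick the candidate with the best trade-off between
// relevance to the query and redundancy with already-selected results.
function mmrSelect(queryVector, candidates, k, lambda = 0.85) {
  const selected = [];
  const remaining = [...candidates];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, i) => {
      const relevance = cosine(queryVector, candidate.embedding);
      // No redundancy penalty for the first pick.
      const redundancy = selected.length === 0
        ? 0
        : Math.max(...selected.map((s) => cosine(candidate.embedding, s.embedding)));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const candidates = [
  { id: "A", embedding: [1, 0.1] },
  { id: "B", embedding: [1, 0.12] }, // near-duplicate of A
  { id: "C", embedding: [0.7, -0.7] }, // less relevant but different
];
const query = [1, 0];
console.log(mmrSelect(query, candidates, 2).map((c) => c.id)); // → [ 'A', 'B' ]
console.log(mmrSelect(query, candidates, 2, 0.5).map((c) => c.id)); // → [ 'A', 'C' ]
```

At λ = 0.85 the near-duplicate B still wins on relevance; lowering λ to 0.5 makes redundancy costly enough that the more distinct chunk C is chosen instead.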