RAG System

ChatRAG's Retrieval-Augmented Generation system provides document-based AI responses with HNSW-optimized vector search.

15-28x Faster Performance

ChatRAG uses HNSW (Hierarchical Navigable Small World) vector indexes for semantic search, providing 15-28x faster performance compared to traditional IVFFlat indexes.

How RAG Works

Retrieval-Augmented Generation combines a large language model with content retrieved from your own documents, so answers are grounded in what you have uploaded. The pipeline runs in seven stages:

Document Upload

Upload documents (PDF, DOCX, TXT, etc.) through the interface

Processing & Chunking

LlamaCloud parses and intelligently chunks documents into semantic segments

Embedding Generation

OpenAI generates 1536-dimensional vector embeddings for each chunk

Vector Storage

Embeddings are stored in Supabase with HNSW indexes for fast retrieval

Query & Retrieval

User queries are converted to vectors and matched against the stored embeddings

Context Injection

Relevant chunks are injected into the {{context}} placeholder of the system prompt

AI Response

The LLM generates a response based on the retrieved context and the system prompt
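
The retrieval and context-injection stages above can be sketched in a few lines. This is a minimal illustration, not ChatRAG's actual code: the function names are hypothetical, toy 3-dimensional vectors stand in for real 1536-dimensional embeddings, and a brute-force cosine scan stands in for the HNSW index lookup.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and keep the top k.
// (Hypothetical helper; a real deployment would query the vector index.)
function retrieve(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Replace the {{context}} placeholder with the retrieved chunk text.
// A replacer function is used so "$" in chunk text is never treated specially.
function injectContext(systemPrompt, retrieved) {
  const context = retrieved.map((c) => c.text).join("\n\n");
  return systemPrompt.replace("{{context}}", () => context);
}

// Toy data: 3-dimensional vectors instead of real embeddings.
const chunks = [
  { text: "Refunds are issued within 14 days.", embedding: [0.9, 0.1, 0.0] },
  { text: "Our office is in Lisbon.", embedding: [0.0, 0.2, 0.9] },
  { text: "Refund requests need an order number.", embedding: [0.8, 0.3, 0.1] },
];
const top = retrieve([1, 0, 0], chunks, 2);
const prompt = injectContext("You are an AI assistant.\n\nContext:\n{{context}}", top);
console.log(prompt);
```

The two refund-related chunks rank highest for this query vector, so only their text ends up in the prompt sent to the LLM.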

HNSW Vector Search

HNSW is a graph-based approximate nearest-neighbor index. ChatRAG uses it in place of IVFFlat to keep semantic search latency low as the document set grows.

Performance Comparison

Metric	Traditional RAG (IVFFlat)	ChatRAG (HNSW)	Improvement
Single Query	100-500ms	<50ms	15-28x faster
10 Concurrent Users	800-2000ms	<100ms	20x faster
100k Documents	1-3 seconds	<200ms	15x faster
Accuracy	95%	98%	+3 percentage points

Index Parameters

m=64: Maximum number of connections per node in each graph layer
ef_construction=200: Size of the dynamic candidate list used while building the index
Dimensions: 1536 (OpenAI text-embedding-3-small)
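
For reference, the parameters above map onto a pgvector index definition roughly as follows. This is a sketch under assumptions: it presumes the Supabase database uses the pgvector extension with cosine distance, and the table name document_chunks, the column name embedding, and the helper buildHnswIndexSql are all hypothetical. The CREATE INDEX syntax itself is pgvector's, which also requires ef_construction to be at least twice m.

```javascript
// Build a pgvector CREATE INDEX statement for an HNSW index.
// Identifiers are interpolated unescaped, so pass trusted names only.
function buildHnswIndexSql({ table, column, m, efConstruction }) {
  // pgvector rejects indexes where ef_construction < 2 * m.
  if (efConstruction < 2 * m) {
    throw new RangeError("ef_construction must be at least 2 * m");
  }
  return (
    `CREATE INDEX ON ${table} USING hnsw (${column} vector_cosine_ops) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}

// Hypothetical table and column names, with the parameter values listed above.
const indexSql = buildHnswIndexSql({
  table: "document_chunks",
  column: "embedding",
  m: 64,
  efConstruction: 200,
});
console.log(indexSql);
```

Higher values of m and ef_construction generally improve recall at the cost of a larger index and slower builds.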

RAG Configuration

ChatRAG provides 60+ configuration settings to fine-tune RAG performance.

Essential Settings

# System prompt MUST include {{context}}
RAG_SYSTEM_PROMPT=You are an AI assistant.

Context:
{{context}}

Answer based on the context above...

# Performance settings
RAG_ADAPTIVE_RETRIEVAL=true
RAG_MULTI_PASS=true
RAG_FINAL_RESULT_COUNT=25
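
Because retrieved chunks can only reach the model through the {{context}} placeholder, a quick pre-flight check on the prompt text can catch a missing or mistyped placeholder. The helper below is a hypothetical sketch, not part of ChatRAG; it assumes the placeholder must appear exactly as {{context}}.

```javascript
// Check a system prompt for the exact {{context}} placeholder and,
// if it is absent, look for a likely near miss such as {context}.
function checkSystemPrompt(prompt) {
  if (prompt.includes("{{context}}")) {
    return { ok: true, hint: null };
  }
  const nearMiss = prompt.match(/\{+\s*context\s*\}+/i);
  return {
    ok: false,
    hint: nearMiss
      ? `found "${nearMiss[0]}", expected "{{context}}"`
      : "no {{context}} placeholder found",
  };
}

console.log(checkSystemPrompt("You are an AI assistant.\n\nContext:\n{{context}}"));
console.log(checkSystemPrompt("You are an AI assistant.\n\nContext:\n{context}"));
```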

Retrieval Parameters

RAG_INITIAL_MATCH_COUNT=60
RAG_SIMILARITY_THRESHOLD=0.45
RAG_MIN_CONFIDENCE=0.7
RAG_ADJACENT_CHUNKS=true
RAG_ADJACENCY_WINDOW=2
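
To illustrate how a similarity threshold and an adjacency window might interact, the sketch below drops matches under the threshold and then pulls in neighbouring chunks from the same document. The exact semantics ChatRAG applies to these settings are not documented here, so the function name, the chunk shape (docId, index, similarity), and the behaviour are assumptions.

```javascript
// Keep matches at or above the similarity threshold, then add neighbours
// within `window` positions of each surviving hit in the same document.
function expandWithAdjacent(matches, allChunks, { threshold, window }) {
  const byKey = new Map(allChunks.map((c) => [`${c.docId}:${c.index}`, c]));
  const selected = new Map();
  for (const match of matches) {
    if (match.similarity < threshold) continue;
    for (let offset = -window; offset <= window; offset++) {
      const key = `${match.docId}:${match.index + offset}`;
      const chunk = byKey.get(key);
      if (chunk && !selected.has(key)) selected.set(key, chunk);
    }
  }
  // Return chunks in document order so the context reads continuously.
  return [...selected.values()].sort(
    (a, b) => a.docId.localeCompare(b.docId) || a.index - b.index
  );
}

// One hypothetical document split into eight chunks.
const allChunks = Array.from({ length: 8 }, (_, i) => ({
  docId: "doc-a",
  index: i,
  text: `chunk ${i}`,
}));
const matches = [
  { docId: "doc-a", index: 4, similarity: 0.62 },
  { docId: "doc-a", index: 0, similarity: 0.3 }, // below 0.45, dropped
];
const expanded = expandWithAdjacent(matches, allChunks, { threshold: 0.45, window: 2 });
console.log(expanded.map((c) => c.index)); // [ 2, 3, 4, 5, 6 ]
```

With a window of 2, a single hit on chunk 4 brings chunks 2 through 6 into the context, which helps when an answer spans a chunk boundary.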

Advanced Features

RAG_QUERY_ENHANCEMENT=false
RAG_RERANKING=true
RAG_RERANKING_STRATEGY=hybrid
RAG_MMR_LAMBDA=0.85
RAG_DIVERSITY_WEIGHT=0.15
RAG_CACHE_ENABLED=true
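
RAG_MMR_LAMBDA refers to Maximal Marginal Relevance, which trades relevance against redundancy when picking results. The sketch below implements the standard greedy MMR formulation with toy 2-dimensional vectors. How ChatRAG's own scorer combines this value with RAG_DIVERSITY_WEIGHT is not specified here, so treat the function names and data as illustrative only.

```javascript
function dot(a, b) {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function cosine(a, b) {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Greedy MMR: at each step pick the candidate that maximizes
// lambda * sim(query, c) - (1 - lambda) * max sim(c, alreadySelected).
function mmrSelect(queryVec, candidates, k, lambda) {
  const remaining = [...candidates];
  const selected = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((cand, i) => {
      const relevance = cosine(queryVec, cand.embedding);
      const redundancy = selected.length
        ? Math.max(...selected.map((s) => cosine(cand.embedding, s.embedding)))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

// "b" is a near duplicate of "a"; "c" is less relevant but different.
const candidates = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0.99, 0.14] },
  { id: "c", embedding: [0.6, 0.8] },
];
const relevanceHeavy = mmrSelect([1, 0], candidates, 2, 0.85).map((c) => c.id);
const diversityHeavy = mmrSelect([1, 0], candidates, 2, 0.3).map((c) => c.id);
console.log(relevanceHeavy, diversityHeavy);
```

A lambda close to 1, such as 0.85, favours relevance and keeps the near duplicate; a low lambda prefers the more distinct chunk instead.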

Performance Modes

RAG_PERFORMANCE_MODE=accurate  # or "fast" or "balanced"
RAG_MAX_RETRIEVAL_PASSES=2
RAG_COMPLETENESS_CONFIDENCE=0.7

Supported Document Types

PDF

Portable Document Format

DOCX

Microsoft Word

TXT

Plain Text

HTML

Web Pages

RTF

Rich Text Format

EPUB

E-books

Testing & Diagnostics

ChatRAG includes diagnostic scripts to verify and troubleshoot your RAG system.

Check RAG Flow

Verify the complete RAG pipeline is working correctly

node scripts/rag/check-rag-flow.js

Decode RAG Prompt

Inspect what's stored in your system prompt configuration

node scripts/rag/decode-rag-prompt.js

Test RAG System

End-to-end testing of document retrieval

node scripts/rag/test-rag-system.js

Reprocess Documents

Rebuild document index with updated settings

node scripts/rag/reprocess-documents.js

Key RAG Features

Adaptive Retrieval

Intelligent retrieval strategy that adjusts based on query complexity

Multi-Pass Search

Multiple retrieval passes for better coverage and accuracy

Adjacent Chunks

Retrieves surrounding context for better continuity

Semantic Chunking

Intelligent document splitting based on semantic boundaries

Hybrid Reranking

Combines multiple scoring methods for optimal results

Result Caching

Smart caching for improved performance on repeated queries
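
As an illustration of result caching for repeated queries, the sketch below keys cached results on a normalized query string and expires them after a fixed time. ChatRAG's actual cache design is not described here; the class name QueryCache, the normalization rule, and the TTL policy are assumptions.

```javascript
// A small in-memory cache keyed by normalized query text.
// The clock is injectable so expiry can be exercised deterministically.
class QueryCache {
  constructor({ ttlMs, now = () => Date.now() }) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Treat queries that differ only in case or whitespace as the same.
  static normalize(query) {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(query) {
    const key = QueryCache.normalize(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(query, value) {
    this.entries.set(QueryCache.normalize(query), { value, storedAt: this.now() });
  }
}

// Usage with a fake clock.
let fakeTime = 0;
const cache = new QueryCache({ ttlMs: 60000, now: () => fakeTime });
cache.set("What is the refund policy?", ["chunk-12", "chunk-13"]);
console.log(cache.get("  what is the   REFUND policy? ")); // cache hit
fakeTime = 61000;
console.log(cache.get("What is the refund policy?")); // undefined after expiry
```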

Best Practices

Always Include {{context}}

This placeholder is required in your system prompt for RAG to function

Use text-embedding-3-small

This embedding model offers a good balance of speed, cost, and accuracy, and its 1536-dimensional output matches the index configuration

Start with Default Settings

The default RAG configuration is optimized for most use cases

Monitor Chunk Sizes

The default chunk size of 2500 characters with a 992-character overlap works well for most documents

Test After Changes

Always verify RAG is working after modifying configuration
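
To make the chunk size and overlap defaults mentioned above concrete, the sketch below splits text with a plain sliding window. ChatRAG's semantic chunker splits on semantic boundaries, so this fixed-size version is a deliberate simplification, and the function name chunkText is hypothetical.

```javascript
// Split text into windows of `size` characters where each window starts
// `size - overlap` characters after the previous one.
function chunkText(text, size = 2500, overlap = 992) {
  if (overlap >= size) {
    throw new RangeError("overlap must be smaller than size");
  }
  const step = size - overlap;
  const result = [];
  for (let start = 0; start < text.length; start += step) {
    result.push({ start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // last window reached the end
  }
  return result;
}

// A 6000-character document yields windows starting every 1508 characters.
const sample = "x".repeat(6000);
const pieces = chunkText(sample);
console.log(pieces.map((p) => p.start)); // [ 0, 1508, 3016, 4524 ]
```

With these defaults each chunk repeats the final 992 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.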

RAG Architecture Components

Enhanced RAG Retrieval: Multi-stage document search (13KB)
Adaptive Retrieval: Intelligent strategy selection (25KB)
Optimized Search: HNSW vector search (11KB)
Semantic Chunker: Smart document splitting (18KB)
Query Enhancer: Query optimization (11KB)
Reranker: Result scoring and ranking (13KB)
MMR Scorer: Maximal Marginal Relevance (6KB)
BM25 Scorer: Traditional keyword matching (5KB)
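
For readers unfamiliar with the keyword-matching component, the sketch below shows the standard Okapi BM25 formula with the commonly used defaults k1 = 1.2 and b = 0.75. It is not ChatRAG's BM25 Scorer source; the tokenizer and function names are simplified assumptions.

```javascript
// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Score each document against the query with Okapi BM25.
function bm25Scores(query, documents, k1 = 1.2, b = 0.75) {
  const docs = documents.map(tokenize);
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const queryTerms = [...new Set(tokenize(query))];
  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length; // term frequency
      if (tf === 0) continue;
      const df = docs.filter((d) => d.includes(term)).length; // document frequency
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      // Length normalization: longer-than-average documents are penalized.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const scores = bm25Scores("hnsw index", [
  "HNSW index speeds up vector search",
  "Upload a PDF document",
  "Vector search uses an HNSW index in Postgres",
]);
console.log(scores);
```

The second document shares no terms with the query and scores zero, while the shorter of the two matching documents scores slightly higher because of length normalization.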
