RAG System

ChatRAG's Retrieval-Augmented Generation (RAG) system answers questions from your documents, using HNSW-indexed vector search to find the relevant passages.

How RAG Works

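Retrieval-Augmented Generation combines a large language model with knowledge from your own documents. Passages relevant to each query are retrieved and supplied to the model as context, so answers draw on your content rather than only on the model's training data.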

1. Document Upload: upload documents (PDF, DOCX, TXT, etc.) through the interface.
2. Processing & Chunking: LlamaCloud parses each document and splits it into semantic segments (chunks).
3. Embedding Generation: OpenAI generates a 1536-dimensional vector embedding for each chunk.
4. Vector Storage: embeddings are stored in Supabase with HNSW indexes for fast retrieval.
5. Query & Retrieval: each user query is converted to a vector and matched against the stored embeddings.
6. Context Injection: the most relevant chunks are inserted into the {{context}} placeholder of the system prompt.
7. AI Response: the LLM generates a response based on the retrieved context and the system prompt.
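Steps 5 through 7 can be illustrated in a few lines. The TypeScript below is a minimal sketch, not ChatRAG's implementation: it assumes cosine similarity and ranks in-memory chunks with a brute-force scan (ChatRAG runs this search in Supabase, where the HNSW index avoids comparing the query against every stored vector), then substitutes the best matches for {{context}}. The names `StoredChunk`, `retrieve`, and `injectContext`, and the `---` separator between chunks, are hypothetical.

```typescript
// Minimal sketch (assumptions: cosine similarity, in-memory brute-force scan).
// ChatRAG performs the search in Supabase using an HNSW index instead.

interface StoredChunk {
  id: string;
  text: string;
  embedding: number[]; // 1536 numbers in ChatRAG; any length works here
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 5: keep the k best chunks whose similarity clears the threshold.
function retrieve(
  query: number[],
  chunks: StoredChunk[],
  k: number,
  threshold: number,
): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((scored) => scored.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Step 6: substitute the chunk texts for every {{context}} placeholder.
function injectContext(systemPrompt: string, chunks: StoredChunk[]): string {
  if (!systemPrompt.includes("{{context}}")) {
    throw new Error("system prompt must contain {{context}}");
  }
  const context = chunks.map((c) => c.text).join("\n\n---\n\n");
  return systemPrompt.split("{{context}}").join(context);
}

// Usage with toy 2-dimensional embeddings.
const chunks: StoredChunk[] = [
  { id: "a", text: "Refunds are processed within 5 days.", embedding: [1, 0] },
  { id: "b", text: "Shipping is free over $50.", embedding: [0, 1] },
];
const top = retrieve([0.9, 0.1], chunks, 25, 0.45);
console.log(injectContext("Context:\n{{context}}\n\nAnswer based on the context above.", top));
```

Only chunk "a" clears the 0.45 threshold here, so the prompt sent to the model (step 7) contains just the refund sentence.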

HNSW Vector Search

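ChatRAG indexes embeddings with HNSW (Hierarchical Navigable Small World), a graph-based approximate nearest-neighbor index that finds semantically similar chunks without scanning every stored vector.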

Performance Comparison

Metric                 Traditional RAG (IVFFlat)   ChatRAG (HNSW)   Improvement
Single query           100-500 ms                  <50 ms           15-28x faster
10 concurrent users    800-2000 ms                 <100 ms          20x faster
100k documents         1-3 s                       <200 ms          15x faster
Accuracy               95%                         98%              +3 pts

RAG Configuration

ChatRAG provides 60+ configuration settings to fine-tune RAG performance.

Essential Settings

# The system prompt MUST include {{context}}.
# Wrap multi-line values in double quotes (supported by dotenv-style loaders).
RAG_SYSTEM_PROMPT="You are an AI assistant.

Context:
{{context}}

Answer based on the context above..."

# Performance settings
RAG_ADAPTIVE_RETRIEVAL=true
RAG_MULTI_PASS=true
RAG_FINAL_RESULT_COUNT=25
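A minimal sketch of how an application might read these settings and enforce the {{context}} rule at startup. `loadRagConfig` and its helpers are hypothetical, and the fallback values simply mirror the examples above; they are not necessarily ChatRAG's documented defaults.

```typescript
// Hypothetical loader for the essential settings. Fallbacks mirror the
// example values above, not necessarily ChatRAG's real defaults.

type Env = Record<string, string | undefined>;

interface EssentialRagConfig {
  systemPrompt: string;
  adaptiveRetrieval: boolean;
  multiPass: boolean;
  finalResultCount: number;
}

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return value.trim().toLowerCase() === "true";
}

function parsePositiveInt(name: string, value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const n = Number(value.trim());
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return n;
}

function loadRagConfig(env: Env): EssentialRagConfig {
  const systemPrompt = env.RAG_SYSTEM_PROMPT ?? "";
  // Retrieved chunks are substituted for {{context}}; without it they are lost.
  if (!systemPrompt.includes("{{context}}")) {
    throw new Error("RAG_SYSTEM_PROMPT must include {{context}}");
  }
  return {
    systemPrompt,
    adaptiveRetrieval: parseBool(env.RAG_ADAPTIVE_RETRIEVAL, true),
    multiPass: parseBool(env.RAG_MULTI_PASS, true),
    finalResultCount: parsePositiveInt("RAG_FINAL_RESULT_COUNT", env.RAG_FINAL_RESULT_COUNT, 25),
  };
}

// In a Node process you would pass process.env instead of a literal object.
const config = loadRagConfig({
  RAG_SYSTEM_PROMPT: "You are an AI assistant.\n\nContext:\n{{context}}",
  RAG_FINAL_RESULT_COUNT: "25",
});
console.log(config.finalResultCount, config.multiPass);
```

Failing fast on a missing placeholder turns a silent loss of retrieved context into an error at startup.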

Retrieval Parameters

RAG_INITIAL_MATCH_COUNT=60
RAG_SIMILARITY_THRESHOLD=0.45
RAG_MIN_CONFIDENCE=0.7
RAG_ADJACENT_CHUNKS=true
RAG_ADJACENCY_WINDOW=2
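Adjacent-chunk expansion can be sketched as follows, assuming RAG_ADJACENCY_WINDOW counts neighboring chunks taken on each side of a match within the same document (the exact semantics are not defined here). `ChunkRef` and `expandWithAdjacent` are hypothetical names.

```typescript
// Sketch assuming RAG_ADJACENCY_WINDOW = neighbors taken on each side of a
// matched chunk, within the same document. Names are hypothetical.

interface ChunkRef {
  documentId: string;
  index: number; // position of the chunk within its document
}

function expandWithAdjacent(
  matches: ChunkRef[],
  chunkCounts: Map<string, number>, // total number of chunks per document
  window: number,
): ChunkRef[] {
  const seen = new Set<string>();
  const expanded: ChunkRef[] = [];
  for (const match of matches) {
    // If the document size is unknown, never go past the matched chunk.
    const total = chunkCounts.get(match.documentId) ?? match.index + 1;
    const start = Math.max(0, match.index - window);
    const end = Math.min(total - 1, match.index + window);
    for (let i = start; i <= end; i++) {
      const key = `${match.documentId}:${i}`;
      if (!seen.has(key)) {
        seen.add(key); // overlapping windows are merged, not duplicated
        expanded.push({ documentId: match.documentId, index: i });
      }
    }
  }
  // Return chunks grouped by document and in reading order.
  return expanded.sort((a, b) =>
    a.documentId === b.documentId
      ? a.index - b.index
      : a.documentId.localeCompare(b.documentId),
  );
}

const counts = new Map<string, number>([["manual", 7]]);
const context = expandWithAdjacent(
  [{ documentId: "manual", index: 5 }, { documentId: "manual", index: 6 }],
  counts,
  2,
);
console.log(context.map((c) => c.index)); // logs the indices 3, 4, 5, 6
```

With a window of 2, matches at chunks 5 and 6 of a 7-chunk document pull in chunks 3 through 6; the window is clipped at the document's last chunk.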

Advanced Features

RAG_QUERY_ENHANCEMENT=false
RAG_RERANKING=true
RAG_RERANKING_STRATEGY=hybrid
RAG_MMR_LAMBDA=0.85
RAG_DIVERSITY_WEIGHT=0.15
RAG_CACHE_ENABLED=true
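RAG_MMR_LAMBDA points to Maximal Marginal Relevance (MMR), a standard reranking technique that picks each next chunk by maximizing lambda * sim(query, d) - (1 - lambda) * max sim(d, s) over the chunks s already selected. The sketch below is textbook MMR, not ChatRAG's code; treating RAG_DIVERSITY_WEIGHT=0.15 as 1 - lambda is an assumption based on the two example values summing to 1, and `mmrSelect` is a hypothetical name.

```typescript
// Textbook Maximal Marginal Relevance. Assumption: lambda = RAG_MMR_LAMBDA,
// and the diversity weight (RAG_DIVERSITY_WEIGHT) equals 1 - lambda.

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns indices of the chosen candidates, in selection order.
function mmrSelect(
  query: number[],
  candidates: number[][],
  k: number,
  lambda: number,
): number[] {
  const relevance = candidates.map((c) => cosine(query, c));
  const remaining = candidates.map((_, i) => i);
  const selected: number[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    for (let pos = 0; pos < remaining.length; pos++) {
      const i = remaining[pos];
      // Redundancy: highest similarity to any chunk already selected.
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const j of selected) {
        redundancy = Math.max(redundancy, cosine(candidates[i], candidates[j]));
      }
      const score = lambda * relevance[i] - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    }
    selected.push(remaining[bestPos]);
    remaining.splice(bestPos, 1);
  }
  return selected;
}

const query = [1, 0];
const candidates = [
  [1, 0], // highly relevant
  [0.98, 0.2], // near-duplicate of the first
  [0.6, 0.8], // less relevant but different
];
console.log(mmrSelect(query, candidates, 2, 0.85)); // picks 0, then the near-duplicate 1
console.log(mmrSelect(query, candidates, 2, 0.3)); // picks 0, then the more distinct 2
```

A high lambda such as 0.85 keeps the ranking close to pure relevance, while lower values favor chunks that differ from those already selected.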

Performance Modes

# Options: fast, balanced, accurate
RAG_PERFORMANCE_MODE=accurate
RAG_MAX_RETRIEVAL_PASSES=2
RAG_COMPLETENESS_CONFIDENCE=0.7

Supported Document Types

- PDF: Portable Document Format
- DOCX: Microsoft Word
- TXT: Plain text
- HTML: Web pages
- RTF: Rich Text Format
- EPUB: E-books

Testing & Diagnostics

ChatRAG includes diagnostic scripts to verify and troubleshoot your RAG system.

Check RAG Flow

Verify the complete RAG pipeline is working correctly

node scripts/rag/check-rag-flow.js

Decode RAG Prompt

Inspect what's stored in your system prompt configuration

node scripts/rag/decode-rag-prompt.js

Test RAG System

End-to-end testing of document retrieval

node scripts/rag/test-rag-system.js

Reprocess Documents

Rebuild document index with updated settings

node scripts/rag/reprocess-documents.js

Key RAG Features

Adaptive Retrieval

Adjusts the retrieval strategy to the complexity of each query

Multi-Pass Search

Multiple retrieval passes for better coverage and accuracy

Adjacent Chunks

Retrieves surrounding context for better continuity

Semantic Chunking

Intelligent document splitting based on semantic boundaries

Hybrid Reranking

Combines multiple scoring methods to reorder retrieved chunks by relevance

Result Caching

Caches results so that repeated queries are answered faster
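The caching behavior is not specified in detail here. As an illustration only, one common design is an in-memory cache keyed on a normalized query string, with a time-to-live and least-recently-used eviction. `QueryCache`, its key normalization, and its parameters are hypothetical and do not correspond to documented ChatRAG settings.

```typescript
// Hypothetical cache: TTL expiry plus least-recently-used eviction.
// A Map iterates in insertion order, so the first key is the oldest.

interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class QueryCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Normalize case and whitespace so trivially different queries share an entry.
  private normalize(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(query: string): V | undefined {
    const key = this.normalize(query);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  set(query: string, value: V): void {
    const key = this.normalize(query);
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const cache = new QueryCache<string[]>(500, 5 * 60 * 1000);
cache.set("What is our refund policy?", ["chunk-12", "chunk-13"]);
console.log(cache.get("what is our  refund policy?")); // same entry after normalization
```

An in-process cache like this is per server instance; deployments running several instances would typically need a shared store for hits to carry across them.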

Best Practices

Always Include {{context}}

Retrieved chunks are inserted at this placeholder; if the system prompt lacks it, the model never receives the retrieved context

Use text-embedding-3-small

This model offers a good balance of speed, cost, and accuracy, and its default output matches the 1536-dimensional embeddings ChatRAG stores

Start with Default Settings

The default RAG configuration is optimized for most use cases

Monitor Chunk Sizes

The default chunk size of 2,500 characters with a 992-character overlap works well for most documents
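To see how the two numbers interact, here is a plain fixed-size character chunker using them. This is a simplification: LlamaCloud splits on semantic boundaries, so real chunks vary in length. With a 2,500-character window and 992 characters of overlap, each chunk starts 1,508 characters after the previous one, so a 5,000-character document yields three chunks. `chunkText` is a hypothetical name.

```typescript
// Fixed-size character chunking with overlap (simplified: no semantic
// boundaries). Defaults mirror the documented 2500 / 992 values.

function chunkText(text: string, size = 2500, overlap = 992): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap; // 1508 characters with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

const doc = Array.from({ length: 5000 }, (_, i) => String.fromCharCode(65 + (i % 26))).join("");
const parts = chunkText(doc);
console.log(parts.map((p) => p.length)); // lengths 2500, 2500, 1984
```

A larger overlap keeps more shared text between neighboring chunks, at the cost of storing and embedding more chunks per document.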

Test After Changes

After modifying the configuration, rerun the diagnostic scripts (for example, node scripts/rag/check-rag-flow.js) to confirm that RAG still works