Document Processing
Upload and manage documents that power ChatRAG's retrieval-augmented generation system.
Two Ways to Manage Documents
- In-App Dashboard: For end users at http://localhost:3000
- Config UI: For admin workflows at http://localhost:3333
Document Management Methods
In-App Document Dashboard
Recommended for end users
- Access at http://localhost:3000
- User-friendly upload interface
- Row Level Security (users see only their docs)
- Requires authentication
Configuration:
NEXT_PUBLIC_HIDE_DOCUMENT_DASHBOARD=false
NEXT_PUBLIC_READ_ONLY_DOCUMENTS_ENABLED=false
Config UI Admin Tools
For admin-oriented workflows
- Access at http://localhost:3333
- Bulk operations and reprocessing
- Advanced configuration controls
- Requires SUPABASE_SERVICE_ROLE_KEY
Access via:
npm run config
Document Upload Process
What happens when you upload a document:
File Upload
Document uploaded to Supabase Storage with secure access policies
LlamaCloud Parsing
LlamaCloud extracts text, tables, images, and metadata from the document
Intelligent Chunking
Content split into semantic chunks (default: 2,500 characters with a 992-character overlap)
Embedding Generation
OpenAI generates 1536-dimensional embeddings for each chunk
Database Storage
Chunks stored in document_chunks table with HNSW vector index
Ready for Retrieval
Document immediately available for semantic search and RAG
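The chunk size and overlap from the chunking step can be illustrated with a minimal sketch. It assumes plain fixed-size character windows, which is a simplification: ChatRAG's chunker is described as semantic, so its real boundaries will differ. The function name `chunkText` is hypothetical and only shows how a 2,500-character window with a 992-character overlap advances through a document.

```typescript
// Hypothetical helper, not ChatRAG's actual chunker: fixed-size character
// windows where each chunk repeats the last `overlap` characters of the
// previous one.
function chunkText(text: string, chunkSize = 2500, overlap = 992): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  // With the defaults, each chunk starts 2500 - 992 = 1508 characters
  // after the previous one.
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}
```

With the defaults, a 6,000-character input produces four chunks starting at offsets 0, 1508, 3016, and 4524. A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more text.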
Supported Document Formats
PDF
Portable Document Format with OCR support
DOCX
Microsoft Word documents
TXT
Plain text files
HTML
Web pages and HTML documents
RTF
Rich Text Format
EPUB
E-book format
LlamaCloud Configuration
Configure document parsing behavior through environment variables:
Basic Configuration
LLAMA_CLOUD_API_KEY=llx-...
LLAMACLOUD_PARSING_MODE=balanced # or "fast" or "premium"
LLAMACLOUD_CHUNK_STRATEGY=sentence
LLAMACLOUD_CHUNK_SIZE=2500
LLAMACLOUD_CHUNK_OVERLAP=992
LLAMACLOUD_MULTIMODAL_PARSING=true
Advanced Parsing
LLAMACLOUD_PARSE_MODE=parse_page_with_agent
LLAMACLOUD_PARSE_MODEL=anthropic-sonnet-4.0
LLAMACLOUD_HIGH_RES_OCR=true
LLAMACLOUD_ADAPTIVE_LONG_TABLE=true
LLAMACLOUD_OUTLINED_TABLE_EXTRACTION=true
LLAMACLOUD_OUTPUT_TABLES_AS_HTML=true
Parsing Modes
- Fast: Quick processing, good for simple documents
- Balanced: Recommended for most use cases
- Premium: Maximum accuracy, slower processing
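Because these settings arrive as strings, it can help to validate them before they reach the pipeline. The sketch below is one possible approach, not ChatRAG's own loader: `loadChunkingConfig` and `parseIntSetting` are hypothetical names, and the fallback values (balanced, 2500, 992) are assumed from the examples in this section. The env object is passed in as a parameter so the function can be exercised without touching the real environment.

```typescript
const PARSING_MODES = ["fast", "balanced", "premium"] as const;
type ParsingMode = (typeof PARSING_MODES)[number];

interface ChunkingConfig {
  parsingMode: ParsingMode;
  chunkSize: number;
  chunkOverlap: number;
}

function isParsingMode(value: string): value is ParsingMode {
  return (PARSING_MODES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: parse a non-negative integer setting or use a fallback.
function parseIntSetting(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") {
    return fallback;
  }
  if (!/^\d+$/.test(raw)) {
    throw new Error(name + " must be a non-negative integer, got \"" + raw + "\"");
  }
  return parseInt(raw, 10);
}

// Hypothetical loader: fallback values are assumptions taken from this page.
function loadChunkingConfig(env: Record<string, string | undefined>): ChunkingConfig {
  const mode = (env.LLAMACLOUD_PARSING_MODE || "balanced").toLowerCase();
  if (!isParsingMode(mode)) {
    throw new Error("Unknown LLAMACLOUD_PARSING_MODE: " + mode);
  }
  const chunkSize = parseIntSetting("LLAMACLOUD_CHUNK_SIZE", env.LLAMACLOUD_CHUNK_SIZE, 2500);
  const chunkOverlap = parseIntSetting("LLAMACLOUD_CHUNK_OVERLAP", env.LLAMACLOUD_CHUNK_OVERLAP, 992);
  if (chunkSize <= 0) {
    throw new Error("LLAMACLOUD_CHUNK_SIZE must be greater than 0");
  }
  if (chunkOverlap >= chunkSize) {
    throw new Error("LLAMACLOUD_CHUNK_OVERLAP must be smaller than LLAMACLOUD_CHUNK_SIZE");
  }
  return { parsingMode: mode, chunkSize: chunkSize, chunkOverlap: chunkOverlap };
}
```

In a Node process this could be called as `loadChunkingConfig(process.env)`. Failing fast on an overlap that is not smaller than the chunk size avoids a chunker that never advances.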
Admin Features
Admin Access Control
Designate admin users who can manage documents for all users:
- Open Config UI → Admin section
- Enter user's email address
- Email must match existing Supabase user
- Requires SUPABASE_SERVICE_ROLE_KEY
Document Reprocessing
Rebuild document index with updated settings:
node scripts/rag/reprocess-documents.js
Useful after changing chunking settings or upgrading embedding models
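One way to decide whether reprocessing is due is to compare the settings an index was built with against the current ones. The sketch below is purely illustrative: the `IndexSettings` shape and the `changedSettings` function are hypothetical, and nothing here reflects how the reprocessing script itself works or whether ChatRAG records these settings anywhere.

```typescript
// Hypothetical record of the settings used when documents were last indexed.
interface IndexSettings {
  chunkSize: number;
  chunkOverlap: number;
  embeddingModel: string;
}

// Returns the names of the settings that differ. A non-empty result suggests
// existing chunks and embeddings no longer match the current configuration.
function changedSettings(indexed: IndexSettings, current: IndexSettings): string[] {
  const keys: (keyof IndexSettings)[] = ["chunkSize", "chunkOverlap", "embeddingModel"];
  return keys.filter((key) => indexed[key] !== current[key]);
}

// Placeholder values for illustration only.
const lastIndexed: IndexSettings = { chunkSize: 2500, chunkOverlap: 992, embeddingModel: "model-a" };
const currentSettings: IndexSettings = { chunkSize: 1500, chunkOverlap: 992, embeddingModel: "model-b" };

const stale = changedSettings(lastIndexed, currentSettings);
if (stale.length > 0) {
  console.log("Reprocessing recommended; changed settings: " + stale.join(", "));
}
```

Changing the embedding model is the case where reprocessing matters most, since vectors produced by different models are generally not comparable with each other.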
Read-Only Mode
Prevent users from uploading documents (admin-only dataset):
NEXT_PUBLIC_READ_ONLY_DOCUMENTS_ENABLED=true
Verification Steps
Verify your document processing is working correctly:
Upload a Test Document
Choose a PDF or DOCX with known content you can query
Wait for Processing
Status will change from "Processing" to "Completed"
Ask About Document Content
Query a specific fact from your uploaded document
Verify AI Response
AI should reference uploaded content in its response
Storage & Security
Storage Buckets
Documents stored in Supabase Storage with automatic bucket creation:
- Secure file storage
- Automatic cleanup on deletion
- CDN delivery for fast access
Row Level Security (RLS)
Multi-tenant isolation ensures users only see their documents:
- User-based access control
- Automatic policy enforcement
- Admin override capability
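The visibility rule described here can be expressed in application terms as a small sketch. This is only a model of the behaviour: actual enforcement happens in the database through RLS policies, not in code like this. The `DocumentRow` and `Viewer` shapes, the `ownerId` field name, and the `visibleDocuments` function are all hypothetical.

```typescript
// Hypothetical shapes; the real table columns may be named differently.
interface DocumentRow {
  id: string;
  ownerId: string;
  title: string;
}

interface Viewer {
  userId: string;
  isAdmin: boolean;
}

// Models the described rule: owners see their own rows, admins see every row.
function visibleDocuments(rows: DocumentRow[], viewer: Viewer): DocumentRow[] {
  if (viewer.isAdmin) {
    return rows.slice();
  }
  return rows.filter((row) => row.ownerId === viewer.userId);
}

const exampleRows: DocumentRow[] = [
  { id: "d1", ownerId: "alice", title: "Handbook" },
  { id: "d2", ownerId: "bob", title: "Pricing" },
  { id: "d3", ownerId: "alice", title: "Roadmap" },
];

// A regular user gets only their own documents; an admin gets all three.
const aliceDocs = visibleDocuments(exampleRows, { userId: "alice", isAdmin: false });
const adminDocs = visibleDocuments(exampleRows, { userId: "carol", isAdmin: true });
```

Enforcing the rule in the database rather than in application code means a missing filter in one query cannot expose another user's documents.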
Document Processing Pipeline
ChatRAG's document processing includes:
- Document Processor: LlamaCloud integration (15KB)
- Semantic Chunker: Intelligent splitting (18KB)
- Upload Utils: File handling and validation
- Storage Integration: Supabase Storage with RLS
- Database Tables: documents, document_chunks with HNSW indexes
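To show what the vector index is for, the sketch below ranks chunks by cosine similarity using an exhaustive scan. It is a stand-in, not ChatRAG's retrieval code: an HNSW index answers the same nearest-neighbour question approximately and far faster, and the similarity metric ChatRAG uses is not stated here, so cosine is an assumption. The `Chunk` shape and function names are hypothetical, and the 3-dimensional vectors stand in for 1536-dimensional embeddings.

```typescript
// Hypothetical shape for a stored chunk and its embedding.
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: the dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ: " + a.length + " vs " + b.length);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // A zero vector has no direction, so treat it as unrelated.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive top-k search: scores every chunk, then keeps the k best.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk: chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy data for illustration only.
const toyChunks: Chunk[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0, 1, 0] },
  { id: "c", embedding: [0.9, 0.1, 0] },
];
const nearest = topK([1, 0, 0], toyChunks, 2);
```

For the toy query, chunks `a` and `c` rank highest because they point in nearly the same direction as the query vector. The exhaustive scan costs time proportional to the number of chunks, which is the cost an approximate index is meant to avoid.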