Document Processing

Upload and manage documents that power ChatRAG's retrieval-augmented generation system.

Document Management Methods

In-App Document Dashboard

Recommended for end users

  • Access at http://localhost:3000
  • User-friendly upload interface
  • Row Level Security (users see only their docs)
  • Requires authentication

Configuration:

NEXT_PUBLIC_HIDE_DOCUMENT_DASHBOARD=false
NEXT_PUBLIC_READ_ONLY_DOCUMENTS_ENABLED=false
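
As an illustration of how these two flags might be combined, here is a minimal JavaScript sketch. The helper names `readDashboardFlags` and `canUpload` are hypothetical, and treating only a case-insensitive "true" as enabled is an assumption, not ChatRAG's confirmed parsing logic.

```javascript
// Hypothetical helper: reads the two dashboard flags from an env-like object.
// Assumption: a case-insensitive "true" enables a flag; anything else is false.
function readDashboardFlags(env = process.env) {
  const isTrue = (value) => String(value ?? "").trim().toLowerCase() === "true";
  return {
    hideDashboard: isTrue(env.NEXT_PUBLIC_HIDE_DOCUMENT_DASHBOARD),
    readOnly: isTrue(env.NEXT_PUBLIC_READ_ONLY_DOCUMENTS_ENABLED),
  };
}

// Assumption: end users can upload only when the dashboard is visible
// and read-only mode is off.
function canUpload(flags) {
  return !flags.hideDashboard && !flags.readOnly;
}
```

With both variables set to false, as in the configuration above, `canUpload` returns true.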

Config UI Admin Tools

For admin-oriented workflows

  • Access at http://localhost:3333
  • Bulk operations and reprocessing
  • Advanced configuration controls
  • Requires SUPABASE_SERVICE_ROLE_KEY

Access via:

npm run config

Document Upload Process

What happens when you upload a document:

  1. File Upload: Document uploaded to Supabase Storage with secure access policies
  2. LlamaCloud Parsing: LlamaCloud extracts text, tables, images, and metadata from the document
  3. Intelligent Chunking: Content split into semantic chunks (default: 2500 chars with 992 overlap)
  4. Embedding Generation: OpenAI generates 1536-dimensional embeddings for each chunk
  5. Database Storage: Chunks stored in document_chunks table with HNSW vector index
  6. Ready for Retrieval: Document immediately available for semantic search and RAG
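
To make the chunk size and overlap numbers concrete, here is a minimal sketch of a fixed-size sliding window in JavaScript. It is a simplification: the pipeline is described as producing semantic chunks, which this sketch does not attempt, and the function name `chunkText` is hypothetical. With the default values, each chunk starts 2500 - 992 = 1508 characters after the previous one.

```javascript
// Simplified fixed-size chunker: a sliding character window with overlap.
// This is NOT the sentence-aware strategy the pipeline uses; it only
// illustrates how chunk size and overlap interact.
function chunkText(text, chunkSize = 2500, overlap = 992) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const stride = chunkSize - overlap; // new characters contributed by each chunk
  const chunks = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

const sample = "x".repeat(6000);
const chunks = chunkText(sample);
console.log(chunks.map((chunk) => chunk.length)); // lengths: 2500, 2500, 2500, 1476
```

A 6,000-character input produces four chunks, and the last 992 characters of each chunk repeat at the start of the next one.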

Supported Document Formats

  • PDF: Portable Document Format with OCR support
  • DOCX: Microsoft Word documents
  • TXT: Plain text files
  • HTML: Web pages and HTML documents
  • RTF: Rich Text Format
  • EPUB: E-book format
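
A client could pre-check a file name against the format list above before uploading. The sketch below is one way to do that; the helper name `isSupportedDocument` is hypothetical, and matching on the file extension alone (rather than inspecting the file's contents) is an assumption. Whether variants such as `.htm` are accepted is not stated here, so the sketch leaves them out.

```javascript
// Extensions taken from the supported format list; the helper is hypothetical.
const SUPPORTED_EXTENSIONS = new Set(["pdf", "docx", "txt", "html", "rtf", "epub"]);

function isSupportedDocument(filename) {
  const dot = filename.lastIndexOf(".");
  // Reject names with no extension, dotfiles such as ".pdf", and trailing dots.
  if (dot <= 0 || dot === filename.length - 1) return false;
  return SUPPORTED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}
```

For example, `isSupportedDocument("Report.PDF")` returns true because the comparison is case-insensitive, while `isSupportedDocument("notes.md")` returns false.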

LlamaCloud Configuration

Configure document parsing behavior through environment variables:

Basic Configuration

LLAMA_CLOUD_API_KEY=llx-...
LLAMACLOUD_PARSING_MODE=balanced  # or "fast" or "premium"
LLAMACLOUD_CHUNK_STRATEGY=sentence
LLAMACLOUD_CHUNK_SIZE=2500
LLAMACLOUD_CHUNK_OVERLAP=992
LLAMACLOUD_MULTIMODAL_PARSING=true
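
Before starting the app, it can help to sanity-check these values. The following sketch loads three of the variables and validates them. The function name `loadChunkingConfig` is hypothetical, and the validation rules (overlap smaller than chunk size, mode limited to the three values shown) and the "balanced" fallback are assumptions rather than documented behavior.

```javascript
// Hypothetical loader for the chunking variables shown above.
// The numeric defaults mirror the documented values; the checks are assumptions.
const PARSING_MODES = ["fast", "balanced", "premium"];

function loadChunkingConfig(env = process.env) {
  const mode = env.LLAMACLOUD_PARSING_MODE ?? "balanced";
  const chunkSize = Number.parseInt(env.LLAMACLOUD_CHUNK_SIZE ?? "2500", 10);
  const chunkOverlap = Number.parseInt(env.LLAMACLOUD_CHUNK_OVERLAP ?? "992", 10);

  if (!PARSING_MODES.includes(mode)) {
    throw new Error(`Unknown LLAMACLOUD_PARSING_MODE: ${mode}`);
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("LLAMACLOUD_CHUNK_SIZE must be a positive integer");
  }
  if (!Number.isInteger(chunkOverlap) || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("LLAMACLOUD_CHUNK_OVERLAP must be >= 0 and smaller than the chunk size");
  }
  return { mode, chunkSize, chunkOverlap };
}
```

Failing fast on an overlap that is equal to or larger than the chunk size avoids a configuration in which consecutive chunks would never advance through the document.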

Advanced Parsing

LLAMACLOUD_PARSE_MODE=parse_page_with_agent
LLAMACLOUD_PARSE_MODEL=anthropic-sonnet-4.0
LLAMACLOUD_HIGH_RES_OCR=true
LLAMACLOUD_ADAPTIVE_LONG_TABLE=true
LLAMACLOUD_OUTLINED_TABLE_EXTRACTION=true
LLAMACLOUD_OUTPUT_TABLES_AS_HTML=true

Admin Features

Admin Access Control

Designate admin users who can manage documents for all users:

  1. Open Config UI → Admin section
  2. Enter user's email address
  3. Email must match existing Supabase user
  4. Requires SUPABASE_SERVICE_ROLE_KEY

Document Reprocessing

Rebuild document index with updated settings:

node scripts/rag/reprocess-documents.js

Run this after changing chunking settings or upgrading embedding models, so that existing documents are indexed with the new settings.

Read-Only Mode

Prevent users from uploading documents (admin-only dataset):

NEXT_PUBLIC_READ_ONLY_DOCUMENTS_ENABLED=true

Verification Steps

Verify your document processing is working correctly:

  1. Upload a Test Document: Choose a PDF or DOCX with known content you can query
  2. Wait for Processing: Status will change from "Processing" to "Completed"
  3. Ask About Document Content: Query a specific fact from your uploaded document
  4. Verify AI Response: AI should reference uploaded content in its response

Storage & Security

Storage Buckets

Documents are stored in Supabase Storage, and the required buckets are created automatically:

  • Secure file storage
  • Automatic cleanup on deletion
  • CDN delivery for fast access

Row Level Security (RLS)

Multi-tenant isolation ensures users see only their own documents:

  • User-based access control
  • Automatic policy enforcement
  • Admin override capability
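
The visibility rule can be modeled in a few lines of JavaScript. This is only an illustration of the behavior described above: the actual enforcement happens in the database's Row Level Security policies, and the function name and the `ownerId`, `id`, and `isAdmin` fields are hypothetical.

```javascript
// Plain-JavaScript model of the visibility rule; not the real RLS policy.
// Field names (id, isAdmin, ownerId) are hypothetical.
function visibleDocuments(user, documents) {
  if (user.isAdmin) return documents.slice(); // admin override: sees every document
  return documents.filter((doc) => doc.ownerId === user.id);
}

const docs = [
  { name: "a.pdf", ownerId: "u1" },
  { name: "b.docx", ownerId: "u2" },
];
console.log(visibleDocuments({ id: "u1", isAdmin: false }, docs).map((d) => d.name));
```

A regular user with id "u1" sees only `a.pdf`, while an admin sees both documents.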