Image RAG
Upload images to your knowledge base, have them captioned by AI, and retrieve them via semantic search when users ask relevant questions.
Visual Knowledge Base
How It Works
1. Upload
Upload images via the Config UI with an optional context description
2. Caption
GPT-4o Vision analyzes the image and generates a detailed description
3. Embed
The caption is embedded as a vector for semantic search
4. Display
The image appears in chat when a user's query matches its caption
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                       IMAGE RAG DATA FLOW                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  UPLOAD PHASE                                                   │
│  ┌──────────────┐    ┌──────────────────┐    ┌────────────────┐ │
│  │ Config UI    │───▶│ /api/upload      │───▶│ GPT-4o Vision  │ │
│  │ Image Upload │    │ (process image)  │    │ (caption)      │ │
│  └──────────────┘    └──────────────────┘    └────────────────┘ │
│                                                       │         │
│                                                       ▼         │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ document_chunks table                                      │ │
│  │ - content: AI-generated caption                            │ │
│  │ - embedding: vector from caption                           │ │
│  │ - metadata.image_url: Supabase storage URL                 │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                 │
│  RETRIEVAL PHASE                                                │
│  ┌──────────────┐    ┌──────────────────┐    ┌────────────────┐ │
│  │ User Query   │───▶│ match_documents  │───▶│ Stream to UI   │ │
│  │ "Show me X"  │    │ (vector search)  │    │ (source_images)│ │
│  └──────────────┘    └──────────────────┘    └────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
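The chunk row at the center of the diagram can be sketched as a TypeScript type. This is a minimal illustration that assumes only the three fields the diagram names (content, embedding, metadata.image_url); the interface name ImageChunkRecord, the helper buildImageChunk, and the example URL are hypothetical, and the real table presumably has further columns, such as a link back to the documents record.

```typescript
// Hypothetical shape of an image chunk row; field names mirror the diagram.
// Any other columns of document_chunks are omitted here.
interface ImageChunkRecord {
  content: string; // AI-generated caption
  embedding: number[]; // vector computed from the caption
  metadata: { image_url: string; [key: string]: unknown }; // Supabase storage URL
}

// Hypothetical helper: builds the row that would be stored for one image.
function buildImageChunk(
  caption: string,
  embedding: number[],
  imageUrl: string,
): ImageChunkRecord {
  const content = caption.trim();
  if (content.length === 0) {
    throw new Error("Caption is empty; there is nothing to search against");
  }
  if (embedding.length === 0) {
    throw new Error("Embedding vector is empty");
  }
  return { content, embedding, metadata: { image_url: imageUrl } };
}

// Usage with a placeholder URL and a toy three-dimensional embedding.
const row = buildImageChunk(
  "  A blue and white logo showing the ChatRAG brand identity. ",
  [0.12, -0.03, 0.88],
  "https://example.supabase.co/storage/v1/object/public/chat-images/logo.png",
);
```

Only the caption text is embedded, which is why the quality of the caption (and of any context description) drives retrieval.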
Uploading Images
Via Config Dashboard
- Run npm run config and open http://localhost:3333
- Navigate to the "Image RAG Uploads" tab
- Click "Choose File" and select your image (PNG, JPG/JPEG, WebP, or GIF)
- (Optional) Add a context description to improve retrieval accuracy
- Click "Upload for RAG"
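The docs do not say how the optional context description reaches the captioning model. One plausible approach, sketched here under that assumption, is to append it to the vision prompt as a hint. The prompt wording, BASE_PROMPT, and buildCaptionPrompt are all hypothetical.

```typescript
// Hypothetical base prompt; the actual instructions sent to GPT-4o Vision may differ.
const BASE_PROMPT =
  "Describe this image in detail so it can be found by semantic search. " +
  "Mention visible text, objects, people, and the kind of image (logo, screenshot, diagram, photo).";

// Hypothetical helper: folds the uploader's optional context into the prompt.
function buildCaptionPrompt(context?: string): string {
  const trimmed = context?.trim() ?? "";
  // Without context, the model relies on the image alone.
  if (trimmed === "") return BASE_PROMPT;
  // With context, the description is passed along as a hint for the caption.
  return `${BASE_PROMPT}\nContext provided by the uploader: "${trimmed}"`;
}
```

A blank description simply falls back to the base prompt, which matches the upload form treating the field as optional.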
Context Descriptions
Adding context helps the AI generate better captions and improves retrieval:
Good examples:
"Company logo used in marketing materials"
"Product screenshot showing the dashboard"
"Team photo from Q4 2024 retreat"
"Architecture diagram of the payment system"
Bad examples:
"image1.png"
"screenshot"
(blank)
Pro Tip: Batch Uploads
Retrieval Behavior
Semantic Matching
Images are retrieved based on semantic similarity between the user's query and the AI-generated caption. The query "Show me the company logo" will match an image captioned "A blue and white logo showing the ChatRAG brand identity."
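Retrieval itself runs in the database through match_documents, and the docs do not state its distance metric. Assuming cosine similarity (a common choice for pgvector-based search), the ranking can be illustrated in memory as follows. The 0.25 default threshold is the value given under Troubleshooting; the type, function names, and toy three-dimensional vectors are hypothetical stand-ins for real caption embeddings.

```typescript
// Hypothetical in-memory stand-in for a captioned image row.
interface CaptionedImage {
  imageUrl: string;
  embedding: number[]; // embedding of the AI-generated caption
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keeps images whose caption similarity meets the threshold, best match first.
function rankImages(
  queryEmbedding: number[],
  images: CaptionedImage[],
  threshold = 0.25,
): { imageUrl: string; score: number }[] {
  return images
    .map((img) => ({ imageUrl: img.imageUrl, score: cosineSimilarity(queryEmbedding, img.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score);
}

// Toy example: the query vector points in the same direction as the logo caption.
const ranked = rankImages([1, 0, 0], [
  { imageUrl: "logo.png", embedding: [0.9, 0.1, 0] },
  { imageUrl: "team.jpg", embedding: [0, 1, 0] },
  { imageUrl: "dashboard.png", embedding: [0.5, 0.5, 0.7] },
]);
console.log(ranked.map((r) => r.imageUrl)); // ["logo.png", "dashboard.png"]
```

The team photo scores 0 and falls below the threshold, while the dashboard clears it with a much weaker score than the logo; this is the situation the Troubleshooting note about low thresholds describes.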
Display Position
Retrieved images appear above the AI's text response, so they are immediately visible. Images are left-aligned; a single image is displayed larger, while multiple images are arranged in a grid.
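The single-versus-grid rule can be expressed as a small pure function. This is a sketch only: the ImageLayout type, chooseLayout, and the three-column cap are hypothetical and are not taken from source-images-grid.tsx.

```typescript
// Hypothetical layout descriptor for the images shown above a response.
type ImageLayout =
  | { mode: "single" }
  | { mode: "grid"; columns: number };

// Hypothetical cap on grid width.
const MAX_COLUMNS = 3;

function chooseLayout(imageCount: number): ImageLayout | null {
  if (imageCount <= 0) return null; // nothing to render above the response
  if (imageCount === 1) return { mode: "single" }; // one image is shown larger
  return { mode: "grid", columns: Math.min(imageCount, MAX_COLUMNS) };
}
```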
Images Are NOT Cited
Unlike text documents, images are not cited in the AI's response sources. They are displayed visually but filtered from the LLM context to prevent the AI from citing filenames like "logo.png" in its response.
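A minimal sketch of the filtering step, assuming a chunk counts as an image whenever metadata.image_url is set: image chunks are collected as URLs for the source_images stream event, and only the remaining chunks are passed to the model as context. RetrievedChunk and splitRetrievedChunks are hypothetical names.

```typescript
// Hypothetical shape of a chunk returned by the vector search.
interface RetrievedChunk {
  content: string;
  metadata?: { image_url?: string; [key: string]: unknown } | null;
}

// Separates image chunks (displayed in the UI) from text chunks (sent to the LLM).
function splitRetrievedChunks(chunks: RetrievedChunk[]): {
  llmContext: RetrievedChunk[];
  sourceImages: string[];
} {
  const llmContext: RetrievedChunk[] = [];
  const sourceImages: string[] = [];
  for (const chunk of chunks) {
    const url = chunk.metadata?.image_url;
    if (typeof url === "string" && url.length > 0) {
      // Image chunk: shown to the user but kept out of the prompt, so it is never cited.
      if (!sourceImages.includes(url)) sourceImages.push(url); // skip duplicate images
    } else {
      llmContext.push(chunk);
    }
  }
  return { llmContext, sourceImages };
}
```

Keeping images out of the prompt entirely, rather than instructing the model not to cite them, removes the chance of a filename such as "logo.png" leaking into the answer.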
Voice Integration
Image RAG also works with the Voice Agent. Users can speak queries such as:
"Show me the ChatRAG logo"
Retrieves and displays the logo image
"What does the dashboard look like?"
Shows relevant dashboard screenshots
Managing Uploaded Images
View Uploaded Images
The Config UI shows a visual grid of all uploaded images with hover previews. Images are displayed in a 2-row horizontal scrolling layout.
Delete Images
Hover over any image to see the delete button. Clicking delete removes:
- The document record from the documents table
- Associated chunks from the document_chunks table
- The file from Supabase Storage
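Removing the file from storage requires the object path relative to its bucket. Assuming metadata.image_url holds a Supabase public URL, which has the form <project-url>/storage/v1/object/public/<bucket>/<path>, the path can be recovered as shown here and then passed to supabase-js's storage.from("chat-images").remove([path]). The helper name storagePathFromPublicUrl is hypothetical.

```typescript
// Hypothetical helper: extracts the bucket-relative object path from a
// Supabase Storage public URL, or returns null if the URL is not a public
// URL for the given bucket.
function storagePathFromPublicUrl(publicUrl: string, bucket = "chat-images"): string | null {
  const marker = `/storage/v1/object/public/${bucket}/`;
  const { pathname } = new URL(publicUrl); // query string and hash are ignored
  const idx = pathname.indexOf(marker);
  if (idx === -1) return null;
  // Storage paths are percent-encoded in the URL; remove() expects the raw path.
  return decodeURIComponent(pathname.slice(idx + marker.length));
}
```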
Technical Requirements
| Requirement | Details |
|---|---|
| OpenAI API Key | Required for GPT-4o Vision captioning |
| Supabase Storage | The chat-images bucket must exist |
| Database Schema | document_chunks needs a metadata JSONB column |
| Supported Formats | PNG, JPG, JPEG, WebP, GIF |
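A client-side pre-check against the Supported Formats row might look like this. It is a hypothetical helper that inspects only the file extension; a stricter check could also compare the browser-reported MIME type or the file's leading bytes.

```typescript
// Formats from the Supported Formats row, keyed by lowercase extension.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const SUPPORTED_IMAGE_TYPES = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["webp", "image/webp"],
  ["gif", "image/gif"],
]);

// Hypothetical helper: returns the MIME type for a supported file name, else null.
function supportedImageType(fileName: string): string | null {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1 || dot === fileName.length - 1) return null; // no extension
  const ext = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_IMAGE_TYPES.get(ext) ?? null;
}
```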
Troubleshooting
Image retrieved but not displayed
- Check the browser console for the [Stream] Received source_images event
- Verify that metadata.image_url exists in the chunk record
- Ensure the image URL is publicly accessible
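The metadata.image_url and accessibility checks can be partly automated with a diagnostic such as this one. It is a heuristic sketch with hypothetical names: it confirms that metadata.image_url is present and parses as an absolute URL, and flags URLs outside Supabase's /storage/v1/object/public/ path, since objects in a private bucket are not reachable without a signed URL. Only an actual request from the browser proves the image loads.

```typescript
// Hypothetical minimal view of a chunk record for diagnosis.
interface ChunkForDiagnosis {
  metadata?: { image_url?: unknown } | null;
}

function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null; // relative paths and malformed strings end up here
  }
}

// Returns a list of likely problems; an empty list means nothing obvious is wrong.
function diagnoseImageChunk(chunk: ChunkForDiagnosis): string[] {
  const raw = chunk.metadata?.image_url;
  if (typeof raw !== "string" || raw.trim() === "") {
    return ["metadata.image_url is missing or empty"];
  }
  const url = tryParseUrl(raw);
  if (url === null) {
    return [`metadata.image_url is not an absolute URL: ${raw}`];
  }
  const problems: string[] = [];
  if (!url.pathname.includes("/storage/v1/object/public/")) {
    problems.push("URL does not point at a public Supabase Storage object");
  }
  return problems;
}
```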
Wrong image retrieved
- Add more specific context descriptions when uploading
- Use distinct keywords in captions (e.g., "logo" vs "team photo")
- The image similarity threshold is 0.25, which is permissive; matches that score only slightly above it are more likely to be the wrong image
Image appears during stream but disappears
- This indicates a frontend issue with message preservation
- Check that source_images is preserved in the [Manual onFinish] logic
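Preserving the images comes down to carrying source_images over from the streamed message to the finished one when the finished message lacks them. The message shape and preserveSourceImages are hypothetical; the actual onFinish handler and message type in the codebase may look different.

```typescript
// Hypothetical chat message shape; the real type likely has more fields.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  source_images?: string[];
}

// Hypothetical helper: copies source_images from the streamed message onto
// the finished message if the finished one arrived without them.
function preserveSourceImages(streamed: ChatMessage, finished: ChatMessage): ChatMessage {
  if (finished.source_images && finished.source_images.length > 0) {
    return finished; // final message already carries its images
  }
  if (!streamed.source_images || streamed.source_images.length === 0) {
    return finished; // nothing to preserve
  }
  // Return a new object so the finished message passed in is not mutated.
  return { ...finished, source_images: [...streamed.source_images] };
}
```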
Upload succeeds but image doesn't appear in grid
- Wait a few seconds; processing happens in the background
- Click the "Refresh" button in the Config UI
- Check the server logs for [Upload] Image RAG processing completed
Key Implementation Files
| File | Purpose |
|---|---|
| src/lib/document-processor.ts | processImageDocument(): Vision captioning and embedding |
| src/app/api/upload/route.ts | Triggers RAG processing for uploaded images |
| src/app/api/chat/route.ts | Injects source_images into the response stream |
| src/components/ui/source-images-grid.tsx | Renders retrieved images in chat |
| scripts/config-ui/index.html | Image upload UI in the Config Dashboard |
Image RAG Features
- AI Vision Captioning: GPT-4o generates searchable descriptions
- Semantic Search: Find images by meaning, not just keywords
- Voice Compatible: Works with Voice Agent queries
- Left-Aligned Display: Images appear above text responses
- Chat History: Images persist in saved conversations