Image RAG

Upload images to your knowledge base, have them captioned by AI, and retrieve them via semantic search when users ask relevant questions.

How It Works

1. Upload

Upload images via the Config UI, with an optional context description.

2. Caption

GPT-4o Vision analyzes the image and generates a detailed description.

3. Embed

The caption is embedded as a vector for semantic search.

4. Display

The image appears in chat when a user's query matches the caption.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      IMAGE RAG DATA FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  UPLOAD PHASE                                                    │
│  ┌──────────────┐    ┌──────────────────┐    ┌────────────────┐ │
│  │ Config UI    │───▶│ /api/upload      │───▶│ GPT-4o Vision  │ │
│  │ Image Upload │    │ (process image)  │    │ (caption)      │ │
│  └──────────────┘    └──────────────────┘    └────────────────┘ │
│                                                      │           │
│                                                      ▼           │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ document_chunks table                                       │ │
│  │   - content: AI-generated caption                          │ │
│  │   - embedding: vector from caption                         │ │
│  │   - metadata.image_url: Supabase storage URL               │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                  │
│  RETRIEVAL PHASE                                                 │
│  ┌──────────────┐    ┌──────────────────┐    ┌────────────────┐ │
│  │ User Query   │───▶│ match_documents  │───▶│ Stream to UI   │ │
│  │ "Show me X"  │    │ (vector search)  │    │ (source_images)│ │
│  └──────────────┘    └──────────────────┘    └────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
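
The row stored at the end of the upload phase can be sketched as a small TypeScript type. Only content, embedding, and metadata.image_url come from the diagram above; the names ImageChunk and buildImageChunk, the validation rules, and the example URL are assumptions made for illustration.

```typescript
// Hypothetical shape of an image row in document_chunks.
// Only content, embedding, and metadata.image_url are taken from the diagram.
interface ImageChunk {
  content: string;                   // AI-generated caption
  embedding: number[];               // vector computed from the caption
  metadata: { image_url: string };   // Supabase storage URL
}

// Assemble a row from the outputs of the caption and embed steps.
function buildImageChunk(caption: string, embedding: number[], imageUrl: string): ImageChunk {
  if (caption.trim() === "") {
    throw new Error("caption must not be empty");
  }
  if (embedding.length === 0) {
    throw new Error("embedding must not be empty");
  }
  return { content: caption.trim(), embedding, metadata: { image_url: imageUrl } };
}

const chunk = buildImageChunk(
  "A blue and white logo showing the ChatRAG brand identity.",
  [0.12, -0.03, 0.88], // toy vector; real embeddings have far more dimensions
  "https://example.supabase.co/storage/v1/object/public/chat-images/logo.png", // placeholder URL
);
console.log(chunk.metadata.image_url);
```

Because the embedding is computed from the caption rather than from the pixels, retrieval quality depends on how well the caption describes the image.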

Uploading Images

Via Config Dashboard

  1. Run npm run config and open http://localhost:3333
  2. Navigate to the "Image RAG Uploads" tab
  3. Click "Choose File" and select your image (PNG, JPG, JPEG, WebP, or GIF; see Technical Requirements)
  4. (Optional) Add a context description to improve retrieval accuracy
  5. Click "Upload for RAG"

Context Descriptions

Adding context helps the AI generate better captions and improves retrieval:

Good examples:
  "Company logo used in marketing materials"
  "Product screenshot showing the dashboard"
  "Team photo from Q4 2024 retreat"
  "Architecture diagram of the payment system"

Bad examples:
  "image1.png"
  "screenshot"
  (blank)
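
One way a context description could reach the captioning model is as extra text in the vision prompt. The sketch below builds a request body in the OpenAI chat completions format (a text part plus an image_url part) without sending it. The helper name buildCaptionRequest and the prompt wording are assumptions, not the project's actual prompt.

```typescript
// Sketch of a vision captioning request body. The prompt wording and the
// helper name buildCaptionRequest are assumptions, not the project's code.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface CaptionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

function buildCaptionRequest(imageUrl: string, context?: string): CaptionRequest {
  const base = "Describe this image in detail so it can be found by semantic search.";
  const trimmed = context?.trim() ?? "";
  // Fold the uploader's context into the prompt only when it is non-empty.
  const text = trimmed ? `${base} Uploader context: ${trimmed}` : base;
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

const request = buildCaptionRequest(
  "https://example.com/dashboard.png", // placeholder URL
  "Product screenshot showing the dashboard",
);
console.log(JSON.stringify(request, null, 2));
```

A blank description leaves the model with only the pixels to go on, which is why the bad examples above tend to produce generic captions.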

Retrieval Behavior

Semantic Matching

Images are retrieved based on semantic similarity between the user's query and the AI-generated caption. The query "Show me the company logo" will match an image captioned "A blue and white logo showing the ChatRAG brand identity."
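
As a rough illustration of semantic matching, the sketch below scores captions against a query by cosine similarity and keeps those at or above 0.25, the image similarity threshold mentioned under Troubleshooting. It assumes cosine similarity is the metric and that the threshold is inclusive; the actual scoring happens inside match_documents, and the helper names and toy vectors here are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  caption: string;
  embedding: number[];
}

// Keep candidates at or above the threshold, best match first.
function matchImages(query: number[], candidates: Candidate[], threshold = 0.25) {
  return candidates
    .map((c) => ({ caption: c.caption, score: cosineSimilarity(query, c.embedding) }))
    .filter((m) => m.score >= threshold)
    .sort((x, y) => y.score - x.score);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const matches = matchImages([1, 0, 0], [
  { caption: "A blue and white logo", embedding: [0.9, 0.1, 0] },
  { caption: "Team photo at a retreat", embedding: [0, 1, 0] },
]);
console.log(matches.map((m) => m.caption));
```

With these toy vectors only the logo caption passes the threshold, since the team photo vector is orthogonal to the query.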

Display Position

Retrieved images appear above the AI's text response, making them immediately visible. Images are left-aligned and sized by count: a single image is shown larger, while multiple images appear in a grid.

Images Are NOT Cited

Unlike text documents, images are not cited in the AI's response sources. They are displayed visually but filtered from the LLM context to prevent the AI from citing filenames like "logo.png" in its response.
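
The filtering described above can be pictured as a split of the retrieval results into two lists: text for the LLM and image URLs for the UI. This is a minimal sketch that assumes an image chunk is identified by the presence of metadata.image_url; the type RetrievedChunk and the function splitForContext are hypothetical names.

```typescript
interface RetrievedChunk {
  content: string;
  metadata: { image_url?: string; [key: string]: unknown };
}

// Split retrieval results: text chunks go to the LLM, image URLs go to the UI.
function splitForContext(chunks: RetrievedChunk[]) {
  const llmContext: string[] = [];
  const sourceImages: string[] = [];
  for (const chunk of chunks) {
    const url = chunk.metadata.image_url;
    if (typeof url === "string" && url.length > 0) {
      sourceImages.push(url); // displayed visually, never shown to the LLM
    } else {
      llmContext.push(chunk.content);
    }
  }
  return { llmContext, sourceImages };
}

const split = splitForContext([
  { content: "Refunds are processed within 5 days.", metadata: {} },
  { content: "A blue and white logo.", metadata: { image_url: "https://example.com/logo.png" } },
]);
console.log(split);
```

Since the caption of an image chunk never reaches the prompt in this sketch, the model has nothing to cite for it.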

Voice Integration

Image RAG also works with the Voice Agent. Users can speak queries such as:

  • "Show me the ChatRAG logo": retrieves and displays the logo image
  • "What does the dashboard look like?": shows relevant dashboard screenshots

Managing Uploaded Images

View Uploaded Images

The Config UI shows a visual grid of all uploaded images with hover previews. Images are displayed in a 2-row horizontal scrolling layout.

Delete Images

Hover over any image to see the delete button. Clicking delete removes:

  • The document record from the documents table
  • Associated chunks from the document_chunks table
  • The file from Supabase Storage
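
The three removals can be sketched as one function over a small storage interface. Everything here is hypothetical: the ImageStore interface, its method names, and the deletion order are assumptions, and a real implementation would wrap the Supabase client instead of the in-memory stand-in used below.

```typescript
// Hypothetical storage interface; a real implementation would wrap the Supabase client.
interface ImageStore {
  deleteChunks(documentId: string): Promise<void>;
  deleteDocument(documentId: string): Promise<void>;
  deleteFile(path: string): Promise<void>;
}

async function deleteImage(store: ImageStore, documentId: string, storagePath: string): Promise<void> {
  // Chunks first, so no chunk is left pointing at a missing document.
  // This ordering is an assumption, not confirmed project behavior.
  await store.deleteChunks(documentId);
  await store.deleteDocument(documentId);
  await store.deleteFile(storagePath);
}

// In-memory stand-in that records the calls it receives.
const calls: string[] = [];
const fakeStore: ImageStore = {
  async deleteChunks(id) { calls.push(`chunks:${id}`); },
  async deleteDocument(id) { calls.push(`document:${id}`); },
  async deleteFile(path) { calls.push(`file:${path}`); },
};

const done = deleteImage(fakeStore, "doc-1", "uploads/logo.png").then(() => {
  console.log(calls);
});
```

Deleting the file last means a failure partway through leaves an orphaned file rather than a chunk whose image_url no longer resolves.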

Technical Requirements

Requirement        Details
OpenAI API Key     Required for GPT-4o Vision captioning
Supabase Storage   chat-images bucket must exist
Database Schema    document_chunks needs metadata JSONB column
Supported Formats  PNG, JPG, JPEG, WebP, GIF

Troubleshooting

Image retrieved but not displayed

  • Check browser console for [Stream] Received source_images event
  • Verify metadata.image_url exists in the chunk record
  • Ensure the image URL is publicly accessible
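
The second check above can be scripted against a chunk record. The sketch below is a hypothetical helper (checkImageChunk is not part of the project) that reports whether metadata.image_url is present and parses as an http(s) URL. It does not test public accessibility, which still needs a request from a logged-out browser or similar.

```typescript
// Returns a list of problems found; an empty list means the checks passed.
function checkImageChunk(chunk: { metadata?: { image_url?: unknown } }): string[] {
  const problems: string[] = [];
  const url = chunk.metadata?.image_url;
  if (typeof url !== "string" || url === "") {
    problems.push("metadata.image_url is missing");
    return problems;
  }
  try {
    const parsed = new URL(url);
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
      problems.push(`unexpected protocol ${parsed.protocol}`);
    }
  } catch {
    problems.push("metadata.image_url is not a valid URL");
  }
  return problems;
}

console.log(checkImageChunk({ metadata: { image_url: "https://example.com/logo.png" } }));
console.log(checkImageChunk({ metadata: {} }));
```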

Wrong image retrieved

  • Add more specific context descriptions when uploading
  • Use distinct keywords in captions (e.g., "logo" vs "team photo")
  • The image similarity threshold is 0.25, which is permissive, so loosely related images with low scores can still be retrieved

Image appears during stream but disappears

  • This indicates a frontend issue with message preservation
  • Check that source_images is preserved in [Manual onFinish] logic

Upload succeeds but image doesn't appear in grid

  • Wait a few seconds - processing happens in background
  • Click the "Refresh" button in the Config UI
  • Check server logs for [Upload] Image RAG processing completed

Key Implementation Files

File                                      Purpose
src/lib/document-processor.ts             processImageDocument() - Vision captioning & embedding
src/app/api/upload/route.ts               Triggers RAG processing for uploaded images
src/app/api/chat/route.ts                 Injects source_images into response stream
src/components/ui/source-images-grid.tsx  Renders retrieved images in chat
scripts/config-ui/index.html              Image upload UI in Config Dashboard