AI Models

ChatRAG supports 100+ AI models from multiple providers, including reasoning models and multi-modal generation, with flexible configuration.

Multiple Providers Supported

Access models from OpenAI, Anthropic, Google, Meta, and more through OpenRouter or direct API integrations.

Model Providers

OpenRouter (Recommended)

100+ Models

Unified API for accessing models from multiple providers with competitive pricing

OpenAI

GPT-4, GPT-4o, o1, o3

Anthropic

Claude 3.5, Claude Opus 4.1

Google

Gemini 2.5 Flash, Gemini Thinking

Direct Provider APIs

Connect directly to individual providers for specific features

OpenAI: Required for embeddings, optional for chat models
Anthropic: Direct Claude API access
Google: Gemini models

Pre-configured Models

ChatRAG comes with these models ready to use:

GPT-4.1 Mini

Fast, Cost-effective

Ideal for general chat and quick responses

Claude Sonnet 4.5

🧠 Reasoning, Default

Extended thinking, excellent for complex tasks

Gemini 2.5 Flash

Lightning

Ultra-fast responses, large context window

Llama 4 Maverick

🔓 Open Source

Open source, privacy-focused

Venice: Uncensored

🎁 Free, 🔓 Open Source

Uncensored model, free tier available

Reasoning / Thinking Models

Advanced models that use extended thinking for complex problem-solving:

OpenAI o1/o3 Series

Reasoning depth is controlled through effort levels (low, medium, high)

NEXT_PUBLIC_REASONING_ENABLED=true
NEXT_PUBLIC_DEFAULT_REASONING_EFFORT=medium

Claude 3.7+ Extended Thinking

Token-based reasoning (up to 32k reasoning tokens)

NEXT_PUBLIC_MAX_REASONING_TOKENS=8000
NEXT_PUBLIC_SHOW_REASONING_BY_DEFAULT=false

DeepSeek R1

Supports both methods (effort levels and token budgets)

Gemini Thinking

Token-based reasoning with configurable limits
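The two reasoning styles above (effort levels and token budgets) can be sketched as one helper that turns the environment variables into per-request options. This is a minimal sketch, not ChatRAG's actual implementation: the function and type names are hypothetical, the reasoningMethod values "effort", "tokens", and "both" are assumed (the model schema on this page only shows "none"), and the fallbacks of "medium" and 8000 simply mirror the example values shown above.

```typescript
// Hypothetical helper; only the NEXT_PUBLIC_* variable names come from this page.
type ReasoningEffort = "low" | "medium" | "high";
// Assumed values: the documented schema only shows "none".
type ReasoningMethod = "none" | "effort" | "tokens" | "both";
type Env = Record<string, string | undefined>;

interface ReasoningOptions {
  effort?: ReasoningEffort;
  maxTokens?: number;
}

// Interpretation of "up to 32k reasoning tokens"; the exact cap is an assumption.
const MAX_REASONING_TOKENS_CAP = 32000;

function parseEffort(value: string | undefined): ReasoningEffort {
  // Anything other than a known level falls back to "medium".
  return value === "low" || value === "high" ? value : "medium";
}

function parseTokenBudget(value: string | undefined, fallback = 8000): number {
  const parsed = parseInt(value ?? "", 10);
  if (isNaN(parsed) || parsed <= 0) return fallback;
  return Math.min(parsed, MAX_REASONING_TOKENS_CAP);
}

function buildReasoningOptions(method: ReasoningMethod, env: Env): ReasoningOptions | null {
  // Reasoning is skipped when disabled globally or unsupported by the model.
  if (env.NEXT_PUBLIC_REASONING_ENABLED !== "true" || method === "none") return null;

  const options: ReasoningOptions = {};
  if (method === "effort" || method === "both") {
    options.effort = parseEffort(env.NEXT_PUBLIC_DEFAULT_REASONING_EFFORT);
  }
  if (method === "tokens" || method === "both") {
    options.maxTokens = parseTokenBudget(env.NEXT_PUBLIC_MAX_REASONING_TOKENS);
  }
  return options;
}
```

Under these assumptions, an effort-based model would receive only an effort level, a token-based model only a token budget, and a dual-method model such as DeepSeek R1 both.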

When to Use Reasoning Models

Complex problem-solving and analysis
Mathematical or logical reasoning
Multi-step planning and strategy
Code debugging and optimization

Adding Models

Add new models through the Config UI or manually:

Via Config UI (Recommended)

Run npm run config to open the Config UI
Navigate to the Models section
Click "Fetch Models" to get the latest models from OpenRouter
Or add a model manually with its ID and display name
Save the configuration
Restart the dev server

Manual Configuration

Important: Sync 5 Locations

Models must be synchronized across 5 locations:

.env.local
scripts/init-env.js
src/lib/env.ts
scripts/config-server.js
scripts/config-ui/index.html (3 fallback arrays)
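One way to catch drift between these locations is to compare the model IDs found in each one. The sketch below assumes the IDs have already been extracted from every file (how to parse each file is not covered here); the function name and the shape of its input are hypothetical.

```typescript
// Hypothetical check: given the model IDs found in each location,
// report which IDs each location is missing relative to the others.
function findMissingModels(
  idsByLocation: Record<string, string[]>
): Record<string, string[]> {
  // Collect the union of all model IDs across locations.
  const allIds: string[] = [];
  Object.keys(idsByLocation).forEach((location) => {
    idsByLocation[location].forEach((id) => {
      if (allIds.indexOf(id) === -1) allIds.push(id);
    });
  });

  // For each location, list the IDs from the union that it lacks.
  const missing: Record<string, string[]> = {};
  Object.keys(idsByLocation).forEach((location) => {
    const absent = allIds.filter((id) => idsByLocation[location].indexOf(id) === -1);
    if (absent.length > 0) missing[location] = absent;
  });
  return missing;
}
```

An empty result means every location lists the same set of model IDs.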

Model schema:

{
  "id": "openai/gpt-4o",
  "displayName": "GPT-4o",
  "isFree": false,
  "isOpenSource": false,
  "supportsReasoning": false,
  "reasoningMethod": "none",
  "contextLength": 128000,
  "description": "Latest GPT-4 model"
}
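The schema above could be typed and checked before a model is added. This is a sketch under stated assumptions rather than ChatRAG's own validation: the interface and function names are hypothetical, and two of the rules (that IDs follow a "provider/model" pattern, and that a model without reasoning support uses reasoningMethod "none") are inferred from the examples on this page.

```typescript
// Field names and types mirror the documented model schema.
interface ModelConfig {
  id: string;
  displayName: string;
  isFree: boolean;
  isOpenSource: boolean;
  supportsReasoning: boolean;
  reasoningMethod: string;
  contextLength: number;
  description: string;
}

// Hypothetical validator: returns a list of problems, empty when the entry looks valid.
function validateModelConfig(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["model must be an object"];
  const record = value as Record<string, unknown>;
  const errors: string[] = [];

  const expected: Array<[keyof ModelConfig, "string" | "boolean" | "number"]> = [
    ["id", "string"],
    ["displayName", "string"],
    ["isFree", "boolean"],
    ["isOpenSource", "boolean"],
    ["supportsReasoning", "boolean"],
    ["reasoningMethod", "string"],
    ["contextLength", "number"],
    ["description", "string"],
  ];
  expected.forEach(([field, type]) => {
    if (typeof record[field] !== type) errors.push(`${field} must be a ${type}`);
  });

  // Assumption: IDs use the "provider/model" form seen in the examples.
  if (typeof record.id === "string" && record.id.indexOf("/") === -1) {
    errors.push('id should look like "provider/model"');
  }
  if (typeof record.contextLength === "number" && record.contextLength <= 0) {
    errors.push("contextLength must be positive");
  }
  // Assumption: non-reasoning models carry reasoningMethod "none".
  if (record.supportsReasoning === false && record.reasoningMethod !== "none") {
    errors.push('reasoningMethod should be "none" when supportsReasoning is false');
  }
  return errors;
}
```

Running the validator on each entry before saving makes a mistyped field show up as a readable message instead of a silent misconfiguration.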

Model Selection Guide

For General Chat

Use GPT-4o-mini or Claude Sonnet 4.5 for balanced performance and cost

For RAG Applications

Use Claude Sonnet 4.5 for best context understanding and citation accuracy

For WhatsApp

Use GPT-4o-mini or Gemini Flash for fast, concise mobile responses

For Complex Tasks

Use o1, o3, or Claude with extended thinking for reasoning

For Cost Optimization

Use free tier models like Venice Uncensored or Llama 4 Maverick
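The guide above can also be kept as a small lookup table, for example to drive a model picker. The use-case keys and function name below are hypothetical; the model names are copied from the recommendations as written.

```typescript
// Hypothetical use-case keys; model names are taken from the guide above.
type UseCase = "general-chat" | "rag" | "whatsapp" | "complex-tasks" | "cost-optimization";

const RECOMMENDED_MODELS: Record<UseCase, string[]> = {
  "general-chat": ["GPT-4o-mini", "Claude Sonnet 4.5"],
  rag: ["Claude Sonnet 4.5"],
  whatsapp: ["GPT-4o-mini", "Gemini Flash"],
  "complex-tasks": ["o1", "o3", "Claude with extended thinking"],
  "cost-optimization": ["Venice Uncensored", "Llama 4 Maverick"],
};

function recommendModels(useCase: UseCase): string[] {
  // Return a copy so callers cannot mutate the shared table.
  return RECOMMENDED_MODELS[useCase].slice();
}
```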

Model Configuration

Default Models

# Set default model for chat
NEXT_PUBLIC_DEFAULT_MODEL=anthropic/claude-sonnet-4.5

# WhatsApp specific
WHATSAPP_DEFAULT_MODEL=openai/gpt-4o-mini

# Embed widget
NEXT_PUBLIC_EMBED_MODEL=openai/gpt-4o-mini
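A minimal sketch of how these three variables might be resolved per channel. The channel names, the function, and the fallback to the chat default when a channel-specific variable is unset are all assumptions for illustration, not documented ChatRAG behavior.

```typescript
// Hypothetical channel names; the variable names come from the settings above.
type Channel = "chat" | "whatsapp" | "embed";
type Env = Record<string, string | undefined>;

const CHANNEL_ENV_VARS: Record<Channel, string> = {
  chat: "NEXT_PUBLIC_DEFAULT_MODEL",
  whatsapp: "WHATSAPP_DEFAULT_MODEL",
  embed: "NEXT_PUBLIC_EMBED_MODEL",
};

function resolveDefaultModel(channel: Channel, env: Env): string | undefined {
  const specific = env[CHANNEL_ENV_VARS[channel]]?.trim();
  if (specific) return specific;
  // Assumed fallback: reuse the chat default when no channel-specific model is set.
  return env.NEXT_PUBLIC_DEFAULT_MODEL?.trim() || undefined;
}
```

In an application, `env` would typically be `process.env`; passing it in as a parameter keeps the helper easy to test.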

Temperature & Parameters

Temperature and related sampling parameters control how creative or random a model's responses are. They are typically set per request in the UI.

Model Icons in UI

Models display these indicators:

🧠 = Supports reasoning/thinking (supportsReasoning: true)
🎁 = Free tier available (isFree: true)
🔓 = Open source model (isOpenSource: true)
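The mapping from schema flags to indicators can be sketched as follows. The function name and the space-separated output format are hypothetical; the flag names and icons are the ones listed above.

```typescript
// Only the three flags that drive the indicators are needed here.
interface ModelFlags {
  supportsReasoning: boolean;
  isFree: boolean;
  isOpenSource: boolean;
}

// Hypothetical helper: builds the indicator string for a model.
function modelBadges(model: ModelFlags): string {
  const badges: string[] = [];
  if (model.supportsReasoning) badges.push("🧠");
  if (model.isFree) badges.push("🎁");
  if (model.isOpenSource) badges.push("🔓");
  return badges.join(" ");
}
```

A model with none of the flags set yields an empty string, so it would display without indicators.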
