AI Models

ChatRAG supports 100+ AI models from multiple providers with reasoning capabilities, multi-modal generation, and flexible configuration.

Model Providers

OpenRouter (Recommended)

100+ Models

Unified API for accessing models from multiple providers with competitive pricing

OpenAI

GPT-4, GPT-4o, o1, o3

Anthropic

Claude 3.5, Claude Opus 4.1

Google

Gemini 2.5 Flash, Gemini Thinking

Meta

Llama 4 Maverick

Direct Provider APIs

Connect directly to individual providers when you need provider-specific features:

  • OpenAI: Required for embeddings, optional for chat models
  • Anthropic: Direct Claude API access
  • Google: Gemini models

Pre-configured Models

ChatRAG comes with these models ready to use:

GPT-4.1 Mini

Fast · Cost-effective

Ideal for general chat and quick responses

Claude Sonnet 4.5

🧠 Reasoning · Default

Extended thinking, excellent for complex tasks

Gemini 2.5 Flash

Lightning

Ultra-fast responses, large context window

Llama 4 Maverick

🔓 Open Source

Open source, privacy-focused

Venice: Uncensored

🎁 Free · 🔓 Open

Uncensored model, free tier available

Reasoning / Thinking Models

Advanced models that use extended thinking for complex problem-solving:

OpenAI o1/o3 Series

Reasoning controlled through effort levels (low, medium, high)

NEXT_PUBLIC_REASONING_ENABLED=true
NEXT_PUBLIC_DEFAULT_REASONING_EFFORT=medium

Claude 3.7+ Extended Thinking

Token-based reasoning (up to 32k reasoning tokens)

NEXT_PUBLIC_MAX_REASONING_TOKENS=8000
NEXT_PUBLIC_SHOW_REASONING_BY_DEFAULT=false

DeepSeek R1

Dual method (effort + tokens)

Gemini Thinking

Token-based reasoning with configurable limits
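
The reasoning variables above can be read into one typed settings object. The following is a minimal sketch, not ChatRAG's actual implementation: the helper name `readReasoningSettings`, the fallback values (taken from the example values above), and the clamp to 32,000 tokens (taken from the "up to 32k" figure) are assumptions for illustration.

```typescript
// Hypothetical helper; not part of ChatRAG's codebase.
type ReasoningEffort = "low" | "medium" | "high";

interface ReasoningSettings {
  enabled: boolean;
  defaultEffort: ReasoningEffort;
  maxReasoningTokens: number;
  showByDefault: boolean;
}

type EnvVars = Record<string, string | undefined>;

// Assumed ceiling, based on the "up to 32k reasoning tokens" note above.
const TOKEN_CEILING = 32000;
// Assumed fallback, matching the example value shown above.
const FALLBACK_TOKENS = 8000;

function isEffort(v: string | undefined): v is ReasoningEffort {
  return v === "low" || v === "medium" || v === "high";
}

function readReasoningSettings(env: EnvVars): ReasoningSettings {
  const effort = env.NEXT_PUBLIC_DEFAULT_REASONING_EFFORT;
  const parsed = parseInt(env.NEXT_PUBLIC_MAX_REASONING_TOKENS ?? "", 10);
  const tokens = isNaN(parsed) ? FALLBACK_TOKENS : parsed;

  return {
    // Only the literal string "true" switches a flag on.
    enabled: env.NEXT_PUBLIC_REASONING_ENABLED === "true",
    // Unknown or missing effort values fall back to "medium".
    defaultEffort: isEffort(effort) ? effort : "medium",
    // Keep the token budget inside [0, TOKEN_CEILING].
    maxReasoningTokens: Math.min(Math.max(tokens, 0), TOKEN_CEILING),
    showByDefault: env.NEXT_PUBLIC_SHOW_REASONING_BY_DEFAULT === "true",
  };
}
```

Taking the environment as a parameter keeps the function easy to test. Effort-based models would read `defaultEffort` and token-based models would read `maxReasoningTokens`.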

Adding Models

Add new models through the Config UI or manually:

Via Config UI (Recommended)

  1. Run npm run config to open the Config UI
  2. Navigate to the Models section
  3. Click "Fetch Models" to get the latest list from OpenRouter
  4. Or add a model manually with its ID and display name
  5. Save the configuration
  6. Restart the dev server
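
How "Fetch Models" converts OpenRouter's listing into ChatRAG entries is not documented here. As a rough illustration only, the sketch below maps one entry from OpenRouter's public model list to the model schema shown under Manual Configuration. The `OpenRouterModel` fields are the subset of the listing this sketch assumes (`id`, `name`, `description`, `context_length`, `pricing`). The function name and the conservative defaults for the open-source and reasoning flags are hypothetical.

```typescript
// Assumed subset of one entry in OpenRouter's model list.
interface OpenRouterModel {
  id: string;
  name: string;
  description?: string;
  context_length?: number;
  pricing?: { prompt: string; completion: string };
}

// Mirrors the model schema shown under Manual Configuration.
interface ChatModelEntry {
  id: string;
  displayName: string;
  isFree: boolean;
  isOpenSource: boolean;
  supportsReasoning: boolean;
  reasoningMethod: string;
  contextLength: number;
  description: string;
}

// Hypothetical mapper; not ChatRAG's actual import logic.
function toChatModelEntry(m: OpenRouterModel): ChatModelEntry {
  // Treat a model as free only when both listed prices parse to zero.
  const isFree =
    m.pricing !== undefined &&
    parseFloat(m.pricing.prompt) === 0 &&
    parseFloat(m.pricing.completion) === 0;

  return {
    id: m.id,
    displayName: m.name,
    isFree,
    // The fields read here do not say whether a model is open source or
    // supports reasoning, so these stay at conservative defaults for review.
    isOpenSource: false,
    supportsReasoning: false,
    reasoningMethod: "none",
    contextLength: m.context_length ?? 0,
    description: m.description ?? "",
  };
}
```

Entries produced this way would still need a manual pass to set the open-source and reasoning flags.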

Manual Configuration

Model schema:

{
  "id": "openai/gpt-4o",
  "displayName": "GPT-4o",
  "isFree": false,
  "isOpenSource": false,
  "supportsReasoning": false,
  "reasoningMethod": "none",
  "contextLength": 128000,
  "description": "Latest GPT-4 model"
}
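
When editing model entries by hand, a small check can catch typos before the dev server restarts. The sketch below is one possible validator for the schema above, not code shipped with ChatRAG. Only "none" is confirmed as a `reasoningMethod` value by the example. The other values ("effort", "tokens", "both") are guesses based on the reasoning methods described earlier, and the rule tying `supportsReasoning` to `reasoningMethod` is likewise an assumption.

```typescript
// Assumed value set: only "none" appears in the documented example.
type ReasoningMethod = "none" | "effort" | "tokens" | "both";
const REASONING_METHODS: readonly string[] = ["none", "effort", "tokens", "both"];

interface ModelConfig {
  id: string;
  displayName: string;
  isFree: boolean;
  isOpenSource: boolean;
  supportsReasoning: boolean;
  reasoningMethod: ReasoningMethod;
  contextLength: number;
  description: string;
}

// Returns a list of problems; an empty list means the entry looks valid.
function validateModelConfig(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["model must be an object"];
  }
  const m = value as Record<string, unknown>;
  const errors: string[] = [];

  const expectType = (key: string, expected: "string" | "boolean" | "number"): void => {
    if (typeof m[key] !== expected) errors.push(`${key} must be a ${expected}`);
  };
  expectType("id", "string");
  expectType("displayName", "string");
  expectType("isFree", "boolean");
  expectType("isOpenSource", "boolean");
  expectType("supportsReasoning", "boolean");
  expectType("reasoningMethod", "string");
  expectType("contextLength", "number");
  expectType("description", "string");

  const ctx = m.contextLength;
  if (typeof ctx === "number" && (ctx <= 0 || Math.floor(ctx) !== ctx)) {
    errors.push("contextLength must be a positive integer");
  }
  const method = m.reasoningMethod;
  if (typeof method === "string" && REASONING_METHODS.indexOf(method) === -1) {
    errors.push("reasoningMethod is not a recognized value");
  }
  // Assumed consistency rule: a non-reasoning model should use "none".
  if (m.supportsReasoning === false && typeof method === "string" && method !== "none") {
    errors.push('reasoningMethod should be "none" when supportsReasoning is false');
  }
  return errors;
}

// Type guard built on the validator.
function isModelConfig(value: unknown): value is ModelConfig {
  return validateModelConfig(value).length === 0;
}
```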

Model Selection Guide

For General Chat

Use GPT-4o-mini or Claude Sonnet 4.5 for balanced performance and cost

For RAG Applications

Use Claude Sonnet 4.5 for best context understanding and citation accuracy

For WhatsApp

Use GPT-4o-mini or Gemini Flash for fast, concise mobile responses

For Complex Tasks

Use o1, o3, or Claude with extended thinking for reasoning

For Cost Optimization

Use free-tier models such as Venice Uncensored or Llama 4 Maverick

Model Configuration

Default Models

# Set default model for chat
NEXT_PUBLIC_DEFAULT_MODEL=anthropic/claude-sonnet-4.5

# WhatsApp specific
WHATSAPP_DEFAULT_MODEL=openai/gpt-4o-mini

# Embed widget
NEXT_PUBLIC_EMBED_MODEL=openai/gpt-4o-mini
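
One way to consume these three variables is a single resolver keyed by channel. This is a sketch under assumptions: the `Channel` names, the `resolveDefaultModel` function, and the fallback order (channel-specific value, then the chat default, then a hard-coded model) are hypothetical and may differ from what ChatRAG does internally.

```typescript
type Channel = "chat" | "whatsapp" | "embed";

// Variable names come from the configuration above.
const CHANNEL_VARIABLE: Record<Channel, string> = {
  chat: "NEXT_PUBLIC_DEFAULT_MODEL",
  whatsapp: "WHATSAPP_DEFAULT_MODEL",
  embed: "NEXT_PUBLIC_EMBED_MODEL",
};

// Assumed last-resort fallback, matching the chat default shown above.
const FALLBACK_MODEL = "anthropic/claude-sonnet-4.5";

// Hypothetical resolver: channel-specific value, then chat default, then fallback.
function resolveDefaultModel(
  channel: Channel,
  env: Record<string, string | undefined>,
): string {
  const specific = env[CHANNEL_VARIABLE[channel]]?.trim();
  if (specific) return specific;

  const chatDefault = env.NEXT_PUBLIC_DEFAULT_MODEL?.trim();
  return chatDefault || FALLBACK_MODEL;
}
```

On the server, `process.env` can be passed as the second argument. In browser code, Next.js only inlines `NEXT_PUBLIC_` variables that are referenced by their full static name, so the object would need to be built explicitly, for example `{ NEXT_PUBLIC_DEFAULT_MODEL: process.env.NEXT_PUBLIC_DEFAULT_MODEL }`.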

Temperature & Parameters

Temperature controls how creative or random a model's output is. It is typically set per request in the UI.
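
To make "per request" concrete, the sketch below builds a request body in the OpenAI-style chat completions format that OpenRouter accepts, with `temperature` as one of its fields. The function name, the default of 0.7, and the clamp to the 0 to 2 range used by OpenAI-compatible APIs are assumptions. Some providers accept a narrower range, and ChatRAG's own request code may differ.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  temperature: number;
}

// Hypothetical builder; the default temperature is an illustrative choice.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  temperature: number = 0.7,
): ChatRequestBody {
  if (isNaN(temperature)) {
    throw new Error("temperature must be a number");
  }
  // Lower values give more deterministic output, higher values more varied
  // output. Clamp to the assumed 0 to 2 range.
  const clamped = Math.min(Math.max(temperature, 0), 2);
  return { model, messages, temperature: clamped };
}
```

A low value such as 0.2 would suit citation-heavy RAG answers, while a higher value suits brainstorming. These figures are rules of thumb, not ChatRAG defaults.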