Codapult ships with a production-ready AI layer built on the Vercel AI SDK, supporting OpenAI and Anthropic models, streaming responses, tool use, organization quotas, conversation memory, and a full RAG pipeline.
Architecture
```
src/lib/ai/
├── models.ts        # Client-safe model options (id, label, provider)
├── providers.ts     # getModel() — resolves modelId → LanguageModel
├── embeddings.ts    # Embedding adapter (OpenAI / Ollama)
├── vector-store.ts  # Vector store adapter (SQLite / memory)
├── rag.ts           # RAG pipeline (index → chunk → embed → store → retrieve)
├── conversations.ts # Conversation/message CRUD
└── chunker.ts       # Text chunking with overlap
```
Chat Endpoint
POST /api/chat accepts a JSON body with a messages array and an optional model selector:
```json
{
  "messages": [{ "role": "user", "content": "How do I deploy?" }],
  "modelId": "gpt-4o-mini"
}
```
The endpoint follows the standard API route pattern: auth check → rate limiting (30 requests per 60 seconds per user) → org quota check → Zod validation → RAG context injection → streaming response.
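The rate-limiting step can be pictured as a sliding window over recent request timestamps. A minimal in-memory sketch of the 30-requests-per-60-seconds rule (illustrative only — the actual limiter in Codapult may use a shared store and different function names):

```typescript
// Sliding-window rate limiter: 30 requests per 60 seconds per user.
// Illustrative sketch; a production limiter would typically use a shared
// store (e.g. Redis) rather than process memory.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 30;

const hits = new Map<string, number[]>(); // userId -> request timestamps

function allowRequest(userId: string, now: number = Date.now()): boolean {
  // Keep only timestamps still inside the window.
  const recent = (hits.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(userId, recent);
    return false; // over the limit -> the route responds with 429
  }
  recent.push(now);
  hits.set(userId, recent);
  return true;
}
```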
Available Models
| Model ID | Label | Provider |
| -------------------------- | --------------- | --------- |
| gpt-4o-mini | GPT-4o Mini | OpenAI |
| gpt-4o | GPT-4o | OpenAI |
| claude-sonnet-4-20250514 | Claude Sonnet 4 | Anthropic |
| claude-haiku-4-20250514 | Claude Haiku 4 | Anthropic |
Models are defined in src/lib/ai/models.ts. To add a new model, add an entry there and — if it's a new provider — add a case in src/lib/ai/providers.ts.
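The registry described above can be sketched as a plain array plus a lookup with a default fallback. Field names follow the description (id, label, provider); the exact shapes in models.ts and providers.ts may differ:

```typescript
// Illustrative shape of the model registry and resolution logic.
type Provider = "openai" | "anthropic";

interface ModelOption {
  id: string;
  label: string;
  provider: Provider;
}

const models: ModelOption[] = [
  { id: "gpt-4o-mini", label: "GPT-4o Mini", provider: "openai" },
  { id: "gpt-4o", label: "GPT-4o", provider: "openai" },
  { id: "claude-sonnet-4-20250514", label: "Claude Sonnet 4", provider: "anthropic" },
  { id: "claude-haiku-4-20250514", label: "Claude Haiku 4", provider: "anthropic" },
];

// Resolve a requested modelId, falling back to the default when unknown.
function resolveModel(modelId: string | undefined, defaultModel = "gpt-4o-mini"): ModelOption {
  return models.find((m) => m.id === modelId) ?? models.find((m) => m.id === defaultModel)!;
}
```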
Configuration
All AI settings live in src/config/app.ts under appConfig.ai:
```typescript
ai: {
  defaultModel: 'gpt-4o-mini',
  systemPrompt: 'You are a helpful AI assistant. Be concise, accurate, and helpful.',
  ragEnabled: true,
  ragMaxChunks: 3,
  ragMinScore: 0.4,
  allowedModels: [], // empty = all models from models.ts
}
```
| Setting | Description |
| --------------- | --------------------------------------------------------------------------- |
| defaultModel | Model used when the user doesn't pick one (must match an ID in models.ts) |
| systemPrompt | Prepended to every conversation |
| ragEnabled | Toggle RAG context injection in chat |
| ragMaxChunks | Maximum number of knowledge base chunks injected into the prompt |
| ragMinScore | Minimum cosine similarity score (0–1) for RAG results |
| allowedModels | Restrict the model selector; empty array enables all |
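ragMinScore and ragMaxChunks combine as a filter-then-truncate step: drop results below the similarity threshold, then keep the top N. A sketch under those assumptions (names are illustrative, not the actual rag.ts API):

```typescript
// Combine ragMinScore and ragMaxChunks: discard low-similarity chunks,
// then keep the highest-scoring N for prompt injection.
interface ScoredChunk {
  text: string;
  score: number; // cosine similarity, 0-1
}

function selectRagChunks(chunks: ScoredChunk[], maxChunks = 3, minScore = 0.4): ScoredChunk[] {
  return chunks
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);
}
```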
Tool Use
Chat supports function calling via the Vercel AI SDK. Tools are defined in /api/chat/route.ts:
```typescript
import { z } from 'zod';
import type { Tool } from 'ai';

const chatTools: Record<string, Tool> = {
  getWeather: {
    description: 'Get current weather for a city',
    parameters: z.object({ city: z.string() }),
    execute: async ({ city }) => {
      // fetchWeather is an app-level helper, not part of the AI SDK
      const data = await fetchWeather(city);
      return { temperature: data.temp, condition: data.condition };
    },
  },
};
```
Multi-step tool invocations are enabled with maxSteps: 3.
Organization Quotas
AI usage is tracked per organization. Each plan defines a monthly credit allowance for the aiChat resource. The quota is checked before every chat request via checkOrgQuota(). Credits reset monthly via a background cron job.
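The quota gate amounts to comparing this month's usage against the plan's allowance. A hedged sketch — the real checkOrgQuota() lives in the app's billing layer and its signature may differ:

```typescript
// Illustrative quota check: does the org have enough aiChat credits left
// this billing month to cover the request's cost?
interface OrgUsage {
  used: number;      // credits consumed this billing month
  allowance: number; // monthly credit allowance for the plan
}

function checkOrgQuota(usage: OrgUsage, cost = 1): { ok: boolean; remaining: number } {
  const remaining = usage.allowance - usage.used;
  return { ok: remaining >= cost, remaining };
}
```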
Chat Memory
Conversation history is persisted in the database via src/lib/ai/conversations.ts:
| Endpoint | Method | Description |
| ------------------------------ | ------ | -------------------------------- |
| /api/chat/conversations | GET | List user conversations |
| /api/chat/conversations | POST | Create a new conversation |
| /api/chat/conversations/[id] | GET | Get a conversation with messages |
| /api/chat/conversations/[id] | DELETE | Delete a conversation |
The Chat UI component (src/components/ai/ChatUI) connects to these endpoints and renders a full chat interface with model selection, conversation switching, and streaming responses.
RAG Pipeline
The RAG (Retrieval-Augmented Generation) pipeline lets the AI chat reference your domain-specific content — blog posts, help docs, feature requests, or any custom text.
How It Works
- Index — content is chunked (800 chars, 150 overlap), embedded, and stored in the vector store
- Retrieve — user queries are embedded and matched against stored vectors by cosine similarity
- Augment — matching chunks are injected into the system prompt with source citations
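The chunking step above (800 chars, 150 overlap) can be sketched as a sliding window. A simplified version of what chunker.ts does — the real chunker may additionally break on sentence or paragraph boundaries:

```typescript
// Fixed-size chunking with overlap: each chunk is up to 800 characters and
// shares its first 150 characters with the tail of the previous chunk.
function chunkText(text: string, chunkSize = 800, overlap = 150): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance 650 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```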
Indexing Content
Use the indexDocument function or the admin API:
```typescript
import { indexDocument } from '@/lib/ai/rag';

await indexDocument({
  sourceType: 'help',
  sourceId: 'getting-started',
  title: 'Getting Started Guide',
  content: markdownContent,
});
```
For large content, use the rag-index background job:
```typescript
import { enqueue } from '@/lib/jobs';

await enqueue('rag-index', {
  sourceType: 'blog',
  sourceId: 'post-123',
  title: 'My Blog Post',
  content: markdownContent,
});
```
Admin Indexing API
POST /api/ai/index supports three actions:
| Action | Description |
| -------- | ---------------------------------------------------------------------- |
| index | Index a document (sourceType, sourceId, title, content) |
| search | Search the vector store (query, optional sourceTypes, limit, minScore) |
| delete | Delete indexed content (sourceType, optional sourceId) |
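For example, a search request body might look like this (field names follow the table above; the exact schema is defined by the route's Zod validator and may differ):

```json
{
  "action": "search",
  "query": "how do I deploy?",
  "sourceTypes": ["help", "blog"],
  "limit": 5,
  "minScore": 0.4
}
```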
Embedding Providers
Embeddings use the adapter pattern, switched via the EMBEDDING_PROVIDER env var:
| Provider | Env Value | Requirements |
| -------- | ------------------ | ------------------------------------------- |
| OpenAI | openai (default) | OPENAI_API_KEY |
| Ollama | ollama | OLLAMA_BASE_URL, OLLAMA_EMBEDDING_MODEL |
Ollama enables fully self-hosted embeddings — no external API calls. The default Ollama model is nomic-embed-text.
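The adapter switch can be sketched as a small factory keyed on the env var. Interface and function names here are illustrative (the OpenAI embedding model name below is an assumption, not confirmed by this doc); see src/lib/ai/embeddings.ts for the real implementation:

```typescript
// Adapter-pattern sketch: pick an embedding backend from EMBEDDING_PROVIDER.
interface EmbeddingAdapter {
  provider: "openai" | "ollama";
  model: string;
}

function createEmbeddingAdapter(env: Record<string, string | undefined>): EmbeddingAdapter {
  const provider = env.EMBEDDING_PROVIDER ?? "openai"; // openai is the default
  if (provider === "ollama") {
    // Self-hosted path: model defaults to nomic-embed-text.
    return { provider: "ollama", model: env.OLLAMA_EMBEDDING_MODEL ?? "nomic-embed-text" };
  }
  // "text-embedding-3-small" is an assumed default for illustration only.
  return { provider: "openai", model: "text-embedding-3-small" };
}
```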
Vector Store
Vector storage uses the adapter pattern, switched via VECTOR_STORE_PROVIDER:
| Store | Env Value | Description |
| ------ | ------------------ | --------------------------------------- |
| SQLite | sqlite (default) | Persisted in Turso alongside app data |
| Memory | memory | In-memory store for development/testing |
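The memory store amounts to a cosine-similarity scan over an array; the SQLite adapter persists the same data in Turso. A minimal sketch (names are illustrative, not the vector-store.ts API):

```typescript
// Minimal in-memory vector search: score every stored vector against the
// query by cosine similarity, filter by minScore, return the top matches.
interface StoredVector {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function search(store: StoredVector[], query: number[], limit = 3, minScore = 0.4) {
  return store
    .map((v) => ({ id: v.id, score: cosineSimilarity(v.embedding, query) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```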
Source Types
Indexed content is categorized by source type:
| Type | Description |
| ----------------- | ------------------------------------ |
| blog | Blog posts |
| help | Help center / documentation articles |
| feature_request | Feature request descriptions |
| custom | Any custom content |
Environment Variables
| Variable | Default | Description |
| ------------------------ | ------------------------ | ------------------------------------------------- |
| OPENAI_API_KEY | — | Required for OpenAI models and default embeddings |
| ANTHROPIC_API_KEY | — | Required for Anthropic models |
| EMBEDDING_PROVIDER | openai | Embedding backend (openai or ollama) |
| VECTOR_STORE_PROVIDER | sqlite | Vector storage backend (sqlite or memory) |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | nomic-embed-text | Ollama model name for embeddings |
Removing the Module
AI Chat and the RAG Pipeline are separate removable modules. Use the setup wizard (npx @codapult/cli setup) to strip either or both. See docs/MODULES.md for manual removal steps.