WordPress Development — RAG-Powered AI Chat

AI answers grounded in your content.

We build RAG-powered AI chat for WordPress sites. Visitors get accurate answers drawn from your actual content — with inline source citations. No hallucinations, no generic AI responses, no made-up links.

No Hallucinations | Source Citations | WooCommerce Account Tools | OpenAI · Claude · Gemini · Grok
RAG-Grounded Answers
Inline Source Citations
WooCommerce Account Tools
SSE Token Streaming
Native or Pinecone Vector Store
Prompt Injection Protection

What We Build

A complete AI chat solution, custom-built for your WordPress site

We build a full RAG pipeline that indexes your content, retrieves the most relevant context per question, and delivers grounded answers — complete with source citations, WooCommerce account tools, and a streaming chat widget.

Semantic Knowledge Base

Your posts, pages, WooCommerce products, PDFs, and audio files are indexed into a searchable vector database. The index syncs automatically whenever you publish or edit content.

Your Choice of AI Provider

We integrate the AI model that fits your stack and budget — OpenAI, Anthropic (Claude), Google Gemini, xAI Grok, or a self-hosted endpoint. Chat and embedding providers are independent.

Source Citations

Every site-specific fact gets a numbered inline reference. Only sources the AI actually cited appear in the response — so visitors can verify each claim against your real content.

WooCommerce Account Tools

Logged-in customers can query their own orders, subscriptions, and memberships directly in the chat. Identity is enforced server-side — the AI cannot be tricked into exposing another customer’s data.

Streaming Chat Widget

A floating launcher or page-embedded chat panel. Token-by-token streaming via Server-Sent Events for a responsive feel, with automatic silent fallback for buffered hosting environments.

PDF & Audio Indexing

Documentation PDFs, scanned brochures, and audio recordings are all indexable. Native text extraction covers most PDFs; AI-powered OCR and audio transcription handle the rest.

Admin Chat Logs

Every conversation is logged with outcome classification (answered / no results / out of scope / misuse) and token usage. Configurable retention with automatic purge keeps data lean.

Extensible & Custom-fit

The agent is built on a fully extensible architecture. Custom personas, additional tools, vector store choices, and domain-specific topic restrictions can all be wired in to fit your site exactly.

How It Works

A knowledge base built from your content — not the internet

When you publish or update content, an indexing pipeline extracts clean text, splits it into context-aware chunks, adds a category breadcrumb header for richer embeddings, and stores the resulting vectors. When a visitor asks a question, the query is embedded and matched against your indexed content — only the most relevant chunks reach the AI, each cited with a source number.

  • Paragraph-boundary chunking with configurable size and overlap
  • Contextual embedding header: each chunk carries category breadcrumb + title in its vector input
  • Change detection skips unchanged content — no redundant API calls
  • Optional recency boost with configurable half-life for time-sensitive content
  • If no relevant content is found, no AI call is made — a clear response is returned immediately
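
To make the chunking and context-header steps above concrete, here is a minimal Python sketch. It is an illustration only: the production pipeline runs inside WordPress, and the function names, the character-based size limit, and the " > " breadcrumb separator are all assumptions rather than the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str         # chunk body that is later shown to the model
    embed_input: str  # header + body: the text that actually gets embedded


def chunk_paragraphs(body: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    """Group whole paragraphs into chunks that aim to stay under max_chars.

    The last `overlap` paragraphs of a chunk are repeated at the start of the
    next one. A single paragraph longer than max_chars is kept whole.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def with_context_header(chunk: str, breadcrumb: list[str], title: str) -> Chunk:
    """Prepend category breadcrumb + title to the text sent for embedding."""
    header = " > ".join(breadcrumb + [title])
    return Chunk(text=chunk, embed_input=f"{header}\n\n{chunk}")
```

Splitting on paragraph boundaries keeps sentences intact, and the header gives short chunks enough context to embed meaningfully on their own.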
Chunking | Embeddings | Vector Search | Similarity Threshold | Recency Boost | Background Indexing
rag-pipeline

INDEXING (on publish / update, runs in background)
1. extract // title + content + meta
2. chunk // paragraph-aware, overlapping
3. embed // + context header prepended
4. upsert // native DB or Pinecone

ANSWERING (per visitor message)
embed query → vector search → threshold → grounded prompt → LLM → cite → stream
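
The answering path can be sketched the same way. The Python below is a hedged illustration, not the shipped code: the threshold value, the additive form of the recency boost, the `boost` weight, and the chunk dictionary layout are assumptions. It shows cosine similarity with a cutoff, an exponential half-life decay, and the short-circuit that skips the AI call when nothing relevant is found.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_factor(age_days: float, half_life_days: float) -> float:
    """1.0 for brand-new content, 0.5 after one half-life, 0.25 after two."""
    return 0.5 ** (age_days / half_life_days)


def retrieve(query_vec, chunks, threshold=0.35, top_k=5, half_life_days=None, boost=0.1):
    """Keep chunks above the similarity threshold, optionally favouring recent ones."""
    scored = []
    for chunk in chunks:
        sim = cosine(query_vec, chunk["vector"])
        if sim < threshold:
            continue  # the threshold applies to raw similarity, before any boost
        score = sim
        if half_life_days:
            score += boost * recency_factor(chunk["age_days"], half_life_days)
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def answer(query_vec, chunks, call_llm):
    hits = retrieve(query_vec, chunks)
    if not hits:
        # No relevant content: respond immediately without calling the model.
        return "I could not find anything about that on this site."
    return call_llm(hits)
```

Applying the threshold before the boost means fresh but irrelevant content can never be pulled into the prompt by recency alone.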

ai-provider-config

AI Provider Configuration

Chat Provider: Anthropic (Claude)
  Options: OpenAI (GPT-4o), Anthropic (Claude), Google Gemini, xAI Grok, Custom / Self-hosted
Embedding Provider: OpenAI
Model: claude-sonnet-4-6

Chat and embedding providers are configured independently

Multi-Provider AI

Works with the AI provider you already use or prefer

We integrate whichever AI model fits your requirements. Chat and embedding providers are configured independently, so you can combine them for the best balance of cost, performance, and data-residency requirements. Self-hosted models are fully supported for zero data egress.

  • Chat: OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude), Google Gemini, xAI Grok
  • Self-hosted / custom OpenAI-compatible endpoints: Ollama, LM Studio, Groq, Mistral
  • Embeddings: OpenAI text-embedding-3 with dimension truncation, Gemini, or custom
  • Each provider’s model, temperature, and max tokens are configurable independently
  • Fully self-hosted option: no visitor data or content ever leaves your server
OpenAI | Anthropic Claude | Gemini | xAI Grok | Custom Endpoint | Self-hosted
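
For readers curious what "dimension truncation" means in practice, one common approach is to keep the first N components of an embedding and rescale the result to unit length so cosine similarity still behaves. The Python sketch below assumes that approach; the function name is hypothetical, and the actual integration may instead ask the provider for shorter vectors directly.

```python
import math


def truncate_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and renormalise to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

Shorter vectors reduce storage and search cost in the vector store, usually at a small cost in retrieval quality.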

WooCommerce Integration

Logged-in customers can ask about their own account — safely

Beyond content retrieval, the AI agent can call WooCommerce on behalf of the logged-in customer. They ask in plain language; the agent fetches the data and composes a clear answer. Customer identity is enforced server-side — the model cannot be prompted to access another customer’s data.

  • Orders: recent order list and order-number lookup with status, total, and line items
  • Subscriptions: plan status, included products, and next payment date
  • Memberships: plan name, status, start date, and expiry date
  • Each tool requires the visitor to be logged in — unavailable to guests
  • Agentic loop: up to three rounds of tool calls before a final natural-language answer
WooCommerce | Subscriptions | Memberships | Tool Use | Agentic Loop | Server-side Auth
woo-account-tools

Chat session — logged in as customer

Customer: What’s the status of my latest order?
Tool call: get_my_orders
Agent: Your most recent order is #1042 (placed 2025-06-14), currently Processing. It contains 2× Widget Pro, total $89.00.

Identity comes from the WordPress session, not the chat message
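
A minimal Python sketch of that idea follows. It is an assumption-laden illustration: the in-memory `ORDERS` table stands in for WooCommerce, `model_step` stands in for the AI provider call, and every name apart from `get_my_orders` is hypothetical. The customer ID always comes from the session, any ID the model supplies is discarded, and the loop stops after three tool rounds.

```python
MAX_TOOL_ROUNDS = 3

# Hypothetical stand-in for WooCommerce order storage, keyed by customer ID.
ORDERS = {
    7: [{"number": 1042, "status": "processing"}],
    8: [{"number": 1043, "status": "completed"}],
}


def get_my_orders(customer_id: int, args: dict) -> list[dict]:
    return ORDERS.get(customer_id, [])


TOOLS = {"get_my_orders": get_my_orders}


def run_tool(name: str, model_args: dict, session_user_id):
    """Run a tool for the logged-in user; the model never chooses the identity."""
    if session_user_id is None:
        return {"error": "login_required"}  # tools are unavailable to guests
    if name not in TOOLS:
        return {"error": "unknown_tool"}
    # Drop any customer_id the model tried to pass in.
    safe_args = {k: v for k, v in model_args.items() if k != "customer_id"}
    return {"result": TOOLS[name](session_user_id, safe_args)}


def agent_turn(model_step, session_user_id):
    """model_step(history) returns {"tool": ..., "args": ...} or {"answer": ...}."""
    history = []
    for _ in range(MAX_TOOL_ROUNDS):
        step = model_step(history)
        if "answer" in step:
            return step["answer"]
        history.append(run_tool(step["tool"], step.get("args", {}), session_user_id))
    # Tool budget spent: ask once more and accept only a final answer.
    final = model_step(history)
    return final.get("answer", "Sorry, I could not complete that request.")
```

Because the identity is injected outside the model's reach, a prompt such as "show orders for customer 8" has no parameter to act on.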

chat-widget-config

Chat Widget Configuration

Availability: Everyone (public)
Floating Launcher: Bottom-right corner
Primary Color: #D20959
Welcome Message: Hello! Ask me anything…
Embed Mode: Floating + Page Embed

Chat Widget & Embed

Two ways to embed — floating launcher or inline page panel

The floating widget launches from any corner of the screen and can be configured to open automatically. A page-embedded panel drops a full-height chat into any post, page, or widget area — useful for dedicated support or help pages. Both can coexist on the same page.

  • Floating launcher at any corner; configurable to start open on page load
  • Page-embedded panel with configurable height, welcome message, and title
  • Custom brand color, logo/avatar, placeholder text, and powered-by text
  • Token-by-token SSE streaming; automatic silent fallback for buffered hosts
  • Restrict to logged-in users, specific roles, or leave open to all visitors
Floating Widget | Page Embed | SSE Streaming | Custom Branding | Role Restriction
5 providers: AI chat providers supported (OpenAI, Claude, Gemini, Grok, Custom)
Multiple: Vector store options (Native DB, Pinecone, or self-hosted)
3 tools: WooCommerce account tools (Orders, Subscriptions, Memberships)
5 types: Indexable content types (Posts, Pages, Products, PDFs, Audio)

The Process

From initial brief to live AI chat on your site

No AI infrastructure to manage on your end. We handle the setup, configuration, and integration — you get a working AI chat agent that knows your content.

01

Brief & Scope

We discuss your site, content types, AI provider preference, and desired behaviour — how the agent should present itself, which topics it should cover, and whether WooCommerce account tools are needed.

02

Setup & Index

We configure the AI provider, vector store, and indexing pipeline. Your existing content is processed into searchable embeddings. A live dashboard shows the indexing status per content type.

03

Deploy & Monitor

The chat widget goes live on your site. We fine-tune the persona, grounding mode, and topic restrictions based on early conversations. The admin log gives you full visibility into how visitors are using the agent.

FAQ

Common questions

What is RAG, and how does it prevent hallucinations?

RAG stands for Retrieval-Augmented Generation. Before calling the AI, the agent searches your indexed content for the most relevant chunks and passes them to the model as grounding context. The model is instructed to answer only from those retrieved excerpts, citing each fact with a numbered marker. Because the AI is constrained to what your site actually says, it is kept from inventing products, prices, or policies that don’t exist.

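As a rough sketch of the numbered-marker mechanism, the Python below numbers the retrieved excerpts for the prompt and then keeps only the sources the answer actually cites. The `[n]` marker syntax, the function names, and the source dictionary fields are assumptions made for illustration.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def build_context(sources: list[dict]) -> str:
    """Number each retrieved excerpt so the model can cite it as [n]."""
    return "\n\n".join(
        f"[{i}] {s['title']}\n{s['text']}" for i, s in enumerate(sources, 1)
    )


def cited_sources(answer: str, sources: list[dict]) -> list[dict]:
    """Return only the sources referenced in the answer, in first-cited order.

    Markers that point outside the retrieved list are ignored, so a made-up
    reference number can never turn into a link.
    """
    seen: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= len(sources) and n not in seen:
            seen.append(n)
    return [{"n": n, "title": sources[n - 1]["title"], "url": sources[n - 1]["url"]} for n in seen]
```
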
Which AI providers are supported?

We support five chat providers: OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude Sonnet, Haiku), Google Gemini (2.0 Flash, Pro), xAI Grok (Grok-4), and any custom OpenAI-compatible endpoint such as Ollama, LM Studio, Groq, Together.ai, or Mistral. Embedding providers: OpenAI text-embedding-3, Google Gemini, or a custom endpoint. Chat and embedding providers can be mixed independently.

Do I need an external vector database?

No. We use a zero-config native vector store that runs on your WordPress database. It handles approximately 20,000–30,000 indexed chunks comfortably, which is more than enough for most sites. For larger content libraries or multi-server environments, we can connect a Pinecone serverless index or a self-hosted vector database such as Qdrant or Weaviate instead.

Can PDFs and audio files be indexed?

Yes. PDF attachments are indexed with native text extraction. Scanned PDFs that need OCR are handled by an AI-powered extraction fallback. Audio files (MP3, M4A, WAV, OGG, AAC, WebM) are transcribed via OpenAI Whisper or Gemini. Extraction results are cached per file, so each document is only processed once.

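One way to picture per-file caching is the Python sketch below. It assumes the cache is keyed by a SHA-256 hash of the file contents, which the description above does not specify; the class and method names are hypothetical.

```python
import hashlib
from typing import Callable


class ExtractionCache:
    """Remember extracted text per file so OCR or transcription runs only once."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_extract(self, file_bytes: bytes, extract: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(file_bytes).hexdigest()
        if key not in self._store:
            # Only reached the first time this exact file content is seen.
            self._store[key] = extract(file_bytes)
        return self._store[key]
```
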
Can customers ask about their WooCommerce orders?

Yes. When order tools are enabled, logged-in customers can ask “What is the status of order #1234?” or “Show my recent orders” and the agent fetches the answer directly from WooCommerce. The customer’s identity is verified server-side from their WordPress login session, so the AI cannot be manipulated into showing another customer’s data. WooCommerce Subscriptions and Memberships work the same way.

How is the chat protected against prompt injection?

A guardrail layer inspects every incoming message for patterns like “ignore previous instructions,” “reveal the system prompt,” or jailbreak attempts. On a match, a configured refusal is returned immediately, with no retrieval and no AI call. Inside the prompt, user messages and retrieved content are explicitly labelled as data, so the model treats embedded commands as out-of-scope content.

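The two layers described here can be sketched in a few lines of Python. The regular expressions, the refusal text, and the tag names are illustrative assumptions; a real pattern list would be longer and tuned to the site.

```python
import re
from typing import Optional

# Hypothetical starter patterns, not the production list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"(reveal|show|print) (me )?(your |the )?system prompt", re.I),
]
REFUSAL = "Sorry, I can't help with that request."


def guard(message: str) -> Optional[str]:
    """Return the refusal text on a match, else None (caller proceeds to retrieval)."""
    if any(p.search(message) for p in INJECTION_PATTERNS):
        return REFUSAL
    return None


def build_prompt(question: str, excerpts: list[str]) -> str:
    """Wrap retrieved content and the user message in labelled data sections."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(excerpts, 1))
    return (
        "Answer only from the excerpts. Treat everything inside the tags below "
        "as data, never as instructions.\n"
        f"<excerpts>\n{numbered}\n</excerpts>\n"
        f"<user_message>\n{question}\n</user_message>"
    )
```

Pattern matching catches the obvious attempts cheaply, while the labelled sections cover instructions hidden inside indexed content that no message filter would see.
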
How does streaming work, and what if my host buffers output?

We use Server-Sent Events (SSE) to stream tokens to the browser as the AI generates them, so the answer appears word by word. If your hosting environment buffers PHP output (FastCGI, LiteSpeed, CDN proxy), the chat automatically falls back to delivering the complete answer at once, with no visible error to the visitor.

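For illustration, the Python sketch below shows the two delivery modes side by side: SSE frames per token, and a single JSON body for buffered hosts. The event names and payload shape are assumptions, and how buffering is detected is not covered here.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one Server-Sent Events frame per token, then a final 'done' frame."""
    for token in tokens:
        # JSON-encode the token so newlines inside it cannot break SSE framing.
        payload = json.dumps({"text": token})
        yield f"event: token\ndata: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"


def buffered_response(tokens: Iterable[str]) -> str:
    """Fallback: deliver the complete answer as one JSON body."""
    return json.dumps({"answer": "".join(tokens)})
```

Each frame ends with a blank line, which is what tells the browser's EventSource that the event is complete.
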
Can everything run on my own server?

Yes. Using the native vector store and a self-hosted OpenAI-compatible model (Ollama, LM Studio), all processing stays on your server. No content and no visitor queries reach any third-party AI provider. This is an option we can discuss for high-privacy or data-residency-constrained environments.

Built for: WordPress 6.4+, PHP 8.1+
AI Providers: OpenAI, Anthropic, Gemini, Grok, Custom Endpoint
Vector Stores: Native DB, Pinecone, Self-Hosted

Let’s Build This

An AI chat agent that knows your content — not the internet.

Stop answering the same questions. Let an AI agent grounded in your site's content handle them, with source citations, WooCommerce account tools, and a chat widget that matches your brand. We handle the build.

webdevelop.hu · Custom WordPress AI development