RAG Chatbot: What It Is, How It Works and When You Need One

In a 2024 McKinsey survey, 65% of companies already used generative AI in at least one business function — double the previous year. But when we talk to founders in Moldova and across the region, the question we hear isn't "should we" — it's "how do we make sure our chatbot doesn't sound like a generic ChatGPT that invents pricing for our products."
The short answer: RAG chatbot. The long — and honest — answer is what follows.
What is a RAG chatbot (and why it's not just ChatGPT with a new coat of paint)
RAG stands for Retrieval-Augmented Generation. The fundamental difference from a pure-LLM chatbot: a RAG chatbot doesn't answer from its training data alone. It first searches your own data (documents, FAQs, contracts, manuals, ticket history) and then writes the answer using only the context it found.
Across 12 support-automation projects we audited in the last two years, 9 shared the same problem: a "smart" chatbot answering confidently about a return policy — a return policy the company either didn't have or defined differently. That's where RAG changes the game: the bot responds "I can't find that information" instead of hallucinating.
How a RAG chatbot works, step by step
1. Data ingestion (knowledge base)
Everything starts with what the bot "knows." PDFs, site pages, Confluence articles, product databases, call transcripts — all of it gets split into small chunks (200–800 tokens) and prepared for indexing.
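The chunking step can be sketched in a few lines of Python. This is a word-based approximation, not production code: a real pipeline would count actual tokens with the tokenizer of its embedding model, and `chunk_text` with its defaults is purely illustrative:

```python
# Minimal chunking sketch: split a document into overlapping chunks.
# Word count is used as a rough proxy for token count.

def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of ~chunk_size words, overlapping by ~overlap."""
    words = text.split()
    step = max(1, chunk_size - overlap)  # guard against overlap >= chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.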
2. Embeddings and a vector database
Each chunk is converted into a numerical vector (embedding) by a specialized model. Vectors are stored in a vector database — Pinecone, Weaviate, Qdrant, or pgvector on Postgres. This is where "semantic search" happens: two sentences with the same meaning but different words end up close to each other in vector space.
3. Retrieval — finding relevant context
When a user asks something, the question is turned into a vector too. The vector DB returns the top 3–10 most relevant chunks. 80% of RAG bugs hide here: if retrieval brings the wrong context, the LLM will confidently give the wrong answer.
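Steps 2 and 3 boil down to the same distance math, which can be sketched without any external service. The 3-dimensional vectors in the test are hand-made stand-ins for real embeddings (typically 768–3,072 dimensions from an embedding model), and `top_k` is a toy version of what Pinecone, Qdrant, or pgvector do at scale:

```python
# Minimal retrieval sketch: cosine similarity over an in-memory list
# of (chunk, vector) pairs. A real vector DB does the same ranking
# with approximate-nearest-neighbor indexes for speed.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

If the top-ranked chunks are wrong here, everything downstream is wrong too, which is why retrieval quality deserves its own testing.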
4. Generating the answer via the LLM
The relevant chunks go to an LLM (Claude, GPT-4, Llama, Mistral) along with the question, under a strict prompt: "answer only using the context below; if you can't find the answer, say so." Result: answers anchored in real data, with citations to the source.
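A minimal sketch of that prompt assembly, assuming the retrieved chunks arrive as a list of strings. The wording of the system instruction is illustrative, and `build_prompt` is not any provider's API; the resulting string would be passed to whichever LLM client you use:

```python
# Sketch of assembling a grounded prompt from retrieved chunks.
# Numbered context blocks make it easy for the model to cite sources.

SYSTEM = (
    "Answer ONLY using the context below. "
    "If the answer is not in the context, say you cannot find it."
)

def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"{SYSTEM}\n\nContext:\n{context}\n\nQuestion: {question}"
```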
Classic chatbot vs RAG chatbot — what matters for business
- Answer accuracy: classic bot (rule-based or pure LLM) — ~60–70%. A well-implemented RAG — 85–95% on a narrow domain.
- Hallucinations: with pure LLM bots they're the rule, not the exception. RAG cuts them dramatically but doesn't fully eliminate them.
- Maintenance: classic bots need hand-written rules per scenario. RAG only needs source documents updated — the bot "learns" automatically.
- Scaling: rule-based becomes unmaintainable past 200–300 rules. RAG scales to tens of thousands of documents.
- Operating cost: rule-based is cheaper short-term; RAG becomes cheaper once the knowledge base crosses a threshold.
5 clear signals your business needs a RAG chatbot
- Your support team answers the same 20–50 questions every day. If your static FAQ goes unread, RAG is the answer — people prefer to ask naturally rather than search.
- You have vast documentation no one reads. Internal manuals, product guides, policies — RAG makes them queryable in natural language.
- Sales lose leads outside business hours. A RAG bot wired to your catalog can qualify and inform leads 24/7 without inventing pricing.
- Onboarding new employees takes weeks. RAG over internal docs cuts time-to-productivity by 30–50%.
- Duplicate ticket volume exceeds 40%. That means the answers exist — accessibility is the problem. Exactly what RAG fixes.
If you recognize yourself in 2 of 5, it's worth exploring a tailored AI integration for your business. 4 of 5 — you're already late to the decision.
The mistakes we see most often in RAG projects
Poorly structured knowledge base
"Let's dump all the PDFs into Pinecone" is a recipe for bad results. Duplicate documents, old versions that were never deleted, scans without an OCR layer: all of it pollutes retrieval. Before any vectors, you do a data audit. Always.
No evaluation loop
How do you know the bot is answering well? Not by "it seems fine on 5 test questions." You need a set of 50–200 real questions (from historical tickets) with human-validated "correct" answers, run on every change. Without it, you don't know whether the latest improvement is a regression.
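A minimal harness for that loop might look like this. The `grade` function here is a naive substring check for illustration; real projects typically use an LLM-as-judge or human review, but the shape of the loop is the same:

```python
# Minimal evaluation-loop sketch: run the bot over a gold set of
# (question, expected answer) pairs and report the pass rate.

def grade(bot_answer: str, expected: str) -> bool:
    """Naive check: does the expected phrase appear in the bot's answer?"""
    return expected.lower() in bot_answer.lower()

def run_eval(bot, gold_set: list[tuple[str, str]]) -> float:
    """Return the fraction of gold questions the bot answers correctly."""
    passed = sum(grade(bot(question), expected) for question, expected in gold_set)
    return passed / len(gold_set)
```

Run on every change, this score is what tells you whether the latest "improvement" is actually a regression.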
Ignoring cost per query
A RAG query costs anywhere from $0.001 to $0.05 depending on the model (Haiku vs Opus, GPT-4o-mini vs GPT-4o) and context length. At 10,000 queries/month, the difference is $10 vs $500. The most powerful model isn't always the answer — calibrate to the task.
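The arithmetic is simple enough to sketch. The prices in the test below are placeholders, not any provider's current list prices; always plug in the numbers from your provider's pricing page:

```python
# Back-of-the-envelope cost model for LLM calls in a RAG pipeline.
# Prices are passed in per 1,000 tokens, matching how most providers quote.

def cost_per_query(input_tokens: int, output_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

def monthly_cost(queries_per_month: int, per_query: float) -> float:
    return queries_per_month * per_query
```

A RAG prompt is input-heavy (the retrieved context dominates), so the input price usually matters more than the output price.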
Confusing RAG with fine-tuning
Fine-tuning teaches the model to answer in a certain style; RAG gives it access to new facts. If you want the bot to know tomorrow's prices, you need RAG, not fine-tuning. They're not alternatives — mature projects combine them.
How we build a RAG chatbot at XCORE — the 4-stage plan
Data and flows audit (1–2 weeks)
We inventory data sources, map the top 50 real questions, define success metrics (self-service rate, CSAT, deflection rate). This is where we decide whether RAG is the answer or a classic automation will do.
PoC on a narrow domain (2–4 weeks)
We build an MVP on a single question category (e.g. billing or returns), run it on 20–30% of real traffic, and measure. If the MVP doesn't clear 80% accuracy on the eval set, we stop or pivot before any bigger investment.
Integration with CRM, website and WhatsApp (2–4 weeks)
The bot is only useful where the customers are. Typical integrations: site widget, WhatsApp Business API, Telegram, auto-escalation into AmoCRM/HubSpot when the bot doesn't know. Conversation history sync for continuity.
Monitoring and iteration (continuous)
Dashboard of queries answered "I don't know," low-confidence queries, negative feedback. Every 2 weeks: review, prompt adjustments, knowledge base additions. A RAG chatbot isn't a project that "ends" — it's a product you maintain.
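The core signal behind such a dashboard can be sketched as a simple flag. The marker phrases and the 0.75 threshold below are illustrative assumptions to tune against your own data, not fixed values:

```python
# Sketch of the monitoring signal: flag conversations where the bot
# declined to answer or where retrieval confidence was low, so they
# land in the biweekly review queue.

IDK_MARKERS = ("i can't find", "i don't know", "not in the context")

def needs_review(answer: str, top_retrieval_score: float,
                 threshold: float = 0.75) -> bool:
    low_confidence = top_retrieval_score < threshold
    declined = any(marker in answer.lower() for marker in IDK_MARKERS)
    return low_confidence or declined
```

Flagged conversations are exactly where the next knowledge base addition or prompt tweak comes from.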
A concrete example of what this looks like in practice is the DoctorChat case study, where we built a conversational platform specialized for the healthcare sector.
Costs and ROI — what to expect realistically
For a typical SMB in Moldova or the region:
- Implementation: $4,000–$25,000 depending on integration complexity and knowledge base size
- Monthly infrastructure: $150–$800 (LLM calls + vector DB + hosting)
- Maintenance: 4–12 hours/month after launch
- Typical ROI: 30–60% reduction in repetitive ticket volume in the first 3 months, freeing the support team for higher-value cases
Important: ROI doesn't come from "let's replace the support team." It comes from "the team solves more of what a bot can't do." Companies aiming at layoffs through RAG fail 90% of the time — customers notice fast when they're talking to a wall, and they leave.
How to decide if RAG is the right step right now
Quick checklist — tick what's true for your business:
- We have written documentation or structured data a bot could use
- Repetitive question volume justifies the investment (>500/month)
- We have in-house capacity (or a partner) to maintain the bot post-launch
- We're OK with a 4–8 week experiment before scaling
- Our budget is realistic (no $500 implementations producing magic)
3 of 5 ticked — worth an audit. 5 of 5 — you're already late. If you're not sure where you land, a short IT consulting session will clarify whether RAG is the answer or you need something else first (better data, a clearer process, or simply a better FAQ).
A RAG chatbot isn't magic. It's a combination of good data, well-tuned retrieval, and an LLM kept on a short leash. When all three line up, it becomes the best member of your support team — one that doesn't sleep, doesn't forget, and never ignores what's written in the return policy.

Need a professional website?
Talk to the XCORE team for free about how we can digitalize your business — website, online store, integrations, or AI automation.