Free Tool Arena


RAG (Retrieval-Augmented Generation)


Updated May 2026 · 4 min read

Definition

RAG (Retrieval Augmented Generation) augments an LLM with documents retrieved at query time — typically from a vector database. The LLM grounds its answer in the retrieved text instead of relying purely on training data.
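
In code, the "ground the answer in retrieved text" step reduces to assembling a prompt around the retrieved passages. A minimal sketch — the retrieval is stubbed with a fixed list, and the prompt wording is illustrative, not a standard template:

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the LLM to answer only from the
    retrieved passages (template wording is illustrative)."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the context below. "
        "Cite passage numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Retrieval stubbed for the sketch; a real system would query a vector DB.
passages = [
    "RAG retrieves documents at query time.",
    "The LLM grounds its answer in retrieved text.",
]
prompt = build_grounded_prompt("What does RAG do?", passages)
print(prompt)
```

The resulting string is what gets sent to the LLM; because the sources are numbered, the model's citations can be mapped back to the original documents for auditing.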

What it means

A RAG system has three components: an embedding model that converts documents and queries to vectors; a vector database (Pinecone, Weaviate, pgvector, etc.) that stores those vectors and retrieves the most-similar documents; and an LLM that synthesizes the retrieved context into an answer. The chunking strategy (typically 500-1500 tokens per chunk, with overlap) and reranking (often via Cohere Rerank or a BM25 hybrid) heavily affect quality.
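
The chunk-embed-retrieve pipeline can be sketched end to end with a toy in-memory store. Real systems swap in a learned embedding model and a vector database; the bag-of-words "embedding" and character-window chunker below are purely illustrative:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows (real pipelines
    chunk by tokens, but the overlap idea is the same)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real RAG uses a learned model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "pgvector adds vector similarity search to Postgres.",
    "Chunking splits documents into overlapping pieces.",
    "Fine-tuning bakes knowledge into model weights.",
]
print(retrieve("how does chunking work", docs, k=1)[0])
```

A vector database replaces the brute-force `sorted(...)` scan with an approximate-nearest-neighbor index, which is what makes top-k retrieval fast at millions of chunks.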


Why it matters

RAG is how most production AI products use private data without retraining the model. It's cheaper than fine-tuning, easier to update, and the retrieved sources are auditable. The downside: poorly tuned RAG retrieves irrelevant chunks, leading to hallucinations.


Frequently asked questions

RAG vs fine-tuning?

RAG: retrieve at query time, easy to update, sources auditable. Fine-tuning: bake knowledge into model weights, faster inference, but expensive and hard to update. Most production systems start with RAG and add fine-tuning only when retrieval quality plateaus.

Which vector database?

pgvector for most teams (just Postgres). Pinecone for managed scale. Weaviate for hybrid + multi-modal. Qdrant for self-host + speed.
