Fine-tuning
Fine-tuning is the process of further training a pretrained model on your specific data, baking in style, format, or domain knowledge that's hard to achieve with prompting alone.
What it means
Three categories matter in 2026: full fine-tuning (rare for foundation models — too expensive), LoRA / PEFT (parameter-efficient, the standard), and RLHF / DPO (alignment fine-tuning). OpenAI, Anthropic, and Google all offer hosted fine-tuning APIs at $25-100 per million training tokens. Open-weight models (Llama, Qwen, DeepSeek) can be fine-tuned anywhere using libraries like Unsloth, Axolotl, or Hugging Face PEFT.
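The parameter savings behind LoRA can be sketched in a few lines. This is an illustrative example in plain NumPy, not a real training loop or any specific library's API: instead of updating a full weight matrix W, LoRA trains two small low-rank factors A and B and applies W + (alpha / r) * B @ A at inference time. All dimensions below are assumptions chosen for illustration.

```python
import numpy as np

# Toy LoRA sketch: one 4096x4096 layer, rank r = 8 (assumed values).
d_in, d_out, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) * 0.01  # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01      # trainable low-rank factor
B = np.zeros((d_out, r))                       # zero init: W_eff == W at step 0

# Effective weight used at inference; only A and B are ever updated.
W_eff = W + (alpha / r) * (B @ A)

full_params = W.size               # 16,777,216
lora_params = A.size + B.size      # 65,536
print(f"trainable fraction: {lora_params / full_params:.4%}")  # → 0.3906%
```

At rank 8, the trainable parameters are 1/256 of the full matrix, which is why LoRA adapters train cheaply and ship as small files.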
Why it matters
Most production teams skip fine-tuning until prompting and RAG hit a quality ceiling. Fine-tuning is the right move when you need a consistent format or style that few-shot examples can't achieve, when your domain uses terminology the base model doesn't know well, or when you're optimizing inference cost (a smaller fine-tuned model can outperform a prompt-engineered larger one).
Frequently asked questions
Fine-tuning vs RAG?
RAG retrieves facts at query time; fine-tuning bakes style, format, and terminology into the model itself. They complement each other, and serious products often use both.
Cost?
$25-100 per million training tokens on hosted APIs. Open-weight LoRA fine-tuning runs $50-500 in GPU time, depending on model size and dataset.
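The hosted-API arithmetic is simple: billed tokens are roughly dataset tokens times epochs. A minimal sketch (the function name and the example figures are assumptions for illustration; the $25/M rate is the low end of the range above, not any vendor's actual price list):

```python
def finetune_cost_usd(training_tokens: int, epochs: int, usd_per_million: float) -> float:
    """Estimate hosted fine-tuning cost: billed tokens = dataset tokens x epochs."""
    return training_tokens * epochs / 1_000_000 * usd_per_million

# e.g. a 2M-token dataset, 3 epochs, at $25 per million training tokens:
print(finetune_cost_usd(2_000_000, 3, 25.0))  # → 150.0
```

Multiplying by epochs matters: a modest dataset re-run for several passes can cost several times the single-pass estimate.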
Related terms
- RAG (Retrieval-Augmented Generation): augments an LLM with documents retrieved at query time — typically from a vector database. The LLM grounds its answer in the retrieved text instead of relying purely on training data.
- Context window: the maximum amount of text (in tokens) an AI model can process in a single request — combining your system prompt, conversation history, and output. Past the limit, the model can't 'see' earlier content.