Knowledge distillation
Definition
Knowledge distillation trains a small 'student' model to imitate a larger 'teacher' model's outputs. Used to ship cheap, fast versions of frontier models — DeepSeek-Distill-Qwen, Phi-4, Gemini Flash, etc.
What it means
Original paper: Hinton et al. 2015. The student is trained on the teacher's output distribution (soft targets) rather than hard labels — preserving more information about the teacher's 'opinion' on edge cases. Modern variants distill multiple teachers, distill specific capabilities (reasoning, math, code), or distill from one architecture into another. Frontier models often ship their distilled cousins: DeepSeek R1 → DeepSeek-R1-Distill-Qwen-32B; Gemini Pro → Gemini Flash.
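Concretely, the classic loss mixes a softened teacher signal with the ordinary label loss. Below is a minimal PyTorch sketch of that idea, not any library's official API; the temperature `T`, the mixing weight `alpha`, and the logits arguments are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T**2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Higher temperatures flatten the teacher's distribution, exposing the relative probabilities it assigns to wrong-but-plausible answers, which is where the extra edge-case information lives.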
Why it matters
Distillation is how frontier-quality intelligence gets cheap. A distilled 32B model can deliver 80-90% of a 671B teacher's quality at 5-10% of the inference cost. For self-hosting on consumer hardware, distilled models are usually the best practical option.
Frequently asked questions
Best distilled models in 2026?
DeepSeek-R1-Distill-Qwen-32B (reasoning), Phi-4 (general 14B), Gemini Flash (multimodal). All run well on a single H100 or in 24-32 GB of VRAM.
Distillation vs LoRA?
Different goals. LoRA: adapt an existing model to your task/style. Distillation: shrink a model into a smaller architecture altogether.
Related terms
- LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning technique. Instead of training all model weights, you train small low-rank 'adapter' matrices applied to specific layers (sketched below), saving 90%+ of memory and cost vs full fine-tuning.
- Fine-tuning: the process of further training a pretrained model on your specific data, baking in style, format, or domain knowledge that's hard to achieve with prompting alone.
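For contrast with distillation, here is a minimal sketch of the LoRA adapter idea referenced above, assuming PyTorch; the `LoRALinear` class and the `rank` and `alpha` hyperparameter names are illustrative, not from any specific library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        # Two small low-rank matrices: only these are trained.
        # B starts at zero, so the adapter initially leaves the base model unchanged.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Output = frozen base projection + scaled low-rank update (B @ A) applied to x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only `A` and `B` receive gradients, which is why LoRA is cheap to train; distillation, by contrast, trains an entirely new, smaller model from the teacher's outputs.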