LoRA (Low-Rank Adaptation)
Definition
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique. Instead of training all model weights, you train small low-rank 'adapter' matrices applied to specific layers, cutting memory and compute cost by 90% or more compared with full fine-tuning.
What it means
Introduced by Hu et al. (2021). The key insight: weight updates during fine-tuning have low intrinsic rank, so the update can be decomposed as ΔW = BA, where B and A are small low-rank matrices. You train only B and A and freeze everything else. For a 70B model, full fine-tuning needs roughly 1.4 TB of memory; LoRA at rank 16 needs about 30 GB. Practical implementations include the PEFT library (Hugging Face), Unsloth, and Axolotl. QLoRA combines LoRA with 4-bit quantization for even lower memory use.
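To make the decomposition concrete, here is a minimal sketch of the arithmetic in plain NumPy; the hidden size and rank are illustrative, not tied to any specific model.

```python
import numpy as np

d = 4096  # hidden size of one projection layer (illustrative)
r = 16    # LoRA rank

# The frozen pretrained weight has d x d parameters and is never updated.
full_params = d * d

# LoRA trains ΔW = B @ A instead, with B (d x r) and A (r x d).
B = np.zeros((d, r))              # B starts at zero, so ΔW starts at zero
A = np.random.randn(r, d) * 0.01  # A gets a small random init
lora_params = B.size + A.size

delta_W = B @ A                   # same shape as the frozen weight
print(delta_W.shape)              # (4096, 4096)
print(lora_params / full_params)  # ~0.0078, i.e. under 1% of this layer's parameters
```

The ratio 2rd / d² is why the savings are so large: at rank 16 on a 4096-wide layer, the adapters hold less than 1% of the layer's parameters, and only those receive gradients and optimizer state.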
Why it matters
LoRA is what makes hobbyist and small-team fine-tuning viable. Before LoRA, fine-tuning a 13B model required multi-GPU server hardware; with LoRA, a single RTX 4090 can do it. The democratization of fine-tuning since 2023 comes mostly from LoRA, QLoRA, and good libraries.
Frequently asked questions
LoRA vs full fine-tuning quality?
LoRA typically captures 90-99% of full fine-tuning's quality at 5-10% of the cost, a tradeoff worth taking for nearly all use cases.
Best library?
Unsloth (fastest, lowest memory use) or Axolotl (most flexible configuration). Both build on PEFT and Hugging Face Transformers.
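As a concrete example, here is a minimal sketch using Hugging Face PEFT directly; the model name and target modules are illustrative and vary by architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model to adapt (name is illustrative; any causal LM works).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

# Attach rank-16 LoRA adapters to the attention projections.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # which layers get adapters (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports the tiny trainable fraction
```

From here the wrapped model trains in a standard Transformers training loop; only the adapter weights receive gradients, while the base model stays frozen.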
Related terms
- Fine-tuning: the process of further training a pretrained model on your specific data, baking in style, format, or domain knowledge that's hard to achieve with prompting alone.
- Quantization: compressing AI model weights from 16-bit floats (FP16) to lower bit-widths (Q8, Q5, Q4, Q3), letting larger models fit on smaller hardware at modest quality cost.