LoRA (Low-Rank Adaptation)
Definition
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique. Instead of training all model weights, you train small low-rank 'adapter' matrices applied to specific layers, cutting memory and compute cost by 90% or more compared with full fine-tuning.
What it means
Introduced by Hu et al. (2021). The key insight: weight updates during fine-tuning have low intrinsic rank, so the update can be decomposed as ΔW = BA, where B and A are small low-rank matrices. You train only B and A and freeze everything else. For a 70B model, full fine-tuning needs roughly 1.4 TB of memory; LoRA at rank 16 needs about 30 GB. Practical implementations include the PEFT library (Hugging Face), Unsloth, and Axolotl. QLoRA combines LoRA with 4-bit quantization for even lower memory use.
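To make the decomposition concrete, here is a minimal sketch of the arithmetic in plain NumPy; the hidden size and rank are illustrative, not tied to any specific model.

```python
import numpy as np

d = 4096  # hidden size of one projection layer (illustrative)
r = 16    # LoRA rank

# The frozen pretrained weight has d x d parameters and is never updated.
full_params = d * d

# LoRA trains ΔW = B @ A instead, with B (d x r) and A (r x d).
B = np.zeros((d, r))              # B starts at zero, so ΔW starts at zero
A = np.random.randn(r, d) * 0.01  # A gets a small random init
lora_params = B.size + A.size

delta_W = B @ A                   # same shape as the frozen weight
print(delta_W.shape)              # (4096, 4096)
print(lora_params / full_params)  # ~0.0078, i.e. under 1% of this layer's parameters
```

The ratio 2rd / d² is why the savings are so large: at rank 16 on a 4096-wide layer, the adapters hold less than 1% of the layer's parameters, and only those receive gradients and optimizer state.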
Why it matters
LoRA is what makes hobbyist and small-team fine-tuning viable. Before LoRA, fine-tuning a 13B model required multi-GPU server hardware; with LoRA, a single RTX 4090 can do it. The democratization of fine-tuning since 2023 comes mostly from LoRA, QLoRA, and good libraries.
Frequently asked questions
LoRA vs full fine-tuning quality?
LoRA typically captures 90-99% of full fine-tuning's quality at 5-10% of the cost, a tradeoff worth taking for nearly all use cases.
Best library?
Unsloth (fastest, lowest memory use) or Axolotl (most flexible configuration). Both build on PEFT and Hugging Face Transformers.
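As a concrete example, here is a minimal sketch using Hugging Face PEFT directly; the model name and target modules are illustrative and vary by architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model to adapt (name is illustrative; any causal LM works).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

# Attach rank-16 LoRA adapters to the attention projections.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # which layers get adapters (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports the tiny trainable fraction
```

From here the wrapped model trains in a standard Transformers training loop; only the adapter weights receive gradients, while the base model stays frozen.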
Related terms
- Fine-tuning: the process of further training a pretrained model on your specific data, baking in style, format, or domain knowledge that's hard to achieve with prompting alone.
- Quantization: compressing AI model weights from 16-bit floats (FP16) to lower bit-widths (Q8, Q5, Q4, Q3), letting larger models fit on smaller hardware at modest quality cost.