
VRAM


Updated May 2026 · 4 min read

Definition

VRAM (Video RAM) is the dedicated memory on your GPU. It determines which AI models you can run locally: the model weights, the KV cache, and the activations all need to fit. It is the single most relevant hardware spec for local AI.

What it means

Approximate VRAM needs at Q4 quantization:

- 7B model: ~6 GB
- 13B: ~10 GB
- 32B: ~22 GB
- 70B: ~42 GB

Add roughly 1-5 GB for the KV cache, depending on context length. Consumer GPUs in 2026: the RTX 4090 has 24 GB, the RTX 5090 has 32 GB, and Apple Silicon unified memory ranges from 16 to 192 GB, though with lower memory bandwidth. For models too big for one GPU, you can split the model via tensor parallelism (multiple GPUs in one machine, fast) or pipeline parallelism (multiple machines, slower). A rough sizing sketch follows.
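To make these numbers concrete, here is a minimal back-of-the-envelope estimator. The bits-per-weight figure, layer count, and head shape are illustrative assumptions (roughly a Llama-style 7B with grouped-query attention), not measured values.

```python
# Rough VRAM estimator for local LLM inference. All numbers are
# back-of-the-envelope assumptions, not vendor specs: Q4 is taken as
# ~4.5 bits/weight, and the KV cache uses the standard
# 2 * layers * kv_heads * head_dim * context * bytes formula (FP16 = 2 bytes).

def weights_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Memory for model weights, in GB, for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: keys plus values for every layer at full context."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Example: a 7B Llama-style model (32 layers, 8 KV heads of dim 128,
# i.e. grouped-query attention) at 8k context.
w = weights_gb(7)                   # ~3.9 GB of weights
kv = kv_cache_gb(32, 8, 128, 8192)  # ~1.1 GB of KV cache
print(f"weights {w:.1f} GB + KV {kv:.1f} GB + ~1 GB activations/overhead "
      f"= ~{w + kv + 1:.1f} GB")    # ~6 GB, matching the 7B figure above
```

The same arithmetic explains the GPU comparisons below: a 32B model at ~4.5 bits/weight needs about 18 GB of weights before cache and overhead, so it fits a 24 GB card but not a 16 GB one.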


Why it matters

If you're buying hardware for local AI, VRAM is the single most impactful number. An RTX 4090 (24 GB) versus an RTX 4080 (16 GB) is the difference between running 32B models and only 13B models. A Mac Studio with 192 GB of unified memory can host 70B+ models that no single consumer Nvidia GPU can fit.


Frequently asked questions

Can I split a model across GPUs?

Yes. Tensor parallelism splits each layer across multiple GPUs in one machine (supported by vLLM and TGI); pipeline parallelism splits the model across machines (llama.cpp RPC, exo, or Hyperspace pods).
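As a sketch of the single-machine path, vLLM exposes tensor parallelism as a constructor argument. The model id and GPU count below are illustrative; the model's footprint divided across the cards still has to fit each card's VRAM.

```python
# Minimal vLLM sketch: shard one model across GPUs with tensor parallelism.
# Model id and GPU count are illustrative, not a recommendation; at FP16 a
# 70B model needs ~140 GB of weights, so think four 48 GB cards.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model id
    tensor_parallel_size=4,  # split every layer across 4 GPUs
)
outputs = llm.generate(["Why does VRAM matter?"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```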

Apple Silicon vs Nvidia?

A Mac Studio with 192 GB of unified memory can host huge models that no single consumer Nvidia GPU can. Nvidia GPUs have much higher memory bandwidth, so they generate tokens faster, but each card holds far less. Different tradeoffs.
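One way to quantify "faster per GB": single-stream decoding is usually memory-bandwidth-bound, so peak tokens/sec is roughly memory bandwidth divided by the bytes of weights read per token. The bandwidth figures below are approximate public specs, used only for illustration.

```python
# Rule of thumb for memory-bound decoding:
# tokens/sec ~ memory bandwidth / bytes of weights read per token.
# Bandwidth figures are approximate public specs (illustrative only).
GPUS_GB_PER_S = {
    "RTX 4090 (24 GB)": 1008,  # ~1 TB/s GDDR6X
    "M2 Ultra (192 GB)": 800,  # unified memory
}

def peak_tokens_per_s(bandwidth_gb_s: float, params_b: float,
                      bits_per_weight: float = 4.5) -> float:
    weight_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return bandwidth_gb_s / weight_gb

# A 32B Q4 model (~18 GB of weights) fits both devices:
for name, bw in GPUS_GB_PER_S.items():
    print(f"{name}: ~{peak_tokens_per_s(bw, 32):.0f} tok/s upper bound")
```

The Mac wins on what fits at all; per token generated, the Nvidia card's higher bandwidth wins, which is the tradeoff the answer above describes.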
