MoE (Mixture of Experts)
MoE (Mixture of Experts) is an AI architecture where the model has many specialized sub-networks ('experts') and activates only a few per token. This lets the model be huge in total parameters while staying cheap to run.
What it means
Standard 'dense' models activate every parameter for every token. MoE models instead route each token through only a fraction of the network (typically 2-8 of 32-256 experts). Total parameters can reach 671B (DeepSeek V3.2) or 1T+ (Kimi K2), but the active parameters per token are much smaller (~20-40B), so inference cost is roughly that of a 30B dense model. Mixtral 8x7B popularized MoE; DeepSeek V3, Llama 4 Maverick, and Kimi K2 are major 2026 examples.
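The routing step above can be sketched in a few lines. This is a minimal toy version of top-k gating, assuming NumPy and stand-in linear "experts" (real MoE layers use learned routers and FFN experts inside a transformer):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through the top-k experts (toy sketch).

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                    # router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only top_k expert networks run; the rest stay idle for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
# Toy experts: plain linear maps standing in for FFN sub-networks.
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_ws]

y = moe_forward(rng.normal(size=d), gate_w, experts, top_k=2)
print(y.shape)  # (16,)
```

With 8 experts and top_k=2, only a quarter of the expert compute runs per token, which is the whole trick scaled up in production models.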
Why it matters
MoE is a key reason open-weight models leaped ahead in 2025-2026. Frontier-quality capability that used to require dense 70-200B models now runs as MoE at much lower inference cost. The catch: VRAM still has to hold all the experts (a high memory floor), even though compute per token is cheap.
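The compute/memory split is simple arithmetic. A back-of-envelope sketch, using illustrative numbers (not exact model specs) for a DeepSeek-V3-class MoE:

```python
# Hypothetical figures for illustration only.
total_params = 671e9      # all experts combined
active_params = 37e9      # experts activated per token
bytes_per_param = 1       # assume 8-bit (FP8/INT8) weights

# Compute per token scales with *active* parameters...
compute_ratio = active_params / total_params
# ...but memory must hold *all* experts, active or not.
vram_floor_gb = total_params * bytes_per_param / 1e9

print(f"{compute_ratio:.1%} of params active per token")  # 5.5%
print(f"~{vram_floor_gb:.0f} GB just for weights")        # ~671 GB
```

That gap between ~5% active compute and a ~671 GB weight footprint is exactly the "high memory floor" trade-off: MoE buys cheap inference, not cheap hardware.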
Frequently asked questions
Best MoE model?
For coding + agents: DeepSeek V3.2 (671B MoE). For long context: Kimi K2 (1T MoE, 1M context). For open + Western: Llama 4 Maverick (402B MoE).
Can I run MoE locally?
Yes, if you have the VRAM. Hyperspace pods (multi-machine) make it practical without a single huge GPU.
Related terms
- Fine-tuning: the process of further training a pretrained model on your specific data, baking in style, format, or domain knowledge that's hard to achieve with prompting alone.
- Context window: the maximum amount of text (in tokens) an AI model can process in a single request, combining your system prompt, conversation history, and output. Past the limit, the model can't 'see' earlier content.