Perplexity (AI metric)
Definition
In AI/ML, perplexity is a measure of how 'surprised' a language model is by a piece of text. It is computed as 2 raised to the cross-entropy (in bits per token); equivalently, e raised to the cross-entropy in nats. Lower is better: the model assigns higher probability to the actual text.
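That definition can be sketched directly: given the probability the model assigned to each actual token, average the bits of surprise and exponentiate. A minimal, self-contained Python sketch (the function name and toy probabilities are illustrative, not from any particular library):

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each
    actual token: 2 ** (mean cross-entropy in bits per token)."""
    # cross-entropy in bits: mean of -log2 p(token)
    h = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** h

# A model that puts probability 0.5 on every actual token has a
# cross-entropy of 1 bit/token, so its perplexity is 2.
print(perplexity([0.5, 0.5, 0.5]))  # 2.0
```

Note that the base cancels out as long as you are consistent: using natural log and `math.e ** h` gives the same number.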
What it means
Perplexity is computed on a held-out evaluation set. A model with perplexity 5 is roughly 'as confused as' one choosing uniformly among 5 possible next tokens at each step. Modern frontier LLMs reach perplexity in the low single digits on standard benchmarks (WikiText-103, The Pile). Lower is better, but absolute perplexity is hard to compare across tokenizers, because different vocabulary sizes change the number.
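The 'uniform choice among k tokens' reading can be sanity-checked in a few lines (a standalone sketch, not from the article): a model that assigns probability 1/k to every actual token has cross-entropy log2(k) bits per token, so its perplexity is exactly k.

```python
import math

# If every actual token gets probability 1/k, the per-token
# cross-entropy is -log2(1/k) = log2(k) bits, and 2 ** log2(k) = k.
for k in (5, 50, 50000):
    h = -math.log2(1 / k)   # bits per token under uniform guessing
    ppl = 2 ** h
    print(k, round(ppl, 6))
```

This is also why vocabulary size matters when comparing models: a larger vocabulary raises the perplexity of the uniform baseline itself.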
Why it matters
Perplexity is a quick, research-grade quality metric for language models. It has fallen out of favor for production decisions (where MMLU, SWE-bench, and custom evals are more relevant) but remains useful for comparing fine-tunes of the same base model, monitoring training progress, and sanity-checking that a model hasn't degraded.
Frequently asked questions
Why does low perplexity matter?
It correlates with many downstream qualities (better generation, better reasoning) but isn't a perfect proxy: a model can have low perplexity and still behave badly in chat.
Is this the same as Perplexity.ai?
No. That's an AI search company that adopted the name; the metric and the company are unrelated.
Related terms
- LLM (Large Language Model): a transformer-based neural network trained on huge text datasets to predict the next token. ChatGPT, Claude, Gemini, DeepSeek are all LLMs.
- Evals (AI evaluation): systematic tests of AI model quality. Graded test sets that measure performance on specific tasks. Critical for picking models, validating fine-tunes, and not shipping regressions.