Free Tool Arena


Token


Updated May 2026 · 4 min read

Definition

A token is the basic unit of text an LLM reads and produces. For English prose, one token averages roughly 4 characters, or about 0.75 words; code and languages that split into many subword pieces use more tokens per word. APIs bill by the token.

What it means

Tokenization is the first step of every LLM request: the text is split into subword pieces using byte-pair encoding (BPE) or a similar algorithm. "Hello world" is 2 tokens; "antidisestablishmentarianism" is 4-5. Different model families use different tokenizers, so GPT, Claude, Gemini, and Llama can differ by roughly 10-30% in token count on the same text. OpenAI's tiktoken library is the most widely used reference implementation.
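To make the subword-splitting idea concrete, here is a toy BPE-style tokenizer in pure Python. The merge table is hand-written for illustration; a production tokenizer such as tiktoken learns tens of thousands of merges from corpus statistics.

```python
# Toy illustration of BPE-style tokenization (not a real model tokenizer).
# A word starts as individual characters; merge rules are applied greedily
# until no adjacent pair matches a rule.

def bpe_tokenize(word, merges):
    """Split a word into characters, then repeatedly apply merge rules."""
    tokens = list(word)
    changed = True
    while changed:
        changed = False
        for a, b in merges:
            i = 0
            while i < len(tokens) - 1:
                if tokens[i] == a and tokens[i + 1] == b:
                    tokens[i:i + 2] = [a + b]  # fuse the pair into one token
                    changed = True
                else:
                    i += 1
    return tokens

# Hypothetical merge rules; real tokenizers learn these from data.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_tokenize("lower", merges))  # ['low', 'er']
```

With the real tiktoken library, the equivalent operation is `tiktoken.get_encoding("cl100k_base").encode(text)`, which returns the list of token IDs for a given text.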


Why it matters

Token count drives cost, context-window usage, and sometimes latency. Cutting tokens by 30% through prompt compression or caching saves real money on large-scale workloads. Most users undercount their usage because they ignore tool definitions, system prompts, and reasoning traces, all of which count toward the bill.


Frequently asked questions

How many tokens in 1000 words?

About 1,300-1,400 tokens for English prose (roughly 1.33 tokens per word); the exact count varies by tokenizer.

Are output tokens billed differently?

Yes. Output tokens are typically 4-5x more expensive than input tokens across the major providers.
