Context window
Definition
The context window is the maximum amount of text, measured in tokens, that an AI model can process in a single request: your system prompt, conversation history, and the model's output combined. Past the limit, the model can't 'see' earlier content.
What it means
Context windows are measured in tokens (roughly 4 characters or 0.75 words each). Claude Sonnet 4.6 and Opus 4.7 have 1M tokens; Gemini 2.5/3 Pro have 2M; GPT-5 has 400k; DeepSeek V3.2 has 128k. The window includes EVERY token: system prompt + chat history + user message + tool definitions + the model's response. Quality also degrades near the maximum, so most practitioners operate at 50-70% of the rated context for production reliability.
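To make the accounting concrete, here is a minimal sketch of budgeting a request against a window with a safety margin. The helper names (estimate_tokens, fits_window) and the 60% safety factor are illustrative assumptions, not any provider's API, and the ~4-characters-per-token rule is only a heuristic; real tokenizers give exact counts.

```python
CHARS_PER_TOKEN = 4   # rough average for English prose (heuristic, not exact)
SAFETY_FACTOR = 0.6   # operate near 60% of rated context, mid of the 50-70% range

def estimate_tokens(text: str) -> int:
    """Crude token estimate; a real tokenizer would be exact."""
    return len(text) // CHARS_PER_TOKEN

def fits_window(system_prompt: str, history: list[str], user_msg: str,
                tool_defs: str, max_output_tokens: int,
                window: int = 400_000) -> bool:
    # The window covers EVERY token: all inputs PLUS the reserved output budget.
    used = (estimate_tokens(system_prompt)
            + sum(estimate_tokens(m) for m in history)
            + estimate_tokens(user_msg)
            + estimate_tokens(tool_defs)
            + max_output_tokens)
    return used <= window * SAFETY_FACTOR
```

Reserving the output budget up front matters because the model's response competes for the same window as everything you send in.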
Why it matters
Picking a model with too small a context window forces you to chunk documents, drop RAG context, or break agent loops. Conversely, paying for a 2M-token model when you actually use 50k is wasted spend. Right-sizing the window to your real workload is one of the biggest AI-cost levers.
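As a sketch of what right-sizing can look like in practice, you could pick the smallest tier whose window comfortably covers your observed 95th-percentile request size. The recommend_window helper and the tier names below are illustrative assumptions, not any vendor's pricing API:

```python
import statistics

def recommend_window(observed_request_tokens: list[int],
                     tiers: dict[str, int]) -> str:
    """Pick the cheapest window tier that covers the p95 request size
    while keeping real usage under ~60% of the rated window."""
    p95 = statistics.quantiles(observed_request_tokens, n=20)[18]  # 95th pct
    needed = int(p95 / 0.6)
    for name, window in sorted(tiers.items(), key=lambda kv: kv[1]):
        if window >= needed:
            return name
    return max(tiers, key=tiers.get)  # nothing fits; take the largest

# Example: telemetry shows requests clustering around 40-60k tokens.
tiers = {"128k": 128_000, "400k": 400_000, "1M": 1_000_000, "2M": 2_000_000}
print(recommend_window([42_000, 55_000, 48_000, 61_000, 39_000] * 20, tiers))
# -> "128k"
```

With requests around 40-60k tokens, even the 128k tier leaves headroom; paying for a 1M or 2M window would buy nothing.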
Frequently asked questions
How big is 1M tokens?
About 750,000 words — roughly 7-8 average books, or a full medium-sized codebase.
What happens when I exceed the window?
Most APIs truncate the oldest content; some refuse the request outright. Either way, content past the limit is invisible to the model.
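If you would rather control what gets dropped than let the API decide, a common client-side approach is a sliding window that discards the oldest messages first. A minimal sketch, using the same rough 4-characters-per-token estimate (truncate_to_window is an illustrative name, not a library function):

```python
def truncate_to_window(messages: list[str], budget_tokens: int,
                       est=lambda s: len(s) // 4) -> list[str]:
    """Drop the OLDEST messages until the rest fits the token budget,
    mirroring what many chat APIs do implicitly at the window limit."""
    kept, total = [], 0
    for msg in reversed(messages):   # walk newest-first
        t = est(msg)
        if total + t > budget_tokens:
            break                    # everything older is dropped
        kept.append(msg)
        total += t
    return list(reversed(kept))      # restore chronological order
```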
Related terms
- Token: The basic unit of text an LLM reads and produces. Roughly 4 characters or 0.75 words on average for English; longer for code, shorter for languages with lots of subword tokens. APIs bill by token.
- Prompt caching: A feature where the AI provider stores frequently reused prompt prefixes (system messages, RAG context, few-shot examples) and bills cached reads at ~10% of normal input cost.
- RAG (Retrieval-Augmented Generation): Augments an LLM with documents retrieved at query time, typically from a vector database. The LLM grounds its answer in the retrieved text instead of relying purely on training data.