
AI API Rate Limit Tracker

Current RPM, TPM, and daily quota limits across Anthropic, OpenAI, Google, DeepSeek, Perplexity, xAI — by tier, including ChatGPT Plus/Pro caps.

Updated June 2026

Provider	Plan	Price / spend threshold	RPM	TPM	Daily / usage cap	Notes
Anthropic	Tier 1 ($5 funded)	$5+	50	20k	—	Day-1 cap; raises with usage
Anthropic	Tier 4 (sustained)	$200+	4000	1M	—	Granted ~30 days of usage
Anthropic	Claude Pro	$20/mo	—	—	5x usage	Hourly + weekly caps
Anthropic	Claude Max 5x	$100/mo	—	—	20x usage	Higher weekly cap
OpenAI	Tier 1 (after $5)	$5+	500	200k	10k	GPT-5 = 30k TPM here
OpenAI	Tier 5 (sustained)	$1000+	10000	30M	—	Prod-tier
OpenAI	ChatGPT Plus	$20/mo	—	—	GPT-5: 200/3h	Plus throttles when busy
OpenAI	ChatGPT Pro	$200/mo	—	—	Higher	Pro uses o3-pro reasoning
Google	Gemini API Free	$0	5	1M	25	Hard rate-limit
Google	Gemini API Tier 1	Pay	1000	4M	—	Most apps land here
Google	Gemini Advanced	$20/mo	—	—	Generous	1-day cooldown if hammered
DeepSeek	API	Pay	—	—	—	No published rate limit
Perplexity	Pro	$20/mo	—	—	300/day Pro Search	+ unlimited quick search
xAI	Grok API	Pay	60	10k	—	Day-1 default

Hitting limits early? Most providers raise tiers based on cumulative spend plus days since first payment. Anthropic auto-promotes; OpenAI auto-promotes after 7 days at each tier. To handle 429s in production, build retry with exponential backoff into your client. Streaming does not raise your limits or change per-token pricing, but it lets you stop a response early rather than paying for output you no longer need.
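A minimal sketch of retry with exponential backoff, for clients that do not get it from an SDK. `RateLimitError` is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on a 429, and the 10% jitter and 30-second cap are illustrative choices, not provider recommendations.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an SDK's HTTP 429 exception."""

    def __init__(self, retry_after=None):
        super().__init__("429 rate limit exceeded")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Return the un-jittered schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait and retry, up to max_retries retries."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = delays[attempt]
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # respect the server's hint
            # Up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In production the default `time.sleep` applies.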

Data transparency: rate limits verified against provider documentation on 2026-04-30. Tier thresholds change without notice — confirm directly in your provider console before architecting around specific numbers. See source & transparency for full sourcing.


What it does

AI provider rate limits affect both consumer plans (ChatGPT Plus, Claude Pro, and Gemini Advanced have message-per-window caps) and API workloads (RPM = requests per minute, TPM = tokens per minute, plus daily or monthly quotas). Hitting a limit at the wrong moment (a production traffic spike, an investor demo, a scheduled batch run) can break product reliability, so anyone building on frontier models needs to know their current tier and the path to higher ones. Most providers tier limits by cumulative spend, along the lines of Tier 1 ($5+ paid), Tier 2 ($50+), Tier 3 ($100+), Tier 4 ($250+), with limits rising sharply at each step; exact thresholds differ by provider. Reaching Tier 4 takes weeks to months of consistent usage, so plan ahead if you anticipate burst demand.

The tracker covers current rate limits across major providers in both consumer and API contexts:
Anthropic Claude API: Sonnet, Haiku, and Opus; usage tier progression from $5 to $400+ thresholds.
Anthropic Claude Pro/Max (consumer): Pro at $20/mo with a 5x usage cap; Max at $100/mo with 5-20x usage.
OpenAI GPT-5 / o3 API: Tier 1 through Tier 5+ progression.
ChatGPT Plus ($20/mo) and ChatGPT Pro ($200/mo).
Google Gemini API: substantial free tier; paid tiers based on cumulative usage.
Gemini Advanced (consumer, $20/mo).
DeepSeek API: very generous limits, cheap.
Perplexity API and consumer Pro.
xAI Grok API and Grok consumer access.
It also lists the weekly and daily caps that some providers enforce on top of per-minute limits.

Strategies for managing rate limits:
(1) Implement exponential-backoff retry on 429 (rate limit exceeded) responses. Most SDKs do this automatically; you only need to enable it.
(2) Distribute load across multiple providers and models (Sonnet for most work, Haiku for batch, GPT-5 for specific tasks). Load-balancing reduces single-provider risk.
(3) Use Batch APIs for non-time-sensitive work (50% discount and much higher rate limits, with results returned within 24h).
(4) For consumer usage that hits limits early: the Claude Pro 5x cap is shared across web, desktop, mobile, and Claude Code. If you are hitting it, consider Max ($100/mo) or move specific workloads to the API, which is metered separately.
(5) For high-volume API workloads, contact provider sales for custom limits before you hit tier ceilings.
(6) Monitor response headers. Providers return remaining-quota headers (for example OpenAI's x-ratelimit-remaining-requests and Anthropic's anthropic-ratelimit-requests-remaining); build dashboards or logs around these.
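The header-monitoring strategy could be sketched as below. The header names follow the OpenAI and Anthropic conventions as I understand them; treat them as assumptions and confirm against a real response from your provider before relying on them. `remaining_quota` is a hypothetical helper name.

```python
# Header names are assumptions based on OpenAI / Anthropic conventions;
# verify them against an actual API response before depending on them.
REMAINING_HEADERS = {
    "requests": (
        "x-ratelimit-remaining-requests",
        "anthropic-ratelimit-requests-remaining",
    ),
    "tokens": (
        "x-ratelimit-remaining-tokens",
        "anthropic-ratelimit-tokens-remaining",
    ),
}


def remaining_quota(headers):
    """Return {'requests': int, 'tokens': int} for whichever headers are present."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    quota = {}
    for kind, names in REMAINING_HEADERS.items():
        for name in names:
            if name in lowered:
                quota[kind] = int(lowered[name])
                break
    return quota


if __name__ == "__main__":
    sample = {"X-RateLimit-Remaining-Requests": "499",
              "x-ratelimit-remaining-tokens": "29000"}
    print(remaining_quota(sample))
```

Logging this dictionary after each call gives a simple time series of remaining headroom. An empty result means none of the expected headers were present.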

Embed this tool on your site

Paste this snippet into any page. It loads on demand (lazy), includes no tracking scripts, and is sized to fit most dashboards. Adjust the height to fit your layout.

<iframe src="https://freetoolarena.com/embed/ai-rate-limit-tracker" width="100%" height="720" frameborder="0" loading="lazy" title="AI API Rate Limit Tracker" style="border:1px solid #e2e8f0;border-radius:12px;max-width:720px;"></iframe>


How to use it

Filter by provider (Anthropic, OpenAI, Google, DeepSeek, Perplexity, xAI).
Read tier-by-tier RPM, TPM, daily quota limits.
Identify your current tier (typically based on cumulative paid usage).
Plan path to higher tiers if needed (consistent usage to hit spend thresholds).
For consumer plans, see weekly/daily message caps.
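For the "identify your current tier" and "plan path to higher tiers" steps, a rough lookup might look like the following. The thresholds are the Anthropic figures quoted in the FAQ on this page ($5, $50, $100, $400) and are used purely as sample data. Real promotion can also depend on waiting periods, so treat the result as an estimate. Both function names are hypothetical.

```python
from bisect import bisect_right

# Cumulative paid spend (USD) needed for Tier 1..4, as quoted on this page.
# Sample data only; confirm current thresholds in your provider console.
SAMPLE_THRESHOLDS = [5, 50, 100, 400]


def tier_for_spend(spend_usd, thresholds=SAMPLE_THRESHOLDS):
    """Return the highest tier whose threshold is met (0 means no tier yet)."""
    return bisect_right(thresholds, spend_usd)


def spend_to_next_tier(spend_usd, thresholds=SAMPLE_THRESHOLDS):
    """Return the USD still needed for the next tier, or None at the top tier."""
    tier = tier_for_spend(spend_usd, thresholds)
    if tier == len(thresholds):
        return None
    return thresholds[tier] - spend_usd


if __name__ == "__main__":
    print(tier_for_spend(75), spend_to_next_tier(75))
```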

When to use this tool

Architecting an LLM-powered product — sizing infrastructure to expected traffic.
Hitting rate limits in production — confirming you understand the path to higher tiers.
Comparing consumer plans (Claude Pro vs Max, ChatGPT Plus vs Pro).
Choosing between providers for a specific workload (TPM matters for long-context jobs).
Quarterly capacity planning — reviewing whether current tier is sufficient for next quarter's growth.
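For the sizing and capacity-planning cases above, a back-of-the-envelope check of which limit binds can help. The sketch assumes every request uses roughly the same number of tokens (input plus output), which real traffic will not. The 50 RPM / 20k TPM figures are the Anthropic Tier 1 row from the table, used only as sample inputs.

```python
def max_requests_per_minute(rpm_limit, tpm_limit, tokens_per_request):
    """Return (sustainable requests per minute, name of the binding limit)."""
    # How many requests the token budget alone would allow per minute.
    by_tokens = tpm_limit / tokens_per_request
    if by_tokens < rpm_limit:
        return by_tokens, "TPM"
    return float(rpm_limit), "RPM"


if __name__ == "__main__":
    # Short chat turns (~300 tokens each): the request cap binds first.
    print(max_requests_per_minute(50, 20_000, 300))
    # Long-context jobs (~10,000 tokens each): the token cap binds first.
    print(max_requests_per_minute(50, 20_000, 10_000))
```

With these sample numbers, short chat turns are capped at 50 requests per minute by RPM, while 10,000-token jobs are capped at 2 requests per minute by TPM.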

When not to use it

Real-time current quota — providers return X-RateLimit headers in API responses; check those for actual remaining quota.
Custom enterprise contracts — those have negotiated limits beyond published tiers.
Rate limits for specific feature endpoints (search, embeddings, fine-tuning) — those have their own quotas not always covered.
Older deprecated models — limits may differ from current model lineup.

Common use cases

Verifying a number or output before passing it on
Quick use during a typical workday
Pre-decision sanity-check on inputs and outputs
Educational use — demonstrating the underlying concept

Frequently asked questions

What's RPM vs TPM? RPM is requests per minute: each API call counts as one request. TPM is tokens per minute: input plus output tokens summed across all requests. For chat use, RPM is usually the binding constraint (many small requests). For long-context work (summarizing books, processing transcripts), TPM binds instead (few requests, many tokens each). Both limits apply at the same time, and whichever you reach first throttles you.
How do I move up tiers? Most providers promote you automatically based on cumulative paid spend, sometimes measured over a rolling period (typically the past 30-90 days). Anthropic: Tier 1 ($5+ paid), Tier 2 ($50), Tier 3 ($100), Tier 4 ($400). OpenAI: Tier 1 ($5+ paid) through Tier 5 ($1000+). Progression typically takes weeks. For larger limits, contact sales; most providers offer enterprise tiers with negotiated limits.
Is Claude Pro enough? It depends on usage. Claude Pro ($20/mo) has a 5x usage limit compared with free. That is sufficient for most casual use but easily exhausted by heavy users (large documents, long conversations, multiple projects per day). Heavy users should consider Claude Max (from $100/mo, for 5-20x Pro's limit) or supplement with API access for specific workloads. Pro's “5x” is shared across all interfaces (web, desktop, mobile, Claude Code).
What about ChatGPT Plus vs Pro? ChatGPT Plus ($20/mo) allows 80 messages per 3 hours on GPT-5 and includes advanced features. ChatGPT Pro ($200/mo) offers 10x the Plus message limit, access to the o3-pro model, and longer task limits. Most users find Plus sufficient. Pro suits power users who hit Plus limits, agentic workflows that need extended runtime, and workflows that specifically benefit from o3-pro's deeper reasoning.
How do I handle 429 errors? Implement exponential backoff: on a 429 response, wait 1s and retry, then 2s, 4s, 8s, and so on, honoring the Retry-After header when the provider sends one. Most SDKs (Anthropic Python, OpenAI Python) do this automatically once retries are enabled in the client config. For bursty traffic, queue requests and rate-limit them on your own side to stay under provider limits. Don't rely on provider 429s as a flow-control mechanism, because every rejected request adds latency.
Can I get higher limits than published tiers? Yes, by contacting provider sales. Anthropic, OpenAI, Google, and DeepSeek all have enterprise programs with negotiated limits beyond the published tiers. This typically requires demonstrating sustained, substantial usage (e.g., $5K+/month consistently) and explaining your use case. Custom limits often come with custom pricing, dedicated support, and security reviews. Startups spending under $10K/month can generally rely on the standard tier progression.
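The 429 answer above recommends rate-limiting requests on your own side. One way to do that is a sliding-window limiter, sketched below under simplifying assumptions: a single process, a single thread, and a request-count (RPM) limit only. A token (TPM) budget would need a second limiter keyed on estimated token counts. `SlidingWindowLimiter` is a hypothetical name, not part of any SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling `window` seconds."""

    def __init__(self, max_requests, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds until the next request may be sent (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # drop timestamps that have aged out
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a slot is free, then record the request."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.sent.append(self.clock())
```

Call `limiter.acquire()` immediately before each API request. Setting `max_requests` somewhat below the provider's published RPM leaves headroom for clock skew and retries.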
