Token Bucket Rate Limit Calculator

API & Backend

Enter your bucket capacity, refill rate, and request cost to instantly see whether a request is allowed, how long to wait if blocked, and your steady-state maximum throughput.

Last updated: April 2026

This calculator is designed for real-world usage based on typical engineering scenarios and publicly available documentation.

The token bucket rate limit calculator helps developers model and tune token bucket rate limiters — the algorithm used by AWS API Gateway, Stripe, GitHub, and most major API platforms. Enter your bucket parameters and current token level to determine whether an incoming request will be allowed or blocked, and exactly how many seconds to wait before retrying.

The token bucket algorithm is closely related to the leaky bucket, but it explicitly separates burst capacity from sustained throughput. The bucket holds up to N tokens, refills continuously at R tokens per second, and each API call costs C tokens. This lets clients absorb short traffic spikes (burst) without exceeding the long-run rate (R ÷ C requests per second).

Use this calculator when sizing rate limits for a new API, troubleshooting 429 errors in production, or modeling how a client library should implement retry-after logic. The burst capacity field tells you the maximum number of consecutive requests allowed from a full bucket before throttling kicks in. The formula works for any token bucket implementation: AWS API Gateway throttling, the Kong rate-limiting plugin, nginx limit_req, or a hand-rolled Redis-based limiter. Adjust the cost per request to model weighted endpoints, where complex queries consume more tokens than simple reads.

How to Calculate Token Bucket Rate Limits

[Diagram: Token bucket — how it works]

1. Set your bucket capacity — the maximum number of tokens the bucket can hold. This caps your burst size.
2. Set the refill rate — tokens added per second. This determines your sustained throughput ceiling.
3. Set the request cost — tokens consumed per API call. Uniform APIs use 1; weighted endpoints may use 5, 10, or more.
4. Enter the current token count — how many tokens are in the bucket at the moment of the request.
5. The calculator checks if current tokens ≥ request cost. If yes, the request is allowed. If no, it computes the wait time: (cost − current) ÷ refill rate.
6. Steady-state max RPS = refill rate ÷ request cost. Burst capacity = floor(bucket capacity ÷ request cost).

Formula

Allowed          = Current Tokens ≥ Request Cost
Wait Time (s)    = (Request Cost − Current Tokens) ÷ Refill Rate   [when blocked]
Max RPS          = Refill Rate ÷ Request Cost
Burst Capacity   = floor(Bucket Capacity ÷ Request Cost)

Bucket Capacity  — maximum tokens the bucket can hold
Refill Rate      — tokens added per second (continuous)
Request Cost     — tokens consumed by each API call
Current Tokens   — tokens available at request time
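The four formulas above can be sketched as a single function (the name and signature are illustrative, not from any particular library):

```python
import math

def evaluate_bucket(capacity, refill_rate, cost, current):
    """Apply the token bucket formulas to a single request.

    Returns (allowed, wait_seconds, max_rps, burst_capacity).
    """
    allowed = current >= cost
    wait = 0.0 if allowed else (cost - current) / refill_rate
    max_rps = refill_rate / cost
    burst = math.floor(capacity / cost)
    return allowed, wait, max_rps, burst

# A blocked request: empty bucket, cost 10, refill 10 tokens/s
print(evaluate_bucket(100, 10, 10, 0))  # (False, 1.0, 1.0, 10)
```

Note that wait time only depends on the token deficit and the refill rate; bucket capacity matters only for the burst figure.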

Example Token Bucket Rate Limit Calculations

Example 1 — Standard REST API (1 req/s sustained, burst of 10)

Bucket Capacity:  100 tokens
Refill Rate:       10 tokens/s
Request Cost:      10 tokens/request

Max RPS       = 10 tokens/s ÷ 10 tokens/req = 1 req/s sustained
Burst Cap     = floor(100 ÷ 10) = 10 requests

If current tokens = 100 (full bucket):
  First 10 requests → all allowed instantly
  11th request at t=0 → blocked, wait = (10 − 0) ÷ 10 = 1.00s
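The burst-then-block behaviour above can be checked with a short simulation using Example 1's parameters:

```python
# Drain a full bucket (capacity 100, cost 10, refill 10 tokens/s) at t=0.
tokens, cost, refill = 100.0, 10.0, 10.0

allowed = 0
while tokens >= cost:      # each allowed request deducts its cost
    tokens -= cost
    allowed += 1

wait = (cost - tokens) / refill   # wait for the first blocked request
print(allowed, wait)              # 10 1.0
```

Ten requests drain the bucket instantly; the eleventh must wait a full second for ten tokens to accumulate.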

Example 2 — GitHub-style API (60 req/min = 1 req/s, burst 10)

Bucket Capacity:   10 tokens
Refill Rate:        1 token/s   (60 tokens/min)
Request Cost:       1 token/request

Max RPS       = 1 ÷ 1 = 1 req/s
Burst Cap     = floor(10 ÷ 1) = 10 requests

If current tokens = 3:
  Request allowed (3 ≥ 1)
  Tokens after: 2

If current tokens = 0:
  Blocked — wait = (1 − 0) ÷ 1 = 1.00s

Example 3 — Weighted endpoint (search = 5 tokens, read = 1 token)

Bucket Capacity:   50 tokens
Refill Rate:       10 tokens/s
Request Cost:       5 tokens  (search endpoint)

Max RPS (search)  = 10 ÷ 5 = 2 req/s
Burst Cap (search) = floor(50 ÷ 5) = 10 requests

Max RPS (read)    = 10 ÷ 1 = 10 req/s
Burst Cap (read)  = floor(50 ÷ 1) = 50 requests

If current tokens = 3, search request (cost 5):
  Blocked — wait = (5 − 3) ÷ 10 = 0.20s
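The weighted-endpoint comparison in Example 3 amounts to applying the same two formulas per cost tier (the endpoint names and costs below just mirror the example):

```python
import math

CAPACITY, REFILL = 50, 10
# Illustrative endpoint costs matching Example 3 (read = 1, search = 5)
COSTS = {"read": 1, "search": 5}

for endpoint, cost in COSTS.items():
    max_rps = REFILL / cost
    burst = math.floor(CAPACITY / cost)
    print(f"{endpoint}: {max_rps} req/s sustained, burst of {burst}")
```

Because both endpoints draw from one shared bucket, heavy search traffic also eats into the read budget.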


Frequently Asked Questions

What is the difference between token bucket and leaky bucket rate limiting?
Token bucket allows bursting: a full bucket lets you send multiple requests instantly until tokens run out. Leaky bucket (also called leaky bucket as a meter) smooths traffic to a constant output rate — requests are queued and released at a fixed pace regardless of bucket fill level. Token bucket is more common for API rate limits because it rewards clients that stay under their quota with burst headroom.
How do I find my API's bucket capacity and refill rate?
Check the X-RateLimit-Limit and X-RateLimit-Remaining headers in API responses. Many APIs also document their rate limit algorithm in their docs. AWS API Gateway publishes default burst limits (5,000 for standard tier) and steady-state RPS separately. For undocumented APIs, probe with requests and observe the 429 Retry-After header value — it encodes the wait time this calculator predicts.
How do I handle 429 Too Many Requests responses in client code?
Read the Retry-After header — it tells you exactly how many seconds to wait (the wait time this calculator computes). If absent, use exponential backoff starting at 1 second and capping at 60 seconds. Never retry immediately on a 429 — doing so depletes any remaining tokens and delays recovery. Use the Retry Backoff Calculator to model your backoff strategy alongside the token refill curve.
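A minimal sketch of that retry policy, assuming a caller-supplied transport function (nothing here is tied to a specific HTTP library):

```python
import random
import time

def request_with_backoff(send_request, max_attempts=5):
    """Retry on 429, honouring Retry-After when present.

    `send_request` is a stand-in for your HTTP client; it should
    return (status_code, headers, body).
    """
    delay = 1.0                          # backoff start when no header
    for _ in range(max_attempts):
        status, headers, body = send_request()
        if status != 429:
            return body
        # Prefer the server's exact wait; fall back to exponential backoff.
        wait = float(headers.get("Retry-After", delay))
        time.sleep(wait + random.uniform(0, 0.1))   # jitter avoids thundering herd
        delay = min(delay * 2, 60.0)     # cap at 60 s
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The jitter is optional but useful when many clients share one bucket: it stops them all from retrying at the same instant.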
What request cost should I assign to different endpoint types?
A common scheme: simple reads = 1 token, writes = 2 tokens, list/search = 5 tokens, batch/export = 10–50 tokens. The goal is to reflect actual compute cost. Stripe uses 1 token per read and 2 per write. AWS API Gateway uses 1 token per request but lets you configure burst and steady-state limits separately. Calibrate based on P95 latency ratios between endpoint types.
Can I use this calculator for Redis-based rate limiters?
Yes. A Redis token bucket stores the current token count and last refill timestamp in a key. On each request, it computes elapsed time × refill rate to add tokens (capped at capacity), then checks if cost can be deducted. The math is identical to this calculator. Use the output wait time as the TTL for a retry key or as the delay argument in a job queue. See also the API Rate Limit Calculator for request-window-based limiters.
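Here is a minimal in-memory sketch of that lazy-refill logic. A production Redis version would hold the same two values (token count, last refill timestamp) in a key and run the check-and-deduct atomically, typically in a Lua script; this class just demonstrates the math:

```python
import time

class TokenBucket:
    """Lazy-refill token bucket: stores (tokens, last_refill_time),
    the same two values a Redis-backed limiter would keep per key."""

    def __init__(self, capacity, refill_rate):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)   # tokens per second
        self.tokens = self.capacity             # start full
        self.last = time.monotonic()

    def try_acquire(self, cost=1.0, now=None):
        """Return (allowed, wait_seconds). `now` is injectable for tests."""
        now = time.monotonic() if now is None else now
        # Refill for the elapsed interval, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        return False, (cost - self.tokens) / self.refill_rate
```

Because tokens are only materialised on read, no background refill job is needed; the returned wait time is exactly the value this calculator reports.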