Simulate API rate limiting with token bucket and sliding window algorithms, plus quota management. Part of the DevTools Surf developer suite.
Use Cases
Test client retry logic when limits are hit
Design rate limit policies before implementing them in an API gateway
Demonstrate leaky-bucket vs token-bucket behavior to team members
Calculate quota headroom for third-party API integration planning
Tips
Choose between token bucket and sliding window algorithms — token bucket allows short bursts while sliding window enforces a strict per-window limit
Set the burst capacity separately from the average rate to model real API behavior: most APIs allow short spikes above the steady-state limit (a minimal code sketch of this follows these tips)
Use the quota management panel to simulate per-user, per-IP, and global limits simultaneously
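The sketch below shows one way a token bucket with a burst capacity separate from the average rate could be modeled. It is a minimal illustration in TypeScript; the class name, parameters, and tryAcquire method are assumptions for this example, not the simulator's actual API.

  // Token bucket sketch: refillRate is the steady-state requests/second,
  // burstCapacity is the maximum number of tokens the bucket can hold.
  // All names here are illustrative.
  class TokenBucket {
    private tokens: number;
    private lastRefill: number;

    constructor(private refillRate: number, private burstCapacity: number) {
      this.tokens = burstCapacity;    // start full so an initial burst is allowed
      this.lastRefill = Date.now();
    }

    tryAcquire(): boolean {
      const now = Date.now();
      const elapsedSec = (now - this.lastRefill) / 1000;
      // Refill at the steady rate, but never above the burst capacity.
      this.tokens = Math.min(this.burstCapacity, this.tokens + elapsedSec * this.refillRate);
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;             // consume one token per request
        return true;                  // request allowed
      }
      return false;                   // request rate-limited
    }
  }

  // Example: 5 requests/second steady rate, bursts of up to 20 allowed.
  const bucket = new TokenBucket(5, 20);
  console.log(bucket.tryAcquire());   // true while tokens remain

Keeping refillRate and burstCapacity as separate knobs is what lets the model reproduce the short spikes above the steady-state limit mentioned above.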
Fun Facts
The token bucket algorithm was first described by computer scientist Jonathan Turner in 1986 for traffic shaping in ATM networks, several years before the web existed.
Twitter's API famously introduced rate limiting in 2009 during the Iranian election protests when API traffic spiked 1,000% and threatened stability — the resulting 150 requests/hour limit frustrated developers for years.
RFC 6585 (2012) added HTTP status code 429 'Too Many Requests' specifically for rate limiting, standardizing a practice that previously used 403 or 503 inconsistently across APIs.
FAQ
What is the difference between token bucket and sliding window?
A token bucket refills at a constant rate and allows bursting up to the bucket capacity. A sliding window counts requests in a rolling time window with no allowance for bursts, making it the stricter of the two.
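For contrast with the token bucket sketch above, here is a minimal sliding window log in TypeScript. The names and structure are illustrative assumptions, not the simulator's implementation.

  // Sliding window log sketch: keeps timestamps of recent requests and
  // rejects anything beyond `limit` within the trailing `windowMs`.
  class SlidingWindowLimiter {
    private timestamps: number[] = [];

    constructor(private limit: number, private windowMs: number) {}

    tryAcquire(): boolean {
      const now = Date.now();
      // Drop timestamps that have fallen outside the rolling window.
      this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
      if (this.timestamps.length < this.limit) {
        this.timestamps.push(now);
        return true;   // still under the per-window limit
      }
      return false;    // strict limit reached, no burst allowance
    }
  }

  // Example: at most 100 requests in any rolling 60-second window.
  const limiter = new SlidingWindowLimiter(100, 60_000);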
What HTTP status code should a rate-limited response use?
429 Too Many Requests (RFC 6585). Include a Retry-After header with the number of seconds until the client can retry. Avoid 403 and 503, which signal authorization failure and server unavailability rather than client-side throttling.
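A minimal sketch of such a response using Node's built-in http module, in TypeScript. The isRateLimited helper and the 30-second retry window are hypothetical placeholders.

  import * as http from "http";

  const RETRY_AFTER_SECONDS = 30;

  const server = http.createServer((req, res) => {
    if (isRateLimited(req)) {
      // Rate-limited: 429 plus a Retry-After hint in seconds.
      res.writeHead(429, {
        "Content-Type": "application/json",
        "Retry-After": String(RETRY_AFTER_SECONDS),
      });
      res.end(JSON.stringify({ error: "Too Many Requests" }));
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });

  // Hypothetical check: plug in a token bucket or sliding window here.
  function isRateLimited(req: http.IncomingMessage): boolean {
    return false;
  }

  server.listen(8080);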
How should clients handle rate limit responses?
Implement exponential backoff with jitter: start from a base interval, double it on each failed attempt, and add a random delay. Honor a Retry-After header when the server sends one. The jitter prevents the 'thundering herd' problem, where all clients retry at the same moment.
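A minimal TypeScript sketch of this retry strategy, assuming a runtime with the global fetch API (Node 18+ or a browser); the function name and default parameters are illustrative.

  // Exponential backoff with full jitter: the delay cap doubles each attempt,
  // and the actual wait is a random value below that cap.
  async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
    const baseDelayMs = 500;
    for (let attempt = 0; ; attempt++) {
      const res = await fetch(url);
      if (res.status !== 429 || attempt >= maxRetries) {
        return res;
      }
      // Prefer the server's Retry-After hint when present.
      const retryAfter = Number(res.headers.get("Retry-After"));
      const capMs = baseDelayMs * 2 ** attempt;   // exponential growth
      const delayMs = retryAfter > 0
        ? retryAfter * 1000
        : Math.random() * capMs;                  // full jitter: uniform in [0, cap)
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }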