Simulate API rate limiting with token bucket and sliding window algorithms, plus quota management. Part of the DevTools Surf developer suite.
Use Cases
Test client retry logic when limits are hit
Design rate limit policies before implementing them in an API gateway
Demonstrate leaky-bucket vs token-bucket behavior to team members
Calculate quota headroom for third-party API integration planning
Tips
Choose between token bucket and sliding window algorithms — token bucket allows short bursts while sliding window enforces a strict per-window limit
Set the burst capacity separately from the average rate to model real API behavior: most APIs allow short spikes above the steady-state limit (a minimal code sketch of this follows these tips)
Use the quota management panel to simulate per-user, per-IP, and global limits simultaneously
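The sketch below shows one way a token bucket with a burst capacity separate from the average rate could be modeled. It is a minimal illustration in TypeScript; the class name, parameters, and tryAcquire method are assumptions for this example, not the simulator's actual API.

  // Token bucket sketch: refillRate is the steady-state requests/second,
  // burstCapacity is the maximum number of tokens the bucket can hold.
  // All names here are illustrative.
  class TokenBucket {
    private tokens: number;
    private lastRefill: number;

    constructor(private refillRate: number, private burstCapacity: number) {
      this.tokens = burstCapacity;    // start full so an initial burst is allowed
      this.lastRefill = Date.now();
    }

    tryAcquire(): boolean {
      const now = Date.now();
      const elapsedSec = (now - this.lastRefill) / 1000;
      // Refill at the steady rate, but never above the burst capacity.
      this.tokens = Math.min(this.burstCapacity, this.tokens + elapsedSec * this.refillRate);
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;             // consume one token per request
        return true;                  // request allowed
      }
      return false;                   // request rate-limited
    }
  }

  // Example: 5 requests/second steady rate, bursts of up to 20 allowed.
  const bucket = new TokenBucket(5, 20);
  console.log(bucket.tryAcquire());   // true while tokens remain

Keeping refillRate and burstCapacity as separate knobs is what lets the model reproduce the short spikes above the steady-state limit mentioned above.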
Fun Facts
The token bucket algorithm was first described by computer scientist Jonathan Turner in 1986 for traffic shaping in ATM networks, several years before the web existed.
Twitter's API famously introduced rate limiting in 2009 during the Iranian election protests when API traffic spiked 1,000% and threatened stability — the resulting 150 requests/hour limit frustrated developers for years.
RFC 6585 (2012) added HTTP status code 429 'Too Many Requests' specifically for rate limiting, standardizing a practice that previously used 403 or 503 inconsistently across APIs.
FAQ
What is the difference between token bucket and sliding window?
A token bucket refills at a constant rate and allows bursting up to the bucket capacity. A sliding window counts requests in a rolling time window with no allowance for bursts, making it the stricter of the two.
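For contrast with the token bucket sketch above, here is a minimal sliding window log in TypeScript. The names and structure are illustrative assumptions, not the simulator's implementation.

  // Sliding window log sketch: keeps timestamps of recent requests and
  // rejects anything beyond `limit` within the trailing `windowMs`.
  class SlidingWindowLimiter {
    private timestamps: number[] = [];

    constructor(private limit: number, private windowMs: number) {}

    tryAcquire(): boolean {
      const now = Date.now();
      // Drop timestamps that have fallen outside the rolling window.
      this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
      if (this.timestamps.length < this.limit) {
        this.timestamps.push(now);
        return true;   // still under the per-window limit
      }
      return false;    // strict limit reached, no burst allowance
    }
  }

  // Example: at most 100 requests in any rolling 60-second window.
  const limiter = new SlidingWindowLimiter(100, 60_000);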
What HTTP status code should a rate-limited response use?
429 Too Many Requests (RFC 6585). Include a Retry-After header with the number of seconds until the client can retry. Avoid 403 and 503, which signal authorization failure and server unavailability rather than client-side throttling.
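A minimal sketch of such a response using Node's built-in http module, in TypeScript. The isRateLimited helper and the 30-second retry window are hypothetical placeholders.

  import * as http from "http";

  const RETRY_AFTER_SECONDS = 30;

  const server = http.createServer((req, res) => {
    if (isRateLimited(req)) {
      // Rate-limited: 429 plus a Retry-After hint in seconds.
      res.writeHead(429, {
        "Content-Type": "application/json",
        "Retry-After": String(RETRY_AFTER_SECONDS),
      });
      res.end(JSON.stringify({ error: "Too Many Requests" }));
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });

  // Hypothetical check: plug in a token bucket or sliding window here.
  function isRateLimited(req: http.IncomingMessage): boolean {
    return false;
  }

  server.listen(8080);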
How should clients handle rate limit responses?
Implement exponential backoff with jitter: start from a base interval, double it on each failed attempt, and add a random delay. Honor a Retry-After header when the server sends one. The jitter prevents the 'thundering herd' problem, where all clients retry at the same moment.
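A minimal TypeScript sketch of this retry strategy, assuming a runtime with the global fetch API (Node 18+ or a browser); the function name and default parameters are illustrative.

  // Exponential backoff with full jitter: the delay cap doubles each attempt,
  // and the actual wait is a random value below that cap.
  async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
    const baseDelayMs = 500;
    for (let attempt = 0; ; attempt++) {
      const res = await fetch(url);
      if (res.status !== 429 || attempt >= maxRetries) {
        return res;
      }
      // Prefer the server's Retry-After hint when present.
      const retryAfter = Number(res.headers.get("Retry-After"));
      const capMs = baseDelayMs * 2 ** attempt;   // exponential growth
      const delayMs = retryAfter > 0
        ? retryAfter * 1000
        : Math.random() * capMs;                  // full jitter: uniform in [0, cap)
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }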