Backend System Design of a Rate Limiter
Goal of the System
Control how often a user or client can hit your API to prevent abuse, reduce server load, and ensure fairness. Example: “Allow max 100 requests per user every 10 minutes.”
Core Backend Workflow
a. Client Makes API Request
- Every request carries an identifier (IP address, user ID, or API key).
- The server needs to check: “Has this client already used up their allowed quota?”
b. Rate Limit Check
- The backend checks a store (usually Redis or in-memory) to see:
  - How many requests the client has made.
  - When their limit resets.
c. Decision
- If within limit: allow the request.
- If limit exceeded: reject with HTTP 429 (Too Many Requests) and include retry info in the response headers (e.g., Retry-After).
d. Update Usage Counter
- After allowing the request, update the request count and timestamp.
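Putting steps a–d together, here is a minimal fixed-window sketch of this flow, assuming a local Redis instance and the redis-py client (the key format, limit, and window length are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

LIMIT = 100           # max requests per window (from the example above)
WINDOW_SECONDS = 600  # 10-minute window

def check_rate_limit(client_id: str) -> tuple[bool, int]:
    """Return (allowed, seconds_until_reset) for one incoming request."""
    key = f"rate_limit:{client_id}"
    count = r.incr(key)  # read and update the counter in one atomic step
    if count == 1:
        # First request in this window: start the reset timer.
        r.expire(key, WINDOW_SECONDS)
    if count > LIMIT:
        # Over quota: the caller should respond with HTTP 429
        # and surface the remaining TTL as retry info.
        return False, max(r.ttl(key), 0)
    return True, 0
```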
Common Rate Limiting Strategies
| Strategy | How It Works | Use Case |
| --- | --- | --- |
| Fixed Window | Counter resets every fixed time block (e.g., 1 min) | Simple APIs with consistent traffic |
| Sliding Window | Requests counted over a rolling period (e.g., the last 60s) | Smooth traffic handling |
| Token Bucket | Tokens refill over time; each request consumes one token | Allows bursts plus a sustained rate |
| Leaky Bucket | Queue drained at a fixed rate; excess requests overflow | Queue-based traffic shaping (rare in APIs) |
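Of these, the token bucket is the most popular for APIs because it tolerates short bursts while capping the sustained rate. A self-contained in-memory sketch (single-threaded; locking and persistence are omitted for brevity):

```python
import time

class TokenBucket:
    """In-memory token bucket: refills continuously, each request costs one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at bucket capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_rate=1.0)  # 10-request burst, 1 req/s sustained
```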
Data Storage Design (using Redis)
We use Redis because:
- Extremely fast (in-memory, sub-millisecond operations)
- Supports expiry (TTL)
- Ideal for per-user rate tracking
Example Redis schema:
```
Key:    rate_limit:{user_id}
Value:  { count: 27, reset_at: 1680001234 }   # reset_at is a UNIX timestamp
Expiry: auto-set to the window length (e.g., 60s)
```
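One caveat: the separate INCR and EXPIRE calls in the earlier sketch can leave a counter without a TTL if the process dies between them. A common remedy is a small Lua script, so the check and the expiry setup run as one atomic server-side step (a sketch assuming redis-py; the script body is illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# Increment the counter and, on the first request of a window, set its
# expiry; Redis runs the whole script atomically.
CHECK_AND_INCR = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""

def request_count(user_id: str, window_seconds: int = 60) -> int:
    """Return the caller's updated request count for the current window."""
    return r.eval(CHECK_AND_INCR, 1, f"rate_limit:{user_id}", window_seconds)
```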
Redis Clustering or Sharding
- When traffic spikes (e.g., hundreds of thousands of active users), a single Redis node can become overwhelmed.
- Solution: Redis Cluster, which splits keys across multiple nodes.
- Keys are distributed by a hash function, so no one node becomes the bottleneck.
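Concretely, Redis Cluster assigns every key to one of 16,384 hash slots, and each node owns a range of slots. A toy illustration of the idea (real clusters use CRC16; zlib.crc32 stands in here so the sketch needs only the standard library):

```python
import zlib

NUM_SLOTS = 16384  # Redis Cluster's fixed number of hash slots

def hash_slot(key: str) -> int:
    # Stand-in for Redis Cluster's CRC16(key) % 16384 slot mapping.
    return zlib.crc32(key.encode()) % NUM_SLOTS

for user in ["alice", "bob", "carol"]:
    print(f"rate_limit:{user} -> slot {hash_slot('rate_limit:' + user)}")
```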
Prevent Hot Keys (Fair Key Distribution)
- If many users share a similar prefix (rate_limit:premiumUser1, rate_limit:premiumUser2, ...) and keys are sharded by that prefix or a shared hash tag, all of these keys can end up on the same Redis shard.
- Use key design tricks (e.g., hashing user IDs or adding region info to the key) to distribute load evenly, as in the sketch below.
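A sketch of one such trick: derive a shard tag from a hash of the user ID, so that similar-looking IDs still spread evenly across shards (the shard count and key format are illustrative):

```python
import hashlib

NUM_SHARDS = 64  # illustrative; tune to the size of your cluster

def rate_limit_key(user_id: str) -> str:
    # A stable hash of the user ID picks the shard tag, so
    # premiumUser1, premiumUser2, ... do not cluster together.
    shard = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"rate_limit:{shard}:{user_id}"

print(rate_limit_key("premiumUser1"))  # e.g., rate_limit:23:premiumUser1
```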
Local + Global Limiting (Hybrid Strategy)
- To reduce Redis hits:
  - Use local (in-memory) checks for common paths (like homepage requests).
  - Periodically sync with Redis to stay globally consistent.
Example: allow 5 requests against a local counter, then sync the batch to Redis (see the sketch below).
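A rough sketch of this hybrid pattern, reusing the Redis counter idea from earlier. The batch size and limits are illustrative, and the trade-off is visible in the code: between syncs, a user can exceed the global limit by up to the batch size per app instance.

```python
import redis

r = redis.Redis(decode_responses=True)

GLOBAL_LIMIT = 100   # requests per window, across all app instances
LOCAL_BATCH = 5      # requests absorbed locally before syncing
WINDOW_SECONDS = 60
local_counts: dict[str, int] = {}  # per-process counters

def allow_request(user_id: str) -> bool:
    # Fast path: count locally and skip the network round-trip.
    local_counts[user_id] = local_counts.get(user_id, 0) + 1
    if local_counts[user_id] < LOCAL_BATCH:
        return True
    # Slow path: flush the local batch into the shared Redis counter.
    key = f"rate_limit:{user_id}"
    batch = local_counts.pop(user_id)
    total = r.incrby(key, batch)
    if total == batch:
        r.expire(key, WINDOW_SECONDS)  # first flush this window: start the timer
    return total <= GLOBAL_LIMIT
```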
Implementation Considerations
HTTP Headers for Rate Limiting
Including these headers in API responses helps clients manage their usage:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1680001234
```
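For example, in a Flask app (an assumption; any framework with response hooks works the same way), the headers can be attached in one place. The values here are placeholders that a real limiter would fill in:

```python
from flask import Flask, g

app = Flask(__name__)

@app.after_request
def add_rate_limit_headers(response):
    # A real limiter would stash these on g during the rate-limit check.
    response.headers["X-RateLimit-Limit"] = "100"
    response.headers["X-RateLimit-Remaining"] = str(g.get("remaining", 100))
    response.headers["X-RateLimit-Reset"] = str(g.get("reset_at", 0))
    return response
```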
Graceful Degradation
When the rate-limiting service itself fails:
- Fail open (allow requests) for critical APIs, so a limiter outage does not take down core functionality
- Fail closed (block requests) for non-critical APIs, protecting the backend while the limiter recovers
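A sketch of that policy as a thin wrapper around whatever limiter check is in use (the per-endpoint fail_open flag is an illustrative design choice, not a fixed API):

```python
import redis

def allow_with_fallback(check, client_id: str, fail_open: bool) -> bool:
    """Run the limiter check; if the limiter itself is down, apply the policy."""
    try:
        return check(client_id)
    except redis.RedisError:
        # Limiter outage: keep critical paths available, protect the rest.
        return fail_open

# Critical endpoint: allowed = allow_with_fallback(check_fn, "user42", fail_open=True)
```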
Monitoring & Alerting
- Track users approaching limits
- Alert on unusual traffic patterns
- Monitor Redis performance
Rate Limiting in Distributed Systems
Rate limiting becomes more complex in distributed environments where multiple API gateway instances handle incoming traffic. Solutions include:
- Centralized Counter: All API gateways use the same Redis cluster
- Distributed Rate Limiting: Each gateway independently enforces a portion of the global limit (see the sketch below)
- Two-tier Rate Limiting: Coarse-grained checks at the local (per-gateway) level, fine-grained accounting at the global level
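The distributed (static-split) option is the simplest to sketch, under the assumption that the load balancer spreads traffic evenly:

```python
GLOBAL_LIMIT = 100   # requests per window, across the whole fleet
NUM_GATEWAYS = 4     # hypothetical gateway count

# Each gateway enforces only its share, with no cross-node coordination.
# This stays accurate only while traffic is evenly balanced; skewed traffic
# is why two-tier designs still reconcile against a shared global store.
LOCAL_SHARE = GLOBAL_LIMIT // NUM_GATEWAYS  # 25 requests per window per gateway
```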
Conclusion
A well-designed rate limiter protects your backend services from abuse while ensuring fair resource allocation. The strategy you choose depends on your specific use case, traffic patterns, and scalability requirements. Redis provides an excellent foundation for implementing rate limiting at scale, but careful consideration of distributed systems challenges is necessary for production deployments.