Goal
- Prevent resource starvation caused by Denial of Service (DoS) attacks.
- Reduce cost: fewer servers, and fewer calls to paid third-party APIs.
- Prevent servers from being overloaded.
Design
Step 1 – Understand the problem and establish design scope
- Typical questions: Client-side or server-side? Based on IP, user ID, or …? What scale? Distributed environment? Separate service or in application code? Inform throttled users?
- Typical requirements:
- Accurate.
- Low latency.
- Distributed. Shared across multiple servers or processes.
- Exception handling. Show users clear exceptions when their requests are throttled.
- High fault tolerance. If the rate limiter goes down, it must not drag down the whole system.
Step 2 – Propose high-level design and get buy-in
Where to put rate limiter?
- Client-side: Not reliable; requests can easily be forged by malicious actors.
- Server-side: Doable.
- Middleware: Recommended. (Returns status code 429 Too Many Requests when a request is throttled.)
- API gateway (in cloud microservices): A fully managed service that supports rate limiting, SSL termination, authentication, IP whitelisting, serving static content, etc.
- Server-side vs Middleware:
- (Pro-middleware) Evaluate the current tech stack and make sure the current programming language is efficient enough to implement rate limiting on the server side.
- (Pro-server) Identify the rate limiting algorithm that fits business needs. Implementing everything on the server side gives full control over the algorithm.
- (Pro-middleware) If you already use microservices with an API gateway, adding the rate limiter there is easy.
- (Pro-middleware) Building your own rate limiting service takes time; a commercial API gateway is faster to adopt.
Algorithm for rate limiting
Token bucket algorithm
- Algorithm
- Used by Amazon and Stripe.
- A token bucket is a container with a pre-defined capacity; tokens are put into it periodically at a pre-set rate.
- When the bucket is full, no more tokens are added.
- Each request consumes one token.
- When the bucket is empty, the request is dropped. (See the sketch below.)
- Parameters
- Bucket size: Maximum number of tokens allowed.
- Refill rate: Number of tokens put into the bucket every second.
- Requires different buckets for different rules (APIs/IPs/global/etc.).
- Pros
- Easy.
- Memory efficient.
- Allows a burst of traffic for short period.
- Cons
- Challenging to tune parameters.
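A minimal in-process sketch of the token bucket, assuming a single server and a lazy refill computed on each request (the class and parameter names are illustrative, not from the chapter):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity            # bucket size: max number of tokens
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)       # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False                        # bucket empty: drop the request

bucket = TokenBucket(capacity=4, refill_rate=2)  # burst of 4, then 2 requests/s
```

Lazy refill avoids a background timer: the burst allowance equals the bucket size, and sustained throughput equals the refill rate.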
Leaking bucket algorithm
- Algorithm
- Used by Shopify.
- Requests wait in a FIFO queue.
- When a request arrives, the system checks whether the queue is full.
- If not full, the request is added to the queue.
- Otherwise, the request is dropped.
- Requests are pulled from the queue and processed at regular intervals. (See the sketch below.)
- Parameters
- Bucket size: Queue size.
- Outflow rate: How many requests are processed per second (a fixed rate).
- Pros
- Memory efficient given the limited queue size.
- Requests are processed at a fixed rate; suitable for use cases where a stable outflow is needed.
- Cons
- A burst of traffic fills the queue with old requests; if they are not processed in time, recent requests get rate limited.
- Challenging to tune parameters.
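A sketch of the leaking bucket under the same single-server assumption; instead of a separate worker pulling from the queue at regular intervals, the drain is simulated lazily on each call (names are illustrative):

```python
import time
from collections import deque

class LeakingBucket:
    def __init__(self, bucket_size: int, outflow_rate: float):
        self.queue = deque()                # FIFO queue of pending requests
        self.bucket_size = bucket_size      # max queue length
        self.outflow_rate = outflow_rate    # requests processed per second
        self.last_leak = time.monotonic()

    def _leak(self) -> None:
        # Dequeue the requests a fixed-rate worker would have processed by now.
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.outflow_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()        # handed off for processing
            self.last_leak = now

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.bucket_size:
            self.queue.append(request)      # queued; processed at the fixed rate
            return True
        return False                        # queue full: drop the request
```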
Fixed window counter algorithm
- Algorithm
- Divides the timeline into fixed-sized time windows and assigns a counter to each window.
- Each request increments the counter by one; once the counter reaches the pre-defined threshold, new requests are dropped until a new window starts. (See the sketch below.)
- Pros
- Easy.
- Memory efficient.
- Resetting the available quota at the end of each time window fits certain use cases.
- Cons
- A spike in traffic at the edges of a window can let more requests through than the allowed quota: e.g., with a limit of 5 per minute, 5 requests in the last 30 seconds of one window plus 5 in the first 30 seconds of the next means 10 requests pass within a single minute.
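A sketch of the fixed window counter, assuming wall-clock windows aligned to multiples of the window length (old windows are kept in the dict for brevity; a real version would evict them):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit                  # allowed requests per window
        self.window = window_seconds
        self.counters = defaultdict(int)    # window start -> request count

    def allow(self) -> bool:
        window_start = int(time.time()) // self.window * self.window
        if self.counters[window_start] < self.limit:
            self.counters[window_start] += 1
            return True
        return False                        # quota exhausted for this window
```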
Sliding window log algorithm
- Algorithm
- Keep track of request timestamps, usually in a cache (e.g., sorted sets in Redis).
- When a new request comes in, remove all outdated timestamps (older than the start of the current rolling window).
- Add the timestamp of the new request to the log.
- Accept the request if the log size is within the threshold; reject it otherwise. (See the sketch below.)
- Pros
- Very accurate: in any rolling window, requests never exceed the limit.
- Cons
- Consumes a lot of memory: even if a request is rejected, its timestamp is still stored in memory.
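A sketch of the sliding window log; note the timestamp is appended even when the request is rejected, which is exactly the memory cost noted above (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()                  # timestamps of recent requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Remove timestamps that fell out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        self.log.append(now)                # stored even if this request is rejected
        return len(self.log) <= self.limit
```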
Sliding window counter algorithm
- Algorithm:
- Requests in the rolling window ≈ requests in the current window + requests in the previous window × overlap percentage of the rolling window with the previous window. (See the sketch below.)
- Example: with 5 requests in the previous minute, 3 in the current minute, and the rolling window 30% into the current minute, the estimate is 3 + 5 × 0.7 = 6.5 requests.
- Pros
- Smooths out spikes in traffic because the rate is based on the average rate of the previous window.
- Memory efficient.
- Cons
- An approximation: it assumes requests in the previous window were evenly distributed.
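A sketch of the sliding window counter using two fixed-window counters and the weighted-overlap formula above (names are illustrative):

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_idx = -1               # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        idx = int(now // self.window)
        if idx != self.current_idx:
            # Rolled into a new window: the old one becomes "previous"
            # (or is too stale to count if more than one window has passed).
            self.previous_count = self.current_count if idx == self.current_idx + 1 else 0
            self.current_idx = idx
            self.current_count = 0
        # Fraction of the previous window still inside the rolling window.
        overlap = 1.0 - (now % self.window) / self.window
        if self.current_count + self.previous_count * overlap < self.limit:
            self.current_count += 1
            return True
        return False
```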
High-level architecture
Use a cache to store counters. With Redis:
- INCR: Increases the stored counter by 1.
- EXPIRE: Sets a timeout on the counter so it is automatically deleted when the timeout expires.
User flow: Client -> rate limiter middleware -> API servers, with counters kept in Redis.
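A sketch of this counter flow with the redis-py client (the key naming and parameters are assumptions; note the small race between INCR and EXPIRE, which the Lua script in the deep dive avoids):

```python
import redis  # assumes the redis-py client package

r = redis.Redis(host="localhost", port=6379)

def is_allowed(client_id: str, limit: int, window_seconds: int) -> bool:
    key = f"ratelimit:{client_id}"
    count = r.incr(key)                     # INCR: increase the counter by 1
    if count == 1:
        # First request in the window: EXPIRE auto-deletes the counter later.
        r.expire(key, window_seconds)
    return count <= limit
```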
Step 3 – Design deep dive
Rate limiting rules
Rules are generally written in configuration files and saved on disk.
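For illustration, a rule in the style of Lyft's open-source ratelimit component (the domain and values here are hypothetical), allowing at most five marketing messages per day:

```yaml
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
```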
Exceeding the rate limit
The API returns an HTTP 429 (Too Many Requests) response code. Rate-limited requests may instead be enqueued to be processed later (e.g., orders).
Rate limiter headers
HTTP headers:
- X-Ratelimit-Remaining: Remaining number of allowed requests in window.
- X-Ratelimit-Limit: Total allowed requests in window.
- X-Ratelimit-Retry-After: Number of seconds to wait before retrying without being throttled. (Sent with the 429 response when a request is dropped.)
Detailed design
- Rules are stored on disk. Workers pull rules from disk and store them in a cache.
- A client sends a request to the rate limiter middleware first.
- The rate limiter middleware loads rules from the cache, and counters and timestamps from Redis.
- If the request is not limited, it is forwarded to the API servers.
- If the request is limited, a 429 is returned to the client; the request is either dropped or forwarded to a queue.
Rate limiter in a distributed environment
Race condition
- Locking the counter is the obvious solution, but locks significantly slow the system down.
- Lua scripts: Redis runs a script as a single atomic operation, so the read-check-increment sequence cannot interleave across rate limiter instances. (See the sketch below.)
- Sorted sets data structure in Redis: insert and trim timestamps atomically, which suits the sliding window log.
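A sketch of the Lua approach with redis-py: Redis executes the whole script atomically on the server, eliminating the read-modify-write race between distributed rate limiter instances (key name and parameters are assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# INCR, conditional EXPIRE, and the limit check all run as one atomic operation.
RATE_LIMIT_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if count > tonumber(ARGV[2]) then
    return 0
end
return 1
"""
rate_limit = r.register_script(RATE_LIMIT_LUA)

def is_allowed(client_id: str, limit: int, window_seconds: int) -> bool:
    key = f"ratelimit:{client_id}"
    return rate_limit(keys=[key], args=[window_seconds, limit]) == 1
```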
Synchronization issue
- Sticky sessions: always route a client to the same rate limiter. (Not recommended: neither scalable nor flexible.)
- Centralized data store such as Redis. (Recommended.)
Performance optimization
- Multi-data center setup: route each request to the closest edge server to reduce latency.
- Synchronize data with an eventual consistency model.
Monitoring
Gather analytics data to check whether the algorithm and the rules are effective.
- Are requests dropped unexpectedly? The rules may be too strict and need relaxing.
- Does the limiter become ineffective during sudden traffic spikes? The algorithm may need to change (e.g., token bucket handles bursts well).
Step 4 – Wrap up
- Algorithms
- Architecture
- Hard vs. soft limit: a hard limit can never be exceeded; a soft limit may be exceeded for a short period.
- Rate limiting at different levels. This chapter covers the application level (layer 7 of the OSI model), but rate limiting can also be applied at other layers, e.g., by IP address with iptables (layer 3).
- How clients can avoid being rate limited:
- Use client-side caching to avoid frequent API calls.
- Understand the limits and don't send too many requests in a short time frame.
- Include code to catch exceptions so the client can recover gracefully.
- Add sufficient back-off time to retry logic.