[Microservice System Design] System Design Basics: Rate Limiter

What is a rate limiter?


Rate limiting means keeping the frequency of operations from exceeding a defined limit. In large systems, rate limiting is commonly used to protect underlying services and resources; in distributed systems it acts as a defense mechanism that keeps shared resources available.

Rate limiting protects your API from accidental or malicious overuse by limiting the number of requests that can reach it in a given period of time. Without rate limiting, any user can bombard your server with requests, causing a spike that starves other users.

[Figure: Rate limiting at work]

Why rate limit?

  • Preventing resource starvation: The most common reason for rate limiting is to improve the availability of API-based services by avoiding resource starvation. If rate limiting is applied, load-based denial-of-service (DoS) attacks can be prevented. Even if one user bombards the API with lots of requests, other users won't starve.

  • Security: Rate limiting protects security-sensitive features, such as logins and promo-code redemption, from brute-force attacks. Because requests to these features are limited at the user level, brute-force algorithms are ineffective in these scenarios.

  • Protecting against operational costs: With pay-per-use models that scale resources automatically, rate limiting helps control operational costs by placing a virtual cap on resource scaling. Without rate limiting, resources can scale disproportionately, resulting in an exponential bill.


Rate Limiting Policy


Rate limiting can be applied to the following parameters:

  • User: Limits the number of requests a user is allowed in a given time period. User-based rate limiting is one of the most common and intuitive forms of rate limiting.


  • Concurrency: Limits the number of parallel sessions a user is allowed in a given time frame. Limiting the number of parallel connections also helps mitigate DDoS attacks.

  • Location/ID: Helps run location-based or demographic-focused campaigns. Requests not from the target demographic can be throttled to increase availability in target areas.

  • Servers: Server-based rate limiting is a niche strategy. It is usually used when a specific server handles most of the requests, i.e., the server is strongly coupled to a specific function.

Rate Limiting Algorithms

1. Leaky Bucket:

The leaky bucket is a simple and intuitive algorithm. It creates a queue with limited capacity. All requests that exceed the queue's capacity within a given time frame overflow.

The advantage of this algorithm is that it smooths out bursts of requests and processes them at a constant rate. It's also easy to implement on a load balancer and is memory efficient per user. It maintains a near-uniform flow to the server regardless of the number of incoming requests.

[Figure: Leaky Bucket]

The disadvantage of this algorithm is that a burst of requests can fill up the bucket, starving newer requests. It also provides no guarantee that a request completes within a given amount of time.
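
A minimal sketch of the idea, assuming an in-process limiter on a single node; the class and parameter names (LeakyBucket, capacity, leak_rate) are illustrative, not from the article:

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky-bucket sketch: a bounded queue drained at a constant rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity           # max requests the bucket can hold
        self.leak_rate = leak_rate         # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self) -> None:
        # Remove the requests the constant-rate "leak" has processed since
        # the last call; fractional progress is dropped for simplicity.
        now = time.monotonic()
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def allow(self, request_id: str) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request_id)  # request queues and waits its turn
            return True
        return False                       # bucket full: the request overflows
```

In production the queue would typically sit at the load balancer, with a worker draining it at the fixed rate.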


2. Token Bucket:

A token bucket is similar to a leaky bucket, but here we assign tokens at the user level: for a given duration d, we define the number of requests r that a user may make. Every time a new request hits the server, two actions take place:

  • Fetch tokens: Fetch the current token count for this user. If it exceeds the defined limit, the request is dropped.

  • Update tokens: If the fetched token count is below the limit for duration d, accept the request and increment the count.

This algorithm is memory efficient, since we store only a small amount of data per user. The problem is that it can cause race conditions in a distributed environment: two requests from two different application servers may try to fetch and update the token count at the same time.

[Figure: Token Bucket Algorithm]
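
As a rough illustration, here is a single-user, in-process token bucket that refills lazily on access; a real deployment would keep one bucket per user key. The names and the refill-on-access approach are assumptions for the sketch, not details from the article:

```python
import time

class TokenBucket:
    """Token-bucket sketch: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate                   # tokens added per second
        self.capacity = capacity           # bucket size, i.e. max allowed burst
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at the bucket capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1               # spend one token for this request
            return True
        return False                       # no tokens left: drop the request
```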

3. Fixed Window Counter:

Fixed windows are one of the most basic rate limiting mechanisms. We keep a counter for a given amount of time and keep incrementing it for each request we receive. Once the limit is reached, we drop all further requests until the duration is reset.
The advantage here is that it ensures recent requests are served and not starved by older ones. However, a single burst of traffic near the window boundary can consume all available slots for both the current and the next window, and consumers may deliberately hammer the server at the boundary to maximize the number of requests served.

[Figure: Fixed Window Counter]
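
A minimal per-user fixed-window counter might look like the sketch below; in a real deployment the store would be shared (e.g. Redis) rather than in-process, and old window keys would need expiry. Names are illustrative:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Fixed-window sketch: one counter per (user, window) pair.
    Note: old keys are never evicted here; a real store would expire them."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)   # (user, window index) -> request count

    def allow(self, user: str) -> bool:
        window_index = int(time.time() // self.window)
        key = (user, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1        # still under the limit: serve
            return True
        return False                       # limit hit: drop until the window resets
```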

4. Sliding Log:

The sliding log algorithm maintains a time-stamped log of requests at the user level. The system keeps these request timestamps sorted in a collection or a table and drops all entries whose timestamps fall outside the window. On each new request, outdated entries are filtered out and the remaining log entries are summed to determine the request rate. If the request would exceed the threshold rate it is held back; otherwise it is served.
The advantage of this algorithm is that it is not affected by the boundary conditions of the fixed window, so enforcement of the rate limit remains precise. Since the system keeps a sliding log per consumer, there is none of the stampede effect that challenges fixed windows.
However, it can be expensive to store an unbounded log entry for every request. Computation is also expensive, as each request requires summing the consumer's previous requests, possibly across a cluster of servers. As such, it does not scale well to high volumes of traffic or denial-of-service attacks.
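
A sketch of the per-user log, assuming an in-process store; each request's timestamp is recorded and entries older than the window are evicted on access:

```python
import time
from collections import defaultdict, deque

class SlidingLog:
    """Sliding-log sketch: one timestamp per request, evicted once it leaves the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)     # user -> timestamps of recent requests

    def allow(self, user: str) -> bool:
        now = time.monotonic()
        log = self.logs[user]
        # Evict entries that have fallen out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)                # record and serve this request
            return True
        return False                       # rate exceeded: hold the request back
```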


5. Sliding Window:

This is similar to the sliding log algorithm, but more memory efficient. It combines the low processing cost of the fixed window algorithm with the improved boundary conditions of the sliding log.
We keep a list/table of entries sorted by time, where each entry holds a timestamp and the number of requests received at that time. We maintain a sliding window of our chosen duration and serve requests within the window only at the given rate. If the sum of the counters exceeds the limiter's rate, we serve only the earliest entries whose sum equals the rate limit.
The sliding window approach is the best of these options, as it offers the flexibility to scale the rate limit while keeping good performance. Rate windows are also an intuitive way to present rate limiting data to API consumers. It avoids the starvation problem of the leaky bucket and the boundary-burst problem of fixed window implementations. A sketch of one common variant follows.
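
Below is a minimal sketch of one common memory-efficient variant, which approximates the rolling count by weighting the previous window's counter by its remaining overlap with the sliding window; the class name and this exact bookkeeping are assumptions, differing slightly from the entry-list description above:

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Sliding-window sketch: current window count plus a weighted share
    of the previous window's count approximates the rolling rate."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)   # (user, window index) -> request count

    def allow(self, user: str) -> bool:
        now = time.time()
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        current = self.counters[(user, index)]
        previous = self.counters[(user, index - 1)]
        # Weight the previous window by how much of it still overlaps
        # the sliding window ending now.
        estimated = current + previous * (1 - elapsed_fraction)
        if estimated < self.limit:
            self.counters[(user, index)] += 1
            return True
        return False
```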


Rate Limiting in Distributed Systems


The above algorithms work well for single-server applications, but when a distributed system involves multiple nodes or application servers, the problem becomes much more complicated. It is compounded further when multiple rate-limited services are spread across different server regions. The two broad problems encountered in these situations are inconsistencies and race conditions.


Inconsistencies


For complex systems with multiple application servers distributed across different regions, each with its own rate limiter, we need to define a global rate limiter.
A consumer sending a large number of requests in a short period of time may stay within each node's local limit while collectively exceeding the global one. The higher the number of nodes, the more likely a user is to exceed the global limit.
There are two ways to solve these problems:

  • Sticky Session: Set up sticky sessions in your load balancer so that each consumer is sent to exactly one node. Drawbacks include a lack of fault tolerance and scaling problems when nodes get overloaded.

  • Centralized data store: Use a centralized data store like Redis or Cassandra to handle counts per window and consumer. The added latency is a problem, but the flexibility provided makes it an elegant solution.

Race Conditions


Race conditions occur with highly concurrent get-then-set approaches: each request reads the counter value and then tries to increment it, but by the time the write completes, several other requests may have already read the old counter value. As a result, more requests get through than intended. This can be mitigated by using locks to make the read and write atomic, but that comes at a performance cost, since the lock becomes a bottleneck that adds latency.
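
One common mitigation is to push the increment into the data store itself, so there is no separate get-then-set at all. The sketch below assumes a reachable Redis server and the redis-py client; the key scheme and function name are illustrative:

```python
import time
import redis  # assumes the redis-py client and a reachable Redis server

r = redis.Redis()

def allow(user: str, limit: int, window_seconds: int) -> bool:
    """Fixed-window check using Redis's atomic INCR, avoiding the
    get-then-set race: the increment and the read happen as one operation."""
    key = f"rate:{user}:{int(time.time() // window_seconds)}"
    pipe = r.pipeline()
    pipe.incr(key)                    # atomic increment, no read-modify-write
    pipe.expire(key, window_seconds)  # stale window keys expire on their own
    count, _ = pipe.execute()
    return count <= limit
```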

Throttling


Throttling is the process of controlling a customer's use of an API for a given period of time. Limits can be defined at the application level and/or API level. When the throttle limit is exceeded, the server returns an HTTP status of "429 - Too Many Requests".
Types of throttling:

  • Hard Throttling: The number of API requests cannot exceed the limit.

  • Soft Throttling: Here, the number of requests may exceed the limit by a set percentage. For example, if our rate limit is 100 messages per minute with a 10% margin, the rate limiter will allow up to 110 messages per minute.

  • Elastic or Dynamic Throttling: The number of requests may exceed the threshold when the system has spare resources available. For example, if a user is allowed to send 100 messages per minute, we can let that user send more than 100 when there are free resources in the system.
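
The distinction between hard and soft throttling can be captured in a small check like the hypothetical helper below; in an HTTP handler, a False result would map to a "429 Too Many Requests" response:

```python
def should_serve(count: int, limit: int, mode: str = "hard",
                 soft_margin: float = 0.10) -> bool:
    """Hypothetical helper: decide whether to serve request number `count`.

    hard: never exceed the limit.
    soft: allow the limit plus a fixed percentage margin.
    """
    if mode == "hard":
        return count <= limit                      # request 101 of 100 is rejected
    if mode == "soft":
        return count <= limit * (1 + soft_margin)  # 100 -> up to 110 allowed
    raise ValueError(f"unknown mode: {mode}")
```

An elastic limit would additionally consult live resource metrics before allowing any overshoot.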


Thanks for reading!

This article: https://architect.pub/system-design-basics-rate-limiter

Origin: blog.csdn.net/jiagoushipro/article/details/131887197