High Concurrency Backend Design - Rate Limiting

A system's capacity is estimated at design time. If traffic exceeds the TPS/QPS threshold the system can sustain for long, the system may be overwhelmed and the whole service eventually becomes unavailable. To avoid this, we need to limit the rate of interface requests.

The purpose of rate limiting is to protect the system by limiting the rate of concurrent access or the number of requests within a time window. Once the limit is reached, requests can be denied service, queued, or made to wait.

Common rate-limiting modes include concurrency control and rate control: one limits the number of concurrent requests, the other limits the rate of access; the number of requests within a unit time window can also be limited.

Controlling the number of concurrent requests
This is a common rate-limiting method, which in practice can be implemented through a semaphore mechanism (such as Semaphore in Java).

For example, we expose a service interface that allows a maximum of 10 concurrent requests; the code is as follows:

import java.util.concurrent.Semaphore;

public class DubboService {

    // at most 10 threads may execute process() concurrently; fair ordering
    private final Semaphore permit = new Semaphore(10, true);

    public void process() {
        try {
            permit.acquire();
            try {
                // business logic processing
            } finally {
                permit.release(); // release only if acquire() succeeded
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt status
        }
    }
}

In the code, even if 30 threads call process(), only 10 execute concurrently. Semaphore's constructor Semaphore(int permits) accepts an integer representing the number of available permits: Semaphore(10) allows 10 threads to hold a permit at once, i.e. the maximum concurrency is 10. Usage is also very simple: a thread first obtains a permit with acquire(), then returns it with release() after use. You can also use tryAcquire() to attempt to obtain a permit without blocking.
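The tryAcquire() variant mentioned above fails fast instead of blocking when all permits are taken. A minimal sketch (the class and method names here are illustrative, not from the original service):

```java
import java.util.concurrent.Semaphore;

public class TryAcquireDemo {

    private static final Semaphore permit = new Semaphore(10, true);

    // returns false immediately instead of blocking when no permit is available
    public static boolean tryProcess() {
        if (!permit.tryAcquire()) {
            return false; // fast-fail: deny the request
        }
        try {
            // business logic processing
            return true;
        } finally {
            permit.release();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryProcess()); // all 10 permits free, so this succeeds
    }
}
```

Fast-failing is often preferable at the edge of a service: the caller gets an immediate "busy" response instead of a thread piling up behind the semaphore.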

Controlling the access rate
In engineering practice, the token bucket algorithm is commonly used to implement this mode. Other algorithms, such as the leaky bucket algorithm, can also control the rate, but they are less common in our practice and are not covered here; interested readers can study them on their own.

On Wikipedia, the token bucket algorithm is described as follows:

Every 1/r seconds, a token is added to the bucket.
The bucket stores at most b tokens; if the bucket is full, newly added tokens are discarded.
When a packet of n bytes arrives, n tokens are consumed and the packet is sent.
If fewer than n tokens are available in the bucket, the packet is buffered or dropped.
The token bucket controls the amount of data that passes within a time window. At the API level, the QPS and TPS we often speak of are exactly the number of requests or transactions in a time window, with the window fixed at 1s. Tokens are put into the bucket at a constant rate; a request must first take a token from the bucket before being processed, and service is denied when the bucket is empty. Another advantage of the token bucket is that the rate is easy to change: when throughput needs to increase, simply raise the rate at which tokens are added to the bucket.
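The rules above can be sketched directly. This is a minimal, illustrative token bucket (the class and field names are our own; refill is computed lazily from elapsed time rather than by a background timer):

```java
public class TokenBucket {

    private final long capacity;      // b: maximum tokens the bucket holds
    private final double refillRate;  // r: tokens added per second
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;       // start with a full bucket
        this.lastRefillNanos = System.nanoTime();
    }

    // try to take n tokens; returns false (deny or buffer) if not enough are available
    public synchronized boolean tryConsume(long n) {
        refill();
        if (tokens < n) {
            return false;
        }
        tokens -= n;
        return true;
    }

    // add tokens earned since the last call; tokens beyond capacity are discarded
    private void refill() {
        long now = System.nanoTime();
        double added = (now - lastRefillNanos) / 1_000_000_000.0 * refillRate;
        tokens = Math.min(capacity, tokens + added);
        lastRefillNanos = now;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(10, 5.0); // b = 10, r = 5 tokens/s
        System.out.println(bucket.tryConsume(10)); // true: bucket starts full
        System.out.println(bucket.tryConsume(1));  // false: bucket just emptied
    }
}
```

Changing the rate is exactly as the text describes: only refillRate needs to change for the bucket to fill faster.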

In our engineering practice, we usually use RateLimiter in Guava to control the rate. For example, suppose we don't want to submit more than 2 tasks per second:
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;
import java.util.concurrent.Executor;

// rate is two permits per second
final RateLimiter rateLimiter = RateLimiter.create(2.0);

void submitTasks(List<Runnable> tasks, Executor executor) {
    for (Runnable task : tasks) {
        rateLimiter.acquire(); // may wait
        executor.execute(task);
    }
}

Controlling the number of requests per unit time window
In some scenarios, we want to limit the number of requests or calls per second/minute/day for an interface or service. For example, to limit the service to 50 invocations per second, the implementation is as follows:
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RateLimitService {

    // one counter per second; entries expire once the window has passed
    private final LoadingCache<Long, AtomicLong> counter =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(2, TimeUnit.SECONDS)
                    .build(new CacheLoader<Long, AtomicLong>() {
                        @Override
                        public AtomicLong load(Long seconds) throws Exception {
                            return new AtomicLong(0);
                        }
                    });

    public static long permit = 50;

    public ResponseEntity getData() throws ExecutionException {
        // get the current second
        long currentSeconds = System.currentTimeMillis() / 1000;
        if (counter.get(currentSeconds).incrementAndGet() > permit) {
            return ResponseEntity.builder().code(404).msg("Access rate is too fast").build();
        }
        // business processing
        return ResponseEntity.builder().code(200).msg("OK").build();
    }
}
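The same per-second counter can be sketched with only the JDK when Guava is unavailable. This is an illustrative version (class and method names are our own); old windows are cleaned up lazily instead of via cache expiry:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class WindowCounter {

    private static final long PERMIT = 50; // max requests per window (1 second)
    private final ConcurrentHashMap<Long, AtomicLong> counter = new ConcurrentHashMap<>();

    // count a request against the given window; returns false once the limit is hit
    public boolean tryPass(long windowSecond) {
        AtomicLong count = counter.computeIfAbsent(windowSecond, s -> new AtomicLong(0));
        return count.incrementAndGet() <= PERMIT;
    }

    // convenience overload: use the current second and drop expired windows
    public boolean tryPass() {
        long currentSecond = System.currentTimeMillis() / 1000;
        counter.keySet().removeIf(s -> s < currentSecond - 1);
        return tryPass(currentSecond);
    }

    public static void main(String[] args) {
        WindowCounter limiter = new WindowCounter();
        int passed = 0;
        for (int i = 0; i < 60; i++) {
            if (limiter.tryPass(0L)) { // fixed window key, so the count is deterministic
                passed++;
            }
        }
        System.out.println(passed); // prints 50: requests beyond the limit are denied
    }
}
```

Note that fixed-window counting allows up to 2x the limit across a window boundary (50 at the end of one second plus 50 at the start of the next); the token bucket above smooths this out.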

