"Spring Cloud Alibaba Microservice Architecture" Topic (13)-Sentinel Service Downgrading of Spring Cloud Alibaba

1. Introduction

In addition to flow control, circuit breaking and degrading unstable resources in the call chain is another important measure for ensuring high availability. A service often calls other modules, which may be another remote service, a database, or a third-party API. For example, a payment may need to call the API provided by UnionPay remotely, and querying the price of a product may require a database query. However, the stability of such a dependency cannot be guaranteed. If the dependency is unstable and its response time grows, the response time of the method that calls it grows as well and threads pile up. This may eventually exhaust the thread pool of the calling service, so that the calling service itself becomes unavailable too.
Modern microservice architectures are distributed and consist of many services. Different services call each other and form complex call chains, in which the problem described above is amplified. If one link in a complex chain is unstable, the failure may cascade layer by layer until the entire chain is unavailable.

Therefore, we need to apply circuit breaking and degradation to unstable, weakly dependent service calls: the unstable calls are cut off temporarily so that local instability does not lead to an overall avalanche. As a means of self-protection, circuit breaking and degradation is usually configured on the client (caller) side.

2. Circuit breaking strategies

  • Slow call ratio (SLOW_REQUEST_RATIO): Uses the proportion of slow calls as the threshold. You need to set the allowed slow call RT (that is, the maximum response time); a request whose response time exceeds this value is counted as a slow call. When the number of requests within the statistics interval (statIntervalMs) is greater than the configured minimum number of requests and the proportion of slow calls is greater than the threshold, requests are automatically rejected for the following circuit breaking duration. After that duration, the circuit breaker enters the probing recovery state (HALF-OPEN). If the response time of the next request is less than the slow call RT, circuit breaking ends; if it is greater, the circuit breaker opens again.
  • Exception ratio (ERROR_RATIO): When the number of requests within the statistics interval (statIntervalMs) is greater than the configured minimum number of requests and the proportion of exceptions is greater than the threshold, requests are automatically rejected for the following circuit breaking duration. After that duration, the circuit breaker enters the probing recovery state (HALF-OPEN). If the next request completes successfully (without error), circuit breaking ends; otherwise the circuit breaker opens again. The threshold range of the exception ratio is [0.0, 1.0], which represents 0%-100%.
  • Exception count (ERROR_COUNT): When the number of exceptions within the statistics interval exceeds the threshold, the circuit breaker opens automatically. After the circuit breaking duration, the circuit breaker enters the probing recovery state (HALF-OPEN). If the next request completes successfully (without error), circuit breaking ends; otherwise the circuit breaker opens again.
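
The CLOSED / OPEN / HALF-OPEN cycle described in the list above can be sketched in plain Java. This is a simplified model of the slow call ratio strategy only, not Sentinel's actual implementation: the class name, method names and the explicit `nowMs` timestamps are assumptions made for illustration, and the sketch checks the ratio once the request count reaches the minimum.

```java
// Simplified model of the slow call ratio strategy. This is an illustration,
// not Sentinel's actual implementation; all names here are hypothetical.
class SlowRatioBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long slowRtMs;         // calls slower than this count as slow calls
    private final double ratioThreshold; // slow call ratio threshold, in [0.0, 1.0]
    private final int minRequests;       // minimum requests before the ratio is checked
    private final long statIntervalMs;   // length of one statistics window
    private final long breakMs;          // how long the breaker stays OPEN

    private State state = State.CLOSED;
    private long windowStart;
    private long openedAt;
    private int total;
    private int slow;

    SlowRatioBreakerSketch(long slowRtMs, double ratioThreshold, int minRequests,
                           long statIntervalMs, long breakMs) {
        this.slowRtMs = slowRtMs;
        this.ratioThreshold = ratioThreshold;
        this.minRequests = minRequests;
        this.statIntervalMs = statIntervalMs;
        this.breakMs = breakMs;
    }

    State state() {
        return state;
    }

    /** Returns true if a request arriving at nowMs is allowed to pass. */
    boolean tryPass(long nowMs) {
        if (state == State.OPEN && nowMs - openedAt >= breakMs) {
            state = State.HALF_OPEN; // break duration is over: let one probe through
            return true;
        }
        return state == State.CLOSED;
    }

    /** Records the response time of a request that was allowed to pass. */
    void onComplete(long nowMs, long rtMs) {
        boolean isSlow = rtMs > slowRtMs;
        if (state == State.HALF_OPEN) {
            // The probe decides: a fast call closes the breaker, a slow one re-opens it.
            if (isSlow) {
                open(nowMs);
            } else {
                state = State.CLOSED;
                resetWindow(nowMs);
            }
            return;
        }
        if (state == State.OPEN) {
            return; // completions that arrive while open are ignored in this sketch
        }
        if (nowMs - windowStart >= statIntervalMs) {
            resetWindow(nowMs); // start a new statistics window
        }
        total++;
        if (isSlow) {
            slow++;
        }
        if (total >= minRequests && (double) slow / total > ratioThreshold) {
            open(nowMs);
        }
    }

    private void open(long nowMs) {
        state = State.OPEN;
        openedAt = nowMs;
    }

    private void resetWindow(long nowMs) {
        windowStart = nowMs;
        total = 0;
        slow = 0;
    }
}
```

Passing the time in as a parameter keeps the sketch deterministic; a real breaker would read a clock and would also need to be thread-safe.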

Exception-based degradation only counts business exceptions; the exception that Sentinel itself throws for flow control and degradation (BlockException) is not counted.
Note: The description above applies to Sentinel 1.8.0 and later, in which the circuit breaking feature was completely reworked. Since we are using version 1.7.0, there may be small differences. For example, in version 1.7.0 the minimum number of requests for the RT strategy cannot be configured and defaults to 5, whereas in the new version such parameters can be set per rule, which is more flexible.

When a resource in the call chain is in an unstable state (for example, calls time out or the exception ratio rises), Sentinel circuit breaking limits calls to that resource so that requests fail fast and do not affect other resources and cause cascading errors.

Once a resource has been degraded, calls to it are automatically rejected within the following degrade time window (the default behavior is to throw DegradeException).
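
On the caller side, failing fast usually goes together with a fallback value. The following is a minimal sketch of that idea in plain Java. `CallRejectedException` is a hypothetical stand-in for the exception thrown when a call is rejected (Sentinel's own type is DegradeException), and `callWithFallback` is an illustrative helper, not a Sentinel API.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the exception thrown when the breaker rejects a call.
class CallRejectedException extends RuntimeException {
    CallRejectedException(String resource) {
        super("circuit open for " + resource);
    }
}

class FailFastSketch {
    /** Runs the guarded call and returns the fallback value if the call was rejected. */
    static <T> T callWithFallback(Supplier<T> guardedCall, Supplier<T> fallback) {
        try {
            return guardedCall.get();
        } catch (CallRejectedException e) {
            // Rejected by the breaker: answer immediately instead of waiting
            // on the unstable resource.
            return fallback.get();
        }
    }
}
```

Only the rejection is caught here; business exceptions still propagate to the caller, which matches the point above that they are counted separately from BlockException.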

3. RT circuit breaking strategy

[A] Add the following method to the business layer

package com.bruce.controller;

import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.concurrent.TimeUnit;

@RestController
@Slf4j
public class SentinelController1 {

    @GetMapping("/testRt")
    public String testRt() {
        log.info("Request received....");
        try {
            // Simulate a slow call: 1 second is far above a 200 ms RT threshold
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }
        return "testRt..";
    }
}

[B] Add a Sentinel degrade rule for the resource /testRt with an RT threshold of 200 milliseconds and a time window of 1 second.
This configuration means: if requests to /testRt are handled within 200 milliseconds, everything is considered fine; otherwise, the resource may not be accessed during the next time window (1 second here) and calls are blocked by Sentinel.

[C] Jmeter stress test
When we click Start in Jmeter, 100 requests to /testRt are sent within one second. Obviously, none of them can be handled within 200 milliseconds, because the method sleeps for 1 second. If we now visit http://localhost:8401/testRt in the browser, we can see that the circuit breaker has opened and the microservice is unavailable to the outside.
After we stop Jmeter and visit http://localhost:8401/testRt again, we can see that the circuit breaker has closed and the microservice can be accessed normally again.

[D] Summary

10 threads (more than the 5 requests required by the official documentation) keep calling /testRt within one second. We expect each request to be handled within 200 milliseconds. If requests take longer than 200 milliseconds, the circuit breaker opens (the fuse trips) for the next one-second time window and the microservice is unavailable.

After we stop Jmeter, the request volume drops, the circuit breaker closes (the fuse is restored), and the microservice returns to normal.
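
The decision summarized above can be written as a small check. This is a rough model of the RT rule as described in this section (at least 5 requests within one second, average response time above the threshold), not Sentinel 1.7.0's actual code. The class and method names are hypothetical.

```java
// Rough model of the RT rule described above; names are hypothetical.
class AvgRtRuleSketch {
    /**
     * Returns true if the breaker should open: enough requests arrived within
     * the one-second window and their average response time exceeds the threshold.
     */
    static boolean shouldTrip(long[] responseTimesMs, long rtThresholdMs, int minRequests) {
        if (responseTimesMs.length < minRequests) {
            return false; // too few requests to judge
        }
        long sum = 0;
        for (long rt : responseTimesMs) {
            sum += rt;
        }
        double averageRt = (double) sum / responseTimesMs.length;
        return averageRt > rtThresholdMs;
    }
}
```

With the values from this test (requests that each take about 1000 ms, a 200 ms threshold and a minimum of 5 requests), ten requests in the window make the check return true, while three requests do not.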

4. Exception ratio circuit breaking strategy

[A] Add the following method to the business layer

@GetMapping("/testExceptionRate")
public String testExceptionRate() {
    // Deliberately trigger a NullPointerException so that every request fails
    String string = null;
    log.info(string.toString());
    return "testExceptionRate..";
}

[B] Add a Sentinel degrade rule for the resource /testExceptionRate that uses the exception ratio strategy, with the ratio threshold set to 0.2 (20%).

[C] Jmeter stress test
We modify the thread group to send ten requests per second and start it. Then we visit http://localhost:8401/testExceptionRate in the browser.
While /testExceptionRate keeps being requested, every call throws an exception in the background, so the exception ratio is 100%, which is greater than the 0.2 (20%) we configured. The circuit breaker is therefore open and the interface is unavailable.

But after we stop the Jmeter stress test and visit http://localhost:8401/testExceptionRate several more times, we can see that as long as fewer than 5 requests are sent within one second, the circuit breaker does not open regardless of the exception ratio, and the exception message is returned directly.

[D] Summary

If the degrade strategy is set to exception ratio, then when the number of requests within one second is greater than or equal to 5 and the proportion of business exceptions is greater than the threshold we set, the circuit breaker opens for the time window we configured and the service is unavailable.
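
The two conditions in this summary can be expressed as a short check. This is a sketch under the assumptions stated in this section (a minimum of 5 requests and a ratio threshold of 0.2), not Sentinel's implementation. The class and method names are hypothetical.

```java
// Sketch of the exception ratio decision; names are hypothetical.
class ErrorRatioRuleSketch {
    /**
     * Returns true if the breaker should open: the request count has reached the
     * minimum and the share of failed requests is greater than the threshold.
     */
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int minRequests, double ratioThreshold) {
        if (totalRequests < minRequests) {
            return false; // too few requests, the ratio is not evaluated
        }
        return (double) failedRequests / totalRequests > ratioThreshold;
    }
}
```

For the Jmeter run above, `shouldTrip(10, 10, 5, 0.2)` is true because 100% is greater than 20%. For a few manual visits, `shouldTrip(3, 3, 5, 0.2)` is false even though every request failed, which is why the exception is shown directly.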

5. Exception count circuit breaking strategy

[A] Add the following test method to the business layer

@GetMapping("/testExceptionCount")
public String testExceptionCount() {
    // Deliberately divide by zero so that every request throws an ArithmeticException
    int i = 10 / 0;
    return "testExceptionCount..";
}

[B] Sentinel policy configuration: add a degrade rule for the resource /testExceptionCount that uses the exception count strategy with a threshold of 5.
The first visit to http://localhost:8401/testExceptionCount is guaranteed to fail, because the divisor must not be zero, and the page shows the error message directly. Once 5 errors have occurred, the circuit breaker opens, the service is degraded, and the microservice is unavailable.

[C] Test

We visit http://localhost:8401/testExceptionCount five times in the browser and see that the exception is thrown directly each time.
At this point the circuit breaker has not opened yet. But when we visit http://localhost:8401/testExceptionCount for the sixth time, the upper limit of 5 exceptions that we configured has been reached, so from the sixth visit on the circuit breaker opens automatically and the interface is unavailable.
[D] Summary

If the degrade strategy is set to exception count, then when the number of exceptions reaches the threshold we set, the circuit breaker opens automatically and the microservice is unavailable. Note that the exception count is collected at the minute level, that is, over a one-minute statistics window.
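
Counting exceptions over a one-minute window can be modeled with a queue of timestamps. This is a simplified sketch, not Sentinel's sliding window implementation. The class name, the fixed 60-second window constant and the explicit `nowMs` parameter are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a minute-level exception counter; names are hypothetical.
class ErrorCountWindowSketch {
    private static final long WINDOW_MS = 60_000; // one-minute statistics window

    private final Deque<Long> errorTimes = new ArrayDeque<>();
    private final int threshold;

    ErrorCountWindowSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Records one business exception and reports whether the breaker should open. */
    boolean recordError(long nowMs) {
        errorTimes.addLast(nowMs);
        // Drop errors that have fallen out of the one-minute window.
        while (!errorTimes.isEmpty() && nowMs - errorTimes.peekFirst() >= WINDOW_MS) {
            errorTimes.removeFirst();
        }
        return errorTimes.size() >= threshold;
    }
}
```

With a threshold of 5, the first four calls to `recordError` return false and the fifth returns true, so the sixth request is the first one to be rejected, as in the browser test above. Errors older than one minute no longer count toward the threshold.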

Origin blog.csdn.net/BruceLiu_code/article/details/113886830