Spring Cloud Alibaba - Integrating Sentinel with FeignClient to Achieve "Thread Isolation" and "Circuit Breaker Degradation"

Table of contents

1. FeignClient integrates Sentinel

1.1. Reasons for integration

1.2. Implementation steps

1.2.1. Modify the application.yml file in OrderService

1.2.2. Write fallback logic for FeignClient failures

2. Thread isolation

2.1. Two methods of thread isolation

2.1.1. Thread pool isolation

2.1.2. Semaphore isolation (Sentinel’s default method)

2.2. Implement thread isolation (bulkhead mode)

a) Add flow control rules

b) Use JMeter for testing

c) Analyze the results

3. Circuit breaker degradation

3.1. What is circuit breaker degradation?

3.2. Circuit breaker strategy - slow calls

a) Add degradation rules for the remote call in Sentinel

b) Continuously refresh in the browser and analyze the results

3.3. Circuit breaker strategy - exception ratio and exception count

a) Add degradation rules for the remote call in Sentinel

b) Continuously refresh in the browser and analyze the results


1. FeignClient integrates Sentinel


1.1. Reasons for integration

Although the rate limiting we learned earlier can prevent service failures caused by high concurrency, a service may still fail for other reasons. To contain such failures and avoid an avalanche, we need thread isolation and circuit breaker degradation.

Both thread isolation and circuit breaker degradation protect the client (the caller): they prevent the caller from being dragged down by a faulty service. Therefore, isolation and degradation must be applied at the point where the microservice makes remote calls, which means integrating Sentinel through Feign.

1.2. Implementation steps

1.2.1. Modify the application.yml file in OrderService

Enable Feign's Sentinel support in application.yml:

feign:
  sentinel:
    enabled: true # enable Feign's Sentinel support

1.2.2. Write fallback logic for FeignClient failures

There are two ways to implement the fallback logic:

  1. FallbackClass: cannot handle the exception thrown by the remote call.
  2. FallbackFactory: can handle the exception thrown by the remote call.

Here we choose the second approach; the implementation is as follows:

a) Create the class UserClientFallbackFactory in the feign-api project and implement the FallbackFactory interface

@Slf4j
public class UserClientFallbackFactory implements FallbackFactory<UserClient> {
    @Override
    public UserClient create(Throwable throwable) {
        return new UserClient() {
            @Override
            public User findById(Long id) {
                // Log the exception information
                log.error("Failed to query user", throwable);
                // Return data according to business needs; here we return an empty object
                return new User();
            }
        };
    }
}

b) In the configuration class of the feign-api project, register UserClientFallbackFactory as a bean

    @Bean
    public UserClientFallbackFactory userClientFallbackFactory() {
        return new UserClientFallbackFactory();
    }
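For context, this bean typically lives in a shared Feign configuration class in the feign-api project. A minimal sketch (the class name DefaultFeignConfiguration is an assumption; use whatever configuration class your project already has):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DefaultFeignConfiguration {

    // Expose the fallback factory as a bean so Feign can find it when the
    // @FeignClient (shown in the next step) references it via fallbackFactory.
    @Bean
    public UserClientFallbackFactory userClientFallbackFactory() {
        return new UserClientFallbackFactory();
    }
}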

c) Use UserClientFallbackFactory in the UserClient interface (the Feign remote call interface) in the feign-api project.

@FeignClient(value = "userservice", fallbackFactory = UserClientFallbackFactory.class)
public interface UserClient {

    @GetMapping("/user/{id}")
    User findById(@PathVariable("id") Long id);

}
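On the caller side nothing special is required; a hypothetical snippet from the order service (the class name is illustrative) shows what the fallback means for the caller:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class OrderUserQueryService { // hypothetical caller-side service

    @Autowired
    private UserClient userClient;

    public User queryUser(Long userId) {
        // If userservice is down or the call is degraded by Sentinel,
        // UserClientFallbackFactory kicks in and an empty User is returned
        // here instead of an exception propagating to the caller.
        return userClient.findById(userId);
    }
}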

2. Thread isolation


2.1. Two methods of thread isolation

2.1.1. Thread pool isolation

Thread pool isolation gives each business dependency its own independent thread pool, and these pools isolate the businesses from one another.

For example, suppose I have three services a, b, and c: a depends on b for one business flow and on c for another. Thread pool isolation creates a thread pool for each service the business depends on, so two pools are created here, one for b and one for c. When a request arrives at a, the request's own thread is not used for the remote call; instead, a thread is taken from the corresponding pool, and that thread invokes the Feign client to make the remote call.

This isolates the two services. If service b fails, at worst it exhausts the threads in its own pool; new requests that still want to call b find the pool full and cannot get in, so the resources of service a are never exhausted.
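A minimal, framework-free sketch of this idea (all names are illustrative; this is not Sentinel's actual implementation): one dedicated pool per downstream service, so exhausting one pool cannot starve calls to the other.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolIsolationSketch {

    // One small, independent pool per downstream dependency.
    private final ExecutorService poolForB = Executors.newFixedThreadPool(10);
    private final ExecutorService poolForC = Executors.newFixedThreadPool(10);

    public Future<String> callServiceB() {
        // The request's own thread is not used for the remote call;
        // the call runs on service b's dedicated pool.
        return poolForB.submit(() -> remoteCall("http://service-b/api"));
    }

    public Future<String> callServiceC() {
        // Even if service b is slow and its pool is saturated,
        // calls to service c still have threads available here.
        return poolForC.submit(() -> remoteCall("http://service-c/api"));
    }

    private String remoteCall(String url) {
        // Placeholder for the real HTTP/Feign call.
        return "response from " + url;
    }
}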

Advantages

1. Supports active timeout: the thread pool mode allocates an independent thread for each remote call, so the call can be controlled through the pool. If a request is found to be taking too long, its thread can be terminated immediately.

2. Supports asynchronous calls: each call runs on an independent thread from the pool rather than on the Tomcat thread that originally handled the request, and different services use different pools, so while a request for one service is being processed, remote calls to other services can proceed at the same time.

Disadvantages

1. Relatively high thread overhead: every call gets its own thread, and the more threads there are, the greater the overhead; CPU context switching alone is already time-consuming.

Applicable scenarios

1. Suitable for low fan-out: fan-out describes how many other services a service depends on; the more dependencies, the higher the fan-out. Since every remote call takes an independent thread, the thread pool mode is better suited to low fan-out scenarios, where the multi-threading overhead stays manageable.

2.1.2. Semaphore isolation (Sentinel’s default method)

The semaphore here works just like the Semaphore (a counter of permits) mentioned earlier.

For example, I have service a and service b, and a depends on b. When a request reaches a, semaphore isolation does not create an independent thread; the thread that originally handled the request calls the Feign client directly. So how is isolation achieved? A counter is maintained, and every incoming request is checked against it to see whether any permits are left.

For example, if the counter's total is 10, it is decremented by one each time a request enters and incremented by one when the request finishes. If 10 requests are in flight at the same time, all 10 permits are taken, and any further request is rejected immediately, so this also provides fault isolation.
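A minimal sketch of this counter idea using java.util.concurrent.Semaphore (illustrative only, not Sentinel's internal code):

import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreIsolationSketch {

    // At most 10 requests may call the downstream service concurrently.
    private final Semaphore permits = new Semaphore(10);

    public <T> T call(Supplier<T> remoteCall, T fallback) {
        // tryAcquire decrements the counter; if no permit is left,
        // the request is rejected immediately instead of queueing.
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return remoteCall.get(); // runs on the caller's own thread
        } finally {
            permits.release();       // increment the counter again
        }
    }
}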

Advantages

1. Lightweight, no extra overhead: it makes up for the thread pool mode's weakness, because it is just a counter and no extra threads need to be started.

Disadvantages

1. No active timeout: when a request arrives, the only check is whether a permit is available; once the permit is granted, the call cannot be stopped midway and can only rely on Feign's own timeout, so an active timeout is impossible.

2. No asynchronous calls: there is no independent thread, so asynchronous calls are out of the question.

Applicable scenarios

1. High-frequency calls and high fan-out: because the semaphore overhead is very low. A gateway is a typical high fan-out scenario: it routes requests to all of your microservices, so the fan-out is huge, which is why gateways basically use the semaphore mode (and also why it is Sentinel's default).

2.2. Implement thread isolation (bulkhead mode)

In the Sentinel console, when adding a flow control rule, you can choose between two threshold types:

  • QPS: requests per second, demonstrated earlier.
  • Number of threads: the maximum number of concurrent threads allowed to use this resource. Limiting the number of threads implements bulkhead mode.

 

Here is a demonstration case: set a flow control rule on UserClient's query-user interface so that the number of threads cannot exceed 2.

Since FeignClient is already integrated with Sentinel and the query-order resource has been accessed, the corresponding remote call resource can be seen in the Sentinel console.

a) Add flow control rules
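For reference, the rule from this case (at most 2 threads) defined through Sentinel's Java API would look roughly like the sketch below; the resource name is an assumption based on how Feign call resources typically appear in the console.

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class ThreadIsolationRuleConfig {

    public static void loadRules() {
        FlowRule rule = new FlowRule();
        // Resource name as shown in the Sentinel console for the Feign call (assumed).
        rule.setResource("GET:http://userservice/user/{id}");
        // Threshold type: number of concurrent threads, i.e. bulkhead mode.
        rule.setGrade(RuleConstant.FLOW_GRADE_THREAD);
        // At most 2 threads may use this resource at the same time.
        rule.setCount(2);
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}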

b) Use JMeter for testing

c) Analyze the results

In the result tree you can see that all requests "succeeded". This is because the Sentinel protection mechanism is integrated through Feign, and the protection strategy is to log the exception and return an empty object.

You can see that the first few requests returned real user information, while subsequent requests all returned empty objects, because the flow control rule just configured in Sentinel was triggered.

You can also see the corresponding error log output in IDEA.

3. Circuit breaker degradation


3.1. What is circuit breaker degradation?

Circuit breaker degradation means that a circuit breaker collects statistics on calls to a service: the exception ratio, the slow call ratio, or the exception count. If, for example, the exception ratio is being monitored and it rises above the threshold, the circuit breaker trips and the faulty service is isolated.

It is like the old story of a martial artist bitten on the hand by a venomous snake: he immediately raises his blade and cuts off the hand so the poison cannot spread through the body. But cutting off the hand is not the real skill; being able to restore it afterwards is.

Sentinel lets the circuit breaker release requests to the service again once the service has recovered.

Specifically, the circuit breaker has the following three states: Closed, Open, and Half-Open.

3.2. Circuit breaker strategy - slow calls

A slow call is a request whose response time is too long, exceeding the specified limit; such requests are considered slow, tying up extra resources and slowing down the whole service.

Therefore, if the proportion of slow calls reaches the threshold, meaning most calls to the service are slow, the circuit breaker is triggered.

Degradation rules are added in the Sentinel console. Here is an example description of when the circuit breaker is triggered:

Interpretation: if the RT (response time) exceeds 500 ms, the call counts as a slow call. Requests within the last 10000 ms are counted; if the slow call ratio exceeds 0.5, the circuit breaker trips. The break lasts 5 s, after which the breaker enters the half-open state and releases one request as a probe.

Here I use a case to demonstrate: set a degradation rule on UserClient's query-user interface with RT = 50 ms, a statistics window of 1 s, a minimum of 5 requests, a failure (slow-call) ratio threshold of 0.4, and a circuit breaking duration of 5 s.

Note: to trigger the slow call rule, I modified the business logic in UserService to increase the processing time.
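A sketch of that kind of change (hypothetical; the real service code will differ) simply sleeps long enough that the response time exceeds the 50 ms RT threshold:

import org.springframework.stereotype.Service;

@Service
public class UserService { // sketch of the modified service; names assumed

    public User queryById(Long id) {
        try {
            // Artificially lengthen the business time so the call
            // exceeds the 50 ms RT threshold and counts as a slow call.
            Thread.sleep(60);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // ... the original database query would go here ...
        return new User();
    }
}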

a) Add degradation rules for the remote call in Sentinel

The rule can be configured from the cluster point link page.

The rule is as follows.
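Expressed through Sentinel's Java API, this slow-call rule would look roughly like the sketch below (the resource name is an assumption, as before):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;

public class SlowCallDegradeRuleConfig {

    public static void loadRules() {
        DegradeRule rule = new DegradeRule("GET:http://userservice/user/{id}");
        rule.setGrade(RuleConstant.DEGRADE_GRADE_RT); // slow-call (RT) strategy
        rule.setCount(50);                 // max RT: calls slower than 50 ms are "slow"
        rule.setSlowRatioThreshold(0.4);   // trip when the slow-call ratio reaches 0.4
        rule.setMinRequestAmount(5);       // at least 5 requests in the window
        rule.setStatIntervalMs(1000);      // statistics window: 1 s
        rule.setTimeWindow(5);             // stay open (broken) for 5 s
        DegradeRuleManager.loadRules(Collections.singletonList(rule));
    }
}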

b) Continuously refresh in the browser and analyze the results

After a few quick refreshes you can see that the circuit breaker has tripped and the degradation strategy is triggered, so empty user information is returned.

Then, after 5 seconds, sending another request shows that the breaker has entered the half-open state, giving the service a chance to prove it has recovered.

3.3. Circuit breaker strategy - exception ratio and exception count

Exception ratio: within the statistics window, if the number of calls reaches the minimum request count and the proportion of calls that throw exceptions exceeds the configured ratio, the circuit breaker is triggered.

Exception count (similar to the exception ratio, not demonstrated here): as the name implies, within the statistics window, if the number of calls reaches the minimum request count and the number of exceptions exceeds the configured threshold, the circuit breaker is triggered.

For example:

Interpretation: requests in the last 1000 ms are counted; if there are more than 10 requests and the exception ratio is at least 0.5, the circuit breaker trips and stays open for 5 seconds.

Here I use a case to demonstrate the exception ratio: set a degradation rule on UserClient's query-user interface with a statistics window of 1 second, a minimum of 5 requests, a failure (exception) ratio threshold of 0.4, and a circuit breaking duration of 5 s.

Note: to trigger the exception statistics, I modified the business logic in UserService to throw an exception.
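The change could be as simple as throwing an exception for a particular id (a sketch; the actual modification may differ), so that requests for user id = 2 fail while id = 1 still succeeds:

import org.springframework.stereotype.Service;

@Service
public class UserService { // sketch of the modified service for the exception-ratio demo

    public User queryById(Long id) {
        // Requests for user id = 2 (i.e. /order/102 on the caller side)
        // now produce remote-call exceptions, while id = 1 still succeeds.
        if (id != null && id == 2L) {
            throw new RuntimeException("simulated business exception for the demo");
        }
        // ... the original database query would go here ...
        return new User();
    }
}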

a) Add degradation rules for the remote call in Sentinel
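Again for reference, an equivalent exception-ratio rule via the Java API (a sketch using the same parameters as the case above; resource name assumed):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;

public class ExceptionRatioDegradeRuleConfig {

    public static void loadRules() {
        DegradeRule rule = new DegradeRule("GET:http://userservice/user/{id}");
        rule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO); // exception-ratio strategy
        rule.setCount(0.4);            // trip when the exception ratio reaches 0.4
        rule.setMinRequestAmount(5);   // at least 5 requests in the window
        rule.setStatIntervalMs(1000);  // statistics window: 1 s
        rule.setTimeWindow(5);         // stay open for 5 s
        DegradeRuleManager.loadRules(Collections.singletonList(rule));
    }
}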

b) Continuously refresh in the browser and analyze the results

After refreshing 5 times in a row, you can observe that the circuit breaker is triggered.

The breaker recovers after 5 seconds. At that point, change the request to /order/101 so that the remote call fetches the user with id = 1. If you keep using /order/102, the remote call for id = 2 keeps throwing an exception, and while the breaker is still in the half-open state it goes straight back to the open (broken) state.

 


Origin blog.csdn.net/CYK_byte/article/details/133486023