Spring Cloud Microservice Architecture (Advanced)

1. Getting to know Sentinel

1.1. Avalanche problem and solution

1.1.1. Avalanche problem

An avalanche occurs when the failure of one service in a microservice call chain causes all the microservices on that chain to become unavailable.

insert image description here

1.1.2. There are four common ways to solve the avalanche problem

1.1.2.1. Timeout handling

  • Set a timeout: if a request gets no response within the specified time, it returns an error instead of waiting indefinitely.

insert image description here

1.1.2.2. Bulkhead mode

The bulkhead pattern comes from ship cabin design:

insert image description here
Bulkheads divide the hull into multiple independent compartments. If the hull is breached, only some compartments flood, so the damage is contained and the whole ship does not sink.

  • Limit the number of threads each business can use, so that a single business cannot exhaust the entire Tomcat's resources. This is also called thread isolation.

insert image description here

1.1.2.3. Circuit breaker

  • A circuit breaker counts the proportion of abnormal executions of a business. If the proportion exceeds the threshold, the business is broken and all requests to it are intercepted.

The circuit breaker counts the requests to a service and their abnormal ratio:
insert image description here

When the abnormal ratio of requests to service D is found to be too high, service D is considered at risk of causing an avalanche, so all requests to service D are intercepted (the circuit is broken):

insert image description here

1.1.2.4. Flow limiting

Flow control: limit the QPS of access to a business to avoid service failure caused by a sudden traffic surge.

insert image description here

1.1.3. Summary

What is the avalanche problem?

  • Microservices call each other, and the failure of one service in the call chain makes the entire chain unavailable.

How to avoid service failure caused by instantaneous high-concurrency traffic?

  • Flow control (flow limiting)

How to avoid the avalanche problem caused by service failure?

  • Timeout handling
  • Thread isolation
  • Circuit breaking and degradation

In short:

Flow limiting protects a service by avoiding failures caused by instantaneous high-concurrency traffic, thereby avoiding avalanches. It is a preventive measure.

Timeout handling, thread isolation, and circuit breaking confine a fault within a certain range when some services do fail, thereby avoiding avalanches. They are remedial measures.

1.2. Comparison of service protection technologies

Multiple service protection technologies are supported in Spring Cloud:

Hystrix was more popular in the early days, but the most widely used framework in China today is Alibaba's Sentinel. A comparison:

                                | Sentinel                                              | Hystrix
Isolation strategy              | Semaphore isolation                                   | Thread pool isolation / semaphore isolation
Circuit-breaker strategy        | Based on slow-call ratio or exception ratio           | Based on failure rate
Real-time metrics               | Sliding window                                        | Sliding window (based on RxJava)
Rule configuration              | Supports multiple data sources                        | Supports multiple data sources
Extensibility                   | Multiple extension points                             | Plugin form
Annotation support              | Supported                                             | Supported
Flow limiting                   | Based on QPS; supports limiting by call relationship  | Limited support
Traffic shaping                 | Supports slow start (warm up) and uniform queuing     | Not supported
System adaptive protection      | Supported                                             | Not supported
Console                         | Out of the box: rule configuration, second-level monitoring, machine discovery, etc. | Imperfect
Adaptation to common frameworks | Servlet, Spring Cloud, Dubbo, gRPC, etc.              | Servlet, Spring Cloud Netflix

1.3, Sentinel introduction and installation

1.3.1. Getting to know Sentinel

Sentinel is a microservice traffic control component open sourced by Alibaba. Official website address: https://sentinelguard.io/zh-cn/index.html

Sentinel has the following characteristics:

  • Rich application scenarios: Sentinel has handled the core scenarios of Alibaba's Double Eleven traffic peaks for the past 10 years, such as flash sales (keeping burst traffic within the system's capacity), message peak shaving and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream applications.

  • Complete real-time monitoring: Sentinel provides real-time monitoring. In the console you can see second-level metrics of a single connected machine, or even the aggregated status of a cluster of fewer than 500 machines.

  • Extensive open-source ecosystem: Sentinel provides out-of-the-box integration modules for other open-source frameworks and libraries, such as Spring Cloud, Dubbo, and gRPC. You only need to introduce the corresponding dependency and do some simple configuration to access Sentinel quickly.

  • Complete SPI extension points: Sentinel provides easy-to-use, complete SPI extension interfaces. You can quickly customize logic by implementing them, for example custom rule management or adapting dynamic data sources.

1.3.2. Install Sentinel

1) Download

Sentinel officially provides a UI console, which is convenient for us to set the current limit on the system. You can download it on GitHub .

2) run

Put the jar package in any directory whose path contains no Chinese characters and run:

java -jar sentinel-dashboard-1.8.1.jar

If you want to modify Sentinel's default port, account, and password, you can use the following configuration:

Configuration item                  | Default  | Description
server.port                         | 8080     | Service port
sentinel.dashboard.auth.username    | sentinel | Default username
sentinel.dashboard.auth.password    | sentinel | Default password

For example, to modify the port:

java -Dserver.port=8090 -jar sentinel-dashboard-1.8.1.jar
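
For example, to change the port, username, and password at the same time (the property names come from the table above; the username and password values here are arbitrary examples):

java -Dserver.port=8090 -Dsentinel.dashboard.auth.username=admin -Dsentinel.dashboard.auth.password=admin123 -jar sentinel-dashboard-1.8.1.jar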

3) visit

Visit the http://localhost:8080 page, and you can see the sentinel console:

insert image description here

You need to enter the username and password; both default to sentinel.

After logging in, the page is blank:

insert image description here
This is because we haven't integrated with microservices yet.

1.4. Microservice integration Sentinel

We integrate Sentinel into order-service and connect it to the Sentinel console. The steps are as follows:

1) Introduce sentinel dependency

<!--sentinel-->
<dependency>
    <groupId>com.alibaba.cloud</groupId> 
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
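
The starter above declares no version, because the version is normally managed by the Spring Cloud Alibaba BOM in the parent pom. If your project does not import that BOM yet, a sketch of the dependencyManagement entry (the version shown is only an example; use the one matching your Spring Cloud release):

<dependencyManagement>
    <dependencies>
        <!-- Spring Cloud Alibaba BOM: manages the sentinel starter version -->
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-alibaba-dependencies</artifactId>
            <version>2.2.5.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>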

2) Configure the console

Modify the application.yaml file and add the following content:

server:
  port: 8088
spring:
  cloud: 
    sentinel:
      transport:
        dashboard: localhost:8080

3) Access any endpoint of order-service

Open the browser and visit http://localhost:8088/order/101, so as to trigger the monitoring of sentinel.

Then visit the sentinel console to see the effect:

insert image description here

2. Flow control

2.1. Cluster point link

When a request enters a microservice, it first goes through the DispatcherServlet and then into the Controller, Service, and Mapper. Such a call chain is called the cluster point link. Every monitored interface in the cluster point link is a resource.

By default, Sentinel monitors every Spring MVC endpoint (an endpoint is a controller method), so every Spring MVC endpoint is a resource in the call chain.

For example, the endpoint in order-service's OrderController that we just accessed: /order/{orderId}

insert image description here

Flow control, circuit breaking, and other rules are all configured on the resources in the cluster point link, so we can click the buttons behind the corresponding resource to set its rules:

  • Flow control: flow limiting
  • Downgrade: circuit breaking and degradation
  • Hotspot: hotspot parameter flow limiting, a special kind of flow limiting
  • Authorization: request permission control

2.2. Quick start

2.2.1. Examples

Click the flow control button behind the resource /order/{orderId} to open a form where you can fill in the flow limiting rule, as follows:

insert image description here

This means limiting the single-machine QPS of the resource /order/{orderId} to 1, i.e. only 1 request per second is allowed; excess requests are intercepted and an error is returned.
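
Rules configured in the console can also be created in code with Sentinel's core rule API. A minimal sketch of the same rule (the resource name is the endpoint shown above; rules loaded this way live only in memory, just like console rules):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class QpsRuleDemo {
    public static void main(String[] args) {
        // limit the resource /order/{orderId} to 1 request per second on this machine
        FlowRule rule = new FlowRule();
        rule.setResource("/order/{orderId}");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS); // threshold type: QPS
        rule.setCount(1);                           // threshold value
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}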

2.2.2. Practice:

Requirement: set a flow control rule for the resource /order/{orderId} so that the QPS cannot exceed 5, then test it.

1) First add the flow limiting rule in the Sentinel console
insert image description here

2) Use jmeter to test

3) Results

insert image description here

As you can see, only 5 requests succeed per second.

2.3, flow control mode

When adding a flow limiting rule, click Advanced Options to choose one of three flow control modes:

  • Direct: count requests to the current resource and limit the current resource itself when the threshold is reached; this is the default mode
  • Association: count requests to another resource related to the current resource, and limit the current resource when that resource reaches the threshold
  • Link: only count requests that reach this resource from the specified entry link, and limit that link when the threshold is reached

insert image description here
The quickstart test is the direct mode.

2.3.1. Association mode

Association mode: count requests to another resource related to the current resource; when that resource reaches the threshold, limit the flow of the current resource.

Configuration rules :

insert image description here

Rule meaning: when traffic to the /write resource reaches the threshold, the /read resource is flow limited, so that /write is not affected.

Usage scenario: for example, when a user pays, the order status must be updated, while other users may be querying orders at the same time. Query and update operations compete for database locks. The business requirement is to prioritize paying and updating orders, so when the order-update business reaches its threshold, the order-query business should be flow limited.

Description of requirements :

  • Create two endpoints in OrderController: /order/query and /order/update, no need to implement business

  • Configure a flow control rule: when the QPS of the /order/update resource exceeds 5, limit the flow of /order/query requests

1) Define the /order/query endpoint to simulate order query

@GetMapping("/query")
public String queryOrder() {
    return "Order query succeeded";
}

2) Define the /order/update endpoint to simulate order updates

@GetMapping("/update")
public String updateOrder() {
    return "Order update succeeded";
}

Restart the service and view the cluster link of the sentinel console:

insert image description here

3) Configure flow control rules

Click the button behind the endpoint whose flow you want to limit. Here we are limiting the order query, /order/query, so click the button behind it:

insert image description here

Fill in the flow control rules in the form:
insert image description here
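
For reference, the same association rule can be expressed with Sentinel's rule API. A sketch (it assumes the resource names /order/query and /order/update created above):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class RelateRuleDemo {
    public static void main(String[] args) {
        // limit /order/query when the associated resource /order/update exceeds 5 QPS
        FlowRule rule = new FlowRule();
        rule.setResource("/order/query");               // the resource that gets limited
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(5);
        rule.setStrategy(RuleConstant.STRATEGY_RELATE); // association mode
        rule.setRefResource("/order/update");           // the resource whose traffic is counted
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}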

4) Test in Jmeter

You can see 1000 users over 100 seconds, so the QPS is 10, which exceeds the threshold of 5 that we set.

Looking at the HTTP request, its target is /order/update, so this endpoint triggers the threshold.

But the target of the flow limit is /order/query. Visiting it in the browser, we can see that it is indeed limited.

5) Summary

The association mode can be used if the following conditions are met:

  • two competing resources
  • One with higher priority and one with lower priority

2.3.2, link mode

Link mode: only count the requests that reach this resource from the specified link and check whether they exceed the threshold.

Configuration example :

For example, there are two request links:

  • /test1 --> /common

  • /test2 --> /common

If you only want to count requests from /test2 to /common, you can configure it like this:

insert image description here

Practical case

Requirement: both the order-query and order-creation businesses need to query goods. Count only the requests that reach the goods query from the order query, and set a flow limit on them.

Steps:

  1. Add a queryGoods method to OrderService, without implementing the business logic

  2. In OrderController, modify the /order/query endpoint to call the queryGoods method of OrderService

  3. Add an /order/save endpoint to OrderController that also calls the queryGoods method of OrderService

  4. Set a flow limiting rule on queryGoods: the QPS of requests entering queryGoods from /order/query must be less than 2

Implementation:

2.3.2.1, add query product method

In the order-service service, add a queryGoods method to the OrderService class:

public void queryGoods(){
    System.err.println("Querying goods");
}

2.3.2.2. When querying an order, query the product

In the OrderController of order-service, modify the business logic of the /order/query endpoint:

@GetMapping("/query")
public String queryOrder() {
    // query goods
    orderService.queryGoods();
    // query the order
    System.out.println("Querying order");
    return "Order query succeeded";
}

2.3.2.3. Query goods when creating an order

In the OrderController of order-service, add an /order/save endpoint to simulate creating an order:

@GetMapping("/save")
public String saveOrder() {
    // query goods
    orderService.queryGoods();
    // create the order
    System.err.println("Creating order");
    return "Order created successfully";
}

2.3.2.4. Mark the goods query as a resource

By default, the methods in OrderService are not monitored by Sentinel; we need to mark the methods to be monitored with an annotation.

Add the @SentinelResource annotation to the queryGoods method of OrderService:

@SentinelResource("goods")
public void queryGoods(){
    System.err.println("Querying goods");
}

In link mode, we monitor two links with different entry points. However, by default Sentinel gives all requests entering Spring MVC the same root resource, which makes link mode ineffective.

We need to turn off this resource aggregation for Spring MVC by modifying the application.yml file of the order-service service:

spring:
  cloud:
    sentinel:
      web-context-unify: false # disable context unification

Restart the service, visit /order/query and /order/save, and you can see that new resources have appeared in sentinel's cluster point link rules:

insert image description here

2.3.2.5. Add flow control rules

Click the flow control button behind the goods resource, and fill in the following information in the pop-up form:

insert image description here

Only requests entering the goods resource from /order/query are counted. The QPS threshold is 2; beyond that, requests are flow limited.
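
The same link-mode rule in code would look like this. A sketch (goods is the @SentinelResource name defined above; /order/query is the entry resource):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class ChainRuleDemo {
    public static void main(String[] args) {
        // only count requests that reach "goods" through the entry /order/query
        FlowRule rule = new FlowRule();
        rule.setResource("goods");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(2);
        rule.setStrategy(RuleConstant.STRATEGY_CHAIN); // link (chain) mode
        rule.setRefResource("/order/query");           // entry resource of the link
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}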

2.3.2.6, Jmeter test

Here there are 200 users sending requests over 50 seconds, so the QPS is 4, which exceeds the threshold of 2 that we set.

2.3.3. Summary

What are the flow control modes?

• Direct: limit the current resource itself

• Association: when the high-priority resource reaches the threshold, the low-priority resource is limited

• Link: only requests entering the current resource from the specified resource count toward the threshold, i.e. limiting by request source

2.4, flow control effect

In the advanced options of flow control, there is also a flow control effect option:

insert image description here

The flow control effect refers to the measures that should be taken when the request reaches the flow control threshold, including three types:

  • Fail fast: once the threshold is reached, new requests are rejected immediately and a FlowException is thrown. This is the default behavior.

  • Warm up: requests exceeding the threshold are also rejected with an exception, but the threshold changes dynamically, gradually rising from a small initial value to the maximum threshold.

  • Queueing and waiting: all requests are queued and executed in order; the interval between two requests cannot be less than the specified time.

2.4.1. Warm up

The threshold is usually the maximum QPS a microservice can handle. However, when a service has just started, its resources are not yet fully initialized (cold start); running QPS straight at the maximum may bring the service down instantly.

Warm up, also called preheat mode, is a solution to the cold-start problem. The initial request threshold is maxThreshold / coldFactor, and over the specified warm-up period it gradually rises to maxThreshold. The default coldFactor is 3.

For example, if maxThreshold is a QPS of 10 and the warm-up time is 5 seconds, the initial threshold is 10 / 3, i.e. 3, and it gradually rises to 10 over the 5 seconds.
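
A sketch of the same warm-up rule via the rule API (assumes the resource /order/{orderId}):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class WarmUpRuleDemo {
    public static void main(String[] args) {
        FlowRule rule = new FlowRule();
        rule.setResource("/order/{orderId}");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(10);                                              // maxThreshold
        rule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_WARM_UP); // warm-up effect
        rule.setWarmUpPeriodSec(5);                                     // preheat duration in seconds
        FlowRuleManager.loadRules(Collections.singletonList(rule));
        // initial threshold = 10 / coldFactor(3) = 3, rising to 10 over 5 seconds
    }
}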

insert image description here

Case

Requirement: set a flow limit for the resource /order/{orderId} with a maximum QPS of 10, using the warm up effect with a warm-up time of 5 seconds.

2.4.1.1. Configure flow control rules:

insert image description here

2.4.1.2, Jmeter test

QPS is 10.

Right after startup, most requests fail and only 3 succeed, showing that the QPS is limited to 3. As time goes by, the success ratio grows higher and higher:

insert image description here

2.4.2. Queueing and waiting

Fail fast and warm up reject new requests and throw exceptions when requests exceed the QPS threshold.

Queueing and waiting puts all requests into a queue and executes them sequentially at the interval allowed by the threshold. Each request must wait for the preceding ones to be executed, and it is rejected if its expected waiting time exceeds the maximum allowed duration.

How it works

For example, QPS = 5 means one queued request is processed every 200ms; timeout = 2000 means a request whose expected waiting time exceeds 2000ms is rejected with an exception.

So what is the expected waiting time?

For example, 12 requests come at once, because one request is executed every 200ms, then:

  • Expected waiting time for the 6th request = 200 * (6 - 1) = 1000ms
  • Expected waiting time for the 12th request = 200 * (12-1) = 2200ms
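
A sketch of a queueing rule via the rule API, matching the numbers above (QPS = 5, maximum queueing time 2000ms):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class QueueRuleDemo {
    public static void main(String[] args) {
        FlowRule rule = new FlowRule();
        rule.setResource("/order/{orderId}");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(5);                                                    // one request every 200ms
        rule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_RATE_LIMITER); // queueing and waiting
        rule.setMaxQueueingTimeMs(2000);                                     // reject if expected wait > 2000ms
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}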

Now, 10 requests are received at the same time in the first second, but only one request is received in the second second. At this time, the QPS curve looks like this:

insert image description here

If you use the queue mode for flow control, all incoming requests must be queued and executed at a fixed interval of 200ms, and the QPS will become very smooth:

insert image description here
A smooth QPS curve is more friendly to the server.

Case

Requirement: set a flow limit for the resource /order/{orderId} with a maximum QPS of 10, using the queueing flow control effect with a timeout of 5s.

2.4.2.1. Add flow control rules

insert image description here

2.4.2.2, Jmeter test

The QPS is 15, which exceeds the 10 we set.

With the previous fail-fast or warm-up mode, the excess requests would simply fail.

But with queue mode, all requests pass.

Then go to sentinel to view the QPS curve of real-time monitoring:

insert image description here

The QPS is very smooth and stays at 10. Excess requests are not rejected but put into the queue, so the response time (waiting time) grows longer and longer.

When the queue is full, some requests will fail:

insert image description here

2.4.3. Summary

What are the flow control effects?

  • Fail fast: reject new requests when the QPS exceeds the threshold

  • warm up: When the QPS exceeds the threshold, new requests are rejected; the QPS threshold is gradually increased, which can avoid service downtime caused by high concurrency during cold start.

  • Waiting in queue: the request will enter the queue, and the request will be executed sequentially according to the time interval allowed by the threshold; if the expected waiting time of the request is longer than the timeout time, it will be rejected directly

2.5. Hotspot parameter flow limiting

The flow limiting described so far counts all requests to a resource and checks whether they exceed the QPS threshold. Hotspot parameter flow limiting instead counts requests with the same parameter value separately and checks whether each exceeds the QPS threshold.

2.5.1. Global parameter flow limiting

For example, an interface for querying products by id:
insert image description here

In requests to /goods/{id}, the value of the id parameter varies. Hotspot parameter flow limiting counts QPS separately per parameter value, with results like this:
insert image description here

When requests with id=1 reach the threshold and are limited, requests with other id values are not affected.

Configuration example:
insert image description here

This means: collect statistics on parameter 0 (the first parameter) of the resource hot; the number of requests per second with the same parameter value cannot exceed 5.

2.5.2. Hotspot parameter exceptions

In the configuration above, all products on the query interface are treated equally, with QPS limited to 5.

In real development, some products may be hot items, such as flash-sale products, and we want their QPS limit to be higher than that of the others. This requires the advanced options of hotspot parameter flow limiting:
insert image description here

Combined with the previous configuration, this means: limit on parameter 0, which is of type long; the QPS of any single parameter value cannot exceed 5 per second, with two exceptions:

  • if the parameter value is 100, the allowed QPS is 10 per second

  • if the parameter value is 101, the allowed QPS is 15 per second
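
Hotspot rules have their own rule type in Sentinel's API. A sketch of the configuration above expressed in code (the resource name hot and the exception value 100 come from the example; adapt them to your own setup):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.flow.param.ParamFlowItem;
import com.alibaba.csp.sentinel.slots.block.flow.param.ParamFlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.param.ParamFlowRuleManager;

public class HotParamRuleDemo {
    public static void main(String[] args) {
        // default: the same value of parameter 0 may not exceed 5 QPS
        ParamFlowRule rule = new ParamFlowRule("hot")
                .setParamIdx(0)
                .setCount(5);

        // exception item: when parameter 0 equals 100, allow 10 QPS instead
        ParamFlowItem item = new ParamFlowItem()
                .setObject(String.valueOf(100L))
                .setClassType(long.class.getName())
                .setCount(10);
        rule.setParamFlowItemList(Collections.singletonList(item));

        ParamFlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}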

2.5.3. Case

Case requirement: add hotspot parameter flow limiting to the resource /order/{orderId} with the following rules:

• The default hotspot parameter rule is that the number of requests per second should not exceed 2

• Set an exception for the parameter 102: the number of requests per second should not exceed 4

• Set an exception for the parameter 103: the number of requests per second should not exceed 10

Note: hotspot parameter flow limiting does not work on default Spring MVC resources; you must mark the resource with the @SentinelResource annotation.

2.5.3.1. Marking resources

Add annotations to the /order/{orderId} resource in OrderController in order-service:
insert image description here
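
The annotated endpoint might look roughly like this (a sketch: the resource name "hot" and the method signature are assumptions; keep your existing method body and whatever resource name you actually configured):

@SentinelResource("hot")   // resource name "hot" is an assumption; it must match the hotspot rule configured below
@GetMapping("{orderId}")
public Order queryOrderById(@PathVariable("orderId") Long orderId) {
    // existing business logic: query the order by id
    return orderService.queryOrderById(orderId);
}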

2.5.3.2. Hotspot parameter current limiting rules

Visit this interface, you can see that the hot resources we marked appear:

insert image description here

Click the Hotspot Rules menu in the left menu:
insert image description here
Click Add and fill in the form:

insert image description here

2.5.3.3, Jmeter test

The QPS of the request here is 5.

It contains 3 HTTP requests:

  • an ordinary parameter, with a QPS threshold of 2
  • the exception parameter 102, with a QPS threshold of 4
  • the exception parameter 103, with a QPS threshold of 10

3. Isolation and downgrade

Flow limiting is a preventive measure: it helps avoid service failures caused by high concurrency, but services can still fail for other reasons.

To confine such faults within a certain range and avoid avalanches, we rely on thread isolation (bulkhead mode) and circuit breaking with degradation.

Thread isolation was mentioned earlier: when the caller calls a service provider, it allocates an independent thread pool to each called business. If a failure occurs, at most the resources of that thread pool are consumed, so the caller's resources are not exhausted.

Circuit breaking and degradation: a circuit breaker on the caller's side counts the calls to the service provider. If the call failure rate is too high, the service is broken and access to the provider is no longer allowed.

Both thread isolation and circuit breaking protect the client (caller), and they must be applied when the caller makes a remote call.

Our microservice remote calls are all based on Feign, so we need to integrate Feign with Sentinel and implement thread isolation and circuit breaking in Feign.

3.1, FeignClient integrates Sentinel

In Spring Cloud, microservice calls are all implemented through Feign, so Feign and Sentinel must be integrated for client protection.

3.1.1. Modify the configuration and enable the sentinel function

Modify the application.yml file of OrderService to enable Feign's Sentinel function:

feign:
  sentinel:
    enabled: true # enable Feign's support for Sentinel

3.1.2. Writing failure downgrade logic

After a business fails, instead of reporting an error directly, a friendly prompt or a default result should be returned to the user. This is the failure degradation logic.

There are two ways to write degradation logic for a FeignClient:

① Option 1: FallbackClass, which cannot handle the exception of the remote call

② Option 2: FallbackFactory, which can handle the exception of the remote call; we choose this one

Here we demonstrate the failure degradation handling of the second option.

Step 1: define a class in the feign-api project that implements FallbackFactory:

@Slf4j
public class UserClientFallbackFactory implements FallbackFactory<UserClient> {

    @Override
    public UserClient create(Throwable throwable) {
        return new UserClient() {
            @Override
            public User findById(Long id) {
                // log the error and return an empty user as the degraded result
                log.error("Failed to query user", throwable);
                return new User();
            }
        };
    }
}

Step 2: register UserClientFallbackFactory as a Bean in the DefaultFeignConfiguration class of the feign-api project:

@Bean
public UserClientFallbackFactory userClientFallbackFactory(){
    return new UserClientFallbackFactory();
}

Step 3: use UserClientFallbackFactory in the UserClient interface of the feign-api project:

@FeignClient(value = "userservice", fallbackFactory = UserClientFallbackFactory.class)
public interface UserClient {

    @GetMapping("/user/{id}")
    User findById(@PathVariable("id") Long id);
}

After restarting, visit the order query service once, and then check the sentinel console, you can see the new cluster point link:
insert image description here

3.1.3. Summary

Avalanche solutions supported by Sentinel:

  • Thread isolation (bulkhead mode)
  • Circuit breaking and degradation

Steps for Feign to integrate Sentinel:

  • Configure in application.yml: feign.sentinel.enabled=true
  • Write a FallbackFactory for FeignClient and register it as a Bean
  • Configure FallbackFactory to FeignClient

3.2. Thread isolation (bulkhead mode)

3.2.1, the implementation of thread isolation

Thread isolation can be achieved in two ways:

  • thread pool isolation

  • Semaphore isolation (Sentinel uses it by default)

As shown in the picture:
insert image description here

Thread pool isolation : assign a thread pool to each service call business, and use the thread pool itself to achieve the isolation effect

Semaphore isolation : Instead of creating a thread pool, it uses a counter mode to record the number of threads used by the business. When the upper limit of the semaphore is reached, new requests are prohibited.

Pros and cons of both:
insert image description here

3.2.2, thread isolation of sentinel

Instructions for use :

When adding a throttling rule, you can choose two threshold types:
insert image description here

  • QPS: the number of requests per second, as demonstrated in the quick start

  • Number of threads: the maximum number of Tomcat threads this resource may use. Limiting the number of threads implements thread isolation (bulkhead mode).

Case requirement: set a flow control rule on UserClient's user-query interface in order-service so that the number of threads cannot exceed 2, then test with Jmeter.

3.2.2.1. Configure isolation rules

Select the flow control button behind the feign interface:
insert image description here

Fill out the form:
insert image description here
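
For reference, the same thread-count rule via the rule API. A sketch (the resource name of a Feign client call in Sentinel typically looks like GET:http://userservice/user/{id}; verify the exact name in your cluster point link):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class ThreadIsolationRuleDemo {
    public static void main(String[] args) {
        FlowRule rule = new FlowRule();
        // resource name as it appears in the cluster point link for the Feign call (assumption)
        rule.setResource("GET:http://userservice/user/{id}");
        rule.setGrade(RuleConstant.FLOW_GRADE_THREAD); // threshold type: number of concurrent threads
        rule.setCount(2);                              // at most 2 threads may execute this resource
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}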

3.2.2.2, Jmeter test

If 10 requests occur at a time, there is a high probability that the number of concurrent threads will exceed 2, and the excess requests will follow the previously defined failure degradation logic.

3.2.3. Summary

What are the two means of thread isolation?

  • Semaphore isolation

  • thread pool isolation

What are the characteristics of semaphore isolation?

  • Based on the counter mode, simple and low overhead

What are the characteristics of thread pool isolation?

  • Based on the thread pool mode, there is additional overhead, but the isolation control is stronger

3.3. Circuit breaking and degradation

Circuit breaking and degradation is an important means of solving the avalanche problem. The idea is that a circuit breaker counts the proportion of abnormal calls and slow calls to a service; if a threshold is exceeded, the service is broken, i.e. all requests to it are intercepted. When the service recovers, the circuit breaker lets requests through to it again.

The circuit breaker controls breaking and releasing through a state machine:

insert image description here
The state machine consists of three states:

  • closed: the circuit breaker lets all requests through and counts the exception ratio and slow-call ratio. If a threshold is exceeded, it switches to the open state.
  • open: the service call is broken; requests to the broken service are rejected, fail fast, and go straight to the degradation logic. After 5 seconds in the open state it enters the half-open state.
  • half-open: one request is let through, and the next state depends on its result:
    • request succeeds: switch to the closed state
    • request fails: switch to the open state

The circuit breaker supports three breaking strategies: slow-call ratio, exception ratio, and exception count.

3.3.1. Slow calls

Slow call: a request whose service response time (RT) exceeds the specified duration is considered a slow call. Within the statistics window, if the number of requests exceeds the configured minimum and the proportion of slow calls exceeds the configured threshold, the circuit is broken.

For example:
insert image description here
Interpretation: calls with an RT over 500ms are slow calls. Requests within the last 10000ms are counted; if there are more than 10 requests and the slow-call ratio is at least 0.5, the circuit is broken for 5 seconds, after which it enters the half-open state and releases one request for testing.
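
A sketch of such a slow-call degrade rule via the rule API, matching the interpretation above (Sentinel 1.8-style fields; the resource name is a placeholder):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;

public class SlowCallDegradeDemo {
    public static void main(String[] args) {
        DegradeRule rule = new DegradeRule("someResource");   // resource name is a placeholder
        rule.setGrade(RuleConstant.DEGRADE_GRADE_RT);         // strategy: slow-call ratio
        rule.setCount(500);                                   // max RT: calls over 500ms count as slow
        rule.setSlowRatioThreshold(0.5);                      // break when >= 50% of calls are slow
        rule.setMinRequestAmount(10);                         // at least 10 requests in the window
        rule.setStatIntervalMs(10000);                        // statistics window: 10000ms
        rule.setTimeWindow(5);                                // stay open (broken) for 5 seconds
        DegradeRuleManager.loadRules(Collections.singletonList(rule));
    }
}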

Case

Requirement: set a degradation rule on UserClient's user-query interface: the slow-call RT threshold is 50ms, the statistics window is 1 second, the minimum number of requests is 5, the ratio threshold is 0.4, and the breaking duration is 5 seconds.

3.3.1.1. Simulate a slow call

Modify the business logic of the /user/{id} interface in user-service to simulate a delay with sleep:
insert image description here

3.3.1.2. Set circuit breaker rules

Next, set the downgrade rule for the feign interface:
insert image description here
requests exceeding 50ms will be considered slow requests

3.3.1.3. Test

Visit http://localhost:8088/order/101 in the browser and refresh quickly 5 times. You can see:

The circuit breaker is triggered: the request time drops to about 5ms, requests fail fast, the degradation logic runs, and null is returned.

3.3.2. Exception ratio and exception count

Exception ratio / exception count: count the calls within the statistics window. If the number of calls exceeds the configured minimum and the exception ratio reaches the configured ratio threshold (or the exception count exceeds the configured number), the circuit is broken.

For example, an unusual scale setting:
insert image description here

Interpretation: Count the requests within the last 1000ms. If the number of requests exceeds 10 and the abnormality ratio is not less than 0.4, a circuit breaker will be triggered.

An exception-count setting:
insert image description here
Interpretation: count the requests within the last 1000ms. If there are more than 10 requests and the number of exceptions is at least 2, the circuit is broken.

Case

Requirement: set a degradation rule on UserClient's user-query interface: the statistics window is 1 second, the minimum number of requests is 5, the exception ratio threshold is 0.4, and the breaking duration is 5s.

3.3.2.1. Simulate exception requests

First, modify the business logic of the /user/{id} interface in user-service to manually throw an exception, so that exception-ratio breaking can be triggered:
insert image description here
That is, when the id is 2, an exception is thrown.

3.3.2.2. Set circuit breaker rules

Next, set the downgrading rules for the feign interface:
insert image description here
Rule:
insert image description here
In 5 requests, as long as the exception ratio exceeds 0.4, that is, there are more than 2 exceptions, the circuit breaker will be triggered.

3.3.2.3. Test

Quickly visit in the browser: http://localhost:8088/order/102, refresh 5 times quickly, and trigger a fuse

3.3.3. Summary

What are the strategies for Sentinel circuit breaker downgrade?

  • Slow-call ratio: calls longer than the specified duration are slow calls; the ratio of slow calls within the window is counted, and the circuit is broken if it exceeds the threshold

  • Exception ratio: the ratio of exceptional calls within the window is counted, and the circuit is broken if it exceeds the threshold

  • Exception count: the number of exceptional calls within the window is counted, and the circuit is broken if it exceeds the threshold

4. Authorization rules

Authorization rules can judge and control the source of the requester.

4.1. Authorization rules

4.1.1. Basic Rules

Authorization rules can control the source of the caller, and there are two ways: white list and black list.

  • Whitelist: callers whose origin is in the whitelist are allowed to access

  • Blacklist: Callers whose origin is in the blacklist are not allowed to access

Click Authorization on the left menu to see the authorization rules:

insert image description here

  • Resource name: the protected resource, e.g. /order/{orderId}

  • Flow control application: the list of origins

    • if whitelist is selected, the origins in the list are allowed to access
    • if blacklist is selected, the origins in the list are denied access

For example:
insert image description here
We allow requests coming through the gateway to reach order-service but do not allow browsers to access order-service directly, so the gateway's origin name must be added to the whitelist.
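
For reference, the same whitelist rule via Sentinel's rule API. A sketch (the origin name gateway matches the header value the gateway filter will add below):

import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.authority.AuthorityRule;
import com.alibaba.csp.sentinel.slots.block.authority.AuthorityRuleManager;

public class WhitelistRuleDemo {
    public static void main(String[] args) {
        AuthorityRule rule = new AuthorityRule();
        rule.setResource("/order/{orderId}");            // protected resource
        rule.setStrategy(RuleConstant.AUTHORITY_WHITE);  // whitelist mode
        rule.setLimitApp("gateway");                     // allowed origins, comma separated
        AuthorityRuleManager.loadRules(Collections.singletonList(rule));
    }
}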

4.1.2. How to get origin

Sentinel obtains the source of the request through the parseOrigin of the RequestOriginParser interface.

public interface RequestOriginParser {

    /**
     * Get the origin from the request object; how to obtain it is up to the implementation.
     */
    String parseOrigin(HttpServletRequest request);
}

The function of this method is to get the origin value of the requester from the request object and return it.

By default, no matter where the requester comes from, sentinel will always return the value default, which means that the source of all requests is considered to be the same value default.

Therefore, we need to customize the implementation of this interface so that different requests can return different origins .

For example, in the order-service service, we define an implementation class of RequestOriginParser:

@Component
public class HeaderOriginParser implements RequestOriginParser {

    @Override
    public String parseOrigin(HttpServletRequest request) {
        // 1. read the "origin" request header
        String origin = request.getHeader("origin");
        // 2. fall back to "blank" if the header is missing
        if (StringUtils.isEmpty(origin)) {
            origin = "blank";
        }
        return origin;
    }
}

We will try to get the origin value from the request-header.

4.1.3. Add request headers to the gateway

Since the origin is obtained from the request header, every request routed from the gateway to the microservices must carry an origin header.

This is implemented with a GatewayFilter we learned earlier: AddRequestHeaderGatewayFilter.

Modify application.yml in the gateway service and add a defaultFilter:

spring:
  cloud:
    gateway:
      default-filters:
        - AddRequestHeader=origin,gateway
      routes:
       # ... omitted

In this way, all requests routed from the gateway will carry the origin header with the value gateway. Requests arriving at the microservice from elsewhere do not have this header.

4.1.4. Configure authorization rules

Next, we add an authorization rule to allow requests whose origin value is gateway.
insert image description here
The configuration is as follows:
insert image description here
Now, we directly skip the gateway and access the order-service service:
insert image description here
access through the gateway:
insert image description here

4.2. Customizing the exception result

By default, when a request is intercepted by flow limiting, degradation, or authorization, an exception is thrown to the caller and only a generic flow limiting message is returned. This is not user-friendly: you cannot tell whether the request was flow limited, degraded, or blocked by authorization.

4.2.1. Exception types

To customize the result returned when an exception occurs, implement the BlockExceptionHandler interface:

public interface BlockExceptionHandler {

    /**
     * Handle the BlockException thrown when a request is blocked by flow limiting, degradation, or authorization.
     */
    void handle(HttpServletRequest request, HttpServletResponse response, BlockException e) throws Exception;
}

This method has three parameters:

  • HttpServletRequest request: the request object
  • HttpServletResponse response: the response object
  • BlockException e: the exception thrown when the request is intercepted by Sentinel

BlockException here contains several different subclasses:

Exception            | Description
FlowException        | Flow limiting exception
ParamFlowException   | Hotspot parameter flow limiting exception
DegradeException     | Degradation (circuit breaking) exception
AuthorityException   | Authorization rule exception
SystemBlockException | System rule exception

4.2.2, custom exception handling

Next, we define a custom exception handling class in order-service:

@Component
public class SentinelExceptionHandler implements BlockExceptionHandler {

    @Override
    public void handle(HttpServletRequest request, HttpServletResponse response, BlockException e) throws Exception {
        String msg = "unknown exception";
        int status = 429;

        if (e instanceof FlowException) {
            msg = "the request was flow limited";
        } else if (e instanceof ParamFlowException) {
            msg = "the request was limited by a hotspot parameter rule";
        } else if (e instanceof DegradeException) {
            msg = "the request was degraded";
        } else if (e instanceof AuthorityException) {
            msg = "no permission to access";
            status = 401;
        }

        response.setContentType("application/json;charset=utf-8");
        response.setStatus(status);
        // quote the msg value so the response body is valid JSON
        response.getWriter().println("{\"msg\": \"" + msg + "\", \"status\": " + status + "}");
    }
}

Restart the test, and in different scenarios, different exception messages will be returned.

When flow limited:

insert image description here

When blocked by authorization:
insert image description here

5. Persistence of rules

Now, all rules of sentinel are stored in memory, and all rules will be lost after restarting. In a production environment, we must ensure the persistence of these rules to avoid loss.

5.1, rule management mode

Whether the rules can be persisted depends on the rule management mode. Sentinel supports three rule management modes:

  • Original mode: Sentinel's default mode; rules are kept in memory and are lost when the service restarts.
  • pull mode
  • push mode

5.1.1, pull mode

Pull mode: the console pushes rules to the Sentinel client, which saves them in a local file or database. The client then periodically polls the local file or database to refresh its local rules.

insert image description here

5.1.2, push mode

Push mode: The console pushes configuration rules to a remote configuration center, such as Nacos. The Sentinel client monitors Nacos, obtains push messages of configuration changes, and completes local configuration updates.

insert image description here

5.2. Implementing push mode

5.2.1. Modify the order-service service

Modify order-service so that it listens to the Sentinel rule configuration in Nacos.

Specific steps are as follows:

5.2.1.1. Add the dependency

In order-service, add the dependency that allows Sentinel to listen to Nacos:

<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-datasource-nacos</artifactId>
</dependency>

5.2.1.2. Configure nacos address

Configure the nacos address and monitor configuration information in the application.yml file in order-service:

spring:
  cloud:
    sentinel:
      datasource:
        flow:
          nacos:
            server-addr: localhost:8848 # Nacos address
            dataId: orderservice-flow-rules
            groupId: SENTINEL_GROUP
            rule-type: flow # can also be: degrade, authority, param-flow

5.2.2. Modify sentinel-dashboard source code

SentinelDashboard does not support nacos persistence by default, and the source code needs to be modified.

5.2.2.1, decompress sentinel source package

5.2.2.2. Modify nacos dependencies

In the pom file of the sentinel-dashboard source code, the Nacos data source dependency has the scope test by default, which means it is only available during tests. This scope needs to be removed:

<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-datasource-nacos</artifactId>
    <scope>test</scope>
</dependency>

Remove the scope that sentinel-datasource-nacos depends on:

<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-datasource-nacos</artifactId>
</dependency>

5.2.2.3, add nacos support

Under the test package of sentinel-dashboard, support for nacos has been written, and we need to copy it to main.
insert image description here

5.2.2.4. Modify nacos address

Then modify the NacosConfig class in the test code so that the Nacos address is read from application.properties:

insert image description here
Add nacos address configuration in application.properties of sentinel-dashboard:

nacos.addr=localhost:8848

5.2.2.5. Configure nacos data source

In addition, modify the FlowControllerV2 class under the com.alibaba.csp.sentinel.dashboard.controller.v2 package so that the Nacos data source we added takes effect:
insert image description here

5.2.2.6, modify the front-end page

Next, modify the front-end page and add a menu that supports nacos.

Modify the sidebar.html file in the src/main/webapp/resources/app/scripts/directives/sidebar/ directory and uncomment this part:
insert image description here
Then modify the text in it:
insert image description here

5.2.2.7, recompile and package the project

Run the maven plug-in in IDEA, compile and package the modified Sentinel-Dashboard:

5.2.2.8, start

The startup method is the same as the official one:

java -jar sentinel-dashboard.jar

If you want to modify the nacos address, you need to add parameters:

java -jar -Dnacos.addr=localhost:8848 sentinel-dashboard.jar

6. Theoretical basis

6.1. Distributed transaction problems

6.1.1. Local transactions

A local transaction is a traditional single-database transaction. It must satisfy the four ACID principles (Atomicity, Consistency, Isolation, Durability):

insert image description here

6.1.2. Distributed transactions

A distributed transaction is a transaction that is not confined to a single service or a single database, for example:

  • a transaction across data sources
  • a transaction across services
  • a combination of both

After databases are split horizontally and services are split vertically, a single business operation usually spans multiple databases and services. For example, a common e-commerce order payment involves the following actions:

  • create a new order
  • deduct product inventory
  • deduct the amount from the user's account balance

Doing the above requires access to three different microservices and three different databases.
insert image description here

Creating the order, deducting inventory, and debiting the account are each a local transaction within their own service and database, so the ACID principles can be guaranteed locally.

But when we treat the three operations as a single "business", its atomicity requires that they all succeed or all fail; partial success is not allowed. This is a distributed transaction.

At this point ACID is hard to satisfy, and that is exactly the problem distributed transactions need to solve.

6.1.3. Demonstrate distributed transaction problems

We use a case to demonstrate the problem of distributed transactions:

1) Create a database named seata_demo, and then import the SQL file
2) Create a new microservice

in:

seata-demo: parent project, responsible for managing project dependencies

  • account-service: account service, responsible for managing the user's fund account. Provide an interface to deduct the balance
  • storage-service: Inventory service, responsible for managing commodity inventory. Provide an interface for deducting inventory
  • order-service: order service, responsible for managing orders. When creating an order, you need to call account-service and storage-service

3) Start nacos and all microservices

4) Test the order function and send a Post request:

The request is as follows:

curl --location --request POST 'http://localhost:8082/order?userId=user202103032042012&commodityCode=100202003032041&count=20&money=200'

The test shows that when inventory is insufficient, the balance that has already been deducted is not rolled back: a distributed transaction problem has occurred.

6.2, CAP theorem

In 1998, Eric Brewer, a computer scientist at the University of California, proposed that there are three indicators for distributed systems.

  • Consistency
  • Availability
  • Partition tolerance (partition fault tolerance)

insert image description here

Their first letters are C, A, P.

Eric Brewer said that these three indicators can not be achieved at the same time. This conclusion is called the CAP theorem.

6.2.1. Consistency

Consistency: When a user accesses any node in the distributed system, the data obtained must be consistent.

For example, now it contains two nodes, where the initial data is consistent:

insert image description here

When we modify the data of one of the nodes, the data of the two has a difference:

insert image description here

To maintain consistency, data synchronization from node01 to node02 must be achieved:
insert image description here

6.2.2. Availability

Availability (availability): A user accessing any healthy node in the cluster must be able to get a response instead of timeout or rejection.

As shown in the figure, there is a cluster with three nodes, and any one of them can be accessed in time to get a response:

insert image description here

When some nodes are inaccessible due to network failure or other reasons, it means the node is unavailable:
insert image description here

6.2.3, Partition fault tolerance

Partition : Due to network failure or other reasons, some nodes in the distributed system lose connection with other nodes, forming an independent partition.

insert image description here

Tolerance (fault tolerance) : When a partition occurs in the cluster, the entire system must continue to provide external services

6.2.4. The contradiction

In a distributed system, the network between nodes can never be guaranteed to be 100% healthy, so partitions are bound to occur, and yet the system must continue to serve clients. Partition tolerance (P) is therefore unavoidable.

Problems arise when nodes receive new data changes:
insert image description here

If you want to guarantee consistency at this point, you must wait for the network to recover and the data to be synchronized before the cluster serves clients again; during that time the service is blocked and unavailable.

If you want to guarantee availability, you cannot wait for the network to recover, so the data on node01/node02 and node03 will become inconsistent.

In other words, since P is bound to appear, only one of A and C can be achieved.

6.2.5. Summary

Briefly describe the content of the CAP theorem?

  • Distributed system nodes are connected through the network, there must be partition problems (P)
  • When partitions appear, the consistency (C) and availability (A) of the system cannot be satisfied at the same time

Thinking: Is the elasticsearch cluster CP or AP?

  • When the ES cluster is partitioned, the faulty node will be removed from the cluster, and the data fragments will be redistributed to other nodes to ensure data consistency. Therefore, it is low availability, high consistency, and belongs to CP

6.3, BASE theory

BASE theory is a solution to CAP, including three ideas:

  • Basically Available: when a distributed system fails, it is allowed to lose part of its availability, i.e. keep the core functions available.
  • Soft State: for a certain period of time, an intermediate state is allowed, such as a temporarily inconsistent state.
  • Eventually Consistent: strong consistency cannot be guaranteed, but data consistency is eventually reached once the soft state ends.

The biggest problem of distributed transactions is the consistency of each sub-transaction, so we can learn from the CAP theorem and BASE theory:

  • AP mode: Each sub-transaction is executed and submitted separately, allowing inconsistencies in results, and then taking remedial measures to restore the data to achieve final consistency.

  • CP mode: Each sub-transaction waits for each other after execution, commits at the same time, and rolls back at the same time to reach a strong consensus. However, during the waiting process of the transaction, it is in a weakly available state.

6.4. Ideas for solving distributed transactions

To solve distributed transactions, the subsystems must be able to perceive each other's transaction state so that their states stay consistent. This requires a transaction coordinator to coordinate every transaction participant (each subsystem transaction).

A subsystem transaction is called a branch transaction; the associated branch transactions together form a global transaction.

Whatever the mode, the subsystem transactions must communicate with each other and coordinate transaction state, i.e. a transaction coordinator (TC) is required:

insert image description here

Summary:

Briefly describe the three ideas of BASE theory:

  • basically available
  • soft state
  • eventually consistent

Ideas and models for solving distributed transactions:

  • Global transaction: the entire distributed transaction
  • Branch transaction: the transaction of each subsystem involved in the distributed transaction
  • Eventual-consistency idea: each branch transaction executes and commits independently; if the results are inconsistent, the data is restored afterwards to reach consistency
  • Strong-consistency idea: each branch transaction executes but does not commit, waiting for the others' results, then all commit or all roll back together

7. Getting to know Seata for the first time

Seata is a distributed transaction solution jointly open-sourced by Ant Financial and Alibaba in January 2019. It aims to provide high-performance, easy-to-use distributed transaction services and a one-stop distributed solution for users.

Official website: http://seata.io/ , where the documentation and blog provide plenty of usage instructions and source-code analysis.

7.1, Seata's architecture

There are three important roles in Seata transaction management:

  • TC (Transaction Coordinator): maintains the state of global and branch transactions and coordinates global transaction commit or rollback.

  • TM (Transaction Manager): defines the scope of a global transaction; begins, commits, or rolls back the global transaction.

  • RM (Resource Manager): manages the resources of branch transactions, registers branch transactions with the TC, reports their status, and drives them to commit or roll back.

The overall structure is shown in the figure:

insert image description here

Seata provides four different distributed transaction solutions based on the above architecture:

  • XA mode: strongly consistent phased transaction mode, sacrificing certain availability and no business intrusion
  • TCC mode: eventually consistent phased transaction mode, with business intrusion
  • AT mode: eventually consistent phased transaction mode, no business intrusion, and also the default mode of Seata
  • SAGA mode: long transaction mode, with business intrusion

No matter what kind of solution, it is inseparable from TC, that is, the coordinator of the transaction.

7.2. Deploy TC service

7.2.1. Deploy Seata's tc-server

7.2.1.1. Download

First of all, we need to download the seata-server package, the address is https://seata.io/zh-cn/blog/download.html

7.2.1.2. Decompression

Unzip the zip package in a non-Chinese directory, and its directory structure is as follows:
insert image description here

7.2.1.3. Modify configuration

Modify the registry.conf file in the conf directory:

registry {
  # registry type for the TC service; nacos is chosen here, but eureka, zookeeper, etc. are also possible
  type = "nacos"

  nacos {
    # service name under which the Seata TC server registers in Nacos; can be customized
    application = "seata-tc-server"
    serverAddr = "127.0.0.1:8848"
    group = "DEFAULT_GROUP"
    namespace = ""
    cluster = "SH"
    username = "nacos"
    password = "nacos"
  }
}

config {
  # how the TC server reads its configuration; reading from the Nacos config center lets a TC cluster share configuration
  type = "nacos"
  # Nacos address and related settings
  nacos {
    serverAddr = "127.0.0.1:8848"
    namespace = ""
    group = "SEATA_GROUP"
    username = "nacos"
    password = "nacos"
    dataId = "seataServer.properties"
  }
}

7.2.1.4, add configuration in nacos

Special attention: to let a cluster of TC services share configuration, we use Nacos as the unified configuration center, so the server configuration file seataServer.properties needs to be created in Nacos.

The format is as follows:
insert image description here

The configuration content is as follows:

# storage mode; db means database
store.mode=db
store.db.datasource=druid
store.db.dbType=mysql
store.db.driverClassName=com.mysql.jdbc.Driver
store.db.url=jdbc:mysql://127.0.0.1:3306/seata?useUnicode=true&rewriteBatchedStatements=true
store.db.user=root
store.db.password=123
store.db.minConn=5
store.db.maxConn=30
store.db.globalTable=global_table
store.db.branchTable=branch_table
store.db.queryLimit=100
store.db.lockTable=lock_table
store.db.maxWait=5000

# transaction and log related settings
server.recovery.committingRetryPeriod=1000
server.recovery.asynCommittingRetryPeriod=1000
server.recovery.rollbackingRetryPeriod=1000
server.recovery.timeoutRetryPeriod=1000
server.maxCommitRetryTimeout=-1
server.maxRollbackRetryTimeout=-1
server.rollbackRetryTimeoutUnlockEnable=false
server.undo.logSaveDays=7
server.undo.logDeletePeriod=86400000

# client/server transport settings
transport.serialization=seata
transport.compressor=none

# disable metrics to improve performance
metrics.enabled=false
metrics.registryType=compact
metrics.exporterList=prometheus
metrics.exporterPrometheusPort=9898

7.2.1.5. Create database table

Note: when the TC service manages distributed transactions, it records transaction-related data in a database, so these tables need to be created in advance.

Create a new database named seata and run the following SQL in it.

These tables mainly record global transactions, branch transactions, and global lock information:


SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- branch transaction table
-- ----------------------------
DROP TABLE IF EXISTS `branch_table`;
CREATE TABLE `branch_table`  (
  `branch_id` bigint(20) NOT NULL,
  `xid` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `transaction_id` bigint(20) NULL DEFAULT NULL,
  `resource_group_id` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `resource_id` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `branch_type` varchar(8) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `status` tinyint(4) NULL DEFAULT NULL,
  `client_id` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `application_data` varchar(2000) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `gmt_create` datetime(6) NULL DEFAULT NULL,
  `gmt_modified` datetime(6) NULL DEFAULT NULL,
  PRIMARY KEY (`branch_id`) USING BTREE,
  INDEX `idx_xid`(`xid`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;

-- ----------------------------
-- global transaction table
-- ----------------------------
DROP TABLE IF EXISTS `global_table`;
CREATE TABLE `global_table`  (
  `xid` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `transaction_id` bigint(20) NULL DEFAULT NULL,
  `status` tinyint(4) NOT NULL,
  `application_id` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `transaction_service_group` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `transaction_name` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `timeout` int(11) NULL DEFAULT NULL,
  `begin_time` bigint(20) NULL DEFAULT NULL,
  `application_data` varchar(2000) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `gmt_create` datetime NULL DEFAULT NULL,
  `gmt_modified` datetime NULL DEFAULT NULL,
  PRIMARY KEY (`xid`) USING BTREE,
  INDEX `idx_gmt_modified_status`(`gmt_modified`, `status`) USING BTREE,
  INDEX `idx_transaction_id`(`transaction_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;

SET FOREIGN_KEY_CHECKS = 1;

7.2.1.6. Start TC service

Enter the bin directory and run seata-server.bat.

After the startup is successful, the seata-server should have been registered to the nacos registration center.

Open a browser, visit the nacos console at http://localhost:8848, and open the service list page; you should see seata-tc-server listed:
insert image description here

7.3. Microservice integration with Seata

7.3.1. Introducing dependencies

First, introduce dependencies in order-service:

<!--seata-->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-seata</artifactId>
    <exclusions>
        <!-- the bundled version (1.3.0) is too low, so exclude it -->
        <exclusion>
            <artifactId>seata-spring-boot-starter</artifactId>
            <groupId>io.seata</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>io.seata</groupId>
    <artifactId>seata-spring-boot-starter</artifactId>
    <!-- use seata starter version 1.4.2 -->
    <version>${seata.version}</version>
</dependency>

7.3.2. Configure TC address

In application.yml in order-service, configure the TC service information, and obtain the TC address through the registration center nacos combined with the service name:

seata:
  registry: # TC service registry configuration; the microservice uses it to look up the TC address in the registry
    type: nacos # registry type: nacos
    nacos:
      server-addr: 127.0.0.1:8848 # nacos address
      namespace: "" # namespace, empty by default
      group: DEFAULT_GROUP # group, DEFAULT_GROUP by default
      application: seata-tc-server # seata TC service name
      username: nacos
      password: nacos
  tx-service-group: seata-demo # transaction group name
  service:
    vgroup-mapping: # mapping between transaction group and TC cluster
      seata-demo: SH

How does the microservice find the address of the TC based on these configurations?

We know that microservices registered in Nacos need four pieces of information to determine a specific instance:

  • namespace: Namespace
  • group: grouping
  • application: service name
  • cluster: cluster name

The above four information can be found in the yaml file just now:
insert image description here

The namespace is empty, which means the default public namespace.

Putting them together, the TC service is identified as public@DEFAULT_GROUP@seata-tc-server@SH. This determines the TC service cluster, and the microservice can then pull the corresponding instance list from Nacos.
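For illustration only (this is not Seata's internal code), the sketch below queries Nacos directly with the Nacos Java client to show how those four values pin down the TC instances; the service name, group, and cluster match the yaml above:

import java.util.List;
import java.util.Properties;

import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.PropertyKeyConst;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;

public class TcLookupSketch {
    public static void main(String[] args) throws NacosException {
        Properties props = new Properties();
        props.put(PropertyKeyConst.SERVER_ADDR, "127.0.0.1:8848");
        props.put(PropertyKeyConst.NAMESPACE, ""); // empty = the default public namespace
        NamingService naming = NacosFactory.createNamingService(props);

        // service name + group + cluster narrow the lookup down to the SH TC cluster
        List<Instance> tcInstances = naming.selectInstances(
                "seata-tc-server", "DEFAULT_GROUP", List.of("SH"), true);
        tcInstances.forEach(i -> System.out.println(i.getIp() + ":" + i.getPort()));
    }
}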

8. Hands-on practice

8.1, XA mode

The XA specification is a distributed transaction processing (DTP) standard defined by the X/Open organization. It describes the interface between the global TM and the local RM, and almost all mainstream databases support it.

8.1.1. Two-phase commit

XA is a specification. Currently, mainstream databases have implemented this specification. The principle of implementation is based on two-phase commit.

Normal case:
insert image description here

Abnormal case:
insert image description here

Phase one:

  • The transaction coordinator notifies each transaction participant to execute its local transaction
  • After the local transaction finishes, each participant reports its execution status to the coordinator; at this point the transaction is not committed and the database locks are still held

Phase two:

  • The transaction coordinator decides the next step based on the phase-one reports (a minimal sketch of this logic follows the list):
    • If every participant succeeded, notify all participants to commit the transaction
    • If any participant failed, notify all participants to roll back the transaction
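A minimal, framework-free sketch of this coordination logic, assuming a hypothetical Participant interface (this is plain Java for intuition, not Seata or any XA driver API):

import java.util.List;

// hypothetical participant: prepare = phase one (execute but hold locks, do not commit)
interface Participant {
    boolean prepare();
    void commit();
    void rollback();
}

class TwoPhaseCoordinator {
    void run(List<Participant> participants) {
        // phase one: ask every participant to execute and report, without committing;
        // allMatch stops at the first failure, the remaining participants are simply rolled back
        boolean allPrepared = participants.stream().allMatch(Participant::prepare);

        // phase two: commit everywhere only if every participant reported success
        if (allPrepared) {
            participants.forEach(Participant::commit);
        } else {
            participants.forEach(Participant::rollback);
        }
    }
}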

8.1.2 Seata's XA model

Seata simply encapsulates and transforms the original XA mode to adapt to its own transaction model. The basic architecture is shown in the figure:
insert image description here

The work of RM in phase one:

  ① Register the branch transaction with the TC

  ② Execute the branch business SQL but do not commit

  ③ Report the execution status to the TC

The work of TC in phase two:

  • TC checks the transaction execution status of each branch

    a. If all succeeded, notify all RMs to commit their transactions

    b. If any failed, notify all RMs to roll back their transactions

The work of RM in phase two:

  • Receive the TC's instruction and commit or roll back the transaction

8.1.3 Advantages and disadvantages

What are the advantages of XA mode?

  • Transactions are strongly consistent and satisfy the ACID properties
  • Commonly used databases all support it, the implementation is simple, and there is no code intrusion

What are the disadvantages of XA mode?

  • Database resources are locked during phase one and only released after phase two ends, so performance is poor
  • It relies on relational databases to implement transactions

8.1.4. Realize the XA mode

Seata's starter already auto-configures the XA mode, so the implementation is very simple. The steps are as follows:

1) Modify the application.yml file (each microservice participating in the transaction), and enable the XA mode:

seata:
  data-source-proxy-mode: XA

2) Add the @GlobalTransactional annotation to the entry method that initiates the global transaction:

In this case it is the create method in OrderServiceImpl.

insert image description here
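As a reference, here is a minimal sketch of what the annotated entry method might look like; OrderMapper, AccountClient, StorageClient, Order, and OrderService are assumed names from the order/account/storage demo, not necessarily the exact code:

import io.seata.spring.annotation.GlobalTransactional;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// OrderMapper, AccountClient, StorageClient, Order and OrderService are assumed demo components
@Service
public class OrderServiceImpl implements OrderService {

    @Autowired
    private OrderMapper orderMapper;
    @Autowired
    private AccountClient accountClient;
    @Autowired
    private StorageClient storageClient;

    @Override
    @GlobalTransactional // TM starts the global transaction and registers it with the TC
    public Long create(Order order) {
        // local branch: insert the order
        orderMapper.insert(order);
        // remote branches: deduct the balance and the stock in the other microservices
        accountClient.deduct(order.getUserId(), order.getMoney());
        storageClient.deduct(order.getCommodityCode(), order.getCount());
        return order.getId();
    }
}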

3) Restart the service and test

Restart order-service and test again; whenever any step fails, all three microservices roll back successfully.

8.2, AT mode

The AT mode is also a phased-commit transaction model, but it addresses the long resource-locking period of the XA mode.

8.2.1, Seata's AT model

Basic flowchart:
insert image description here

Phase 1 RM work:

  • Register the branch transaction
  • Record the undo-log (data snapshot)
  • Execute the business SQL and commit
  • Report the transaction status

The work of RM at phase-2 commit:

  • Simply delete the undo-log

The work of RM at phase-2 rollback:

  • Restore the data to its pre-update state according to the undo-log

8.2.2. Process review

Let's sort out the principle of the AT mode with a real business.

For example, suppose there is a database table that records user balances:

id    money
1     100

The SQL to be executed by one of the branch services is:

update tb_account set money = money - 10 where id = 1

In AT mode, the current branch transaction execution flow is as follows:

Phase one:

1) TM initiates and registers global transactions to TC

2) TM calls branch transaction

3) Branch transactions are ready to execute business SQL

4) RM intercepts the business SQL, queries the original data according to the where condition, and saves it as a snapshot:

{
    "id": 1,
    "money": 100
}

5) RM executes the business SQL, commits the local transaction, and releases the database lock. At this time money = 90

6) RM reports local transaction status to TC

Phase two:

1) TM notifies TC that the transaction is over

2) TC checks the status of every branch transaction

  a) If all branches succeeded, the snapshots are deleted immediately

  b) If any branch transaction failed, a rollback is needed: read the snapshot data ({"id": 1, "money": 100}) and restore it to the database, so money is restored to 100 (see the sketch after the flow chart)

Flow chart:
insert image description here
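Conceptually (a plain-JDBC illustration, not Seata's internal API; the connection details and database name are placeholders), the rollback in step b) uses the before image roughly like this; the validation step corresponds loosely to the client.undo.dataValidation option that appears later in the client configuration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Conceptual AT-style rollback for: update tb_account set money = money - 10 where id = 1
public class UndoLogRollbackSketch {
    public static void main(String[] args) throws Exception {
        int id = 1;
        int beforeMoney = 100; // before image, saved by the RM before the business SQL ran
        int afterMoney = 90;   // after image, saved right after the business SQL ran

        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3306/seata_demo", "root", "123")) {
            // 1. validate: the current value must still equal the after image, otherwise
            //    another transaction changed the row and automatic rollback is unsafe
            int current;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT money FROM tb_account WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    current = rs.getInt(1);
                }
            }
            if (current != afterMoney) {
                throw new IllegalStateException("data changed by another transaction, cannot auto-rollback");
            }
            // 2. restore the before image
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE tb_account SET money = ? WHERE id = ?")) {
                ps.setInt(1, beforeMoney);
                ps.setInt(2, id);
                ps.executeUpdate();
            }
        }
    }
}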

8.2.3 The difference between AT and XA

Briefly, what are the biggest differences between the AT mode and the XA mode?

  • XA mode does not commit transactions in phase one and keeps resources locked; AT mode commits directly in phase one and does not hold resource locks.
  • XA mode relies on the database mechanism to roll back; AT mode uses data snapshots to roll back.
  • XA mode is strongly consistent; AT mode is eventually consistent.

8.2.4. Dirty write problem

When multiple threads access distributed transactions in AT mode concurrently, dirty write problems may occur, as shown in the figure:

insert image description here

The solution is to introduce a global lock: before releasing the DB lock, the transaction must first acquire the global lock, which prevents another transaction from operating on the same data at the same time.

insert image description here

8.2.5. Advantages and disadvantages

Advantages of AT mode:

  • Transactions are committed directly in phase one and database resources are released, so performance is better
  • Global locks provide read-write isolation
  • No code intrusion: the framework completes rollback and commit automatically

Disadvantages of AT mode:

  • There is a soft state between the two phases, so only eventual consistency is achieved
  • The framework's snapshotting affects performance, although it is still much better than the XA mode

8.2.6. Implement AT mode

Actions such as snapshot generation and rollback in AT mode are automatically completed by the framework without any code intrusion, so the implementation is very simple.

However, the AT mode needs a table to record global locks and a table (undo_log) to record data snapshots.

1) Import the database tables: the global-lock table (lock_table) belongs in the TC server's database, and the undo_log table belongs in the database of each microservice participating in the transaction

2) Modify the application.yml file and change the transaction mode to AT mode:

seata:
  data-source-proxy-mode: AT # AT is already the default

3) Restart the service and test

8.3, TCC mode

The TCC mode is very similar to the AT mode: each phase is an independent local transaction. The difference is that TCC implements data recovery through manually written code. Three methods need to be implemented:

  • Try: resource detection and reservation;

  • Confirm: completes the resource operation; as long as Try succeeds, Confirm is required to succeed.

  • Cancel: releases the reserved resources; it can be understood as the reverse operation of Try.

8.3.1. Process Analysis

For example, take a business that deducts a user's balance. Assume account A originally has a balance of 100, and 30 yuan needs to be deducted.

  • Stage 1 (Try): check whether the balance is sufficient; if so, increase the frozen amount by 30 yuan and deduct 30 yuan from the available balance

Initial balance:

insert image description here

The balance is sufficient and can be frozen:

insert image description here

At this point, total amount = frozen amount + available amount, which is still 100. The transaction commits directly without waiting for any other transaction.

  • Phase 2 (Confirm): to commit (Confirm), deduct 30 from the frozen amount

Confirm means the business can be committed; since the available amount was already deducted in Try, only the frozen amount needs to be cleared here:

insert image description here

At this point, the total amount = frozen amount + available amount = 0 + 70 = 70 yuan

  • Phase 2 (Cancel): to roll back (Cancel), deduct 30 from the frozen amount and add 30 back to the available balance

If a rollback is needed, the frozen amount must be released and the available amount restored:

insert image description here

8.3.2 Seata's TCC model

Seata's TCC mode still follows the same overall transaction architecture, as shown in the figure:

insert image description here

8.3.3 Advantages and disadvantages

What does each stage of TCC mode do?

  • Try: resource checking and reservation
  • Confirm: business execution and submission
  • Cancel: release of reserved resources

What are the advantages of TCC?

  • Transactions are committed directly in phase one and database resources are released, so performance is good
  • Compared with the AT mode, no snapshots and no global locks are needed, so its performance is the best
  • It does not rely on database transactions but on compensating operations, so it can be used with non-transactional databases

What are the disadvantages of TCC?

  • There is code intrusion: writing the try, confirm, and cancel interfaces by hand is cumbersome
  • Soft state: transactions are only eventually consistent
  • Failures of Confirm and Cancel must be considered, and idempotent handling is required

8.3.4, transaction suspension and empty rollback

8.3.4.1, empty rollback

When the try phase of a branch transaction is blocked, the global transaction may time out and trigger the second-phase cancel operation. The cancel is then executed before the try has ever run, so there is nothing to roll back; this is an empty rollback.

As shown in the picture:

insert image description here

When executing the cancel operation, it must first be determined whether try has been executed; if not, cancel should perform an empty rollback (return success without compensating anything).

8.3.4.2. Service Suspension

After an empty rollback has occurred, the previously blocked try operation may resume and execute. From then on the transaction can never be confirmed or cancelled and stays in an intermediate state forever; this is business suspension.

When executing the try operation, it must be determined whether cancel has already been executed. If it has, the try that arrives after the empty rollback must be rejected to avoid suspension.

8.3.5. Realize TCC mode

To solve the empty-rollback and business-suspension problems, the current transaction state must be recorded: is it in try, confirm, or cancel?

8.3.5.1, Thought Analysis

Here we define a table:

CREATE TABLE `account_freeze_tbl` (
  `xid` varchar(128) NOT NULL,
  `user_id` varchar(255) DEFAULT NULL COMMENT 'user id',
  `freeze_money` int(11) unsigned DEFAULT '0' COMMENT 'frozen amount',
  `state` int(1) DEFAULT NULL COMMENT 'transaction state: 0 try, 1 confirm, 2 cancel',
  PRIMARY KEY (`xid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

Where:

  • xid: the global transaction id
  • freeze_money: records the user's frozen amount
  • state: records the transaction state
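The AccountFreeze entity used by the code below maps to this table. A minimal sketch, assuming MyBatis-Plus (which the freezeMapper calls below suggest) and Lombok; the annotations in the real project may differ:

import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.Data;

@Data
@TableName("account_freeze_tbl")
public class AccountFreeze {

    @TableId
    private String xid;          // global transaction id (primary key)
    private String userId;       // user id
    private Integer freezeMoney; // frozen amount
    private Integer state;       // transaction state: 0 = try, 1 = confirm, 2 = cancel

    // state constants, matching the comment on the `state` column
    public static abstract class State {
        public static final int TRY = 0;
        public static final int CONFIRM = 1;
        public static final int CANCEL = 2;
    }
}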

With this table in place, how should each part of the business be implemented?

  • Try business:
    • Record the frozen amount and transaction state in the account_freeze table
    • Deduct the available amount in the account table
  • Confirm business:
    • Delete the freeze record from the account_freeze table by xid
  • Cancel business:
    • Update the account_freeze table: set the frozen amount to 0 and the state to 2
    • Update the account table to restore the available amount
  • How to judge whether it is an empty rollback?
    • In the cancel business, query account_freeze by xid; if it is null, try has not run yet and an empty rollback is needed
  • How to avoid business suspension?
    • In the try business, query account_freeze by xid; if a record already exists, cancel has already run, so refuse to execute the try business

Next, we transform the account-service and use TCC to implement the balance deduction function.

8.3.5.2, declare TCC interface

TCC's Try, Confirm, and Cancel methods all need to be declared in the interface based on annotations.

In the account-service project, create a new interface in the cn.angyan.account.service package and declare the three TCC methods:

@LocalTCC
public interface AccountTCCService {

    @TwoPhaseBusinessAction(name = "deduct", commitMethod = "confirm", rollbackMethod = "cancel")
    void deduct(@BusinessActionContextParameter(paramName = "userId") String userId,
                @BusinessActionContextParameter(paramName = "money") int money);

    boolean confirm(BusinessActionContext ctx);

    boolean cancel(BusinessActionContext ctx);
}

Then write the implementation class.

In the account-service project, create a new class under the cn.angyan.account.service.impl package to implement the TCC interface:

@Service
@Slf4j
public class AccountTCCServiceImpl implements AccountTCCService {

    @Autowired
    private AccountMapper accountMapper;
    @Autowired
    private AccountFreezeMapper freezeMapper;

    @Override
    @Transactional
    public void deduct(String userId, int money) {
        // 0. get the global transaction id
        String xid = RootContext.getXID();
        // 1. deduct the available balance
        accountMapper.deduct(userId, money);
        // 2. record the frozen amount and the transaction state
        AccountFreeze freeze = new AccountFreeze();
        freeze.setUserId(userId);
        freeze.setFreezeMoney(money);
        freeze.setState(AccountFreeze.State.TRY);
        freeze.setXid(xid);
        freezeMapper.insert(freeze);
    }

    @Override
    public boolean confirm(BusinessActionContext ctx) {
        // 1. get the global transaction id
        String xid = ctx.getXid();
        // 2. delete the freeze record by xid
        int count = freezeMapper.deleteById(xid);
        return count == 1;
    }

    @Override
    public boolean cancel(BusinessActionContext ctx) {
        // 0. query the freeze record
        String xid = ctx.getXid();
        AccountFreeze freeze = freezeMapper.selectById(xid);

        // 1. restore the available balance
        accountMapper.refund(freeze.getUserId(), freeze.getFreezeMoney());
        // 2. clear the frozen amount and set the state to CANCEL
        freeze.setFreezeMoney(0);
        freeze.setState(AccountFreeze.State.CANCEL);
        int count = freezeMapper.updateById(freeze);
        return count == 1;
    }
}
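The implementation above covers only the normal flow. Below is a sketch of how the empty-rollback and suspension guards from section 8.3.4 could be added; it reuses the mapper, entity, and method names from the class above and is one possible approach rather than the only correct one, so treat it as a revision of the deduct and cancel methods:

import io.seata.core.context.RootContext;
import io.seata.rm.tcc.api.BusinessActionContext;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Revised AccountTCCServiceImpl with empty-rollback and suspension handling (sketch)
@Service
@Slf4j
public class AccountTCCServiceImpl implements AccountTCCService {

    @Autowired
    private AccountMapper accountMapper;
    @Autowired
    private AccountFreezeMapper freezeMapper;

    @Override
    @Transactional
    public void deduct(String userId, int money) {
        String xid = RootContext.getXID();
        // suspension guard: if a freeze record already exists for this xid,
        // cancel has already run (empty rollback), so this late try must be rejected
        AccountFreeze oldFreeze = freezeMapper.selectById(xid);
        if (oldFreeze != null) {
            return;
        }
        // normal try: deduct the available balance and record the frozen amount
        accountMapper.deduct(userId, money);
        AccountFreeze freeze = new AccountFreeze();
        freeze.setUserId(userId);
        freeze.setFreezeMoney(money);
        freeze.setState(AccountFreeze.State.TRY);
        freeze.setXid(xid);
        freezeMapper.insert(freeze);
    }

    @Override
    public boolean confirm(BusinessActionContext ctx) {
        // unchanged: delete the freeze record by xid
        return freezeMapper.deleteById(ctx.getXid()) == 1;
    }

    @Override
    public boolean cancel(BusinessActionContext ctx) {
        String xid = ctx.getXid();
        // userId was put into the context by @BusinessActionContextParameter on deduct
        String userId = ctx.getActionContext("userId").toString();
        AccountFreeze freeze = freezeMapper.selectById(xid);

        // empty rollback: try never ran, so there is nothing to compensate;
        // insert a CANCEL record so a late try can detect it (suspension guard above)
        if (freeze == null) {
            freeze = new AccountFreeze();
            freeze.setUserId(userId);
            freeze.setFreezeMoney(0);
            freeze.setState(AccountFreeze.State.CANCEL);
            freeze.setXid(xid);
            freezeMapper.insert(freeze);
            return true;
        }
        // idempotence: if cancel already ran, simply report success
        if (freeze.getState() == AccountFreeze.State.CANCEL) {
            return true;
        }
        // normal cancel: restore the available balance and clear the frozen amount
        accountMapper.refund(freeze.getUserId(), freeze.getFreezeMoney());
        freeze.setFreezeMoney(0);
        freeze.setState(AccountFreeze.State.CANCEL);
        return freezeMapper.updateById(freeze) == 1;
    }
}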

8.4, SAGA mode

Saga mode is Seata's upcoming open source long transaction solution, which will be mainly contributed by Ant Financial.

Its theoretical basis is the paper Sagas published by Hector & Kenneth in 1987 .

Seata official website guide for Saga: https://seata.io/zh-cn/docs/user/saga.html

8.4.1 Principle

In the Saga mode, a distributed transaction has multiple participants, and each participant is a compensating service; the user implements its forward operation and its compensating (reverse) operation according to the business scenario.

During execution of the distributed transaction, the forward operations of the participants are executed in order. If they all succeed, the distributed transaction commits. If any forward operation fails, the compensating (reverse) operations of the participants that have already committed are executed in turn, returning the distributed transaction to its initial state.

insert image description here

Saga is also divided into two phases:

  • Phase one: local transactions are committed directly
  • Phase two: if everything succeeded, do nothing; if something failed, roll back by executing the compensating business (see the sketch below)
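For intuition only, a generic compensation flow can be sketched as follows (this is not Seata's Saga state-machine API; SagaStep is a hypothetical type):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// a saga step: a forward action plus its compensating action (both hypothetical)
record SagaStep(Runnable action, Runnable compensation) { }

class SagaSketch {
    static void run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        try {
            for (SagaStep step : steps) {
                step.action().run();   // phase one: commit the local transaction directly
                completed.push(step);
            }
        } catch (RuntimeException e) {
            // phase two on failure: compensate the already-committed steps in reverse order
            while (!completed.isEmpty()) {
                completed.pop().compensation().run();
            }
            throw e;
        }
        // phase two on success: nothing to do
    }
}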

8.4.2. Advantages and disadvantages

Advantages:

  • Transaction participants can make asynchronous, event-driven calls, giving high throughput
  • Transactions are committed directly in phase one with no locks, so performance is good
  • It is easy to implement; there is no need to write the three phases required by TCC

Disadvantages:

  • The duration of the soft state is uncertain, so timeliness is poor
  • There are no locks and no transaction isolation, so dirty writes can occur

8.5. Comparison of four modes

We compare the four implementations in the following aspects:

  • Consistency: Can transaction consistency be guaranteed? Strong consistency or eventual consistency?
  • Isolation: How isolated are transactions?
  • Code intrusion: Do you need to modify the business code?
  • Performance: Is there any performance loss?
  • Scenarios: common business scenarios

As shown in the picture:

insert image description here

9. High availability

9.1. High availability cluster structure

Building a TC service cluster is very simple: just start multiple TC services and register them with nacos.

However, a single cluster cannot guarantee 100% availability. What if the data center hosting the cluster fails? For demanding scenarios, disaster recovery across multiple data centers in different locations is usually set up.

For example, one TC cluster is in Shanghai and another TC cluster is in Hangzhou:

insert image description here

The microservice decides which TC cluster to use based on the mapping between the transaction group (tx-service-group) and the TC cluster. When the SH cluster fails, you only need to change the mapping in vgroup-mapping to HZ, and all microservices will switch to the HZ TC cluster.

9.2. Implementing a highly available cluster

9.2.1. Simulating a TC cluster for remote disaster recovery

It is planned to start two seata tc service nodes:

node name    ip address    port    cluster name
seata        127.0.0.1     8091    SH
seata2       127.0.0.1     8092    HZ

We have started a seata service before, the port is 8091, and the cluster name is SH.

Now, copy the seata directory and name it seata2

Modify seata2/conf/registry.conf as follows:

registry {
  # registry type for the TC service; nacos is used here, but eureka, zookeeper, etc. are also supported
  type = "nacos"

  nacos {
    # service name under which the seata TC server registers with nacos; can be customized
    application = "seata-tc-server"
    serverAddr = "127.0.0.1:8848"
    group = "DEFAULT_GROUP"
    namespace = ""
    cluster = "HZ"
    username = "nacos"
    password = "nacos"
  }
}

config {
  # how the TC server reads its configuration; reading it from the nacos config center lets a TC cluster share configuration
  type = "nacos"
  # nacos address and related settings
  nacos {
    serverAddr = "127.0.0.1:8848"
    namespace = ""
    group = "SEATA_GROUP"
    username = "nacos"
    password = "nacos"
    dataId = "seataServer.properties"
  }
}

Enter the seata2/bin directory, and then run the command:

seata-server.bat -p 8092

Open the nacos console to view the service list:
insert image description here

9.2.2. Configure transaction group mapping to nacos

Next, we need to configure the mapping relationship between tx-service-group and cluster to the nacos configuration center.

Create a new configuration:
insert image description here
The content of the configuration is as follows:

# transaction group to cluster mapping
service.vgroupMapping.seata-demo=SH

service.enableDegrade=false
service.disableGlobalTransaction=false

# transport settings for communicating with the TC server
transport.type=TCP
transport.server=NIO
transport.heartbeat=true
transport.enableClientBatchSendRequest=false
transport.threadFactory.bossThreadPrefix=NettyBoss
transport.threadFactory.workerThreadPrefix=NettyServerNIOWorker
transport.threadFactory.serverExecutorThreadPrefix=NettyServerBizHandler
transport.threadFactory.shareBossWorker=false
transport.threadFactory.clientSelectorThreadPrefix=NettyClientSelector
transport.threadFactory.clientSelectorThreadSize=1
transport.threadFactory.clientWorkerThreadPrefix=NettyClientWorkerThread
transport.threadFactory.bossThreadSize=1
transport.threadFactory.workerThreadSize=default
transport.shutdown.wait=3

# RM settings
client.rm.asyncCommitBufferLimit=10000
client.rm.lock.retryInterval=10
client.rm.lock.retryTimes=30
client.rm.lock.retryPolicyBranchRollbackOnConflict=true
client.rm.reportRetryCount=5
client.rm.tableMetaCheckEnable=false
client.rm.tableMetaCheckerInterval=60000
client.rm.sqlParserType=druid
client.rm.reportSuccessEnable=false
client.rm.sagaBranchRegisterEnable=false

# TM settings
client.tm.commitRetryCount=5
client.tm.rollbackRetryCount=5
client.tm.defaultGlobalTransactionTimeout=60000
client.tm.degradeCheck=false
client.tm.degradeCheckAllowTimes=10
client.tm.degradeCheckPeriod=2000

# undo log settings
client.undo.dataValidation=true
client.undo.logSerialization=jackson
client.undo.onlyCareUpdateColumns=true
client.undo.logTable=undo_log
client.undo.compress.enable=true
client.undo.compress.type=zip
client.undo.compress.threshold=64k
client.log.exceptionRate=100

9.2.3. Microservice reads nacos configuration

Next, modify the application.yml file of each microservice so that it reads the client.properties file from nacos:

seata:
  config:
    type: nacos
    nacos:
      server-addr: 127.0.0.1:8848
      username: nacos
      password: nacos
      group: SEATA_GROUP
      data-id: client.properties

Restart the microservices. Whether a microservice connects to TC's SH cluster or HZ cluster is now determined by client.properties in nacos.
