[SpringCloud Alibaba] (5) Service avalanche and fault tolerance solutions

In the previous article, we implemented remote calls between the user, product, and order microservices, and implemented load balancing for those service calls.

However, the system now has an obvious problem: it has no fault-tolerance handling, so as concurrency increases, the system may become unavailable or even crash outright. Therefore, our system needs to support fault tolerance.

1. Impact of concurrency on the system

When a system adopts the microservice architecture model, a large and complex business is split into small microservices, and the microservices call one another through interfaces or RPC.

These calls go over the network, and the microservices themselves can rarely achieve 100% availability. If one or more of the many microservices has a problem, becomes unavailable, or goes down, the microservices that call its interfaces will experience delays. If a large number of requests enter the system at that moment, request tasks will pile up and the service as a whole may even be paralyzed.

2. Stress test

2.1 Description of the stress test

To show more intuitively how high-concurrency, high-traffic scenarios affect a system that has no fault tolerance, we simulate a concurrent scenario here:

Add a new concurrentRequest() method to the OrderController class of the order microservice shop-order. The source code is as follows:

@GetMapping("/concurrentRequest")
public String concurrentRequest() {
    // A simple endpoint used to check whether other endpoints remain responsive under high concurrency
    log.info("Testing whether the business has problems under high-concurrency scenarios");
    return "concurrentRequest";
}


Next, for a better demonstration effect, we limit the maximum number of concurrent requests processed by Tomcat by adding the following configuration to the application.yml file in the resources directory of the order microservice shop-order:

server:
  port: 8080
  tomcat:
    max-threads: 20
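
If you are on Spring Boot 2.3 or later (an assumption about your project setup; older versions use the property shown above), max-threads has been renamed, and the equivalent configuration would be:

server:
  tomcat:
    threads:
      max: 20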

This limits Tomcat to handling a maximum of 20 requests at a time. Next, we use JMeter to perform a stress test on the http://localhost:8080/order/submit_order interface.

Since the order microservice has no fault-tolerance handling, when the request pressure on the http://localhost:8080/order/submit_order interface is too high and we then access the http://localhost:8080/order/concurrentRequest interface, we will find that it is also affected by the concurrent requests: access becomes very slow or fails entirely.

2.2 Actual stress test

Use JMeter to perform a stress test on the http://localhost:8080/order/submit_order interface. The JMeter configuration process is as follows.

1. Open the main interface of JMeter.

2. Right-click the test plan in JMeter and add a thread group.

3. Configure the number of concurrent threads in the thread group. The configuration information is as follows:

  • Number of threads: 50
  • Ramp-up period (seconds): 0
  • Loop count: 100

This configuration means that JMeter starts 50 threads at the same time (no ramp-up), and each thread sends the request 100 times, for a total of 5,000 requests.

4. Right-click the thread group in JMeter and add an HTTP request.

5. Configure the HTTP request in JMeter. The specific configuration is as follows:

  • Protocol: http
  • Server name or IP: localhost
  • Port number: 8080
  • Method: GET
  • Path: /order/submit_order?userId=1001&productId=1001&count=1
  • Content encoding: UTF-8

6. After configuring JMeter, click the green triangle button to start the stress test. A dialog box will pop up asking you to save the JMeter script; save it as needed.

7. View the results: right-click the HTTP request and choose Add -> Listener -> View Results Tree.

8. Run the formal stress test: start the user, product, and order microservices.

After clicking Save, the stress test on the http://localhost:8080/order/submit_order interface starts. During the stress test, you will notice that the order microservice gets stuck when printing logs, and that accessing the http://localhost:8080/order/concurrentRequest interface in a browser or another tool is also very slow or even completely inaccessible.

This shows that once one interface of the order microservice is under excessive concurrent access, its other interfaces are also affected, and the order microservice as a whole may become unavailable. To explain this problem, let's take a look at what a service avalanche is.

3. Service avalanche

After a system adopts a distributed or microservice architecture, it is difficult for any service to achieve 100% availability, because of network issues or problems in the service itself. If one service fails, it may trigger cascading failures in other services; the fault keeps spreading through the system, making services unavailable or even bringing them down, which ultimately has catastrophic consequences for the entire system.

In order to prevent service avalanches to the greatest extent, each microservice that makes up the overall system needs to support service fault tolerance.

4. Service fault tolerance solutions

To a certain extent, service fault tolerance means doing your best to tolerate errors when they occur. In distributed and microservice environments, abnormal situations are unavoidable, so we must take them into account when designing the system and give it fault-tolerance capabilities.

Common service fault tolerance solutions include: service rate limiting, service isolation, service timeouts, service circuit breaking, service degradation, and so on.

4.1 Service rate limiting

Service rate limiting (throttling) restricts the traffic entering the system to prevent excessive traffic from overwhelming it. Its main purpose is to protect the service nodes or data nodes behind it, preventing an instantaneous traffic spike (for example, when a large portion of the front-end cache becomes invalid at once) from crashing the services or the data layer; it can also be used to smooth out requests.

There are two types of rate limiting algorithms: one simply counts the total number of requests, and the other limits requests within a time window (usually 1 second). For example, the token bucket algorithm and the leaky bucket algorithm are time-window rate limiting algorithms.
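
To make the idea concrete, here is a minimal token bucket sketch; the class and field names are hypothetical, and in a Spring Cloud Alibaba project rate limiting is normally handled by a component such as Sentinel rather than hand-written code like this:

// Minimal token bucket sketch (illustrative only; all names are hypothetical)
public class TokenBucketLimiter {

    private final long capacity;      // maximum number of tokens the bucket can hold
    private final double refillPerMs; // tokens added per millisecond
    private double tokens;            // tokens currently available
    private long lastRefillTime;      // timestamp of the last refill, in milliseconds

    public TokenBucketLimiter(long capacity, long refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    // Returns true if the request may proceed, false if it should be rejected
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill according to the elapsed time, never exceeding the capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMs);
        lastRefillTime = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}

A leaky bucket works the other way around: requests drain out of the bucket at a fixed rate, and anything that would overflow the bucket is rejected.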

4.2 Service isolation

Service isolation is somewhat similar to the vertical splitting of the system. It divides the system into multiple service modules according to certain rules, and each service module is independent of each other and does not have a strong dependency relationship. If a split service fails, the impact of the failure can be limited to a specific service and will not spread to other services. Naturally, it will not have a fatal impact on the overall service.

Commonly used service isolation methods in the Internet industry include: thread pool isolation and semaphore isolation.
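
As a rough sketch of semaphore isolation using only the JDK (the class name and the remote-call helper below are hypothetical; thread pool isolation would instead submit each call to a dedicated, bounded thread pool):

import java.util.concurrent.Semaphore;

// Semaphore isolation sketch: at most 10 callers may invoke the downstream service at the same time
public class ProductServiceClient {

    private final Semaphore permits = new Semaphore(10);

    public String getProduct(Long productId) {
        // Fail fast instead of letting threads pile up when the downstream service is slow
        if (!permits.tryAcquire()) {
            return "product service is busy, please try again later";
        }
        try {
            return doRemoteCall(productId); // the actual remote call, e.g. via RestTemplate
        } finally {
            permits.release();
        }
    }

    private String doRemoteCall(Long productId) {
        // Placeholder for the real call to the product microservice
        return "product-" + productId;
    }
}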

4.3 Service timeout

After a system adopts a distributed and microservice architecture, it is split into small services that call one another, forming call chains. In a pair of services with a call relationship, the service that initiates the call to another service's interface is upstream in the call chain, and the service that provides the interface being called is downstream.

Service timeout is to set a maximum response time when an upstream service calls a downstream service. If the downstream service has not returned a result beyond this maximum response time, the request connection between the upstream service and the downstream service will be disconnected and resources will be released.
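
If the remote calls between the microservices go through a RestTemplate (as is common with Spring Cloud load-balanced calls; adjust this to whatever client you actually use), the timeout could be configured roughly like this:

import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    @Bean
    @LoadBalanced
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(3000); // maximum time to establish the connection, in milliseconds
        factory.setReadTimeout(3000);    // maximum time to wait for the downstream response, in milliseconds
        return new RestTemplate(factory);
    }
}

With such a configuration, a downstream service that does not respond within 3 seconds causes the call to fail quickly and release its resources, instead of holding the upstream thread indefinitely.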

4.4 Service circuit breaker

In distributed and microservice systems, if a downstream service responds slowly or keeps failing because of excessive access pressure, the upstream service will temporarily cut off calls to it in order to keep the system as a whole available. This approach is called circuit breaking.

A service circuit breaker generally has three states: closed, open, and half-open (a minimal sketch follows the list below).

  • Closed state: when the service is working normally and there are no faults, calls from the upstream service to the downstream service are not restricted.
  • Open state: the upstream service no longer calls the downstream service's interface, but directly returns the result of a predetermined fallback method in the upstream service.
  • Half-open state: while in the open state, the upstream service will, according to certain rules, try to resume calling the downstream service. It sends a limited amount of traffic to the downstream service and monitors the success rate. If the success rate meets expectations, the breaker returns to the closed state; otherwise it re-enters the open state.
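
The following hand-rolled sketch only exists to make the three states concrete; in a real Spring Cloud Alibaba project circuit breaking is provided by a component such as Sentinel, not code like this:

// Minimal circuit breaker state machine sketch (illustrative only; all names are hypothetical)
public class SimpleCircuitBreaker {

    private enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAt = 0;

    private final int failureThreshold; // consecutive failures before opening the breaker
    private final long openTimeoutMs;   // how long to stay open before allowing a trial call

    public SimpleCircuitBreaker(int failureThreshold, long openTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMs = openTimeoutMs;
    }

    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            // After the open timeout expires, let one trial request through (half-open)
            if (System.currentTimeMillis() - openedAt >= openTimeoutMs) {
                state = State.HALF_OPEN;
                return true;
            }
            return false; // still open: reject and let the caller fall back
        }
        return true; // closed or half-open: allow the call
    }

    public synchronized void recordSuccess() {
        failureCount = 0;
        state = State.CLOSED; // a successful trial call closes the breaker again
    }

    public synchronized void recordFailure() {
        failureCount++;
        if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }
}

The upstream service would call allowRequest() before each downstream call, return its fallback result when the call is rejected, and report the outcome through recordSuccess() or recordFailure().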

4.5 Service degradation

Service degradation is, to put it bluntly, a fallback solution: if a service cannot complete its normal call flow, it returns data from a default fallback instead. For example, the product details page normally shows product introduction information; once the product details system fails and cannot be called, the product introduction information in the cache is returned to the front-end page directly.
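
A minimal sketch of this fallback idea, using the product introduction example above (ProductIntroService and ProductDetailClient are hypothetical names, not part of the project's code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Degradation sketch: fall back to cached product introductions when the normal call fails
public class ProductIntroService {

    private final ProductDetailClient productDetailClient;
    private final Map<Long, String> introCache = new ConcurrentHashMap<>();

    public ProductIntroService(ProductDetailClient productDetailClient) {
        this.productDetailClient = productDetailClient;
    }

    public String getProductIntro(Long productId) {
        try {
            String intro = productDetailClient.getIntro(productId); // normal call path
            introCache.put(productId, intro);                        // refresh the fallback cache
            return intro;
        } catch (Exception e) {
            // Fallback path: return the cached introduction, or a default message if none exists
            return introCache.getOrDefault(productId, "Product introduction is temporarily unavailable");
        }
    }

    // Hypothetical client for the product detail service
    public interface ProductDetailClient {
        String getIntro(Long productId);
    }
}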

Code address

The code has been uploaded to Code Cloud (Gitee): Code Cloud address

The database file is located under the db folder.

Origin blog.csdn.net/sco5282/article/details/131904981