SpringCloud Service Fault Tolerance: Hystrix

Preface

Before diving into this topic, let me ask you a few questions. Do the friends around you who work in the Internet industry keep talking about "downgrading" and "circuit breaking"? How do you feel when you hear those terms?
You want to chime in, but you don't know what to say; you want to say something, but you're afraid of getting it wrong and being laughed at.
Downgrading and circuit breaking sound very high-end. In what scenarios can they actually be applied? Does my small workshop's tiny slice of business need downgrading and circuit breaking too? How far are they from our real business?

With these questions in mind, let's kick off SpringCloud service fault tolerance with Hystrix.

Service Fault Tolerance - Hystrix

Yes, Hystrix comes from Netflix. By now you should realize just how remarkable this company is. Who would have expected a film and television company to "neglect" its main business and make such a big contribution to microservice architecture? Don't even expect domestic film and television companies to do anything like this; some of our "serious" IT companies are still busy selling fake-medicine ads every day. How did our foreign friends at Netflix become so dedicated?

Hystrix means porcupine. Why would service fault tolerance have anything to do with a porcupine?
I don't know either... In short, if you name your project so that nobody can guess what it is about, the name is a success.

Okay, without further ado, let's get into it.

Service fault tolerance solution

Service avalanche

You may have heard of a cache avalanche; a service avalanche is even more violent. The following picture shows what a service avalanche looks like:
[Figure: the call chain from the Tomcat thread pool through services A, B, and C]
In the picture above we have three services: service A, service B, and service C. Since service C receives relatively few calls, we allocate relatively few nodes to it.
There are five requests in our Tomcat thread pool, and all of them enter the system through that pool. The first two requests call service A, which in turn calls service B; the other three requests call service B directly. While handling these requests, service B also calls service C.

Then we simulate a scenario like this:
[Figure: service C becomes slow and timeouts spread back to services B and A]

Now suppose something goes wrong inside service C, for example its database connection pool is exhausted, so queries and writes become extremely slow. Whoever calls service C will then very likely get a timeout response, i.e. the call times out, and every request that reaches C times out. If we apply no control at all, the timeouts spread to service B, service B is dragged down, and then service A is dragged down as well.

Because the timeouts are not controlled, the problem spreads along the whole call chain from service C to B and then to A. Like the butterfly effect, it starts at one small node and gradually spreads across the entire service cluster and the entire business chain. That is what we call a service avalanche. Once it happens, no subsequent request can be served, because your entire backend cluster is paralyzed, and additional requests only put more pressure on the backend.

This is where circuit breaking and downgrading come in.
In fact, compared with downgrading, circuit breaking is the more drastic measure; a circuit breaker is usually built on top of a downgrade.
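
To make this concrete, here is a minimal sketch of a downgrade (fallback) combined with a circuit breaker, assuming spring-cloud-starter-netflix-hystrix is on the classpath and the application class is annotated with @EnableCircuitBreaker; the RestTemplate bean, the service name service-b, and the method names are illustrative assumptions, not taken from this article.

```java
// Minimal sketch: a downgraded fallback plus a circuit breaker around a call to service B.
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class ServiceBClient {

    private final RestTemplate restTemplate;

    public ServiceBClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @HystrixCommand(
        fallbackMethod = "queryFallback",
        commandProperties = {
            // Fail fast if service B does not answer within 1 second.
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000"),
            // Open the circuit when at least 10 requests arrive in the rolling window
            // and more than 50% of them fail; try again after 5 seconds.
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000")
        })
    public String query(String id) {
        // Hypothetical downstream call to service B.
        return restTemplate.getForObject("http://service-b/items/" + id, String.class);
    }

    // Downgraded response returned when the call times out, throws, or the breaker is open.
    public String queryFallback(String id) {
        return "default-item";
    }
}
```

When the call times out, throws, or the breaker is open, Hystrix returns the downgraded response instead of letting the caller's thread hang, which is exactly what stops a timeout from spreading up the chain.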

Container thread exhaustion

We all know that Tomcat has a container-level thread pool dedicated to receiving incoming requests. Let's take a look at this picture:
[Figure: requests from the Tomcat thread pool distributed across services A, B, and C]
Here we have allocated the same number of nodes to each of the three services. Requests were being routed among them freely, until suddenly another accident happens:
[Figure: two half-dead service C nodes cause Tomcat container threads to pile up waiting on C]
This time the two service C nodes go down again, but not completely: they are in a half-dead state. In other words, they can still handle requests, but because of performance bottlenecks or other reasons they respond extremely slowly, so every request sent to them is blocked for a long time.
The two half-dead service C nodes would not matter much by themselves, but they affect the callers: requests to service C do not return for a long time. As time goes by, the threads in the Tomcat thread pool are gradually occupied by requests waiting on C. In other words, service C's slow responses propagate back to the Tomcat container thread pool, leaving every request to C in a pending state. The longer this lasts, the more the container fills up with requests bound for C, until requests for the other services cannot be processed at all.

In this scenario, we need the thread isolation solution, Hystrix's signature move.
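
As a rough sketch of what thread isolation can look like (again assuming the javanica annotations from spring-cloud-starter-netflix-hystrix; the thread pool key serviceCPool, the service name service-c, and the pool sizes are illustrative assumptions): calls to service C run in their own small, bounded thread pool, so when C turns half-dead only that pool fills up while Tomcat's container threads stay free to serve the other services.

```java
// Minimal sketch: calls to service C are isolated in a dedicated, bounded Hystrix thread pool.
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class ServiceCClient {

    private final RestTemplate restTemplate;

    public ServiceCClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @HystrixCommand(
        fallbackMethod = "reportFallback",
        threadPoolKey = "serviceCPool",   // hypothetical pool name for all calls to service C
        threadPoolProperties = {
            // At most 10 concurrent calls to service C...
            @HystrixProperty(name = "coreSize", value = "10"),
            // ...plus a short queue; anything beyond that is rejected immediately
            // and falls back instead of piling up in the Tomcat container pool.
            @HystrixProperty(name = "maxQueueSize", value = "20"),
            @HystrixProperty(name = "queueSizeRejectionThreshold", value = "20")
        })
    public String report(String id) {
        // Hypothetical downstream call to service C.
        return restTemplate.getForObject("http://service-c/reports/" + id, String.class);
    }

    public String reportFallback(String id) {
        return "report-unavailable";
    }
}
```

Anything beyond the pool-plus-queue capacity is rejected right away and goes to the fallback, so a half-dead service C is contained in its own pool instead of exhausting the container's threads.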

Original source: blog.csdn.net/qq_45455361/article/details/121504776