SpringCloud-Hystrix service fuse and downgrade working principle & source code | JD Logistics Technical Team

First, here is an overview map of the Hystrix source code:

In a microservice architecture, the system is split into individual services along business lines, and services call one another remotely (RPC). In Spring Cloud, these calls can be made with RestTemplate+Ribbon or with Feign. To ensure high availability, a single service is usually deployed as a cluster, yet because of network issues or faults of its own, no service can be guaranteed to be 100% available. If a single service has a problem, calls to it will block; if a large number of requests then flood in, the Servlet container's thread resources are exhausted and the service is paralyzed. Because services depend on one another, the fault propagates and has catastrophic consequences for the whole microservice system. This is the "avalanche" effect of service failures.

To solve this problem, the industry proposes the circuit breaker model.

In everyday life, if a circuit is overloaded, the fuse box trips automatically to protect the electrical appliances in the home; this is a familiar example of a fuse. Hystrix contains just such a breaker: when a service it depends on becomes unstable, the breaker opens automatically and a degraded response is served, protecting the stability of the caller. At runtime, Hystrix collects statistics on how each call executes (success, failure, timeout, rejection) and uses this data to decide in real time whether to open the circuit.

1. Introduction to Hystrix

Netflix has created a library called Hystrix that implements the circuit breaker pattern. In a microservice architecture it is common to have multiple layers of service calls.

——Excerpt from the official website

Netflix open sourced the Hystrix component to implement the circuit breaker pattern, and Spring Cloud integrates this component. In a microservice architecture it is very common for one request to call multiple services, as shown in the figure below:

Failure of lower-level services can cause cascading failures. When the failure of calls to a particular service reaches a threshold (for Hystrix, 20 requests within a 10-second window by default), the circuit breaker opens.

Once the breaker is open, cascading failure is avoided, and the fallback method can directly return a fixed value.

What is Hystrix?

In a distributed system, each service may call many other services; the services being called are its dependent services. It is normal for some dependent services to fail from time to time.

Hystrix lets us control the calls between services in a distributed system, adding latency tolerance and fault tolerance for failing dependencies.

Hystrix isolates the resources of each dependent service, so that when one dependency fails, the failure does not spread through every dependency call in the system; at the same time, Hystrix provides a fallback degradation mechanism for when failures occur.

All in all, Hystrix helps us improve the availability and stability of distributed systems through these methods.

History of Hystrix

Hystrix is a framework for high-availability assurance. Netflix's API team (Netflix can be thought of as an overseas counterpart of video sites such as Youku or iQiyi) had been working on improving system availability and stability since 2011, and Hystrix grew out of that work.

By 2012, Hystrix had become mature and stable, and within Netflix many teams beyond the API team began to use it.

Today, billions of inter-service calls at Netflix go through the Hystrix framework every day, and Hystrix has helped improve the overall availability and stability of the Netflix site.

In November 2018, Hystrix announced on its GitHub homepage that it would no longer develop new features and recommended that developers move to other, still-active open source projects. The switch to maintenance mode by no means implies that Hystrix is no longer valuable; on the contrary, Hystrix has inspired many great ideas and projects, and our discussion of high availability below is still framed around Hystrix.

Hystrix Design Principles

• Provide control and fault-tolerant protection against latency and failure when invoking dependent services.

• In a complex distributed system, prevent the failure of one dependency from cascading through the whole system; one service failing should not take the others down with it.

• Support fail-fast behavior and rapid recovery.

• Provide fallback support for graceful degradation.

• Support near-real-time monitoring, alerting, and operational control.

• Prevent any single dependency from exhausting all resources, such as all of Tomcat's thread resources.

• Avoid request queuing and backlog by using rate limiting and fail-fast to control failures.

• Provide a fallback degradation mechanism to deal with failures.

• Use resource-isolation techniques, such as bulkheads and circuit breakers, to limit the impact of any one dependency's failure.

• Speed up fault discovery with near-real-time statistics, monitoring, and alerting.

• Speed up fault handling and recovery through near-real-time hot modification of properties and configuration.

• Protect against every failure mode of a dependency call, not just network failures.

2. Demos

1: Using a circuit breaker with Ribbon

To modify the code of the service-ribbon project, first add the spring-cloud-starter-hystrix starter dependency to the pom.xml file:

<dependency>
 <groupId>org.springframework.cloud</groupId>
 <artifactId>spring-cloud-starter-hystrix</artifactId>
</dependency>

Add the @EnableHystrix annotation to the startup class SpringCloudServiceRibbonApplication to enable Hystrix:

@SpringBootApplication
@EnableDiscoveryClient
@EnableHystrix
@EnableHystrixDashboard
public class SpringCloudServiceRibbonApplication {
 public static void main(String[] args) {
     SpringApplication.run(SpringCloudServiceRibbonApplication.class, args);
 } 
 @Bean
 @LoadBalanced
 RestTemplate restTemplate(){
    return new RestTemplate();
 }
}



Modify the UserService class and add the @HystrixCommand annotation to the query method. This annotation gives the method circuit-breaking support and specifies the fallback method via fallbackMethod; the fallback simply returns a stub object. The code is as follows:

@Service
public class UserService {

 @Autowired
 RestTemplate restTemplate;
 
 @HystrixCommand(commandKey="queryCommandKey",groupKey = "queryGroup",threadPoolKey="queryThreadPoolKey",fallbackMethod = "queryFallback",
 commandProperties = {
     @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "100"), // timeout in milliseconds; on timeout the call falls back
     @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "3"), // minimum requests in a statistics window before the breaker considers tripping (default 20)
     @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"), // error-rate threshold (default 50): if 50% of requests in a window fail, the circuit opens
 },
 threadPoolProperties = {
     @HystrixProperty(name = "coreSize", value = "30"),
     @HystrixProperty(name = "maxQueueSize", value = "100"),
     @HystrixProperty(name = "keepAliveTimeMinutes", value = "2"),
     @HystrixProperty(name = "queueSizeRejectionThreshold", value = "15"),
     @HystrixProperty(name = "metrics.rollingStats.numBuckets", value = "10"),
     @HystrixProperty(name = "metrics.rollingStats.timeInMilliseconds", value = "100000")
 })
public List<User> query(){
     return restTemplate.getForObject("http://service-user/user/query",List.class);
}
 
public List<User> queryFallback(){
    List<User> list = new ArrayList<>();
     User user = new User();
     user.setId("1211");
     user.setName("queryFallback");
     list.add(user);
     return list;
}

}

Start the service-ribbon project. When we visit http://127.0.0.1:9527/user/query, the browser displays:

[{
		"id": "id0",
		"name": "testname0"
	},
	{
		"id": "id1",
		"name": "testname1"
	},
	{
		"id": "id2",
		"name": "testname2"
	}
]

Now stop the service-user project. When we visit http://127.0.0.1:9527/user/query again, the browser displays:

[{
	"id": "1211",
	"name": "queryFallback"
}]

This shows that when the service-user project is unavailable, service-ribbon fails fast when calling the service-user API and directly returns the fallback response instead of waiting for the call to time out, which keeps the container's threads from being blocked.

2: Using circuit breakers in Feign

Feign has built-in circuit breaker support, but since the Dalston release of Spring Cloud it is not enabled by default; it must be turned on by adding the following to the configuration file:

feign:
  hystrix:
    enabled: true

Building on the service-feign project, you only need to specify the fallback class in the @FeignClient annotation on the UserService interface:

@FeignClient(value="service-user",fallback = UserServiceFallback.class)
public interface UserService {
 
 @RequestMapping(value="/user/query",method = RequestMethod.GET)
 public List<User> query();
 
}

UserServiceFallback must implement the UserService interface and be registered in the IoC container. The code is as follows:

@Component
public class UserServiceFallback implements UserService {

  @Override
  public List<User> query() {
     List<User> list = new ArrayList<>();
     User user = new User();
     user.setId("1211");
     user.setName("feignFallback");
     list.add(user);
     return list;
  }
  
}



Start the service-feign project and open http://127.0.0.1:9528/user/query in the browser. Note that the service-user project is not started at this time; the page displays:

[{
	"id": "1211",
	"name": "feignFallback",
	"date": null
}]

This proves that the circuit breaker is working.

The Hystrix Dashboard setup below is based on the service-ribbon project; the Feign project is adapted in exactly the same way.

First, add the spring-boot-starter-actuator and spring-cloud-starter-hystrix-dashboard starter dependencies to pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-hystrix-dashboard</artifactId>
</dependency>

Add the @EnableHystrixDashboard annotation to the main startup class to enable the Hystrix Dashboard:

@SpringBootApplication
@EnableDiscoveryClient
@EnableHystrix
@EnableHystrixDashboard
public class ServiceRibbonApplication {
 
 public static void main(String[] args) {
     SpringApplication.run(ServiceRibbonApplication.class, args);
 }
 
 @Bean
 @LoadBalanced
 RestTemplate restTemplate() {
     return new RestTemplate();
 }
 
}

3: Hystrix Dashboard

Open a browser and visit http://localhost:9527/hystrix; the interface looks as follows:

Click "Monitor Stream" to enter the next screen, then visit http://127.0.0.1:9527/user/query

At this point the monitoring interface appears:

4: Introduction to Hystrix Turbine

The data of a single Hystrix Dashboard is of limited value; to see the dashboard data for the whole system you need Hystrix Turbine, which aggregates the Hystrix Dashboard data of every service. Using Hystrix Turbine is very simple: just add the corresponding dependency plus an annotation and some configuration.

3. Hystrix flowchart

The flowchart below shows what happens when a request is made through a Hystrix-wrapped dependency.

The following analyzes in more detail what happens at each step:

1. Construct a HystrixCommand or HystrixObservableCommand object.

The first step is to construct a HystrixCommand or HystrixObservableCommand object, which represents one request to a dependency; the arguments the request needs are passed to the constructor.

Use HystrixCommand if the dependency returns a single response, for example:

HystrixCommand command = new HystrixCommand(arg1, arg2);

Use HystrixObservableCommand if the dependency returns an Observable that emits responses, for example:

HystrixObservableCommand command = new HystrixObservableCommand(arg1, arg2);

2. Execute the command

There are 4 ways to execute a Hystrix command:

K value = command.execute();

Future<K> fValue = command.queue();

Observable<K> ohValue = command.observe(); //hot observable

Observable<K> ocValue = command.toObservable(); //cold observable

The synchronous method execute() actually calls queue().get(), and queue() in turn calls toObservable().toBlocking().toFuture(). In other words, every HystrixCommand is ultimately backed by an Observable, even commands that just return a single simple value.
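The layering described above can be sketched in plain Java (an illustration of the idea only, not Hystrix's actual classes; the Observable layer is replaced here by a CompletableFuture):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Supplier;

// Illustration only: every entry point funnels into one asynchronous
// pipeline, just as execute() -> queue().get() and
// queue() -> toObservable().toBlocking().toFuture() in Hystrix.
class MiniCommand<K> {
    private final Supplier<K> run; // the wrapped dependency call

    MiniCommand(Supplier<K> run) { this.run = run; }

    // Asynchronous form: returns a Future immediately.
    CompletableFuture<K> queue() {
        return CompletableFuture.supplyAsync(run);
    }

    // Synchronous form: simply blocks on the asynchronous form.
    K execute() throws ExecutionException, InterruptedException {
        return queue().get();
    }
}
```

The design point is that there is only one real execution path; the "synchronous" API is just a blocking wrapper around it.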

3. Is the response cached?

• If request caching is enabled for this command and a response to this request already exists in the cache, an Observable containing the cached response is returned immediately (request caching is explained in the Request Cache section).
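The idea can be illustrated with a minimal plain-Java cache keyed by the command's cache key (a sketch of the concept only, not the real HystrixRequestCache API; the class and field names are ours):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustration of request caching: the first call for a given key executes
// the dependency; later calls with the same key hit the cache instead.
class CachedCommand<K, V> {
    private final Map<K, V> requestCache = new ConcurrentHashMap<>();
    private final Function<K, V> dependency;
    int backendCalls = 0; // counts real dependency invocations, for the demo

    CachedCommand(Function<K, V> dependency) { this.dependency = dependency; }

    V execute(K cacheKey) {
        return requestCache.computeIfAbsent(cacheKey, k -> {
            backendCalls++;
            return dependency.apply(k);
        });
    }
}
```

In Hystrix the cache lives in a per-request context (HystrixRequestContext), so it is scoped to one user request rather than global.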

4. Is the circuit breaker open?

When the command executes, Hystrix checks whether the circuit breaker is open.

If the circuit breaker is open (tripped), Hystrix does not execute the command but routes directly to step 8 to fetch and run the fallback logic.

• If the circuit breaker is closed, flow proceeds to step 5 to check whether there is enough capacity to run the task (capacity includes thread-pool capacity, queue capacity, and so on).

5. Are the thread pool, queue, and semaphore full?

• If the thread pool or queue associated with the command is full, Hystrix does not execute the command but jumps immediately to step 8 and runs the fallback logic.

6. HystrixObservableCommand.construct() or HystrixCommand.run()

• Here Hystrix invokes the request to the dependency through the logic you wrote, by calling one of the following:

HystrixCommand.run() — returns a single response or throws an exception.

HystrixObservableCommand.construct() — returns an Observable that emits responses or sends an onError() notification.

If run() or construct() exceeds the command's timeout, the executing thread throws a TimeoutException (or a separate timer thread throws it, if the command is not running on a thread of its own).

In this case Hystrix routes the response to step 8 (Get fallback) and discards the eventual return value of run() or construct(), provided the method was not cancelled/interrupted.

Note that there is no way to force the underlying thread to stop working; the best Hystrix can do on the JVM is deliver it an InterruptedException.

If the work wrapped by Hystrix does not respect InterruptedException, the thread in the Hystrix thread pool will carry on with its work even though the client has already received a TimeoutException.

This behavior can saturate the Hystrix thread pool even though the load has been "correctly shed".

Most Java HTTP client libraries do not honor InterruptedException.

Therefore, make sure that connection and read/write timeouts are properly configured on the HTTP client.
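The point about honoring interruption can be demonstrated with a small plain-Java experiment (illustrative; the names are ours, not Hystrix's). A loop that checks the interrupt flag exits promptly when a Hystrix-style timeout interrupts its thread; a loop that ignores the flag would keep the pool thread busy:

```java
// A well-behaved task checks Thread.isInterrupted() and stops promptly
// when the timeout handler interrupts it; an ill-behaved one would not.
public class InterruptDemo {
    public static boolean stopsWhenInterrupted() throws InterruptedException {
        Thread worker = new Thread(() -> {
            // Well-behaved "client" loop: checks the interrupt flag each pass.
            while (!Thread.currentThread().isInterrupted()) {
                // simulate a slice of work
            }
        });
        worker.start();
        worker.interrupt();   // what Hystrix does when the command times out
        worker.join(1000);    // give the worker a moment to notice the flag
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopsWhenInterrupted());
    }
}
```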

If the command didn't throw any exceptions and it returned a response, Hystrix returns this response after doing some logging and metrics reporting.

In the case of run(), Hystrix returns an Observable that emits a single response, followed by an onCompleted notification;

In case of construct(), Hystrix returns the same Observable returned by construct().

7. Calculate circuit health

Hystrix reports success, failure, rejection, and timeout metrics to the circuit breaker, which maintains a set of sliding-window counters and derives statistics from them.

• It uses these statistics to decide whether the circuit should be opened; if so, the breaker short-circuits requests for a certain period, then closes again once a fresh health check of the dependency succeeds.

8. Get fallback

• When command execution fails, Hystrix tries to run your custom fallback logic:

Write the fallback to provide a generic response that needs no network dependency, obtained from an in-memory cache or other static logic. If a network call really is required inside the fallback, wrap it in another HystrixCommand or HystrixObservableCommand.

If your command extends HystrixCommand, you can return a single fallback value by implementing HystrixCommand.getFallback().

If your command extends HystrixObservableCommand, you can return an Observable that emits the fallback value by implementing HystrixObservableCommand.resumeWithFallback().

Hystrix returns the fallback method's response to the caller.

If you do not implement a fallback for your command, Hystrix still returns an Observable when the command throws an exception, but that Observable emits no data and terminates immediately with an onError() notification; this onError carries the cause of the exception back to the caller.

The result of a failure with no fallback depends on how you invoked the Hystrix command:

• execute(): throws an exception.

• queue(): successfully returns a Future, but calling its get() method throws an exception.

• observe(): returns an Observable that, when you subscribe to it, terminates immediately by calling onError().

• toObservable(): returns an Observable that, when you subscribe to it, terminates immediately by calling onError().

9. Return a successful response

• If the Hystrix command succeeds, it returns the response to the caller in the form of an Observable. Depending on how you invoked the command in step 2, the Observable may be transformed before it is returned:

• execute(): obtains a Future by calling queue(), then calls get() to obtain the value inside the Future.

• queue(): converts the Observable into a BlockingObservable, and the BlockingObservable into a Future.

• observe(): subscribes to the returned Observable and begins executing the command's logic immediately.

• toObservable(): returns the Observable unchanged; you must subscribe to it before the command's logic starts executing.

4. Circuit breaker

The diagram below shows how a HystrixCommand or HystrixObservableCommand interacts with a HystrixCircuitBreaker, its logic and decision flow, and how the counters inside the breaker behave.

The circuit breaker opens and closes under the following conditions:

• Assume the volume of requests through the circuit meets a threshold (HystrixCommandProperties.circuitBreakerRequestVolumeThreshold()) ...

• ... and the percentage of errors exceeds the threshold set by HystrixCommandProperties.circuitBreakerErrorThresholdPercentage().

• The circuit breaker's state then changes from CLOSED to OPEN.

• While the breaker is open, all requests against it are short-circuited.

• After the interval set by HystrixCommandProperties.circuitBreakerSleepWindowInMilliseconds(), the next single request is let through (the half-open state). If that request fails, the breaker stays OPEN for another sleep window; if it succeeds, the breaker switches to CLOSED and the logic of step 1 takes over again.

Hystrix's circuit breaker is implemented in the HystrixCircuitBreaker class. The most important parameters are as follows:

1. circuitBreaker.enabled

Whether the circuit breaker is enabled; the default is true.

2. circuitBreaker.forceOpen

Forces the breaker open so that it always stays open; the default is false.

3. circuitBreaker.forceClosed

Forces the breaker closed so that it always stays closed; the default is false.

4. circuitBreaker.requestVolumeThreshold

The request-volume threshold within the sliding window (10s): the breaker can only trip once this many requests have been seen. The default is 20, so if only 19 requests arrive in the window, the breaker will not open even if every one of them fails.

5. circuitBreaker.errorThresholdPercentage

The error-rate threshold; the default is 50%. For example, if 100 requests arrive within the 10s window and 60 of them fail, the error rate for that period is 60%; since this exceeds the threshold, the breaker opens automatically.

6. circuitBreaker.sleepWindowInMilliseconds

After the breaker opens, to allow automatic recovery, one trial request is sent every sleep window (5000ms by default) to probe whether the dependency has recovered.
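In Spring Cloud these circuit-breaker parameters do not have to be set in annotations; they can also go into the application configuration. A sketch using the standard Hystrix property names, with illustrative values:

```yaml
hystrix:
  command:
    default:                              # or a specific commandKey instead of "default"
      circuitBreaker:
        enabled: true
        forceOpen: false
        forceClosed: false
        requestVolumeThreshold: 20        # minimum requests in the 10s window
        errorThresholdPercentage: 50      # error percentage that trips the breaker
        sleepWindowInMilliseconds: 5000   # how long to stay open before a trial request
```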

• In the latest code, allowRequest() has been deprecated and replaced by the attemptExecution() method.

Compared with allowRequest(), the main improvement is that the state value is changed via compareAndSet. The return value of attemptExecution() determines whether the normal logic or the degraded logic is executed.

1. If circuitBreaker.forceOpen=true, the breaker has been forced open and every request is short-circuited.

2. If circuitBreaker.forceClosed=true, the breaker has been forced closed and every request is let through.

3. circuitOpened defaults to -1 and stores the timestamp of the last time the circuit opened.

4. If circuitOpened is not -1, the circuit has opened at some point, and isAfterSleepWindow() decides whether a trial request is due.

This is the breaker's automatic-recovery logic. If the current time is past the timestamp of the last trip plus the 5000ms sleep window, execution enters the if branch and threads compete, via compareAndSet on the status variable, for the right to send the trial request. status holds the breaker's current state: CLOSED, OPEN, or HALF_OPEN. Only the first request after the sleep window executes the normal logic, moving the state to HALF_OPEN (the half-open state); for every other request, compareAndSet(Status.OPEN, Status.HALF_OPEN) returns false, so it runs the degraded logic.
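A simplified plain-Java model of this attemptExecution() flow (the state names and method names mirror the source, but this is a sketch with an injectable clock, not the real HystrixCircuitBreaker):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of attemptExecution(): after the sleep window, only ONE
// caller wins the CAS from OPEN to HALF_OPEN and gets to run a trial request.
class MiniBreaker {
    enum Status { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.CLOSED);
    private final AtomicLong circuitOpened = new AtomicLong(-1); // last trip time, -1 = never
    private final long sleepWindowMs;

    MiniBreaker(long sleepWindowMs) { this.sleepWindowMs = sleepWindowMs; }

    void trip(long now) {                         // metrics decided to open
        status.set(Status.OPEN);
        circuitOpened.set(now);
    }

    boolean attemptExecution(long now) {
        if (circuitOpened.get() == -1) return true;                   // closed: allow
        if (now <= circuitOpened.get() + sleepWindowMs) return false; // still open
        // Sleep window has elapsed: only one thread wins the trial slot.
        return status.compareAndSet(Status.OPEN, Status.HALF_OPEN);
    }

    void markSuccess() {                          // trial succeeded: close
        if (status.compareAndSet(Status.HALF_OPEN, Status.CLOSED)) {
            circuitOpened.set(-1);
        }
    }

    void markNonSuccess(long now) {               // trial failed: reopen
        if (status.compareAndSet(Status.HALF_OPEN, Status.OPEN)) {
            circuitOpened.set(now);
        }
    }
}
```

The compareAndSet is what guarantees that exactly one request probes the dependency per sleep window, while all concurrent requests fall through to the degraded path.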

5. If the trial request throws an exception, markNonSuccess() is executed:

compareAndSet moves status back to the open state, and the timestamp of the current trip is recorded.

6. If the trial request returns successfully, markSuccess() is executed:

compareAndSet moves status to the closed state, the interface's statistics are reset, and circuitOpened is set back to -1; subsequent requests execute the normal logic.

So far we have not explained how the breaker trips automatically. Hystrix contains a Metrics module that tracks the execution outcome of every command: success, failure, timeout, thread-pool rejection, and so on. Inside the breaker, the subscribeToStream() method subscribes to changes in this data stream and registers a callback; whenever a new request changes the stream, the onNext callback fires.

In onNext, the parameter hc holds the interface's request statistics for the preceding 10s window: total requests, failure count, and failure rate. The core logic checks whether the total number of requests has reached the requestVolumeThreshold and whether the failure rate has reached the errorThresholdPercentage. If both hold, the interface is deemed unstable enough to warrant breaking: status is set to the open state, and circuitOpened is updated to the current timestamp, recording when the circuit last opened.
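The two checks performed in onNext can be captured in a small helper (an illustrative sketch; the parameter names mirror the Hystrix properties, the class name is ours):

```java
// Decides whether a statistics window justifies opening the circuit:
// there must be enough traffic AND a high enough error percentage.
class TripDecision {
    static boolean shouldTrip(long totalRequests, long errorCount,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests < requestVolumeThreshold) {
            return false; // too little traffic in the window to judge
        }
        long errorPercentage = errorCount * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the defaults from the parameter list above: 60 failures out of 100 requests trips the breaker (60% ≥ 50%), while 19 failures out of 19 requests does not, because the volume threshold of 20 is not met.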

5. Isolation

Hystrix adopts the bulkhead pattern to isolate dependencies from one another and to limit concurrent access to any one of them.

Threads and thread pools

Client calls (third-party packages, network calls, etc.) execute on separate threads, isolated from the calling thread of the task, so that a dependency that takes too long cannot block the caller's thread.

• Hystrix uses separate, per-dependency thread pools to constrain each dependency, so that latency in the underlying executions saturates only the threads in that one pool.

You can prevent failures without thread pools, but this requires clients that can reliably fail fast (network connect/read timeouts and retry configuration) and that always behave well.

Netflix designed Hystrix to use threads and thread pools for isolation for the following reasons:

• Many applications call multiple different backend services as dependencies.

• Each service provides its own client library.

• Each client library is constantly changing.

• Client library logic can change to add new network calls.

• Each client library may contain retries, data parsing, caching, and other logic.

• Client libraries tend to be "black boxes" to their users: implementation details, network access patterns, default configuration, and so on are opaque.

• In several real-world production outages the conclusion was "oh, something changed and properties should be adjusted" or "the client library changed its behavior".

• Even if the client itself has not changed, the service behind it may change, and these factors can affect performance and invalidate the client's configuration.

• Transitive dependencies can pull in other client libraries that are not expected and perhaps not configured correctly.

• Most network access is performed synchronously.

• Failures and latency can also originate in the client code itself, not just in the network call.

Benefits of using a thread pool

In short, the isolation provided by thread pools allows the ever-changing, dynamically combined performance characteristics of client libraries and subsystems to be handled gracefully without causing outages.

Note: although separate threads provide isolation, your underlying client code should also have timeouts and/or respond to thread interruption, so that it cannot leave Hystrix's thread pool stuck in an endless wait.

Disadvantages of thread pools

The main disadvantage of thread pools is the added CPU overhead: each command executes on a separate thread pool, which involves queuing, scheduling, and context switching.

• When Netflix designed the system, they decided to accept this overhead in exchange for the benefits it brings, judging it small enough to have no significant cost or performance impact.

Thread cost

Hystrix measures the latency of executing the construct() or run() method on the child thread, as well as the total end-to-end execution time on the parent thread, so the full Hystrix overhead (threads, metrics, logging, circuit breaker, and so on) is visible.

The Netflix API processes more than one billion Hystrix command executions per day using thread isolation. Each API instance has more than 40 thread pools with 5-20 threads each (most are set to 10).

• The figure below shows one HystrixCommand executing 60 requests per second on a single API instance (roughly 350 threads per second in total per server):

At the median (and below), there is no cost to having a separate thread.

At the 90th percentile, the cost of a separate thread is 3ms.

At the 99th percentile, a separate thread costs 9ms. Note, however, that the increase in thread overhead (from 0 to 9ms) is far smaller than the jump in the execution time of the network request itself (from 2ms to 28ms).

For most Netflix use cases, this overhead at the 90th percentile and above is considered acceptable in exchange for the resilience benefits it buys.

For very low-latency requests (such as those that mainly hit memcached), the overhead can be too high; in that case another approach such as semaphores can be used, which, although they do not allow timeouts, provide most of the resilience benefits without the thread overhead. In general, though, the overhead is small enough that Netflix usually prefers separate threads for isolation.

Thread isolation - semaphores

As mentioned among the drawbacks of thread-pool isolation, when a dependency has extremely low latency, the overhead introduced by thread pools can outweigh the benefit. Semaphore isolation can be used instead: a semaphore caps the number of concurrent calls to any given dependency. The figure below illustrates the main difference between thread-pool and semaphore isolation:

With a thread pool, the thread that sends the request is not the thread that calls the dependent service; with a semaphore, they are one and the same, namely the thread that initiated the request.

Instead of thread-pool/queue sizes, you can use semaphores (or counters) to limit the number of concurrent calls to any given dependency. This lets Hystrix shed load without a thread pool, but it does not allow timing out and walking away, so it should be used only when you trust the client and you just want load shedding.

HystrixCommand and HystrixObservableCommand support semaphores in two places:

• Fallback: when Hystrix retrieves a fallback, it always does so on the calling (e.g. Tomcat) thread.

• Execution: if the property execution.isolation.strategy is set to SEMAPHORE, Hystrix uses semaphores instead of threads to limit the number of concurrent parent threads invoking the command.

Both uses are configured through a dynamic property that sets how many concurrent threads may execute; size it with calculations similar to those used for sizing a thread pool (an in-memory call that returns in milliseconds can run at 5000rps with a semaphore of only 1 or 2, but the default is 10). Note: if a semaphore-isolated dependency becomes latent, the parent threads remain blocked until the underlying network call times out. Semaphore rejection starts once the limit is hit, but the threads already filling the semaphore cannot walk away.

Since Hystrix isolates with thread pools by default, using semaphore isolation requires explicitly setting the property execution.isolation.strategy to ExecutionIsolationStrategy.SEMAPHORE and configuring the number of permits (the default is 10). Before a client can actually invoke a dependency, it must first acquire a permit; because permits are finite, once concurrency exceeds the permit count, subsequent requests are rejected immediately and enter the fallback flow.

Semaphore isolation is mainly about capping concurrency and preventing request threads from blocking en masse, thereby achieving rate limiting and avalanche prevention.
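The mechanism can be sketched with java.util.concurrent.Semaphore (a plain-Java illustration of the idea, not Hystrix's implementation; the class name is ours):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Semaphore isolation in miniature: tryAcquire() is non-blocking, so the
// caller's thread never queues; requests over the limit go straight to fallback.
class SemaphoreIsolated<V> {
    private final Semaphore permits;
    private final Supplier<V> dependency;
    private final Supplier<V> fallback;

    SemaphoreIsolated(int maxConcurrent, Supplier<V> dependency, Supplier<V> fallback) {
        this.permits = new Semaphore(maxConcurrent);
        this.dependency = dependency;
        this.fallback = fallback;
    }

    V call() {
        if (!permits.tryAcquire()) {
            return fallback.get();       // rejected immediately, no queuing
        }
        try {
            return dependency.get();     // runs on the CALLER's thread
        } finally {
            permits.release();
        }
    }
}
```

Note how the dependency runs on the caller's own thread, which is exactly why this mode cannot enforce a timeout: once a permit holder is blocked in the dependency, nothing can make it return early.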

Isolation summary

Both thread pools and semaphores can be used for isolation, but each has its own strengths, weaknesses, and applicable scenarios. A comparison:

              Thread switch   Async support   Timeout support   Circuit breaking   Rate limiting   Overhead
Semaphore     no              no              no                yes                yes             small
Thread pool   yes             yes             yes               yes                yes             large

Both thread pools and semaphores support circuit breaking and rate limiting. Compared with thread pools, semaphores avoid the overhead of thread switching. However, semaphores support neither asynchrony nor timeouts: when the dependent service is unavailable, the semaphore makes requests over the limit return immediately, but threads that already hold a permit can only wait for the service to respond or for their own (e.g. socket) timeout, which may be a long wait. In thread-pool mode, when the service has not responded within the specified time, Hystrix interrupts the worker thread so the call ends and returns immediately.
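The thread-pool advantage described above, ending a hung call by interrupting the worker, can be sketched with a plain ExecutorService and Future. Again this is an illustrative sketch under the stated assumptions, not Hystrix's internals:

```java
import java.util.concurrent.*;

// Sketch of thread-pool isolation: the caller waits on a Future with a
// timeout and can interrupt the hung worker thread via cancel(true),
// something semaphore isolation cannot do.
class ThreadPoolIsolation {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    String call(Callable<String> task, long timeoutMs) {
        Future<String> f = pool.submit(task);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);            // interrupt the worker; caller returns now
            return "fallback";
        } catch (Exception e) {        // interrupted / execution failure
            return "fallback";
        }
    }

    void shutdown() { pool.shutdownNow(); }
}
```

A fast task returns its result; a task slower than the timeout makes the caller return the fallback immediately while the worker is interrupted.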

Request merging

You can front a HystrixCommand with a request collapser (HystrixCollapser is the abstract parent class), which merges multiple requests into a single backend dependency call.

The graphs below show the number of threads and network connections in two cases, the first graph without request coalescing and the second graph with request coalescing (assuming all connections are "concurrent" within a short time window, 10ms in this case).

Why use request merging

• Use request merging to reduce the number of threads and network connections needed to execute concurrent HystrixCommand requests. Merging is performed automatically; developers do not need to hand-code request batching.
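The automated batching described above can be sketched as follows. This is a minimal illustration of the collapsing idea, not the HystrixCollapser API; `TinyCollapser` and `batchFetch` are made-up names standing in for the collapser and the batched dependency call:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of request collapsing: requests arriving within a short window
// are gathered and served by one batched backend call.
class TinyCollapser {
    static final AtomicInteger batchCalls = new AtomicInteger();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });
    private final Map<Integer, CompletableFuture<String>> pending = new LinkedHashMap<>();
    private boolean flushScheduled = false;

    // Hypothetical batch backend: one "network call" answers many ids at once.
    static Map<Integer, String> batchFetch(Set<Integer> ids) {
        batchCalls.incrementAndGet();
        Map<Integer, String> out = new HashMap<>();
        for (int id : ids) out.put(id, "rating-" + id);
        return out;
    }

    synchronized CompletableFuture<String> submit(int id) {
        CompletableFuture<String> f =
                pending.computeIfAbsent(id, k -> new CompletableFuture<>());
        if (!flushScheduled) {                      // open a 10 ms batching window
            flushScheduled = true;
            timer.schedule(this::flush, 10, TimeUnit.MILLISECONDS);
        }
        return f;
    }

    private synchronized void flush() {             // one call for the whole window
        Map<Integer, CompletableFuture<String>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        flushScheduled = false;
        Map<Integer, String> results = batchFetch(batch.keySet());
        batch.forEach((id, f) -> f.complete(results.get(id)));
    }
}
```

Callers each get their own future, but all requests submitted within the same window are satisfied by a single backend call.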

Global context (all tomcat threads)

Ideally the coalescing is done at the global application level so that requests from any Tomcat thread from any user can be coalesced together.

For example, if a HystrixCommand is configured to support batching of requests from any user to fetch movie ratings, then when any user thread in the same JVM makes such a request, Hystrix merges it with the other requests in that JVM into a single network call.

• Note that the collapser passes a single HystrixRequestContext object to the merged network call; for this to be a valid option, the downstream system must be able to handle this case.

User request context (single tomcat thread)

If HystrixCommand is configured to only process batch requests from a single user, Hystrix will only coalesce requests from a single Tomcat thread.

• For example, if a user wants to load tags for 300 movies, Hystrix can merge those 300 network calls into a single call.

Object Modeling and Code Complexity

Sometimes an object model that makes logical sense for the consumers of the objects does not match efficient resource utilization for the producers of those objects.

For example, given 300 video objects, you iterate over them and call getSomeAttribute() on each; done naively, this can result in 300 network calls, which may quickly exhaust resources.

There are some manual ways around this, such as asking the user to declare which video object attributes they want to get before they call the getSomeAttribute() method, so they can all be prefetched.

Alternatively, you could split the object model so that the user has to get the list of videos from one place, and then request properties for that list of videos from another place.

These approaches can make your API and object model unwieldy and out of step with users' mental models and usage patterns. As multiple developers work on the codebase, they lead to subtle bugs and inefficiencies, because an optimization made for one use case can be broken by another use case taking a new path through the code.

By moving the merging logic into the Hystrix layer, it no longer matters how you build the object model, in what order calls are made, or whether individual developers know about the optimization.

• The getSomeAttribute() method can be placed wherever it fits best and called in whatever way suits the usage pattern; the collapser automatically batches the calls within the time window.

Request Cache

HystrixCommand and HystrixObservableCommand implementations can define a cache key, which is then used to deduplicate calls within a request context in a concurrency-aware way (the result can be returned without invoking the dependency, because the result for the same cache key has already been cached).

Here is an example flow involving the lifecycle of an HTTP request, and the two threads performing work within that request:

The benefits of requesting cache are:

• Different code paths can execute Hystrix commands without worrying about duplication of work.

This is especially useful in large code bases where many developers implement different functions.

For example, multiple request paths need to obtain the user's Account object, which can be requested like this:

Account account = new UserGetAccount(accountId).execute();

// or

Observable<Account> accountObservable = new UserGetAccount(accountId).observe();

Hystrix RequestCache executes the underlying run() method only once, and both threads executing the HystrixCommand receive the same data even though they instantiated different command instances.

• Data retrieval is consistent across requests.

Instead of returning a different value (or fallback) each time the command is executed, the first response is cached, and subsequent identical requests will return the cached response.

• Eliminates duplicate thread execution.

Since the request cache sits before the construct() or run() method call, Hystrix can cancel the call before the calling thread executes.

If Hystrix did not implement request caching, each command would have to implement the deduplication itself inside its construct() or run() method, which would take effect only after a thread has been queued and executed.
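The deduplication idea can be sketched with a map of futures keyed by cache key. This is a minimal sketch of the concept, assuming nothing about Hystrix's actual internals; the names are illustrative:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a request-scoped cache: computeIfAbsent ensures the underlying
// work runs once per cache key; later callers get the same future.
class RequestCache {
    static final AtomicInteger backendCalls = new AtomicInteger();
    private final ConcurrentHashMap<String, CompletableFuture<String>> cache =
            new ConcurrentHashMap<>();

    // Stand-in for executing a command whose cache key is the accountId.
    CompletableFuture<String> execute(String accountId) {
        return cache.computeIfAbsent(accountId,
                id -> CompletableFuture.supplyAsync(() -> {
                    backendCalls.incrementAndGet();  // the "run()" work
                    return "Account-" + id;
                }));
    }
}
```

Two callers using the same key receive the same result while the backend work executes only once, which mirrors the de-duplication the request cache provides; in Hystrix the map lives in the HystrixRequestContext so the cache is scoped to one user request.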

Six: Source code entry

There is a highly decoupled extension mechanism in Spring Boot: Spring Factories. This mechanism is modeled after the SPI extension mechanism in Java.

What is the SPI mechanism

SPI stands for Service Provider Interface. The idea of the Java SPI mechanism, briefly: each abstract module in a system often has several alternative implementations, such as the logging module, the XML-parsing module, or the JDBC module. In object-oriented design we generally recommend that modules program against interfaces rather than hard-code implementation classes. Once a concrete implementation class appears in the code, the pluggability principle is violated: replacing an implementation requires modifying code. To allow the implementation to be chosen dynamically at assembly time rather than fixed in the program, a service-discovery mechanism is needed.

Java SPI provides exactly such a mechanism: a way to find a service implementation for a given interface. It is somewhat similar to the idea of IoC in that it moves control of assembly out of the program. This mechanism is very important in modular design.

SPI mechanism in Spring Boot

Spring also has a loading mechanism similar to Java SPI: the implementation class names for an interface are configured in the META-INF/spring.factories file, and the program reads these configuration files and instantiates the classes.

This custom SPI mechanism is the basis for the Spring Boot Starter implementation.

Spring Factories implementation principle

The SpringFactoriesLoader class is defined in the spring-core package, which implements the function of retrieving the META-INF/spring.factories file and obtaining the configuration of the specified interface. Two external methods are defined in this class:

loadFactories obtains instances of the implementation classes configured for a given interface; it returns a list of objects.

loadFactoryNames obtains the implementation class names configured for a given interface; it returns a list of class names.

The key to both methods is retrieving the spring.factories files from the given ClassLoader and parsing them to obtain the list of class names.
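The essence of that parsing step can be sketched as follows. This is an illustrative simplification of what SpringFactoriesLoader does with spring.factories (each line maps an interface name to a comma-separated list of implementation class names); `FactoriesParser` is a made-up name, and the sketch ignores details of the real file format such as backslash line continuations:

```java
import java.util.*;

// Sketch of spring.factories parsing: key = interface FQCN,
// value = comma-separated implementation class names.
class FactoriesParser {
    static Map<String, List<String>> parse(String content) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;  // skip comments
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String key = line.substring(0, eq).trim();
            List<String> names = new ArrayList<>();
            for (String n : line.substring(eq + 1).split(",")) names.add(n.trim());
            // merge entries for the same interface found in multiple files
            result.merge(key, names, (a, b) -> { a.addAll(b); return a; });
        }
        return result;
    }
}
```

Given this map, loadFactoryNames amounts to a lookup by interface name, and loadFactories additionally reflects over each class name to instantiate it.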

References

https://github.com/Netflix/Hystrix/wiki

Author: JD Logistics Feng Zhiwen

Source: JD Cloud developer community Ziqishuo Tech

