Microservices - Principles and Use of the Hystrix Circuit Breaker

Foreword

       In distributed systems, the unavailability of a single underlying service often makes the entire system unavailable, a phenomenon known as the service avalanche effect. A common way to cope with a service avalanche is to degrade services manually. Hystrix gives us another option.

Definition of the service avalanche effect

The service avalanche effect is the process by which the unavailability of a service provider makes its callers unavailable, and the unavailability is gradually amplified.

Causes of the service avalanche effect

I simplify the participants in a service avalanche to a service provider and a service caller, and divide the avalanche into the following three phases to analyze its causes:

  1. The service provider becomes unavailable

  2. Retries increase the traffic

  3. The service caller becomes unavailable

Each phase of a service avalanche may have different causes. For example, a service provider may become unavailable because of:

  • Hardware failure

  • Program bugs

  • Cache breakdown

  • A large number of user requests

      A hardware failure may be a server host going down because of damaged hardware, or a network hardware failure that makes the service provider unreachable.
      Cache breakdown generally occurs when the cache application restarts and all cached data is cleared, or when a large number of cache entries expire within a short time. The resulting flood of cache misses sends requests straight to the back end, overloading the service provider and making it unavailable.
      Before a flash sale or a big promotion begins, if preparation is insufficient, the large number of requests initiated by users can also make the service provider unavailable.

The causes of retries increasing the traffic are:

  • User retries

  • Retry logic in code

 

  After the service provider becomes unavailable, users who cannot stand the long wait keep refreshing the page or even resubmitting the form.

  On the service caller side, there is often a lot of logic that retries after a service exception.
  Both kinds of retry further increase the request traffic.

Finally, the main cause of the service caller becoming unavailable is resource exhaustion caused by synchronous waiting.

      When the service caller uses synchronous calls, a large number of waiting threads occupy system resources. Once thread resources are exhausted, the services that the caller itself provides also become unavailable, and the service avalanche effect occurs.

Strategies for coping with a service avalanche

  Different coping strategies can be used for the different causes of a service avalanche:

  1. Flow control

  2. Improved caching

  3. Automatic service scaling

  4. Degradation on the service caller side

 Flow control measures include:

  • Rate limiting at the gateway

  • Rate limiting through user interaction

  • Disabling retries

  Because of Nginx's high performance, leading Internet companies now make heavy use of Nginx + Lua gateways for traffic control, which is why OpenResty is becoming more and more popular.
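  One common way to limit the rate of incoming requests is a token bucket. The following is a minimal Java sketch of that idea, not a description of how Nginx or OpenResty implement limiting. The class name TokenBucketLimiter, its parameters, and the injectable clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical token-bucket limiter; all names are illustrative only. */
class TokenBucketLimiter {
    private final long capacity;        // maximum number of tokens (burst size)
    private final long nanosPerToken;   // time needed to earn one token
    private final LongSupplier clock;   // injectable time source, in nanoseconds
    private long tokens;
    private long lastRefill;

    TokenBucketLimiter(long capacity, long tokensPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.nanosPerToken = 1_000_000_000L / tokensPerSecond;
        this.clock = clock;
        this.tokens = capacity;         // start with a full bucket
        this.lastRefill = clock.getAsLong();
    }

    /** Returns true if the request may pass, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        long earned = (now - lastRefill) / nanosPerToken;
        if (earned > 0) {
            // Add the tokens earned since the last refill, capped at capacity.
            tokens = Math.min(capacity, tokens + earned);
            lastRefill += earned * nanosPerToken;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

  In production code the clock would typically be System::nanoTime, and a rejected request would receive an immediate error response instead of waiting in a queue.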

  Specific measures for rate limiting through user interaction are:

  1. Use a loading animation to extend the time users are willing to wait.

  2. Add a mandatory waiting period to the submit button.

  Measures to improve caching include:

  • Cache preloading

  • Changing synchronous refresh to asynchronous refresh
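  One possible reading of asynchronous refresh is that an expired entry is still served immediately while a background task reloads it, so a wave of expirations does not send every request to the back end at once. The sketch below assumes that reading. AsyncRefreshCache, its loader function, and the injectable clock are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache that serves stale values while refreshing in the background. */
class AsyncRefreshCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<K, V> loader;   // loads a value from the back end
    private final long ttlMillis;
    private final Executor executor;       // runs the background refresh tasks
    private final LongSupplier clock;      // injectable time source, in milliseconds

    AsyncRefreshCache(Function<K, V> loader, long ttlMillis, Executor executor, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.executor = executor;
        this.clock = clock;
    }

    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            // First access still loads synchronously; preloading would avoid this.
            V value = loader.apply(key);
            entries.put(key, new Entry<>(value, clock.getAsLong()));
            return value;
        }
        boolean expired = clock.getAsLong() - entry.loadedAt >= ttlMillis;
        // Only one refresh per key is scheduled at a time.
        if (expired && refreshing.add(key)) {
            executor.execute(() -> {
                try {
                    entries.put(key, new Entry<>(loader.apply(key), clock.getAsLong()));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value;   // the stale value is returned without waiting
    }
}
```

  The trade-off is that callers may briefly see stale data, which is usually acceptable for the kind of data that is cached in the first place.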

  Measures for automatic service scaling include:

  • AWS Auto Scaling

  Degradation measures on the service caller side include:

  • Resource isolation

  • Classifying dependent services

  • Failing fast on calls to unavailable services

  Resource isolation mainly means isolating the thread pools used to call services.

  According to the specific business, we divide dependent services into strong dependencies and weak dependencies. If a strong dependency is unavailable, the current business operation is suspended; if a weak dependency is unavailable, the current business operation can continue.

Failing fast on calls to unavailable services is generally achieved with timeouts, circuit breakers, and degradation after the circuit breaker trips.

Using Hystrix to prevent a service avalanche

        Hystrix [hɪst'rɪks] means porcupine, an animal whose back is covered with quills that give it the ability to protect itself. Netflix's Hystrix is a library that helps with timeout handling and fault tolerance when distributed systems interact, and it likewise has the ability to protect a system.

Hystrix's design principles include:

  • Resource isolation

  • Circuit breaker

  • Command pattern

Resource isolation

  To keep leaks and fires from spreading, a cargo ship divides its hold into multiple compartments. This way of isolating resources to reduce risk is called the bulkhead pattern (Bulkheads).

Hystrix applies the same pattern to the service caller.

  In a highly service-oriented system, a single piece of business logic often depends on multiple services. For example:

  The product details service depends on the product service, the price service, and the product review service.

  Calls to the three dependent services share the thread pool of the product details service. If the product review service becomes unavailable, all threads in the pool end up blocked waiting for its response, which causes a service avalanche.

  Hystrix isolates resources by allocating a separate thread pool to each dependent service, thereby avoiding a service avalanche. When the product review service is unavailable, even if all 20 threads allocated to it are in a synchronous wait state, calls to the other dependent services are not affected.
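  The thread pool isolation described above can be sketched with plain java.util.concurrent classes. This is an illustration of the bulkhead idea under simplifying assumptions, not Hystrix's own implementation. BulkheadPools and the dependency names used with it are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical bulkhead: one small, bounded thread pool per dependent service. */
class BulkheadPools {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;

    BulkheadPools(int threadsPerDependency) {
        this.threadsPerDependency = threadsPerDependency;
    }

    private ExecutorService poolFor(String dependency) {
        // No queue: when every thread is busy, new calls are rejected immediately.
        return pools.computeIfAbsent(dependency, name -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy()));
    }

    /** Runs the call in the dependency's own pool, or returns the fallback if that pool is full. */
    <T> Future<T> submit(String dependency, Callable<T> call, Supplier<T> fallback) {
        try {
            return poolFor(dependency).submit(call);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.completedFuture(fallback.get());
        }
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

  With this layout, a hung review service can occupy at most its own threads. Calls to the price or product pools still find free threads, and further review calls fail fast with the fallback instead of piling up.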

Circuit breaker pattern

  Service failure rate = number of failed requests / total number of requests.

       The circuit breaker's switch from the closed state to the open state is decided by comparing the service's current failure rate with a configured threshold.

  1. When the circuit breaker is closed, requests are allowed through. If the current failure rate is below the configured threshold, the breaker stays closed. If the current failure rate reaches the threshold, the breaker switches to the open state.

  2. When the circuit breaker is open, requests are blocked.

  3. After the circuit breaker has been open for a period of time, it automatically enters the half-open state and allows exactly one request through. If that request succeeds, the breaker returns to the closed state. If that request fails, the breaker stays open and subsequent requests are blocked.

         The circuit breaker guarantees that when the called service misbehaves, the caller gets a result quickly and avoids a large amount of synchronous waiting. Because the breaker tries a request again after a period of time, it also makes it possible for calls to the service to recover.
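  The three states above can be sketched as a small state machine. This is a simplified illustration, not Hystrix's code: SimpleCircuitBreaker, the minimumRequests guard, and the injectable clock are assumptions, and the counters here cover all requests since the last state change instead of a sliding time window.

```java
import java.util.function.LongSupplier;

/** Hypothetical circuit breaker (fuse) with closed, open, and half-open states. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int errorThresholdPercentage; // failure rate that trips the breaker
    private final int minimumRequests;          // assumed guard against tiny samples
    private final long sleepWindowMillis;       // time to stay open before a trial request
    private final LongSupplier clock;           // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercentage, int minimumRequests,
                         long sleepWindowMillis, LongSupplier clock) {
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.minimumRequests = minimumRequests;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN;   // let exactly one trial request through
            return true;
        }
        return state == State.CLOSED;  // OPEN and HALF_OPEN block everything else
    }

    synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {
            close();                   // trial request succeeded
        } else if (state == State.CLOSED) {
            total++;
        }
    }

    synchronized void recordFailure() {
        if (state == State.HALF_OPEN) {
            trip();                    // trial request failed, stay open
        } else if (state == State.CLOSED) {
            total++;
            failures++;
            if (total >= minimumRequests && failures * 100 >= errorThresholdPercentage * total) {
                trip();
            }
        }
    }

    synchronized State state() {
        return state;
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        failures = 0;
    }

    private void close() {
        state = State.CLOSED;
        total = 0;
        failures = 0;
    }
}
```

  A caller would check allowRequest() before each call, return the degraded result when it is false, and report the outcome with recordSuccess() or recordFailure() otherwise.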

Command pattern

  Hystrix uses the command pattern (by extending the HystrixCommand class) to wrap the specific service invocation logic (the run method), and adds degradation logic (getFallback) that runs after a service call fails.
       We can also set the circuit breaker and thread pool parameters for the current command in its constructor, as the following code shows:

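import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class Service1HystrixCommand extends HystrixCommand<Response> {
    private final Service1 service;
    private final Request request;

    public Service1HystrixCommand(Service1 service, Request request) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("ServiceGroup"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("service1query"))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("service1ThreadPool"))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(20)) // size of the service's thread pool
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        // error percentage at which the breaker switches from closed to open
                        .withCircuitBreakerErrorThresholdPercentage(60)
                        // how long the breaker stays open before it lets a trial request through
                        .withCircuitBreakerSleepWindowInMilliseconds(3000)));
        this.service = service;
        this.request = request;
    }

    // The actual service call
    @Override
    protected Response run() {
        return service.call(request);
    }

    // Degradation logic, executed when the call fails, times out, or is rejected
    @Override
    protected Response getFallback() {
        return Response.dummy();
    }
}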

  After a service object is built with the command pattern, the service has circuit breaker and thread pool capabilities.

Hystrix's internal processing logic

  1. Construct a Hystrix Command object and call its execution method.

  2. Hystrix checks whether the circuit breaker for the current service is open. If it is, the degradation method getFallback is executed.

  3. If the circuit breaker is closed, Hystrix checks whether the thread pool for the current service can accept a new request. If the thread pool is full, the degradation method getFallback is executed.

  4. If the thread pool accepts the request, Hystrix calls the service's specific logic, the run method.

  5. If the service execution fails, the degradation method getFallback is executed and the result is reported to Metrics to update the service's health.

  6. If the service execution times out, the degradation method getFallback is executed and the result is reported to Metrics to update the service's health.

  7. If the service execution succeeds, the normal result is returned.

  8. If the degradation method getFallback executes successfully, the degraded result is returned.

  9. If the degradation method getFallback fails, an exception is thrown.
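  The nine steps above can be condensed into a single method. The sketch below follows only the steps as listed here and is not Hystrix's real execution path. CommandFlowSketch and its parameters (circuitOpen, metrics, and so on) are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical walk-through of the processing steps; all names are illustrative. */
class CommandFlowSketch {
    static <T> T execute(BooleanSupplier circuitOpen,
                         ExecutorService pool,
                         long timeoutMillis,
                         Callable<T> run,
                         Supplier<T> getFallback,
                         Consumer<Boolean> metrics) {
        // Step 2: an open circuit breaker goes straight to the fallback.
        if (circuitOpen.getAsBoolean()) {
            return getFallback.get();
        }
        Future<T> future;
        try {
            // Steps 3-4: the pool either accepts the call or rejects it when full.
            future = pool.submit(run);
        } catch (RejectedExecutionException e) {
            return getFallback.get();
        }
        try {
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            metrics.accept(true);       // Step 7: success is reported and returned
            return result;
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true);        // free the worker thread if it is still running
            metrics.accept(false);      // Steps 5-6: failure or timeout is reported
            // Steps 8-9: the fallback result is returned, or its exception propagates.
            return getFallback.get();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return getFallback.get();
        }
    }
}
```

  The metrics callback stands in for the health statistics that the circuit breaker reads when it decides whether to open.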

Implementation of Hystrix Metrics

  Hystrix's Metrics store the current health of a service, including the total number of service calls and the number of failed calls. With these counts, the circuit breaker can compute the current failure rate of the service and compare it with the configured threshold to decide its state transitions. The implementation of Metrics is therefore important.

The earlier sliding window implementation

  In these versions, Hystrix uses its own sliding window data structure to record the events of the current time window (successes, failures, timeouts, thread pool rejections, and so on).
  When an event occurs, the data structure decides from the current time whether to count in the existing bucket or to create a new one, and then updates the counters in that bucket.
  Because these updates are executed concurrently by multiple threads, the code contains many locking operations and the logic is fairly complex.
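  To make the bucket idea concrete, here is a much simplified, lock-based rolling counter. It is a sketch of the general technique and not the data structure Hystrix actually used. LockedRollingCounter and its parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/** Hypothetical rolling counter: a ring of time buckets guarded by one lock. */
class LockedRollingCounter {
    private final long bucketMillis;
    private final int bucketCount;
    private final long[] bucketStart;   // start time of the bucket held in each slot
    private final int[] successes;
    private final int[] failures;
    private final LongSupplier clock;   // injectable time source, in milliseconds

    LockedRollingCounter(long bucketMillis, int bucketCount, LongSupplier clock) {
        this.bucketMillis = bucketMillis;
        this.bucketCount = bucketCount;
        this.bucketStart = new long[bucketCount];
        this.successes = new int[bucketCount];
        this.failures = new int[bucketCount];
        this.clock = clock;
        Arrays.fill(bucketStart, -1L);  // -1 marks a slot that has never been used
    }

    synchronized void record(boolean success) {
        long now = clock.getAsLong();
        long start = now - now % bucketMillis;
        int slot = (int) ((now / bucketMillis) % bucketCount);
        if (bucketStart[slot] != start) {
            // The slot still holds an expired bucket: reuse it as a new bucket.
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    /** Failure percentage over the buckets that are still inside the window. */
    synchronized int failurePercentage() {
        long now = clock.getAsLong();
        long oldestStart = now - now % bucketMillis - (bucketCount - 1) * bucketMillis;
        int ok = 0;
        int failed = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] >= 0 && bucketStart[i] >= oldestStart) {
                ok += successes[i];
                failed += failures[i];
            }
        }
        int total = ok + failed;
        return total == 0 ? 0 : failed * 100 / total;
    }
}
```

  Even in this reduced form, every read and write has to take the lock, which hints at why a multi-threaded version with finer-grained locking becomes complicated.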

The later sliding window implementation

       In these versions, Hystrix began to use RxJava's Observable.window() to implement the sliding window.
       RxJava's window uses a background thread to create new buckets, which avoids the problem of concurrent bucket creation.
       RxJava's single-threaded, lock-free nature also keeps the count updates thread-safe, which makes the code more concise.
       The following is a simple sliding window Metrics that I implemented with RxJava's window method. A few lines of code are enough to complete the statistics, which shows how powerful RxJava is:

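import java.util.concurrent.TimeUnit;

import com.alibaba.fastjson.JSON;
import org.apache.commons.lang.math.RandomUtils; // assumes Commons Lang 2.x
import org.junit.Test;
import rx.Observable;                            // RxJava 1.x
import rx.internal.util.InternalObservableUtils;

public class TimeWindowTest {
    @Test
    public void timeWindowTest() throws Exception {
        // Emit a random 0 or 1 every 50 ms
        Observable<Integer> source = Observable.interval(50, TimeUnit.MILLISECONDS)
                .map(i -> RandomUtils.nextInt(2));
        // Split the stream into 1-second windows and count each value per window
        source.window(1, TimeUnit.SECONDS).subscribe(window -> {
            int[] metrics = new int[2];
            window.subscribe(i -> metrics[i]++,
                    InternalObservableUtils.ERROR_NOT_IMPLEMENTED,
                    () -> System.out.println("Window metrics: " + JSON.toJSONString(metrics)));
        });
        TimeUnit.SECONDS.sleep(3);
    }
}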

Origin www.cnblogs.com/liboware/p/11908612.html