What is Hystrix? How does it achieve fault tolerance?

 Hystrix is a fault-tolerance library for distributed systems, open sourced by Netflix. It wraps calls between services so that when a dependency fails or is overloaded, the system can keep providing limited functionality instead of crashing completely. Hystrix achieves fault tolerance through the following features:

  1. Degradation (Fallback)

  When a call to a dependent service fails, times out, or is rejected, Hystrix can return an alternative, degraded response instead of propagating the error. This keeps the rest of the system functional even while the dependency is unavailable.
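
  The fallback idea can be sketched in plain Java. This is a minimal illustration, not Hystrix's API: the class `FallbackSketch` and the method `callWithFallback` are hypothetical names chosen for this example, and the "services" are just lambdas.

```java
import java.util.function.Supplier;

// Illustrative only: FallbackSketch is a hypothetical name, not part of Hystrix.
final class FallbackSketch {

    // Runs the primary call; if it throws, returns the degraded response instead.
    static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // The dependency failed: degrade rather than propagate the error.
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String live = FallbackSketch.<String>callWithFallback(
                () -> "live recommendations",
                () -> "cached defaults");
        String degraded = FallbackSketch.<String>callWithFallback(
                () -> { throw new IllegalStateException("service down"); },
                () -> "cached defaults");
        System.out.println(live);
        System.out.println(degraded);
    }
}
```

  In Hystrix itself the same role is played by overriding the `getFallback()` method of a `HystrixCommand`.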

  2. Circuit Breaker

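  Hystrix implements the circuit breaker pattern, named after the breaker in an electrical circuit. It tracks the results of calls to each dependency over a rolling window. If enough requests have been made and the error percentage exceeds a configured threshold (by default, at least 20 requests with 50% errors), the circuit opens and Hystrix rejects further calls immediately, returning the fallback instead. This takes load off the failing dependency and prevents cascading failures. After a sleep window (5 seconds by default), the circuit becomes half-open and lets a single trial request through: if it succeeds, the circuit closes and normal traffic resumes; if it fails, the circuit stays open for another sleep window.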
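
  The closed, open, and half-open states can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not Hystrix's implementation: the class `CircuitBreakerSketch` is hypothetical, it counts consecutive failures rather than an error percentage over a rolling window, and the current time is passed in as a parameter so the behaviour is deterministic.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the closed / open / half-open state machine.
// Simplification: opens after N consecutive failures, whereas Hystrix
// uses an error percentage over a rolling window.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long sleepWindowMillis; // time to stay open before a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;

    CircuitBreakerSketch(int failureThreshold, long sleepWindowMillis) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    synchronized State state() {
        return state;
    }

    // nowMillis is supplied by the caller (e.g. System.currentTimeMillis()).
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAtMillis < sleepWindowMillis) {
                return fallback.get(); // short-circuit: do not touch the dependency
            }
            state = State.HALF_OPEN;   // sleep window elapsed: allow one trial call
        }
        try {
            T result = primary.get();
            state = State.CLOSED;      // success closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;    // trip (or re-trip) the breaker
                openedAtMillis = nowMillis;
            }
            return fallback.get();
        }
    }
}
```

  For simplicity the sketch holds a lock while the dependency is called; a production breaker would avoid serializing calls this way.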

  3. Resource isolation (Thread Pooling and Request Batching)

  Hystrix can give each dependent service its own thread pool, so that a slow or failing dependency can exhaust only its own threads and not those of the whole application. Semaphore isolation is available as a lighter-weight alternative. In addition, Hystrix supports request collapsing, which batches multiple requests made within a short time window into a single call to reduce the load on the dependent service.
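
  Thread pool isolation can be approximated with the standard `java.util.concurrent` classes. The sketch below is an assumption-laden illustration, not Hystrix code: `BulkheadSketch` is a hypothetical class that gives one dependency a small, fixed pool with no queue, so that calls are rejected (and answered with a fallback value) as soon as all of its threads are busy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one instance per dependency, each with its own threads.
final class BulkheadSketch {
    private final ThreadPoolExecutor pool;

    BulkheadSketch(int threads) {
        // SynchronousQueue holds no tasks: when every thread is busy,
        // AbortPolicy makes submit() throw RejectedExecutionException.
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
    }

    // Runs the call on this dependency's own pool and waits for the result.
    <T> T call(Callable<T> primary, T fallbackValue) {
        Future<T> future;
        try {
            future = pool.submit(primary);
        } catch (RejectedExecutionException e) {
            return fallbackValue; // pool saturated: fail fast instead of queueing
        }
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallbackValue;
        } catch (ExecutionException e) {
            return fallbackValue; // the dependency call itself failed
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

  With one `BulkheadSketch` per dependency, a dependency that hangs can block only the threads in its own pool; calls to other dependencies keep running on theirs.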

  4. Real-time monitoring and measurement

  Hystrix collects metrics for each dependency in near real time, such as request volume, success rate, failure rate, and response time, and can display them on the Hystrix Dashboard. This helps operations staff discover and solve problems in a timely manner.
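
  The kind of figure a dashboard shows can be derived from simple counters. The sketch below is illustrative only: `CommandMetricsSketch` is a hypothetical class, and it keeps cumulative totals, whereas Hystrix computes its statistics over a rolling time window.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: cumulative counters for one dependency.
final class CommandMetricsSketch {
    private final AtomicLong successes = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();

    void markSuccess() {
        successes.incrementAndGet();
    }

    void markFailure() {
        failures.incrementAndGet();
    }

    long totalRequests() {
        return successes.get() + failures.get();
    }

    // Error percentage as an integer from 0 to 100; 0 when nothing was recorded.
    int errorPercentage() {
        long total = totalRequests();
        return total == 0 ? 0 : (int) (failures.get() * 100 / total);
    }
}
```

  An error percentage like this is also the input a circuit breaker compares against its threshold.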

  5. Automatic Recovery

  Once the dependent service is healthy again, Hystrix resumes normal request processing without manual intervention: when a trial request sent in the half-open state succeeds, the circuit breaker closes and requests flow through to the service as usual.

  6. Timeout processing

  Hystrix lets you configure a timeout for calls to each dependent service (1 second by default). A request that exceeds its timeout is treated as a failure: the caller receives the fallback response, and the failure counts toward the circuit breaker's error statistics.
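
  A timeout with a fallback can be sketched with a `Future`. This is a minimal illustration, not how Hystrix is implemented internally: `TimeoutSketch` and `callWithTimeout` are hypothetical names, and the executor is supplied by the caller.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound the time spent waiting on a dependency call.
final class TimeoutSketch {

    static <T> T callWithTimeout(ExecutorService pool, Callable<T> primary,
                                 long timeoutMillis, T fallbackValue) {
        Future<T> future = pool.submit(primary);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);  // interrupt the slow call so its thread is freed
            return fallbackValue; // a timeout is treated like any other failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallbackValue;
        } catch (ExecutionException e) {
            return fallbackValue; // the call itself threw
        }
    }
}
```

  Running the call on a separate thread is what makes the timeout enforceable: the caller stops waiting after `timeoutMillis` even if the dependency never responds.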

  In general, Hystrix achieves fault tolerance through circuit breaking, degradation, resource isolation, timeouts, and real-time monitoring. It lets developers handle failures of dependent services in distributed systems more gracefully, improving system availability and stability. Note, however, that Netflix has announced in the official GitHub repository that Hystrix is no longer in active development and is in maintenance mode. For new projects, an actively maintained fault tolerance library such as Resilience4j or Sentinel is recommended.

Origin blog.csdn.net/Blue92120/article/details/132598435