Timeout mechanism of Spring Cloud Feign+Ribbon

In a data product project, we needed to integrate with a third-party application in Enterprise WeChat. When the user module microservice was called through Feign and used the WeChat code to obtain the access_token and user factory information, the Feign call timed out, was retried, and reported an error. This article records how the problem was solved.

1. Reproducing the problem

1. Some of Spring Cloud’s dependencies are as follows:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.3.RELEASE</version>
</parent>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Dalston.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-eureka</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-feign</artifactId>
    </dependency>
</dependencies>

2. WeChat-related interface documents

After the front end has configured the callback domain name for the enterprise ID, it calls WeChat's API to obtain the code. See the documentation:

https://work.weixin.qq.com/api/doc/90000/90135/91022

Note: according to the documentation, the code can only be used once, so the access_token obtained with it must be cached. In this project it is cached in Redis for later use by message push and other features.
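The caching idea can be sketched as follows. This is only a minimal illustration: the project uses Redis, while the sketch uses an in-memory map as a stand-in, and the names AccessTokenCache and getOrFetch as well as the TTL parameter are hypothetical and do not come from the project.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** In-memory stand-in for the Redis cache (hypothetical helper, not the project's code). */
final class AccessTokenCache {

    /** A cached token together with the time at which it stops being valid. */
    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis; // assumed lifetime of a cached token

    AccessTokenCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns the cached token for the key if it has not expired; otherwise
     * calls the fetcher once and caches its result. The fetcher runs while the
     * map entry is locked, which is acceptable for a sketch but would need
     * care around a slow remote call.
     */
    String getOrFetch(String key, Supplier<String> fetcher) {
        long now = System.currentTimeMillis();
        Entry entry = entries.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // still valid, do not spend another one-time code
            }
            return new Entry(fetcher.get(), now + ttlMillis);
        });
        return entry.token;
    }
}
```

With this shape, the one-time code is only spent when the fetcher actually runs, so later features such as message push read the cached token instead of calling WeChat again.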

     


3. Request flow chart

      

[Figure: request flow chart showing the numbered stages of the request link]

2. Cause analysis

First, in the overall request link, stage 2 is where the Feign request is made, but nothing Feign-related was configured in the yml configuration file, so Feign was running with its default configuration. When the problem occurred, I checked the Feign documentation and found that the default timeout that triggers a Feign retry is 1s.

The Feign timeouts were therefore reconfigured. The current Feign configuration is as follows:

feign:
  client:
    config:
      organization:
        connectTimeout: 5000
        readTimeout: 5000

where:

  • organization is the name of the service that Feign calls.

  • connectTimeout is the time allowed for establishing the connection (this presumably includes fetching the service list held by Eureka, but that is speculation).

  • readTimeout is the time allowed for the request itself once the connection has been established.

Second, with the above configuration in place, the request logs of the organization and data services showed that the requests themselves were established successfully. However, whenever the request to the WeChat interface in stage 3 was delayed, Feign's retry was triggered and the call was made a second time.

The WeChat interface in stage 3 had in fact been called: WeChat simply did not respond in time because of network or other problems, yet the code had already been consumed. When stage 2 then retried with the same code, the WeChat interface reported that the code had already been used.
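The failure mode described here can be reproduced with a small simulation. This is only a sketch under stated assumptions: OneTimeCodeServer and RetryingClient are hypothetical stand-ins for the WeChat interface and the retrying caller, and a lost response models the client-side timeout.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the WeChat interface: each code is accepted only once. */
final class OneTimeCodeServer {
    private final Set<String> consumed = new HashSet<>();

    /** Returns true if the code was accepted, false if it had already been consumed. */
    boolean redeem(String code) {
        return consumed.add(code);
    }
}

/** Hypothetical caller that retries with the same code when a response is lost. */
final class RetryingClient {

    /**
     * Sends the same code up to maxAttempts times. The responses to the first
     * lostResponses attempts never reach the client (it times out), although
     * the server has already processed them. Returns what the caller sees.
     */
    static String call(OneTimeCodeServer server, String code,
                       int maxAttempts, int lostResponses) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean accepted = server.redeem(code); // the server-side effect always happens
            if (attempt <= lostResponses) {
                continue; // timed out waiting, so the result is never seen
            }
            return accepted ? "OK" : "CODE_ALREADY_USED";
        }
        return "TIMEOUT";
    }
}
```

With two attempts and one lost response, the caller ends up with CODE_ALREADY_USED even though the first request succeeded on the server. With a single attempt it would see only a timeout, which is why retries are harmful for a non-idempotent call like this one.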

Finally, there is one more issue: all services in the project are deployed as single instances, and both Feign and Ribbon in Spring Cloud have retry functions. Feign in Spring Cloud integrates Ribbon. To unify the behavior of the two, Spring Cloud has set Feign's default retry strategy to feign.Retryer#NEVER_RETRY (that is, never retry) since the C release (Camden). Feign calls are therefore still carried out, and retried, through Ribbon.

3. Related configuration tests

Testing showed that either the Feign configuration or the Ribbon configuration can be used to cut a request off on timeout.

Version: Spring Cloud Greenwich.SR1

Configuration 1. Configure Feign only, that is, the Feign settings override Ribbon's default timeout configuration.

Note, however, that Ribbon retries are still triggered with this configuration.

feign:
  client:
    config:
      eureka-client:
        connectTimeout: 1000
        readTimeout: 1000

Configuration 2. Configure Ribbon only

Note: there is a pitfall here. If MaxAutoRetriesNextServer is not set to 0, one retry still occurs even in a single-instance deployment. To prevent retries altogether, explicitly set both MaxAutoRetriesNextServer=0 and MaxAutoRetries=0.

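ribbon:
  ReadTimeout: 4000
  ConnectTimeout: 4000
  OkToRetryOnAllOperations: true
  MaxAutoRetriesNextServer: 0 # number of other instances to try after the current instance has failed
  MaxAutoRetries: 1 # retries on the current instance (1 retry means at most 2 attempts; set to 0 to disable)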
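The pitfall in the note above becomes clearer when the number of attempts is counted. The sketch below uses the commonly cited upper bound for Ribbon, (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1). The class and method names are hypothetical, and the formula is offered as a rule of thumb rather than as a statement of Ribbon's exact internals.

```java
/** Hypothetical helper that estimates how many times Ribbon may send one request. */
final class RibbonRetryMath {

    /**
     * Upper bound on attempts: every server that is tried gets the first call
     * plus maxAutoRetries retries, and up to maxAutoRetriesNextServer further
     * servers are tried after the first one.
     */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        if (maxAutoRetries < 0 || maxAutoRetriesNextServer < 0) {
            throw new IllegalArgumentException("retry counts must not be negative");
        }
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }
}
```

For example, maxAttempts(0, 1) (MaxAutoRetries=0, MaxAutoRetriesNextServer=1) and maxAttempts(1, 0) (the configuration above) both evaluate to 2, so either way the request may be sent twice. In a single-instance deployment the "next server" is simply the same instance again. Only maxAttempts(0, 0) evaluates to 1.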

Configuration 3. Neither Feign nor Ribbon is configured.

Note: testing showed that Ribbon's default configuration is used in this case. The default retry settings are as follows.

MaxAutoRetriesNextServer=1
MaxAutoRetries=0

public LoadBalancerContext(ILoadBalancer lb) {
    this.clientName = "default";
    this.maxAutoRetriesNextServer = 1;
    this.maxAutoRetries = 0;
    this.defaultRetryHandler = new DefaultLoadBalancerRetryHandler();
    this.okToRetryOnAllOperations =
        DefaultClientConfigImpl.DEFAULT_OK_TO_RETRY_ON_ALL_OPERATIONS;
    this.lb = lb;
}

Version: Spring Cloud Dalston.SR1. The test conclusions are consistent with those for Greenwich.SR1.

Note: default timeouts of the Ribbon component in Dalston.SR1:

public static final int DEFAULT_READ_TIMEOUT = 5000;
public static final int DEFAULT_CONNECT_TIMEOUT = 2000;

Default timeouts of the Ribbon component in Greenwich.SR1:

public static final int DEFAULT_CONNECT_TIMEOUT = 1000;
public static final int DEFAULT_READ_TIMEOUT = 1000;

Origin blog.csdn.net/moshowgame/article/details/132092887