In one project (a data product), we needed to integrate with a third-party application in Enterprise WeChat. When the user module called a microservice through Feign and used the WeChat code to obtain the access_token and user factory information, Feign timed out, retried, and reported an error. This article records how the problem was solved.
1. Reproducing the problem
1. The relevant Spring Cloud dependencies are as follows:
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.3.RELEASE</version>
</parent>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Dalston.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-eureka</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-feign</artifactId>
    </dependency>
</dependencies>
2. WeChat-related interface documents
After the front end has configured the callback domain name using the enterprise ID, it calls WeChat's API to obtain the code. See the documentation:
https://work.weixin.qq.com/api/doc/90000/90135/91022
Note: The code can only be used once (see the documentation), so the access_token obtained with it must be cached. In this project it is cached in Redis for later use by message push and other features.
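The caching idea can be sketched as follows. This is only an illustration, not the project's code: an in-memory map stands in for Redis, and the class name TokenCache, the method getOrLoad, and the injected clock are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical in-memory stand-in for the Redis cache described above. */
public class TokenCache {
    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injected so that expiry can be tested

    public TokenCache(LongSupplier clock) {
        this.clock = clock;
    }

    /**
     * Returns the cached token for the key while it is still valid.
     * Otherwise calls the loader (the expensive remote call), stores the
     * result with the given time to live, and returns it.
     * Note: two threads may both call the loader when the entry is missing.
     */
    public String getOrLoad(String key, long ttlMillis, Supplier<String> loader) {
        long now = clock.getAsLong();
        Entry cached = store.get(key);
        if (cached != null && cached.expiresAtMillis > now) {
            return cached.token;
        }
        String fresh = loader.get();
        store.put(key, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

In production the clock would be System::currentTimeMillis and the loader would be the call that fetches the token from WeChat; with Redis, the time to live would be set on the key instead of being stored in the entry.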
3. Request flow chart
2. Cause analysis
First, stage 2 of the request chain is where the Feign request is made, but nothing for Feign was configured in the yml file, so Feign was running with its default configuration. When the problem occurred, I checked the Feign documentation and found that the default timeout after which Feign retries is 1s.
The Feign timeouts were therefore reconfigured. The current Feign configuration is as follows:
feign:
  client:
    config:
      organization:
        connectTimeout: 5000
        readTimeout: 5000
where:
- organization is the name of the service that Feign calls.
- connectTimeout is the time allowed for establishing the connection (this presumably also includes fetching the service list from Eureka, but that is speculation).
- readTimeout is the time allowed for the request to complete once the connection has been established.
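The difference between the two timeouts can be reproduced without Feign. The sketch below uses the JDK's HttpURLConnection, which exposes the same two settings, against a local test server that answers slowly. The class name TimeoutDemo and the /slow path are made up for the example, and the server merely simulates a slow downstream service.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;

public class TimeoutDemo {
    /**
     * Starts a local server that waits serverDelayMs before answering,
     * calls it with the given read timeout, and reports whether the
     * client gave up with a read timeout.
     */
    public static boolean readTimesOut(int serverDelayMs, int readTimeoutMs) {
        HttpServer server;
        try {
            // Port 0 lets the operating system pick a free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/slow", exchange -> {
            try {
                Thread.sleep(serverDelayMs); // simulate a slow downstream call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/slow").toURL().openConnection();
            conn.setConnectTimeout(1000);        // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMs);  // limit for waiting on the response
            conn.getResponseCode();
            return false;
        } catch (SocketTimeoutException e) {
            // The connection was established, but the response came too late.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            server.stop(0);
        }
    }
}
```

Note that in the timeout case the server still processes the request in full. Only the caller stops waiting, which is exactly the situation that makes a retry dangerous for a one-time code.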
Second, with the above configuration in place, the request logs of the organization and data services show that the requests are established successfully. However, as soon as the request to the WeChat interface in stage 3 is delayed, Feign's retry is triggered and the call is made a second time.
In stage 3 the WeChat interface was in fact called; WeChat simply did not respond in time because of network or other issues, yet the code had already been consumed. When stage 2 then retries the call with the same code, WeChat reports that the code has already been used.
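This failure mode can be reproduced in a few lines. The sketch below is a simulation only: FakeWeChat is a hypothetical stand-in for the real interface, and the error strings are invented for the example rather than taken from WeChat's API.

```java
import java.util.HashSet;
import java.util.Set;

public class OneTimeCodeDemo {
    /** Hypothetical stand-in for the WeChat endpoint: each code is valid once. */
    public static final class FakeWeChat {
        private final Set<String> consumed = new HashSet<>();

        public String exchange(String code) {
            // Set.add returns false when the code was consumed before.
            if (!consumed.add(code)) {
                return "ERROR: code already used";
            }
            return "user-info-for-" + code;
        }
    }

    /**
     * Calls the endpoint up to maxAttempts times with the same code.
     * The response to the first attempt is treated as lost, which
     * simulates a read timeout: the server did the work, but the
     * caller never saw the answer.
     */
    public static String callWithRetry(FakeWeChat weChat, String code, int maxAttempts) {
        String result = "ERROR: timeout";
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String response = weChat.exchange(code);
            if (attempt == 1) {
                continue; // the code is consumed, but the response is lost
            }
            result = response;
            break;
        }
        return result;
    }
}
```

With a single attempt the caller sees only the timeout. With one retry the caller sees the "already used" error instead, so the retry hides the real cause and cannot succeed.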
Finally, note that the services in this project are all deployed as single instances, and that both Feign and Ribbon in Spring Cloud have a retry feature. Feign in Spring Cloud integrates Ribbon; to unify the behavior of the two, Spring Cloud has set Feign's default retry strategy to feign.Retryer#NEVER_RETRY (that is, never retry) since the C (Camden) release. The retries seen on Feign calls are therefore actually performed by Ribbon.
3. Related configuration tests
Testing showed that both the Feign configuration and the Ribbon configuration can enforce a timeout that cuts the request off.
Version: Spring Cloud Greenwich.SR1
Configuration 1. Configure only Feign, so that Feign's settings override Ribbon's default timeouts.
Note, however, that Ribbon retries are still triggered with this configuration.
feign:
  client:
    config:
      eureka-client:
        connectTimeout: 1000
        readTimeout: 1000
Configuration 2. Configure only Ribbon.
Note: There is a pitfall here. If MaxAutoRetriesNextServer is not set to 0, one retry still occurs even in a single-instance deployment. If you do not want any retries, you must explicitly set both MaxAutoRetriesNextServer=0 and MaxAutoRetries=0.
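ribbon:
  ReadTimeout: 4000
  ConnectTimeout: 4000
  OkToRetryOnAllOperations: true
  MaxAutoRetriesNextServer: 0 # number of other instances to try after the current instance has failed (0 = none)
  MaxAutoRetries: 1 # number of retries on the current instance (1 = two attempts in total)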
Configuration 3. Configure neither Feign nor Ribbon.
Note: Testing showed that Ribbon's default configuration applies in this case, as follows.
MaxAutoRetriesNextServer=1
MaxAutoRetries=0
public LoadBalancerContext(ILoadBalancer lb) {
    this.clientName = "default";
    this.maxAutoRetriesNextServer = 1;
    this.maxAutoRetries = 0;
    this.defaultRetryHandler = new DefaultLoadBalancerRetryHandler();
    this.okToRetryOnAllOperations =
            DefaultClientConfigImpl.DEFAULT_OK_TO_RETRY_ON_ALL_OPERATIONS;
    this.lb = lb;
}
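The retry counts observed in the three configurations are consistent with the commonly cited rule that Ribbon makes at most (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1) attempts per call. The helper below is a hypothetical illustration of that arithmetic, not part of Ribbon, and it assumes that the failed request is of a kind that Ribbon is allowed to retry at all.

```java
public class RibbonRetryMath {
    /**
     * Upper bound on the number of attempts for one call: every server
     * that is tried gets one initial attempt plus maxAutoRetries retries,
     * and up to maxAutoRetriesNextServer additional servers are tried.
     */
    public static int totalAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }
}
```

For the defaults (MaxAutoRetries=0, MaxAutoRetriesNextServer=1) this gives 2 attempts, that is, one retry. With a single instance, the "next server" is the same instance again, which would explain why the one-time code is submitted twice. Only 0 and 0 give a single attempt.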
Version: the test conclusions for Spring Cloud Dalston.SR1 and Greenwich.SR1 are consistent.
Note: The default timeouts of the Ribbon component in Dalston.SR1 are:
public static final int DEFAULT_READ_TIMEOUT = 5000;
public static final int DEFAULT_CONNECT_TIMEOUT = 2000;
The default timeouts of the Ribbon component in Greenwich.SR1 are:
public static final int DEFAULT_CONNECT_TIMEOUT = 1000;
public static final int DEFAULT_READ_TIMEOUT = 1000;
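To get a feel for what these defaults mean for the caller, the rough estimate below multiplies the number of attempts by the time a single attempt may take. It is a simplification under stated assumptions: every attempt uses up both the connect and the read timeout in full, the attempt count follows the commonly cited (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1) rule, and no outer timeout (for example from Hystrix) cuts the call off earlier. The class and method names are hypothetical.

```java
public class WorstCaseLatency {
    /** Rough upper bound, in milliseconds, on how long one call can block. */
    public static long upperBoundMillis(int connectTimeoutMs, int readTimeoutMs,
                                        int maxAutoRetries, int maxAutoRetriesNextServer) {
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return attempts * ((long) connectTimeoutMs + readTimeoutMs);
    }
}
```

With the default retry settings (0 and 1), the Dalston.SR1 timeouts give up to 2 × (2000 + 5000) = 14000 ms, while the Greenwich.SR1 timeouts give up to 2 × (1000 + 1000) = 4000 ms.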