Introduction
Eureka is Netflix's open-source service for service registration and discovery. Spring Cloud Eureka wraps Eureka, adding a friendlier UI and making it more convenient to use. However, because Eureka itself contains several layers of caching, service status updates lag behind. The most common symptom is that the status is not updated promptly after a service goes offline, so a service consumer calls the offline instance and the request fails. Based on Spring Cloud Eureka 1.4.4.RELEASE, this article describes Eureka's caching mechanism, assuming the default region and zone.
1. AP Features
From the perspective of the CAP theorem, Eureka is an AP system: it prioritizes availability (A) and partition tolerance (P) and does not guarantee strong consistency (C), only eventual consistency. This is why its architecture contains so many caches.
<center>Eureka High Availability Architecture</center>
2. Service status
The Eureka service status enum class is com.netflix.appinfo.InstanceInfo.InstanceStatus.
Status | Description | Status | Description |
---|---|---|---|
UP | online | OUT_OF_SERVICE | out of service |
DOWN | offline | UNKNOWN | unknown |
STARTING | starting | | |
3. Eureka Server
In the Eureka high-availability architecture, a Eureka Server can also register with other servers as a Client. Multiple nodes register with one another to form a Eureka cluster, and the nodes in the cluster treat each other as peers. When a Eureka Client registers, renews, or updates its status with a Server, the receiving node first updates its own service registration information and then synchronizes the change to the other peer nodes one by one.
**[Note]** If server-A registers with server-B in one direction only, server-A treats server-B as a peer, so data received by server-A is synchronized to server-B, but data received by server-B is not synchronized to server-A.
3.1 Cache mechanism
Eureka Server keeps service registration information in three variables: registry, readWriteCacheMap, and readOnlyCacheMap. By default, a scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap every 30s, another scheduled task runs every 60s to evict nodes that have not renewed for more than 90s, Eureka Clients fetch service registration information from readOnlyCacheMap every 30s, and the UI reads service registration information from the registry.
<center>![](./cache mechanism.png)</center>
Three-level cache
Cache | Type | Description |
---|---|---|
registry | ConcurrentHashMap | Updated in real time; member variable of class AbstractInstanceRegistry; the UI reads service registration information from here |
readWriteCacheMap | Guava Cache/LoadingCache | Updated in real time; member variable of class ResponseCacheImpl; entries expire after 180 seconds |
readOnlyCacheMap | ConcurrentHashMap | Updated periodically; member variable of class ResponseCacheImpl; refreshed from readWriteCacheMap every 30s by default. Eureka Clients fetch service registration information from here by default, but can be configured to fetch directly from readWriteCacheMap |
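To make the staleness concrete, here is a minimal toy model of the three-level lookup. It is a sketch, not Eureka's actual code: the class ThreeLevelCacheSketch, its methods, and the "ABSENT" marker are hypothetical, plain maps stand in for the Guava cache, and the 30s timer is replaced by an explicit method call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the three-level cache; field names mirror the article, logic is simplified. */
final class ThreeLevelCacheSketch {
    // Level 1: authoritative registry, changed on every register/cancel.
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    // Level 2: stand-in for the Guava LoadingCache; invalidated on every registry write.
    private final Map<String, String> readWriteCacheMap = new ConcurrentHashMap<>();
    // Level 3: only refreshed by the periodic sync task.
    private final Map<String, String> readOnlyCacheMap = new ConcurrentHashMap<>();

    void register(String app, String status) {
        registry.put(app, status);
        readWriteCacheMap.remove(app); // next read reloads from the registry
    }

    void cancel(String app) {
        registry.remove(app);
        readWriteCacheMap.remove(app);
    }

    /** What the UI sees: the registry itself. */
    String readForUi(String app) {
        return registry.getOrDefault(app, "ABSENT");
    }

    /** What a client sees, depending on eureka.server.useReadOnlyResponseCache. */
    String readForClient(String app, boolean useReadOnlyResponseCache) {
        if (!useReadOnlyResponseCache) {
            return loadThroughReadWrite(app);
        }
        String cached = readOnlyCacheMap.get(app);
        if (cached == null) {
            cached = loadThroughReadWrite(app);
            readOnlyCacheMap.put(app, cached);
        }
        return cached;
    }

    private String loadThroughReadWrite(String app) {
        return readWriteCacheMap.computeIfAbsent(app, k -> registry.getOrDefault(k, "ABSENT"));
    }

    /** Models the scheduled task that runs every 30s by default. */
    void syncReadOnlyFromReadWrite() {
        for (String app : readOnlyCacheMap.keySet()) {
            readOnlyCacheMap.put(app, loadThroughReadWrite(app));
        }
    }

    public static void main(String[] args) {
        ThreeLevelCacheSketch cache = new ThreeLevelCacheSketch();
        cache.register("order-service", "UP");
        System.out.println(cache.readForClient("order-service", true)); // UP
        cache.cancel("order-service");
        System.out.println(cache.readForUi("order-service"));           // ABSENT
        System.out.println(cache.readForClient("order-service", true)); // UP (stale)
        cache.syncReadOnlyFromReadWrite();
        System.out.println(cache.readForClient("order-service", true)); // ABSENT
    }
}
```

In this model the UI sees the cancellation immediately, while a client reading through readOnlyCacheMap keeps seeing the old entry until the next sync.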
Cache-related configuration
Property | Default | Description |
---|---|---|
eureka.server.useReadOnlyResponseCache | true | When true, Clients fetch data from readOnlyCacheMap; when false, readOnlyCacheMap is skipped and Clients fetch directly from readWriteCacheMap |
eureka.server.responseCacheUpdateIntervalMs | 30000 | Period for synchronizing readWriteCacheMap to readOnlyCacheMap, default 30s |
eureka.server.evictionIntervalTimerInMs | 60000 | Period of the task that evicts nodes that have not renewed, default 60s |
eureka.instance.leaseExpirationDurationInSeconds | 90 | Time without renewal after which a node is evicted, default 90s |
Key classes
Class name | Description |
---|---|
com.netflix.eureka.registry.AbstractInstanceRegistry | Stores service registration information; holds the registry and responseCache member variables |
com.netflix.eureka.registry.ResponseCacheImpl | Holds the readWriteCacheMap and readOnlyCacheMap member variables |
4. Eureka Client
Eureka Client has two roles: service provider and service consumer. As a service consumer, it is generally used with Ribbon or Feign (Feign uses Ribbon internally). After startup, the Eureka Client, as a service provider, immediately registers with the Server and then renews its lease every 30s by default. As a service consumer, it immediately fetches the full service registration information from the Server and then fetches incremental updates every 30s by default. Ribbon fetches the registration information of the services it uses from the Client 1s after startup and then refreshes it every 30s by default, keeping only instances whose status is UP.
Two-level cache
Cache | Type | Description |
---|---|---|
localRegionApps | AtomicReference | Updated periodically; member variable of class DiscoveryClient; holds the service registration information kept by the Eureka Client, fetched from the Server immediately after startup and then updated incrementally every 30s by default |
upServerListZoneMap | ConcurrentHashMap | Updated periodically; member variable of class LoadBalancerStats; holds the registration information of the services Ribbon uses whose status is UP, fetched from the Client 1s after startup and then refreshed every 30s by default |
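The second level can be pictured as a filtered copy of the first. The sketch below only illustrates the "keep UP instances" rule described above. It is not Ribbon's code, and UpOnlyFilterSketch, Instance, and upServers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: derives a Ribbon-style server list from the client's cached instances. */
final class UpOnlyFilterSketch {
    enum InstanceStatus { UP, DOWN, STARTING, OUT_OF_SERVICE, UNKNOWN }

    static final class Instance {
        final String hostPort;
        final InstanceStatus status;

        Instance(String hostPort, InstanceStatus status) {
            this.hostPort = hostPort;
            this.status = status;
        }
    }

    /** Returns host:port of every instance whose status is UP, preserving order. */
    static List<String> upServers(List<Instance> localRegionApps) {
        List<String> up = new ArrayList<>();
        for (Instance instance : localRegionApps) {
            if (instance.status == InstanceStatus.UP) {
                up.add(instance.hostPort);
            }
        }
        return up;
    }

    public static void main(String[] args) {
        List<Instance> cached = Arrays.asList(
                new Instance("10.0.0.1:8080", InstanceStatus.UP),
                new Instance("10.0.0.2:8080", InstanceStatus.DOWN),
                new Instance("10.0.0.3:8080", InstanceStatus.STARTING));
        System.out.println(upServers(cached)); // [10.0.0.1:8080]
    }
}
```

Because this copy is only rebuilt on Ribbon's refresh, an instance that has already left the Client's cache can still be selected until the next refresh.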
Cache-related configuration
Property | Default | Description |
---|---|---|
eureka.instance.leaseRenewalIntervalInSeconds | 30 | Eureka Client lease renewal period, default 30s |
eureka.client.registryFetchIntervalSeconds | 30 | Eureka Client incremental update period, default 30s (updates are normally incremental; a full fetch is performed on timeout, on inconsistency with the Server, and in similar cases) |
ribbon.ServerListRefreshInterval | 30000 | Ribbon refresh period, default 30s |
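Key classes
Class name | Description |
---|---|
com.netflix.discovery.DiscoveryClient | Eureka Client; responsible for registration, renewal, and updates. The method initScheduledTasks() initializes the scheduled renewal and update tasks |
com.netflix.loadbalancer.PollingServerListUpdater | Ribbon; refreshes the registration information of the services in use. start initializes the scheduled refresh task |
com.netflix.loadbalancer.LoadBalancerStats | Ribbon; stores the registration information of the services in use whose status is UP |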
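5. Maximum time for a service consumer to notice a change under the default configuration
Eureka Client | Time | Description |
---|---|---|
Goes online | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | readWrite -> readOnly -> Client -> Ribbon, 30s each |
Normal shutdown | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | A normal shutdown (killing the process with kill or kill -15) gives the process a chance to clean up: DiscoveryClient.shutdown() updates its own status to DOWN on the Server and then sends a DELETE request to deregister itself. registry and readWriteCacheMap are updated in real time, so the UI no longer shows the service instance |
Abnormal shutdown | 30 + 60 (evict) * 2 + 30 + 30 + 30 = 240s | An abnormal shutdown (killing the process with kill -9, or a process crash) does not trigger DiscoveryClient.shutdown(). The Eureka Server relies on the eviction task, which runs every 60s and removes instances that have not renewed for more than 90s, to delete the service instance from registry and readWriteCacheMap |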
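Consider the following case:
- At 0s, the service goes offline abruptly, without its Eureka Client being notified;
- At 29s, the first eviction check runs; the time since the last renewal has not exceeded 90s;
- At 89s, the second eviction check runs; the time since the last renewal has still not exceeded 90s;
- At 149s, the third eviction check runs; the time since the last renewal now exceeds 90s, so the service instance is deleted from registry and readWriteCacheMap;
- At 179s, the scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap;
- At 209s, the Eureka Client fetches from the Eureka Server's readOnlyCacheMap;
- At 239s, Ribbon refreshes from the Eureka Client.
Therefore, in the extreme case, the maximum time before a service consumer notices the change approaches 240s.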
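The arithmetic in this timeline can be checked with a small calculation. The sketch below assumes whole seconds, treats a lease as expired once at least 90s have passed since the last renewal, and assumes every later stage fires a full period after the previous one (the worst case). The class PerceptionDelaySketch and its method names are hypothetical.

```java
/** Worst-case arithmetic for the default intervals; all values are in seconds. */
final class PerceptionDelaySketch {
    static final int READ_ONLY_SYNC_S = 30;  // eureka.server.responseCacheUpdateIntervalMs
    static final int EVICT_INTERVAL_S = 60;  // eureka.server.evictionIntervalTimerInMs
    static final int LEASE_EXPIRY_S = 90;    // eureka.instance.leaseExpirationDurationInSeconds
    static final int CLIENT_FETCH_S = 30;    // eureka.client.registryFetchIntervalSeconds
    static final int RIBBON_REFRESH_S = 30;  // ribbon.ServerListRefreshInterval

    /**
     * Time of the first eviction run at or after lease expiry, given when the
     * first eviction run happens after the instance died at t = 0.
     */
    static int evictionTime(int firstEvictRunAt) {
        int t = firstEvictRunAt;
        while (t < LEASE_EXPIRY_S) {
            t += EVICT_INTERVAL_S;
        }
        return t;
    }

    /** Worst case: each downstream cache refreshes a full period after the previous step. */
    static int consumerNoticesAt(int removedFromReadWriteAt) {
        return removedFromReadWriteAt + READ_ONLY_SYNC_S + CLIENT_FETCH_S + RIBBON_REFRESH_S;
    }

    static int abnormalShutdown(int firstEvictRunAt) {
        return consumerNoticesAt(evictionTime(firstEvictRunAt));
    }

    static int gracefulShutdown() {
        // DiscoveryClient.shutdown() removes the instance from the registry at t = 0.
        return consumerNoticesAt(0);
    }

    public static void main(String[] args) {
        System.out.println(evictionTime(29));     // 149
        System.out.println(abnormalShutdown(29)); // 239
        System.out.println(gracefulShutdown());   // 90
    }
}
```

With the first eviction run at 29s the result is 239s, matching the timeline; shifting that first run earlier or later within the 60s period only lowers the total in this model.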
<center>![](./最长感知时间.png)</center>
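6. Countermeasures
Choosing Eureka as the service registry means accepting that it prioritizes availability (A) and partition tolerance (P) and does not guarantee strong consistency (C). If strong consistency (C) must come first, consider a CP system such as ZooKeeper as the service registry. Distributed systems usually run multiple nodes, so a lagging status update when a single node comes online has little impact. This section therefore focuses on countermeasures for lagging status updates after a service goes offline.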
6.1 Eureka Server
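1. Shorten the readOnlyCacheMap update period. Shortening the period of this scheduled task reduces the lag.
    eureka.server.responseCacheUpdateIntervalMs: 10000 # Eureka Server readOnlyCacheMap update period
2. Disable readOnlyCacheMap. Small and medium-sized systems can consider this option; Eureka Clients then fetch service registration information directly from readWriteCacheMap.
    eureka.server.useReadOnlyResponseCache: false # whether to use readOnlyCacheMap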
6.2 Eureka Client
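1. Use fault-tolerance mechanisms on the service consumer, such as Spring Cloud Retry and Hystrix. Ribbon, Feign, and Zuul can all be configured with Retry. When a service consumer accesses a node that has already gone offline, it usually gets a ConnectTimeout, and the Retry mechanism can then try the next node.
2. Shorten the update periods on the service consumer. The two-level cache in Eureka Client and Ribbon delays status updates, so shortening the periods of these two scheduled tasks reduces the lag. For example:
    eureka.client.registryFetchIntervalSeconds: 5 # Eureka Client update period
    ribbon.ServerListRefreshInterval: 2000 # Ribbon update period
3. Make sure the service provider shuts down normally. Use kill or kill -15 to take a service offline and avoid kill -9. Killing the process with kill or kill -15 triggers the Eureka Client's shutdown() method, which actively deletes the registration information from the Server's registry and readWriteCacheMap, so there is no need to wait for the Server's eviction task.
4. Delay the shutdown on the service provider. Before taking the service offline, call the interface that sets the service status stored in the Eureka Server to DOWN or OUT_OF_SERVICE, and only then shut down. The gap between the two steps depends on the caching mechanism and configuration. For example, with the defaults, waiting 90s after calling the interface before shutting down ensures that service consumers will not call the offline service instance.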
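The retry idea in item 1 of this list can be sketched without any framework. The following is a generic illustration of "on failure, try the next node", not the Spring Cloud, Ribbon, or Hystrix API. RetryNextNodeSketch and callWithRetry are hypothetical names, and the failing call merely simulates a connect timeout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Generic retry-on-next-node sketch; not tied to any Spring Cloud class. */
final class RetryNextNodeSketch {
    /** Tries the servers in order until a call succeeds or maxAttempts is reached. */
    static <R> R callWithRetry(List<String> servers, Function<String, R> call, int maxAttempts) {
        RuntimeException last = null;
        int attempts = Math.min(maxAttempts, servers.size());
        for (int i = 0; i < attempts; i++) {
            try {
                return call.apply(servers.get(i));
            } catch (RuntimeException e) {
                last = e; // e.g. a connect timeout against an instance that is already offline
            }
        }
        throw new IllegalStateException("all " + attempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        // The first address stands for an instance that is offline but still cached.
        List<String> cachedServers = Arrays.asList("10.0.0.1:8080", "10.0.0.2:8080");
        String result = callWithRetry(cachedServers, server -> {
            if (server.startsWith("10.0.0.1")) {
                throw new RuntimeException("connect timeout: " + server);
            }
            return "ok from " + server;
        }, 2);
        System.out.println(result); // ok from 10.0.0.2:8080
    }
}
```

Retrying masks the stale cache entry instead of removing it, so it complements the shorter refresh periods rather than replacing them.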
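7. Real-time awareness of service shutdown at the gateway
In software engineering, there is no problem that an intermediate layer cannot solve, and the gateway is the intermediate layer between service providers and service consumers. Take the Spring Cloud Zuul gateway as an example: as a Eureka Client, the gateway holds service registration information, and service consumers send requests through the gateway, which forwards them to service providers. All that is needed is for a service provider, when it goes offline, to notify the gateway to invalidate that service in the service list the gateway holds. To keep the gateway independent, a separate service can be implemented to receive shutdown notifications and coordinate the gateway cluster. The next article will describe in detail how the gateway achieves real-time awareness of service shutdown. Stay tuned!
Author: 冯永彪 Source: 宜信技术学院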