Eureka's Caching Mechanism in Detail

Introduction

Eureka is Netflix's open-source component for service registration and service discovery. Spring Cloud Eureka wraps Eureka and adds a friendlier UI, which makes it easier to use. However, Eureka itself contains several layers of cache, so service status updates lag behind. The most common symptom: after a service goes offline its status is not updated in time, and requests from service consumers to the offline instance fail. This article describes Eureka's caching mechanism based on Spring Cloud Eureka 1.4.4.RELEASE, assuming the default region and zone.

1. AP characteristics

In terms of CAP theory, Eureka is an AP system: it gives priority to availability (A) and partition tolerance (P) and does not guarantee strong consistency (C), only eventual consistency. That is why its architecture contains so many caches.


2. Service status

Eureka's service status enum class: com.netflix.appinfo.InstanceInfo.InstanceStatus

| Status | Description |
|---|---|
| UP | Online |
| DOWN | Offline |
| STARTING | Starting |
| OUT_OF_SERVICE | Out of service |
| UNKNOWN | Unknown |

3. Eureka Server

In Eureka's high-availability architecture, a Eureka Server can also register with another Server as a Client. Multiple nodes that register with each other form a Eureka cluster, and the nodes in the cluster regard each other as peers. Eureka Clients register, renew, and update their status with a Server; after the receiving node updates its own service registration information, it synchronizes the change to its other peer nodes one by one.

[Note] If server-A registers with server-B in one direction only, server-A regards server-B as a peer node and synchronizes the data it receives to server-B, but server-B does not synchronize the data it receives to server-A.
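
The one-way case in the note can be sketched with a toy model. This is only an illustration under stated assumptions: `PeerNode`, `registerWith`, and `update` are hypothetical names, the registry is reduced to a map of instance id to status, and the `isReplication` flag stands in for whatever the real server uses to avoid forwarding a replicated update a second time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of one Eureka Server node; not the real Eureka implementation. */
class PeerNode {
    final String name;
    final Map<String, String> registry = new ConcurrentHashMap<>(); // instanceId -> status
    final List<PeerNode> peers = new ArrayList<>();                 // nodes this server registered with

    PeerNode(String name) {
        this.name = name;
    }

    /** Registering with another server makes that server a peer of this node only. */
    void registerWith(PeerNode other) {
        peers.add(other);
    }

    /**
     * Stores an instance status. Updates that come straight from a client are
     * forwarded to every peer; replicated updates are stored but not forwarded again.
     */
    void update(String instanceId, String status, boolean isReplication) {
        registry.put(instanceId, status);
        if (!isReplication) {
            for (PeerNode peer : peers) {
                peer.update(instanceId, status, true);
            }
        }
    }

    public static void main(String[] args) {
        PeerNode a = new PeerNode("server-A");
        PeerNode b = new PeerNode("server-B");
        a.registerWith(b); // one-way: A treats B as a peer, B does not know A

        a.update("order-service-1", "UP", false); // a client registers with A
        b.update("pay-service-1", "UP", false);   // a client registers with B

        System.out.println("server-A knows: " + a.registry.keySet());
        System.out.println("server-B knows: " + b.registry.keySet());
    }
}
```

In this model server-B ends up with both instances, while server-A only knows the instance that registered with it directly. Registering in both directions removes the asymmetry.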

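3.1 Caching mechanism

Eureka Server keeps service registration information in three variables: registry, readWriteCacheMap, and readOnlyCacheMap. By default, one scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap every 30s, and another runs every 60s to remove nodes that have not renewed their lease for more than 90s. Eureka Client refreshes its service registration information from readOnlyCacheMap every 30s, while the UI reads it from registry.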


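Three-level cache

| Cache | Type | Description |
|---|---|---|
| registry | ConcurrentHashMap | Updated in real time. Member variable of class AbstractInstanceRegistry. The UI reads service registration information from here. |
| readWriteCacheMap | Guava Cache/LoadingCache | Updated in real time. Member variable of class ResponseCacheImpl. Entries are cached for 180 seconds. |
| readOnlyCacheMap | ConcurrentHashMap | Updated periodically. Member variable of class ResponseCacheImpl. Refreshed from readWriteCacheMap every 30s by default. Eureka Client reads service registration information from here by default, and can be configured to read directly from readWriteCacheMap. |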
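
To make the lag between the three levels concrete, here is a minimal sketch. It assumes plain maps in place of the Guava cache, leaves out expiry and scheduling, and calls the sync step by hand. `ThreeLevelCache`, `register`, `fetch`, `fetchForUi`, and `syncReadOnlyCache` are hypothetical names, not methods of ResponseCacheImpl.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of the three server-side caches; field names follow the article. */
class ThreeLevelCache {
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>();
    private final Map<String, List<String>> readWriteCacheMap = new ConcurrentHashMap<>();
    private final Map<String, List<String>> readOnlyCacheMap = new ConcurrentHashMap<>();
    private final boolean useReadOnlyResponseCache;

    ThreeLevelCache(boolean useReadOnlyResponseCache) {
        this.useReadOnlyResponseCache = useReadOnlyResponseCache;
    }

    /** Write path: update registry and drop the cached entry so the next read reloads it. */
    void register(String app, List<String> instances) {
        registry.put(app, List.copyOf(instances));
        readWriteCacheMap.remove(app);
    }

    /** Loads an entry into readWriteCacheMap from registry if it is not cached yet. */
    private List<String> loadReadWrite(String app) {
        return readWriteCacheMap.computeIfAbsent(app, k -> registry.getOrDefault(k, List.of()));
    }

    /** Stand-in for the scheduled task (every 30s by default): refresh readOnlyCacheMap. */
    void syncReadOnlyCache() {
        for (String app : readOnlyCacheMap.keySet()) {
            readOnlyCacheMap.put(app, loadReadWrite(app));
        }
    }

    /** Read path used by Eureka Clients. */
    List<String> fetch(String app) {
        if (!useReadOnlyResponseCache) {
            return loadReadWrite(app);
        }
        return readOnlyCacheMap.computeIfAbsent(app, this::loadReadWrite);
    }

    /** Read path used by the UI: straight from registry. */
    List<String> fetchForUi(String app) {
        return registry.getOrDefault(app, List.of());
    }

    public static void main(String[] args) {
        ThreeLevelCache cache = new ThreeLevelCache(true);
        cache.register("order-service", List.of("10.0.0.1:8080"));
        cache.fetch("order-service");               // warms readOnlyCacheMap
        cache.register("order-service", List.of()); // the only instance goes offline

        System.out.println("UI sees:                  " + cache.fetchForUi("order-service"));
        System.out.println("client sees before sync:  " + cache.fetch("order-service"));
        cache.syncReadOnlyCache();
        System.out.println("client sees after sync:   " + cache.fetch("order-service"));
    }
}
```

In this model the UI sees the empty list immediately, while a client keeps receiving the stale instance until the next sync. Constructing the cache with `false` skips readOnlyCacheMap and removes that stale window.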

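Cache-related configuration

| Setting | Default | Description |
|---|---|---|
| eureka.server.useReadOnlyResponseCache | true | Clients read from readOnlyCacheMap. If false, readOnlyCacheMap is skipped and clients read directly from readWriteCacheMap. |
| eureka.server.responseCacheUpdateIntervalMs | 30000 | Interval at which readWriteCacheMap is synchronized to readOnlyCacheMap; 30s by default. |
| eureka.server.evictionIntervalTimerInMs | 60000 | Interval of the task that removes (evicts) nodes that have not renewed; 60s by default. |
| eureka.instance.leaseExpirationDurationInSeconds | 90 | Time without renewal after which a node is evicted; 90s by default. |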

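Key classes

| Class | Description |
|---|---|
| com.netflix.eureka.registry.AbstractInstanceRegistry | Stores service registration information; holds the registry and responseCache member variables. |
| com.netflix.eureka.registry.ResponseCacheImpl | Holds the readWriteCacheMap and readOnlyCacheMap member variables. |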

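4. Eureka Client

A Eureka Client plays two roles: service provider and service consumer. As a service consumer it is usually combined with Ribbon or Feign (Feign uses Ribbon internally). After startup, the Eureka Client as a service provider registers with the Server immediately and by default renews its lease every 30s. As a service consumer it immediately does a full fetch of the service registration information from the Server and by default does an incremental fetch every 30s. Ribbon waits 1s and then reads the service registration information it uses from the Client, refreshes it every 30s by default, and keeps only services whose status is UP.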

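Two-level cache

| Cache | Type | Description |
|---|---|---|
| localRegionApps | AtomicReference | Updated periodically. Member variable of class DiscoveryClient. Holds the service registration information of the Eureka Client; fully fetched from the Server right after startup, then incrementally updated every 30s by default. |
| upServerListZoneMap | ConcurrentHashMap | Updated periodically. Member variable of class LoadBalancerStats. Holds the service registration information that Ribbon uses, limited to status UP; refreshed from the Client 1s after startup, then every 30s by default. |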
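
The two client-side levels can be sketched in the same way. This is an illustrative model, not Ribbon's or DiscoveryClient's code: `ClientSideCaches`, `Instance`, `fetchRegistry`, and `refreshRibbonServerList` are hypothetical names, the zone map is flattened to a list of instance ids, and both refresh steps are called by hand instead of by timers.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

/** Toy model of the two client-side caches described in the table above. */
class ClientSideCaches {
    enum Status { UP, DOWN, STARTING, OUT_OF_SERVICE, UNKNOWN }

    static final class Instance {
        final String id;
        final Status status;

        Instance(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Level 1: what the Eureka Client last fetched from the Server. */
    private final AtomicReference<List<Instance>> localRegionApps =
            new AtomicReference<>(List.of());

    /** Level 2: the ids Ribbon load-balances over (only instances that were UP). */
    private volatile List<String> upServers = List.of();

    /** Stand-in for the periodic registry fetch (every 30s by default). */
    void fetchRegistry(List<Instance> fromServer) {
        localRegionApps.set(List.copyOf(fromServer));
    }

    /** Stand-in for Ribbon's periodic refresh (every 30s by default): keep UP only. */
    void refreshRibbonServerList() {
        upServers = localRegionApps.get().stream()
                .filter(instance -> instance.status == Status.UP)
                .map(instance -> instance.id)
                .collect(Collectors.toUnmodifiableList());
    }

    List<String> upServers() {
        return upServers;
    }

    public static void main(String[] args) {
        ClientSideCaches caches = new ClientSideCaches();
        caches.fetchRegistry(List.of(
                new Instance("a", Status.UP), new Instance("b", Status.DOWN)));
        caches.refreshRibbonServerList();
        System.out.println("Ribbon uses: " + caches.upServers());

        // The client learns that "a" went DOWN, but Ribbon has not refreshed yet.
        caches.fetchRegistry(List.of(
                new Instance("a", Status.DOWN), new Instance("b", Status.UP)));
        System.out.println("Ribbon still uses: " + caches.upServers());

        caches.refreshRibbonServerList();
        System.out.println("Ribbon now uses: " + caches.upServers());
    }
}
```

The middle step is the point of the sketch: even after the Eureka Client has fresh data, Ribbon keeps routing to the old list until its own refresh runs.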

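Cache-related configuration

| Setting | Default | Description |
|---|---|---|
| eureka.instance.leaseRenewalIntervalInSeconds | 30 | Eureka Client lease renewal interval; 30s by default. |
| eureka.client.registryFetchIntervalSeconds | 30 | Eureka Client incremental fetch interval; 30s by default. (Normally the fetch is incremental; on timeout, or when the result is inconsistent with the Server, a full fetch is done instead.) |
| ribbon.ServerListRefreshInterval | 30000 | Ribbon refresh interval; 30s by default. |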

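Key classes

| Class | Description |
|---|---|
| com.netflix.discovery.DiscoveryClient | Eureka Client. Responsible for registration, renewal, and fetching; the method initScheduledTasks() sets up the scheduled tasks for renewal and fetching. |
| com.netflix.loadbalancer.PollingServerListUpdater | Ribbon. Refreshes the service registration information Ribbon uses; start sets up the scheduled refresh task. |
| com.netflix.loadbalancer.LoadBalancerStats | Ribbon. Holds the service registration information in use, limited to status UP. |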

5. Maximum time for a service consumer to notice a change under the default configuration

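| Eureka Client event | Time | Description |
|---|---|---|
| Goes online | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | readWrite -> readOnly -> Client -> Ribbon, 30s each. |
| Normal shutdown | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | A normal shutdown (killing the process with kill or kill -15) gives the process a chance to clean up. DiscoveryClient.shutdown() updates its own status on the Server to DOWN and then sends a DELETE request to deregister itself. registry and readWriteCacheMap are updated in real time, so the UI no longer shows the instance. |
| Abnormal shutdown | 30 + 60 (evict) * 2 + 30 + 30 + 30 = 240s | An abnormal shutdown (kill -9 or a process crash) does not trigger DiscoveryClient.shutdown(). Eureka Server relies on the task that runs every 60s and removes services that have not renewed for more than 90s to delete the instance from registry and readWriteCacheMap. |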

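Consider the following scenario:

  • At 0s, the service goes offline directly without notifying Eureka Client;
  • At 29s, the first evict check finds that fewer than 90s have passed;
  • At 89s, the second evict check finds that fewer than 90s have passed;
  • At 149s, the third evict check finds that the lease has gone unrenewed for more than 90s, so the instance is removed from registry and readWriteCacheMap;
  • At 179s, the scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap;
  • At 209s, Eureka Client fetches from the readOnlyCacheMap of Eureka Server;
  • At 239s, Ribbon refreshes from Eureka Client.

Therefore, in the extreme case the maximum time before a service consumer notices the change approaches 240s.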
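
The timeline above can be replayed with a few lines of arithmetic. The sketch below assumes the default intervals, takes the first evict run at 29s from the scenario, and assumes every later refresh lags by a full period. `WorstCaseTimeline` and its methods are hypothetical names used only for this illustration.

```java
/** Replays the worst-case timeline with the default intervals (all values in seconds). */
class WorstCaseTimeline {
    static final int EVICT_INTERVAL = 60;   // eureka.server.evictionIntervalTimerInMs
    static final int LEASE_EXPIRATION = 90; // eureka.instance.leaseExpirationDurationInSeconds
    static final int READ_ONLY_SYNC = 30;   // readWriteCacheMap -> readOnlyCacheMap
    static final int CLIENT_FETCH = 30;     // eureka.client.registryFetchIntervalSeconds
    static final int RIBBON_REFRESH = 30;   // ribbon.ServerListRefreshInterval

    /** First evict run at which the lease has gone unrenewed for more than 90s. */
    static int evictionTime(int lastRenewal, int firstEvictRun) {
        int run = firstEvictRun;
        while (run - lastRenewal <= LEASE_EXPIRATION) {
            run += EVICT_INTERVAL;
        }
        return run;
    }

    /** Latest moment Ribbon drops the instance if each refresh lags by a full period. */
    static int ribbonNoticesAt(int lastRenewal, int firstEvictRun) {
        return evictionTime(lastRenewal, firstEvictRun)
                + READ_ONLY_SYNC + CLIENT_FETCH + RIBBON_REFRESH;
    }

    public static void main(String[] args) {
        int evicted = evictionTime(0, 29);
        int readOnly = evicted + READ_ONLY_SYNC;
        int client = readOnly + CLIENT_FETCH;
        int ribbon = client + RIBBON_REFRESH;

        System.out.println("evicted from registry:    " + evicted + "s");  // 149s
        System.out.println("readOnlyCacheMap updated: " + readOnly + "s"); // 179s
        System.out.println("Eureka Client updated:    " + client + "s");   // 209s
        System.out.println("Ribbon updated:           " + ribbon + "s");   // 239s
    }
}
```

Moving the first evict run or the last renewal shifts the eviction point, which is why 240s is a bound that is approached rather than a fixed delay.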


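6. Countermeasures

Choosing Eureka as the service registry means you have already accepted that it gives priority to availability (A) and partition tolerance (P) and does not guarantee strong consistency (C). If strong consistency (C) must come first, consider a CP system such as ZooKeeper as the service registry. Distributed systems usually run several nodes of each service, so a lagging status update when a single node comes online has little impact. The measures below therefore focus on the lag after a service goes offline.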

6.1 Eureka Server

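  • 1. Shorten the readOnlyCacheMap update interval. Shortening the period of this scheduled task reduces the lag.

    eureka.server.responseCacheUpdateIntervalMs: 10000   # Eureka Server readOnlyCacheMap update interval
  • 2. Turn off readOnlyCacheMap. Small and medium-sized systems can consider this option; Eureka Client then reads service registration information directly from readWriteCacheMap.

    eureka.server.useReadOnlyResponseCache: false        # whether to use readOnlyCacheMap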
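
A rough way to compare the two server-side options is to add up the periods of the remaining refresh steps. The sketch below assumes that each periodic copy can lag by at most one full period, as in section 5. The parameter names follow the configuration keys, but `PropagationBound` and `worstCaseMs` are hypothetical.

```java
/** Back-of-the-envelope bound for how long a registry change takes to reach Ribbon. */
class PropagationBound {
    /**
     * Worst-case delay in milliseconds between a change in readWriteCacheMap and
     * Ribbon seeing it, assuming each periodic copy lags by one full period.
     */
    static long worstCaseMs(boolean useReadOnlyResponseCache,
                            long responseCacheUpdateIntervalMs,
                            long registryFetchIntervalSeconds,
                            long ribbonServerListRefreshIntervalMs) {
        long serverLagMs = useReadOnlyResponseCache ? responseCacheUpdateIntervalMs : 0;
        return serverLagMs
                + registryFetchIntervalSeconds * 1000
                + ribbonServerListRefreshIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println("defaults:            " + worstCaseMs(true, 30_000, 30, 30_000) + " ms");
        System.out.println("10s read-only sync:  " + worstCaseMs(true, 10_000, 30, 30_000) + " ms");
        System.out.println("read-only cache off: " + worstCaseMs(false, 30_000, 30, 30_000) + " ms");
    }
}
```

With the client-side defaults left alone, the bound drops from 90s to 70s for option 1 and to 60s for option 2. The rest of the delay has to be addressed on the client side.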

6.2 Eureka Client

  • 1. Use fault tolerance on the service consumer side, such as Spring Cloud Retry and Hystrix. Ribbon, Feign, and Zuul can all be configured with Retry. When a consumer calls a node that has already gone offline it usually gets a ConnectTimeout, and the Retry mechanism can then try the next node.
  • 2. Shorten the service consumer's update intervals. The second-level caches of Eureka Client and Ribbon both add to the lag; shortening the period of both scheduled tasks reduces it, for example:

    eureka.client.registryFetchIntervalSeconds: 5        # Eureka Client update interval
    ribbon.ServerListRefreshInterval: 2000               # Ribbon update interval
  • 3. Make sure the service provider goes offline normally. Stop the service with kill or kill -15 and avoid kill -9. kill or kill -15 triggers the Eureka Client's shutdown() method when the process is killed, which actively removes the instance's registration information from registry and readWriteCacheMap on the Server instead of relying on the Server's evict task.
  • 4. Delay the service provider's shutdown. Before the service goes offline, first call an interface that sets its status stored in Eureka Server to DOWN or OUT_OF_SERVICE, and only then take it offline. The gap between the two steps depends on the caching mechanism and the configuration. With the defaults, for example, wait 90s after calling the interface before going offline, so that service consumers never call an instance that is already offline.
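
The retry idea from item 1 can be illustrated without any framework. The following is a hand-rolled sketch, not the Spring Retry or Ribbon API: `RetryNextServer` and `callWithRetry` are hypothetical names, the server list stands in for what Ribbon would supply, and any RuntimeException is treated as a failed connection.

```java
import java.util.List;
import java.util.function.Function;

/** Hand-rolled "retry on the next node" helper, for illustration only. */
class RetryNextServer {
    /**
     * Calls the servers in list order until one call succeeds.
     * At most maxAttempts servers are tried; the last failure is kept as the cause.
     */
    static <T> T callWithRetry(List<String> servers, int maxAttempts,
                               Function<String, T> call) {
        RuntimeException lastFailure = null;
        int attempts = Math.min(maxAttempts, servers.size());
        for (int i = 0; i < attempts; i++) {
            try {
                return call.apply(servers.get(i));
            } catch (RuntimeException e) {
                lastFailure = e; // for example a connect timeout on an offline node
            }
        }
        throw new IllegalStateException("all " + attempts + " attempts failed", lastFailure);
    }

    public static void main(String[] args) {
        // The first address is still in the stale server list but already offline.
        List<String> servers = List.of("10.0.0.1:8080", "10.0.0.2:8080");

        String response = callWithRetry(servers, 2, server -> {
            if (server.startsWith("10.0.0.1")) {
                throw new IllegalStateException("connect timed out: " + server);
            }
            return "ok from " + server;
        });

        System.out.println(response); // ok from 10.0.0.2:8080
    }
}
```

Retrying only helps when the request is safe to repeat and at least one healthy node remains in the list, so it complements the shorter refresh intervals rather than replacing them.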

7. Real-time detection of offline services at the gateway

There is a saying in software engineering that no problem is beyond the reach of an extra intermediate layer, and the gateway is the intermediate layer between service providers and service consumers. Take the Spring Cloud Zuul gateway as an example: as a Eureka Client, the gateway holds service registration information, and service consumers reach service providers through requests that the gateway forwards. When a service provider goes offline, it is therefore enough to notify the gateway so that it marks that provider as unavailable in the service list it holds. To keep the gateway independent, a separate service can receive the offline notifications and coordinate the gateway cluster. The next article will describe in detail how to implement real-time detection of offline services at the gateway, so stay tuned!

Author: Fengyong Biao
Content source: CreditEase Institute of Technology


Origin yq.aliyun.com/articles/705399