Diagram + source code: Eureka Server service eviction logic

Eureka Server service eviction logic

Perseverance in achievement is more important than tenacity in failure - Laroshvko

Related articles
Diagram + source code: Eureka Server startup process analysis
Diagram + source code: Eureka Client startup process analysis
Diagram + source code: Eureka Server registry cache logic
Diagram + source code: Eureka Client registry pull flow
Diagram + source code: Eureka Client service registration process
Diagram + source code: Eureka Client heartbeat mechanism flow
Diagram + source code: Eureka Client offline process analysis

Core flow chart

[Figure: core flow chart (image not available)]

Where to start the analysis

    The eviction logic is set up while the server initializes, inside the method call below. That makes sense when you think about it: the server-side eviction rules have to be defined when the server itself is initialized.

registry.openForTraffic(applicationInfoManager, registryCount);

The core idea of eviction

    Think about how eviction has to work. A client registers with the server when it initializes and then sends heartbeats at a regular interval, and the server keeps statistics on those heartbeats. Instances that have stopped sending heartbeats are collected into an expired list, and instances are removed from that list at random. If many instances have expired at once, however, the server first compares the expected number of heartbeats with the number actually received before it removes anything.

Core eviction process

Calculate the expected number of heartbeats

    The expected number of heartbeats is calculated from the instance count passed in. That count is the number of instances the server pulled from the other server nodes after registering itself locally; it is produced by the registry.syncUp() method.

public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
    // Renewals happen every 30 seconds and for a minute it should be a factor of 2.
    // Number of clients expected to send heartbeats
    this.expectedNumberOfClientsSendingRenews = count;
    // That is, expect (number of instances * 2 * 0.85) heartbeats per minute
    updateRenewsPerMinThreshold();
    /**
     * Starts EvictionTask, a scheduled background task that removes failed
     * instances; it runs every 60s
     */
    super.postInit();
}

    updateRenewsPerMinThreshold() recalculates the heartbeat threshold. Formula: number of instances * (60s / 30s) * 0.85

protected void updateRenewsPerMinThreshold() {
    // serverConfig.getExpectedClientRenewalIntervalSeconds() defaults to 30s,
    // so 60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds() == 2
    // serverConfig.getRenewalPercentThreshold() defaults to 0.85
    // 2 * 0.85: expect (number of instances * 2 * 0.85) heartbeats per minute
    this.numberOfRenewsPerMinThreshold = (int)
            (this.expectedNumberOfClientsSendingRenews *
                    (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds()) *
                    serverConfig.getRenewalPercentThreshold());
}
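To make the formula concrete, here is a minimal sketch of the same arithmetic outside Eureka. The class and method names are hypothetical, and the serverConfig getters are replaced by plain parameters; 30s and 0.85 are the defaults mentioned above.

```java
// Hypothetical stand-alone sketch; not part of Eureka.
final class RenewThresholdSketch {

    /**
     * Same formula as above: clients * (60 / renewal interval) * percent threshold,
     * truncated to an int.
     */
    static int renewsPerMinThreshold(int expectedClients,
                                     int renewalIntervalSeconds,
                                     double renewalPercentThreshold) {
        return (int) (expectedClients
                * (60.0 / renewalIntervalSeconds)
                * renewalPercentThreshold);
    }

    public static void main(String[] args) {
        // 40 clients, one heartbeat every 30s, 85% threshold
        System.out.println(renewsPerMinThreshold(40, 30, 0.85)); // prints 68
        // The same 40 clients sending a heartbeat every 10s
        System.out.println(renewsPerMinThreshold(40, 10, 0.85)); // prints 204
    }
}
```

So with 40 registered instances and default settings, the server expects more than 68 heartbeats per minute before it is willing to expire leases.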

Create the scheduled eviction task

    Create the eviction task and schedule it to run every 60s.

protected void postInit() {
    renewsLastMin.start();
    if (evictionTaskRef.get() != null) {
        evictionTaskRef.get().cancel();
    }
    evictionTaskRef.set(new EvictionTask());
    /**
     * Runs every 60s by default
     */
    evictionTimer.schedule(evictionTaskRef.get(),
           serverConfig.getEvictionIntervalTimerInMs(),
           serverConfig.getEvictionIntervalTimerInMs());
}
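The cancel-then-reschedule pattern in postInit() can be tried in isolation. The following is a minimal sketch, assuming the eviction timer behaves like java.util.Timer; the class name, the Runnable task body, and the millisecond-scale interval are all hypothetical and chosen only so the demo finishes quickly.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-alone sketch; not part of Eureka.
final class EvictionSchedulerSketch {
    private final Timer timer = new Timer("eviction-sketch", true); // daemon thread
    private final AtomicReference<TimerTask> taskRef = new AtomicReference<>();

    /** Cancels any previously scheduled task, then schedules a new periodic one. */
    void start(Runnable body, long intervalMs) {
        TimerTask previous = taskRef.get();
        if (previous != null) {
            previous.cancel();
        }
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                body.run();
            }
        };
        taskRef.set(task);
        // First run after intervalMs, then again every intervalMs
        timer.schedule(task, intervalMs, intervalMs);
    }

    void stop() {
        timer.cancel();
    }

    /** Returns true if the periodic task ran at least `runs` times before the timeout. */
    static boolean runsAtLeast(int runs, long intervalMs, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(runs);
        EvictionSchedulerSketch scheduler = new EvictionSchedulerSketch();
        scheduler.start(latch::countDown, intervalMs);
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.stop();
        }
    }

    public static void main(String[] args) {
        // Schedule every 20 ms and wait for three runs
        System.out.println(runsAtLeast(3, 20, 5000));
    }
}
```

Keeping the current task in an AtomicReference is what allows a second call to start() to cancel the old task before a new one is scheduled, so two eviction tasks never run side by side.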

The actual eviction logic

Check whether self-preservation is in effect

    isSelfPreservationModeEnabled() is on by default, so the server compares the number of heartbeats received in the last minute with the expected number of heartbeats per minute. If the last minute's count is greater than the expected threshold, self-preservation does not kick in and eviction continues; otherwise the task returns without evicting anything.

if (!isLeaseExpirationEnabled()) {
     return;
}

public boolean isLeaseExpirationEnabled() {
    if (!isSelfPreservationModeEnabled()) {
        // The self preservation mode is disabled, hence allowing the instances to expire.
        return true;
    }
    return numberOfRenewsPerMinThreshold > 0 && 
        getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}
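The decision can be written as a pure function to make the boundary cases visible. This is a minimal sketch with a hypothetical class name; the registry state is passed in as plain parameters instead of being read from fields.

```java
// Hypothetical stand-alone sketch; not part of Eureka.
final class SelfPreservationSketch {

    /**
     * Returns true when leases are allowed to expire, following the same
     * conditions as isLeaseExpirationEnabled() above.
     */
    static boolean isLeaseExpirationEnabled(boolean selfPreservationEnabled,
                                            int renewsInLastMin,
                                            int renewsPerMinThreshold) {
        if (!selfPreservationEnabled) {
            // Self-preservation is switched off: always allow expiration
            return true;
        }
        return renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold;
    }

    public static void main(String[] args) {
        System.out.println(isLeaseExpirationEnabled(true, 70, 68));  // prints true
        System.out.println(isLeaseExpirationEnabled(true, 68, 68));  // prints false
        System.out.println(isLeaseExpirationEnabled(true, 50, 68));  // prints false
        System.out.println(isLeaseExpirationEnabled(false, 50, 68)); // prints true
    }
}
```

Note that the comparison is strict: receiving exactly the threshold number of heartbeats is not enough to allow eviction.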

Filter out expired instances

    Create a list for expired instances, then traverse every instance in the registry and call lease.isExpired() to check whether its lease has expired. Expired instances are added to the list for the following steps.

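// Create a list to hold expired instances
List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
/**
 * Traverse every service instance in the registry and call Lease.isExpired()
 * to check whether the instance's lease has expired, i.e. the instance has failed.
 * Failed instances are added to the list.
 */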
for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
    Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
    if (leaseMap != null) {
        for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
            Lease<InstanceInfo> lease = leaseEntry.getValue();
            if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                // If the lease has expired, add it to the expired list
                expiredLeases.add(lease);
            }
        }
    }
}
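The article does not show what isExpired() does internally, so here is a simplified model of a time-based lease check. It is a sketch under assumptions, not Eureka's Lease class: the class name is hypothetical, the current time is passed in explicitly to keep the example deterministic, the 90s duration is only an illustrative value, and additionalLeaseMs is treated as an extra grace period.

```java
// Hypothetical simplified model of a lease; not Eureka's Lease class.
final class LeaseSketch {
    private final long durationMs;    // how long one heartbeat keeps the lease alive
    private long lastUpdateTimestamp; // time of the most recent heartbeat
    private long evictionTimestamp;   // 0 means the lease was never cancelled

    LeaseSketch(long durationMs, long nowMs) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    /** Records a heartbeat. */
    void renew(long nowMs) {
        lastUpdateTimestamp = nowMs;
    }

    /** Marks the lease as cancelled; a cancelled lease always counts as expired. */
    void cancel(long nowMs) {
        if (evictionTimestamp <= 0) {
            evictionTimestamp = nowMs;
        }
    }

    /** Expired if cancelled, or if no heartbeat arrived within duration + grace period. */
    boolean isExpired(long nowMs, long additionalLeaseMs) {
        return evictionTimestamp > 0
                || nowMs > lastUpdateTimestamp + durationMs + additionalLeaseMs;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90_000, 0);
        System.out.println(lease.isExpired(90_000, 0));     // prints false
        System.out.println(lease.isExpired(90_001, 0));     // prints true
        System.out.println(lease.isExpired(90_001, 1_000)); // prints false
        lease.renew(60_000);
        System.out.println(lease.isExpired(120_000, 0));    // prints false
    }
}
```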

Calculate the limit on the number of evictions

    Get the number of instances in the local registry and calculate the minimum number of instances that must remain. The default renewal percent threshold is 0.85, so with 40 instances that is 40 * 0.85 = 34, and the eviction limit is 40 - 34 = 6.

// Get the number of instances in the local registry
int registrySize = (int) getLocalRegistrySize(); // assume there are only 40 instances
/**
 * serverConfig.getRenewalPercentThreshold() defaults to 0.85.
 * With 40 instances:
 * registrySizeThreshold = 0.85 * 40 = 34
 */
// Compute the threshold, i.e. the minimum that must remain
int registrySizeThreshold = (int) (registrySize *
                                   serverConfig.getRenewalPercentThreshold());
// Eviction limit: 40 - 34 = 6
int evictionLimit = registrySize - registrySizeThreshold;
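Because the threshold is truncated to an int, the limit behaves slightly differently for small registries. The sketch below runs the same arithmetic for a few registry sizes; the class and method names are hypothetical, and 0.85 is the default mentioned above.

```java
// Hypothetical stand-alone sketch; not part of Eureka.
final class EvictionLimitSketch {

    /** Same arithmetic as above: registry size minus the truncated threshold. */
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    public static void main(String[] args) {
        System.out.println(evictionLimit(40, 0.85));  // prints 6
        System.out.println(evictionLimit(100, 0.85)); // prints 15
        // Truncation rounds the threshold down, so tiny registries can still evict one instance
        System.out.println(evictionLimit(5, 0.85));   // prints 1
        System.out.println(evictionLimit(1, 0.85));   // prints 1
    }
}
```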

Select the instances to take offline

    Compare evictionLimit with the size of expiredLeases and take the smaller value, then randomly remove that many instances from the expired list. Not every expired instance is removed in one pass: each run removes at most 15% of the instances in the registry, and the remaining expired instances are removed the next time the task runs.

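/**
 * Compare evictionLimit with the size of expiredLeases and take the smaller value
 */
int toEvict = Math.min(expiredLeases.size(), evictionLimit);
if (toEvict > 0) {
    /**
     * Randomly take a few instances offline instead of removing every failed
     * instance at once. Each run removes at most 15% of the instances in the
     * registry, so if one run does not remove all failed instances, the next
     * EvictionTask run removes more (batched eviction).
     * The instances to remove in this run are picked at random from the
     * failed instances (random eviction). Removing an instance simply calls
     * the offline method internalCancel(), which updates the registry and
     * recentChangeQueue and invalidates the cache.
     */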
    Random random = new Random(System.currentTimeMillis());
    for (int i = 0; i < toEvict; i++) {
        int next = i + random.nextInt(expiredLeases.size() - i);
        Collections.swap(expiredLeases, i, next);
        Lease<InstanceInfo> lease = expiredLeases.get(i);

        String appName = lease.getHolder().getAppName();
        String id = lease.getHolder().getId();
        EXPIRED.increment();
        // Take the instance offline
        internalCancel(appName, id, false);
    }
}
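The loop above is a partial Fisher-Yates shuffle: at step i it swaps a randomly chosen element from the not-yet-picked tail into position i, so after toEvict steps the first toEvict positions hold a uniform random sample without duplicates. Here is a minimal generic sketch of that selection; the class and method names are hypothetical, and it works on a copy so the caller's list is left untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of Eureka.
final class RandomEvictionPickSketch {

    /** Picks up to toEvict distinct elements at random using a partial Fisher-Yates shuffle. */
    static <T> List<T> pickRandom(List<T> expired, int toEvict, Random random) {
        List<T> working = new ArrayList<>(expired); // copy, so the input is not reordered
        int n = Math.max(0, Math.min(toEvict, working.size()));
        List<T> picked = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Choose an index in [i, size) and move that element into position i
            int next = i + random.nextInt(working.size() - i);
            Collections.swap(working, i, next);
            picked.add(working.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> expired = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            expired.add("instance-" + i);
        }
        // Which three are chosen depends on the seed
        System.out.println(pickRandom(expired, 3, new Random(System.currentTimeMillis())));
    }
}
```

Only the first toEvict positions are shuffled, so the cost is proportional to the number of evictions rather than to the size of the expired list.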

Taking instances offline

    internalCancel is the same operation used when a client shuts down. We will go through it in detail when we cover instances going offline, because it also involves synchronizing the change across the server cluster afterwards; a later article will explain it.

Summary

  1. Calculate the minimum expected number of heartbeats per minute
  2. Decide whether self-preservation kicks in by comparing the heartbeats received in the previous minute with the expected minimum number of heartbeats per minute
  3. If self-preservation does not kick in, take the smaller of (the number of expired instances) and (the registry size minus 85% of the registry size), and randomly select that many expired instances
  4. Take the selected instances offline
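The steps above can be strung together in one small function that returns how many instances a single eviction run would remove. This is a sketch under assumptions: the class and method names are hypothetical, the defaults (30s heartbeat interval, 0.85 threshold) are hard-coded, and the registry state is passed in as plain numbers.

```java
// Hypothetical stand-alone sketch; not part of Eureka.
final class EvictionPlanSketch {
    private static final int RENEWAL_INTERVAL_SECONDS = 30;
    private static final double RENEWAL_PERCENT_THRESHOLD = 0.85;

    /** Returns the number of expired instances one eviction run would take offline. */
    static int planEviction(int expectedClients, int registrySize, int expiredCount,
                            int renewsInLastMin, boolean selfPreservationEnabled) {
        // Step 1: minimum expected heartbeats per minute
        int renewsPerMinThreshold = (int) (expectedClients
                * (60.0 / RENEWAL_INTERVAL_SECONDS) * RENEWAL_PERCENT_THRESHOLD);

        // Step 2: does self-preservation block eviction?
        boolean expirationEnabled = !selfPreservationEnabled
                || (renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold);
        if (!expirationEnabled) {
            return 0;
        }

        // Step 3: cap the batch at registry size minus 85% of the registry size
        int evictionLimit = registrySize - (int) (registrySize * RENEWAL_PERCENT_THRESHOLD);
        return Math.min(expiredCount, evictionLimit);
    }

    public static void main(String[] args) {
        // 3 of 40 instances expired, 74 heartbeats in the last minute (threshold is 68)
        System.out.println(planEviction(40, 40, 3, 74, true));   // prints 3
        // 10 expired, only 60 heartbeats: self-preservation blocks eviction
        System.out.println(planEviction(40, 40, 10, 60, true));  // prints 0
        // Same situation with self-preservation switched off: capped at 6
        System.out.println(planEviction(40, 40, 10, 60, false)); // prints 6
    }
}
```

Step 4, actually taking the selected instances offline, is left out because it depends on internalCancel.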


Origin juejin.im/post/7084041146124468254