Eureka Series (08): Heartbeat Renewal and Automatic Expiration

The previous article, Eureka Series (07): Service Registration and Active Offline, analyzed how services register and go offline. This article continues by analyzing how Eureka handles heartbeat renewal.

1. Heartbeat renewal

Heartbeat renewal comes in two forms: a renewal initiated by a client (isReplication = false), and a renewal that a server broadcasts to its peers (isReplication = true). The two are processed slightly differently.

1.1 Heartbeat renewal mechanism

When a server receives a heartbeat renewal from a client, it first updates the lease time on the local server. If that succeeds, it broadcasts the heartbeat to the other servers.

Figure 1: Eureka heartbeat renewal mechanism
sequenceDiagram
    participant InstanceResource
    participant PeerAwareInstanceRegistryImpl
    participant AbstractInstanceRegistry
    participant PeerEurekaNode
    note over InstanceResource: PUT /eureka/apps/{appName}/{id}<br/>renewLease
    InstanceResource ->> PeerAwareInstanceRegistryImpl: heartbeat request: renew(appName, id, isReplication)
    PeerAwareInstanceRegistryImpl ->> AbstractInstanceRegistry: 1. update local data: renew(appName, id, isReplication)
    loop replicate to the other Eureka Server nodes
        PeerAwareInstanceRegistryImpl ->> PeerAwareInstanceRegistryImpl: 2.1 data synchronization: replicateToPeers
        PeerAwareInstanceRegistryImpl ->> PeerEurekaNode: 2.2 heartbeat -> PUT /eureka/apps/{appName}/{id}
        alt 3.1 failure (404): update the peer node
            PeerEurekaNode ->> PeerEurekaNode: register(info)
        else 3.2 other failure: update the local node
            PeerEurekaNode ->> PeerEurekaNode: syncInstancesIfTimestampDiffers
            PeerEurekaNode -->> PeerEurekaNode: register(infoFromPeer, true)
        end
    end

To sum up:

  1. renewLease: heartbeat renewal requests are handled by the InstanceResource#renewLease method. isReplication = false means a client request; true means a request broadcast by a peer server.
  2. renew: the local server processes the heartbeat. If this succeeds, the heartbeat message is broadcast.
  3. heartbeat: broadcasts the heartbeat message to the other servers. Note how a failed broadcast is handled:
    • If the instance does not exist on the peer server, or the peer loses the lastDirtyTimestamp comparison, the instance is re-registered on the peer to update the peer's information.
    • If the peer server wins the comparison, the peer's information is registered locally in turn to update the local instance.

1.2 Receiving a heartbeat renewal - renewLease

InstanceResource#renewLease handles heartbeat renewal requests; the path is PUT /apps/{appName}/{id}.

  1. If local processing fails (the instance does not exist, or its status is UNKNOWN), NOT_FOUND is returned and the sender must re-register to update the instance information.
  2. The lastDirtyTimestamp of the server's instance is compared with the one sent in the request. There are two outcomes: if the server's instance loses, NOT_FOUND is returned and the client re-registers to update the server's instance information; if the server's instance wins and the request is a replication from a peer, the server's instance information is returned so that the sender can update its own copy (a plain client heartbeat simply gets OK, see 1.4).

@PUT
public Response renewLease(
    @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
    @QueryParam("overriddenstatus") String overriddenStatus,
    @QueryParam("status") String status,
    @QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
    boolean isFromReplicaNode = "true".equals(isReplication);
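    // 1. Process the heartbeat; once local processing succeeds, the message is broadcast.
    //    The broadcast is asynchronous, so the value returned here is the result of local processing.
    boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);

    // 2. Local processing failed: the instance does not exist on this server,
    //    or its status is UNKNOWN. Return 404 so that the sender re-registers.
    if (!isSuccess) {
        return Response.status(Status.NOT_FOUND).build();
    }

    // 3. If the local instance loses the lastDirtyTimestamp comparison against the request,
    //    the local instance information is not the latest.
    //    OverriddenStatus will be covered separately when time permits.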
    Response response;
    if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
        response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
        if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
            && (overriddenStatus != null)
            && !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
            && isFromReplicaNode) {
            registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
        }
    } else {
        response = Response.ok().build();
    }
    return response;
}

Summary: the following four sections explain the whole heartbeat renewal process:

  1. How does the local server process a renewal? Mainly the AbstractInstanceRegistry#renew method.
  2. How are the lastDirtyTimestamp values of the local server instance and the client instance compared? Mainly the InstanceResource#validateDirtyTimestamp method.
  3. How does the Eureka Client send a heartbeat renewal request and handle the result? Mainly DiscoveryClient.
  4. How is the broadcast heartbeat renewal message handled? Mainly the PeerEurekaNode#heartbeat method.

1.3 Local renewal process - renew

On the local server, renewal returns false if the instance does not exist or its status is UNKNOWN, which tells the client to re-register and update the server's instance information. Returning true does not mean the data is up to date; the next step still has to check for dirty data.

public boolean renew(String appName, String id, boolean isReplication) {
    RENEW.increment(isReplication);
    // 1. Look up the instance registered on the server
    Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
    Lease<InstanceInfo> leaseToRenew = null;
    if (gMap != null) {
        leaseToRenew = gMap.get(id);
    }
    // 2.1 The instance does not exist: return false (the caller answers 404)
    if (leaseToRenew == null) {
        RENEW_NOT_FOUND.increment(isReplication);
        return false;
    // 2.2 The instance exists
    } else {
        InstanceInfo instanceInfo = leaseToRenew.getHolder();
        if (instanceInfo != null) {
            // Return false when the instance status is UNKNOWN, otherwise true
            InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                instanceInfo, leaseToRenew, isReplication);
            if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                RENEW_NOT_FOUND.increment(isReplication);
                return false;
            }
            if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
            }
        }
        renewsLastMin.increment();
        // 3. Update the time of the last heartbeat (the core step)
        leaseToRenew.renew();
        return true;
    }
}

Summary: if the instance exists and its status is not UNKNOWN, the next step is to check for dirty data. The core line is leaseToRenew.renew(), which updates the last heartbeat time; Eureka's lease management is done in Lease.
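
The local renewal logic above can be sketched in isolation. This is a minimal model under stated assumptions, not Eureka's implementation: RegistrySketch, SimpleLease, and Status are hypothetical names, the overridden-status lookup is left out, and the current time is passed in explicitly so the example stays deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified two-level registry: appName -> (instanceId -> lease).
// All names here are hypothetical and only illustrate the control flow.
final class RegistrySketch {
    enum Status { UP, DOWN, UNKNOWN }

    static final class SimpleLease {
        volatile Status status;
        volatile long lastUpdateTimestamp;

        SimpleLease(Status status, long now) {
            this.status = status;
            this.lastUpdateTimestamp = now;
        }
    }

    private final Map<String, Map<String, SimpleLease>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id, Status status, long now) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new SimpleLease(status, now));
    }

    // Returns false when the caller should answer 404 so that the client re-registers.
    boolean renew(String appName, String id, long now) {
        Map<String, SimpleLease> leases = registry.get(appName);
        SimpleLease lease = (leases == null) ? null : leases.get(id);
        if (lease == null || lease.status == Status.UNKNOWN) {
            return false;
        }
        lease.lastUpdateTimestamp = now; // core step: refresh the last heartbeat time
        return true;
    }

    // Test helper: assumes the instance has been registered.
    long lastUpdate(String appName, String id) {
        return registry.get(appName).get(id).lastUpdateTimestamp;
    }
}
```

In this sketch a missing application, a missing instance, and an UNKNOWN status all lead to the same false result, which matches the single 404 path described above.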

1.4 Dirty data check - validateDirtyTimestamp

The validateDirtyTimestamp method compares the client's instance with the server's local instance. The rule is that the instance with the larger lastDirtyTimestamp holds the latest registration information. The reason is simple: every update to a service instance also updates this timestamp, so a larger timestamp means a more recent update, and the other nodes should synchronize to that copy of the instance.

private Response validateDirtyTimestamp(Long lastDirtyTimestamp,
                                        boolean isReplication) {
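    // 1. Fetch the locally registered instance and compare it with the client's instance
    InstanceInfo appInfo = registry.getInstanceByAppAndId(app.getName(), id, false);
    if (appInfo != null) {
        // 2. The timestamps on the client and the server differ, so the instance information is out of sync; compare them
        if ((lastDirtyTimestamp != null) && (!lastDirtyTimestamp.equals(appInfo.getLastDirtyTimestamp()))) {
            // 3.1 The client wins: the client must register the instance again to update the server's copy
            if (lastDirtyTimestamp > appInfo.getLastDirtyTimestamp()) {
                return Response.status(Status.NOT_FOUND).build();
            // 3.2 The server wins: return the instance information so that the sender can update its copy
            } else if (appInfo.getLastDirtyTimestamp() > lastDirtyTimestamp) {
                // true: data synchronization between Eureka servers, so the instance information must be updated;
                // data inside the cluster has to stay consistent
                if (isReplication) {
                    return Response.status(Status.CONFLICT).entity(appInfo).build();
                // false: a heartbeat from a Eureka Client; the instance information apparently does not need to be synced back to the client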
                } else {
                    return Response.ok().build();
                }
            }
        }
    }
    return Response.ok().build();
}

Summary: in a word, the instance with the larger lastDirtyTimestamp holds the latest registration information.

Note: a heartbeat broadcast inside the cluster and a heartbeat from a Eureka Client are handled differently (step 3.2):

  • Broadcast within the cluster: if the data is inconsistent, it must be synchronized so that the cluster reaches eventual consistency.
  • Heartbeat from a Eureka Client: if the server holds the latest data, nothing is synchronized back to the client.
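
The comparison rules above reduce to a small decision table. The sketch below restates them with plain HTTP status codes instead of JAX-RS Response objects; DirtyTimestampSketch and its validate method are hypothetical names introduced only for illustration.

```java
// Hypothetical helper mirroring the decision table of validateDirtyTimestamp.
final class DirtyTimestampSketch {
    static final int OK = 200;
    static final int NOT_FOUND = 404;
    static final int CONFLICT = 409;

    // requestTs: lastDirtyTimestamp sent with the heartbeat (null if not sent)
    // localTs:   lastDirtyTimestamp of the instance stored on this server (null if absent)
    static int validate(Long requestTs, Long localTs, boolean isReplication) {
        if (requestTs == null || localTs == null || requestTs.equals(localTs)) {
            return OK; // nothing to compare, or both sides already agree
        }
        if (requestTs > localTs) {
            return NOT_FOUND; // the sender is newer: ask it to re-register
        }
        // The server is newer: only a peer replication gets the data back.
        return isReplication ? CONFLICT : OK;
    }
}
```

The only branch that depends on isReplication is the one where the server holds the newer data, which is exactly the difference between the two bullet points above.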

1.5 Client processing - renew

When a Eureka Client sends a heartbeat renewal and the client's instance information is the newer one, the client has to re-register to update the server's instance information. If the server's instance information is the newer one, the client's instance information is not updated.

// DiscoveryClient
boolean renew() {
    EurekaHttpResponse<InstanceInfo> httpResponse;
    try {
        httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
        // On 404, register again to update the server's instance information
        if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
            REREGISTER_COUNTER.increment();
            long timestamp = instanceInfo.setIsDirtyWithTime();
            boolean success = register();
            if (success) {
                instanceInfo.unsetIsDirty(timestamp);
            }
            return success;
        }
        return httpResponse.getStatusCode() == Status.OK.getStatusCode();
    } catch (Throwable e) {
        return false;
    }
}
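
The client-side handling can be reduced to three outcomes: 200 means the renewal succeeded, 404 triggers a re-registration, and anything else (including an exception) counts as a failed renewal. The following is a minimal sketch of that control flow only; ClientRenewSketch is a hypothetical name, and the two suppliers stand in for the real transport and register() calls.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

// Hypothetical stand-in for the client renew flow; not the DiscoveryClient API.
final class ClientRenewSketch {
    // sendHeartbeat returns the HTTP status code of the heartbeat request.
    // register returns true when re-registration succeeded.
    static boolean renew(IntSupplier sendHeartbeat, BooleanSupplier register) {
        try {
            int status = sendHeartbeat.getAsInt();
            if (status == 404) {
                // The server does not know this instance (or holds older data): register again.
                return register.getAsBoolean();
            }
            return status == 200;
        } catch (RuntimeException e) {
            // Any transport failure is treated as a failed renewal.
            return false;
        }
    }
}
```

A caller could wire it up as `ClientRenewSketch.renew(() -> 404, () -> true)`, which returns the result of the re-registration.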

1.6 Heartbeat broadcast - heartbeat

For the heartbeat broadcast, the part to focus on is the failure handling. First, on a 404 the sender's instance information is the newer one, so the sender re-registers to update the peer's instance information. Second, on any other failure the instance information returned by the peer is used to update the sender's own registration. This second case is where the broadcast differs from a Eureka Client heartbeat.

public void heartbeat(final String appName, final String id,
                      final InstanceInfo info, final InstanceStatus overriddenStatus,
                      boolean primeConnection) throws Throwable {
    // 1. With primeConnection the result of the renewal does not matter; send the request and return
    if (primeConnection) {
        replicationClient.sendHeartBeat(appName, id, info, overriddenStatus);
        return;
    }
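    // 2. Otherwise the result matters: A -> B sends a heartbeat; success needs no further handling
    // 3. A failed renewal has two cases: (a) node B does not have the instance or loses the comparison,
    //    so A -> B re-sends a registration request; (b) node B has the instance and wins the comparison,
    //    so A's registration information for this instance is updated in turn.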
    ReplicationTask replicationTask = new InstanceReplicationTask(targetHost, Action.Heartbeat, info, overriddenStatus, false) {
        @Override
        public EurekaHttpResponse<InstanceInfo> execute() throws Throwable {
            return replicationClient.sendHeartBeat(appName, id, info, overriddenStatus);
        }

        @Override
        public void handleFailure(int statusCode, Object responseEntity) throws Throwable {
            super.handleFailure(statusCode, responseEntity);
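            // Case 1: node B does not have the instance; A -> B re-sends a registration request
            if (statusCode == 404) {
                if (info != null) {
                    register(info);
                }
            // Case 2: node B has the instance and wins the comparison; update A's registration for this instance in turn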
            } else if (config.shouldSyncWhenTimestampDiffers()) {
                InstanceInfo peerInstanceInfo = (InstanceInfo) responseEntity;
                if (peerInstanceInfo != null) {
                    syncInstancesIfTimestampDiffers(appName, id, info, peerInstanceInfo);
                }
            }
        }
    };
    long expiryTime = System.currentTimeMillis() + getLeaseRenewalOf(info);
    batchingDispatcher.process(taskId("heartbeat", info), replicationTask, expiryTime);
}

Summary: the heartbeat broadcast is an important part of how Eureka guarantees eventual consistency. As long as heartbeats keep being broadcast inside the cluster, any inconsistent data that appears gets synchronized, so the data eventually converges.

// Update the local registration information of the instance
private void syncInstancesIfTimestampDiffers(
    String appName, String id, InstanceInfo info, InstanceInfo infoFromPeer) {
    try {
        if (infoFromPeer != null) {
            // 1. Update the overriddenStatus
            if (infoFromPeer.getOverriddenStatus() != null && !InstanceStatus.UNKNOWN.equals(infoFromPeer.getOverriddenStatus())) {
                registry.storeOverriddenStatusIfRequired(appName, id, infoFromPeer.getOverriddenStatus());
            }
            // 2. Update the local registration information of the instance
            registry.register(infoFromPeer, true);
        }
    } catch (Throwable e) {
    }
}
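
The failure handling of the broadcast can be summarized as a pure decision function. This is a sketch under stated assumptions: PeerFailureSketch and its Action enum are hypothetical names, and the boolean parameters stand in for the null checks and the shouldSyncWhenTimestampDiffers setting seen in the code above.

```java
// Hypothetical summary of what node A does when its heartbeat to node B fails.
final class PeerFailureSketch {
    enum Action { REGISTER_ON_PEER, SYNC_FROM_PEER, NONE }

    static Action onHeartbeatFailure(int statusCode,
                                     boolean hasLocalInfo,
                                     boolean syncWhenTimestampDiffers,
                                     boolean hasPeerInfo) {
        if (statusCode == 404) {
            // B does not have the instance, or A's copy is newer: push A's copy to B.
            return hasLocalInfo ? Action.REGISTER_ON_PEER : Action.NONE;
        }
        if (syncWhenTimestampDiffers && hasPeerInfo) {
            // B returned a newer copy: register it locally on A.
            return Action.SYNC_FROM_PEER;
        }
        return Action.NONE;
    }
}
```

Either action moves the two nodes toward the copy with the larger lastDirtyTimestamp, which is why repeated broadcasts converge.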

2. Automatic expiration

Recall from Eureka Series (03): Spring Cloud Auto-Configuration Principles that when EurekaServerBootstrap starts, it calls registry.openForTraffic(), which starts the scheduled automatic expiration task EvictionTask. This section starts the analysis from EvictionTask.

2.1 Starting the EvictionTask scheduled task

Figure 2: Starting the automatic expiration scheduled task
graph LR
    EurekaServerBootstrap -- openForTraffic --> PeerAwareInstanceRegistryImpl
    PeerAwareInstanceRegistryImpl -- postInit --> AbstractInstanceRegistry
    AbstractInstanceRegistry -- start --> EvictionTask

// Start the automatic expiration scheduled task EvictionTask; by default it runs once every 60s
protected void postInit() {
    renewsLastMin.start();
    if (evictionTaskRef.get() != null) {
        evictionTaskRef.get().cancel();
    }
    evictionTaskRef.set(new EvictionTask());
    evictionTimer.schedule(evictionTaskRef.get(),
                           serverConfig.getEvictionIntervalTimerInMs(),
                           serverConfig.getEvictionIntervalTimerInMs());
}

Summary: by default EvictionTask runs once every 60s and a client renews its heartbeat once every 30s; an instance whose heartbeat has not been renewed for more than 90s is taken offline.

2.2 How EvictionTask works

Figure 3: How EvictionTask works
sequenceDiagram
    participant EvictionTask
    participant AbstractInstanceRegistry
    participant Lease
    note left of EvictionTask: scheduled task, every 60s
    EvictionTask ->> AbstractInstanceRegistry: evict
    loop automatic expiration
        AbstractInstanceRegistry ->> Lease: isExpired
        AbstractInstanceRegistry ->> AbstractInstanceRegistry: internalCancel
    end

2.2.1 How to determine whether a lease has expired

First, the important fields of Lease:

private long evictionTimestamp;             // time the service went offline
private long registrationTimestamp;         // time the service registered
private long serviceUpTimestamp;            // time the service became UP
private volatile long lastUpdateTimestamp;  // time of the last heartbeat renewal
private long duration;                      // heartbeat expiration time, 90s by default

Each heartbeat renewal updates the last renewal time lastUpdateTimestamp. When a service goes offline, the offline time evictionTimestamp is set, so evictionTimestamp > 0 means the service has already gone offline. By default, a service whose heartbeat has not been renewed for more than 90s expires automatically.

public boolean isExpired(long additionalLeaseMs) {
    return (evictionTimestamp > 0 || System.currentTimeMillis() > 
            (lastUpdateTimestamp + duration + additionalLeaseMs));
}

Summary: additionalLeaseMs is a compensation mechanism; by default it can be treated as 0 ms.
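
To make the expiry rule concrete, here is a simplified lease model. It is an illustration only: LeaseSketch is a hypothetical class, the current time is passed in as a parameter instead of being read from System.currentTimeMillis(), and renew simply stores the time it is given, which is an assumption of this sketch and not a statement about the real Lease class.

```java
// Hypothetical, simplified lease used only to illustrate the expiry check.
final class LeaseSketch {
    private final long durationMs;
    private long lastUpdateTimestamp;
    private long evictionTimestamp; // 0 means the lease has not been cancelled

    LeaseSketch(long durationMs, long now) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = now;
    }

    // Assumption of this sketch: a renewal records the time it was received.
    void renew(long now) {
        lastUpdateTimestamp = now;
    }

    void cancel(long now) {
        if (evictionTimestamp <= 0) {
            evictionTimestamp = now;
        }
    }

    // Same shape as the isExpired check above, with "now" made explicit.
    boolean isExpired(long now, long additionalLeaseMs) {
        return evictionTimestamp > 0
                || now > lastUpdateTimestamp + durationMs + additionalLeaseMs;
    }
}
```

With a 90s duration and no compensation, a lease last renewed at t = 0 is still valid at t = 90 000 ms and is expired from t = 90 001 ms on; a positive additionalLeaseMs pushes that boundary back by the same amount.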

2.2.2 Taking services offline

First check whether self-preservation is in effect, then compute the maximum number of instances that may be taken offline in one pass, and finally call internalCancel to take each selected instance offline.

public void evict(long additionalLeaseMs) {
    // 1. Do nothing if lease expiration is disabled (self-preservation is in effect)
    if (!isLeaseExpirationEnabled()) {
        return;
    }

    // 2. Call lease.isExpired to collect all expired instances
    List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
    for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
        Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
        if (leaseMap != null) {
            for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                Lease<InstanceInfo> lease = leaseEntry.getValue();
                if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                    expiredLeases.add(lease);
                }
            }
        }
    }

    // 3. Compute toEvict, the maximum number of instances to take offline in one pass
    int registrySize = (int) getLocalRegistrySize();
    int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
    int evictionLimit = registrySize - registrySizeThreshold;

    int toEvict = Math.min(expiredLeases.size(), evictionLimit);
    if (toEvict > 0) {
        Random random = new Random(System.currentTimeMillis());
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(expiredLeases.size() - i);
            Collections.swap(expiredLeases, i, next);
            Lease<InstanceInfo> lease = expiredLeases.get(i);

            String appName = lease.getHolder().getAppName();
            String id = lease.getHolder().getId();
            EXPIRED.increment();
            // 4. As with an active cancellation, call internalCancel to take the instance offline
            internalCancel(appName, id, false);
        }
    }
}

Summary: the difference between automatic expiration and active cancellation is that automatic expiration takes self-preservation into account and computes a maximum number of instances to take offline per pass; the rest is the same.
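
The per-pass limit and the random selection can be tried out on their own. The sketch below assumes a renewal percent threshold of 0.85 for the example values; EvictionLimitSketch and its methods are hypothetical names, and the selection works on a copy of the input list instead of cancelling real leases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper that isolates the arithmetic and the shuffle used by evict().
final class EvictionLimitSketch {
    // Maximum number of instances that may be evicted in one pass.
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    // Picks min(expired.size(), limit) entries at random with a partial Fisher-Yates shuffle.
    static <T> List<T> pickToEvict(List<T> expired, int limit, Random random) {
        List<T> pool = new ArrayList<>(expired); // leave the caller's list untouched
        int toEvict = Math.max(0, Math.min(pool.size(), limit));
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, next);
        }
        return new ArrayList<>(pool.subList(0, toEvict));
    }
}
```

For a registry of 100 instances and a threshold of 0.85, the limit is 100 - 85 = 15, so even if 20 leases have expired only 15 are evicted in that pass and the rest wait for a later run. Picking at random spreads the evictions across applications instead of always removing the first ones found.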


Record a little every day, with care. The content may not matter much, but the habit does!

Origin www.cnblogs.com/binarylei/p/11621403.html