How ZK Copes with Network Failures

Network failures are the natural enemy of distributed systems. If network failures never occurred, we could in fact design strongly consistent, highly available distributed systems. Unfortunately, network failures do exist in distributed environments, and ZK of course needs to handle them carefully.

Let us set failures aside for a moment and first look at how ZK handles network connections. A ZK client is started with the information of all available servers; it randomly picks one server and tries to connect. On a successful connection, the ZK client and the server establish a session. Before the session times out, the server responds to the client's requests, and each new request refreshes the session timeout. When the ZK client loses contact with the current server, it tries to reconnect to one of the servers in the available list.
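
As a minimal illustration (the connect string and timeout below are made up for this sketch), the client is handed the whole server list up front and fails over among those servers on its own:

// Sketch: the connect string carries all available servers; the client picks
// one at random and transparently reconnects to another on connection loss.
ZooKeeper zk = new ZooKeeper(
    "server1:2181,server2:2181,server3:2181", // all known servers
    15000,                                    // session timeout in ms
    event -> System.out.println("state: " + event.getState()));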

Having seen ZK's normal connection handling, let us look at what a network failure becomes in the ZK world. A network failure surfaces at the ZK level as one of two exceptions: ConnectionLossException and SessionExpiredException. The former occurs when the client is disconnected from a server before the session times out; the latter occurs when the server notifies the client that its session has timed out.

ConnectionLossException

This exception is definitely one of the most vexing in ZK. A ZK client connects to a ZK server through a socket, managed by ClientCnxn on the client side and by ServerCnxn on the server side. ConnectionLossException occurs when the ZK client loses its connection to the server. It merely indicates that the client has found itself disconnected from the current server; beyond that, it knows nothing. There are three important issues here.

The failure is recoverable

ConnectionLossException is a recoverable exception: it only signals that the current connection to the server has failed, and the client may well connect to another server later and resend the request. When the connection between the client and ZK is unstable, we need to handle this exception especially carefully; otherwise the upper-layer application will crash on mere network jitter, which is unacceptable. Furthermore, re-creating the ZK client and opening a new session will only aggravate the network's instability. This is because the server releases a session only after it times out when the client never reconnects; if the instability stems from too many connections, opening new sessions will only worsen the situation.

A common way to tolerate ConnectionLossException is to redo the operation, with processing logic shaped like the code below:

void operation(String path, byte[] data) {
  // Asynchronous create; the result is handled by the callback below.
  zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT, callback, data);
}

AsyncCallback.StringCallback callback = (rc, path, ctx, name) -> {
  switch (KeeperException.Code.get(rc)) {
    case CONNECTIONLOSS:
      // The connection was lost; redo the operation with the original data.
      operation(path, (byte[]) ctx);
      break;

    // ... handle other result codes ...
  }
};

The operation may already have succeeded on the server

The approach above recovers from a recoverable failure by redoing the operation. However, redoing an operation is risky, because the previous attempt may already have succeeded on the server.

ConnectionLossException only tells the ZK client that its connection with the server has dropped. Before the disconnection, however, the corresponding request may have been fully sent, received by the server, and processed; the client merely failed to receive the response because the connection broke, and that is what triggered the ConnectionLossException.

For read operations, retrying is usually fine, because whenever the retry succeeds we get the value (or exception) that the read was due to return. For write operations, the situation is subtler.

For setData, a successful retry poses little problem regardless of the specific business logic: setting a node to the same value twice is an idempotent operation, and if the earlier attempt performed a versioned update, the retry will fail with a version mismatch, which we can swallow or use to trigger the related business exception logic.

For delete, a retry may cause an unexpected NoNodeException; again, we can swallow the exception or use it to trigger the related business logic.
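
As a hedged sketch of this tolerance (the wrapper name is made up), a synchronous delete retry loop can treat NoNodeException as success, since a previous unacknowledged attempt may already have deleted the node; the same shape works for setData with BadVersionException:

// Sketch only: retry delete across connection losses, treating NoNodeException
// as success because an earlier, unacknowledged attempt may have deleted it.
void deleteWithRetry(ZooKeeper zk, String path) throws InterruptedException, KeeperException {
  while (true) {
    try {
      zk.delete(path, -1);
      return;
    } catch (KeeperException.NoNodeException e) {
      return; // already gone: our earlier attempt (or another client) deleted it
    } catch (KeeperException.ConnectionLossException e) {
      // connection lost; loop and retry once the client reconnects
    }
  }
}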

For create, the situation is slightly more complicated. In the non-sequential case, the retried create either succeeds or triggers a NodeExistsException, and the handling mirrors that of delete. But in the sequential case, the previous attempt may already have succeeded, and the retry then succeeds as well. Since we lost the return value of the previous attempt, the sequential node it created becomes an orphan, which may lead to resource leaks or, worse, consistency problems. For example, ZK-based leader election algorithms rely on the ordering of sequential nodes; a single orphan node holding the minimum sequence number makes the whole algorithm fail, because the Watcher on it was already consumed when the ConnectionLossException fired, no client claims ownership of it, and so it will never be deleted to let the algorithm proceed.
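
A known mitigation (the ZK lock recipe suggests embedding the session id in the node name; this sketch uses a caller-supplied unique id, and the method name is hypothetical) is to tag the sequential node's name with a unique token, so that after a connection loss the client can scan the children for a node created by its previous attempt instead of leaving an orphan:

// Sketch: tag the node name with a unique id so a retried create can first
// look for a node created by a previous, unacknowledged attempt.
String createSequentialOnce(ZooKeeper zk, String dir, String uuid, byte[] data)
    throws InterruptedException, KeeperException {
  while (true) {
    try {
      // Did the previous attempt actually succeed on the server?
      for (String child : zk.getChildren(dir, false)) {
        if (child.startsWith(uuid + "-")) {
          return dir + "/" + child;
        }
      }
      return zk.create(dir + "/" + uuid + "-", data,
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    } catch (KeeperException.ConnectionLossException e) {
      // loop: on the next pass the scan above will find the orphan, if any
    }
  }
}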

The client may miss state changes

ZK Watchers are one-shot: events that occur in the interval between a Watcher being triggered and the Watcher being set again may be lost. This in itself is an important issue that applications built on ZK must consider. ConnectionLossException additionally causes Watchers to receive a WatchedEvent(EventType.None, KeeperState.Disconnected) event. Upon receiving this event, the ZK client must assume that any state on ZK may have changed. For example, an application performing actions that depend on its being the leader needs to suspend those actions, and re-execute them only after the connection is restored and the state is confirmed.
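
A minimal sketch of that suspend-and-reverify pattern (the three handler methods are hypothetical hooks, not ZK API):

// Sketch: pause leadership-dependent work while disconnected, re-verify state
// after reconnecting, and start over on session expiry.
Watcher connectionAwareWatcher = event -> {
  if (event.getType() == Watcher.Event.EventType.None) {
    switch (event.getState()) {
      case Disconnected:
        suspendLeaderWork();          // hypothetical: stop acting as leader
        break;
      case SyncConnected:
        reverifyStateAndResume();     // hypothetical: re-read ZK state first
        break;
      case Expired:
        shutdownAndRecreateClient();  // session is gone; tear down and rebuild
        break;
      default:
        break;
    }
  }
};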

There is a design detail worth noting here. Unlike ordinary WatchedEvents, which remove the Watcher once it is triggered, a WatchedEvent of EventType.None triggers Watchers without removing them, unless the system property zookeeper.disableAutoWatchReset=true is set. Moreover, after successfully reconnecting to a server, the client re-registers all of its current Watchers on the server via a setWatches request, and the server compares zxid values to decide whether each Watcher should fire. This spares user code from having to handle ConnectionLossException inside Watcher processing logic and re-execute the operations that set the Watchers after network jitter; the flip side is that every Watcher currently registered on the client is affected by the jitter. Be careful, though: a Watcher listening for NodeCreated events may still miss its event, because the node may be created and then deleted by another client while the connection is being re-established. Since the re-registration only judges whether the node exists, without a zxid to help, we run into the so-called ABA problem.

SessionExpiredException

This exception is easier to deal with than ConnectionLossException in one respect: it is strictly an unrecoverable failure. After the session has timed out on the server, the ZK client can no longer reconnect with the same session, so we usually just create a new ZK client instance and start working again. However, a session timeout causes the session's ephemeral nodes to be removed; if the upper-layer application logic is tied to them, then SessionExpiredException requires careful handling.
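
Re-creation itself is straightforward (a sketch with made-up parameters); the hard part is re-establishing the ephemeral state afterwards:

// Sketch: a handle whose session has expired is useless; close it and build
// a fresh client, which starts a brand-new session. Any ephemeral nodes and
// Watchers must then be re-created by the application itself.
ZooKeeper recreateClient(ZooKeeper expired, Watcher watcher) throws Exception {
  expired.close();
  return new ZooKeeper("server1:2181,server2:2181,server3:2181", 15000, watcher);
}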

Session timeout detection

After the ZK client and server successfully establish a connection, ClientCnxn.SendThread periodically sends ping messages to the server, and the server resets the session timeout whenever it processes a ping. If the server receives no new message from the client within the timeout period, it declares the session expired and explicitly closes the corresponding connection. ZK's session timeout logic lives in SessionTracker; all session checks and timeout verdicts are issued by the leader as so-called quorum operations, so the timeout verdict for a client is consistent across all servers.

After the server closes the connection, the ZK client tries to reconnect to another server. Only when it reconnects to some server, and that server queries its session list and finds that the reconnection request belongs to an expired session, does the server inform the client of the expiry by returning a non-positive remaining timeout. The client then learns that it has timed out and executes the corresponding exit logic.

A very tricky point is that a ZK client is always informed of its session timeout by the server. Therefore, if the link between the client and the servers goes down, or the network is thoroughly partitioned, then even after a timeout that is perfectly reasonable from the server's point of view, the ZK client actually has no idea that its own session has expired. ZK currently has no way to handle this situation; it can only be handled by the upper-layer application itself, for example by proactively closing and restarting the ZK client once it decides the session must be gone. Curator does this in ConnectionStateManager#processEvents: it periodically checks how much time has passed since the last disconnect event was received, and injects a session timeout event from the client's perspective once the session must inevitably have expired.
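
A minimal sketch of that client-side inference, in the spirit of Curator's approach (the class and method names here are made up):

// Sketch: infer session expiry locally. If we have been disconnected for
// longer than the negotiated session timeout, the server must have expired
// us, even though no Expired event can reach us while partitioned.
class SessionTimeoutDetector {
  private final int sessionTimeoutMs;
  private volatile long disconnectedAtMs = -1; // -1 means currently connected

  SessionTimeoutDetector(int sessionTimeoutMs) {
    this.sessionTimeoutMs = sessionTimeoutMs;
  }

  void onDisconnected() { disconnectedAtMs = System.currentTimeMillis(); }
  void onConnected() { disconnectedAtMs = -1; }

  // Call periodically from a background thread.
  boolean sessionSurelyExpired() {
    long t = disconnectedAtMs;
    return t >= 0 && System.currentTimeMillis() - t > sessionTimeoutMs;
  }
}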

Deletion of ephemeral nodes

The biggest problem associated with the deletion of ephemeral nodes arises in ZK-based leader election. ZK provides a reference recipe for leader election [1], generally based on the ordering of a series of ephemeral sequential nodes. When an upper-layer application does leader election on top of ZK, if the ZK client's session times out, then because the ZK-related threads and the upper-layer application usually run separately, the previous leader may well perform a series of no-longer-authorized operations before the application learns that it has lost leadership.

For example, in FLINK, in theory only the JobManager that has become leader has permission to write checkpoints. But the steps from ZK generating the leadership-loss message, to the client learning the news, to the upper-layer application being notified, are all asynchronous, so the previous leader does not learn at the first moment that it has lost leadership. Meanwhile, another JobManager may already have been elected and notified that it is the new leader. At this point there are two JobManagers in the cluster that both consider themselves the leader. If the checkpoint-writing action is not otherwise restricted, that is, if a JobManager writes whenever it believes it has permission, the two leaders may write checkpoints concurrently and produce an inconsistent state. Note that, as the Curator technical note already mentions [2], because the problem occurs with low probability and any fix depends on "timely" reactions to remote messages, many systems carry this bug in theory, and it is often only documented as a precaution to help developers and users understand what can happen in extreme cases.
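
A common mitigation pattern (this sketch is illustrative, not Flink's actual mechanism) is to fence writes with a token, for example an epoch that grows with each election, and have the store reject writes from stale epochs:

// Sketch: fencing with a monotonically increasing epoch. The store rejects
// writes from an older epoch, so a deposed leader cannot corrupt state even
// if it has not yet learned that it lost leadership.
class FencedCheckpointStore {
  private long highestEpochSeen = -1;

  synchronized boolean write(long epoch, byte[] checkpoint) {
    if (epoch < highestEpochSeen) {
      return false; // stale leader; reject the write
    }
    highestEpochSeen = epoch;
    persist(checkpoint); // hypothetical durable write
    return true;
  }

  private void persist(byte[] checkpoint) { /* ... */ }
}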

FLINK-10333 [3] and the discussion I initiated on the ZK mailing list [4] contain more on the challenges and solutions in this scenario.

[1] https://zookeeper.apache.org/doc/r3.5.5/recipes.html#sc_leaderElection

[2] https://cwiki.apache.org/confluence/display/CURATOR/TN10

[3] https://issues.apache.org/jira/browse/FLINK-10333

[4] https://lists.apache.org/x/thread.html/594b66ecb1d60b560a5c4c08ed1b2a67bc29143cb4e8d368da8c39b2@%3Cuser.zookeeper.apache.org%3E
