Solving the "split-brain" problem in YARN ResourceManager high availability

"Split-brain" problem: the ResourceManager (RM) fails to respond to the outside world in time because of a network failure or a fault of its own, so it appears dead (a "fake death"). This triggers a new round of ZooKeeper-based active/standby election, and a standby RM is promoted to Active. The "fake-dead" RM itself, however, still considers itself Active, so the system ends up with more than one Active RM.
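To make the failure mode concrete, here is a toy simulation in Python. It is not Hadoop or ZooKeeper code: the `RM` class, `looks_alive`, `naive_failover` and the 10-second `SESSION_TIMEOUT` are hypothetical. The point it shows is that ZooKeeper can only judge liveness from heartbeats, so a partitioned Active RM looks dead and a standby gets promoted, while nothing tells the old Active RM to step down.

```python
from dataclasses import dataclass

# Toy model with hypothetical names; not the real ZooKeeper or YARN API.
SESSION_TIMEOUT = 10.0  # hypothetical session timeout, in seconds


@dataclass
class RM:
    name: str
    believes_active: bool = False
    last_heartbeat: float = 0.0  # last time the coordinator heard from this RM


def looks_alive(rm: RM, now: float) -> bool:
    # Liveness is judged only by heartbeats, so a partitioned RM looks dead.
    return now - rm.last_heartbeat <= SESSION_TIMEOUT


def naive_failover(rms: list[RM], now: float) -> None:
    # Promote a live standby if no Active RM looks alive.
    # Nothing here forces the old Active RM to step down (no fencing).
    if any(rm.believes_active and looks_alive(rm, now) for rm in rms):
        return
    for rm in rms:
        if not rm.believes_active and looks_alive(rm, now):
            rm.believes_active = True
            return


rm1 = RM("rm1", believes_active=True, last_heartbeat=0.0)  # Active, then its network stalls
rm2 = RM("rm2", last_heartbeat=15.0)                       # standby, still heartbeating
naive_failover([rm1, rm2], now=15.0)
active = [rm.name for rm in (rm1, rm2) if rm.believes_active]
print(active)  # ['rm1', 'rm2']: two RMs consider themselves Active
```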

The "fencing" (isolation) mechanism solves this problem: during an active/standby switchover, the RM that wins the race to create the lock node attaches a ZooKeeper ACL to it so that it holds the node exclusively. After the switchover, when the former "fake-dead" RM recovers and tries to update the node in ZooKeeper, it finds that the ACL does not grant it access because the node was not created by itself, and it automatically switches itself to Standby. This ensures that there is only one Active RM.
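The fencing flow can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyZooKeeper`, `ResourceManager`, `NoAuthError`, `expire_session` and the `/rm-lock` path are hypothetical names, and the "only the creator may write" check stands in for a real ZooKeeper ACL (in real ZooKeeper, a write by a client the ACL does not authorize fails with a NoAuth error). The sketch shows the key step: the recovered RM only learns it has been replaced when its write is rejected, and it then demotes itself to Standby.

```python
# Toy model with hypothetical names; not the real ZooKeeper client or Hadoop code.


class NoAuthError(Exception):
    """Stands in for ZooKeeper's NoAuth error on an unauthorized write."""


class ToyZooKeeper:
    def __init__(self):
        self.nodes = {}  # path -> {"owner": client_id, "data": ...}

    def create_exclusive(self, path, client_id):
        # Succeeds only if the node does not exist; the creator becomes
        # the sole writer, which plays the role of the ACL.
        if path in self.nodes:
            return False
        self.nodes[path] = {"owner": client_id, "data": None}
        return True

    def expire_session(self, client_id):
        # Ephemeral nodes disappear when their owner's session expires.
        self.nodes = {p: n for p, n in self.nodes.items() if n["owner"] != client_id}

    def set_data(self, path, client_id, data):
        node = self.nodes.get(path)
        # Real ZooKeeper reports a missing node separately (NoNode);
        # here both cases simply mean "this client may not write".
        if node is None or node["owner"] != client_id:
            raise NoAuthError(f"{client_id} may not write {path}")
        node["data"] = data


class ResourceManager:
    def __init__(self, rm_id, zk, lock_path="/rm-lock"):
        self.rm_id = rm_id
        self.zk = zk
        self.lock_path = lock_path
        self.state = "STANDBY"

    def try_become_active(self):
        # Race to create the lock node; the winner becomes ACTIVE.
        if self.zk.create_exclusive(self.lock_path, self.rm_id):
            self.state = "ACTIVE"
        return self.state

    def write_state(self, data):
        # An RM that believes it is ACTIVE writes; an ACL rejection fences it.
        if self.state != "ACTIVE":
            return False
        try:
            self.zk.set_data(self.lock_path, self.rm_id, data)
            return True
        except NoAuthError:
            self.state = "STANDBY"  # another RM owns the node: step down
            return False


zk = ToyZooKeeper()
rm1, rm2 = ResourceManager("rm1", zk), ResourceManager("rm2", zk)
rm1.try_become_active()   # rm1 creates the lock node -> ACTIVE
rm2.try_become_active()   # node already exists -> rm2 stays STANDBY
zk.expire_session("rm1")  # rm1 "fake dies": its session expires, the lock node is removed
rm2.try_become_active()   # rm2 creates the node with its own ACL -> ACTIVE
written = rm1.write_state("app-state")  # rm1 recovers, still thinks it is ACTIVE, tries to write
print(rm1.state, rm2.state, written)  # STANDBY ACTIVE False
```

Between the session expiry and rm1's first write, both RMs briefly believe they are Active; the ACL check is what turns that window into a single Active RM as soon as the stale one touches the shared node.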


Origin blog.csdn.net/wilde123/article/details/118974386