Resolving "The swarm does not have a leader"

1 Issue:

Recently a Swarm cluster in a test environment stopped responding. The cluster has two manager nodes, and running docker node ls on either of them reports:

The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online

Yet both manager nodes were clearly online.
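The error message asks for "more than half of the managers". Swarm managers agree on a leader through Raft consensus, which needs a majority of managers to be able to talk to each other. The short sketch below works through that arithmetic; the function names quorum and tolerated are made up for illustration. With two managers the majority is two, so a fault on a single manager (not only a powered-off machine) may be enough to leave the cluster without a leader.

```shell
#!/bin/sh
# quorum N: how many of N managers must be reachable to form a majority.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated N: how many of N managers can be lost while a majority remains.
tolerated() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

For two managers this prints quorum=2 and tolerated_failures=0, the same tolerance as a single manager.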

 

2 Analysis:

The docker info command also shows an error message:

Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.

 

Checking the logs on the two nodes one at a time showed the following errors being printed periodically.

The first manager node:

Mar 4 09:30:05 manager1 dockerd: time="2020-03-04T09:30:05.663865244+08:00" level=error msg="error sending message to peer" error="rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate has expired or is not yet valid\""

The second manager node:

Mar 4 09:08:01 manager2 dockerd: time="2020-03-04T09:08:01.446858105+08:00" level=warning msg="error renewing TLS certificate: rpc error: code = Internal desc = connection error: desc = \"transport: remote error: tls: bad certificate\""

Preliminary conclusion: the certificate on the second manager node is the problem, and it has most likely expired.

Reading the messages literally, this looks like a bug: to renew its local certificate the node has to send a request to the remote node, but the remote node rejects that request because of the bad certificate. The result is a deadlock in which the certificate can never be renewed.
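To pick such lines out of a busy daemon log, a small filter can help. This is only a sketch: cert_errors is a hypothetical helper name, and the three patterns are simply the message fragments quoted above, not a complete list of TLS errors that dockerd can emit.

```shell
#!/bin/sh
# cert_errors: print only the log lines that mention the certificate problems
# seen above. Reads the files given as arguments, or stdin when none are given.
cert_errors() {
    grep -E 'x509: certificate has expired or is not yet valid|tls: bad certificate|error renewing TLS certificate' "$@"
}
```

Where the daemon log lives depends on the system. On a syslog-based host it might be used as cert_errors /var/log/messages, and on a systemd host as journalctl -u docker | cert_errors; both locations are assumptions, not details from this incident.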

 

The system time on both machines was checked, and both clocks were correct.

 

3 Verification:

Running the command

docker swarm ca | openssl x509 -noout -text

on the second manager node to inspect the certificate failed with an error, so no certificate information could be displayed.

Both nodes were then accessed directly on port 2377 with the Google Chrome browser: https://xxxx:2377

Clicking the certificate in the browser and viewing its details showed that the current time was outside the validity period. This confirmed that the certificate had expired and needed to be renewed.
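The same check can be done from the shell instead of a browser. The sketch below assumes that openssl is installed and that the certificate to inspect is available as a PEM file. cert_status is a hypothetical helper name. Note that -checkend 0 only detects a certificate that has already expired; a certificate that is not yet valid has to be spotted from the printed notBefore date.

```shell
#!/bin/sh
# cert_status FILE: print the validity window of a PEM certificate and
# report whether it has already expired.
# Returns 0 if not expired, 1 if expired, 2 if the file cannot be parsed.
cert_status() {
    cert="$1"
    if ! openssl x509 -in "$cert" -noout >/dev/null 2>&1; then
        echo "unreadable: $cert"
        return 2
    fi
    # Prints the notBefore= and notAfter= lines of the certificate.
    openssl x509 -in "$cert" -noout -dates
    # -checkend 0 exits non-zero when the certificate has already expired.
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
        echo "not expired"
    else
        echo "expired"
        return 1
    fi
}
```

On a manager node with Docker's default data directory, the node certificate is normally kept at /var/lib/docker/swarm/certificates/swarm-node.crt, so the call would be cert_status /var/lib/docker/swarm/certificates/swarm-node.crt (run as root). That path is not mentioned in the notes above and will differ if the Docker data root has been moved.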

That raised the next questions: where is the certificate stored, and how can it be renewed? The following reference was consulted:

Certificate-related discussions on GitHub

 

4 Final resolution:

Because the certificate on manager node 2 was no longer valid, that node was forced to leave the cluster:

docker swarm leave --force

 

Manager node 1 was still not working normally, so the following command was run on manager node 1:

docker swarm init --force-new-cluster --advertise-addr x.x.x.x

The command could not complete normally, so the docker service was restarted:

systemctl restart docker

After waiting for quite a while, the command was run again:

docker swarm init --force-new-cluster --advertise-addr x.x.x.x

The cluster then returned to normal, and the earlier deployments and configuration were still in place. Problem solved.
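The sequence on manager node 1 can be written down as a small script. This is a sketch of the steps described above, not an official recovery procedure. ADVERTISE_ADDR, DRY_RUN, run and recover_manager are names invented for this example, and the 60 second wait is a guess, since the notes above only say "quite a while". By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Address this manager advertises; x.x.x.x is a placeholder, as in the text.
ADVERTISE_ADDR="${ADVERTISE_ADDR:-x.x.x.x}"
# DRY_RUN=1 (the default) only prints commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_manager: try to rebuild the cluster from this manager. If the first
# attempt fails, restart docker, wait, and try once more.
recover_manager() {
    if run docker swarm init --force-new-cluster --advertise-addr "$ADVERTISE_ADDR"; then
        return 0
    fi
    run systemctl restart docker
    run sleep 60   # assumed wait time; adjust as needed
    run docker swarm init --force-new-cluster --advertise-addr "$ADVERTISE_ADDR"
}
```

In dry-run mode, calling recover_manager prints only the first docker swarm init command, because a printed command always counts as successful. The restart and retry branch is taken only when the commands are actually executed and the first attempt fails.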

 

(Completed)

Origin www.cnblogs.com/flying607/p/12407952.html