etcd3 cluster management


When are runtime changes to the cluster required?

  • Maintaining and upgrading multiple machines
    If you need to move multiple nodes to new machines because of planned maintenance such as hardware upgrades or scheduled network outages, move them one node at a time. It is safe to move the leader, but after the leader goes offline the cluster needs extra time to elect a new one, so it is recommended to move the leader last. If the cluster holds more than 50 MB of data, it is better to migrate each node's data directory than to remove the old node and add a new one.
  • Change the size of the cluster
  • replace a broken node
  • Cluster restart after most outages
    Create a new cluster from the surviving data: force one node to start as a single-member cluster (and therefore become leader), then add the remaining nodes to the new cluster one by one through runtime membership changes.

Operations for changing the cluster at runtime

In general, runtime cluster changes come down to the following operations:

  • To update the peerURLs of a single node, perform an update node operation
  • To replace a node, perform an add node operation first, then a delete node operation
  • Growing the cluster from 3 to 5 nodes requires two add node operations
  • Shrinking the cluster from 5 to 3 nodes requires two delete node operations
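As a dry-run sketch of the resize case (the commands are only printed, never executed; the etcd4/etcd5 names and addresses are illustrative assumptions):

```shell
# Dry-run sketch: print the etcdctl commands needed to grow a cluster.
# Takes "name=peerURL" arguments; nothing here talks to a live cluster.
plan_member_adds() {
  for spec in "$@"; do
    name=${spec%%=*}   # part before the first "="
    url=${spec#*=}     # part after the first "="
    echo "etcdctl member add $name $url"
  done
}

# usage: grow a 3-node cluster to 5 (illustrative names and addresses)
#   plan_member_adds "etcd4=http://10.5.12.10:2380" "etcd5=http://10.5.12.11:2380"
```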

update a node

If you want to update a member's peerURLs, you first need to know that member's ID. You can list all members and find the ID of the one you want:

etcdctl member list
4d4f508502c31ddc, started, name=etcd3, http://10.5.12.18:2380, http://10.5.12.18:2379
d20b3f1647802774, started, name=etcd2, http://10.5.12.17:2380, http://10.5.12.17:2379
fdbaf2aa62569cb3, started, name=etcd1, http://10.5.12.16:2380, http://10.5.12.16:2379
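For scripting, the member ID can be pulled out of this listing with standard text tools. A hypothetical helper (not part of etcdctl):

```shell
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID (first comma-separated field) of the member with the given name.
member_id_by_name() {
  grep -w "name=$1" | cut -d, -f1
}

# usage:
#   etcdctl member list | member_id_by_name etcd1
```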

Suppose the peerURLs of the member with ID fdbaf2aa62569cb3 should be updated to http://10.5.12.20:2380. The operation is as follows:

etcdctl member update fdbaf2aa62569cb3 http://10.5.12.20:2380

delete a node

Delete the node with ID fdbaf2aa62569cb3:

etcdctl member remove fdbaf2aa62569cb3

Note: If the leader node is deleted, the cluster needs additional time to elect a new leader

add a node

Adding a node takes two steps: first register the new member through etcdctl (or the corresponding API), then start the new node with the parameters returned during registration

Assuming that the newly added node is named etcd4, and the peerURLs are http://10.5.12.10:2380, the configuration is as follows:

etcdctl member add etcd4 http://10.5.12.10:2380

After etcd registers the new node, it will return a prompt, including three environment variables, as follows:

ETCD_NAME="etcd4"
ETCD_INITIAL_CLUSTER="etcd1=http://10.5.12.16:2380,etcd2=http://10.5.12.17:2380,etcd3=http://10.5.12.18:2380,etcd4=http://10.5.12.10:2380"
ETCD_INITIAL_CLUSTER_STATE=existing

When starting the new node, pass these three variables along. The /opt/kubernetes/cfg/etcd.conf of the new node is configured as follows:

......
ETCD_NAME="etcd4"
ETCD_INITIAL_CLUSTER="etcd1=http://10.5.12.16:2380,etcd2=http://10.5.12.17:2380,etcd3=http://10.5.12.18:2380,etcd4=http://10.5.12.10:2380"
ETCD_INITIAL_CLUSTER_STATE=existing
......
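Equivalently, the new node can be started with command-line flags instead of the config file. A small sketch, with the values taken from the example above and an illustrative flag-joining helper:

```shell
# Values returned by `etcdctl member add` for the example above.
ETCD_NAME="etcd4"
ETCD_INITIAL_CLUSTER="etcd1=http://10.5.12.16:2380,etcd2=http://10.5.12.17:2380,etcd3=http://10.5.12.18:2380,etcd4=http://10.5.12.10:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

# Illustrative helper: turn the three variables into the equivalent flags.
join_flags() {
  echo "--name $ETCD_NAME --initial-cluster $ETCD_INITIAL_CLUSTER --initial-cluster-state $ETCD_INITIAL_CLUSTER_STATE"
}

# The new node would then be started roughly as:
#   etcd $(join_flags) --initial-advertise-peer-urls http://10.5.12.10:2380 \
#        --listen-peer-urls http://10.5.12.10:2380 ...
```

Note that etcd also reads ETCD_*-style environment variables directly, so exporting the three variables before starting etcd has the same effect as passing the flags.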

In addition, note that if the --data-dir directory of the newly added node already contains data, it must be cleared before the node starts. After a member is removed, the membership information in the cluster is updated, and the new node must join as a brand-new member. If --data-dir still holds old data, etcd will read it on startup and use the old member ID, which prevents the node from joining the cluster. So be sure to clear the --data-dir of the new node.

Note: If the original cluster has only 1 node, the cluster cannot function properly until the new node starts successfully, because after the add operation the membership is two and a single node can no longer complete leader election on its own. The cluster becomes available again once the new node is up and has established a connection to the original node.
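Combining the add and delete operations above, replacing a broken node can be sketched as a dry-run (the commands are printed rather than executed; the ID and URL below come from the earlier examples):

```shell
# Dry-run sketch of replacing a member: add the new node first, then delete
# the old one, matching the order recommended above.
plan_replace_member() {
  old_id=$1
  new_name=$2
  new_peer_url=$3
  echo "etcdctl member add $new_name $new_peer_url"
  echo "etcdctl member remove $old_id"
}

# usage:
#   plan_replace_member fdbaf2aa62569cb3 etcd4 http://10.5.12.10:2380
```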

Node Migration and Disaster Recovery

Migrate a node

There are two ways to move a node: remove the old node and add a new one, or migrate the node. When the cluster's data exceeds 50 MB, migrating the node is the recommended approach.

The core of node migration is moving the data directory. Each etcd node stores its own ID in its data directory, so a migrated node keeps its ID.

The steps for migrating nodes are simple, including the following steps:

  • Stop the service of the node that needs to be migrated
  • Copy the data directory from the old machine to the new machine
  • Change the node's peerURLs to the new machine's IP:port via an update operation that is changed at cluster runtime
  • Specify the copied data directory on the new machine and start the etcd node service

The following is a concrete example.
Assuming an existing cluster example is as follows:

etcdctl member list
4d4f508502c31ddc, started, name=etcd3, http://10.5.12.18:2380, http://10.5.12.18:2379
d20b3f1647802774, started, name=etcd2, http://10.5.12.17:2380, http://10.5.12.17:2379
fdbaf2aa62569cb3, started, name=etcd1, http://10.5.12.16:2380, http://10.5.12.16:2379

Move etcd1 from 10.5.12.16 to 10.5.12.19:

  1. Stop the etcd process on etcd1:

    pkill etcd
  2. Copy the data directory from 10.5.12.16 to 10.5.12.19:

    tar -zcf etcd1.tar.gz /data/etcd
    scp etcd1.tar.gz 10.5.12.19:/data
  3. Change the peerURLs of etcd1:

    etcdctl member update fdbaf2aa62569cb3 http://10.5.12.19:2380
  4. Start etcd on the new machine:

    cd /data && tar xf etcd1.tar.gz -C /
    etcd --name etcd1 --data-dir /data/etcd --listen-peer-urls http://10.5.12.19:2380 --initial-advertise-peer-urls http://10.5.12.19:2380 --listen-client-urls http://10.5.12.19:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.5.12.19:2379,http://127.0.0.1:2379

disaster recovery

  1. Back up the data
    This must be done on a live node:

    etcdctl backup --data-dir /data/etcd --backup-dir /data/backup/etcd

    This command backs up the original data to the /data/backup/etcd directory and resets related metadata such as the node ID and cluster ID. This means the backup contains only the data, not the node's identity information.

  2. Rebuild a single node cluster with backup data

    etcd --name etcd1 --data-dir=/data/backup/etcd --force-new-cluster --listen-peer-urls http://10.5.12.19:2380 --listen-client-urls http://10.5.12.19:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.5.12.19:2379,http://127.0.0.1:2379 --initial-advertise-peer-urls http://10.5.12.19:2380

    Once the new cluster is confirmed healthy, you can delete the original cluster data, stop the new cluster, move its data directory to the original data location, and restart:

    pkill etcd
    rm -rf /data/etcd
    mv /data/backup/etcd /data/etcd
    etcd --name etcd1 --data-dir /data/etcd ....

    Note: If you rebuild the cluster on the previous node, be sure to kill the old etcd process and clear the old data first
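As an illustrative sanity check for the backup and restore steps above, a hypothetical helper could verify a directory before rebuilding a cluster from it, assuming the v2-style data-directory layout (a member directory containing snap and wal):

```shell
# Hypothetical sanity check (not an etcdctl command): verify that a backup
# directory has the v2-style etcd layout before rebuilding a cluster from it.
# The member/snap and member/wal layout is an assumption about the data dir.
backup_looks_sane() {
  [ -d "$1/member/snap" ] && [ -d "$1/member/wal" ]
}

# usage:
#   backup_looks_sane /data/backup/etcd && echo "backup layout OK"
```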
