TiDB cluster: taking a failed PD node offline and bringing it back online after repair

1. Remove the PD node

https://pingcap.com/docs-cn/op-guide/ansible-deployment-scale/
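1) Look up the name of the node3 node: /data0/tidb-ansible-2.1-rc.3/resources/bin/pd-ctl -u "http://172.168.8.43:2379" -d member
2) Remove the node: /data0/tidb-ansible-2.1-rc.3/resources/bin/pd-ctl -u "http://172.168.8.43:2379" -d member delete name pd3
3) Once the member has been removed, stop the services on node3: ansible-playbook stop.yml -l 172.168.8.24
4) Edit inventory.ini and remove the node's entry.
5) Rolling-update the cluster: ansible-playbook rolling_update.yml -f 30
6) Update the Prometheus configuration and restart it: ansible-playbook rolling_update_monitor.yml --tags=prometheus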
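
Step 3 should only run once pd3 is no longer a cluster member. A minimal sketch of that check, assuming `pd-ctl ... member` prints JSON with one `"name"` field per member; `pd_member_listed` is a hypothetical helper written for this post, not part of pd-ctl.

```shell
# Hypothetical helper: reads member JSON on stdin and succeeds (exit 0)
# if a member with exactly the given name is listed.
pd_member_listed() {
  local name="$1"
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${name}\""
}

# Example use on the control machine (paths and address taken from this post):
#   /data0/tidb-ansible-2.1-rc.3/resources/bin/pd-ctl -u "http://172.168.8.43:2379" -d member \
#     | pd_member_listed pd3 && echo "pd3 is still a member, do not stop it yet"
```

The match is on the quoted name, so checking for pd1 would not match a member called pd10.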

2. Add the PD node

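1) Edit inventory.ini and add the node as the last line of the [pd_servers] host group.
2) Initialize the new node: ansible-playbook bootstrap.yml -l 172.168.8.24
3) Deploy the new node: ansible-playbook deploy.yml -l 172.168.8.24
4) Log in to the new PD node and edit the startup script: {deploy_dir}/scripts/run_pd.sh
4.1) Remove the --initial-cluster="xxxx" \ line.
4.2) Add --join="http://172.168.8.43:2379" \. The IP address can be that of any existing PD node in the cluster (172.168.8.43 here), not the node being added.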
[tidb@juhy-3] /data0/tidb/deploy/scripts$ vim run_pd.sh

#!/bin/bash
set -e
ulimit -n 1000000

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/data0/tidb/deploy

cd "${DEPLOY_DIR}" || exit 1


exec bin/pd-server \
--name="pd3" \
--client-urls="http://172.168.8.24:2379" \
--advertise-client-urls="http://172.168.8.24:2379" \
--peer-urls="http://172.168.8.24:2380" \
--advertise-peer-urls="http://172.168.8.24:2380" \
--data-dir="/data0/tidb/deploy/data.pd" \
--join="http://172.168.8.43:2379" \
--config=conf/pd.toml \
--log-file="/data0/tidb/deploy/log/pd.log" 2>> "/data0/tidb/deploy/log/pd_stderr.log"
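
The hand edit of run_pd.sh can also be scripted. The sketch below swaps the `--initial-cluster` line for a `--join` line with awk; `switch_to_join` and the `.bak` backup are assumptions added here for illustration, not something tidb-ansible provides. It deletes the old line rather than commenting it out, because a `#` line in the middle of the backslash-continued `exec` command would end the command there and drop the arguments that follow it.

```shell
# Hypothetical helper: make a run_pd.sh join an existing cluster.
# $1 = path to run_pd.sh, $2 = client URL of an existing PD member.
switch_to_join() {
  local script="$1" join_url="$2" tmp rc
  tmp="$(mktemp)" || return 1
  # Replace the --initial-cluster line with a --join line, keeping the
  # trailing backslash so the exec command stays one logical line.
  awk -v url="$join_url" '
    /^[[:space:]]*--initial-cluster=/ { printf "--join=\"%s\" \\\n", url; next }
    { print }
  ' "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Keep a backup, then overwrite in place so the file mode is preserved.
  cp "$script" "$script.bak" && cat "$tmp" > "$script"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example: switch_to_join /data0/tidb/deploy/scripts/run_pd.sh "http://172.168.8.43:2379"
```

As the warning in the script says, the file is auto-generated, so a later deploy may overwrite the change.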


4.3) On the new PD node, start the PD service manually: {deploy_dir}/scripts/start_pd.sh
4.4) On the control machine, use pd-ctl to check that the new node was added successfully:
/data0/tidb-ansible-2.1-rc.3/resources/bin/pd-ctl -u "http://172.168.8.43:2379" -d member
{
  "header": {
    "cluster_id": 6571299051475546645
  },
  "members": [
    {
      "name": "pd3",
      "member_id": 1593382661436092983,
      "peer_urls": [
        "http://172.168.8.24:2380"
      ],
      "client_urls": [
        "http://172.168.8.24:2379"
      ]
    },

5) Rolling-update the whole cluster: ansible-playbook rolling_update.yml -f 30
6) Update the Prometheus configuration and restart it: ansible-playbook rolling_update_monitor.yml --tags=prometheus
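
A freshly started PD may take a moment to show up in the member list, so the check in step 4.4 can be wrapped in a retry loop before moving on to step 5. This is only a sketch: `wait_for_pd_member`, the `PD_CTL` and `PD_URL` variables, and the default retry count and delay are assumptions made here; the pd-ctl path and address are the ones used in this post.

```shell
# Hypothetical helper: poll `pd-ctl member` until the named member is listed.
# $1 = member name, $2 = max attempts (default 10), $3 = seconds between attempts (default 3).
wait_for_pd_member() {
  local name="$1" tries="${2:-10}" delay="${3:-3}" i=0
  local ctl="${PD_CTL:-/data0/tidb-ansible-2.1-rc.3/resources/bin/pd-ctl}"
  local url="${PD_URL:-http://172.168.8.43:2379}"
  while [ "$i" -lt "$tries" ]; do
    # Succeed as soon as the JSON output contains the quoted member name.
    if "$ctl" -u "$url" -d member | grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${name}\""; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait_for_pd_member pd3 && ansible-playbook rolling_update.yml -f 30
```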

3. Environment

TiDB cluster version: v2.1-rc.3

Number of PD nodes: 3


Reposted from blog.csdn.net/mchdba/article/details/85059058