Recovering Ceph MONs from OSDs

Preface:

The mon database originally lived on the system card. In production, to keep a simultaneous failure of the system cards from corrupting every mon database and making the whole Ceph cluster inaccessible, a shared-disk feature was added to the Ceph environment: the disks in the first six slots of every node each set aside a region of the same size, and these regions are mounted under a single path. Keeping the mon database on this shared disk effectively guards against the serious problems caused by mon database corruption.
However, if the mon database can be rebuilt from the OSDs, that concern disappears and the shared-disk workaround can be dropped.
Before reading on, you should understand how cephx works; see the separate post:
ceph秘钥管理机制 (Ceph key management)
With that background, you also need the ceph auth family of commands; see these posts:
ceph auth 命令收集 (collected ceph auth commands)
ceph-authtool命令学习 (notes on ceph-authtool)
monmaptool命令学习 (notes on monmaptool)

If every mon in the cluster is corrupted, follow the steps below

Stop all OSD services in the cluster (run the following command on every storage node)

for i in `lsblk|grep /var/lib|awk 'BEGIN{FS="-"}{print $2}'`;do hcli ceph disk stop --name ceph -d $i & done

Check that every OSD is marked down

ceph osd tree |grep up

If any OSD is still reported up, handle it as follows (a scripted version is sketched after this list)

1. ceph osd find osd.xx  to locate the node hosting the OSD
2. ssh xx.xx.xx.xx to log in to that node
3. Check whether the ceph-osd process is running: ps -ef | grep ceph-osd
4. If the process is running, stop it with: hcli ceph disk stop --name ceph -d $i
5. If the process has stopped but the OSD is still reported up, run: ceph osd down osd.xx
6. Only proceed to the steps below once every OSD is marked down
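A minimal sketch of the loop implied by steps 1-5 (my own addition, not part of the original procedure): it parses the ids of OSDs that still report up out of ceph osd dump, whose per-OSD lines start with "osd.N up|down", and marks each one down. It assumes the ceph-osd processes have already been stopped (steps 3-4) and is run from any node that can still reach the cluster:

# mark every OSD that still reports up as down
for osd in $(ceph osd dump | awk '$1 ~ /^osd\./ && $2 == "up" {print $1}'); do
    ceph osd down "$osd"
done
# re-check; this should print nothing once everything is down
ceph osd tree | grep up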

Find the mon nodes

Run ceph mon stat; it prints something like:
e3: 5 mons at {node4=10.193.56.104:6789/0,node5=10.193.56.105:6789/0,node6=10.193.56.106:6789/0,node7=10.193.56.107:6789/0,node8=10.193.56.108:6789/0}, election epoch 114, leader 0 node4, quorum 0,1,2,3,4 node4,node5,node6,node7,node8
which shows the mon nodes are:
node4=10.193.56.104
node5=10.193.56.105
node6=10.193.56.106
node7=10.193.56.107
node8=10.193.56.108

Back up the mon database and stop the mon and mgr processes

bcli dm mv /var/lib/ceph/mon/ceph-`hostname`  /var/lib/ceph/mon/ceph-`hostname`_bak
systemctl stop ceph-mon@`hostname`
systemctl stop ceph-mgr@`hostname`

Check how many osdmaps need to be recovered

Recovering at least this many osdmaps ensures the cluster's osdmap history can be rebuilt correctly.
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-`hostname`_bak/store.db list|grep osdmap|grep -v full|grep -v com|grep -v heal|wc -l

Collect the mon database information from the OSDs
Important: the commands below must be run on every mon node in turn. After finishing one node, scp the contents of /tmp/mon-store to the next mon node and repeat the commands there (a scripted version of this hand-off is sketched after the commands).

mkdir /tmp/mon-store
cd /tmp/mon-store
for i in `lsblk|grep /var/lib|awk 'BEGIN{FS="-"}{print $2}'`;do ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-"$i"/ --op update-mon-db --mon-store-path /tmp/mon-store/;done
cp -ra db/* store.db/
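The node-to-node hand-off can be scripted. A rough sketch (my own, purely illustrative): it assumes password-less root ssh and rsync between the mon nodes and uses the node names from the ceph mon stat output above; the official script quoted at the end of this post does essentially the same thing.

ms=/tmp/mon-store
mkdir -p $ms
for host in node4 node5 node6 node7 node8; do
    # push the partially collected store to the node
    rsync -a $ms/ root@$host:$ms/
    # run the per-node collection commands shown above on that node
    ssh root@$host 'cd /tmp/mon-store &&
        for i in $(lsblk | grep /var/lib | awk "BEGIN{FS=\"-\"}{print \$2}"); do
            ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-"$i"/ \
                --op update-mon-db --mon-store-path /tmp/mon-store/
        done &&
        cp -ra db/* store.db/'
    # pull the updated store back before moving on to the next node
    rsync -a root@$host:$ms/ $ms/
done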

Once the information has been collected on every mon node, rebuild the mon database from the final collected store

ceph-monstore-tool /tmp/mon-store rebuild
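As a quick sanity check (my own addition, assuming ceph-monstore-tool writes the rebuilt store to /tmp/mon-store/store.db, as the copy step below implies), the rebuilt store should contain at least as many incremental osdmaps as were counted in the backed-up store earlier:

ceph-kvstore-tool rocksdb /tmp/mon-store/store.db list|grep osdmap|grep -v full|grep -v com|grep -v heal|wc -l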

Create the default database directory on each mon node (run the following on each mon node in turn)

mkdir /var/lib/ceph/mon/ceph-`hostname`

Copy the rebuilt database files into the directory just created

cp -ra /tmp/mon-store/* /var/lib/ceph/mon/ceph-`hostname`

Add the marker files

touch /var/lib/ceph/mon/ceph-`hostname`/done
touch /var/lib/ceph/mon/ceph-`hostname`/systemd

Add the mon keyring

cp  /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-`hostname`/keyring

Fix the ownership of the database directory

chown ceph:ceph -R /var/lib/ceph/mon/
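For convenience, the per-mon-node steps above can be bundled into one small script. This is just the commands already listed, with one extra assumption: if the rebuild was done on a different node, /tmp/mon-store must first be copied (e.g. with scp) to the mon node being prepared.

mkdir /var/lib/ceph/mon/ceph-`hostname`
cp -ra /tmp/mon-store/* /var/lib/ceph/mon/ceph-`hostname`
touch /var/lib/ceph/mon/ceph-`hostname`/done
touch /var/lib/ceph/mon/ceph-`hostname`/systemd
cp /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
chown ceph:ceph -R /var/lib/ceph/mon/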

Check whether the mon can start

1. Run:
/usr/bin/ceph-mon -f --cluster ceph --id `hostname` --setuser ceph --setgroup ceph
If it prints something like:
2019-08-07 17:07:26.335226 7f3af661de40 -1 mon.node3@-1(probing) e0 error: cluster_uuid file exists with value 64470064-15c0-4dbe-9b48-a542fd152dd9, != our uuid 447006ff-15c0-f4db-ff9b-48fa542fd152
then the monmap needs to be recreated.
Otherwise press Ctrl+C to exit.
2. If the monmap does need to be recreated, run the following (a multi-mon variant is sketched after this list):
    a. Create a new monmap; note: the fsid comes from /etc/ceph/ceph.conf
    monmaptool --create --fsid 64470064-15c0-4dbe-9b48-a542fd152dd9 --add node2 10.193.55.133:6789  monmap
    b. Inject the monmap into the database
    ceph-mon -i node3 --inject-monmap monmap
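For a cluster like the one shown earlier (node4 through node8), the new monmap would normally list all of the mons. A sketch using those example names and addresses, with the example fsid from the error message above; substitute your own values from ceph.conf:

monmaptool --create --fsid 64470064-15c0-4dbe-9b48-a542fd152dd9 \
    --add node4 10.193.56.104:6789 --add node5 10.193.56.105:6789 \
    --add node6 10.193.56.106:6789 --add node7 10.193.56.107:6789 \
    --add node8 10.193.56.108:6789 monmap
ceph-mon -i `hostname` --inject-monmap monmap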

Add the mgr key

1. Add the key
ceph auth add mgr.`hostname` --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
Note: if the command above hangs, start the mon first with nohup /usr/bin/ceph-mon -f --cluster ceph --id `hostname` --setuser ceph --setgroup ceph & and retry (ceph auth needs a running mon).
2. Grant the mgr key its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps mgr.`hostname` mds 'allow *' osd 'allow *' mon 'allow profile mgr'
3. Update the mgr keyring file so it matches the newly generated key (see the example below)
ceph auth get mgr.`hostname` --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /var/lib/ceph/mgr/ceph-`hostname`/keyring
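What step 3 means in practice: the key printed by ceph auth get has to be pasted into the mgr keyring file, which should end up looking roughly like this (node4 is only an example hostname and the key value is a placeholder):

[mgr.node4]
    key = <the key printed by ceph auth get mgr.node4>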

Add the client.admin key

1. Add the key
ceph auth add client.admin --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Grant client.admin its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *' mgr 'allow *'
3. Update the keyring file so it matches the newly generated key
ceph auth get client.admin --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /etc/ceph/ceph.client.admin.keyring

Add the client.bootstrap-mds key

1. Add the key
ceph auth add client.bootstrap-mds --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Grant client.bootstrap-mds its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps client.bootstrap-mds  mon 'allow profile bootstrap-mds'
3. Update the keyring file so it matches the newly generated key
ceph auth get client.bootstrap-mds --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /etc/ceph/ceph.bootstrap-mds.keyring

Add the client.bootstrap-mgr key

1. Add the key
ceph auth add client.bootstrap-mgr --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Grant client.bootstrap-mgr its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps client.bootstrap-mgr  mon 'allow profile bootstrap-mgr'
3. Update the keyring file so it matches the newly generated key
ceph auth get client.bootstrap-mgr --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /etc/ceph/ceph.bootstrap-mgr.keyring

Add the client.bootstrap-osd key

1. Add the key
ceph auth add client.bootstrap-osd --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Grant client.bootstrap-osd its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps client.bootstrap-osd  mon 'allow profile bootstrap-osd'
3. Update the keyring file so it matches the newly generated key
ceph auth get client.bootstrap-osd --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /etc/ceph/ceph.bootstrap-osd.keyring

Add the client.bootstrap-rgw key

1. Add the key
ceph auth add client.bootstrap-rgw --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Grant client.bootstrap-rgw its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps client.bootstrap-rgw  mon 'allow profile bootstrap-rgw'
3. Update the keyring file so it matches the newly generated key
ceph auth get client.bootstrap-rgw --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
vim /etc/ceph/ceph.bootstrap-rgw.keyring
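The four bootstrap-* sections above follow an identical pattern, so, if preferred, they can be run as a single loop (the same commands as above, just parameterised):

KEYRING=/var/lib/ceph/mon/ceph-`hostname`/keyring
for svc in mds mgr osd rgw; do
    ceph auth add client.bootstrap-$svc --name mon. --keyring $KEYRING
    ceph -n mon. --keyring $KEYRING auth caps client.bootstrap-$svc mon "allow profile bootstrap-$svc"
    # print the key so it can be copied into /etc/ceph/ceph.bootstrap-$svc.keyring
    ceph auth get client.bootstrap-$svc --name mon. --keyring $KEYRING
done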

Restart the mon and mgr services on each node in turn

systemctl restart ceph-mon@`hostname`
systemctl restart ceph-mgr@`hostname`

Restart all OSDs on each node

for i in `lsblk|grep /var/lib|awk 'BEGIN{FS="-"}{print $2}'`;do hcli ceph disk restart --name ceph -d  $i & done
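Once the OSDs are back, the recovery can be verified with the usual status commands (my own addition):

ceph -s
ceph osd stat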

Troubleshooting:
1. The main Ceph services are mon, mgr, osd and mds. If any of them fails to start after the restart with an "Operation not permitted" error, handle it as follows (OSD key recovery is used as the example; a small comparison helper is sketched after this list).

1. Fetch the OSD key recorded in the database
ceph auth get osd.1 --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
2. Look at the key recorded in the OSD's local keyring file
cat /var/lib/ceph/osd/ceph-1/keyring
3. If the two keys differ, change the local keyring to the key recorded in the database and restart the service.
4. If the database returns nothing for the OSD key, regenerate it as follows.
5. Add the OSD key
ceph auth add osd.1 --name mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring
6. Grant the key its caps
ceph -n mon. --keyring /var/lib/ceph/mon/ceph-`hostname`/keyring  auth caps osd.1 mds 'allow *' osd 'allow *' mon 'allow *'
7. Update /var/lib/ceph/osd/ceph-1/keyring with the newly added key.
8. Restart the service.
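A small helper (my own sketch, not part of the original procedure) that compares the key in the mon database with the OSD's local keyring for a given id; it relies on both outputs containing a "key = ..." line:

id=1
KEYRING=/var/lib/ceph/mon/ceph-`hostname`/keyring
db_key=$(ceph auth get osd.$id --name mon. --keyring $KEYRING | awk '/key = / {print $3}')
local_key=$(awk '/key = / {print $3}' /var/lib/ceph/osd/ceph-$id/keyring)
if [ "$db_key" = "$local_key" ]; then
    echo "osd.$id: keys match"
else
    echo "osd.$id: keys differ, update /var/lib/ceph/osd/ceph-$id/keyring"
fi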

Reference articles:
https://blog.csdn.net/penglaixy/article/details/79296873
https://toutiao.io/subjects/285

Original text from the official documentation:
https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

RECOVERY USING OSDS

But what if all monitors fail at the same time? Since users are encouraged to deploy at least three (and preferably five) monitors in a Ceph cluster, the chance of simultaneous failure is rare. But unplanned power-downs in a data center with improperly configured disk/fs settings could fail the underlying filesystem, and hence kill all the monitors. In this case, we can recover the monitor store with the information stored in OSDs:

ms=/root/mon-store
mkdir $ms

# collect the cluster map from stopped OSDs
for host in $hosts; do
  rsync -avz $ms/. user@$host:$ms.remote
  rm -rf $ms
  ssh user@$host <<EOF
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
    done
EOF
  rsync -avz user@$host:$ms.remote/. $ms
done

# rebuild the monitor store from the collected map, if the cluster does not
# use cephx authentication, we can skip the following steps to update the
# keyring with the caps, and there is no need to pass the "--keyring" option.
# i.e. just use "ceph-monstore-tool $ms rebuild" instead
ceph-authtool /path/to/admin.keyring -n mon. \
  --cap mon 'allow *'
ceph-authtool /path/to/admin.keyring -n client.admin \
  --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring

# make a backup of the corrupted store.db just in case! repeat for
# all monitors.
mv /var/lib/ceph/mon/mon.foo/store.db /var/lib/ceph/mon/mon.foo/store.db.corrupted

# move rebuild store.db into place. repeat for all monitors.
mv $ms/store.db /var/lib/ceph/mon/mon.foo/store.db
chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db

The steps above:
1. collect the map from all OSD hosts,
2. then rebuild the store,
3. fill the entities in the keyring file with appropriate caps,
4. replace the corrupted store on mon.foo with the recovered copy.

KNOWN LIMITATIONS

The following information is not recoverable using the steps above:

some added keyrings: all the OSD keyrings added using ceph auth add command are recovered from the OSD’s copy. And the client.admin keyring is imported using ceph-monstore-tool. But the MDS keyrings and other keyrings are missing in the recovered monitor store. You might need to re-add them manually.

creating pools: If any RADOS pools were in the process of being created, that state is lost. The recovery tool assumes that all pools have been created. If there are PGs that are stuck in the 'unknown' state after the recovery for a partially created pool, you can force creation of the empty PG with the ceph osd force-create-pg command. Note that this will create an empty PG, so only do this if you know the pool is empty.

MDS Maps: the MDS maps are lost.
