GlusterFS Failure Simulation

1. Disk Failure

If the underlying storage uses RAID, simply replace the failed disk and let the array rebuild.
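With Linux software RAID, for example, the swap can be done online. A minimal sketch, assuming the array is /dev/md0, the dead disk is /dev/sdc, and its replacement shows up as /dev/sdd (all three names are assumptions; hardware RAID controllers ship their own tools):

# Check array health first (Linux software RAID assumed)
cat /proc/mdstat
mdadm --detail /dev/md0

# Fail and remove the dead disk, then add the replacement;
# the array rebuilds on its own
mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc
mdadm /dev/md0 --add /dev/sdd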

If RAID is not in use, handle it as follows:

  1. On a healthy node, run gluster volume status to identify the failed brick
  2. On a healthy node, run getfattr -d -m '.*' /brick against the matching healthy brick
  3. Record the values of trusted.glusterfs.volume-id and trusted.gfid
  4. setfattr -n trusted.glusterfs.volume-id -v <recorded value> <brick path>

    setfattr -n trusted.gfid -v <recorded value> <brick path>
  5. Copy over the .glusterfs directory from the healthy brick
  6. Restart glusterd (steps 1-4 are sketched as commands after this list)
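A minimal command sketch of steps 1-4, assuming the same layout as the simulation below (healthy brick on node1, replacement brick mounted at /storage/brick2 on node2); the recorded values are placeholders to be filled in from the getfattr output:

# On a healthy node: confirm which brick is offline, then dump the
# extended attributes of the matching healthy brick
gluster volume status
getfattr -d -m '.*' /storage/brick2

# On the node with the new disk: re-apply the recorded values
# (the 0s... strings are base64, exactly as getfattr prints them)
setfattr -n trusted.glusterfs.volume-id -v <recorded value> /storage/brick2
setfattr -n trusted.gfid -v <recorded value> /storage/brick2

Steps 5 and 6 (copying .glusterfs and restarting glusterd) are shown concretely at the end of the walkthrough below.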

Failure Simulation

Remove a disk from the virtual machine, then add a new one.
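How the disk is swapped depends on the hypervisor; a hypothetical libvirt/KVM example (the domain name node2, the target device sdc, and the image path are all assumptions, and other hypervisors do this through their own UI or CLI):

# Detach the old virtual disk and attach a fresh 2 GB image in its place
virsh detach-disk node2 sdc --persistent
qemu-img create -f qcow2 /var/lib/libvirt/images/node2-brick2.qcow2 2G
virsh attach-disk node2 /var/lib/libvirt/images/node2-brick2.qcow2 sdc --persistent --subdriver qcow2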

[root@node2 gv1]# gluster volume status
Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/storage/brick2                 49153     0          Y       19812
Brick node2:/storage/brick2                 N/A       N/A        N       N/A
Brick node1:/storage/brick1                 49154     0          Y       19834
Brick node2:/storage/brick1                 49154     0          Y       19103
Self-heal Daemon on localhost               N/A       N/A        Y       19126
Self-heal Daemon on node1                   N/A       N/A        Y       19857

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

[root@node2 gv1]# mkfs.xfs -f /dev/sdc
meta-data=/dev/sdc               isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@node2 gv1]# mount -a
[root@node2 gv1]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   17G  1.9G   16G  11% /
devtmpfs                 485M     0  485M   0% /dev
tmpfs                    496M     0  496M   0% /dev/shm
tmpfs                    496M  7.1M  489M   2% /run
tmpfs                    496M     0  496M   0% /sys/fs/cgroup
/dev/sda1               1014M  130M  885M  13% /boot
tmpfs                    100M     0  100M   0% /run/user/0
/dev/sdb                 2.0G   35M  2.0G   2% /storage/brick1
127.0.0.1:/gv1           4.0G  110M  3.9G   3% /mnt/gv1
/dev/sdc                 2.0G   33M  2.0G   2% /storage/brick2
[root@node2 gv1]# ls /storage/brick2
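The empty listing confirms the new brick has no data or metadata yet. Note that mount -a only picks the new filesystem up because /etc/fstab already carries an entry for the brick, something along these lines (a sketch; adjust the device to your layout):

/dev/sdc  /storage/brick2  xfs  defaults  0 0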

[root@node1 brick2]# getfattr -d -m '.*' /storage/brick2
getfattr: Removing leading '/' from absolute path names
# file: storage/brick2
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.gv1-client-3=0sAAAAAAAAAAAAAAAF
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAf////g==
trusted.glusterfs.dht.commithash="3688746489"
trusted.glusterfs.volume-id=0sTGg/nUxsS+eaTMeppJ3aRw==
setfattr -n trusted.afr.dirty -v 0sAAAAAAAAAAAAAAAA /storage/brick2
setfattr -n trusted.afr.gv1-client-3 -v 0sAAAAAAAAAAAAAAAF /storage/brick2
setfattr -n trusted.gfid -v 0sAAAAAAAAAAAAAAAAAAAAAQ== /storage/brick2
setfattr -n trusted.glusterfs.dht -v 0sAAAAAQAAAAAAAAAAf////g== /storage/brick2
setfattr -n trusted.glusterfs.dht.commithash -v "3688746489" /storage/brick2
setfattr -n trusted.glusterfs.volume-id -v 0sTGg/nUxsS+eaTMeppJ3aRw== /storage/brick2

==You do not have to set all of these; setting only trusted.glusterfs.volume-id and trusted.gfid is enough.==
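So the minimal version of the step above, reusing the two values dumped from node1, would be:

setfattr -n trusted.glusterfs.volume-id -v 0sTGg/nUxsS+eaTMeppJ3aRw== /storage/brick2
setfattr -n trusted.gfid -v 0sAAAAAAAAAAAAAAAAAAAAAQ== /storage/brick2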

[root@node1 brick2]# getfattr -d -m '.*' /storage/brick2
getfattr: Removing leading '/' from absolute path names
# file: storage/brick2
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.gv1-client-3=0sAAAAAAAAAAAAAAAF
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAf////g==
trusted.glusterfs.dht.commithash="3688746489"
trusted.glusterfs.volume-id=0sTGg/nUxsS+eaTMeppJ3aRw==

[root@node1 brick2]# systemctl restart glusterd

Following the steps above, multiple retries failed to restore the brick:

Building on the steps above, copy the /storage/brick2/.glusterfs directory from node1 into the corresponding brick on node2, then run systemctl restart glusterd again. Recovery succeeds.
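A sketch of that copy step, run on node2 and assuming root SSH access to node1 (rsync is one option; scp -r works too), followed by a heal check to confirm the volume is recovering:

# Pull the .glusterfs metadata tree from the healthy brick, then restart
rsync -a node1:/storage/brick2/.glusterfs/ /storage/brick2/.glusterfs/
systemctl restart glusterd

# Verify that self-heal is catching the brick up
gluster volume heal gv1 info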

2. Host Failure

  1. Find an identical machine, or at least one with the same number and size of disks. Install the OS, configure the same IP as the failed machine, install the gluster software, and make sure the configuration matches. On another healthy node, run gluster peer status to look up the failed server's UUID.
  2. Edit /var/lib/glusterd/glusterd.info on the new machine so that it matches the failed one
  3. Set up the disks on the new machine (same procedure as the disk failure above)
  4. Join the new machine to the cluster: gluster peer probe node2
  5. Restart: systemctl restart glusterd
  6. Sync the data:
    gluster volume sync master all (here master is the hostname of the peer to sync from; whether this step can be skipped is untested)
    then restart again: systemctl restart glusterd
  7. On any node, run gluster volume heal gv1 full (the whole sequence is sketched as commands after this list)
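Put together as commands, a sketch of the replacement, assuming node2 is the rebuilt host and node1 is healthy; the UUID placeholder comes from gluster peer status on node1, and gluster volume sync is pointed at the healthy peer:

# On node1: record the failed server's UUID
gluster peer status

# On the rebuilt node2: take over the old identity before rejoining
systemctl stop glusterd
sed -i 's/^UUID=.*/UUID=<uuid recorded on node1>/' /var/lib/glusterd/glusterd.info
systemctl start glusterd

# Rejoin and sync the volume configuration from the healthy peer
gluster peer probe node2        # run on node1 if node2 does not rejoin on its own
systemctl restart glusterd
gluster volume sync node1 all   # pull volume definitions from node1
systemctl restart glusterd

# Trigger a full self-heal from any node
gluster volume heal gv1 full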


Reposted from www.cnblogs.com/banyungong666/p/9644947.html