A Roundup of GlusterFS Installation Errors

1. Mounting glusterfs on /mnt/gluster0 failed.

[root@instance-bt783gwn ~]# mount -t glusterfs gfs-node0:/data0 /mnt/gluster0
Mounting glusterfs on /mnt/gluster0 failed.

Check the log file:

[root@instance-bt783gwn ~]# cat /var/log/glusterfs/mnt-gluster0.log
[2020-03-23 08:46:52.167046] I [MSGID: 100030] [glusterfsd.c:2685:main] 0-/usr/local/sbin/glusterfs: Started running version [{arg=/usr/local/sbin/glusterfs}, {version=8dev}, {cmdlinestr=/usr/local/sbin/glusterfs --process-name fuse --volfile-server=gfs-node0 --volfile-id=/data0 /mnt/gluster0}]
[2020-03-23 08:46:52.168045] I [glusterfsd.c:2420:daemonize] 0-glusterfs: Pid of current running process is 2691
[2020-03-23 08:46:52.170606] I [socket.c:4230:ssl_setup_connection_params] 0-glusterfs: SSL support on the I/O path is ENABLED
[2020-03-23 08:46:52.170620] I [socket.c:4233:ssl_setup_connection_params] 0-glusterfs: SSL support for glusterd is ENABLED
[2020-03-23 08:46:52.170632] I [socket.c:4244:ssl_setup_connection_params] 0-glusterfs: using certificate depth 1
[2020-03-23 08:46:52.170772] I [socket.c:4289:ssl_setup_connection_params] 0-glusterfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2020-03-23 08:46:52.170881] E [socket.c:4354:ssl_setup_connection_params] 0-glusterfs: could not load our cert at /etc/ssl/glusterfs.pem
[2020-03-23 08:46:52.170897] E [socket.c:239:ssl_dump_error_stack] 0-glusterfs:   error:02001002:system library:fopen:No such file or directory
[2020-03-23 08:46:52.170903] E [socket.c:239:ssl_dump_error_stack] 0-glusterfs:   error:20074002:BIO routines:FILE_CTRL:system lib
[2020-03-23 08:46:52.170910] E [socket.c:239:ssl_dump_error_stack] 0-glusterfs:   error:140DC002:SSL routines:SSL_CTX_use_certificate_chain_file:system lib
[2020-03-23 08:46:52.176324] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
[2020-03-23 08:46:52.186937] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}]
[2020-03-23 08:46:52.187740] W [socket.c:767:__socket_rwv] 0-glusterfs: writev on 127.0.0.1:24007 failed (Input/output error)
[2020-03-23 08:46:52.187772] E [socket.c:239:ssl_dump_error_stack] 0-glusterfs:   error:140840FF:SSL routines:ssl3_connect:unknown state
[2020-03-23 08:46:52.187833] I [glusterfsd-mgmt.c:2641:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: gfs-node0
[2020-03-23 08:46:52.187841] I [glusterfsd-mgmt.c:2661:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2020-03-23 08:46:52.188035] W [glusterfsd.c:1439:cleanup_and_exit] (-->/usr/local/lib/libgfrpc.so.0(+0xeb43) [0x7fddf34c4b43] -->/usr/local/sbin/glusterfs() [0x411578] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x5f) [0x409baf] ) 0-: received signum (1), shutting down
[2020-03-23 08:46:52.188066] I [fuse-bridge.c:6994:fini] 0-fuse: Unmounting '/mnt/gluster0'.
[2020-03-23 08:46:52.190582] I [fuse-bridge.c:6999:fini] 0-fuse: Closing fuse connection to '/mnt/gluster0'.

This is the kind of mistake only a dummy like me would make, O(∩_∩)O haha~.

mount -t glusterfs gfs-node0:/data0 /mnt/gluster0

Look at the command: gfs-node0:/data0 points at the brick directory on disk, which is not what the client should mount. Instead, /mnt/gluster0 has to be mounted against the volume that was already created, so the command should be:

mount -t glusterfs gfs-node0:test-volume0 /mnt/gluster0
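Once it mounts, the entry can be made persistent across reboots with an /etc/fstab line along these lines (a sketch; the _netdev option delays the mount until the network is up):

gfs-node0:test-volume0 /mnt/gluster0 glusterfs defaults,_netdev 0 0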

A whole afternoon, on just this. That's what a wandering mind gets you...

2. volume delete: test-volume0: failed: Some of the peers are down

  1. The error message:
[root@VM_0_2_centos glusterfs]# gluster volume delete test-volume0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: test-volume0: failed: Some of the peers are down
  2. Check the peer status on each node:
[root@VM_0_2_centos glusterfs]# gluster peer status
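With gfs-node0 unreachable, the output looks roughly like this (illustrative; the UUID is omitted, and the key field is the Disconnected state):

Number of Peers: 1

Hostname: gfs-node0
Uuid: <peer-uuid>
State: Peer in Cluster (Disconnected)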
  3. Check the brick information in the volume:
[root@VM_0_2_centos glusterfs]# gluster volume info
Volume Name: test-volume0
Type: Replicate
Volume ID: b3357d7e-4277-4161-8f2d-b8c3c43d8ddb
Status: Stopped
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs-node0:/home/data0
Brick2: gfs-node1:/home/data0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
  4. First remove the brick belonging to the down node (dropping the replica count to 1):
[root@VM_0_2_centos glusterfs]# gluster volume remove-brick test-volume0 replica 1 gfs-node0:/home/data0 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success
  5. Then detach the corresponding peer:
[root@VM_0_2_centos glusterfs]# gluster peer detach gfs-node0 force
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
  6. Finally, destroy the local volume:
[root@VM_0_2_centos glusterfs]# gluster volume delete test-volume0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: test-volume0: success

Success!

3. gfs-node1 is either already part of another cluster or having volumes configured

  1. Check that the firewall (and SELinux) are really off; see the commands after this list for making this stick:
setenforce 0						# put SELinux in permissive mode

systemctl status firewalld 			# check whether firewalld is running
  2. If the local IP cannot be resolved, change the local node's address in /etc/hosts to 127.0.0.1, for example:
[root@instance-bt783gwn /]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
**.**.**.** instance-bt783gwn instance-bt783gwn.novalocal

127.0.0.1       gfs-node0   # use 127.0.0.1 for the local node
**.**.**.**  gfs-node1
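For step 1, if firewalld turns out to be active, the usual systemd commands stop it now and keep it off across reboots, and the sed line makes the SELinux change survive a reboot (CentOS 7 paths assumed):

systemctl stop firewalld       # stop the firewall immediately
systemctl disable firewalld    # do not start it again at boot
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config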

4. can't find gfs-node1:test-volume0 in /etc/fstab


My fix here is probably of little general use, since the failure was caused by my own changes to the source code.

After repeated mounts, one host started failing to mount, while the other host mounted without any problem.

[root@VM_0_2_centos /]# mount -t glusterfs gfs-node1:test-volume0 /mnt/gluster0
Mounting glusterfs on /mnt/gluster0 failed.

To squeeze out more error information, I ran mount with only one argument (which makes mount look the entry up in /etc/fstab):

[root@VM_0_2_centos /]# mount -t glusterfs gfs-node1:test-volume0
mount: can't find gfs-node1:test-volume0 in /etc/fstab

After ruling out network connectivity problems, I concluded that the failure was most likely caused by my modifications to the source code.

So I ran git diff, which showed that I had accidentally deleted a statement inside parse_cmdline (called from main):

process_mode = gf_get_process_mode(argv[0]);

This statement determines which process the command line is running as: glusterfsd, glusterfs, or glusterd.
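For context, here is a simplified sketch (my own illustration, not the actual GlusterFS source) of how such a helper can derive the process mode from argv[0]:

#include <libgen.h>   /* basename */
#include <stdio.h>
#include <string.h>

typedef enum {
    GF_CLIENT_PROCESS,    /* glusterfs: the FUSE client */
    GF_SERVER_PROCESS,    /* glusterfsd: the brick server */
    GF_GLUSTERD_PROCESS   /* glusterd: the management daemon */
} gf_process_mode_t;

static gf_process_mode_t
get_process_mode(char *exec_name)
{
    char *name = basename(exec_name);   /* strip the directory part */

    if (strcmp(name, "glusterfsd") == 0)
        return GF_SERVER_PROCESS;
    if (strcmp(name, "glusterd") == 0)
        return GF_GLUSTERD_PROCESS;
    return GF_CLIENT_PROCESS;           /* default: the glusterfs client */
}

int main(int argc, char *argv[])
{
    (void)argc;
    printf("process mode for %s: %d\n", argv[0], get_process_mode(argv[0]));
    return 0;
}

With process_mode never set, the binary can no longer tell whether it should behave as the FUSE client, a brick server, or the management daemon, which fits the symptom of the mount quietly failing.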

The log file also recorded errors, though I could not make much sense of them at the time:

[2020-03-25 07:57:28.870198] E [MSGID: 114058] [client-handshake.c:1201:client_query_portmap_cbk] 0-test-volume0-client-0: failed to get the port number for remote subvolume. Please run gluster volume status on server to see if brick process is running []
[2020-03-25 07:58:00.578927] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-test-volume0-client-0: remote operation failed. [{path=/}, {gfid=00000000-0000-0000-0000-000000000001}, {errno=107}, {error=Transport endpoint is not connected}]

After restoring the statement and recompiling, a direct mount would still fail: the daemons still running are the old binaries, so both the client-side and server-side processes have to be restarted. In practice that means restarting the volume:

  1. gluster volume stop ***
  2. gluster volume start ***
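As a side note, if it is only some brick processes that have died, gluster volume start test-volume0 force should restart just the missing bricks without a full stop/start cycle.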

From this we can also infer the division of labor: gluster volume create only generates the volume's configuration files (the volfiles), while gluster volume start is what actually spawns the server-side and client-side daemon processes.
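This is easy to check: on a stock install the volfiles generated by volume create live under /var/lib/glusterd/vols/<volname>/ on each server (a source build configured with a different localstatedir will keep them elsewhere):

ls /var/lib/glusterd/vols/test-volume0/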
And with that, the problem was solved!
