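Pitfalls when setting up a RabbitMQ high-availability cluster
While setting up a RabbitMQ cluster, running rabbitmqctl join_cluster rabbit@rabbit-node1 failed with the following error: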
Clustering node rabbit@slave1 with rabbit@rabbit-node1
Error: unable to perform an operation on node 'rabbit@rabbit-node1'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
 * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
 * Target node is not running
In addition to the diagnostics info below:
 * See the CLI, clustering and networking guides on http://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@rabbit-node1
DIAGNOSTICS
attempted to contact: ['rabbit@rabbit-node1']
rabbit@rabbit-node1:
 * connected to epmd (port 4369) on rabbit-node1
 * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
 * TCP connection succeeded but Erlang distribution failed
 * Hostname mismatch: node "rabbit@master" believes its host is different. Please ensure that hostnames resolve the same way locally and on "rabbit@master"
Current node details:
 * node name: 'rabbitmqcli-14907-rabbit@slave1'
 * effective user's home directory: /var/lib/rabbitmq
 * Erlang cookie hash: N9VmcKjlLemcjmGbsPIdkw==
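Locating the problem
The error output includes this hint:
Most common reasons for this are:
- Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
- CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
- Target node is not running
- 1. Check the firewall and network connectivity: the firewall was turned off, and all three machines could ping each other by hostname.
- 2. Check the cookie file: I installed from the RPM, so the cookie file is at /var/lib/rabbitmq/.erlang.cookie. The .erlang.cookie files on all three machines were identical, and all had mode 400.
- 3. Check the status of rabbitmq-server on rabbit-node1: the target node was running normally.
- None of these was the cause, so read further down the output: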
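The three checks above can be sketched as a couple of shell helpers. This is only a sketch: the function names are hypothetical, the host names and the cookie path are the ones used in this post, and the ping flags assume Linux iputils.

```shell
#!/bin/sh
# Hypothetical helpers for the checks described above; not part of RabbitMQ.

# Print a SHA-256 digest of a cookie file, so that digests can be compared
# across machines without printing the cookie itself.
cookie_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Succeed only if every given host answers a single ping.
# Assumption: Linux iputils ping (-c count, -W timeout in seconds).
hosts_reachable() {
    for h in "$@"; do
        if ! ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
            echo "unreachable: $h"
            return 1
        fi
    done
}

# Example, run on each machine (compare the printed digests by eye):
#   hosts_reachable rabbit-node1 rabbit-node2 rabbit-node3
#   cookie_digest /var/lib/rabbitmq/.erlang.cookie
#   rabbitmqctl status    # on the target node, to confirm it is running
```

If the digests differ between machines, the cookie is the likely culprit; in this case they matched, which is why the search had to continue.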
Hostname mismatch: node "rabbit@master" believes its host is
different. Please ensure that hostnames resolve the same way locally
and on "rabbit@master"
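The mismatch means the node addressed as rabbit@rabbit-node1 is actually running under a different name (rabbit@master here). One way to see which name a running node uses is to read the -sname argument from its beam.smp command line, like the one in the ps output later in this post. A minimal sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a beam.smp command line on stdin and print the
# value that follows -sname (or -name), i.e. the Erlang node name in use.
node_name_from_cmdline() {
    tr ' ' '\n' |
        awk 'prev == "-sname" || prev == "-name" { print; exit } { prev = $0 }'
}

# Example on a machine running RabbitMQ:
#   ps -eo args | grep '[b]eam.smp' | node_name_from_cmdline
```

If the printed name does not end in the hostname that the other machines use to reach this one, join_cluster will fail the way it did above.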
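- The target node was addressed as rabbit@rabbit-node1 but reports itself as rabbit@master, so the fix was to set the node name explicitly in rabbitmq-env.conf (default path: /etc/rabbitmq/rabbitmq-env.conf).
- On every machine in the cluster, run vim /etc/rabbitmq/rabbitmq-env.conf (the file does not exist by default and has to be created by hand) and add the following: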
[root@master rabbitmq]# vim /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODENAME=rabbit@rabbit-node1
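- The part after rabbit@ is that machine's own hostname, as configured in the hosts file on every machine in the cluster (so rabbit-node2 uses rabbit@rabbit-node2, and so on), for example: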
192.168.72.127 rabbit-node1
192.168.72.128 rabbit-node2
192.168.72.129 rabbit-node3
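Before restarting anything, it can help to confirm that every machine's hosts file really contains all three names, and to generate the per-machine line for rabbitmq-env.conf rather than typing it. A sketch under those assumptions; both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; not part of RabbitMQ.

# Succeed if the hosts file ($1) has a non-comment entry for the name ($2).
hosts_has_name() {
    awk -v n="$2" '
        /^[[:space:]]*#/ { next }                          # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Print the rabbitmq-env.conf line for a hostname, in the form used in this post.
node_env_line() {
    printf 'RABBITMQ_NODENAME=rabbit@%s\n' "$1"
}

# Example:
#   for n in rabbit-node1 rabbit-node2 rabbit-node3; do
#       hosts_has_name /etc/hosts "$n" || echo "missing in /etc/hosts: $n"
#   done
#   node_env_line rabbit-node2
```

The RabbitMQ documentation writes variables in rabbitmq-env.conf without the RABBITMQ_ prefix (NODENAME=...); the prefixed form is kept here only because it is what this post used.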
Once every machine is configured, run ps -aux|grep mq to list all RabbitMQ processes, then kill them all with kill -9.
[root@slave2 home]# ps -aux|grep mq
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
rabbitmq 8106 0.0 0.2 10968 560 ? S 02:33 0:00 /usr/lib64/erlang/erts-10.2.1/bin/epmd -daemon
root 24738 0.0 0.0 108700 136 pts/1 S 05:21 0:00 /bin/sh /etc/init.d/rabbitmq-server start
root 24810 0.0 0.2 108208 472 pts/1 S 05:21 0:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/sbin/rabbitmq-server
root 24812 0.0 0.2 130732 524 pts/1 S 05:21 0:00 /sbin/runuser -s /bin/sh -- rabbitmq /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 24842 0.0 0.2 106108 484 pts/1 S 05:21 0:00 sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 25010 0.4 18.4 1813924 41988 pts/1 Sl 05:21 0:19 /usr/lib64/erlang/erts-10.2.1/bin/beam.smp -W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 128000 -K true -B i -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.7.9/ebin -noshell -noinput -s rabbit boot -sname rabbit@slave2 -boot start_sasl -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit lager_log_root "/var/log/rabbitmq" -rabbit lager_default_file "/var/log/rabbitmq/rabbit@slave2.log" -rabbit lager_upgrade_file "/var/log/rabbitmq/rabbit@slave2_upgrade.log" -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.7.9/plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@slave2-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@slave2" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672
rabbitmq 25108 0.0 0.1 4064 388 ? Ss 05:21 0:00 erl_child_setup 1024
rabbitmq 25135 0.0 0.1 10800 448 ? Ss 05:21 0:00 inet_gethost 4
rabbitmq 25136 0.0 0.3 17128 696 ? S 05:21 0:00 inet_gethost 4
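The listing above still shows -sname rabbit@slave2, which is why the old processes have to go before the new node name takes effect. As a gentler variant of the same step, one could try a clean stop first and only then look for leftovers. The sketch below filters ps output by owner instead of grepping for "mq"; the function name is hypothetical, and using ps aux without the dash also avoids the "bad syntax" warning seen above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin and print the PIDs of
# all processes owned by the given user (first column).
pids_of_user() {
    awk -v u="$1" '$1 == u { print $2 }'
}

# Example:
#   service rabbitmq-server stop        # try a clean stop first
#   ps aux | pids_of_user rabbitmq      # anything still listed is a leftover
#   # leftovers (including epmd) can then be killed as in the post
```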
Restart the RabbitMQ service and join the cluster
[root@slave1 rabbitmq]# service rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
[root@slave1 rabbitmq]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@rabbit-node2 ...
[root@slave1 rabbitmq]# rabbitmqctl join_cluster rabbit@rabbit-node1
Clustering node rabbit@rabbit-node2 with rabbit@rabbit-node1
[root@slave1 rabbitmq]# rabbitmqctl start_app
Starting node rabbit@rabbit-node2 ...
completed with 3 plugins.
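The stop_app, join_cluster, start_app sequence above has to be repeated on every node that joins, so it may be worth wrapping. A minimal sketch: the run and join_cluster function names and the DRY_RUN variable are hypothetical, while the rabbitmqctl subcommands are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical wrapper around the three rabbitmqctl steps shown above.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Join the local node to the cluster of the given target node.
# Stops at the first step that fails.
join_cluster() {
    target=$1
    run rabbitmqctl stop_app &&
        run rabbitmqctl join_cluster "$target" &&
        run rabbitmqctl start_app
}

# Example (prints the commands without executing them):
#   DRY_RUN=1 join_cluster rabbit@rabbit-node1
```

Chaining with && means start_app is not attempted if join_cluster fails, which leaves the node stopped so the error can be inspected.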
Check the cluster status
[root@slave1 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit-node2 ...
[{nodes,[{disc,['rabbit@rabbit-node1','rabbit@rabbit-node2',
'rabbit@rabbit-node3']}]},
{running_nodes,['rabbit@rabbit-node3','rabbit@rabbit-node1',
'rabbit@rabbit-node2']},
{cluster_name,<<"rabbit@slave2">>},
{partitions,[]},
{alarms,[{'rabbit@rabbit-node3',[]},
{'rabbit@rabbit-node1',[]},
{'rabbit@rabbit-node2',[]}]}]
As the output shows, rabbit-node2 and rabbit-node3 have both joined the cluster successfully. Problem solved.
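To check this from a script rather than by eye, the running_nodes entry can be pulled out of the status output. This sketch assumes the 3.7-style Erlang-term output shown above (later versions print a different format), and the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `rabbitmqctl cluster_status` output (3.7-style
# Erlang terms) on stdin and print one running node name per line.
running_nodes() {
    awk '
        /running_nodes/ { on = 1 }            # start of the running_nodes tuple
        on { print }
        on && index($0, "]}") { on = 0 }      # the tuple ends at the first "]}"
    ' | grep -o "'[^']*'" | tr -d "'"
}

# Example:
#   rabbitmqctl cluster_status | running_nodes | wc -l    # expect 3 here
```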