RabbitMQ: problems hit in production and pitfalls in cluster deployment

Copyright notice: https://blog.csdn.net/zhuangzi123456/article/details/83858000

Operating system distribution: CentOS 7

RabbitMQ version: 3.6.11

Server host plan:

10.168.17.102 mq07.mq-cluster.mall.lt.com

10.168.17.98 mq08.mq-cluster.mall.lt.com

10.168.17.64 mq09.mq-cluster.mall.lt.com

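1. On each of the three servers, edit the following file and set that server's own node name (mq07, mq08 and mq09 respectively):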

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq07-mq-cluster

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq08-mq-cluster

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq09-mq-cluster

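It is best to set NODENAME explicitly here; otherwise the node name defaults to rabbit@ followed by the machine's short hostname.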

2. Add name resolution by editing /etc/hosts:

10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster

10.168.17.98 mq08.mq-cluster.mall.lt.com mq08-mq-cluster

10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-mq-cluster

Note: the short hostname at the end of each of these /etc/hosts entries must be identical to the string after the @ in the NODENAME variable above.
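The consistency rule above can be checked mechanically. What follows is a minimal sketch, not part of the original setup: the function names nodename_host and hosts_has_name are made up for illustration, and it assumes NODENAME sits on a single unquoted line as in step 1.

```shell
# Hypothetical helpers (illustrative names, not RabbitMQ tooling).

# Print the host part of NODENAME, i.e. the text after "@".
# $1: path to a rabbitmq-env.conf style file
nodename_host() {
    sed -n 's/^NODENAME=[^@]*@//p' "$1" | head -n 1
}

# Succeed if a non-comment line of the hosts file lists the name
# as hostname or alias (any field after the IP address).
# $1: path to a hosts style file, $2: name to look for
hosts_has_name() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage on a real node (commented out so the sketch has no side effects):
# host=$(nodename_host /etc/rabbitmq/rabbitmq-env.conf)
# hosts_has_name /etc/hosts "$host" || echo "missing in /etc/hosts: $host"
```

Running this on all three servers before starting the broker would catch a mistyped alias early.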

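3. Edit /usr/lib/systemd/system/rabbitmq-server.service

Take particular care here: on CentOS 7, the service files for RabbitMQ, Elasticsearch and similar services must set the two parameters LimitNOFILE and LimitNPROC shown below. systemd does not apply /etc/security/limits.conf to the services it starts, so settings made there have no effect on them. Once a RabbitMQ disk node hits its limit, the whole cluster can go down. That is exactly the production problem we hit today: the open file descriptors were exhausted and the RabbitMQ cluster went down, and it went down again immediately after every restart, because business traffic was heavy enough to use up all 1024 descriptors as soon as RabbitMQ came back.

Explanation: after a default installation, RabbitMQ started as-is runs with a file descriptor limit of 1024 and a process limit of 1024, even if you have raised the values in /etc/security/limits.conf and in the files under /etc/security/limits.d/ to 65536. Keep this firmly in mind.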

[Unit]

Description=RabbitMQ broker

After=syslog.target network.target

[Service]

Type=notify

User=rabbitmq

Group=rabbitmq

LimitNOFILE=65536

LimitNPROC=65535

WorkingDirectory=/var/lib/rabbitmq

ExecStart=/usr/sbin/rabbitmq-server

ExecStop=/usr/sbin/rabbitmqctl stop

ExecStop=/bin/sh -c "while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done"

NotifyAccess=all

TimeoutStartSec=3600

[Install]

WantedBy=multi-user.target
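After the unit file is edited, systemd has to reload it and the service has to be restarted before the new limits apply. Below is a small sketch for confirming the limit the running process actually has. The helper name soft_nofile is made up here, and it relies on the Linux /proc/<pid>/limits layout.

```shell
# Hypothetical helper: print the soft "Max open files" limit from a
# /proc/<pid>/limits style file ($1). In that file the soft limit is
# the 4th whitespace-separated field of the "Max open files" row.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Usage on a real node (commented out; needs root and a running broker):
# systemctl daemon-reload
# systemctl restart rabbitmq-server
# pid=$(systemctl show -p MainPID rabbitmq-server | cut -d= -f2)
# soft_nofile "/proc/$pid/limits"      # should match LimitNOFILE
# rabbitmqctl status | grep -A 4 file_descriptors
```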

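4. Configuration file

The memory high watermark (vm_memory_high_watermark) defaults to 0.4; here it is raised to 0.8. The machine has 64 GB of RAM.

Create or edit the configuration file: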

/etc/rabbitmq/rabbitmq.config

[

{rabbit,

[

{vm_memory_high_watermark, 0.8}

%% {vm_memory_high_watermark, {absolute, "40G"}}

]

}

].

Note: the file must end with a period (".").
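A missing final period leaves the Erlang-term config file unparsable. The following is a rough pre-flight check using a hypothetical helper named config_ends_with_dot. It only inspects the last non-blank, non-comment line and is no substitute for a real parse.

```shell
# Hypothetical helper: succeed if the last non-blank, non-comment line
# of the file ($1) ends with "." (Erlang comments start with "%").
config_ends_with_dot() {
    last=$(grep -v '^[[:space:]]*$' "$1" \
        | grep -v '^[[:space:]]*%' \
        | sed 's/[[:space:]]*$//' \
        | tail -n 1)
    case "$last" in
        *.) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage on a real node (commented out):
# config_ends_with_dot /etc/rabbitmq/rabbitmq.config || echo "missing final '.'"
# With Erlang installed, file:consult/1 parses the file for real:
# erl -noshell -eval 'io:format("~p~n", [file:consult("/etc/rabbitmq/rabbitmq.config")]), halt().'
```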

5. Problem:

[root@mq08 ~]# journalctl -xe

Oct 19 19:48:04 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: Error: Failed to initialize erlang distribution: {{shutdown,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {failed_to_start_child,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: auth,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {"Cookie file ./.erlang.cookie must be accessible by owner only",

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{auth,init_cookie,0,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,286}]},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {auth,init,1,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,140}]},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,2,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,365}]},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,6,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,333}]},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {proc_lib,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: init_p_do_apply,3,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"proc_lib.erl"},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,247}]}]}}},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {child,undefined,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: net_sup_dynamic,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {erl_distribution,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: start_link,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [['rabbitmq-cli-27',

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: shortnames],

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: false]},

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: permanent,1000,supervisor,

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [erl_distribution]}}.

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: control process exited, code=exited status=75

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.

-- Subject: Unit rabbitmq-server.service has failed

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit rabbitmq-server.service has failed.

--

-- The result is failed.

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.

Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com polkitd[1055]: Unregistered Authentication Agent for unix-process:4237:24929114 (system bus name :1.6179, object path /org/freedesktop/PolicyKit1/AuthenticationAgen

Solution:

chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie

chmod 600 /var/lib/rabbitmq/.erlang.cookie
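The error in the log is about the cookie file being readable by someone other than its owner. Here is a minimal sketch for checking that condition before starting the broker. The helper name cookie_owner_only is hypothetical, and the check reads the mode string printed by ls.

```shell
# Hypothetical helper: succeed if the file ($1) grants no permissions
# to group or others, i.e. its mode string looks like -???------.
cookie_owner_only() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        -???------) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage on a real node (commented out):
# cookie_owner_only /var/lib/rabbitmq/.erlang.cookie || echo "cookie too open"
# ls -l /var/lib/rabbitmq/.erlang.cookie   # owner should be rabbitmq
```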

6. Create an account

rabbitmq-plugins enable rabbitmq_management

rabbitmqctl add_user limu 123456

rabbitmqctl set_user_tags limu administrator

rabbitmqctl set_permissions -p / limu ".*" ".*" ".*"

7. Problem

[root@mq07 ~]# systemctl status rabbitmq-server.service

● rabbitmq-server.service - RabbitMQ broker

Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)

Active: failed (Result: exit-code) since Fri 2018-10-19 20:02:17 CST; 9s ago

Process: 20821 ExecStop=/bin/sh -c while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done (code=exited, status=0/SUCCESS)

Process: 20481 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)

Process: 20202 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)

Main PID: 20202 (code=exited, status=1/FAILURE)

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: attempted to contact: ['rabbit@mq07-mq-cluster']

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: rabbit@mq07-mq-cluster:

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: * unable to connect to epmd (port 4369) on mq07-mq-cluster: address (cannot connect to host/port)

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: current node details:

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - node name: 'rabbitmq-cli-79@mq07'

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - home dir: .

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - cookie hash: 5lJVl9Km+lOXAsr8i4xIVA==

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.

Root cause:

This error means that the hostname mq07-mq-cluster cannot be resolved, or that it resolves to an IP address that does not belong to this machine.

Solution:

1. Scenario 1: the machine's IP is 10.168.17.102, but /etc/hosts was misconfigured as 10.168.17.10 mq07.mq-cluster.mall.lt.com mq07-mq-cluster.

Correct the IP: 10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster

2. Scenario 2: /etc/rabbitmq/rabbitmq-env.conf contains NODENAME=rabbit@mq09-mq-cluster, but /etc/hosts contains

10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-cluster

Fix: change mq09-cluster to mq09-mq-cluster in /etc/hosts.
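Both scenarios come down to the node's hostname not resolving to an address of the local machine. Below is a minimal sketch of that check, built around a hypothetical helper ip_in_list. The commented usage relies on getent and hostname -I and therefore assumes a Linux host.

```shell
# Hypothetical helper: succeed if the IP ($1) appears, as an exact
# match, in the whitespace-separated address list ($2).
ip_in_list() {
    for addr in $2; do
        [ "$addr" = "$1" ] && return 0
    done
    return 1
}

# Usage on a real node (commented out):
# resolved=$(getent hosts mq07-mq-cluster | awk '{ print $1 }')
# [ -n "$resolved" ] || echo "mq07-mq-cluster does not resolve"
# ip_in_list "$resolved" "$(hostname -I)" || echo "resolves to a non-local IP: $resolved"
```

The exact comparison matters: in scenario 1 the wrong address 10.168.17.10 is a prefix of the right one, so a substring match would miss it.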

8. Add a mirrored-queue policy

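Policies are defined per vhost, so every time a vhost is added, the command that adds the mirrored-queue policy has to be run for it again. The pattern "^" matches every queue in the vhost:

rabbitmqctl set_policy -p /admin "ha-allqueue" "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'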
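Since the policy has to be repeated for each vhost, a loop can apply it to all of them. This is a sketch under assumptions: the function name apply_ha_policy and the RABBITMQCTL override variable are made up for illustration, and the pattern "^" (match every queue) is an assumption about the intended scope.

```shell
# Command to invoke; overridable (e.g. for a dry run with "echo").
RABBITMQCTL=${RABBITMQCTL:-rabbitmqctl}

# Hypothetical helper: read vhost names from stdin, one per line, and
# set the mirrored-queue policy on each of them.
apply_ha_policy() {
    while IFS= read -r vhost; do
        [ -n "$vhost" ] || continue
        # </dev/null keeps the command from consuming the vhost list.
        "$RABBITMQCTL" set_policy -p "$vhost" ha-allqueue "^" \
            '{"ha-mode":"all","ha-sync-mode":"automatic"}' </dev/null
    done
}

# Usage on a real node (commented out):
# rabbitmqctl -q list_vhosts | apply_ha_policy
```

Setting RABBITMQCTL=echo prints the commands instead of running them, which is a cheap way to review what would be applied.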
