RabbitMQ Getting Started Record – Part 5: Node Cluster High Availability (Multi-Server)

The previous part, "RabbitMQ Getting Started Record – Part 4: Node Cluster (Single Machine, Multiple Nodes)", introduced some RabbitMQ cluster concepts and showed how to run multiple nodes on a single machine and combine them into a cluster.

 

Cluster nodes are usually not all placed on one server; in practice they are distributed across different servers, so here we will deploy the cluster nodes on multiple servers. On top of the cluster we will then add load balancing, and use code to demonstrate that after a node goes down, a client automatically reconnects to another node in the cluster.

 

This walkthrough uses load balancing, which will not be covered in detail here; it is a fundamental technique in distributed systems. The software load balancer used is HAProxy (also not covered in detail; documentation is easy to find).

 

The following walkthrough is based on the CentOS 7 operating system.

 

The first step is to build a multi-server node cluster

a. Prepare three CentOS servers

Three virtual machines are prepared here, each with RabbitMQ installed. The IP addresses are as follows.

Node                     Hostname    IP

Node 1 (primary node)    bogon       192.168.115.136

Node 2                   worker1     192.168.115.138

Node 3                   worker2     192.168.115.139

 

To let the nodes communicate with each other by hostname, add the hostname-to-IP mappings to the hosts file on each node

sudo vim /etc/hosts

and add the following entries

192.168.115.136 bogon
192.168.115.138 worker1
192.168.115.139 worker2
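
As an optional sanity check, you can verify that the names resolve from node 1 (and run the same check from the workers in the other direction):

ping -c 1 worker1

ping -c 1 worker2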

 

b. Build a cluster

The steps are similar to those in "RabbitMQ Getting Started Record – Part 4: Node Cluster (Single Machine, Multiple Nodes)", except that they are performed on different servers. The most important point is to ensure that every RabbitMQ service uses the same .erlang.cookie.

 

Sync .erlang.cookie

The file is located at /var/lib/rabbitmq/.erlang.cookie

This file acts as a piece of authentication information, something like a token: RabbitMQ nodes can only communicate with each other if they share the same token. When multiple nodes run on a single machine, they all read the same file, so no setup is needed in that case.

 

With that brief introduction to the .erlang.cookie file out of the way, we need to copy the contents of node 1's .erlang.cookie into the .erlang.cookie on node 2 and node 3.

 

Print the file content, then manually copy it to the other servers

sudo cat /var/lib/rabbitmq/.erlang.cookie
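
Alternatively, a minimal sketch of copying the cookie with scp (this assumes root SSH access to the workers, which is not part of the original setup). The cookie must stay owned by the rabbitmq user with mode 400, and RabbitMQ has to be restarted to pick up the change:

# On node 1: copy the cookie to each worker (assumes root SSH access)
sudo scp /var/lib/rabbitmq/.erlang.cookie root@worker1:/var/lib/rabbitmq/.erlang.cookie
sudo scp /var/lib/rabbitmq/.erlang.cookie root@worker2:/var/lib/rabbitmq/.erlang.cookie

# On each worker: restore ownership/permissions and restart the service
sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie
sudo systemctl restart rabbitmq-server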

 

Open firewall ports on the primary node

sudo firewall-cmd --add-port=25672/tcp --permanent

sudo firewall-cmd --add-port=5672/tcp --permanent

sudo firewall-cmd --add-port=4369/tcp --permanent
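
Note that rules added with --permanent only take effect after a firewall reload:

sudo firewall-cmd --reload

The same ports also need to be open on the worker nodes: 4369 (epmd) and 25672 (inter-node communication) for the cluster to form, and 5672 so HAProxy can reach each node's AMQP listener.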

 

Join the worker nodes to the cluster

Execute the following commands on node 2

sudo rabbitmqctl stop_app

sudo rabbitmqctl reset

sudo rabbitmqctl join_cluster rabbit@bogon

 

Start the application and view the cluster status

sudo rabbitmqctl start_app

sudo rabbitmqctl cluster_status

 

You should see output similar to the following

[{nodes,[{disc,[rabbit@bogon,rabbit@worker1]}]},
 {running_nodes,[rabbit@bogon,rabbit@worker1]},
 ...]

 

Then repeat the commands above on node 3 to finish adding all the nodes to the cluster.

 

Note: none of these commands need to specify a node name, because by default each server here runs exactly one node.
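
After node 3 has joined, running the status command on any node

sudo rabbitmqctl cluster_status

should list all three nodes, with output along the lines of

[{nodes,[{disc,[rabbit@bogon,rabbit@worker1,rabbit@worker2]}]},
 {running_nodes,[rabbit@worker2,rabbit@worker1,rabbit@bogon]},
 ...]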

 

The second step is to set up HAProxy

Next, we will configure HAProxy in front of the cluster we just built.

 

Install HAProxy

Incidentally, the HAProxy website would not open for me (it probably requires a proxy to reach), but finding an installation package is still easy.

Download and install HAProxy on the primary node

Download

http://www.rpmfind.net/linux/centos/7.4.1708/os/x86_64/Packages/haproxy-1.5.18-6.el7.x86_64.rpm

Then install it

sudo yum -y install haproxy-1.5.18-6.el7.x86_64.rpm
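
You can confirm the installation with

haproxy -v

which should print the HAProxy version (1.5.18 here).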

 

HAProxy configuration

Create a new file haproxy_rabbitmq.cfg with the following content

global
    log 127.0.0.1 local0 info
    maxconn 4096
    stats socket /tmp/haproxy.socket uid haproxy mode 770 level admin
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5s
    timeout client 120s
    timeout server 120s

listen rabbitmq_local_cluster 192.168.115.136:5670
    mode tcp
    balance roundrobin
    server rabbit_1 192.168.115.136:5672 check inter 5000 rise 2 fall 3
    server rabbit_2 192.168.115.138:5672 check inter 5000 rise 2 fall 3
    server rabbit_3 192.168.115.139:5672 check inter 5000 rise 2 fall 3

listen private_monitoring :8100
    mode http
    option httplog
    stats enable
    stats uri /stats
    stats refresh 5s

A rough description of the configuration above:

1. A proxy named rabbitmq_local_cluster is defined for the cluster, listening on port 5670.

2. Behind it are three servers, corresponding to the three RabbitMQ nodes in our cluster.

3. check inter 5000 rise 2 fall 3

   check inter 5000: check whether the backend server is available every 5000 ms (5 seconds)

   rise 2: a failed node is marked available again after 2 consecutive successful checks

   fall 3: a node is marked failed after 3 consecutive failed checks

4. listen private_monitoring :8100

   HAProxy's status information can be viewed in a browser on port 8100.

  

See the HAProxy documentation for details.
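
Before starting HAProxy, the configuration file can be validated in check mode:

sudo haproxy -c -f haproxy_rabbitmq.cfg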

 

Then open port 8100

sudo firewall-cmd --add-port=8100/tcp --permanent
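
and reload the firewall again for the rule to take effect

sudo firewall-cmd --reload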

 

Start HAProxy

sudo haproxy -f haproxy_rabbitmq.cfg
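
If you change the configuration later, HAProxy supports a soft reload; a sketch using the -sf option, which starts a new process and tells the old one to finish up:

sudo haproxy -f haproxy_rabbitmq.cfg -sf $(pidof haproxy)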

 

Open a browser and visit http://192.168.115.136:8100/stats

You should see node statistics like the following, showing that all three nodes are currently running.

[Screenshot: HAProxy stats page showing rabbit_1, rabbit_2, and rabbit_3 all up]

 

 

The third step is to use the node cluster

The first two steps built the RabbitMQ node cluster and added load balancing with HAProxy; now let's look at how to use the cluster from code.

 

Before looking at the programs, let's be clear about reconnection after node failure. The main purpose of a node cluster is high availability: when one node fails, the cluster keeps serving. When the node a client is using fails, its connection is lost; on reconnecting to the cluster, the client will be connected to a different available node. The exchanges and queues defined earlier are not guaranteed to still be usable, so after failover we should treat the connection as if it were to a brand-new node and redeclare all the exchanges, queues, and bindings involved.

 

a. Create a RabbitMQ user

Create a user for clients to connect to the RabbitMQ cluster with; the default guest user can only be used locally on the server itself.

Run the following commands on the primary node to create a user admin with password 123456, tag it as an administrator, and grant it full configure, write, and read permissions on the default vhost (this is only a demo; normally you would not grant permissions this broad)

sudo rabbitmqctl add_user admin 123456

sudo rabbitmqctl set_user_tags admin administrator

sudo rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
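
You can verify the user and its permissions with

sudo rabbitmqctl list_users

sudo rabbitmqctl list_permissions -p /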

 

b. Consumer code

The implementation is again in Python. Create cluster_node_consumer.py with the following code

import sys, json, pika, time, traceback


def msg_rcvd(channel, method, header, body):
    # Decode the JSON message, print it, and acknowledge delivery
    message = json.loads(body)
    print "Received: %(content)s/%(time)d" % message
    channel.basic_ack(delivery_tag=method.delivery_tag)


if __name__ == "__main__":
    # Broker address, port, username, and password from the command line
    AMQP_SERVER = sys.argv[1]
    AMQP_PORT = int(sys.argv[2])
    AMQP_USER = sys.argv[3]
    AMQP_PWD = sys.argv[4]

    creds_broker = pika.PlainCredentials(AMQP_USER, AMQP_PWD)
    conn_params = pika.ConnectionParameters(
        AMQP_SERVER,
        port=AMQP_PORT,
        virtual_host="/",
        credentials=creds_broker)

    while True:
        try:
            conn_broker = pika.BlockingConnection(conn_params)
            channel = conn_broker.channel()

            # After a (re)connect we may be talking to a different node,
            # so redeclare the exchange, queue, and binding every time.
            channel.exchange_declare(
                exchange="cluster_test", exchange_type="direct", auto_delete=False)
            channel.queue_declare(queue="cluster_test", auto_delete=False)
            channel.queue_bind(
                queue="cluster_test",
                exchange="cluster_test",
                routing_key="cluster_test")

            print "Ready for testing!"
            channel.basic_consume(msg_rcvd,
                                  queue="cluster_test",
                                  no_ack=False,
                                  consumer_tag="cluster_test")
            channel.start_consuming()
        except Exception:
            # The connection failed (e.g. the node went down): print the
            # error, then loop around and reconnect through HAProxy.
            traceback.print_exc()
            time.sleep(1)  # brief pause to avoid a tight retry loop
The code above reads the server address, port, username, and password from the command line, then creates the connection.

The key part is the while True loop: a try/except catches all exceptions, ensuring a new connection is created after any failure (with a short pause between attempts to avoid a tight retry loop).

After reconnecting, the queue, exchange, and binding are declared again.

 

Run the consumer program with the following command

python cluster_node_consumer.py 192.168.115.136 5670 admin 123456

192.168.115.136 is the frontend address configured in HAProxy

5670 is the listening port configured in HAProxy

admin and 123456 are the RabbitMQ username and password created on the primary node

 

Output

Ready for testing!

 

When you see this output and no other exception messages, the client has connected to the cluster.

Now we can open the HAProxy stats page at http://192.168.115.136:8100/stats to see which node the client is currently connected to.

Look at the Cur column, which shows each server's current connection count: rabbit_3 has one connection, which is the one the consumer program just created and is currently using.

 

[Screenshot: HAProxy stats page with the Cur column showing one connection on rabbit_3]

 

Now stop rabbit_3: log in to node 3 and run the following command

sudo rabbitmqctl stop_app

 

The console running the consumer program will then show output like the following

Traceback (most recent call last):
   File "cluster_node_consumer.py", line 37, in <module>
     channel.start_consuming()
   File "/home/junwen/anaconda2/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1780, in start_consuming
     self.connection.process_data_events(time_limit=None)
   File "/home/junwen/anaconda2/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 707, in process_data_events
     self._flush_output(common_terminator)
   File "/home/junwen/anaconda2/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 474, in _flush_output
     result.reason_text)
ConnectionClosed: (320, "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'")
Ready for testing!

A run of exception output appears, ending with Ready for testing!: the client connection was dropped, the exception was raised, and the client then reconnected.

 

Let's look at the current HAProxy status information again

[Screenshot: HAProxy stats page showing rabbit_3 down and one connection on rabbit_1]

rabbit_3 is now shown as unavailable, and rabbit_1 has one connection: the connection the program re-created after its original one failed.
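
To bring node 3 back after the test, run sudo rabbitmqctl start_app on node 3 again; once it passes two consecutive health checks (the rise 2 setting), HAProxy will mark rabbit_3 as available.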

 

c. Producer code

The producer side is simpler; it is no different from an ordinary client, except that it connects to the HAProxy service address.

Create cluster_node_producer.py with the following code

import sys, time, json, pika

# Broker address, port, username, and password from the command line
AMQP_SERVER = sys.argv[1]
AMQP_PORT = int(sys.argv[2])
AMQP_USER = sys.argv[3]
AMQP_PWD = sys.argv[4]

creds_broker = pika.PlainCredentials(AMQP_USER, AMQP_PWD)
conn_params = pika.ConnectionParameters(
    AMQP_SERVER, port=AMQP_PORT, virtual_host="/", credentials=creds_broker)

conn_broker = pika.BlockingConnection(conn_params)
channel = conn_broker.channel()

# Publish one hard-coded JSON test message
msg = json.dumps({"content": "Cluster Test!", "time": time.time()})
msg_props = pika.BasicProperties(content_type="application/json")

channel.basic_publish(
    body=msg,
    exchange="cluster_test",
    properties=msg_props,
    routing_key="cluster_test")

print "Sent cluster test message."

# Close the connection once the message has been sent
conn_broker.close()

The connection setup in this code is the same as in the consumer: the parameters are read from the command line and a connection is created.

Once connected, it immediately publishes a single message whose content is hard-coded.

 

Open a new terminal and run the producer

python cluster_node_producer.py 192.168.115.136 5670 admin 123456

 

In the terminal running the consumer you will see the following output

Received: Cluster Test!/1524405533

 

The producer code here really just demonstrates connecting through the HAProxy service, so the connection is closed as soon as the message has been sent.

The complete code files are at https://github.com/shenba2014/RabbitMQ/tree/master/cluster

 

OK, that concludes the cross-server node cluster walkthrough. Next, I will look into how to achieve high availability across multiple data centers.

 
