Distributed Topics - RabbitMQ 02 - High Availability for RabbitMQ Distributed Messaging

Preface

In the previous chapters, we covered "Kafka Use and Principle Analysis" and "ActiveMQ Use and Principle Analysis", and took a first look at RabbitMQ. This section continues with RabbitMQ.

The RabbitMQ material is split into two sections.

This section focuses on the following points:

  • Reliable delivery
  • Highly available architecture
  • Network partition
  • WAN synchronization solution
  • Summary of practical experience

Tip: the code demonstrated in this section has been uploaded to GitHub, and the address is given at the end of the article.

Reliable delivery

First, it must be clear that efficiency and reliability cannot both be maximized. Making sure that every stage succeeds inevitably reduces the efficiency of sending and receiving messages.

Where the business does not require strict real-time consistency, some reliability can be sacrificed in exchange for efficiency.
① The message is sent from the producer to the Exchange;

② The message is routed from the Exchange to the Queue;

③ The message is stored in the Queue;

④ The consumer subscribes to the Queue and consumes the message.

1. Make sure that the message is sent to the RabbitMQ server

Sending may fail because of network or Broker problems, and by default the producer cannot tell whether the message reached the Broker correctly.

There are two solutions: Transaction mode and Confirm mode.

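// Put the channel into transaction mode
channel.txSelect();
try {
    channel.basicPublish("", QUEUE_NAME, null, msg.getBytes());
    // Commit the transaction
    channel.txCommit();
} catch (IOException e) {
    // Roll back the transaction
    channel.txRollback();
}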

After transaction mode is enabled with channel.txSelect, messages can be published to RabbitMQ. If the transaction commits successfully, the message has definitely reached RabbitMQ. If an exception is thrown before the commit completes, because RabbitMQ crashed or for some other reason, it can be caught and the transaction rolled back with channel.txRollback. The transaction mechanism severely degrades RabbitMQ's performance and is generally not recommended.

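// Put the channel into Confirm mode
channel.confirmSelect();
channel.basicPublish("", QUEUE_NAME, null, msg.getBytes());
// Block until the broker has confirmed the published messages
if (channel.waitForConfirms()) {
    // The message was sent successfully
}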

The producer puts the channel into Confirm mode by calling channel.confirmSelect (the Confirm.Select command). Once a message has been delivered to all matching queues, RabbitMQ sends the producer an acknowledgement (Basic.Ack) that contains the message's unique ID, so the producer knows the message reached its destination.
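When confirmations are handled asynchronously rather than with waitForConfirms, the producer has to remember which messages are still unconfirmed. The following is a minimal sketch of that bookkeeping using only the JDK. ConfirmTracker and its method names are hypothetical and are not part of the RabbitMQ client; in a real producer, onPublish would typically be fed the value of channel.getNextPublishSeqNo() and onAck would be called from a confirm listener.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper: remembers published messages until the broker confirms them.
class ConfirmTracker {
    // Publish sequence number (delivery tag) -> message body still awaiting confirmation.
    private final ConcurrentNavigableMap<Long, String> outstanding = new ConcurrentSkipListMap<>();

    // Record a message just before it is published.
    void onPublish(long deliveryTag, String body) {
        outstanding.put(deliveryTag, body);
    }

    // Handle an acknowledgement. When multiple is true, every tag up to and
    // including deliveryTag is confirmed at once.
    void onAck(long deliveryTag, boolean multiple) {
        if (multiple) {
            outstanding.headMap(deliveryTag, true).clear();
        } else {
            outstanding.remove(deliveryTag);
        }
    }

    // Number of messages that have not been confirmed yet.
    int pendingCount() {
        return outstanding.size();
    }

    boolean isPending(long deliveryTag) {
        return outstanding.containsKey(deliveryTag);
    }
}
```

Anything still pending after a timeout is a candidate for retransmission, which is why duplicates have to be tolerated on the consumer side.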

Code demonstration address:
rabbitmq-demo/rabbitmq-javaapi/com.test.transaction
rabbitmq-demo/rabbitmq-javaapi/com.test.confirm

2. Make sure the message is routed to the correct queue

Stage ② can fail because the routing key is wrong, the queue does not exist, or the queue name is wrong.
With the mandatory parameter and a ReturnListener, a message that cannot be routed is returned to the producer.
Another way is to use an alternate exchange (the alternate-exchange argument): unroutable messages are sent to that exchange instead.

// Specify the alternate exchange when declaring the exchange
Map<String, Object> arguments = new HashMap<String, Object>();
arguments.put("alternate-exchange", "ALTERNATE_EXCHANGE");
channel.exchangeDeclare("TEST_EXCHANGE", "topic", false, false, false, arguments);

Code demonstration address: rabbitmq-demo/rabbitmq-javaapi/com.test.returnlistener

3. Ensure that the message is stored correctly in the queue

Messages stored in the queue may be lost when the system goes down, restarts, or shuts down, that is, stage ③ fails.

Solutions:

  1. Queue persistence
// String queue, boolean durable, boolean exclusive, boolean autoDelete, Map<String, Object> arguments
channel.queueDeclare(QUEUE_NAME, true, false, false, null);

  2. Exchange persistence
// String exchange, String type, boolean durable
// ("direct" is only an example type here; use the type your exchange needs)
channel.exchangeDeclare("MY_EXCHANGE", "direct", true);

  3. Message persistence
AMQP.BasicProperties properties = new AMQP.BasicProperties.Builder()
        // 2 means persistent; other values mean transient
        .deliveryMode(2)
        .build();
channel.basicPublish("", QUEUE_NAME, properties, msg.getBytes());

  4. Clustering and mirrored queues; see the "Highly available architecture" section below

4. Ensure that the message is correctly delivered from the queue to the consumer

If the consumer fails before it can process a received message, or an exception occurs during processing, stage ④ fails.

To make sure messages reliably reach consumers from the queue, RabbitMQ provides a message acknowledgement mechanism. A consumer can specify the autoAck parameter when subscribing to a queue. When autoAck is false, RabbitMQ waits for the consumer to explicitly acknowledge the message before removing it from the queue.

If consuming a message fails, the consumer can call Basic.Reject or Basic.Nack to reject the message instead of acknowledging it. If the requeue parameter is set to true, the message is put back in the queue so it can be delivered to the next consumer. When there is only one consumer, this can produce an infinite loop of repeated consumption; in that case the message can be delivered to a new queue instead, or the exception can simply be logged.
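The requeue loop described above can be bounded by counting attempts. The sketch below is an in-memory model only: it does not call the RabbitMQ client, BoundedRequeueSketch is a hypothetical name, and it assumes message bodies are unique so they can serve as keys. A successful handler call stands in for an ack, putting the message back stands in for a nack with requeue = true, and the returned list stands in for the "new queue" that receives messages that keep failing.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory model of ack / requeue / give-up; not RabbitMQ client code.
class BoundedRequeueSketch {
    // Drains the queue. A message whose handler returns false is requeued until it
    // has been attempted maxAttempts times, then moved to the returned list.
    static List<String> consume(Deque<String> queue, Predicate<String> handler, int maxAttempts) {
        Map<String, Integer> attempts = new HashMap<>();
        List<String> deadLetters = new ArrayList<>();
        while (!queue.isEmpty()) {
            String msg = queue.pollFirst();
            int attempt = attempts.merge(msg, 1, Integer::sum);
            if (handler.test(msg)) {
                continue;              // success: corresponds to an ack
            }
            if (attempt < maxAttempts) {
                queue.addLast(msg);    // failure: corresponds to nack with requeue = true
            } else {
                deadLetters.add(msg);  // give up: corresponds to reject with requeue = false
            }
        }
        return deadLetters;
    }
}
```

With a limit of 3, a message that always fails is handled three times and then set aside instead of looping forever.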

5. Consumer callback

After the consumer has processed a message, it can send another message back to the producer, or call the producer's API, to report that the message has been processed.

Reference: receipts for asynchronous communication in the second-generation payment system involve multiple interactions. After an order-submission app sends a screen-damage insurance message, the consumer must call back the API.

6. Compensation mechanism

For messages that have received no response within a certain period, scheduled retransmission can be set up. The number of retransmissions must be limited, for example to at most 3; otherwise messages will pile up.

Reference: when an ATM deposit is not answered, the confirmation is sent 5 times; when an ATM withdrawal is not answered, the confirmation is sent 5 times. Retransmission is driven by the status in the business table.
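A bounded retransmission loop can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: RetransmitSketch is a hypothetical name, and sendAndAwaitReply stands for whatever the application does to send a message and check whether a response arrived.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of timed retransmission with an upper bound on attempts.
class RetransmitSketch {
    // Calls sendAndAwaitReply until it returns true or maxAttempts is reached.
    // Returns the attempt number that succeeded, or -1 if no attempt was answered
    // (or the waiting thread was interrupted).
    static int sendWithRetry(BooleanSupplier sendAndAwaitReply, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendAndAwaitReply.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // wait before the next retransmission
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

A result of -1 is the point at which the message would be marked as failed in the business table for manual handling, rather than being resent indefinitely.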

7. Message idempotence

The server provides no control for this; it can only be handled on the consumer side.

How can repeated consumption of a message be avoided?

Messages can be duplicated for two reasons:

  1. A producer problem: the message is sent more than once in stage ①, for example because Confirm mode is on but the confirmation was not received.

  2. A problem in stage ④: the message is delivered again because the consumer did not send an ACK, or for other reasons.

To handle duplicates, generate a unique business ID for each message and check it against a log or a database table before processing.

Reference: the bank's duplicate-entry control step.
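The duplicate check by business ID can be sketched as below. This is a minimal in-memory version under stated assumptions: IdempotentConsumerSketch is a hypothetical name, and the Set stands in for the log or database table mentioned above, which in practice would need a unique key and would have to survive restarts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: processes each business ID at most once.
class IdempotentConsumerSketch {
    // IDs already processed; stands in for a table with a unique key on the business ID.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> businessLogic;

    IdempotentConsumerSketch(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    // Returns true if the payload was processed, false if the ID was seen before.
    boolean handle(String businessId, String payload) {
        if (!processedIds.add(businessId)) {
            return false; // duplicate delivery: skip the business logic
        }
        try {
            businessLogic.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(businessId); // allow a later redelivery to retry
            throw e;
        }
    }
}
```

Because Set.add is atomic here, two deliveries of the same ID arriving at the same time still result in a single execution of the business logic.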

8. The order of messages

The order of messages means that the order in which consumers consume messages is consistent with the order in which messages are generated by producers.

In RabbitMQ, when a queue has multiple consumers, order cannot be guaranteed because different consumers process messages at different rates.

Reference: the messages are 1. create store, 2. bind product, 3. activate store. In this case the consumption order must not be reversed.

Highly available architecture

  • High-availability architecture diagram of RabbitMQ based on HAProxy + Keepalived
  • High-availability architecture diagram of RabbitMQ based on LVS load balancing

RabbitMQ cluster

The cluster is mainly used to achieve high availability and load balancing.

RabbitMQ nodes authenticate to each other with the Erlang cookie in /var/lib/rabbitmq/.erlang.cookie, which must be identical on all nodes.

There are two types of nodes in a cluster: disk nodes and memory nodes. At least one disk node is required in the cluster to persist metadata. If the type is not specified, a node defaults to a disk node.

Cluster nodes communicate with each other through port 25672, so this port must be opened in the firewall.

Note that a RabbitMQ cluster cannot be built across a WAN unless plug-ins such as federation or shovel are used.

Cluster configuration steps:

  1. Configure hosts
    Configure the hosts file on all three machines:
    vi /etc/hosts

192.168.200.111 rabbit1 (disk node)
192.168.200.112 rabbit2 (memory node)
192.168.200.113 rabbit3 (memory node)

  2. Synchronize erlang.cookie
    Keep the .erlang.cookie of the three machines synchronized:

/var/lib/rabbitmq/.erlang.cookie

Execute on the second machine 200.112:

scp .erlang.cookie [email protected]:/var/lib/rabbitmq/
chown rabbitmq:rabbitmq .erlang.cookie

Execute on the third machine 200.113:

scp .erlang.cookie [email protected]:/var/lib/rabbitmq/
chown rabbitmq:rabbitmq .erlang.cookie

Restart service

systemctl stop rabbitmq-server.service
systemctl start rabbitmq-server.service

or:

systemctl restart rabbitmq-server.service

View service status:

systemctl status rabbitmq-server.service

If startup reports an error:
Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.

If the cause is that the service could not be stopped, kill the process that is still holding the port.

  3. Join the cluster

First open the cluster communication ports:

firewall-cmd --permanent --add-port={5672/tcp,4369/tcp,25672/tcp}
firewall-cmd --reload
setsebool -P nis_enabled 1

Execute on the second machine 200.112 and the third machine 200.113:

rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@rabbit1 --ram
rabbitmqctl start_app

Create a user (execute on all three servers):

firewall-cmd --permanent --add-port=15672/tcp
firewall-cmd --reload
rabbitmqctl add_user admin admin
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"

  4. Highly available cluster

A message sent to any node can be received through the other nodes.

RabbitMQ mirror queue

In plain cluster mode, queues and their messages are not synchronized between nodes, so RabbitMQ's mirrored queue mechanism is needed for synchronization.

Operation method and the corresponding command or steps:

  • rabbitmqctl (Windows): rabbitmqctl set_policy ha-all "^ha\." "{""ha-mode"":""all""}"
  • HTTP API: PUT /api/policies/%2f/ha-all {"pattern":"^ha\.", "definition":{"ha-mode":"all"}}
  • Web UI: navigate to Admin > Policies > Add / update a policy. Name: mirror_image. Pattern: ^ (matches all). Definition: click HA mode and enter all on the right.

(Figure: the policy settings in the Web UI.)
Reference:
Mirror queue of RabbitMQ

HAproxy load + Keepalived high availability

VIP is 192.168.200.1

1) Install Keepalived

yum -y install keepalived

2) Modify the configuration file

vim /etc/keepalived/keepalived.conf

Change the content to (physical network card and current host IP need to be modified):

global_defs {
   notification_email {
     [email protected]
     [email protected]
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   # vrrp_strict    # commented out, otherwise the VIP cannot be reached
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

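# Health-check task
vrrp_script check_haproxy {
    # HAProxy check script
    script "/etc/keepalived/script/check_haproxy.sh"
    # Run the check every two seconds
    interval 2
    # Weight
    weight 2
}

# Virtual router instance
vrrp_instance haproxy {
    state MASTER # MASTER on the primary; use BACKUP on the standby [must be changed]
    interface ens33 # physical network interface, depends on the host [must be changed]
    mcast_src_ip 192.168.200.111 # IP of the current host [must be changed]
    virtual_router_id 51 # virtual router ID, must be the same within a group
    priority 100 # the primary's priority must be higher than the standby's
    advert_int 1 # heartbeat interval, in seconds
    authentication {
        # authentication, must be the same within a group
        auth_type PASS
        auth_pass 1111
    }
    # Scripts to call
    track_script {
        check_haproxy
    }
    # Virtual IPs, one per line
    virtual_ipaddress {
        192.168.200.2
    }
}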

3) Start keepalived

keepalived -D

Network partition

Why do partitions occur? RabbitMQ is very sensitive to network latency. To preserve data consistency and performance, the cluster nodes split into partitions when a network failure occurs.


RabbitMQ Network Partitions

RabbitMQ Network Partitions processing strategy

Simulate RabbitMQ network partition

WAN synchronization solution

federation plugin

shovel plugin

Summary of practical experience

1. Configuration files and naming conventions

Keep configuration centralized in a properties file.

Names should reflect the metadata type (_VHOST, _EXCHANGE, _QUEUE).

Names should reflect the source and destination of the data (XXX_TO_XXX).
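One way to keep names consistent with these conventions is to build them in a single place. The helper below is a hypothetical sketch (MqNames, Kind, and the ORDER and PAYMENT system names are made up for illustration); it puts source and destination first and the metadata type last.

```java
import java.util.Locale;

// Hypothetical helper that builds names following the conventions above.
class MqNames {
    // Metadata types used as name suffixes.
    enum Kind { VHOST, EXCHANGE, QUEUE }

    // Builds a name of the form SOURCE_TO_DESTINATION_KIND.
    static String name(String source, String destination, Kind kind) {
        return (source + "_TO_" + destination + "_" + kind.name()).toUpperCase(Locale.ROOT);
    }
}
```

For example, MqNames.name("order", "payment", MqNames.Kind.QUEUE) yields ORDER_TO_PAYMENT_QUEUE, which can then be used as the value stored in the properties file.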

2. Encapsulating calls

The Template can be wrapped further to simplify sending messages.

3. Message storage + scheduled tasks

Store each message to be sent in the database. This makes messages traceable and supports duplicate control, and it needs a scheduled task to work alongside it.

4. Operation and maintenance monitoring

Reference:
zabbix series zabbix3.4 monitor rabbitmq

5. Plug-in

tracing
https://www.rabbitmq.com/plugins.html

6. How to reduce the number of connections

Combine several messages into one before sending. It is recommended that a single message not exceed 4 MB (4096 KB).
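Combining messages under a size limit can be sketched as follows. This is one possible approach under stated assumptions, not a RabbitMQ feature: BatchSketch is a hypothetical name, payloads are joined with a newline, and the sketch assumes payloads contain no newline themselves so the consumer can split them again.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: packs small payloads into combined messages under a size limit.
class BatchSketch {
    static final int MAX_BYTES = 4 * 1024 * 1024; // 4 MB (4096 KB), the limit suggested above

    // Returns newline-separated batches, each at most maxBytes when UTF-8 encoded.
    // A single payload larger than maxBytes is rejected rather than split.
    static List<String> combine(List<String> payloads, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        boolean hasItems = false;
        for (String payload : payloads) {
            int size = payload.getBytes(StandardCharsets.UTF_8).length;
            if (size > maxBytes) {
                throw new IllegalArgumentException("payload exceeds the size limit");
            }
            int needed = hasItems ? size + 1 : size; // +1 for the '\n' separator
            if (hasItems && currentBytes + needed > maxBytes) {
                batches.add(current.toString());     // current batch is full: start a new one
                current.setLength(0);
                currentBytes = 0;
                hasItems = false;
                needed = size;
            }
            if (hasItems) {
                current.append('\n');
            }
            current.append(payload);
            currentBytes += needed;
            hasItems = true;
        }
        if (hasItems) {
            batches.add(current.toString());
        }
        return batches;
    }
}
```

Each returned batch would be published as one message, so many small payloads cost a single publish instead of one each.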

Thinking

Will consumer clusters, or multiple instances of a microservice, receive the same message repeatedly?

Should the producer send the message first or write to the business table first? (Example: a payment error.)

Who creates the objects (exchange, queue, binding)?

What problems does repeated creation cause?

Can a persistent queue be bound to a non-persistent exchange? Yes.

How would you design an MQ service? http://www.xuxueli.com/xxl-mq/#/

postscript

For more architectural knowledge, please pay attention to this series of Java articles, address navigation : The growth path of Java architects

Origin blog.csdn.net/qq_34361283/article/details/105746846