RocketMQ series: Taking a cluster node offline (operations and maintenance)

Operations scenario

I have built a 3m-3s RocketMQ cluster (reference: 3m-3s rocketmq construction). Suppose the machine hosting one of the master nodes fails and has to be removed. How can this be done smoothly?

For example, I built a 3m-3s broker cluster with the following architecture:

 

 

What if I want to remove the broker-a node from the cluster?

First of all, there are three points to be clear:

The first point: broker-a must first be put into a state where it no longer receives new messages (that is, new messages land on broker-b and broker-c), without affecting the persistence of messages that are already being processed.

The second point: broker-a-s is the slave of broker-a. The change made for the first point must not prevent consumers from consuming, via the slave, the messages that were already persisted.

The third point: once broker-a is removed, the cluster is smaller, so check whether it can still sustain the current production and consumption TPS.

The third point can be assessed in advance. As a rule of thumb, production traffic should stay below about 2/3 of the cluster's maximum TPS. For example, if the cluster's maximum TPS is 10,000, the actual TPS in production is recommended not to exceed about 7,000; beyond that, the broker cluster needs to be expanded.
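The capacity check above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that cluster capacity scales linearly with the number of master groups, which the article does not state and which should be confirmed by load testing. Note that a strict 2/3 of 10,000 is about 6,667, which the article rounds to 7,000.

```python
def safe_tps(max_tps: float, ratio: float = 2 / 3) -> float:
    """Recommended ceiling for production TPS, given the cluster's maximum TPS."""
    return max_tps * ratio


def can_remove_one_group(current_tps: float, max_tps: float,
                         master_groups: int = 3, ratio: float = 2 / 3) -> bool:
    """Return True if the cluster should still have headroom after one group is removed."""
    if master_groups < 2:
        raise ValueError("need at least two master groups to remove one")
    # Assumption: maximum TPS shrinks in proportion to the number of master groups.
    remaining_max = max_tps * (master_groups - 1) / master_groups
    return current_tps <= safe_tps(remaining_max, ratio)
```

For a 3-master cluster with a measured maximum of 10,000 TPS, `can_remove_one_group(4000, 10000)` passes the check while `can_remove_one_group(5000, 10000)` does not, under the linear-scaling assumption.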

Procedure

First, let's look at the perm attribute of a topic, which can be viewed in the console:

perm is the read/write permission of a topic:

6: Read and write

4: Read only

2: Write only
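The three values are consistent with a bitmask in which 4 is the read bit and 2 is the write bit (6 = 4 + 2). A minimal sketch of decoding a perm value under that reading; the constant and function names below are chosen for illustration only:

```python
PERM_READ = 4   # read bit
PERM_WRITE = 2  # write bit


def describe_perm(perm: int) -> str:
    """Translate a numeric topic perm into a human-readable description."""
    readable = bool(perm & PERM_READ)
    writable = bool(perm & PERM_WRITE)
    if readable and writable:
        return "read and write"
    if readable:
        return "read only"
    if writable:
        return "write only"
    return "no access"
```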

With perm understood, we can put it to use.

Step 1

To satisfy the first point, set every topic on broker-a to read-only so that no new messages are written to broker-a. In other words, for every topic that has already been created and is associated with broker-a, set perm to 4.

There are two ways to do this:

1. In the console, open the TOPIC page, click "topic config" for each topic, and change the perm for broker-a to 4.

  

2. You can also change the perm of the corresponding topics from the command line with mqadmin updateTopicPerm.
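When many topics are involved, the command lines can be generated rather than typed by hand. The sketch below only builds the argument lists and does not execute anything. The flags used here (-n for the name server address, -b for the broker address, -t for the topic, -p for the perm) are an assumption based on the tool's usual help output, so check them against `mqadmin help updateTopicPerm` for your RocketMQ version. The addresses and topic names in the usage note are hypothetical.

```python
from typing import List


def build_update_perm_cmds(namesrv_addr: str, broker_addr: str,
                           topics: List[str], perm: int = 4) -> List[List[str]]:
    """Build one `mqadmin updateTopicPerm` argument list per topic (nothing is executed)."""
    return [
        ["mqadmin", "updateTopicPerm",
         "-n", namesrv_addr,   # name server address (assumed flag)
         "-b", broker_addr,    # address of the broker to change, here broker-a's master
         "-t", topic,          # topic name
         "-p", str(perm)]      # new perm: 4 = read only
        for topic in topics
    ]
```

For example, `build_update_perm_cmds("127.0.0.1:9876", "10.0.0.1:10911", ["TopicA", "TopicB"])` yields two argument lists that can be reviewed and then passed to `subprocess.run` one at a time.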

Step 2

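1. Check the write traffic of broker-a

Watch the InTPS of broker-a's master node until it drops to 0, which indicates that its topics are no longer receiving new messages.

2. Check the read traffic of broker-a-s

Watch the OutTPS (consumption) on broker-a-s. Once OutTPS is also 0, all messages have, in theory, been consumed.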
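A single zero reading can be a momentary lull, so it is safer to require several consecutive zero samples before treating the broker as idle. The helper below is a sketch of that idea. How the TPS value is obtained (console, monitoring system, parsed CLI output) is left to a caller-supplied function, which is a hypothetical hook and not part of RocketMQ.

```python
import time
from typing import Callable


def wait_until_idle(read_tps: Callable[[], float],
                    required_zero_samples: int = 3,
                    interval_s: float = 10.0,
                    max_samples: int = 360,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_tps() until it returns 0 for several consecutive samples.

    Returns True once the streak is reached, False if max_samples is exhausted.
    """
    zeros = 0
    for _ in range(max_samples):
        if read_tps() == 0:
            zeros += 1
            if zeros >= required_zero_samples:
                return True
        else:
            zeros = 0  # traffic came back, restart the streak
        sleep(interval_s)
    return False
```

The same helper can be used first with a reader for broker-a's InTPS and then with a reader for broker-a-s's OutTPS.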

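Step 3

Check that the consumers on broker-a have consumed everything and show no diff. A diff means some persisted messages have not been consumed, which is usually caused by a bug in the client consumer.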
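The diff of a queue is the broker's offset minus the consumer's offset, so the check amounts to confirming that the diffs for broker-a sum to zero. A minimal sketch, assuming the per-queue offsets have already been collected into tuples (for example, transcribed from the console's consumer page or from `mqadmin consumerProgress` output); the row layout is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def diff_by_broker(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Sum (broker_offset - consumer_offset) per broker name.

    Each row is (broker_name, broker_offset, consumer_offset) for one queue.
    """
    totals: Dict[str, int] = defaultdict(int)
    for broker_name, broker_offset, consumer_offset in rows:
        totals[broker_name] += broker_offset - consumer_offset
    return dict(totals)
```

broker-a is ready for the next step only when its entry in the result is 0 for every consumer group that reads from it.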

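Step 4

After confirming the above, to be on the safe side, it is recommended to keep broker-a (both the master and the slave machine) running for another 3 days, and then stop broker-a and broker-a-s with the mqshutdown command.

At this point, the faulty machine has been removed.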

Author: 测试生财

csdn:https://blog.csdn.net/ccgshigao

cnblogs:https://www.cnblogs.com/qa-freeroad/

51cto:https://blog.51cto.com/14900374


Origin blog.51cto.com/14900374/2540010