RocketMQ (6) Best Practice

Best Practices


1 Producer

1.1 Notes for sending messages

1 Use of Tags

An application should use a single topic wherever possible and identify message subtypes with tags. The application can set tags freely. Only the producer sets tags, when it sends a message. When consumers subscribe, they can use tags so that the broker filters messages for them: message.setTags("TagA").
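The sketch below shows how a subscription expression such as "TagA || TagB" can be matched against a message's tag. It assumes the `||`-separated syntax and the `*` wildcard accepted by the consumer's `subscribe(topic, subExpression)`. `TagFilterSketch` is a hypothetical helper, not the client's API. In RocketMQ the filtering itself happens on the broker.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper illustrating tag-based subscription matching.
final class TagFilterSketch {
    private final boolean matchAll;
    private final Set<String> tags;

    // Parses an expression such as "TagA || TagB"; "*" or an empty expression matches every tag.
    TagFilterSketch(String subExpression) {
        String expr = subExpression == null ? "" : subExpression.trim();
        this.matchAll = expr.isEmpty() || expr.equals("*");
        this.tags = matchAll
                ? Collections.<String>emptySet()
                : Arrays.stream(expr.split("\\|\\|"))
                        .map(String::trim)
                        .filter(s -> !s.isEmpty())
                        .collect(Collectors.toSet());
    }

    // True if a message carrying messageTag should be delivered to this subscriber.
    boolean accepts(String messageTag) {
        return matchAll || (messageTag != null && tags.contains(messageTag));
    }
}
```

With this approach, one topic can serve several message subtypes, and each consumer receives only the tags it lists.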

2 Use of Keys

Set the keys field of each message to the message's unique business-level identifier, so that lost messages can be located later. The server creates a hash index for each message. The application can therefore look up a message's content, and see who consumed it, by topic and key. Because this is a hash index, keep keys as unique as possible to avoid hash collisions.

    // Order ID
    String orderId = "20034568923546";
    message.setKeys(orderId);
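The toy index below models lookup by topic and key. It shows why keys should be unique: a reused key returns several messages. The combined index key format `topic#key` is an assumption made for illustration, and `KeyIndexSketch` is hypothetical. It is not the broker's index file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the broker's key index.
final class KeyIndexSketch {
    private final Map<String, List<String>> index = new HashMap<>();

    // Assumption: the index key combines topic and business key, e.g. "OrderTopic#20034568923546".
    private static String indexKey(String topic, String key) {
        return topic + "#" + key;
    }

    void put(String topic, String key, String msgId) {
        index.computeIfAbsent(indexKey(topic, key), k -> new ArrayList<>()).add(msgId);
    }

    // Returns every message ID recorded under topic + key; a reused key yields several hits.
    List<String> query(String topic, String key) {
        return index.getOrDefault(indexKey(topic, key), Collections.emptyList());
    }
}
```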

3 Log printing

Log the result of every send, whether it succeeds or fails, and include both the SendResult and the key. If the send method does not throw an exception, the message was sent successfully. A successful send can still end in one of several statuses, which are defined in SendResult:

  • SEND_OK

The message was sent successfully. Note that a successful send does not by itself mean the message is stored reliably. To make sure no messages are lost, also enable a synchronous master (SYNC_MASTER) or synchronous flushing (SYNC_FLUSH).

  • FLUSH_DISK_TIMEOUT

The message was sent successfully, but the server timed out while flushing it to disk. The message is already in the server's queue (in memory) and will be lost only if the server goes down. The message-store configuration lets you set the flush mode and the synchronous-flush timeout. This status (flush timeout) is returned when the broker uses synchronous flushing, that is, FlushDiskType=SYNC_FLUSH (the default is asynchronous flushing), and does not finish flushing within the synchronous flush timeout (5 s by default).

  • FLUSH_SLAVE_TIMEOUT

The message was sent successfully, but the server timed out while synchronizing it to the slave. The message is already in the server's queue and will be lost only if the server goes down. This status (slave synchronization timeout) is returned when the broker's role is synchronous master, that is, SYNC_MASTER (the default is asynchronous master, ASYNC_MASTER), and the slave does not finish synchronizing with the master within the synchronous flush timeout (5 seconds by default).

  • SLAVE_NOT_AVAILABLE

The message was sent successfully, but no slave is currently available. This status (no slave available) is returned when the broker's role is synchronous master, that is, SYNC_MASTER (the default is ASYNC_MASTER), but no slave broker is configured.
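The sketch below shows one way to follow the logging advice above. The enum is a local mirror of the four statuses, since the real enum ships with the RocketMQ client. The helper names are hypothetical. It logs the key, msgId and status, and flags any non-SEND_OK status, because only SEND_OK means that no flush or slave-sync problem was reported.

```java
// Local mirror of the four send statuses described above (hypothetical; not the client's enum).
enum SketchSendStatus { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

final class SendResultLogger {
    // Builds the recommended log line: always include the business key and the status.
    static String logLine(String key, String msgId, SketchSendStatus status) {
        return String.format("SEND_RESULT key=%s msgId=%s status=%s", key, msgId, status);
    }

    // Every status means the broker accepted the message, but any status other than SEND_OK
    // reports a flush or slave-sync problem that is worth alerting on.
    static boolean needsAttention(SketchSendStatus status) {
        return status != SketchSendStatus.SEND_OK;
    }
}
```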

1.2 Handling send failures

The Producer's send method retries internally, using the following logic:

  • At most 2 retries (2 for synchronous sends, 0 for asynchronous sends).
  • If a send fails, the next attempt goes to another Broker. The total time spent in the method does not exceed sendMsgTimeout, which is 10s by default.
  • If sending to a broker itself raises a timeout exception, no further attempt is made.
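The sketch below shows how the three built-in retry rules above interact. It assumes a hypothetical `SketchBroker` interface in place of real broker connections and a hypothetical unchecked `SketchTimeoutException`. It is an illustration, not the client's actual implementation.

```java
import java.util.List;

// Hypothetical unchecked exception standing in for a send timeout.
class SketchTimeoutException extends RuntimeException {
    SketchTimeoutException(String message) { super(message); }
}

// Hypothetical broker abstraction used only for this sketch.
interface SketchBroker {
    String send(String body); // returns a msgId, or throws on failure
}

final class RetrySendSketch {
    // Makes up to 1 + retryTimes attempts, moving to the next broker after each failure,
    // and stops once the overall time budget (sendMsgTimeout) is used up.
    static String send(List<SketchBroker> brokers, String body, int retryTimes, long timeoutMillis) {
        long start = System.currentTimeMillis();
        RuntimeException last = null;
        for (int attempt = 0; attempt <= retryTimes; attempt++) {
            if (System.currentTimeMillis() - start > timeoutMillis) {
                break; // time budget exhausted
            }
            SketchBroker broker = brokers.get(attempt % brokers.size());
            try {
                return broker.send(body);
            } catch (SketchTimeoutException e) {
                throw e; // a timeout is not retried
            } catch (RuntimeException e) {
                last = e; // other failures: try the next broker
            }
        }
        throw last != null ? last : new SketchTimeoutException("sendMsgTimeout exceeded");
    }
}
```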

This strategy goes some way toward ensuring that messages are sent successfully. If the business needs high message reliability, the application should add its own retry logic. For example, when a synchronous send fails, store the message in a database and have a background thread retry periodically, so that the message is guaranteed to reach the Broker eventually.

Why is this database retry not built into the MQ client rather than left to the application? There are three main reasons.

First, the MQ client is designed to be stateless. It can therefore be scaled horizontally at will and consumes only CPU, memory and network resources.

Second, if the MQ client embedded a KV storage module, the data would be reliable only if written to disk synchronously. Synchronous writes are expensive, so asynchronous writes would normally be used instead. The application's shutdown is controlled by its operators rather than by MQ, so a forced shutdown such as kill -9 can happen at any time and lose data that has not yet reached the disk.

Third, the machine running the Producer is usually less reliable, often a virtual machine, and is not suited to storing important data.

In summary, the retry process should be controlled by the application.
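The sketch below shows the application-level "store on failure, retry later" pattern. The in-memory `Deque` stands in for the database table and is an assumption for illustration; in production it must be durable storage. `sender` is a hypothetical function that returns true when the broker accepted the message. The real background thread would call `retryPending()` on a schedule, for example from a `ScheduledExecutorService`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Sketch of application-level retry; the Deque stands in for a DB table (must be durable in practice).
final class StoreAndRetrySketch {
    private final Predicate<String> sender; // hypothetical: true when the broker accepted the message
    private final Deque<String> pending = new ArrayDeque<>();

    StoreAndRetrySketch(Predicate<String> sender) { this.sender = sender; }

    // Synchronous send; on failure, persist the message instead of dropping it.
    synchronized void send(String message) {
        if (!sender.test(message)) {
            pending.addLast(message);
        }
    }

    // Called periodically by a background thread; returns how many stored messages got through.
    synchronized int retryPending() {
        int delivered = 0;
        int size = pending.size();
        for (int i = 0; i < size; i++) {
            String msg = pending.pollFirst();
            if (sender.test(msg)) {
                delivered++;
            } else {
                pending.addLast(msg); // keep it for the next round
            }
        }
        return delivered;
    }

    synchronized int pendingCount() { return pending.size(); }
}
```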

1.3 Sending in oneway mode

Sending a message normally involves these steps:

  • The client sends a request to the server
  • The server processes the request
  • The server returns a response to the client

The time to send a message is therefore the sum of these three steps. Some scenarios, such as log collection, need very low latency but have modest reliability requirements. They can use oneway sending, which sends the request without waiting for a response. On the client side, sending the request costs a single operating-system call that writes the data into the client's socket buffer, which usually takes microseconds.

2 Consumer

2.1 Make consumption idempotent

RocketMQ cannot guarantee that messages are never duplicated (exactly-once delivery). If the business is sensitive to duplicate consumption, it must deduplicate at the business level, and a relational database works well for this.

First, determine the message's unique key. This can be the msgId or a unique field in the message content, such as the order ID. Before consuming, check whether the key already exists in the database. If it does not, insert it and consume the message; otherwise skip the message. (In practice the check must be atomic: attempt the insert directly, and if it fails with a primary-key conflict, skip the message.)

msgId is intended to be globally unique, but in practice the same message can appear under two different msgIds. This can happen, for example, when the consumer actively resends it, or through duplicates caused by the client's retransmission mechanism. In that case, deduplicate on a business field.
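The sketch below shows the "insert first, skip on conflict" approach. A concurrent set stands in for the relational table with a unique constraint on the business key; this is an assumption for illustration. `add()` returning false plays the role of the primary-key conflict, so checking and inserting happen in one atomic step. The class name is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of business-level deduplication; the set stands in for a table with a unique key.
final class DedupConsumerSketch {
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

    // Returns true if the business logic ran, false if the message was a duplicate and was skipped.
    boolean consume(String businessKey, Runnable businessLogic) {
        if (!processedKeys.add(businessKey)) {
            return false; // key already recorded: duplicate delivery, skip it
        }
        try {
            businessLogic.run();
            return true;
        } catch (RuntimeException e) {
            // Undo the "insert" so a redelivery can retry; with a real DB, do both in one transaction.
            processedKeys.remove(businessKey);
            throw e;
        }
    }
}
```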

2.2 Ways to deal with slow consumption

1 Improve consumption parallelism

Most message consumption is I/O-bound: it typically accesses a database or makes RPC calls. Its speed is therefore limited by the throughput of the back-end database or external system. Increasing consumption parallelism raises total throughput, but only up to a point; beyond that point, throughput drops. The application should therefore choose a reasonable degree of parallelism. You can change consumption parallelism in the following ways:

  • Within the same ConsumerGroup, add Consumer instances to increase parallelism. Instances beyond the number of subscribed queues receive no messages. You can add machines or start multiple processes on an existing machine.
  • Increase the number of consumption threads in a single Consumer by changing the consumeThreadMin and consumeThreadMax parameters.
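The sketch below shows how the two knobs above combine. It assumes queues are spread evenly across instances and counts each active instance's thread pool as an upper bound on concurrency. The helper is hypothetical.

```java
// Hypothetical helper combining instance count, queue count and per-instance threads.
final class ParallelismSketch {
    // Instances beyond the number of subscribed queues receive no queue and sit idle.
    static int activeInstances(int instances, int queues) {
        return Math.min(instances, queues);
    }

    // Upper bound on messages processed at the same time across the consumer group.
    static int maxConcurrentMessages(int instances, int queues, int threadsPerInstance) {
        return activeInstances(instances, queues) * threadsPerInstance;
    }
}
```

For example, with 4 queues, going from 4 to 8 instances adds no parallelism. Raising the thread count per instance still does, until the back-end system becomes the bottleneck.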

2 Batch consumption

If the business logic supports batch processing, consumption throughput can improve greatly. In an order-deduction application, for example, processing one order at a time takes 1 s, while processing 10 orders at once may take only 2 s. Batch size is controlled by the consumeMessageBatchMaxSize parameter. It defaults to 1, meaning one message per call; if it is set to N, each call receives at most N messages.
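The helper below shows what consumeMessageBatchMaxSize = N means for the listener: pulled messages are handed over in chunks of at most N. It is a hypothetical helper, not the client's internal code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split pulled messages into listener batches of at most maxBatchSize.
final class BatchSketch {
    static <T> List<List<T>> batches(List<T> pulled, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < pulled.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, pulled.size());
            out.add(new ArrayList<>(pulled.subList(from, to)));
        }
        return out;
    }
}
```

With the figures from the text, throughput rises from 1 order per second when orders are processed one by one to about 5 orders per second (10 orders in 2 s) when they are processed in batches of 10.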

3 Skip unimportant messages

Sometimes messages pile up because consumption cannot keep up with production. If the business does not need every message, you can choose to discard unimportant ones. For example, when the backlog in a queue exceeds 100,000 messages, discard some or all of them so that consumption quickly catches up with production. Sample code:

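    // Inside a MessageListenerConcurrently implementation.
    // MessageConst is org.apache.rocketmq.common.message.MessageConst.
    public ConsumeConcurrentlyStatus consumeMessage(
            List<MessageExt> msgs,
            ConsumeConcurrentlyContext context) {
        long offset = msgs.get(0).getQueueOffset();
        // MAX_OFFSET: the queue's max offset reported with the pull result
        String maxOffset =
                msgs.get(0).getProperty(MessageConst.PROPERTY_MAX_OFFSET);
        long diff = Long.parseLong(maxOffset) - offset;
        if (diff > 100000) {
            // TODO: special handling for a message backlog (e.g. skip these messages)
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        }
        // TODO: normal consumption
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }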

4 Optimize the consumption process of each message

For example, suppose a message is consumed as follows:

  • Query [Data 1] from the DB based on the message
  • Query [Data 2] from the DB based on the message
  • Perform complex business calculations
  • Insert [Data 3] into the DB
  • Insert [Data 4] into the DB

Consuming this message involves 4 interactions with the DB. At 5 ms each, that is 20 ms; adding 5 ms of business calculation gives a total of 25 ms. If the 4 DB interactions can be cut to 2, the total drops to 15 ms, a 40% reduction. If the application is latency-sensitive, you can also deploy the DB on SSDs, whose response time (RT) is much lower than that of SCSI disks.
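The estimate above is written out below as code. The 5 ms per DB call and 5 ms of computation are the example's assumed figures, not measurements. The class name is hypothetical.

```java
// Worked version of the latency estimate; inputs are the example's assumed figures.
final class LatencyBudgetSketch {
    static int totalMillis(int dbCalls, int millisPerDbCall, int computeMillis) {
        return dbCalls * millisPerDbCall + computeMillis;
    }

    // Percentage reduction in per-message time (integer division, rounds down).
    static int reductionPercent(int beforeMillis, int afterMillis) {
        return (beforeMillis - afterMillis) * 100 / beforeMillis;
    }
}
```

totalMillis(4, 5, 5) is 25 and totalMillis(2, 5, 5) is 15, and reductionPercent(25, 15) is 40.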

2.3 Logging during consumption

If message volume is low, log each message and its consumption time in the consumption entry method, to make later troubleshooting easier.

    public ConsumeConcurrentlyStatus consumeMessage(
            List<MessageExt> msgs,
            ConsumeConcurrentlyContext context) {
        // log is the application's logger
        log.info("RECEIVE_MSG_BEGIN: " + msgs.toString());
        // TODO: normal consumption
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }

Logging each message together with its consumption time makes it much easier to troubleshoot production problems such as slow consumption.
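The wrapper below times the consumption body and logs the messages and the elapsed time, using `java.util.logging`. The message and status types are left generic. The class name and log format are hypothetical, and any logger the application already uses would work the same way.

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical timing wrapper around a consumption body.
final class TimedConsumeSketch {
    private static final Logger LOG = Logger.getLogger(TimedConsumeSketch.class.getName());

    static <M, S> S consumeTimed(List<M> msgs, Function<List<M>, S> body) {
        long start = System.nanoTime();
        LOG.info("RECEIVE_MSG_BEGIN: " + msgs);
        try {
            return body.apply(msgs);
        } finally {
            // Logged even if the body throws, so slow or failing batches are both visible.
            long costMillis = (System.nanoTime() - start) / 1_000_000;
            LOG.info("CONSUME_MSG_END: count=" + msgs.size() + " costMs=" + costMillis);
        }
    }
}
```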

2.4 Other consumer suggestions

1 About consumers and subscriptions

Note first that different consumer groups can consume the same topic independently, and each group keeps its own consumption offset. Make sure that every consumer in the same group has identical subscription information.

2 About ordered messages

The consumer locks each message queue so that its messages are consumed one by one. This degrades performance, but it is useful when message order matters. Throwing exceptions is not recommended; return ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT instead.

3 About concurrent consumption

As the name implies, the consumer processes messages concurrently. Concurrent consumption is recommended for good performance. Throwing exceptions is not recommended; return ConsumeConcurrentlyStatus.RECONSUME_LATER instead.

4 Consume status

With a concurrent listener, you can return RECONSUME_LATER to tell the consumer that the message cannot be consumed now and should be redelivered later. The consumer then moves on to other messages. With an ordered listener, you cannot skip a message because order matters, but you can return SUSPEND_CURRENT_QUEUE_A_MOMENT to tell the consumer to wait a while.
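The sketch below shows the "return a status instead of throwing" advice for both listener styles. The two enums are local mirrors of the status values named above; the real ones are ConsumeConcurrentlyStatus and ConsumeOrderlyStatus in the RocketMQ client. The helper class is hypothetical.

```java
import java.util.function.Consumer;

// Local mirrors of the listener return values (hypothetical; the real enums ship with the client).
enum SketchConcurrentlyStatus { CONSUME_SUCCESS, RECONSUME_LATER }
enum SketchOrderlyStatus { SUCCESS, SUSPEND_CURRENT_QUEUE_A_MOMENT }

final class ListenerStatusSketch {
    // Concurrent listener body: never let an exception escape; ask for redelivery later instead.
    static <M> SketchConcurrentlyStatus consumeConcurrently(M msg, Consumer<M> handler) {
        try {
            handler.accept(msg);
            return SketchConcurrentlyStatus.CONSUME_SUCCESS;
        } catch (RuntimeException e) {
            return SketchConcurrentlyStatus.RECONSUME_LATER;
        }
    }

    // Ordered listener body: the message cannot be skipped, so pause the queue instead of throwing.
    static <M> SketchOrderlyStatus consumeOrderly(M msg, Consumer<M> handler) {
        try {
            handler.accept(msg);
            return SketchOrderlyStatus.SUCCESS;
        } catch (RuntimeException e) {
            return SketchOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT;
        }
    }
}
```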

5 About blocking

Blocking the listener is not recommended, because it blocks the thread pool and may eventually terminate the consumer process.

6 About thread count

The consumer uses a ThreadPoolExecutor internally to consume messages, so you can tune it with setConsumeThreadMin or setConsumeThreadMax.
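The sketch below builds a pool from the two settings using the standard `ThreadPoolExecutor` constructor. Feeding it from an unbounded `LinkedBlockingQueue` is an assumption made here for illustration. The helper name is hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a consumption pool sized by consumeThreadMin / consumeThreadMax.
final class ConsumePoolSketch {
    static ThreadPoolExecutor newConsumePool(int consumeThreadMin, int consumeThreadMax) {
        return new ThreadPoolExecutor(
                consumeThreadMin,        // core threads
                consumeThreadMax,        // maximum threads
                60, TimeUnit.SECONDS,    // idle time before threads above the core size exit
                new LinkedBlockingQueue<>()); // unbounded work queue (assumption)
    }
}
```

This is standard `ThreadPoolExecutor` behavior: with an unbounded queue, the pool creates extra threads beyond the core size only when the queue rejects a task, and an unbounded queue never does. In a pool built this way, the minimum value is the one that actually controls parallelism.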

7 About consumption offsets

When you create a new consumer group, decide whether it should consume the historical messages already stored in the Broker:

  • CONSUME_FROM_LAST_OFFSET ignores historical messages and consumes only messages produced afterward.
  • CONSUME_FROM_FIRST_OFFSET consumes every message that exists in the Broker.
  • CONSUME_FROM_TIMESTAMP consumes messages produced after a specified timestamp.
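The sketch below shows the three starting positions on a single queue for a brand-new group. The queue model, a list of store timestamps starting at `minOffset`, and all names in it are hypothetical. The enum only mirrors the three options above.

```java
import java.util.List;

// Local mirror of the three starting-position options (hypothetical).
enum SketchConsumeFrom { CONSUME_FROM_LAST_OFFSET, CONSUME_FROM_FIRST_OFFSET, CONSUME_FROM_TIMESTAMP }

final class StartOffsetSketch {
    // storeTimes.get(i) is the store timestamp of the message at offset minOffset + i (toy model).
    static long startOffset(SketchConsumeFrom from, long minOffset, List<Long> storeTimes, long timestamp) {
        long maxOffset = minOffset + storeTimes.size(); // next offset to be written
        switch (from) {
            case CONSUME_FROM_FIRST_OFFSET:
                return minOffset; // replay everything still stored on the broker
            case CONSUME_FROM_TIMESTAMP:
                for (int i = 0; i < storeTimes.size(); i++) {
                    if (storeTimes.get(i) >= timestamp) {
                        return minOffset + i; // first message at or after the timestamp
                    }
                }
                return maxOffset; // nothing that recent yet
            case CONSUME_FROM_LAST_OFFSET:
            default:
                return maxOffset; // ignore history, read only new messages
        }
    }
}
```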

3 Broker

3.1 Broker role

The Broker roles are ASYNC_MASTER (asynchronous master), SYNC_MASTER (synchronous master) and SLAVE:

  • If message reliability requirements are strict, deploy a SYNC_MASTER with a SLAVE.
  • If they are less strict, deploy an ASYNC_MASTER with a SLAVE.
  • If the deployment is only for testing convenience, a single ASYNC_MASTER or a single SYNC_MASTER is enough.

3.2 FlushDiskType

SYNC_FLUSH (synchronous flush) costs considerably more performance than ASYNC_FLUSH (asynchronous flush) but is more reliable, so weigh the trade-off against your actual business scenario.
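The sketch below renders the role and flush choices as broker.conf-style lines. brokerName, brokerId, brokerRole and flushDiskType are the broker settings listed in the configuration table in 3.3. The helper itself and its consistency check are hypothetical.

```java
// Hypothetical helper rendering one broker's role/flush settings as "key=value" lines.
final class BrokerConfSketch {
    static String render(String brokerName, int brokerId, String brokerRole, String flushDiskType) {
        boolean master = brokerId == 0; // 0 means master, positive integers mean slave
        if (master == "SLAVE".equals(brokerRole)) {
            throw new IllegalArgumentException("brokerId 0 must be a master; a SLAVE needs a positive brokerId");
        }
        return "brokerName=" + brokerName + "\n"
                + "brokerId=" + brokerId + "\n"
                + "brokerRole=" + brokerRole + "\n"
                + "flushDiskType=" + flushDiskType + "\n";
    }
}
```

For the strict-reliability deployment, render the master with SYNC_MASTER and its slave with SLAVE under the same brokerName. Whether to also use SYNC_FLUSH is the separate trade-off described above.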

3.3 Broker configuration

| Parameter | Default | Description |
|---|---|---|
| listenPort | 10911 | Port that accepts client connections |
| namesrvAddr | null | Name Server address |
| brokerIP1 | InetAddress of the network interface | IP address the broker listens on |
| brokerIP2 | Same as brokerIP1 | In a master/slave setup, if brokerIP2 is configured on the master, the slave connects to the master's brokerIP2 for synchronization |
| brokerName | null | Name of the broker |
| brokerClusterName | DefaultCluster | Name of the cluster this broker belongs to |
| brokerId | 0 | Broker ID; 0 means master, other positive integers mean slave |
| storePathCommitLog | $HOME/store/commitlog/ | Path where the commit log is stored |
| storePathConsumerQueue | $HOME/store/consumequeue/ | Path where the consume queue is stored |
| mappedFileSizeCommitLog | 1024 * 1024 * 1024 (1G) | Size of each mapped commit log file |
| deleteWhen | 04 | Hour of the day at which commit log files past their retention time are deleted |
| fileReservedTime | 72 | File retention time, in hours |
| brokerRole | ASYNC_MASTER | SYNC_MASTER / ASYNC_MASTER / SLAVE |
| flushDiskType | ASYNC_FLUSH | SYNC_FLUSH / ASYNC_FLUSH. In SYNC_FLUSH mode the broker flushes each message to disk before acknowledging the producer. In ASYNC_FLUSH mode it flushes messages in groups for better performance |

4 NameServer

In RocketMQ, Name Servers are designed for simple routing management. Their responsibilities include:

  • Brokers periodically register their routing data with every Name Server.
  • Name Servers provide clients, including producers, consumers and command-line clients, with the latest routing information.
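The toy model below covers the two duties listed above. Brokers re-register periodically, and clients ask which brokers currently serve a topic. The expiry rule, which drops a broker whose last registration is older than `expireMillis`, is an assumption for illustration. All names are hypothetical, and this is not the real Name Server implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy routing registry (hypothetical): periodic broker registration plus route lookup.
final class NameServerSketch {
    private final long expireMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();          // broker -> last registration
    private final Map<String, Set<String>> topicRoutes = new HashMap<>(); // topic -> brokers

    NameServerSketch(long expireMillis) { this.expireMillis = expireMillis; }

    // Called by each broker on every registration round.
    synchronized void register(String broker, Set<String> topics, long nowMillis) {
        lastSeen.put(broker, nowMillis);
        for (String topic : topics) {
            topicRoutes.computeIfAbsent(topic, t -> new TreeSet<>()).add(broker);
        }
    }

    // Called by producers, consumers and admin tools; returns only brokers that registered recently.
    synchronized Set<String> route(String topic, long nowMillis) {
        Set<String> live = new TreeSet<>();
        for (String broker : topicRoutes.getOrDefault(topic, new TreeSet<>())) {
            Long seen = lastSeen.get(broker);
            if (seen != null && nowMillis - seen <= expireMillis) {
                live.add(broker);
            }
        }
        return live;
    }
}
```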

5 Client configuration

Relative to RocketMQ's Broker cluster, producers and consumers are both clients. This section describes configuration shared by producers and consumers.

5.1 Client addressing mode

A RocketMQ client first locates a Name Server and then finds Brokers through it. The address can be configured in several ways, listed below from highest to lowest priority; a higher-priority setting overrides a lower one.

  • Specify the Name Server address in code, separating multiple addresses with semicolons:

    producer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");
    consumer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");

  • Specify the Name Server address as a Java startup parameter (quoted, because the shell treats ";" as a command separator):

    -Drocketmq.namesrv.addr="192.168.0.1:9876;192.168.0.2:9876"

  • Specify the Name Server address with an environment variable:

    export NAMESRV_ADDR="192.168.0.1:9876;192.168.0.2:9876"

  • HTTP static server addressing (default)

After starting, the client periodically visits a static HTTP server at http://jmenv.tbsite.net:8080/rocketmq/nsaddr. This URL returns content like:

    192.168.0.1:9876;192.168.0.2:9876

By default the client queries this HTTP server every 2 minutes and updates its local Name Server address list. The URL is hard-coded, but you can point it at a different server by editing /etc/hosts, for example by adding:

    10.232.22.67    jmenv.tbsite.net

The HTTP static server method is recommended: client deployment stays simple, and the Name Server cluster can be hot-upgraded.
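The sketch below shows the lookup order above: code setting, then the `rocketmq.namesrv.addr` system property, then the `NAMESRV_ADDR` environment variable, then the HTTP static server. The HTTP fetch is passed in as a `Supplier` because the real client performs it internally. `NamesrvAddrSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical resolver illustrating the Name Server address priority order.
final class NamesrvAddrSketch {
    static List<String> resolve(String fromCode, Supplier<String> httpLookup) {
        return resolve(fromCode, System.getProperty("rocketmq.namesrv.addr"),
                System.getenv("NAMESRV_ADDR"), httpLookup);
    }

    // Testable core: the first non-blank source wins; the HTTP lookup is the last resort.
    static List<String> resolve(String fromCode, String sysProp, String env, Supplier<String> httpLookup) {
        String raw = firstNonBlank(fromCode, sysProp, env);
        if (raw == null) {
            raw = httpLookup.get();
        }
        List<String> addrs = new ArrayList<>();
        if (raw != null) {
            for (String a : raw.split(";")) {
                if (!a.trim().isEmpty()) {
                    addrs.add(a.trim());
                }
            }
        }
        return addrs;
    }

    private static String firstNonBlank(String... values) {
        for (String v : values) {
            if (v != null && !v.trim().isEmpty()) {
                return v;
            }
        }
        return null;
    }
}
```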

5.2 Client configuration

DefaultMQProducer, TransactionMQProducer, DefaultMQPushConsumer and DefaultMQPullConsumer all extend the ClientConfig class, which holds the client's common configuration. Every parameter has a getter and a setter and can be configured through Spring or in code. For example, namesrvAddr can be set with producer.setNamesrvAddr("192.168.0.1:9876"), and other parameters work the same way.

1 Common client configuration

| Parameter | Default | Description |
|---|---|---|
| namesrvAddr | | Name Server address list; separate multiple addresses with semicolons |
| clientIP | Local IP | The client's local IP address. Some machines cannot detect their own IP, in which case it must be set explicitly in code |
| instanceName | DEFAULT | Name of the client instance. Multiple Producers and Consumers created by one client actually share one internal instance, which includes network connections, thread resources, etc. |
| clientCallbackExecutorThreads | 4 | Number of asynchronous callback threads in the communication layer |
| pollNameServerInterval | 30000 | Interval for polling the Name Server, in milliseconds |
| heartbeatBrokerInterval | 30000 | Interval for sending heartbeats to the Broker, in milliseconds |
| persistConsumerOffsetInterval | 5000 | Interval for persisting consumer progress, in milliseconds |
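The sketch below shows the instanceName behavior described in the table: clients with the same client ID share one internal instance. It assumes the ID has the form `clientIP@instanceName`. `SharedClientInstance` is a hypothetical stand-in for the real internal object, which holds the connections and threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry: one shared internal instance per client ID.
final class ClientInstanceRegistrySketch {
    static final class SharedClientInstance {
        final String clientId;
        SharedClientInstance(String clientId) { this.clientId = clientId; }
    }

    private final ConcurrentMap<String, SharedClientInstance> instances = new ConcurrentHashMap<>();

    // Assumption: the client ID is "clientIP@instanceName".
    SharedClientInstance getOrCreate(String clientIP, String instanceName) {
        String clientId = clientIP + "@" + instanceName;
        return instances.computeIfAbsent(clientId, SharedClientInstance::new);
    }
}
```

Giving a producer or consumer a distinct instanceName is therefore the way to keep it on its own connections and threads.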

2 Producer configuration

| Parameter | Default | Description |
|---|---|---|
| producerGroup | DEFAULT_PRODUCER | Producer group name. If multiple Producers belong to one application and send the same messages, put them in the same group |
| createTopicKey | TBW102 | When a message is sent to a topic that does not exist on the server, the topic is created automatically. This requires a key, which is used to configure the default route of the new topic |
| defaultTopicQueueNums | 4 | Number of queues created by default when a topic is created automatically on send |
| sendMsgTimeout | 10000 | Send timeout, in milliseconds |
| compressMsgBodyOverHowmuch | 4096 | Message body size above which compression starts, in bytes (the Consumer decompresses automatically on receipt) |
| retryAnotherBrokerWhenNotStoreOK | FALSE | Whether to retry when a send returns a SendResult whose sendStatus != SEND_OK |
| retryTimesWhenSendFailed | 2 | Maximum number of retries when a send fails; applies only to synchronous sends |
| maxMessageSize | 4MB | Client-side message size limit; larger messages cause an error. The server also enforces a limit, so configure the two together |
| transactionCheckListener | | Listener for transaction status checks; must be set when sending transactional messages |
| checkThreadPoolMinSize | 1 | Minimum number of threads in the pool used when the Broker checks the Producer's transaction status |
| checkThreadPoolMaxSize | 1 | Maximum number of threads in the pool used when the Broker checks the Producer's transaction status |
| checkRequestHoldMax | 2000 | Size of the Producer's local queue that buffers the Broker's transaction-status check requests |
| RPCHook | null | Passed in when the Producer is created. It has two hooks, one run before a message is sent and one after the response is received. In the first, users can perform security checks or other operations |
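The sketch below shows two of the producer-side size rules in the table: reject bodies over maxMessageSize, and compress bodies larger than compressMsgBodyOverHowmuch. zlib via `java.util.zip.Deflater` is used here as an assumed stand-in for the codec a given client version uses. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper applying the table's size limits; zlib is an assumed stand-in codec.
final class ProducerBodySketch {
    static final int COMPRESS_OVER = 4096;               // compressMsgBodyOverHowmuch default, bytes
    static final int MAX_MESSAGE_SIZE = 4 * 1024 * 1024; // maxMessageSize default (4MB)

    static byte[] prepare(byte[] body) {
        if (body.length > MAX_MESSAGE_SIZE) {
            throw new IllegalArgumentException("message body size " + body.length + " exceeds maxMessageSize");
        }
        return body.length > COMPRESS_OVER ? deflate(body) : body;
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }
}
```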

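3 PushConsumer configuration

| Parameter | Default | Description |
|---|---|---|
| consumerGroup | DEFAULT_CONSUMER | Consumer group name. If multiple Consumers belong to one application, subscribe to the same messages and share the same consumption logic, put them in the same group |
| messageModel | CLUSTERING | Consumption model; clustering and broadcasting are supported |
| consumeFromWhere | CONSUME_FROM_LAST_OFFSET | After starting, the Consumer by default resumes from its last consumed position. If that position has not expired, consumption continues from where it stopped; if it has expired, consumption starts from the first message currently in the queue |
| consumeTimestamp | Half an hour ago | Takes effect only when consumeFromWhere is CONSUME_FROM_TIMESTAMP |
| allocateMessageQueueStrategy | AllocateMessageQueueAveragely | Rebalance strategy implementation |
| subscription | | Subscription relationships |
| messageListener | | Message listener |
| offsetStore | | Consumption progress storage |
| consumeThreadMin | 10 | Minimum number of threads in the consumption thread pool |
| consumeThreadMax | 20 | Maximum number of threads in the consumption thread pool |
| consumeConcurrentlyMaxSpan | 2000 | Maximum offset span allowed within a single queue during concurrent consumption |
| pullThresholdForQueue | 1000 | Maximum number of messages cached locally per queue by the pull process |
| pullInterval | 0 | Pull interval, in milliseconds. It is 0 because long polling is used, but an application can set a value greater than 0 for flow control |
| consumeMessageBatchMaxSize | 1 | Batch consumption: how many messages are consumed per call |
| pullBatchSize | 32 | Batch pulling: maximum number of messages pulled per request |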
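The sketch below shows the flow control implied by pullThresholdForQueue and consumeConcurrentlyMaxSpan. Before pulling again for a queue, the consumer backs off if too many messages are cached locally, or if the cached offsets span too wide a range. The helper and the exact form of its checks are assumptions; the defaults come from the table above.

```java
// Hypothetical helper: should the next pull for one queue be delayed?
final class PullFlowControlSketch {
    static final int PULL_THRESHOLD_FOR_QUEUE = 1000;     // pullThresholdForQueue default
    static final int CONSUME_CONCURRENTLY_MAX_SPAN = 2000; // consumeConcurrentlyMaxSpan default

    static boolean shouldDelayPull(int cachedMessageCount, long minCachedOffset, long maxCachedOffset) {
        if (cachedMessageCount > PULL_THRESHOLD_FOR_QUEUE) {
            return true; // too many unconsumed messages buffered for this queue
        }
        // A wide span means an old message is stuck while newer ones keep completing.
        return cachedMessageCount > 0
                && maxCachedOffset - minCachedOffset > CONSUME_CONCURRENTLY_MAX_SPAN;
    }
}
```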

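4 PullConsumer configuration

| Parameter | Default | Description |
|---|---|---|
| consumerGroup | DEFAULT_CONSUMER | Consumer group name. If multiple Consumers belong to one application, subscribe to the same messages and share the same consumption logic, put them in the same group |
| brokerSuspendMaxTimeMillis | 20000 | Long polling: maximum time a Consumer's pull request is held on the Broker, in milliseconds |
| consumerTimeoutMillisWhenSuspend | 30000 | Long polling: if a pull request is held on the Broker longer than this, the client treats it as timed out, in milliseconds |
| consumerPullTimeoutMillis | 10000 | Pull timeout when not using long polling, in milliseconds |
| messageModel | BROADCASTING | Two modes are supported: clustering and broadcasting |
| messageQueueListener | | Listens for message queue changes |
| offsetStore | | Consumption progress storage |
| registerTopics | | Set of registered topics |
| allocateMessageQueueStrategy | AllocateMessageQueueAveragely | Rebalance strategy implementation |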
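The two long-polling timeouts above must be ordered correctly. The client has to wait longer than the Broker may hold the request; otherwise healthy long polls would be reported as timeouts. The check below is a hypothetical sanity check you might run on your own configuration.

```java
// Hypothetical sanity check for the long-polling timeout pair.
final class LongPollingTimeoutSketch {
    static void validate(long brokerSuspendMaxTimeMillis, long consumerTimeoutMillisWhenSuspend) {
        if (consumerTimeoutMillisWhenSuspend <= brokerSuspendMaxTimeMillis) {
            throw new IllegalArgumentException(
                    "consumerTimeoutMillisWhenSuspend must be greater than brokerSuspendMaxTimeMillis");
        }
    }
}
```

The defaults in the table (20000 on the Broker side, 30000 on the client side) pass this check.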

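5 Message data structure

| Field | Default | Description |
|---|---|---|
| Topic | null | Required; name of the topic the message belongs to |
| Body | null | Required; message body |
| Tags | null | Optional; message tag used for server-side filtering. Currently each message can have only one tag |
| Keys | null | Optional; business keywords for the message. The server builds a hash index on keys, so once keys are set, messages can be queried in the Console by Topic and Keys. Because it is a hash index, keep keys as unique as possible, e.g. order numbers or product IDs |
| Flag | 0 | Optional; set entirely by the application, and RocketMQ does not interpret it |
| DelayTimeLevel | 0 | Optional; message delay level. 0 means no delay; a value greater than 0 delays consumption by a corresponding fixed time |
| WaitStoreMsgOK | TRUE | Optional; whether the server replies only after the message has been stored to disk |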
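The hypothetical holder below mirrors the fields in the table and enforces two of its rules: topic and body are required, and delay level 0 means no delay. It is not RocketMQ's `Message` class. It only restates the table in code.

```java
// Hypothetical message holder mirroring the documented fields and defaults.
final class SketchMessage {
    final String topic;             // required
    final byte[] body;              // required
    String tags;                    // optional, a single tag
    String keys;                    // optional, as unique as possible (e.g. an order ID)
    int flag = 0;                   // optional, opaque to RocketMQ
    int delayTimeLevel = 0;         // optional, 0 = no delay
    boolean waitStoreMsgOK = true;  // optional, reply only after the message is stored

    SketchMessage(String topic, byte[] body) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic is required");
        }
        if (body == null || body.length == 0) {
            throw new IllegalArgumentException("body is required");
        }
        this.topic = topic;
        this.body = body;
    }

    SketchMessage withTags(String tag) {
        this.tags = tag;
        return this;
    }

    boolean isDelayed() {
        return delayTimeLevel > 0;
    }
}
```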

6 System configuration

This section covers system-level (JVM/OS) configuration.

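6.1 JVM options

The latest JDK 1.8 release is recommended. Set Xms and Xmx to the same value so that the JVM does not resize the heap, which gives better performance. A simple JVM configuration:

    -server -Xms8g -Xmx8g -Xmn4g

If you don't care about the Broker's startup time, a better option is to "pre-touch" the Java heap so that every page is allocated during JVM initialization:

    -XX:+AlwaysPreTouch

Disabling biased locking may reduce JVM pauses:

    -XX:-UseBiasedLocking

For garbage collection, the G1 collector with JDK 1.8 is recommended: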

    -XX:+UseG1GC -XX:G1HeapRegionSize=16m
    -XX:G1ReservePercent=25
    -XX:InitiatingHeapOccupancyPercent=30

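These GC options look a bit aggressive, but they have proven to perform well in our production environment. Don't set -XX:MaxGCPauseMillis too small; otherwise the JVM will use a small young generation to meet the target, which causes very frequent minor GCs. Rolling GC log files are also recommended: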

    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=5
    -XX:GCLogFileSize=30m

If writing GC logs increases the broker's latency, consider redirecting the GC log file to an in-memory file system:

    -Xloggc:/dev/shm/mq_gc_%p.log

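6.2 Linux kernel parameters

The os.sh script in the bin folder lists many kernel parameters that can be used in production after minor changes. The following parameters need attention; for details, see the documentation for /proc/sys/vm/*.

  • vm.extra_free_kbytes: tells the VM to keep extra free memory between the threshold where background reclaim (kswapd) starts and the threshold where direct reclaim (by allocating processes) starts. RocketMQ uses this parameter to avoid long latencies in memory allocation. (Availability depends on the kernel version.)
  • vm.min_free_kbytes: setting this below 1024KB will subtly break the system and make it prone to deadlocks under high load.
  • vm.max_map_count: limits the maximum number of memory map areas a process may have. RocketMQ uses mmap to load the CommitLog and ConsumeQueue, so a large value is recommended.
  • vm.swappiness: defines how aggressively the kernel swaps memory pages. Higher values make swapping more aggressive; lower values reduce swapping. A value of 10 is recommended to avoid swap latency.
  • File descriptor limits: RocketMQ opens file descriptors for files (CommitLog and ConsumeQueue) and for network connections. We recommend setting the file descriptor limit to 655350.
  • Disk scheduler: RocketMQ recommends the deadline I/O scheduler, which tries to provide a guaranteed latency for each request.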


Origin blog.csdn.net/shang_xs/article/details/110492364