Best Practices
1 Producer
1.1 Notes for sending messages
1 Use of Tags
An application should use a single topic wherever possible and distinguish message subtypes with tags, which the application can set freely. Only the producer sets tags, when sending a message; when subscribing, the consumer can use tags to have the broker filter messages: message.setTags("TagA").
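On the consumer side, tags are named in the subscription expression, for example consumer.subscribe("TopicTest", "TagA || TagB"), where "*" selects every tag. The following is a simplified, JDK-only model of how such an expression selects messages. The class and method names are hypothetical, and this is not RocketMQ's own filtering code.

```java
import java.util.HashSet;
import java.util.Set;

class TagFilterSketch {
    // Parses a subscription expression such as "TagA || TagB" or "*".
    // Returns null for "*" (or an empty expression), meaning: accept every tag.
    static Set<String> parse(String subExpression) {
        if (subExpression == null || subExpression.trim().isEmpty()
                || subExpression.trim().equals("*")) {
            return null;
        }
        Set<String> tags = new HashSet<>();
        for (String part : subExpression.split("\\|\\|")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Decides whether a message carrying messageTag matches the subscription.
    static boolean matches(String subExpression, String messageTag) {
        Set<String> tags = parse(subExpression);
        return tags == null || tags.contains(messageTag);
    }
}
```

Because each message carries only one tag, a single set lookup per message is enough to decide whether it matches.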
2 Use of Keys
Put the business-level unique identifier of each message in the keys field, so that a lost message can be located later. The server creates a hash index for each message. The application can then query, by topic and key, both the message content and which consumers consumed it. Because it is a hash index, keep the key as unique as possible to avoid hash collisions.
// Order ID
String orderId = "20034568923546";
message.setKeys(orderId);
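This toy, JDK-only model of a key index shows why key uniqueness matters for lookups. The class name and the topic#key composite lookup key are illustrative assumptions. The broker's real index is a fixed-size on-disk hash index in which colliding keys share slots, which this HashMap does not model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyIndexSketch {
    // Hypothetical in-memory stand-in for a key index:
    // (topic, key) -> offsets of the messages that carried that key.
    private final Map<String, List<Long>> index = new HashMap<>();

    void put(String topic, String key, long offset) {
        index.computeIfAbsent(topic + "#" + key, k -> new ArrayList<>()).add(offset);
    }

    // A unique key yields one candidate; a reused key yields several to sift through.
    List<Long> query(String topic, String key) {
        return index.getOrDefault(topic + "#" + key, List.of());
    }
}
```

With order IDs as keys, a query normally returns exactly one message.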
3 Log printing
Log every send, successful or failed, and include the SendResult and the key. If the send method does not throw an exception, the message was sent successfully. A successful send can end in one of several statuses, reported through SendResult. Each status is described below:
- SEND_OK
The message was sent successfully. Note that a successful send does not by itself mean the message is stored reliably. To ensure that no messages are lost, also enable a synchronous Master or synchronous disk flushing, i.e. SYNC_MASTER or SYNC_FLUSH.
- FLUSH_DISK_TIMEOUT
The message was sent successfully, but flushing to disk on the server timed out. The message is already in the server's queue (in memory) and is lost only if the server goes down. The flush mode and the synchronous flush timeout can be set in the message storage configuration. This status (flush timeout) is returned when both of the following hold:
- the Broker uses synchronous flushing, i.e. FlushDiskType=SYNC_FLUSH (the default is asynchronous flushing);
- the flush does not complete within the synchronous flush timeout (5 s by default).
- FLUSH_SLAVE_TIMEOUT
The message was sent successfully, but replication to the Slave timed out. The message is already in the server's queue and is lost only if the server goes down. This status (data synchronization to the Slave timed out) is returned when both of the following hold:
- the Broker's role is synchronous master, i.e. SYNC_MASTER (the default is ASYNC_MASTER);
- the slave Broker does not finish synchronizing with the master within the synchronous flush timeout (5 s by default).
- SLAVE_NOT_AVAILABLE
The message was sent successfully, but no Slave is currently available. This status (no Slave available) is returned when the Broker's role is SYNC_MASTER (the default is ASYNC_MASTER) but no slave Broker is configured.
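This JDK-only sketch shows one way to log a send and to classify its status. The enum mirrors the four statuses above, but it is a local stand-in; the names SendStatusSketch, logLine, and needsAttention are hypothetical. With the real client, the status would come from SendResult.getSendStatus() and the message ID from SendResult.getMsgId().

```java
class SendStatusSketch {
    // Local mirror of the four statuses described above (not the client's own enum).
    enum Status { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    // Builds the log line recommended above: always include the key and the status.
    static String logLine(String key, Status status, String msgId) {
        return "SEND_RESULT key=" + key + " msgId=" + msgId + " status=" + status;
    }

    // Every status other than SEND_OK means the broker accepted the message, but
    // flushing or replication did not complete as configured.
    static boolean needsAttention(Status status) {
        return status != Status.SEND_OK;
    }
}
```

Resending on a status such as FLUSH_DISK_TIMEOUT may produce a duplicate, because the broker already holds the message. Alerting or re-checking is therefore often a safer reaction than an automatic resend.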
1.2 How to handle the failure of message sending
The Producer's send method retries internally. The retry logic is as follows:
- At most 2 retries (2 for synchronous sending, 0 for asynchronous sending).
- If sending fails, the next attempt goes to another Broker. The total time spent in the method does not exceed sendMsgTimeout, which defaults to 10 s.
- If sending to a Broker itself produces a timeout exception, no retry is made.
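These three rules can be modeled in plain Java as a loop over brokers. Every name here is hypothetical: SendRetrySketch, the Attempt outcomes, and the Sender interface are not part of the client. The clock is injected so the behavior can be tested deterministically. This is a model of the documented rules, not the client's implementation.

```java
import java.util.List;
import java.util.function.LongSupplier;

class SendRetrySketch {
    // Outcome of one send attempt against one broker (hypothetical model).
    enum Attempt { OK, FAILED, TIMEOUT }

    interface Sender {
        Attempt send(String broker);
    }

    // Returns the broker that accepted the message, or null if sending gave up.
    static String sendWithRetry(List<String> brokers, Sender sender, int retryTimes,
                                long sendMsgTimeoutMillis, LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        for (int attempt = 0; attempt <= retryTimes; attempt++) {
            if (clockMillis.getAsLong() - start > sendMsgTimeoutMillis) {
                return null; // total time budget used up
            }
            // Each retry moves on to the next broker in the list.
            String broker = brokers.get(attempt % brokers.size());
            Attempt result = sender.send(broker);
            if (result == Attempt.OK) {
                return broker;
            }
            if (result == Attempt.TIMEOUT) {
                return null; // per the rules above, a timeout is not retried
            }
        }
        return null;
    }
}
```

With retryTimes = 2, a message gets at most three attempts, each against a different broker when at least three are available.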
This strategy goes some way toward ensuring that a message is sent successfully. If the business has high reliability requirements, the application should add its own retry logic. For example, when a synchronous send fails, store the message in a database and have a background thread retry periodically, so that the message is guaranteed to reach the Broker.
Why is this database retry not built into the MQ client, leaving the application to implement it? There are three main reasons:
- The MQ client is designed to be stateless, so that it can scale horizontally at will and consumes only CPU, memory, and network resources.
- An integrated KV storage module in the MQ client would be reliable only with synchronous disk writes, which are expensive, so asynchronous writes would normally be used instead. The application's shutdown is controlled by operators, not by the MQ client, and a forceful shutdown such as kill -9 is common. Data that had not yet been written to disk would then be lost.
- The machine running the Producer is usually less reliable, often a virtual machine, and is not suitable for storing important data.
In summary, the retry process should be controlled by the application.
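The store-and-retry approach recommended above can be sketched with an in-memory map standing in for the database table. The class, its methods, and the Predicate-based sender are hypothetical. A real version would write rows to a database and call retryPending from a scheduled background thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class DbBackedRetrySketch {
    // Stand-in for a database table of unsent messages: key -> body.
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final Predicate<String> send; // returns true when the broker accepted the body

    DbBackedRetrySketch(Predicate<String> send) {
        this.send = send;
    }

    // Try once synchronously; on failure persist the message for a later retry.
    synchronized boolean sendOrStore(String key, String body) {
        if (send.test(body)) {
            return true;
        }
        pending.put(key, body);
        return false;
    }

    // One pass of the background retry job; returns the keys that were delivered.
    synchronized List<String> retryPending() {
        List<String> delivered = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> row = it.next();
            if (send.test(row.getValue())) {
                delivered.add(row.getKey());
                it.remove(); // delete the row only after a successful send
            }
        }
        return delivered;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A retried send can reach the Broker more than once, for example if an earlier attempt actually succeeded but its response was lost. This is one reason consumers should be idempotent.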
1.3 Choosing oneway sending
Sending a message normally involves the following steps:
- The client sends a request to the server
- The server processes the request
- The server returns a response to the client
The time taken to send a message is therefore the sum of these three steps. Some scenarios need very low latency but have modest reliability requirements, such as log collection; such applications can send in oneway mode. Oneway sending only sends the request and does not wait for a response. At the client implementation level, sending the request costs only one operating system call, which writes the data into the client's socket buffer. This usually takes microseconds.
2 Consumer
2.1 Make the consumption process idempotent
RocketMQ cannot prevent duplicate messages; it does not guarantee exactly-once delivery. If the business is sensitive to duplicate consumption, it must therefore deduplicate at the business level, for example with a relational database:
- First, determine the unique key of the message. This can be the msgId or a unique field in the message content, such as an order ID.
- Before consuming, check whether the unique key already exists in the database. If it does not, insert it and consume the message; otherwise skip the message.
In practice the check and the insert must be atomic: attempt the insert directly, and if it fails with a primary-key conflict, skip the message.
msgId is meant to be a globally unique identifier, but in practice the same message may end up with two different msgIds. This happens, for example, when the consumer actively resends a message, or when the client's redelivery mechanism produces a duplicate. In such cases, deduplicate on a business field instead.
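The insert-first deduplication rule can be sketched with a concurrent set standing in for a table whose primary key is the unique business key. The class name and method are hypothetical. In a real database, insert the key and perform the business writes in one transaction. Otherwise a failure after the insert would mark the message as consumed and cause it to be skipped on redelivery.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentConsumerSketch {
    // Stand-in for a table whose primary key is the message's unique business key.
    private final Set<String> consumedKeys = ConcurrentHashMap.newKeySet();

    // Returns true if the business logic ran, false if the key was already consumed.
    boolean consumeOnce(String uniqueKey, Runnable businessLogic) {
        // add() is atomic: it plays the role of "try INSERT, skip on primary-key conflict".
        if (!consumedKeys.add(uniqueKey)) {
            return false; // duplicate delivery: skip
        }
        businessLogic.run();
        return true;
    }
}
```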
2.2 Ways to deal with slow consumption
1 Improve consumption parallelism
Most message consumption is IO-intensive: it operates on a database or makes RPC calls. The speed of such consumption therefore depends on the throughput of the backend database or external system. Increasing consumption parallelism raises total consumption throughput, but only up to a point; beyond it, throughput drops. The application must therefore set a reasonable degree of parallelism. Consumption parallelism can be changed in the following ways:
- Within the same ConsumerGroup, add Consumer instances to increase parallelism. Note that instances beyond the number of subscribed queues receive no queues and have no effect. You can add machines or start multiple processes on an existing machine.
- Increase the number of consumption threads in a single Consumer by adjusting the consumeThreadMin and consumeThreadMax parameters.
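The following simplified allocation illustrates why instances beyond the queue count sit idle. It is written in the spirit of AllocateMessageQueueAveragely, the default rebalance strategy listed in the PushConsumer configuration table, but it is not that class's source; the class and method names are hypothetical. Queues are split into contiguous blocks, and the first consumers receive one extra queue when the division is uneven.

```java
import java.util.ArrayList;
import java.util.List;

class QueueAllocationSketch {
    // Queue ids assigned to consumer `index` (0-based) when `queueCount` queues are
    // split into contiguous blocks among `consumerCount` consumers.
    static List<Integer> allocate(int queueCount, int consumerCount, int index) {
        int base = queueCount / consumerCount;
        int extra = queueCount % consumerCount; // the first `extra` consumers get one more
        int size = base + (index < extra ? 1 : 0);
        int start = index * base + Math.min(index, extra);
        List<Integer> queues = new ArrayList<>();
        for (int q = start; q < start + size; q++) {
            queues.add(q);
        }
        return queues;
    }
}
```

With 4 queues and 6 consumers, consumers 4 and 5 receive an empty list. Adding them raises no parallelism, which matches the note in the list above.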
2 Batch consumption
If the business process supports batch consumption, consumption throughput can be greatly improved. For example, in an order-deduction application, processing one order at a time takes 1 s, while processing 10 orders at once may take only 2 s; throughput rises from 1 to 5 orders per second. Batch size is controlled by the consumeMessageBatchMaxSize parameter. Its default is 1, meaning one message per call; if it is set to N, each call receives at most N messages.
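Here is a small model of batch delivery together with the throughput arithmetic from the example. The class and methods are hypothetical. In the real client, a batch can be smaller than N when fewer messages are available, which the last partial batch here reflects.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSketch {
    // Splits pending messages into batches of at most maxSize, mimicking
    // what a listener sees with consumeMessageBatchMaxSize = maxSize.
    static <T> List<List<T>> batches(List<T> msgs, int maxSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < msgs.size(); i += maxSize) {
            result.add(new ArrayList<>(msgs.subList(i, Math.min(i + maxSize, msgs.size()))));
        }
        return result;
    }

    // Messages per second for a batch of batchSize that takes `seconds` to process.
    static double throughput(int batchSize, double seconds) {
        return batchSize / seconds;
    }
}
```

throughput(1, 1.0) is 1.0 and throughput(10, 2.0) is 5.0 orders per second, a fivefold gain in the order-deduction example.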
3 Skip unimportant messages
Messages may pile up because consumption cannot keep up with sending. If the business can tolerate some data loss, you can then discard unimportant messages. For example, when a queue's backlog exceeds 100,000 messages, discard some or all of them so that consumption quickly catches up with sending. Sample code:
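public ConsumeConcurrentlyStatus consumeMessage(
        List<MessageExt> msgs,
        ConsumeConcurrentlyContext context) {
    // Queue offset of the first message in this batch
    long offset = msgs.get(0).getQueueOffset();
    // Maximum offset of the queue when the batch was pulled, carried as a message property
    String maxOffset =
            msgs.get(0).getProperty(MessageConst.PROPERTY_MAX_OFFSET);
    // Backlog: how far this consumer lags behind the head of the queue
    long diff = Long.parseLong(maxOffset) - offset;
    if (diff > 100000) {
        // TODO: special handling for a message backlog (e.g. discard unimportant messages)
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
    // TODO: normal consumption logic
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
}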
4 Optimize the consumption process of each message
For example, suppose that consuming a message involves the following steps:
- Query from DB according to the message [Data 1]
- Query from DB according to the message [Data 2]
- Complex business calculations
- Insert [Data 3] into DB
- Insert [Data 4] into DB
Consuming this message involves 4 DB interactions. At 5 ms each, that is 20 ms; adding 5 ms of business computation gives 25 ms in total. If the 4 DB interactions can be reduced to 2, the total drops to 15 ms, a 40% reduction in latency. If the application is latency-sensitive, the DB can also be deployed on SSDs, whose response time (RT) is much lower than that of SCSI disks.
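The latency arithmetic above can be checked in a few lines. The class and method names are hypothetical, and the per-call costs are the example's assumed 5 ms figures.

```java
class ConsumeLatencySketch {
    // Total latency of one message: DB round trips plus business computation.
    static int totalMillis(int dbCalls, int millisPerDbCall, int computeMillis) {
        return dbCalls * millisPerDbCall + computeMillis;
    }

    // Percentage reduction from `before` to `after`.
    static double reductionPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }
}
```

totalMillis(4, 5, 5) is 25 and totalMillis(2, 5, 5) is 15, a 40% cut in latency. Seen from the throughput side, a single consuming thread goes from 40 to roughly 66.7 messages per second.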
2.3 Logging during consumption
If the message volume is small, log the received messages and the consumption time in the consumption entry method to make later troubleshooting easier.
public ConsumeConcurrentlyStatus consumeMessage(
        List<MessageExt> msgs,
        ConsumeConcurrentlyContext context) {
    log.info("RECEIVE_MSG_BEGIN: " + msgs.toString());
    // TODO: normal consumption logic
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
}
Logging each message together with its consumption time makes it much easier to troubleshoot online problems such as slow consumption.
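This JDK-only sketch records consumption time. To keep it self-contained, messages are plain Strings rather than MessageExt, and the handler is a hypothetical stand-in for the business logic. A real listener would pass the returned line to its logger.

```java
import java.util.List;
import java.util.function.Consumer;

class ConsumeTimingSketch {
    // Runs the handler on each message and returns a log line with the
    // message count and the elapsed time in milliseconds.
    static String consumeAndLog(List<String> msgs, Consumer<String> handler) {
        long start = System.nanoTime();
        for (String msg : msgs) {
            handler.accept(msg);
        }
        long costMs = (System.nanoTime() - start) / 1_000_000;
        return "CONSUME_DONE count=" + msgs.size() + " costMs=" + costMs;
    }
}
```

System.nanoTime is used rather than System.currentTimeMillis because it is monotonic and is not affected by wall-clock adjustments.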
2.4 Other consumer suggestions
1 About consumers and subscriptions
First, note that different consumer groups can consume the same topic independently, and each consumer group keeps its own consumption offset. Make sure that all consumers in the same group have consistent subscription information.
2 About orderly messages
The consumer locks each message queue to ensure that its messages are consumed one by one. This reduces performance, but it is useful when message order matters. Throwing exceptions is not recommended; return ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT instead.
3 About concurrent consumption
As the name implies, the consumer consumes messages concurrently. This mode is recommended for good performance. Throwing exceptions is not recommended; return ConsumeConcurrentlyStatus.RECONSUME_LATER instead.
4 Consume Status
For a concurrent listener, return RECONSUME_LATER to tell the consumer that the message cannot be consumed now and should be consumed again later. The consumer then continues with other messages. An orderly listener cannot skip a message, because order matters. It can instead return SUSPEND_CURRENT_QUEUE_A_MOMENT to tell the consumer to wait for a while.
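The following toy simulation of one queue contrasts the two behaviors. The class and methods are hypothetical and do not use the client's listener interfaces. In the concurrent model, a failing message is set aside for later, like RECONSUME_LATER, while the others proceed. In the orderly model, a failing message blocks everything behind it, like SUSPEND_CURRENT_QUEUE_A_MOMENT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ConsumeStatusSketch {
    // Concurrent model: failures go to retryLater and the remaining messages still run.
    static List<String> concurrent(List<String> msgs, Predicate<String> succeeds,
                                   List<String> retryLater) {
        List<String> done = new ArrayList<>();
        for (String m : msgs) {
            if (succeeds.test(m)) {
                done.add(m);
            } else {
                retryLater.add(m);
            }
        }
        return done;
    }

    // Orderly model: stop at the first failure; that same message is retried later,
    // and nothing after it is processed first, so order is preserved.
    static List<String> orderly(List<String> msgs, Predicate<String> succeeds) {
        List<String> done = new ArrayList<>();
        for (String m : msgs) {
            if (!succeeds.test(m)) {
                break;
            }
            done.add(m);
        }
        return done;
    }
}
```

If message m2 of [m1, m2, m3] fails, the concurrent model completes m1 and m3 and defers m2, while the orderly model completes only m1.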
5 About Blocking
Blocking the listener is not recommended, because it blocks the thread pool and may eventually stop consumption altogether.
6 About setting the number of threads
Internally, the consumer uses a ThreadPoolExecutor to consume messages, so you can tune it with setConsumeThreadMin and setConsumeThreadMax.
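A standard ThreadPoolExecutor starts threads above its core size only when its work queue refuses a task, so the maximum matters only with a bounded queue. We believe the concurrent consume service in common client versions uses an unbounded queue. If so, consumeThreadMin is the setting that actually takes effect; please verify this against your client version. The JDK-only sketch below (no RocketMQ involved; the class name is hypothetical) demonstrates the executor behavior itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolGrowthSketch {
    // Submits `tasks` blocking tasks to a pool with an unbounded work queue and
    // reports how many worker threads the pool actually started.
    static int observedPoolSize(int coreThreads, int maxThreads, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreThreads, maxThreads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep each worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Tasks beyond coreThreads were queued; no thread above the core size was created.
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With core 10, maximum 20, and 30 busy tasks, the pool still reports 10 threads; the other 20 tasks wait in the queue.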
7 About the starting consume offset
When creating a new consumer group, decide whether it should consume the historical messages already stored in the Broker. The options are:
- CONSUME_FROM_LAST_OFFSET ignores historical messages and consumes only messages produced afterwards.
- CONSUME_FROM_FIRST_OFFSET consumes every message stored in the Broker.
- CONSUME_FROM_TIMESTAMP consumes messages produced after a specified timestamp.
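This toy model shows where a brand-new group would start in one queue under each option. The enum and method are hypothetical local stand-ins. The queue is assumed to start at offset 0 with no expired files, and the real timestamp lookup happens on the broker. As the PushConsumer configuration table notes, a group that already has a stored offset resumes from it instead.

```java
import java.util.List;

class StartOffsetSketch {
    // Local mirror of the three options described above.
    enum ConsumeFrom { LAST_OFFSET, FIRST_OFFSET, TIMESTAMP }

    // storeTimestamps.get(i) is the store time of the message at offset i in one queue.
    // Returns the offset a brand-new consumer group would start from.
    static long startOffset(ConsumeFrom from, List<Long> storeTimestamps, long timestamp) {
        switch (from) {
            case FIRST_OFFSET:
                return 0; // every message still stored (assumes the queue starts at 0)
            case TIMESTAMP:
                for (int i = 0; i < storeTimestamps.size(); i++) {
                    if (storeTimestamps.get(i) >= timestamp) {
                        return i; // first message stored at or after the timestamp
                    }
                }
                return storeTimestamps.size();
            case LAST_OFFSET:
            default:
                return storeTimestamps.size(); // only messages produced from now on
        }
    }
}
```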
3 Broker
3.1 Broker role
Broker roles are ASYNC_MASTER (asynchronous master), SYNC_MASTER (synchronous master), and SLAVE. Choose a deployment by reliability requirement:
- Strict message reliability: deploy SYNC_MASTER with SLAVE.
- Lower reliability requirements: deploy ASYNC_MASTER with SLAVE.
- Testing convenience only: deploy a single ASYNC_MASTER or a single SYNC_MASTER.
3.2 FlushDiskType
SYNC_FLUSH (synchronous flushing) costs considerable performance compared with ASYNC_FLUSH (asynchronous flushing), but it is more reliable, so weigh the trade-off against the actual business scenario.
3.3 Broker configuration
Parameter | Default | Description |
---|---|---|
listenPort | 10911 | The listening port that accepts client connections |
namesrvAddr | null | nameServer address |
brokerIP1 | IP address of the network interface (InetAddress) | IP address the broker listens on |
brokerIP2 | Same as brokerIP1 | When there is a master-slave broker, if the brokerIP2 attribute is configured on the broker master node, the broker slave node will connect to the brokerIP2 configured by the master node for synchronization |
brokerName | null | the name of the broker |
brokerClusterName | DefaultCluster | Name of the cluster this broker belongs to |
brokerId | 0 | broker id, 0 means master, other positive integers mean slave |
storePathCommitLog | $HOME/store/commitlog/ | Path to store commit log |
storePathConsumerQueue | $HOME/store/consumequeue/ | The path to store the consume queue |
mappedFileSizeCommitLog | 1024 * 1024 * 1024(1G) | Mapping file size of commit log |
deleteWhen | 04 | Hour of the day at which commit log files that have exceeded the retention time are deleted |
fileReservedTime | 72 | File retention time in hours |
brokerRole | ASYNC_MASTER | SYNC_MASTER/ASYNC_MASTER/SLAVE |
flushDiskType | ASYNC_FLUSH | SYNC_FLUSH/ASYNC_FLUSH. In SYNC_FLUSH mode, the broker flushes a message to disk before acknowledging the producer. In ASYNC_FLUSH mode, the broker flushes messages in groups for better performance. |
4 NameServer
In RocketMQ, the Name Server is designed for simple routing management. Its responsibilities include:
- Brokers periodically register their routing data with every Name Server.
- The Name Server provides the latest routing information to clients, including producers, consumers, and command-line clients.
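The following toy model covers the two Name Server duties above. Brokers re-register periodically, which acts as a heartbeat, and clients receive only routes whose registration is still fresh. The class, its methods, and the expiry window passed to the constructor are illustrative assumptions; the addresses in the usage are made up, and port 10911 is simply the listenPort default from the Broker table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RouteTableSketch {
    private final long expiryMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();              // broker -> last registration
    private final Map<String, Set<String>> topicsByBroker = new HashMap<>(); // broker -> topics

    RouteTableSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Called periodically by each broker; re-registering refreshes its timestamp.
    synchronized void register(String brokerAddr, Set<String> topics, long nowMillis) {
        lastSeen.put(brokerAddr, nowMillis);
        topicsByBroker.put(brokerAddr, new TreeSet<>(topics));
    }

    // Called by clients: brokers that serve the topic and registered recently enough.
    synchronized Set<String> route(String topic, long nowMillis) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            boolean alive = nowMillis - e.getValue() <= expiryMillis;
            if (alive && topicsByBroker.get(e.getKey()).contains(topic)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Usage: a broker that last registered at time 0 drops out of the routes once the expiry window has passed, while a broker that registered more recently stays.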
5 Client configuration
From the point of view of RocketMQ's Broker cluster, both producers and consumers are clients. This section describes configuration shared by producers and consumers.
5.1 Client addressing mode
A RocketMQ client first finds a Name Server and then finds Brokers through it. The Name Server address can be configured in several ways, listed below from highest to lowest priority; a higher-priority setting overrides a lower-priority one.
- Specify the Name Server address in code, separating multiple addresses with semicolons
producer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");
consumer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");
- Specify the Name Server address in the Java startup parameters
-Drocketmq.namesrv.addr="192.168.0.1:9876;192.168.0.2:9876"
- Specify the Name Server address with an environment variable (quote the value so the shell does not treat the semicolon as a command separator)
export NAMESRV_ADDR="192.168.0.1:9876;192.168.0.2:9876"
- HTTP static server addressing (default)
After startup, the client periodically accesses a static HTTP server at http://jmenv.tbsite.net:8080/rocketmq/nsaddr. This URL returns content such as:
192.168.0.1:9876;192.168.0.2:9876
By default, the client accesses this HTTP server every 2 minutes and updates its local Name Server address list. The URL is hard-coded, but you can change which server is accessed by editing /etc/hosts. For example, add the following entry:
10.232.22.67 jmenv.tbsite.net
The HTTP static server approach is recommended: client deployment stays simple, and the Name Server cluster can be upgraded without downtime.
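The priority order above can be expressed as a small resolution function. It is a hypothetical helper, not client code. The caller passes each source explicitly, for example System.getProperty("rocketmq.namesrv.addr") and System.getenv("NAMESRV_ADDR"), so the logic can be tested without touching the real environment.

```java
import java.util.Arrays;
import java.util.List;

class NamesrvAddrResolverSketch {
    // Returns the addresses from the first non-blank source, in priority order:
    // code > Java system property > environment variable > HTTP static server.
    static List<String> resolve(String fromCode, String fromSystemProperty,
                                String fromEnv, String fromHttpServer) {
        String[] candidates = {fromCode, fromSystemProperty, fromEnv, fromHttpServer};
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isBlank()) {
                return Arrays.asList(candidate.trim().split(";"));
            }
        }
        return List.of(); // nothing configured
    }
}
```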
5.2 Client configuration
DefaultMQProducer, TransactionMQProducer, DefaultMQPushConsumer, and DefaultMQPullConsumer all extend ClientConfig, the common client configuration class. Client settings are exposed as getters and setters, so each parameter can be configured through Spring or in code. For example, namesrvAddr can be set with producer.setNamesrvAddr("192.168.0.1:9876"); other parameters work the same way.
1 Common client configuration
Parameter | Default | Description |
---|---|---|
namesrvAddr | | Name Server address list; separate multiple NameServer addresses with semicolons |
clientIP | Local IP | The client's local IP address. Some machines cannot detect it automatically, so it must be set explicitly in code |
instanceName | DEFAULT | Name of the client instance. Multiple Producers and Consumers created by the client actually share one internal instance (which includes network connections, thread resources, etc.) |
clientCallbackExecutorThreads | 4 | Number of asynchronous callback threads in the communication layer |
pollNameServerInteval | 30000 | Interval for polling the Name Server, in milliseconds |
heartbeatBrokerInterval | 30000 | Interval for sending heartbeats to the Broker, in milliseconds |
persistConsumerOffsetInterval | 5000 | Interval for persisting Consumer consumption progress, in milliseconds |
2 Producer configuration
Parameter | Default | Description |
---|---|---|
producerGroup | DEFAULT_PRODUCER | Producer group name. Producers that belong to one application and send the same kind of messages should be placed in the same group |
createTopicKey | TBW102 | Key used when a topic that does not exist on the server is created automatically on send; it determines the default route configuration of the new topic |
defaultTopicQueueNums | 4 | Number of queues created by default when a topic that does not exist on the server is created automatically on send |
sendMsgTimeout | 10000 | Timeout for sending a message, in milliseconds |
compressMsgBodyOverHowmuch | 4096 | Message body size above which the body is compressed (the Consumer decompresses it automatically on receipt), in bytes |
retryAnotherBrokerWhenNotStoreOK | FALSE | Whether to retry sending when send returns a SendResult whose sendStatus is not SEND_OK |
retryTimesWhenSendFailed | 2 | Maximum number of retries when sending fails; applies only to synchronous sending |
maxMessageSize | 4MB | Maximum message size enforced by the client; larger messages cause an error. The server enforces its own limit, so keep the two consistent |
transactionCheckListener | | Listener for transaction status check-backs; must be set when sending transactional messages |
checkThreadPoolMinSize | 1 | Minimum number of threads in the pool that handles Broker checks of Producer transaction status |
checkThreadPoolMaxSize | 1 | Maximum number of threads in the pool that handles Broker checks of Producer transaction status |
checkRequestHoldMax | 2000 | Size of the Producer's local queue that buffers Broker transaction-status check requests |
RPCHook | null | Passed in when the Producer is created. It has two interfaces, one for preprocessing before a message is sent and one for processing after the response arrives; the first can be used for security control or other operations |
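The compressMsgBodyOverHowmuch rule can be mimicked with java.util.zip. Several details here are assumptions: the class is hypothetical, zlib via Deflater is assumed as the codec, and the boundary is assumed exclusive ("exceeds", as in the table). In the real protocol the consumer learns that a body was compressed from a flag on the message; this sketch leaves that to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressionSketch {
    static final int COMPRESS_OVER = 4096; // compressMsgBodyOverHowmuch default, in bytes

    // Compresses the body only when it exceeds the threshold; smaller bodies pass through.
    static byte[] maybeCompress(byte[] body) {
        if (body.length <= COMPRESS_OVER) {
            return body;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(body);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Consumer side: restores a body that the caller knows was compressed.
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```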
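3 PushConsumer configuration
Parameter | Default | Description |
---|---|---|
consumerGroup | DEFAULT_CONSUMER | Consumer group name. Consumers that belong to one application, subscribe to the same messages, and share the same consumption logic should be placed in the same group |
messageModel | CLUSTERING | Consumption model; clustering and broadcasting are supported |
consumeFromWhere | CONSUME_FROM_LAST_OFFSET | After the Consumer starts, it resumes from the last consumed position by default. There are two cases: if the last consumed position has not expired, consumption resumes where it stopped; if it has expired, consumption starts from the first message currently in the queue |
consumeTimestamp | Half an hour ago | Takes effect only when consumeFromWhere is CONSUME_FROM_TIMESTAMP |
allocateMessageQueueStrategy | AllocateMessageQueueAveragely | Rebalance algorithm strategy |
subscription | | Subscription relationships |
messageListener | | Message listener |
offsetStore | | Consumption progress storage |
consumeThreadMin | 10 | Minimum number of threads in the consumption thread pool |
consumeThreadMax | 20 | Maximum number of threads in the consumption thread pool |
consumeConcurrentlyMaxSpan | 2000 | Maximum offset span allowed within a single queue during concurrent consumption |
pullThresholdForQueue | 1000 | Maximum number of pulled messages cached locally per queue |
pullInterval | 0 | Interval between pulls, in milliseconds. It is 0 because long polling is used, but an application can set a value above 0 for flow control |
consumeMessageBatchMaxSize | 1 | Batch consumption: maximum number of messages passed to the listener per call |
pullBatchSize | 32 | Batch pulling: maximum number of messages pulled per request |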
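pullThresholdForQueue and consumeConcurrentlyMaxSpan act as flow-control limits on pulling. The following sketch shows one plausible way the two checks combine for a single queue. The class and method are hypothetical, and the real client applies further checks that are not modeled here. The span check matters because, as we understand it, a queue's committed offset cannot advance past its oldest unfinished message. One slow message far behind the rest can then cause many redeliveries after a restart.

```java
import java.util.TreeSet;

class PullFlowControlSketch {
    static final int PULL_THRESHOLD_FOR_QUEUE = 1000;      // table default
    static final int CONSUME_CONCURRENTLY_MAX_SPAN = 2000; // table default

    // cachedOffsets: offsets pulled for one queue but not yet consumed.
    static boolean shouldDelayPull(TreeSet<Long> cachedOffsets) {
        if (cachedOffsets.size() > PULL_THRESHOLD_FOR_QUEUE) {
            return true; // too many messages buffered locally
        }
        if (!cachedOffsets.isEmpty()) {
            long span = cachedOffsets.last() - cachedOffsets.first();
            if (span > CONSUME_CONCURRENTLY_MAX_SPAN) {
                return true; // an old unfinished message is far behind the newest one
            }
        }
        return false;
    }
}
```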
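4 PullConsumer configuration
Parameter | Default | Description |
---|---|---|
consumerGroup | DEFAULT_CONSUMER | Consumer group name. Consumers that belong to one application, subscribe to the same messages, and share the same consumption logic should be placed in the same group |
brokerSuspendMaxTimeMillis | 20000 | Long polling: maximum time a Consumer pull request is held (suspended) on the Broker, in milliseconds |
consumerTimeoutMillisWhenSuspend | 30000 | Long polling: if a pull request suspended on the Broker takes longer than this, the client considers it timed out, in milliseconds |
consumerPullTimeoutMillis | 10000 | Pull timeout without long polling, in milliseconds |
messageModel | BROADCASTING | Two message models are supported: clustering and broadcasting |
messageQueueListener | | Listens for message queue changes |
offsetStore | | Consumption progress storage |
registerTopics | | Set of registered topics |
allocateMessageQueueStrategy | AllocateMessageQueueAveragely | Rebalance algorithm strategy |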
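Long polling means the Broker holds a pull request until a message arrives or the suspend time elapses. That is why consumerTimeoutMillisWhenSuspend (30000) is larger than brokerSuspendMaxTimeMillis (20000): the client must not give up on a request the Broker is still legitimately holding. This JDK-only sketch uses BlockingQueue.poll with a timeout to model the Broker's hold. The class and methods are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class LongPollSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    void put(String msg) {
        queue.add(msg);
    }

    // "Broker side": hold the pull until a message arrives or suspendMillis elapses.
    // Returns null when the hold expires with no message (an empty pull result).
    String pull(long suspendMillis) {
        try {
            return queue.poll(suspendMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // The client must wait longer than the broker may hold the request.
    static boolean timeoutsConsistent(long brokerSuspendMaxTimeMillis,
                                      long consumerTimeoutMillisWhenSuspend) {
        return consumerTimeoutMillisWhenSuspend > brokerSuspendMaxTimeMillis;
    }
}
```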
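5 Message data structure
Field | Default | Description |
---|---|---|
Topic | null | Required. Name of the topic the message belongs to |
Body | null | Required. Message body |
Tags | null | Optional. Message tag, used for server-side filtering. Currently each message can carry only one tag |
Keys | null | Optional. Business keywords for this message. The server builds a hash index on the keys; once set, the message can be queried in the Console by Topic and Keys. Because it is a hash index, keep keys as unique as possible, e.g. an order number or product ID |
Flag | 0 | Optional. Set entirely by the application; RocketMQ does not interfere with it |
DelayTimeLevel | 0 | Optional. Message delay level. 0 means no delay; a value above 0 means the message becomes consumable only after a specific delay |
WaitStoreMsgOK | TRUE | Optional. Whether the server replies only after the message has been stored to disk |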
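The following sketch maps a DelayTimeLevel to an actual delay. It assumes the broker's default messageDelayLevel setting, "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h". That string is not listed in this document, so check the value configured on your broker. Clamping levels above the maximum to the last entry is also an assumption of this sketch, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DelayLevelSketch {
    // Assumed broker default for messageDelayLevel.
    static final String DEFAULT_LEVELS =
            "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    // Parses tokens like "10s" or "2h" into milliseconds.
    static List<Long> parse(String spec) {
        List<Long> millis = new ArrayList<>();
        for (String token : spec.trim().split("\\s+")) {
            char unit = token.charAt(token.length() - 1);
            long value = Long.parseLong(token.substring(0, token.length() - 1));
            long factor;
            switch (unit) {
                case 's': factor = 1_000L; break;
                case 'm': factor = 60_000L; break;
                case 'h': factor = 3_600_000L; break;
                case 'd': factor = 86_400_000L; break;
                default: throw new IllegalArgumentException("unknown unit in " + token);
            }
            millis.add(value * factor);
        }
        return millis;
    }

    // Level 0 means no delay; level N (1-based) maps to the N-th entry.
    static long delayMillis(int level, List<Long> table) {
        if (level <= 0) {
            return 0L;
        }
        return table.get(Math.min(level, table.size()) - 1);
    }
}
```

Under these assumptions, level 3 delays delivery by 10 seconds and level 18 by 2 hours.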
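6 System configuration
This section describes system (JVM/OS) configuration.
6.1 JVM options
The latest release of JDK 1.8 is recommended. Set Xms and Xmx to the same value to prevent the JVM from resizing the heap, which gives better performance. A simple JVM configuration looks like this:
-server -Xms8g -Xmx8g -Xmn4g
If you don't care about the RocketMQ Broker's startup time, "pre-touching" the Java heap is a better option: every page is then allocated during JVM initialization. Enable it with: -XX:+AlwaysPreTouch
Disabling biased locking may reduce JVM pauses: -XX:-UseBiasedLocking
For garbage collection, the G1 collector with JDK 1.8 is recommended: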
-XX:+UseG1GC -XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
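These GC options look a little aggressive, but they have proved to perform well in our production environment. Do not set -XX:MaxGCPauseMillis too small. Otherwise the JVM will use a small young generation to meet the goal, causing very frequent minor GCs. Rolling GC log files are also recommended: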
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=30m
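If writing GC log files increases the broker's latency, consider redirecting them to an in-memory file system:
-Xloggc:/dev/shm/mq_gc_%p.log
6.2 Linux kernel parameters
The os.sh script in the bin folder lists many kernel parameters that can be used in production after minor changes. The following parameters need attention; for more details, see the documentation for /proc/sys/vm/*.
- vm.extra_free_kbytes tells the VM to keep extra free memory between two thresholds: the one at which background reclaim (kswapd) starts and the one at which direct reclaim (by the allocating process) starts. RocketMQ uses this parameter to avoid long latencies in memory allocation. (Its availability depends on the kernel version.)
- vm.min_free_kbytes: setting it below 1024 KB will subtly break the system, and the system becomes prone to deadlocks under high load.
- vm.max_map_count limits the maximum number of memory map areas a process may have. RocketMQ uses mmap to load the CommitLog and ConsumeQueue, so a large value is recommended.
- vm.swappiness defines how aggressively the kernel swaps memory pages. Higher values increase aggressiveness; lower values reduce the amount of swapping. A value of 10 is recommended to avoid swap latency.
- File descriptor limits: RocketMQ needs file descriptors for files (CommitLog and ConsumeQueue) and for network connections. We recommend setting the file descriptor limit to 655350.
- Disk scheduler: RocketMQ recommends the deadline I/O scheduler, which tries to provide guaranteed latency for requests.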