RocketMQ source code analysis: Broker

Preface

The Broker module covers a great deal of ground. This article focuses on the following topics:

  1. Broker startup process analysis
  2. Message storage design
  3. Message writing process
  4. Highlight: the function-number (request code) design of NRS and NRC
  5. Highlight: how CompletableFuture makes synchronous double-write several times faster
  6. Highlight: reentrant lock or spin lock when writing the CommitLog?
  7. Highlight: the zero-copy technique mmap for faster file reads and writes
  8. Highlight: the off-heap memory mechanism

Broker startup process

The Broker startup flow is as follows:


  1. Load the configuration persisted on the server: topic configuration, consumers' consumption progress, consumer subscription information, and so on. This configuration is persisted before the broker shuts down, so loading it simply restores the broker to its pre-shutdown state.
  2. Load the message storage component, MessageStore. It stores messages through a MappedFileQueue, which is the in-code mapping of the folder holding the CommitLog, ConsumeQueue, Index, and other files.
  3. Create and start the BrokerController. This controller creates the processors and managers that start and shut down the broker and that handle operations such as sending and receiving messages.

Broker message storage design

Kafka lays files out by Topic/partition: each partition has its own physical folder, and sequential writing is implemented at the partition-file level. If a Kafka cluster has hundreds or thousands of topics and each topic has hundreds of partitions, highly concurrent message writes become scattered across many files. This dispersed placement strategy leads to fierce disk IO contention and becomes a bottleneck: the writes are effectively random IO, so Kafka's write performance degrades. As the number of topics and partitions grows, write performance first rises and then falls.

RocketMQ pursues extreme sequential writes: all messages, regardless of topic, are appended sequentially to the commitlog file, and this ordering is unaffected as the number of topics and partitions grows. In scenarios where producers and consumers run side by side, Kafka's throughput drops sharply as the topic count increases, while RocketMQ's stays stable. Kafka therefore suits business scenarios with relatively few topics and consumers, while RocketMQ fits scenarios with many topics and many consumers.

Storage file design

RocketMQ's main storage files are the CommitLog, ConsumeQueue, and IndexFile. RocketMQ stores the messages of all topics in the same file, guaranteeing sequential writes on the send path and doing its best to deliver high-performance, high-throughput sending. However, since message middleware generally offers subscription by topic, a single shared file makes retrieving messages by topic very inconvenient. To make consumption efficient, RocketMQ introduces the ConsumeQueue message-queue file: each topic contains multiple consumption queues, and each queue has its own file. RocketMQ also introduces the IndexFile index file, designed to speed up retrieval: it can quickly locate messages in the CommitLog by message attributes. The overall layout is as follows:


  1. CommitLog: the message storage file; messages of all topics are stored in CommitLog files
  2. ConsumeQueue: the message consumption queue; after a message lands in the CommitLog, it is asynchronously dispatched to the consumption queues for consumers to read
  3. IndexFile: the message index file, which mainly stores the mapping between message keys and offsets

Message storage structure

The CommitLog is stored as physical files; the CommitLog on each Broker is shared by all ConsumeQueues on that machine. Message entries in the CommitLog are variable-length. RocketMQ adopts several mechanisms so that writes to the CommitLog are sequential while reads are random. A commitlog file defaults to 1 GB; the size can be changed via the mapedFileSizeCommitLog property in the broker configuration file.

In the commitlog file's storage layout, the first 4 bytes of each message record the message's total length; the storage length of a message is therefore not fixed.
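As a toy illustration of this length-prefixed layout (not RocketMQ's full entry format, which carries many more fields), a reader only needs the leading 4 bytes to know how large an entry is and where the next one begins:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy illustration of a length-prefixed log: each entry is
// [4-byte total size][payload]. Not RocketMQ's full entry format,
// which carries many more fields (queue id, flags, timestamps, ...).
public class LengthPrefixedLog {
    // Append one entry; returns the buffer position after the entry.
    static int append(ByteBuffer log, byte[] payload) {
        int totalSize = 4 + payload.length;
        log.putInt(totalSize);
        log.put(payload);
        return log.position();
    }

    // Read the entry starting at 'offset'; returns its payload.
    static byte[] read(ByteBuffer log, int offset) {
        int totalSize = log.getInt(offset);
        byte[] payload = new byte[totalSize - 4];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = log.get(offset + 4 + i);
        }
        return payload;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(1024);
        append(log, "hello".getBytes(StandardCharsets.UTF_8));
        int second = 4 + 5; // offset of the second entry: header + "hello"
        append(log, "world".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(log, 0), StandardCharsets.UTF_8));      // hello
        System.out.println(new String(read(log, second), StandardCharsets.UTF_8)); // world
    }
}
```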

The ConsumeQueue is a logical queue of messages, similar to a database index file: it stores addresses that point into physical storage. Each Message Queue under each Topic has a corresponding ConsumeQueue file.

The ConsumeQueue stores message entries. To speed up entry retrieval and save disk space, each entry does not hold the full message; instead, each fixed 20-byte entry records the message's commitlog offset (8 bytes), the message size (4 bytes), and the tag hashcode (8 bytes).
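A sketch of encoding and decoding such a fixed 20-byte entry; this is an illustrative class, not the real org.apache.rocketmq.store.ConsumeQueue, but it shows why a fixed entry size gives O(1) lookup by queue position:

```java
import java.nio.ByteBuffer;

// A ConsumeQueue entry is a fixed 20 bytes:
// [8-byte commitlog offset][4-byte message size][8-byte tag hashcode].
// Sketch only; the real class is org.apache.rocketmq.store.ConsumeQueue.
public class ConsumeQueueEntry {
    static final int ENTRY_SIZE = 20;

    final long commitLogOffset;
    final int size;
    final long tagsCode;

    ConsumeQueueEntry(long commitLogOffset, int size, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.size = size;
        this.tagsCode = tagsCode;
    }

    void writeTo(ByteBuffer buf) {
        buf.putLong(commitLogOffset);
        buf.putInt(size);
        buf.putLong(tagsCode);
    }

    static ConsumeQueueEntry readFrom(ByteBuffer buf, int index) {
        int base = index * ENTRY_SIZE; // fixed size => O(1) lookup by index
        return new ConsumeQueueEntry(buf.getLong(base), buf.getInt(base + 8), buf.getLong(base + 12));
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(ENTRY_SIZE * 2);
        new ConsumeQueueEntry(0L, 105, "TagA".hashCode()).writeTo(file);
        new ConsumeQueueEntry(105L, 98, "TagB".hashCode()).writeTo(file);
        ConsumeQueueEntry second = readFrom(file, 1);
        System.out.println(second.commitLogOffset + " " + second.size);
    }
}
```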

The ConsumeQueue is effectively an index into the Commitlog file. Its build mechanism: when a message reaches the Commitlog, a dedicated thread generates a dispatch task that appends to the corresponding consumption queue file (ConsumeQueue) and to the index file described below. This storage design has the following advantages:

  1. CommitLog writes sequentially, which can greatly improve writing efficiency.
  2. Although reads are random, the operating system's pagecache can batch-read from disk and cache the pages in memory, speeding up subsequent reads. And because every ConsumeQueue entry is fixed-length, looking up a message by consume position stays O(1).
  3. The intermediate ConsumeQueue structure is what makes fully sequential writes possible. Since a ConsumeQueue stores only offset information, its size is bounded; in practice most ConsumeQueues fit entirely in memory, so operating on this intermediate structure runs at essentially memory speed. To keep the CommitLog and ConsumeQueue consistent, the CommitLog stores everything needed to rebuild the queues, including the queue information, message key, and tags, so even if a ConsumeQueue is lost, it can be fully restored from the CommitLog.
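The rebuild relationship between the two files can be sketched as follows: walking the length-prefixed commitlog and emitting fixed (offset, size) entries is all that is needed to reconstruct a queue index. This is a simplified model, not RocketMQ's real ReputMessageService:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of rebuilding a ConsumeQueue-style index from a commitlog-style,
// length-prefixed log. A simplified model, not the real ReputMessageService.
public class QueueRebuilder {
    // An index entry: where the message starts and how long it is.
    static class IndexEntry {
        final long offset;
        final int size;
        IndexEntry(long offset, int size) { this.offset = offset; this.size = size; }
    }

    static List<IndexEntry> rebuild(ByteBuffer commitLog, int logEnd) {
        List<IndexEntry> queue = new ArrayList<>();
        int pos = 0;
        while (pos < logEnd) {
            int totalSize = commitLog.getInt(pos); // 4-byte length prefix
            queue.add(new IndexEntry(pos, totalSize));
            pos += totalSize; // jump to the next entry
        }
        return queue;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(64);
        log.putInt(9).put(new byte[5]);   // entry 1: 4-byte header + 5-byte body
        log.putInt(12).put(new byte[8]);  // entry 2: 4-byte header + 8-byte body
        for (IndexEntry e : rebuild(log, 21)) {
            System.out.println("offset=" + e.offset + " size=" + e.size);
        }
    }
}
```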

IndexFile

The index folder stores IndexFile files, which speed up message queries. RocketMQ builds this index specifically so that messages can be retrieved quickly by topic and key. It uses a hash-index mechanism: an array of hash slots, with linked lists to resolve hash conflicts.
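The slot-plus-conflict-chain idea can be sketched in memory as follows: each slot stores the position of the newest entry for that hash, and each entry remembers the previous entry in its chain, so a lookup walks the chain backwards. This is a simplification; the real IndexFile lays a header, the hash slots, and the index entries out in a single mapped file:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of IndexFile's hash-slot + conflict-chain design, in memory.
// Each slot holds the position of the newest entry for that hash; each
// entry stores the previous entry's position, forming a backward chain.
public class HashSlotIndex {
    static class Entry {
        final int keyHash;
        final long phyOffset;  // position of the message in the commitlog
        final int prev;        // previous entry in this slot's chain (-1 = none)
        Entry(int keyHash, long phyOffset, int prev) {
            this.keyHash = keyHash; this.phyOffset = phyOffset; this.prev = prev;
        }
    }

    final int[] slots;                   // slot -> index of newest entry, -1 if empty
    final List<Entry> entries = new ArrayList<>();

    HashSlotIndex(int slotNum) {
        slots = new int[slotNum];
        Arrays.fill(slots, -1);
    }

    void put(String key, long phyOffset) {
        int hash = key.hashCode() & 0x7fffffff;
        int slot = hash % slots.length;
        entries.add(new Entry(hash, phyOffset, slots[slot])); // link to old head
        slots[slot] = entries.size() - 1;                     // new head
    }

    List<Long> get(String key) {
        int hash = key.hashCode() & 0x7fffffff;
        List<Long> offsets = new ArrayList<>();
        for (int i = slots[hash % slots.length]; i != -1; i = entries.get(i).prev) {
            if (entries.get(i).keyHash == hash) { // skip colliding keys
                offsets.add(entries.get(i).phyOffset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        HashSlotIndex index = new HashSlotIndex(4); // tiny slot count forces collisions
        index.put("orderId-1", 0L);
        index.put("orderId-2", 105L);
        index.put("orderId-1", 203L); // same key appears twice
        System.out.println(index.get("orderId-1")); // newest first
    }
}
```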

Config

The config folder stores Topic- and Consumer-related metadata; topic and consumer-group information lives here.

Broker message writing process

RocketMQ uses Netty for networking. When the broker receives a message-write request, it enters the processRequest method of the SendMessageProcessor class.

The request is then handed to the asyncPutMessage method of the DefaultMessageStore class, and finally to the asyncPutMessage method of the CommitLog class, which stores the message.

The storage design is layered very cleanly. The rough hierarchy is: SendMessageProcessor → DefaultMessageStore → CommitLog → MappedFileQueue/MappedFile.

Broker design highlights

Function-number design of NRS (NettyRemotingServer) and NRC (NettyRemotingClient)

RocketMQ uses Netty for communication. There are two core client classes: RemotingCommand and NettyRemotingClient.

RemotingCommand mainly handles message assembly: including message headers, message serialization and deserialization.

NettyRemotingClient mainly handles message sending: including synchronous, asynchronous, one-way, registration and other operations.

Because RocketMQ has many message types, sending uses a design similar to function numbers: the client attaches a code to each request, where each code corresponds to one function, and the server registers a business processor for each code.

For example, following the producer client code into the NRC: sendMessage() in the MQClientAPIImpl class builds a RemotingCommand carrying a request code (RequestCode.SEND_MESSAGE) and passes it to the NettyRemotingClient class for transmission.

On the NRS side, the server only needs to register, for each code, the processor and the ExecutorService that will run it with the NRS component. This happens in initialize() in the BrokerController class during the startup process.

Note that function numbers are not mapped one-to-one between client and server: on the server side, several different codes are often handled by the same processor.
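A minimal sketch of the dispatch idea; the class and processor here are hypothetical stand-ins (in RocketMQ itself the server registers processors per code via NettyRemotingServer.registerProcessor(code, processor, executor)), and it shows how several codes can share one processing task:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the function-number (request code) dispatch idea. Names are
// illustrative; in RocketMQ the server registers a processor per code via
// NettyRemotingServer.registerProcessor(code, processor, executor).
public class RequestCodeTable {
    static final int SEND_MESSAGE = 10;        // values mirror the spirit of
    static final int SEND_BATCH_MESSAGE = 320; // RocketMQ's RequestCode constants

    private final Map<Integer, Function<String, String>> processors = new HashMap<>();

    void register(int code, Function<String, String> processor) {
        processors.put(code, processor);
    }

    String dispatch(int code, String request) {
        Function<String, String> p = processors.get(code);
        if (p == null) return "NO_PROCESSOR_FOR_CODE_" + code;
        return p.apply(request);
    }

    public static void main(String[] args) {
        RequestCodeTable table = new RequestCodeTable();
        // Many-to-one: both send codes map to the same processing task.
        Function<String, String> sendProcessor = req -> "stored:" + req;
        table.register(SEND_MESSAGE, sendProcessor);
        table.register(SEND_BATCH_MESSAGE, sendProcessor);
        System.out.println(table.dispatch(SEND_MESSAGE, "m1"));
        System.out.println(table.dispatch(SEND_BATCH_MESSAGE, "m2"));
    }
}
```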

CompletableFuture: synchronous double-write several times faster

Since RocketMQ 4.7.0, the code makes heavy use of Java's asynchronous programming interface CompletableFuture, especially when receiving and processing messages on the Broker side.

For example, the asyncPutMessage method in the DefaultMessageStore class returns a CompletableFuture<PutMessageResult> rather than blocking the calling thread.


The Future interface implements the Future pattern from design patterns: when a request or task is time-consuming, the call is made asynchronous. The method returns immediately, the task executes on a thread outside the main thread, and the main thread carries on; when the result is needed, it is fetched from the future.

In the Master-Slave architecture, data is synchronized/replicated between the Master and Slave nodes in one of two modes: synchronous double-write and asynchronous replication. With synchronous double-write, after the Master successfully writes a message to disk, it must wait for a Slave to replicate it successfully (with multiple Slaves, one successful copy is enough) before telling the client the send succeeded.


From 4.7.0 on, RocketMQ uses CompletableFuture to optimize synchronous double-write, pipelining message processing and greatly improving the Broker's capacity to receive messages.
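The shape of the optimization can be sketched as follows: disk flush and slave replication start as two independent futures, and thenCombine joins their results without tying up the request thread. This is a sketch with simplified stand-in status values, not the real CommitLog code, though the real code combines a flush-status future with a replication-status future in much the same way:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the CompletableFuture shape behind synchronous double-write:
// flushing to disk and replicating to the slave run as two independent
// futures, and thenCombine joins their statuses without blocking the
// request thread. Status names are simplified stand-ins.
public class SyncDoubleWriteSketch {
    enum Status { OK, FLUSH_TIMEOUT, REPLICA_TIMEOUT }

    static CompletableFuture<Status> flushToDisk(Executor pool) {
        return CompletableFuture.supplyAsync(() -> Status.OK, pool);
    }

    static CompletableFuture<Status> replicateToSlave(Executor pool) {
        return CompletableFuture.supplyAsync(() -> Status.OK, pool);
    }

    static CompletableFuture<Status> putMessage(Executor pool) {
        // Both operations start immediately and run concurrently.
        return flushToDisk(pool).thenCombine(replicateToSlave(pool),
                (flush, replica) -> {
                    if (flush != Status.OK) return Status.FLUSH_TIMEOUT;
                    if (replica != Status.OK) return Status.REPLICA_TIMEOUT;
                    return Status.OK;
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(putMessage(pool).join()); // OK
        pool.shutdown();
    }
}
```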

Reentrant lock or spin lock when writing the CommitLog?

RocketMQ uses a locking mechanism when writing messages to the CommitLog: only one thread may write to the CommitLog file at a time. CommitLog offers two lock implementations, a spin lock and a reentrant lock, selected by configuration.
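A simplified sketch of the two implementations behind a common interface; in RocketMQ itself the classes are PutMessageSpinLock and PutMessageReentrantLock, chosen when the CommitLog is constructed:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of CommitLog's two lock options behind one interface.
// The real classes are PutMessageSpinLock and PutMessageReentrantLock,
// chosen by the useReentrantLockWhenPutMessage configuration item.
public class PutMessageLockSketch {
    interface PutMessageLock {
        void lock();
        void unlock();
    }

    // Spin lock: busy-wait on a CAS flag; no context switch while waiting.
    static class SpinLock implements PutMessageLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);
        public void lock() {
            while (!locked.compareAndSet(false, true)) {
                // spin: cheap if the lock is held only briefly (async flush)
            }
        }
        public void unlock() { locked.set(false); }
    }

    // Reentrant lock: waiting threads park instead of burning CPU (sync flush).
    static class ExclusiveLock implements PutMessageLock {
        private final ReentrantLock lock = new ReentrantLock();
        public void lock() { lock.lock(); }
        public void unlock() { lock.unlock(); }
    }

    static PutMessageLock choose(boolean useReentrantLockWhenPutMessage) {
        return useReentrantLockWhenPutMessage ? new ExclusiveLock() : new SpinLock();
    }

    public static void main(String[] args) {
        PutMessageLock lock = choose(false); // default: spin lock
        lock.lock();
        try {
            System.out.println("appending to commitlog under " + lock.getClass().getSimpleName());
        } finally {
            lock.unlock();
        }
    }
}
```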


The lock type is configurable. The official RocketMQ tuning advice: use the spin lock with asynchronous flushing and the reentrant lock with synchronous flushing. This is controlled by the Broker configuration item useReentrantLockWhenPutMessage, which defaults to false.

With synchronous flushing, lock contention is fierce and many threads block waiting for the lock. A spin lock would waste a great deal of CPU time, hence the recommendation to use the reentrant lock for synchronous flushing.

Asynchronous flushing writes to disk at intervals, so lock contention is mild and threads rarely pile up waiting for the lock. The occasional waiter can spin briefly instead of paying for a context switch, so the spin lock is the better fit.

The zero-copy technique mmap improves file read/write performance

Zero-copy means that during an operation the CPU does not first copy data from one memory area to another; the technique is commonly used to save CPU cycles and memory bandwidth when transferring files over a network. mmap is one zero-copy technique.

Under the hood, RocketMQ uses mmap for reading and writing disk files such as the commitLog and consumeQueue. Concretely, it calls FileChannel.map() from JDK NIO, which returns a MappedByteBuffer, to first map the disk files (CommitLog files, consumeQueue files) into memory.

Without mmap, traditional file IO incurs multiple data copies: data is read from disk into a kernel IO buffer, then copied again from the kernel buffer into the user process's private space before the process can use it.

mmap maps the file's location on disk into the application's address space, establishing a one-to-one correspondence. Because mmap() maps the file directly into user space, reading the file copies data from disk straight into user space along that mapping: only one copy takes place, and the file content no longer detours through a kernel-space buffer.

mmap does impose a practical limit on the size of a single mapping, roughly 1.5 GB to 2 GB. RocketMQ therefore caps a single commitLog file at 1 GB and a consumeQueue file at about 5.72 MB, which keeps memory-mapped reads and writes convenient.
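A minimal, runnable demonstration of the mapping call RocketMQ relies on; the temp-file name and the length-prefixed write are illustrative, but FileChannel.map() returning a MappedByteBuffer is exactly the JDK API involved:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal demonstration of the mapping call RocketMQ relies on:
// FileChannel.map() returns a MappedByteBuffer backed directly by the
// file, so reads and writes skip the extra kernel-buffer copy of
// ordinary file IO.
public class MmapDemo {
    static String writeAndReadBack(String message) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".log");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map a fixed-size region, the way RocketMQ maps a commitlog segment.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                buf.putInt(bytes.length); // 4-byte length prefix, as in the commitlog
                buf.put(bytes);

                int len = buf.getInt(0);  // read back through the same mapping
                byte[] out = new byte[len];
                for (int i = 0; i < len; i++) out[i] = buf.get(4 + i);
                return new String(out, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack("hello commitlog"));
    }
}
```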

When the Broker starts, the mmap-related code path runs through MappedFile, which calls FileChannel.map() to map each file segment into memory; when a producer sends a message, the message bytes are appended into the current MappedFile's mapped buffer.

Off-heap memory mechanism

Normally RocketMQ uses mmap memory mapping: messages are written into the memory-mapped file on the produce path and read back from it on the consume path. But RocketMQ also provides an off-heap memory mechanism: TransientStorePool, a transient storage pool backed by off-heap memory.

Enabling off-heap memory

To enable off-heap memory, set transientStorePoolEnable=true in the configuration file broker.conf.

Enabling the off-heap buffer additionally requires that the flush mode be asynchronous and that the Broker be a master node. The restriction is visible in the source:

DefaultMessageStore.DefaultMessageStore()

Writing a message through off-heap memory clearly adds one more step, which is why the off-heap buffer must be paired with asynchronous flushing.

Off-heap buffer flow

RocketMQ creates a separate pool of ByteBuffer memory to stage data: a message is first written into an off-heap buffer, and a commit thread then periodically copies the data from that buffer into the memory-mapped file of the target physical file. RocketMQ introduces this mechanism mainly so that the off-heap memory can be locked in RAM, preventing the OS from swapping it to disk; and because the buffers live off the heap, the design also avoids frequent GC.
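A simplified in-memory sketch of the pool's borrow/commit/return cycle. The real TransientStorePool also pins each buffer in RAM with mlock; that step is omitted here, and the heap-allocated "file buffer" stands in for the target file's channel:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of TransientStorePool's borrow/commit/return cycle.
// The real pool also pins each direct buffer in RAM with mlock so the
// OS cannot swap it out; that step is omitted here.
public class TransientStorePoolSketch {
    private final Deque<ByteBuffer> availableBuffers = new ArrayDeque<>();

    TransientStorePoolSketch(int poolSize, int bufferSize) {
        for (int i = 0; i < poolSize; i++) {
            // Direct (off-heap) buffers: outside the GC-managed heap.
            availableBuffers.offer(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    ByteBuffer borrowBuffer()         { return availableBuffers.poll(); }
    void returnBuffer(ByteBuffer buf) { buf.clear(); availableBuffers.offer(buf); }
    int available()                   { return availableBuffers.size(); }

    // The commit thread's job: copy what was written off-heap into the
    // target file's buffer (standing in for the file's channel here).
    static int commit(ByteBuffer writeBuffer, ByteBuffer fileBuffer) {
        writeBuffer.flip();
        fileBuffer.put(writeBuffer);
        return fileBuffer.position(); // bytes now staged for flushing
    }

    public static void main(String[] args) {
        TransientStorePoolSketch pool = new TransientStorePoolSketch(5, 64);
        ByteBuffer writeBuffer = pool.borrowBuffer();     // step 1: write off-heap
        writeBuffer.put("msg".getBytes());
        ByteBuffer fileBuffer = ByteBuffer.allocate(64);  // step 2: commit copies it
        System.out.println(commit(writeBuffer, fileBuffer) + " bytes committed");
        pool.returnBuffer(writeBuffer);                   // step 3: buffer is reused
        System.out.println(pool.available() + " buffers available");
    }
}
```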

The meaning of off-heap memory buffering

In the default mode (mmap + PageCache), all message reads and writes go through the page cache (the MappedByteBuffer class). With everything in the pagecache, locking problems are unavoidable, and under concurrent reads and writes there are several easy-to-block spots: page-fault interrupts, memory locking, and write-back of dirty pages.

With an off-heap buffer, the two-layer architecture of DirectByteBuffer (off-heap memory) + PageCache separates message reads from writes: writes go into the DirectByteBuffer in off-heap memory, while reads go through the PageCache (the MappedByteBuffer class). The benefit is that many of those easy-to-block spots in memory operations are avoided and latency drops: fewer page-fault interrupts, less memory locking, less dirty-page write-back.

Using an off-heap buffer is therefore generally the better choice, but it does cost memory; if server memory is tight, this mode is not recommended. The off-heap buffer must also be paired with asynchronous flushing, because writing data in two steps makes synchronous-flush latency comparatively large.


Origin blog.csdn.net/qq_28314431/article/details/133032609