RocketMQ high-concurrency reads and writes

RocketMQ's concurrent read and write capabilities held up through the 2016 Double Eleven (Singles' Day) shopping festival, which peaked at 175,000 orders created per second. A single order generates N messages, so the actual message TPS was 175,000 × N. This article discusses the principles behind that. They show up in two areas: how the client sends and receives messages, and how the server receives and persists them (the focus of this article).

Client (RocketMQ-client)

1. The client load-balances when sending. It keeps the list of all current servers in memory and switches to another server for each send, so every server receives a roughly equal share of messages and no hot spot forms.
2. The sending code is thread-safe. Once a Producer instance has started, it can send messages in a continuous loop. A business side typically runs N data source instances, so high-concurrency write capability is also ensured on the data source side.
3. Consumers are load-balanced as well. In clustering consumption mode, all consumer instances with the same consumer group ID divide the topic's queues evenly among themselves.
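
The two balancing rules above can be sketched in plain Java. This is a simplified model, not RocketMQ's client code: `RoundRobinSelector` and `AverageAllocation` are hypothetical names, servers are plain strings, and queues are identified by index. The allocation gives each consumer a contiguous, near-equal range of queues, which is similar in spirit to RocketMQ's default averaging strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical producer-side selector: each send goes to the next server in the list.
final class RoundRobinSelector {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger(); // shared safely across sending threads

    RoundRobinSelector(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    String next() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}

// Hypothetical consumer-side allocation: split queueCount queues evenly among consumers.
final class AverageAllocation {
    static List<Integer> queuesFor(int queueCount, int consumerCount, int consumerIndex) {
        int base = queueCount / consumerCount;  // every consumer gets at least this many
        int extra = queueCount % consumerCount; // the first `extra` consumers get one more
        int size = base + (consumerIndex < extra ? 1 : 0);
        int start = consumerIndex * base + Math.min(consumerIndex, extra);
        List<Integer> result = new ArrayList<>();
        for (int q = start; q < start + size; q++) {
            result.add(q);
        }
        return result;
    }
}
```

With 8 queues and 3 consumers, consumers 0, 1 and 2 get queues [0, 1, 2], [3, 4, 5] and [6, 7]. If there are more consumers than queues, the surplus consumers get nothing, which is why adding consumers beyond the queue count does not increase throughput.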

Server side (Broker)

On the server, high-concurrency reads and writes rely mainly on the Linux PageCache, which the Broker manipulates directly through Java's MappedByteBuffer. A MappedByteBuffer maps a file straight into memory: the mapping associates the file's contents with a region of the process's virtual memory, so reading and writing that memory reads and writes the file without issuing explicit disk I/O calls.
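
A minimal sketch of writing through a MappedByteBuffer with the standard java.nio API. The class name `MmapWriter` and the small mapping size are illustrative assumptions; the Broker's real mapped files are much larger and managed by its own code. `FileChannel.map` in READ_WRITE mode extends the file to the mapped size if it is shorter, `put` writes into the mapped memory (backed by PageCache pages), and `force()` asks the OS to flush the dirty pages to disk.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: write bytes into a file through a memory mapping.
final class MmapWriter {
    static void write(Path file, byte[] data, int mapSize) throws IOException {
        if (data.length > mapSize) throw new IllegalArgumentException("data larger than mapping");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map [0, mapSize) of the file into virtual memory; the file grows to mapSize if shorter.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, mapSize);
            buffer.put(data); // a plain memory write; no write() system call per message
            buffer.force();   // flush this mapping's dirty pages to the storage device
        }
    }
}
```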

First, an overview of RocketMQ's message storage structure: it consists of the commitLog and the consume queue.

commitLog

1. The commitLog is where the message data itself is stored. Every message is written to the commitLog when it arrives at the Broker.
It is worth emphasizing that messages of all topics are stored together in the commitLog. For example, if the cluster currently has TopicA and TopicB, messages of both topics are written to the same commitLog in arrival order; each topic does not get its own commitLog.
2. Each commitLog file is capped at 1G. When a file reaches 1G, a new commitLog file is created automatically and writing continues there.
3. CommitLog cleanup mechanism:

  • By time: by default, RocketMQ deletes commitLog files older than 3 days (72 hours);
  • By disk water level: when disk usage reaches 75% of capacity, the oldest commitLog files are deleted.

4. File location: ${user.home}/store/commitlog/${fileName}
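
The rollover and cleanup rules above can be modeled as follows. In RocketMQ each commitLog file is named after the global offset of its first byte, zero-padded to 20 digits (00000000000000000000, then 00000000001073741824, and so on), which makes it simple to find the file holding any given offset. `CommitLogModel` and `shouldDelete` are hypothetical names; the real Broker also schedules deletion at a configured hour and deletes in batches, which this sketch ignores.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of commitLog file naming and the two cleanup rules.
final class CommitLogModel {
    static final long FILE_SIZE = 1024L * 1024 * 1024;               // 1G per commitLog file
    static final long RESERVED_MILLIS = TimeUnit.HOURS.toMillis(72); // keep files for 3 days
    static final double MAX_DISK_RATIO = 0.75;                       // disk water level

    // Name of the file holding a global offset: that file's start offset, padded to 20 digits.
    static String fileNameFor(long globalOffset) {
        long fileStart = globalOffset - (globalOffset % FILE_SIZE);
        return String.format("%020d", fileStart);
    }

    // Position of the offset inside its file.
    static long positionInFile(long globalOffset) {
        return globalOffset % FILE_SIZE;
    }

    // Delete if older than the retention time, or if the disk is above the
    // water level and this is the oldest remaining file.
    static boolean shouldDelete(long lastModifiedMillis, long nowMillis,
                                double diskUsedRatio, boolean isOldestFile) {
        boolean expired = nowMillis - lastModifiedMillis > RESERVED_MILLIS;
        boolean diskTooFull = diskUsedRatio >= MAX_DISK_RATIO && isOldestFile;
        return expired || diskTooFull;
    }
}
```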

ConsumeQueue:

1. The ConsumeQueue acts as an index into the commitLog. When consuming, a consumer first looks up the message's offset in the commitLog from the ConsumeQueue, then reads the message data from the commitLog at that offset. If a message exists in the commitLog but has no entry in the ConsumeQueue, consumers cannot consume it. RocketMQ's transactional messages build on this principle.
2. Each ConsumeQueue entry consists of 3 parts:

  • The message's physical offset in the commitLog (commitLogOffset)
  • The message size
  • The hash code of the message tag

3. File location: ${user.home}/store/consumequeue/${topicName}/${queueId}/${fileName}
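
Because every entry has the same layout, a ConsumeQueue file is an array of fixed-size records. In RocketMQ's store each entry is 20 bytes (an 8-byte commitLog offset, a 4-byte message size and an 8-byte tag hash code), so entry i starts at byte i × 20 and a consumer can jump straight to any queue offset without scanning. The sketch below encodes and decodes such entries in a ByteBuffer; `ConsumeQueueEntry` is a hypothetical name, not RocketMQ's class.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size ConsumeQueue record: 8 + 4 + 8 = 20 bytes.
final class ConsumeQueueEntry {
    static final int UNIT_SIZE = 20;

    final long commitLogOffset; // where the message starts in the commitLog
    final int size;             // message size in bytes
    final long tagsCode;        // hash code of the message tag, used for tag filtering

    ConsumeQueueEntry(long commitLogOffset, int size, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.size = size;
        this.tagsCode = tagsCode;
    }

    // Write the entry for logical queue offset `index` using absolute puts.
    static void write(ByteBuffer buf, long index, ConsumeQueueEntry e) {
        int pos = Math.toIntExact(index * UNIT_SIZE);
        buf.putLong(pos, e.commitLogOffset);
        buf.putInt(pos + 8, e.size);
        buf.putLong(pos + 12, e.tagsCode);
    }

    // Read the entry at logical queue offset `index` directly by position.
    static ConsumeQueueEntry read(ByteBuffer buf, long index) {
        int pos = Math.toIntExact(index * UNIT_SIZE);
        return new ConsumeQueueEntry(buf.getLong(pos), buf.getInt(pos + 8), buf.getLong(pos + 12));
    }
}
```

Reading entry i gives the commitLog offset and size, which is exactly what the consumer needs for the second lookup in the commitLog.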

Thanks to this storage structure, RocketMQ writes data to disk sequentially and reads data with random ("skip") access, relying on hitting the PageCache as often as possible.

Sequential message writes

On a single server, all message data lands in one file at a time (the commitLog): the bulk of the write I/O is appended sequentially to the same commitLog, and once a 1G file is full, writing moves on to a new one. This is sequential disk writing in the true sense. In addition, by default RocketMQ does not flush every write to disk; data accumulates in the PageCache and is forced to disk only after a batch of 4K pages has built up, so high-concurrency write performance is outstanding.
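
The write path described above (append at the end, roll to a new file when the current one is full, flush in batches) can be modeled without any real I/O. `SequentialAppender` is a hypothetical simulation: offsets are counted in bytes, a message that does not fit at the end of a file is moved to the start of the next file (RocketMQ pads the unused tail of the old file in this situation), and a flush is recorded whenever at least `leastFlushPages` pages of 4K are unflushed.

```java
// Hypothetical simulation of sequential appends with file rollover and batched flushing.
final class SequentialAppender {
    static final int OS_PAGE_SIZE = 4096;

    private final long fileSize;
    private final int leastFlushPages;
    private long writeOffset = 0;   // global offset of the next byte to write
    private long flushedOffset = 0; // everything before this offset has been flushed
    private int flushCount = 0;

    SequentialAppender(long fileSize, int leastFlushPages) {
        this.fileSize = fileSize;
        this.leastFlushPages = leastFlushPages;
    }

    // Appends a message and returns its global start offset.
    long append(int messageSize) {
        if (messageSize <= 0 || messageSize > fileSize) {
            throw new IllegalArgumentException("bad message size: " + messageSize);
        }
        long posInFile = writeOffset % fileSize;
        if (posInFile + messageSize > fileSize) {
            writeOffset += fileSize - posInFile; // skip the unused tail and start the next file
        }
        long start = writeOffset;
        writeOffset += messageSize;
        if (writeOffset - flushedOffset >= (long) leastFlushPages * OS_PAGE_SIZE) {
            flushedOffset = writeOffset; // one forced flush covers the whole accumulated batch
            flushCount++;
        }
        return start;
    }

    int flushCount() { return flushCount; }

    // Number of files touched so far (the first file exists from the start).
    long fileCount() { return (Math.max(writeOffset, 1) - 1) / fileSize + 1; }
}
```

With a 10,000-byte file and 3,000-byte messages, the fourth message no longer fits in the first file, so it starts at offset 10,000 in the second file; only two flushes are needed for four messages.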

Skip (random) message reads

MQ relies on the system PageCache to read messages: the higher the PageCache hit rate, the better the read performance. Linux reads ahead as aggressively as it can, which reduces how often applications have to go to the disk directly.

When the client pulls messages from the Broker, the system reads the file on the Broker as follows:

1. Check whether the requested data is in the cache filled by the previous read-ahead.
2. If it is not cached, the operating system reads the corresponding data page from disk and also reads the pages that follow it (typically three pages) into the cache, then returns the requested data to the application. The operating system treats this as a skip (non-sequential) read; this is synchronous read-ahead.
3. If the cache is hit, the previously cached content was useful. The operating system treats the file as being read sequentially and keeps enlarging the cached range, reading the next N pages after the previously cached pages into the cache. This is asynchronous read-ahead.

The system describes this cache with a structure called a window, which consists of the content currently being read plus the read-ahead content (the group).

The following example, shown in the accompanying figure, illustrates this:

  • State a: the cache state while the operating system waits for a read request from the application.
  • State b: the client issues a read, and the Broker finds the data is not in the cache, that is, not in the previous read-ahead group. The file is therefore not being accessed sequentially (for example, a consumer skipped a stretch of unconsumed messages and went straight to the latest ones), so the system uses synchronous read-ahead, reading the requested pages plus the read-ahead pages from disk into memory.
  • State c: the client continues reading, and the system finds the data in the cache, meaning the previous read-ahead hit. The operating system doubles the read-ahead group and lets the underlying file system asynchronously read the group's file blocks that are not yet cached.
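
The three states can be reproduced with a toy model of the read-ahead window. `ReadAheadWindow` is a hypothetical, heavily simplified simulation, not the Linux kernel's algorithm: page numbers stand for file pages, a miss triggers a synchronous read of the page plus `initialPages` following pages, and a hit inside the most recent read-ahead group doubles the group size (up to `maxPages`) and extends the window asynchronously.

```java
// Hypothetical toy model of synchronous vs asynchronous read-ahead.
final class ReadAheadWindow {
    private final int initialPages;
    private final int maxPages;
    private long windowStart = -1;  // first cached page
    private long windowEnd = -1;    // exclusive end of the cached window
    private long asyncTrigger = -1; // first page of the latest read-ahead group
    private int groupPages;
    int syncReadAheads = 0;
    int asyncReadAheads = 0;

    ReadAheadWindow(int initialPages, int maxPages) {
        this.initialPages = initialPages;
        this.maxPages = maxPages;
    }

    // Returns true on a cache hit, false if the page had to be read from disk.
    boolean read(long page) {
        boolean hit = page >= windowStart && page < windowEnd;
        if (!hit) {
            // State b: non-sequential access; read the page plus a small group synchronously.
            groupPages = initialPages;
            windowStart = page;
            asyncTrigger = page + 1;
            windowEnd = page + 1 + groupPages;
            syncReadAheads++;
            return false;
        }
        if (page >= asyncTrigger) {
            // State c: the previous read-ahead paid off; double the group and prefetch it asynchronously.
            groupPages = Math.min(groupPages * 2, maxPages);
            asyncTrigger = windowEnd;
            windowEnd += groupPages;
            asyncReadAheads++;
        }
        return true;
    }
}
```

Reading pages 100, 101, 102, ... in order produces one synchronous read-ahead followed by asynchronous ones with growing groups; jumping to page 500 (a consumer skipping to the latest messages, say) falls back to synchronous read-ahead.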

Therefore, the Broker machine needs plenty of memory, so that as much of the commitLog as possible stays cached and the Broker's reads and writes happen mostly in the PageCache. At runtime with large data volumes, the Broker process may appear to occupy a lot of memory, but most of it is cached commitLog data.

PageCache reclaim mechanism

Linux caches as much message data in memory as it can to raise the cache hit rate for reads. When memory runs short, it must evict data that is no longer useful and reuse the freed space for new content. Linux manages this process with two doubly linked lists, as shown in the following figure:

inactive_list holds cold (rarely accessed) pages and active_list holds hot (frequently accessed) pages. A newly allocated page is first linked at the head of inactive_list, and is moved to the head of active_list when it is referenced.

When memory is insufficient, the system first scans active_list backwards from the tail and moves entries that have not been referenced to the head of inactive_list. It then scans inactive_list backwards and reclaims each entry that is in a suitable state, until enough cache entries have been reclaimed. This is how the system reclaims memory.
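
The two-list scan can be sketched with two deques, where the first element of each deque is the list head. `TwoListPageCache` is a hypothetical, simplified model, not kernel code: a page's "referenced" state is a set membership, referenced pages found on active_list get a second chance at its head, and every page reaching the tail of inactive_list is treated as reclaimable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-list model of PageCache reclaim (head = first element of each deque).
final class TwoListPageCache {
    private final Deque<Long> active = new ArrayDeque<>();   // hot pages
    private final Deque<Long> inactive = new ArrayDeque<>(); // cold pages
    private final Set<Long> referenced = new HashSet<>();    // active pages referenced since the last scan

    // A newly cached page starts at the head of inactive_list.
    void allocate(long page) {
        inactive.addFirst(page);
    }

    void access(long page) {
        if (inactive.remove(page)) {
            active.addFirst(page);  // referenced while cold: promote to the head of active_list
        } else if (active.contains(page)) {
            referenced.add(page);   // already hot: just mark it as referenced
        } else {
            allocate(page);         // not cached yet
        }
    }

    // Reclaim up to `wanted` pages and return them in the order they were freed.
    List<Long> reclaim(int wanted) {
        // Step 1: scan active_list from the tail; unreferenced pages move to the head of
        // inactive_list, referenced ones go back to the head of active_list with the mark cleared.
        int n = active.size();
        for (int i = 0; i < n; i++) {
            long page = active.pollLast();
            if (referenced.remove(page)) {
                active.addFirst(page);
            } else {
                inactive.addFirst(page);
            }
        }
        // Step 2: scan inactive_list from the tail and free pages until enough are reclaimed.
        List<Long> freed = new ArrayList<>();
        while (freed.size() < wanted && !inactive.isEmpty()) {
            freed.add(inactive.pollLast());
        }
        return freed;
    }
}
```

A page that keeps being accessed survives reclaim rounds, while pages that were cached once and never touched again are the first to go, which is why a steadily consumed commitLog region tends to stay in memory.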

Note that if memory is reclaimed more slowly than the application writes to the cache, the writing thread keeps waiting. In RocketMQ this shows up as a very high RT for writing messages, the so-called "glitch" (latency spike) problem. Fixing it requires tuning GC parameters and kernel parameters, which is not covered here.

Demo:
git clone https://github.com/javahongxi/incubator-rocketmq.git
Create the configuration file conf.properties:
rocketmqHome=D:\\github\\incubator-rocketmq\\distribution
namesrvAddr=127.0.0.1:9876
mapedFileSizeCommitLog=52428800
mapedFileSizeConsumeQueue=30000

Pass -c conf.properties as the startup argument, then start NamesrvStartup, BrokerStartup, Consumer and Producer in turn.
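
The two size settings in the demo config can be sanity-checked with java.util.Properties. `StoreConfigCheck` is a hypothetical helper, not part of RocketMQ; the property names are the ones used in the demo config. 52428800 bytes is 50 MB per commitLog file (much smaller than the 1G default, presumably to keep local test data small), and since each ConsumeQueue entry is 20 bytes, 30000 bytes holds exactly 1500 entries per ConsumeQueue file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper that checks the store sizes in conf.properties.
final class StoreConfigCheck {
    static final int CQ_UNIT_SIZE = 20; // bytes per ConsumeQueue entry

    static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text)); // whitespace after '=' is skipped by Properties.load
        return p;
    }

    // Entries per ConsumeQueue file; rejects sizes that would leave a partial entry.
    static long entriesPerConsumeQueueFile(Properties p) {
        long size = Long.parseLong(p.getProperty("mapedFileSizeConsumeQueue").trim());
        if (size % CQ_UNIT_SIZE != 0) {
            throw new IllegalArgumentException("not a multiple of " + CQ_UNIT_SIZE + ": " + size);
        }
        return size / CQ_UNIT_SIZE;
    }

    // CommitLog file size expressed in MB (1 MB = 1024 * 1024 bytes).
    static long commitLogFileSizeMiB(Properties p) {
        long size = Long.parseLong(p.getProperty("mapedFileSizeCommitLog").trim());
        return size / (1024 * 1024);
    }
}
```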

RocketMQ extensions: https://github.com/javahongxi/incubator-rocketmq-externals.git
