RocketMQ high-concurrency read and write

RocketMQ's concurrent read/write capability withstood the 2016 Double Eleven shopping festival, handling 175,000 orders per second (each order generates N messages, so the actual TPS is 175,000 × N). This article discusses the principles behind that capability, mainly in two areas: how the client sends and receives messages, and how the server receives and persists messages.

Client (RocketMQ-client)

1. The client load-balances when sending messages. The client keeps the full list of current servers in memory and switches to a different server for each send, so that the volume of messages received by each server is as balanced as possible and hot spots are avoided.
2. The sending code is thread-safe. Once a Producer instance is ready, it can send messages in a tight loop. The business side generally runs N data-source instances, so high concurrent write capability is also guaranteed on the data-source side.

3. On the consumer side, in clustering consumption mode, all consumer instances with the same group ID divide all queues of the topic evenly among themselves.
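As a simplified sketch of this even queue division (an illustrative helper, not RocketMQ's actual `AllocateMessageQueueAveragely` implementation), each consumer can pick every k-th queue based on its position in the sorted consumer list:

```java
import java.util.ArrayList;
import java.util.List;

public class AvgAllocate {
    // Toy model of clustering-mode rebalance: split the topic's queues
    // evenly among all consumer instances in the same group.
    // Every consumer runs the same deterministic computation, so they
    // agree on the assignment without coordinating with each other.
    static List<Integer> allocate(List<Integer> queueIds, List<String> consumers, String self) {
        int idx = consumers.indexOf(self);
        List<Integer> mine = new ArrayList<>();
        for (int i = 0; i < queueIds.size(); i++) {
            if (i % consumers.size() == idx) mine.add(queueIds.get(i));
        }
        return mine;
    }
}
```

With 8 queues and 3 consumers, consumer "c1" gets queues 0, 3, 6; the remaining queues go to "c2" and "c3", so no queue is consumed twice within the group.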

Server side (Broker)

The server's high-concurrency reads and writes mainly exploit the PageCache feature of the Linux operating system, operating on the PageCache directly through Java's MappedByteBuffer. MappedByteBuffer maps a file directly into memory: the map call associates the file's content with a region of the process's virtual memory, so data can be manipulated directly in memory without issuing explicit read/write system calls against the hard disk; the kernel later flushes the dirty pages to the file.
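A minimal, self-contained sketch of this technique using only the JDK (no RocketMQ code): map a file with MappedByteBuffer, write bytes straight into PageCache, and read them back from the same mapping.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Maps the file into virtual memory and writes bytes directly into PageCache.
    // No explicit write() syscall per message is needed; the kernel flushes dirty
    // pages in the background, or force() can flush synchronously (this is the
    // kind of flush a broker's flush service performs).
    static byte[] writeAndReadBack(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data);   // plain memory write, lands in PageCache
            buf.force();     // optional: synchronously flush the dirty pages to disk
            buf.flip();      // rewind to read what was just written
            byte[] out = new byte[data.length];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("mmap-demo", ".data");
        byte[] msg = "hello rocketmq".getBytes();
        System.out.println(new String(writeAndReadBack(f, msg)));
    }
}
```

Note that both the write and the read here touch only memory; whether and when the bytes reach the physical disk is the kernel's (or `force()`'s) business, which is exactly why PageCache-backed I/O is so fast.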

First, a brief introduction to RocketMQ's message storage structure: it consists of the CommitLog and the ConsumeQueue.

commitLog

1. The CommitLog is where message data is saved. Every message is appended to the CommitLog file when it arrives at the Broker.
It is worth emphasizing that the messages of all topics are stored in the same CommitLog. For example, if the current cluster has TopicA and TopicB, the messages of both topics are written to the same CommitLog in the order they arrive; each topic does not get its own independent CommitLog.
2. Each CommitLog file is capped at 1 GB. When a file reaches 1 GB, a new CommitLog file is automatically created to hold further data.
3. CommitLog cleanup mechanism:

  • By time: RocketMQ by default deletes CommitLog files older than 3 days;
  • By disk watermark: when disk usage reaches 75% of disk capacity, the oldest CommitLog files are deleted first.

4. File address: ${user.home}/store/${commitlog}/${fileName}

ConsumeQueue:

1. The ConsumeQueue is effectively an index file for the CommitLog. When a consumer consumes, it first looks up the message's offset within the CommitLog from the ConsumeQueue, then reads the message data from the CommitLog. If a message exists in the CommitLog but has no ConsumeQueue entry, consumers cannot consume it; RocketMQ's transaction messages are built on exactly this mechanism.
2. Each ConsumeQueue entry consists of 3 parts:

  • the message's physical offset in the CommitLog file (commitLogOffset)
  • the message size
  • the hash code of the message tag

3. File address: ${user.home}/store/consumequeue/${topicName}/${queueId}/${fileName}
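The three fields above form a fixed-size 20-byte record (8-byte offset + 4-byte size + 8-byte tag hash code). A minimal sketch of encoding and decoding such an entry with the JDK's ByteBuffer (illustrative, not the Broker's actual code):

```java
import java.nio.ByteBuffer;

public class ConsumeQueueEntry {
    // Fixed-width layout makes random access trivial:
    // the entry for logical index i starts at byte i * ENTRY_SIZE.
    static final int ENTRY_SIZE = 20; // 8 (commitLogOffset) + 4 (size) + 8 (tag hash)

    static byte[] encode(long commitLogOffset, int msgSize, long tagsCode) {
        ByteBuffer b = ByteBuffer.allocate(ENTRY_SIZE);
        b.putLong(commitLogOffset).putInt(msgSize).putLong(tagsCode);
        return b.array();
    }

    // Returns {commitLogOffset, msgSize, tagsCode}.
    static long[] decode(byte[] entry) {
        ByteBuffer b = ByteBuffer.wrap(entry);
        return new long[]{ b.getLong(), b.getInt(), b.getLong() };
    }
}
```

Because every entry has the same width, a consumer can jump straight to any logical offset in the queue file, then follow the stored commitLogOffset into the CommitLog for the full message.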

 

Thanks to this storage structure, MQ writes data to disk sequentially and reads data with jump (random) access that tries to hit the PageCache.

Sequential message write

On a single server, all MQ message data lands in a single file (the CommitLog), and a large volume of write IO goes to the same CommitLog sequentially; when a file fills its 1 GB, a new one is started. This is sequential disk writing in the true sense. Combined with RocketMQ's default behavior of flushing dirty data from PageCache to disk in batches (at least 4 KB at a time), concurrent write performance is outstanding.
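The append-and-roll pattern can be sketched with plain JDK file channels (a toy model with a tiny size cap; the real CommitLog uses memory-mapped 1 GB segments, offset-based file names, and far more bookkeeping):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RollingAppendLog {
    // Append-only log that rolls to a new file once the size cap is reached,
    // mimicking CommitLog's fixed-size segments. Every write goes to the tail
    // of the current file, so the disk only ever sees sequential writes.
    private final Path dir;
    private final long maxFileSize;
    private FileChannel current;
    private int fileSeq = 0;

    RollingAppendLog(Path dir, long maxFileSize) throws IOException {
        this.dir = dir;
        this.maxFileSize = maxFileSize;
        roll();
    }

    private void roll() throws IOException {
        if (current != null) current.close();
        // Name each segment by its starting global offset, zero-padded,
        // in the spirit of CommitLog segment naming.
        Path f = dir.resolve(String.format("%020d", (long) fileSeq * maxFileSize));
        current = FileChannel.open(f, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        fileSeq++;
    }

    void append(byte[] msg) throws IOException {
        if (current.size() + msg.length > maxFileSize) roll(); // segment full: start a new one
        current.write(ByteBuffer.wrap(msg));                   // sequential write at the tail
    }

    int segmentCount() { return fileSeq; }
}
```

With a 10-byte cap, three 6-byte messages produce three segments: each append that would overflow the current segment simply rolls to a fresh file and keeps writing sequentially.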

Message jump read

MQ relies on the system PageCache to read messages: the higher the PageCache hit rate, the higher the read performance. Linux also tries to read data ahead of time (read-ahead), which reduces the probability that applications access the disk directly.

When a client pulls messages from the Broker, the operating system on the Broker reads the file as follows:

1. Check whether the requested data is in the cache populated by the last read-ahead;
2. If it is not in the cache, the operating system reads the corresponding data page from disk, and also reads the consecutive pages after it (generally three) into the cache, then returns the requested data to the application. The operating system treats this access pattern as a jump read; this is synchronous read-ahead.
3. If the cache is hit, the previously cached content proved useful. The operating system treats the access as a sequential disk read, so it expands the cached range further, reading the next N pages after the previously cached pages into the cache; this is asynchronous read-ahead.

The system defines a data structure for this cache, called the window, which consists of the content currently being read plus the read-ahead content (the group).
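The window behavior above can be sketched as a toy model (all names, the page-group size, and the doubling policy here are illustrative simplifications, not kernel code):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ReadAheadSim {
    // Toy model of Linux read-ahead: the "window" is the current group of cached pages.
    // Miss -> synchronous read-ahead: load the requested page plus `group` pages after it.
    // Hit  -> asynchronous read-ahead: double the group and extend the cached range.
    private final Set<Long> cache = new HashSet<>();
    private int group = 4; // pages read ahead per round (illustrative value)

    String read(long page) {
        if (!cache.contains(page)) {
            // Jump read: the page is not in the previous read-ahead group,
            // so the access is treated as non-sequential.
            cache.clear();
            for (long p = page; p < page + group; p++) cache.add(p);
            return "sync-readahead";
        }
        // Hit: the previous read-ahead was useful, so assume sequential access,
        // widen the window, and prefetch the next pages in the background.
        group *= 2;
        long max = Collections.max(cache);
        for (long p = max + 1; p <= max + group; p++) cache.add(p);
        return "async-readahead";
    }
}
```

Reading page 0 from a cold cache triggers synchronous read-ahead; reading page 1 next hits the prefetched window and triggers asynchronous read-ahead; jumping to a distant page falls back to the synchronous path again.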

The following figure illustrates this with three states:

  • State a: the cache state while the operating system waits for an application read request.
  • State b: the client initiates a read, and the Broker finds the requested data is not in the cache, i.e. not in the previous read-ahead group, indicating the file access is not sequential (for example, some messages in the middle were skipped and the latest messages are consumed directly). The system uses synchronous read-ahead to read the requested pages plus the read-ahead pages from disk into memory.
  • State c: the client continues reading, and the system finds the requested data in the cache, which means the previous read-ahead hit. The operating system doubles the read-ahead group and asks the underlying file system to asynchronously read into the cache the remaining file blocks of the group that are not yet cached.

Broker machines therefore need large memory, so that as much of the CommitLog as possible stays cached and the Broker reads and writes messages almost entirely in PageCache. At runtime, when the data volume is large, the broker process appears to occupy a lot of memory; most of that is actually cached CommitLog data.

 

Cache cleanup mechanism (PageCache)

Linux caches as much message data in memory as possible to raise the read-buffer hit rate. When memory runs short, unused data must still be evicted so the freed space can cache new content. Linux manages this whole process with a pair of doubly linked lists, as shown below:

The inactive_list holds cold (infrequently accessed) data and the active_list holds hot data. A newly allocated data page is first linked to the head of the inactive_list; when it is referenced again, it is moved to the head of the active_list.

When memory is insufficient, the system first scans the active_list backwards from its tail and moves entries whose referenced flag is not set to the head of the inactive_list; it then scans the inactive_list backwards and reclaims any scanned entry that is in a suitable state, until enough cache entries have been reclaimed. This is how the system reclaims memory.
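The two-list reclaim described above can be sketched as a toy model (a crude approximation; the real kernel tracks per-page referenced bits, scans in batches, and handles many more states):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TwoListPageCache {
    // Toy model of Linux's active/inactive page lists:
    // a new page enters the head of inactive_list; a second reference
    // promotes it to the head of active_list; reclaim works from the tails,
    // so the coldest pages are evicted first.
    final Deque<Integer> inactive = new ArrayDeque<>();
    final Deque<Integer> active = new ArrayDeque<>();

    void touch(int page) {
        if (inactive.remove(page) || active.remove(page)) {
            active.addFirst(page);   // referenced again: promote to the hot list
        } else {
            inactive.addFirst(page); // first touch: head of the cold list
        }
    }

    // Reclaim n pages: demote from the active tail when needed,
    // then evict from the inactive tail.
    List<Integer> reclaim(int n) {
        List<Integer> freed = new ArrayList<>();
        while (freed.size() < n && !(inactive.isEmpty() && active.isEmpty())) {
            if (inactive.isEmpty()) inactive.addFirst(active.pollLast());
            freed.add(inactive.pollLast());
        }
        return freed;
    }
}
```

Touching pages 1, 2, 3 and then 1 again leaves page 1 on the active list; reclaiming two pages then evicts 2 and 3 (oldest cold pages first) while the re-referenced page 1 survives.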

 

One caveat: if memory reclamation is slower than the rate at which the application writes into the cache, the writing threads block and wait. In RocketMQ this shows up as high write-message RT, the so-called "latency spike" problem. Fixing it requires tuning GC parameters together with kernel parameters, which is beyond the scope of this article.

 

Demo:
git clone https://github.com/javahongxi/incubator-rocketmq.git
Create the configuration file conf.properties:
rocketmqHome=D:\\github\\incubator-rocketmq\\distribution
namesrvAddr=127.0.0.1:9876
mapedFileSizeCommitLog=52428800
mapedFileSizeConsumeQueue=30000

Start the Broker with -c conf.properties, then launch NamesrvStartup, BrokerStartup, Consumer, and Producer in that order.

 

RocketMQ extensions: https://github.com/javahongxi/incubator-rocketmq-externals.git
