If your RocketMQ cluster is still running DLedger mode on version 4.7.1, it is recommended to upgrade to 4.8.0.

Preface

In the latest 4.8.0 release, DLedger mode pipelines the acks from follower nodes when sending messages, which greatly improves message-sending throughput.
In earlier versions (before 4.8.0, such as 4.7.1), I examined the DLedger source code and found the following issues, some of which are still not optimized in 4.8.0.

Detailed description

The network between master and slave has an impact on sending tps

When a message is committed, the leader must synchronously block, waiting for acks from a quorum of nodes to return, which has a large impact on message-sending performance.

    /**
     * Handle the append requests:
     * 1.append the entry to local store
     * 2.submit the future to entry pusher and wait the quorum ack
     * 3.if the pending requests are full, then reject it immediately
     *
     * @param request
     * @return
     * @throws IOException
     */
    @Override
    public CompletableFuture<AppendEntryResponse> handleAppend(AppendEntryRequest request) throws IOException {
        // ... a lot of code omitted here ...
        DLedgerEntry dLedgerEntry = new DLedgerEntry();
        dLedgerEntry.setBody(request.getBody());
        DLedgerEntry resEntry = dLedgerStore.appendAsLeader(dLedgerEntry);
        // block until a quorum of nodes has acknowledged the entry
        return dLedgerEntryPusher.waitAck(resEntry, false);
        // ... some code omitted here ...
    }
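The waitAck call above parks the sending thread until a majority of the cluster has acknowledged the entry. A simplified, self-contained sketch of this quorum-wait idea is below; note that waitQuorum is a hypothetical helper written for illustration, not the real DLedgerEntryPusher API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class QuorumWaitSketch {
    // Majority size for a cluster of n nodes (leader included).
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Block until `needed` of the given ack futures have completed,
    // or fail after `timeoutMs`. The sending thread is parked here,
    // which is exactly what limits tps in the pre-4.8.0 code path.
    static void waitQuorum(List<CompletableFuture<Void>> acks, int needed, long timeoutMs)
            throws TimeoutException, InterruptedException {
        CountDownLatch latch = new CountDownLatch(needed);
        for (CompletableFuture<Void> ack : acks) {
            ack.thenRun(latch::countDown);
        }
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("quorum not reached in time");
        }
    }

    public static void main(String[] args) throws Exception {
        List<CompletableFuture<Void>> acks = List.of(
                new CompletableFuture<>(), new CompletableFuture<>(), new CompletableFuture<>());
        acks.get(0).complete(null); // the leader's own append counts as one ack
        acks.get(1).complete(null); // one follower acked
        waitQuorum(acks, quorum(3), 1000); // returns: 2 of 3 is a majority
        System.out.println("quorum(3) = " + quorum(3));
    }
}
```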

In 4.8.0, the ack handling is made asynchronous through a pipeline, so the worker thread that sends the message is quickly released to process subsequent send requests.
In practice, this ack step means the network can become the bottleneck for send tps. For example, on the cloud virtual machines I use, the nodes hosting the RocketMQ cluster are not on the same network segment, and the underlying hosts may be physically far apart. A ping between them takes 0.3-0.4ms, so a single network request takes at least 0.3ms, and that is only the ping time; the actual request rt may be longer. Even if an average of 3 tcp requests can be processed per millisecond, that is at most 3000 per second. One ack requires at least two communications (request + response), which means a single node's send tps cannot exceed 3000/2, only 1500. In my environment, a pressure test showed send tps below 1000.
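The back-of-the-envelope arithmetic above can be written out explicitly. This simply restates the article's estimate; it is not a measured bound:

```java
public class TpsBound {
    // Upper bound on single-node send tps when every append is serialized
    // behind network communications, per the article's estimate:
    // `requestsPerMs` tcp requests handled per millisecond, with each ack
    // costing `tripsPerAck` such communications (request + response).
    static long maxTps(int requestsPerMs, int tripsPerAck) {
        return requestsPerMs * 1000L / tripsPerAck;
    }

    public static void main(String[] args) {
        // 3 requests/ms and 2 communications per ack -> at most 1500 tps
        System.out.println(maxTps(3, 2));
    }
}
```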
Another problem is that if a follower node runs into trouble, the master's send tps suffers greatly. I simulated one follower node going down, and tps dropped even lower.
With the optimizations in 4.8.0, DLedger's send tps is about half that of non-DLedger mode. For example, where normal mode reaches 60,000 tps, DLedger can reach 30,000, which is a huge improvement over what my environment achieved.

How to work around it on the old version

If you are on 4.7.1 and DLedger's send tps is this low, the workaround is to change the default number of sender threads and switch the put-message lock from a spin lock to a reentrant lock, reducing the CPU usage caused by IO blocking and compensating for the blocking with more threads. As long as the sender thread count is set reasonably, send tps is still considerable. For these two configurations, see an earlier article of mine: When the RocketMQ broker processes the message commit, should the lock be a spin lock or a reentrant lock. On 4.8.0 the defaults for these two configurations are fine, and there is basically no need to increase the thread count.
Although versions before 4.8.0 can mitigate the low send tps by increasing the thread count, because of the ack, a network flap between the master and a follower node may still cause put message to hold the lock for too long. That shows up as the familiar system busy error and triggers flow control. The best advice is to upgrade to version 4.8.0.
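For reference, the two settings discussed above live in the broker configuration file. This is a hedged sketch; the property names come from RocketMQ's MessageStoreConfig and BrokerConfig classes, and the thread count value here is only an example to tune for your hardware:

```properties
# Use a reentrant lock instead of a spin lock when writing messages
# (MessageStoreConfig#useReentrantLockWhenPutMessage)
useReentrantLockWhenPutMessage=true

# Enlarge the send-message processor thread pool so blocked threads
# do not starve incoming requests (BrokerConfig#sendMessageThreadPoolNums)
sendMessageThreadPoolNums=16
```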

No difference between synchronous and asynchronous master-slave replication

In DLedger mode before 4.8.0, there is no difference between master-slave synchronous double-write and asynchronous replication: both end up on the same synchronous blocking path. In the DLedgerCommitLog class, the "async" methods simply call the synchronous putMessage, with no actual asynchronous operation:

    @Override
    public CompletableFuture<PutMessageResult> asyncPutMessage(MessageExtBrokerInner msg) {
        // the blocking putMessage runs on the calling thread; the returned
        // future is already completed, so nothing here is actually asynchronous
        return CompletableFuture.completedFuture(this.putMessage(msg));
    }

    @Override
    public CompletableFuture<PutMessageResult> asyncPutMessages(MessageExtBatch messageExtBatch) {
        // same pattern for batch puts
        return CompletableFuture.completedFuture(putMessages(messageExtBatch));
    }
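To see why wrapping a blocking call in completedFuture is not truly asynchronous, here is a minimal standalone sketch (illustration only, not RocketMQ code):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncWrapDemo {
    // Stand-in for a blocking store write such as putMessage.
    static String blockingPut() {
        try {
            Thread.sleep(50); // simulate disk/replication latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "PUT_OK";
    }

    // Pre-4.8.0 pattern: blockingPut() runs on the caller's thread
    // before completedFuture is even constructed, so the caller blocks
    // for the full duration despite the "async" method name.
    static CompletableFuture<String> fakeAsync() {
        return CompletableFuture.completedFuture(blockingPut());
    }

    // A genuinely asynchronous pattern: the blocking work is handed to
    // another thread and the caller gets an incomplete future immediately.
    static CompletableFuture<String> realAsync() {
        return CompletableFuture.supplyAsync(AsyncWrapDemo::blockingPut);
    }

    public static void main(String[] args) {
        CompletableFuture<String> f1 = fakeAsync();
        // already done on return: the caller paid the full blocking cost
        System.out.println("fakeAsync done on return: " + f1.isDone());

        CompletableFuture<String> f2 = realAsync();
        System.out.println("realAsync result: " + f2.join());
    }
}
```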

Therefore, it is recommended to upgrade to version 4.8.0.

Old versions do not support off-heap memory

Another problem with DLedger mode is that it does not support off-heap memory when putting a message, and this has not been fixed in 4.8.0. In non-DLedger mode, the transientStorePoolEnable configuration parameter can enable off-heap memory, which by default occupies an additional 5G of memory. When a message is committed, it is first written to off-heap memory, then asynchronously written to the page cache and flushed to disk, while reads come from the page cache, achieving read-write separation. DLedger currently writes directly to the page cache via mmap. When I found this while reading the source code it seemed very strange, and I confirmed with the author that this configuration is indeed not supported. I had even asked ops to turn this configuration on in production with 3G of memory set aside; it turned out to be wasted.
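The off-heap write path that transientStorePoolEnable provides in non-DLedger mode can be sketched roughly as follows. This is a simplified illustration of the pooled-direct-buffer idea only, not RocketMQ's actual TransientStorePool implementation:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

public class TransientPoolSketch {
    // Pre-allocated off-heap (direct) buffers, mimicking the idea behind
    // RocketMQ's transient store pool: write messages here first, commit
    // to the page cache later, and serve reads from the page cache.
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();

    TransientPoolSketch(int buffers, int sizeBytes) {
        for (int i = 0; i < buffers; i++) {
            pool.offer(ByteBuffer.allocateDirect(sizeBytes));
        }
    }

    ByteBuffer borrow() {
        return pool.poll(); // null when the pool is exhausted
    }

    void giveBack(ByteBuffer buf) {
        buf.clear();
        pool.offer(buf); // returned after the background commit finishes
    }

    int available() {
        return pool.size();
    }

    public static void main(String[] args) {
        TransientPoolSketch pool = new TransientPoolSketch(5, 1024);
        ByteBuffer buf = pool.borrow();
        buf.put("hello".getBytes()); // the message lands in off-heap memory first
        System.out.println("available after borrow: " + pool.available());
        pool.giveBack(buf);
        System.out.println("available after return: " + pool.available());
    }
}
```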
In short, if you use DLedger, it is strongly recommended to upgrade to 4.8.0.

Origin: blog.csdn.net/x763795151/article/details/112853382