RocketMQ source code analysis: NameServer

Why should you learn RocketMQ source code?

  • Write elegant, efficient code. RocketMQ sits on the core transaction path of Alibaba's Double Eleven shopping festival, where it handles tens of millions of concurrent requests and trillion-scale message peaks. Reading its source code is a good way to learn how efficient, elegant code is written.
  • Improve your architectural design skills, with a focus on design thinking and concepts. As a top-level Apache project, Apache RocketMQ has an architecture worth learning from.
  • Solve hard problems at work and in your studies. If you run into stalled or lagging consumption while using RocketMQ, reading the source code helps you locate and fix the cause.
  • Stand out in interviews at first-tier Internet companies (BATJ). At major companies, especially those in the Alibaba ecosystem, systematic knowledge of the RocketMQ source code is a real advantage.

Technical highlights in RocketMQ source code

  • Read-write lock
  • Atomic operation classes
  • File storage design
  • Zero copy: MMAP
  • Thread pool
  • ConcurrentHashMap
  • Copy-on-write containers
  • Load balancing strategy
  • Failure delay mechanism
  • Off-heap memory

RocketMQ module structure

The modules of RocketMQ are as follows:

  1. rocketmq-namesrv : Naming service. It handles Broker registration and route discovery, and provides message producers and consumers with routing information about topics. In addition to storing basic routing information, NameServer also manages Broker nodes, including route registration and route deletion.
  2. rocketmq-broker : The core of the MQ. It receives requests from producers and consumers and calls the store layer to process messages. It is also the basic unit of the HA service and supports modes such as synchronous dual-write and asynchronous replication.
  3. rocketmq-store : Storage layer implementation, including the index service and the high-availability (HA) service.
  4. rocketmq-remoting : The underlying communication layer, built on Netty. All interactions between services go through this module.
  5. rocketmq-common : Classes shared between modules, such as configuration classes and constants.
  6. rocketmq-client : The Java implementation of the MQ client.
  7. rocketmq-filter : Message filtering service, which effectively adds a filtering proxy between the Broker and the consumer.
  8. rocketmq-srvutil : ServerUtil, a utility class for parsing command lines.
  9. rocketmq-tools : MQ cluster management tools, providing functions such as message query.

RocketMQ has a large code base, and there is no need to read all of it. It is enough to study the core source code. The core processes of RocketMQ are as follows:

  • Startup process
    The RocketMQ server side consists of two parts: NameServer and Broker. NameServer is the service registry, and each Broker registers its address with it. When a producer or consumer starts, it first obtains the Broker addresses from NameServer and then connects to the Broker to send or receive messages.
  • Message production process
    The producer writes messages to a specific queue on a Broker in the RocketMQ cluster.
  • Message consumption process
    The consumer pulls messages from the RocketMQ cluster and acknowledges their consumption.

NameServer source code analysis

NameServer overall process

NameServer is the "brain" of RocketMQ. It is RocketMQ's service registry, so the NameServer must be started first, followed by the Brokers.

  • NameServer startup
    NameServer starts listening and waits for connections from Brokers, producers, and consumers. Each Broker registers with all NameServers when it starts. Before sending messages, a producer obtains the Broker address list from a NameServer and then selects a Broker from the list according to a load-balancing algorithm. Before subscribing to a topic, a consumer obtains the Broker address list (possibly for a cluster) from a NameServer. Which Broker the consumer then pulls messages from is determined by the Broker configuration.
  • Route registration
    After a Broker starts, it sends routing and heartbeat information to all NameServers.
  • Route removal
    NameServer maintains a long-lived connection with each Broker and checks every 10 seconds whether the Broker is still alive. If a Broker is found to be down, it is removed from the routing registry. This is how RocketMQ achieves high availability.

NameServer startup process

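NameServer is started as a separate process. The entry class is NamesrvStartup, which creates and starts a NamesrvController. The startup flow is shown in the following figure:

[Figure: NameServer startup flow chart]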

Load KV configuration

Core walkthrough of createNamesrvController() in the NamesrvStartup class

The source code defines a p option. If you pass -p as a startup argument, NameServer prints all of its configuration parameters and then exits, which suggests that -p is meant as a test or diagnostic option.
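
The behavior of -p can be pictured with a minimal sketch. This is not RocketMQ's actual startup code: the class name, the helper method, and the configuration keys and values below are all hypothetical and only illustrate the "print the configuration, then stop" idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the startup class; not RocketMQ's actual code.
class PrintConfigFlagSketch {

    // When "-p" is present, writes every key=value pair to `out` and returns
    // true, signalling that startup should stop after printing.
    static boolean handlePrintFlag(String[] args, Map<String, String> config, StringBuilder out) {
        for (String arg : args) {
            if ("-p".equals(arg)) {
                config.forEach((key, value) ->
                    out.append(key).append('=').append(value).append('\n'));
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative configuration entries only (assumed names and values).
        Map<String, String> config = new LinkedHashMap<>();
        config.put("listenPort", "9876");
        config.put("serverWorkerThreads", "8");

        StringBuilder out = new StringBuilder();
        if (handlePrintFlag(args, config, out)) {
            System.out.print(out);
            return; // exit without starting the server
        }
        System.out.println("no -p flag: continuing with normal startup");
    }
}
```

Running the sketch with -p prints the two entries and returns. Without -p it falls through to the normal startup branch.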

During a normal startup, all of these parameters can also be found in the startup log.

Construct the NettyRemotingServer (NRS) to receive routing and heartbeat information

Scheduled task removes timed-out Brokers

The core controller will start a scheduled task: scan Brokers every 10 seconds and remove inactive Brokers.
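
A fixed-rate task of this kind can be sketched with the JDK's ScheduledExecutorService. The wrapper class and method names here are hypothetical, and the demo uses a short period so it finishes quickly. The text describes a 10-second period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around a single-threaded scheduler.
class BrokerScanScheduler {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Runs scanTask repeatedly at a fixed rate, starting after one period.
    void start(Runnable scanTask, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(scanTask, period, period, unit);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        BrokerScanScheduler scanner = new BrokerScanScheduler();
        // A production setup would pass (10, TimeUnit.SECONDS) as in the text.
        scanner.start(() -> System.out.println("scanning for inactive Brokers"),
            200, TimeUnit.MILLISECONDS);
        Thread.sleep(700);
        scanner.shutdown();
    }
}
```

One detail worth knowing about scheduleAtFixedRate: if the task throws an exception, later executions are suppressed, so a real scan task should catch and log its own errors.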

The Broker sends a heartbeat packet to the NameServer every 30 seconds. The heartbeat packet contains the BrokerId, Broker address, Broker name, the name of the cluster to which the Broker belongs, and the list of FilterServers associated with the Broker.

But if a Broker goes down and NameServer stops receiving its heartbeat packets, how does NameServer remove the failed Broker? NameServer scans the brokerLiveTable status table every 10 seconds. If a Broker's lastUpdateTimestamp is more than 120 seconds older than the current time, the Broker is considered dead: NameServer removes it, closes the connection to it, and updates topicQueueTable, brokerAddrTable, brokerLiveTable, and filterServerTable accordingly.
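
The expiry check itself is simple. The following sketch models only brokerLiveTable, as a map from Broker address to last heartbeat time. The class and method names are hypothetical, and the cleanup of the other routing tables is reduced to a comment.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the liveness table; not RocketMQ's actual class.
class BrokerLivenessSketch {
    static final long BROKER_EXPIRE_MILLIS = 120_000L; // 120 s, per the text

    // Broker address -> timestamp of the last heartbeat, in milliseconds
    final Map<String, Long> brokerLiveTable = new ConcurrentHashMap<>();

    // Each heartbeat replaces the previous timestamp for that Broker.
    void onHeartbeat(String brokerAddr, long nowMillis) {
        brokerLiveTable.put(brokerAddr, nowMillis);
    }

    // Removes every Broker whose last heartbeat is older than the expiry
    // window and returns how many entries were removed.
    int evictExpired(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = brokerLiveTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > BROKER_EXPIRE_MILLIS) {
                it.remove();
                removed++;
                // A real NameServer would also close the connection and clean
                // the other routing tables at this point.
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        BrokerLivenessSketch table = new BrokerLivenessSketch();
        table.onHeartbeat("10.0.0.1:10911", 0L);
        table.onHeartbeat("10.0.0.2:10911", 100_000L);
        // At t = 130 s only the first Broker has been silent for more than 120 s.
        System.out.println(table.evictExpired(130_000L)); // prints 1
    }
}
```

Passing the current time as a parameter, rather than calling System.currentTimeMillis() inside the method, is a choice made here only to keep the sketch easy to test.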

However, this design has a weakness. A Broker that has just gone down can stay in the routing table for about two minutes, so the routes that clients read from NameServer may include unavailable hosts, which causes message production or consumption to fail. Fault-avoidance strategies and retry mechanisms on the producer and consumer sides handle this case. The trade-off fits RocketMQ's design philosophy of pursuing simplicity and performance. NameServer is also designed to be stateless, so any number of instances can be deployed, and its code is simple and lightweight.
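
A client-side retry that skips a failed Broker might look like the following sketch. It is only an illustration of the idea, not RocketMQ's client logic: the class name, the method, and the trySend predicate (which stands in for a real network send) are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical client-side helper; not RocketMQ's actual retry code.
class RetryingSenderSketch {

    // Tries the Brokers in route order until trySend succeeds or the attempt
    // budget is used up. Returns the Broker that accepted the message, if any.
    static Optional<String> sendWithRetry(List<String> brokers,
                                          Predicate<String> trySend,
                                          int maxAttempts) {
        int attempts = Math.min(maxAttempts, brokers.size());
        for (int i = 0; i < attempts; i++) {
            String broker = brokers.get(i);
            if (trySend.test(broker)) {
                return Optional.of(broker);
            }
            // A failed Broker is skipped; the next attempt uses a different one.
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> route = Arrays.asList("broker-a", "broker-b");
        // Pretend broker-a is down but still listed in the route.
        Optional<String> result = sendWithRetry(route, b -> !b.equals("broker-a"), 3);
        System.out.println(result.orElse("send failed")); // prints broker-b
    }
}
```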

RocketMQ has two trigger points to delete routing information:

  • NameServer periodically scans brokerLiveTable and computes the time elapsed since each Broker's last heartbeat packet. If it exceeds 120 seconds, the Broker is removed.
  • When a Broker shuts down normally, it executes the unregisterBroker command.

Both paths delete routes in the same way: they remove the information related to the Broker from the relevant routing tables.

After the consumer starts, the first step is to obtain Topic related information from the NameServer.

NameServer design highlights

Read-write lock

The RouteInfoManager class protects its routing tables with a read-write lock.

When sending messages, clients obtain routing information from NameServer, while Brokers periodically update that routing information. As a result, two operations hit the routing table very frequently:

  1. Producers frequently look up topics when sending messages, which reads the topic table.
  2. Each Broker updates its routes periodically (every 30 seconds), which writes to the topic table.

How can concurrency be improved under such frequent reads and writes, especially on the producer send path? The answer here is a read-write lock, which suits workloads with many reads and few writes.

synchronized and ReentrantLock are both exclusive locks, which allow only one thread to hold them at a time. A read-write lock allows multiple reader threads to proceed concurrently, but while a writer thread holds the lock, all readers and other writers are blocked. A read-write lock maintains a pair of locks, a read lock and a write lock. Separating the two gives much higher concurrency than a plain exclusive lock when reads dominate.
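
The pattern can be sketched with the JDK's ReentrantReadWriteLock. The table below is a simplified stand-in with hypothetical names and types, not the actual RouteInfoManager: lookups take the read lock, and registrations take the write lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified topic table guarded by a read-write lock (hypothetical class).
class TopicRouteTable {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> topicToBrokers = new HashMap<>();

    // Write path (Broker registration): exclusive access.
    void register(String topic, String brokerName) {
        lock.writeLock().lock();
        try {
            List<String> brokers =
                topicToBrokers.computeIfAbsent(topic, k -> new ArrayList<>());
            if (!brokers.contains(brokerName)) {
                brokers.add(brokerName);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read path (route lookup): many threads may hold the read lock at once.
    List<String> lookup(String topic) {
        lock.readLock().lock();
        try {
            List<String> brokers = topicToBrokers.get(topic);
            // Return a copy so callers never see later modifications.
            return brokers == null ? new ArrayList<>() : new ArrayList<>(brokers);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        TopicRouteTable table = new TopicRouteTable();
        table.register("TopicA", "broker-a");
        table.register("TopicA", "broker-b");
        System.out.println(table.lookup("TopicA")); // prints [broker-a, broker-b]
    }
}
```

Releasing the lock in a finally block matters: if the guarded code throws, the lock would otherwise stay held and block every later reader or writer.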

Storage is based on memory

NameServer stores the following information:

topicQueueTable : Topic message queue routing information. Load balancing is performed based on this routing table when sending messages.

brokerAddrTable : Basic Broker information, including the brokerName, the cluster name, and the addresses of the master and slave Brokers.

clusterAddrTable : Broker cluster information. Stores the names of all Brokers in each cluster.

brokerLiveTable : Broker status information. NameServer replaces this entry every time it receives a heartbeat packet.

filterServerTable : The list of FilterServers on each Broker, used for class-based message filtering.

The implementation of NameServer is based on memory. NameServer does not persist routing information. The important task of persistence is left to the Broker. This design can improve the processing capabilities of NameServer.
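
To make the in-memory layout concrete, here is a minimal sketch that models four of the tables as plain hash maps. The value types are placeholders chosen for illustration (strings and numbers rather than RocketMQ's own data classes), the registerBroker method and its parameters are hypothetical, filterServerTable is left out for brevity, and locking is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, single-threaded model; value types are placeholders.
class InMemoryRouteTables {
    // topic -> names of the Brokers that host it
    final Map<String, Set<String>> topicQueueTable = new HashMap<>();
    // brokerName -> (brokerId -> address)
    final Map<String, Map<Long, String>> brokerAddrTable = new HashMap<>();
    // cluster name -> names of the Brokers in that cluster
    final Map<String, Set<String>> clusterAddrTable = new HashMap<>();
    // Broker address -> time of the last heartbeat, in milliseconds
    final Map<String, Long> brokerLiveTable = new HashMap<>();

    // Applies one registration or heartbeat to the tables.
    void registerBroker(String cluster, String brokerName, long brokerId,
                        String addr, Set<String> topics, long nowMillis) {
        clusterAddrTable.computeIfAbsent(cluster, k -> new HashSet<>()).add(brokerName);
        brokerAddrTable.computeIfAbsent(brokerName, k -> new HashMap<>()).put(brokerId, addr);
        for (String topic : topics) {
            topicQueueTable.computeIfAbsent(topic, k -> new HashSet<>()).add(brokerName);
        }
        brokerLiveTable.put(addr, nowMillis); // replaced on every heartbeat
    }

    public static void main(String[] args) {
        InMemoryRouteTables tables = new InMemoryRouteTables();
        Set<String> topics = new HashSet<>();
        topics.add("TopicA");
        tables.registerBroker("cluster-1", "broker-a", 0L, "10.0.0.1:10911", topics, 1_000L);
        System.out.println(tables.topicQueueTable);
    }
}
```

Because nothing is written to disk, a newly created instance starts with empty tables and is filled again only as Brokers register and send heartbeats.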

NameServer is stateless

  • NameServers in a cluster do not communicate with each other.
  • In the master-slave architecture, each Broker registers its routing and heartbeat information with all NameServers.
  • At any given time, a producer or consumer maintains a long-lived connection with one NameServer in the cluster.

Assume that a RocketMQ cluster is deployed across two data centers, each with some NameServer, Broker, and client nodes. When the link between the two data centers is interrupted, all NameServers can still provide service, but a client can only find the Brokers of its own data center through the NameServers in that data center.

In a RocketMQ cluster, NameServers do not need to communicate with each other, so a network partition has no impact on the availability of the NameServers themselves. If a NameServer detects that its connection to a Broker has been interrupted, it assumes the Broker can no longer provide service and immediately removes it from the routing information, so that clients do not connect to an unavailable Broker.

After the network is partitioned, a NameServer cannot receive heartbeats from the Brokers in the other data center, so each NameServer holds only the Broker information of its own data center.

Origin blog.csdn.net/qq_28314431/article/details/133030933