Routing Center NameServer: In-Depth Source Code Analysis

This chapter introduces RocketMQ's route management, service registration, and service discovery mechanisms. NameServer is the "brain" of RocketMQ. Everyone is familiar with the term "service discovery": a distributed SOA architecture has a service registry whose main job is service address resolution, guiding service callers (consumers) to the "remote" service providers so they can complete the network communication. So what data does RocketMQ's routing center store? As high-performance message middleware, how does it avoid a NameServer single point of failure and stay available? Let's enter the world of the RocketMQ NameServer with these questions in mind.

NameServer Architecture Design

Message middleware is generally designed around a topic-based publish-subscribe mechanism, and to prevent the failure of a single message server from bringing down the whole system, multiple message servers are usually deployed to store messages jointly. Which message server should a message be sent to? What if one of them goes down? How does the producer learn this without being restarted?

NameServer exists to solve the above problems.

When a Broker starts, it registers itself with every NameServer, and afterwards reports its liveness with a heartbeat every 30 seconds. Before sending a message, a producer first obtains the Broker address list from a NameServer and then picks one server from the list according to a load-balancing algorithm. NameServer maintains a long-lived connection with each Broker and scans the live-broker table every 10 seconds to check whether each Broker is still alive; if a Broker is detected to be down, it is removed from the routing table, but the route change is not pushed to producers immediately. Why this design? It keeps the NameServer implementation simple; fault tolerance is provided on the message-sender side instead, which keeps message sending highly available. This is explained in detail in the later producer chapters.

High availability of the NameServer itself is achieved by deploying multiple NameServer instances. They do not communicate with each other, which means that at a given moment different NameServers may not hold exactly the same data, but this has no real impact on message sending. It is in fact a highlight of the RocketMQ NameServer design: the pursuit of simplicity and efficiency.

NameServer core class analysis

  • RouteInfoManager: manages heartbeat information and routing data
  • KVConfigManager: manages and loads the KV configuration, loading the kvConfig file specified by NamesrvController (usually xxx/kvConfig.json) into memory
  • NamesrvController: the NameServer controller, somewhat like the Controller layer of a three-tier architecture; it forwards requests, wraps the core classes, and is responsible for initializing and shutting down the NameServer
  • NamesrvStartup: the NameServer startup class; it reads the configuration, creates the controller, and starts the service
  • NamesrvConfig: the NameServer business configuration class
  • NettyServerConfig: the NameServer network configuration class, needed because network interaction is involved
  • BrokerHousekeepingService: event-listening related; handles the Broker's Channel events, such as connect, close, and exception events
  • NettyRemotingServer: implements the network interaction (it also contains some inner classes); not covered in detail in this chapter, it is described in the communication chapter. Simply put, this instance is what exchanges data with Brokers, Producers, and Consumers
  • DefaultRequestProcessor: the default processor for handling requests

Since there are many classes but their internal logic is simple, this chapter only walks through the most important core class related to NameServer, RouteInfoManager; the others are introduced along with the NameServer startup process or in the communication-layer chapter.

RouteInfoManager

We know that NameServer acts as the "routing center": each Broker periodically submits its own metadata to NameServer, while producers obtain the Broker address list from NameServer and pick a suitable Broker to send messages to.

So where does NameServer store this data? The answer is RouteInfoManager.

The core fields of RouteInfoManager are simply several mapping tables — Maps storing key-value pairs.

private final HashMap<String/* topic */, List<QueueData>> topicQueueTable;
private final HashMap<String/* brokerName */, BrokerData> brokerAddrTable;
private final HashMap<String/* clusterName */, Set<String/* brokerName */>> clusterAddrTable;
private final HashMap<String/* brokerAddr */, BrokerLiveInfo> brokerLiveTable;
private final HashMap<String/* brokerAddr */, List<String>/* Filter Server */> filterServerTable;

Most methods of this class operate on top of these mapping tables.

Let's go through what each table is for:

  • topicQueueTable: the key is the topic, the value describes how the topic's queues are distributed across brokers — one QueueData per brokerName. This is the topic's message-queue routing information; message sending load-balances against this table
  • brokerAddrTable: the key is the broker name, the value is the Broker's base information: brokerName, owning cluster name, and the id and address of the master node (and of the slave nodes) under that brokerName
  • clusterAddrTable: Broker cluster information; the key is the cluster name, the value is the set of all broker names in that cluster
  • brokerLiveTable: Broker liveness information; NameServer replaces this entry each time it receives a heartbeat
  • filterServerTable: the list of FilterServers on a Broker, used for class-mode message filtering

RocketMQ is built on a publish-subscribe mechanism: a topic owns multiple message queues, and a Broker creates 4 read queues and 4 write queues per topic by default. Multiple Brokers form a cluster; the Brokers sharing the same BrokerName form a Master-Slave group, where brokerId 0 denotes the Master and any value greater than 0 a Slave. The lastUpdateTimestamp in BrokerLiveInfo records when the last heartbeat from the Broker was received.

Suppose RocketMQ is deployed as 2 masters and 2 slaves, with brokerId 0 marking a master and brokerId 1 a slave. The corresponding runtime data structures in RouteInfoManager would then look as follows:
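
The rendering below is a hypothetical snapshot (cluster name, topic name, addresses, and timestamps are made up; the field names follow RocketMQ 4.x):

    topicQueueTable:
    { "TopicTest": [
        { "brokerName": "broker-a", "readQueueNums": 4, "writeQueueNums": 4, "perm": 6, "topicSynFlag": 0 },
        { "brokerName": "broker-b", "readQueueNums": 4, "writeQueueNums": 4, "perm": 6, "topicSynFlag": 0 } ] }

    brokerAddrTable:
    { "broker-a": { "cluster": "DefaultCluster", "brokerName": "broker-a",
                    "brokerAddrs": { 0: "192.168.0.1:10911", 1: "192.168.0.2:10911" } },
      "broker-b": { "cluster": "DefaultCluster", "brokerName": "broker-b",
                    "brokerAddrs": { 0: "192.168.0.3:10911", 1: "192.168.0.4:10911" } } }

    clusterAddrTable:
    { "DefaultCluster": [ "broker-a", "broker-b" ] }

    brokerLiveTable:
    { "192.168.0.1:10911": { "lastUpdateTimestamp": 1650000000000, "channel": <Channel>, "haServerAddr": "192.168.0.1:10912" },
      "192.168.0.2:10911": { ... }, "192.168.0.3:10911": { ... }, "192.168.0.4:10911": { ... } }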

Route registration

Route registration is implemented through the periodic heartbeat between Broker and NameServer. When a Broker starts, it sends a heartbeat to every NameServer in the cluster, and then keeps doing so every 30s. On receiving a heartbeat, NameServer updates the lastUpdateTimestamp of the BrokerLiveInfo cached in brokerLiveTable. NameServer in turn scans brokerLiveTable on a 10-second cycle; if no heartbeat has been received from a broker for 120s, that broker is treated as offline, all information related to it is removed, and its socket connection is closed.

The Broker sends the heartbeat

This is the broker sending heartbeats to the NameServer cluster. The packet contents and the networking details are covered later; essentially, the broker sends information about itself and about the topics it manages. How exactly Broker and NameServer interact is explained in detail later; for now, just know that the request header carries a code, and from that code the receiver knows this is a RegisterBroker (heartbeat) operation.
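
As a rough sketch of the broker side (simplified from BrokerOuterAPI in RocketMQ 4.x; all field values below are placeholders):

    // Build the registration/heartbeat request; the request header's code identifies the operation.
    RegisterBrokerRequestHeader requestHeader = new RegisterBrokerRequestHeader();
    requestHeader.setBrokerAddr("192.168.0.1:10911");   // placeholder address
    requestHeader.setBrokerName("broker-a");
    requestHeader.setClusterName("DefaultCluster");
    requestHeader.setBrokerId(0L);                      // 0 = master
    requestHeader.setHaServerAddr("192.168.0.1:10912"); // placeholder HA address

    RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.REGISTER_BROKER, requestHeader);
    // The body carries the broker's topic config table and filter-server list,
    // and the same request is sent to every NameServer in the cluster.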

NameServer handles the heartbeat

NameServer receives the heartbeat from the Broker, and DefaultRequestProcessor (the default request processor) executes the heartbeat-handling logic according to the code in the request header. Concretely, it calls RouteInfoManager.registerBroker(). This is the broker's route-registration process, and it mainly operates on the mapping tables in RouteInfoManager.
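
The dispatch looks roughly like this (abridged from DefaultRequestProcessor.processRequest in RocketMQ 4.x; version-dependent branches are elided):

    switch (request.getCode()) {
        case RequestCode.REGISTER_BROKER:
            // newer brokers also carry a filter-server list, hence a versioned branch in the real code
            return this.registerBroker(ctx, request);
        case RequestCode.UNREGISTER_BROKER:
            return this.unregisterBroker(ctx, request);
        case RequestCode.GET_ROUTEINFO_BY_TOPIC:
            return this.getRouteInfoByTopic(ctx, request);
        // ... other request codes elided
    }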

Let me first summarize the flow and show the code afterwards. First of all, the whole route-registration path runs under the write lock: the mapping tables above are HashMaps, which are not thread-safe, so locking is required.

When registering its route with NameServer, the broker carries data

such as its brokerName, owning cluster name, broker address, brokerId, and all the topic information it manages.

Summary of the route-registration flow (a condensed code sketch follows the constructor below)

  1. Use the cluster name to fetch the Set from clusterAddrTable and add the brokerName to that set.
  2. Use the brokerName to fetch the brokerData from brokerAddrTable; if it does not exist, create it and put it into the table.
  3. Check whether a master/slave switch has happened under this brokerName; if so, update the related information.
  4. If the topic information in the request is non-empty and the broker is a master node, try to update the topic routing information — chiefly which topics the broker manages and how many read and write queues each topic has.
  5. Update the heartbeat information; the key point is refreshing BrokerLiveInfo's lastUpdateTimestamp, the time of the last heartbeat.
  6. Update the class-mode message-filtering information.
  7. Return the result to the broker.

Now the code flow in detail.

First, the constructor:

    public RouteInfoManager() {
        this.topicQueueTable = new HashMap<String, List<QueueData>>(1024);
        this.brokerAddrTable = new HashMap<String, BrokerData>(128);
        this.clusterAddrTable = new HashMap<String, Set<String>>(32);
        this.brokerLiveTable = new HashMap<String, BrokerLiveInfo>(256);
        this.filterServerTable = new HashMap<String, List<String>>(256);
    }
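
And here is a condensed sketch of RouteInfoManager.registerBroker() itself (trimmed from RocketMQ 4.x; the master/slave switch, filter-server handling, and the result returned to slaves are abbreviated). The numbered comments map to the steps above:

    public RegisterBrokerResult registerBroker(final String clusterName, final String brokerAddr,
            final String brokerName, final long brokerId, final String haServerAddr,
            final TopicConfigSerializeWrapper topicConfigWrapper,
            final List<String> filterServerList, final Channel channel) {
        RegisterBrokerResult result = new RegisterBrokerResult();
        try {
            try {
                this.lock.writeLock().lockInterruptibly(); // the whole path holds the write lock

                // 1. add this brokerName to its cluster's set
                Set<String> brokerNames = this.clusterAddrTable.get(clusterName);
                if (null == brokerNames) {
                    brokerNames = new HashSet<String>();
                    this.clusterAddrTable.put(clusterName, brokerNames);
                }
                brokerNames.add(brokerName);

                // 2. create the BrokerData on first registration
                boolean registerFirst = false;
                BrokerData brokerData = this.brokerAddrTable.get(brokerName);
                if (null == brokerData) {
                    registerFirst = true;
                    brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
                    this.brokerAddrTable.put(brokerName, brokerData);
                }

                // 3. record this broker's address (a slave-to-master switch is handled here in the real code)
                String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
                registerFirst = registerFirst || (null == oldAddr);

                // 4. master only: create/update the QueueData of every topic it manages
                if (null != topicConfigWrapper && MixAll.MASTER_ID == brokerId) {
                    if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
                            || registerFirst) {
                        for (TopicConfig topicConfig : topicConfigWrapper.getTopicConfigTable().values()) {
                            this.createAndUpdateQueueData(brokerName, topicConfig);
                        }
                    }
                }

                // 5. refresh the liveness entry: lastUpdateTimestamp = now
                this.brokerLiveTable.put(brokerAddr, new BrokerLiveInfo(System.currentTimeMillis(),
                    topicConfigWrapper.getDataVersion(), channel, haServerAddr));

                // 6. update the filter-server list and 7. fill the result returned to the broker (elided)
            } finally {
                this.lock.writeLock().unlock();
            }
        } catch (Exception e) {
            log.error("registerBroker Exception", e);
        }
        return result;
    }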

Route deletion

As introduced in the previous section, every 30s the Broker sends NameServer a heartbeat containing the brokerId, broker address, broker name, owning cluster name, and the broker's associated filterServer list. But if the Broker goes down, NameServer cannot receive its heartbeats — how does NameServer evict these stale Brokers?

NameServer starts a scheduled task that scans the entire brokerLiveTable every 10s. If the current time minus BrokerLiveInfo.lastUpdateTimestamp exceeds 120s, the Broker is considered dead: it is removed, the connection to it is closed, and the related entries in the various mapping tables are updated at the same time.

Route deletion has three trigger points in RocketMQ:

  • The scheduled task concludes from lastUpdateTimestamp that a broker has expired, which triggers the destroy operation, i.e. route deletion.
  • At the Netty network layer, NameServer and Broker maintain a long-lived Channel; if no read or write happens on the Channel for 120s, the socket is likewise closed and the destroy operation triggered (details when we discuss the communication layer).
  • When a broker shuts down normally, it performs an unRegisterBroker operation.

Triggered by the scheduled task

Every 10s the task scans for expired brokers and removes any that it finds.
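
For reference, the scanning logic is short; this is essentially RouteInfoManager.scanNotActiveBroker() as it appears in RocketMQ 4.x (BROKER_CHANNEL_EXPIRED_TIME is 1000 * 60 * 2, i.e. 120s):

    public void scanNotActiveBroker() {
        Iterator<Entry<String, BrokerLiveInfo>> it = this.brokerLiveTable.entrySet().iterator();
        while (it.hasNext()) {
            Entry<String, BrokerLiveInfo> next = it.next();
            long last = next.getValue().getLastUpdateTimestamp();
            // expired: no heartbeat received for more than 120s
            if ((last + BROKER_CHANNEL_EXPIRED_TIME) < System.currentTimeMillis()) {
                RemotingUtil.closeChannel(next.getValue().getChannel()); // close the socket
                it.remove();                                             // drop the liveness entry
                log.warn("The broker channel expired, {} {}ms", next.getKey(), BROKER_CHANNEL_EXPIRED_TIME);
                this.onChannelDestroy(next.getKey(), next.getValue().getChannel()); // clean the route tables
            }
        }
    }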

Triggered at the Netty network layer

The exact mechanism is explained in the communication-layer chapter; here is the basic idea.

Netty's IdleStateHandler class provides the idle-detection (heartbeat) mechanism: simply put, when no data passes through a channel within the configured time, IdleStateHandler propagates a userEventTriggered() call down the pipeline with an IdleStateEvent as its payload. The ChannelHandler that RocketMQ registers listens for userEventTriggered() and closes the channel.

What happens underneath is that a Netty event is put into a queue; a dedicated thread keeps consuming events from that queue, and when it sees an event of type IDLE it calls the onChannelDestroy() method.
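
A minimal, self-contained sketch of this Netty idle-detection pattern (my own illustration, not RocketMQ's exact classes; the 120s value mirrors serverChannelMaxIdleTimeSeconds):

    import io.netty.channel.ChannelDuplexHandler;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.timeout.IdleStateEvent;
    import io.netty.handler.timeout.IdleStateHandler;

    public class IdleCloseInitializer extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) {
            // readerIdle = 0, writerIdle = 0, allIdle = 120s: fire an event after 120s of no read/write
            ch.pipeline().addLast(new IdleStateHandler(0, 0, 120));
            ch.pipeline().addLast(new ChannelDuplexHandler() {
                @Override
                public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                    if (evt instanceof IdleStateEvent) {
                        // RocketMQ's handler also enqueues a CLOSE NettyEvent here; a background thread
                        // consumes that queue and eventually calls RouteInfoManager.onChannelDestroy(...)
                        ctx.channel().close();
                    }
                    ctx.fireUserEventTriggered(evt);
                }
            });
        }
    }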

Triggered by a normal broker shutdown

When a Broker shuts down normally, it sends an unRegisterBroker request to NameServer.

What it does is equally simple: remove all the information related to the departing broker.

    // param 1: the broker's cluster name
    // param 2: broker address
    // param 3: broker name
    // param 4: brokerId
    public void unregisterBroker(
        final String clusterName,
        final String brokerAddr,
        final String brokerName,
        final long brokerId) {
        try {
            try {
                this.lock.writeLock().lockInterruptibly();
                // remove the liveness entry, keyed by broker address
                BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.remove(brokerAddr);
                log.info("unregisterBroker, remove from brokerLiveTable {}, {}",
                    brokerLiveInfo != null ? "OK" : "Failed",
                    brokerAddr
                );

                // remove the class-mode filter-server entry
                this.filterServerTable.remove(brokerAddr);

                boolean removeBrokerName = false;
                // look up the brokerData by brokerName
                BrokerData brokerData = this.brokerAddrTable.get(brokerName);
                if (null != brokerData) {
                    // remove the broker address entry by brokerId
                    String addr = brokerData.getBrokerAddrs().remove(brokerId);
                    log.info("unregisterBroker, remove addr from brokerAddrTable {}, {}",
                        addr != null ? "OK" : "Failed",
                        brokerAddr
                    );

                    // true: no nodes remain under this brokerName
                    if (brokerData.getBrokerAddrs().isEmpty()) {
                        // remove the brokerName entry itself
                        this.brokerAddrTable.remove(brokerName);
                        log.info("unregisterBroker, remove name from brokerAddrTable OK, {}",
                            brokerName
                        );


                        removeBrokerName = true;
                    }
                }

                // true: no nodes remain under this brokerName
                if (removeBrokerName) {
                    Set<String> nameSet = this.clusterAddrTable.get(clusterName);
                    if (nameSet != null) {
                        // remove the dead brokerName from its cluster
                        boolean removed = nameSet.remove(brokerName);
                        log.info("unregisterBroker, remove name from clusterAddrTable {}, {}",
                            removed ? "OK" : "Failed",
                            brokerName);

                        if (nameSet.isEmpty()) {
                            this.clusterAddrTable.remove(clusterName);
                            log.info("unregisterBroker, remove cluster from clusterAddrTable {}",
                                clusterName
                            );
                        }
                    }
                    // remove the queue information associated with the dead broker
                    this.removeTopicByBrokerName(brokerName);
                }
            } finally {
                this.lock.writeLock().unlock();
            }
        } catch (Exception e) {
            log.error("unregisterBroker Exception", e);
        }
    }

Analysis of the onChannelDestroy() method

    // param 1: address of the broker to clean up
    // param 2: the channel namesrv established with the broker
    public void onChannelDestroy(String remoteAddr, Channel channel) {
        String brokerAddrFound = null;
        if (channel != null) {
            try {
                try {
                    // read lock
                    this.lock.readLock().lockInterruptibly();
                    // iterate the broker liveness table to find the address owning this channel
                    Iterator<Entry<String, BrokerLiveInfo>> itBrokerLiveTable =
                        this.brokerLiveTable.entrySet().iterator();
                    while (itBrokerLiveTable.hasNext()) {
                        Entry<String, BrokerLiveInfo> entry = itBrokerLiveTable.next();
                        if (entry.getValue().getChannel() == channel) {
                            brokerAddrFound = entry.getKey();
                            break;
                        }
                    }
                } finally {
                    // release the read lock
                    this.lock.readLock().unlock();
                }
            } catch (Exception e) {
                log.error("onChannelDestroy Exception", e);
            }
        }


        if (null == brokerAddrFound) {
            brokerAddrFound = remoteAddr;
        } else {
            log.info("the broker's channel destroyed, {}, clean it's data structure at once", brokerAddrFound);
        }

        if (brokerAddrFound != null && brokerAddrFound.length() > 0) {

            try {
                try {
                    // write lock
                    this.lock.writeLock().lockInterruptibly();
                    // remove the entry keyed by brokerAddrFound from the liveness table
                    this.brokerLiveTable.remove(brokerAddrFound);
                    this.filterServerTable.remove(brokerAddrFound);
                    String brokerNameFound = null;
                    boolean removeBrokerName = false;

                    Iterator<Entry<String, BrokerData>> itBrokerAddrTable =
                        this.brokerAddrTable.entrySet().iterator();


                    while (itBrokerAddrTable.hasNext() && (null == brokerNameFound)) {
                        BrokerData brokerData = itBrokerAddrTable.next().getValue();

                        Iterator<Entry<Long, String>> it = brokerData.getBrokerAddrs().entrySet().iterator();
                        while (it.hasNext()) {
                            Entry<Long, String> entry = it.next();
                            Long brokerId = entry.getKey();
                            String brokerAddr = entry.getValue();
                            if (brokerAddr.equals(brokerAddrFound)) {
                                brokerNameFound = brokerData.getBrokerName();
                                it.remove();
                                log.info("remove brokerAddr[{}, {}] from brokerAddrTable, because channel destroyed",
                                    brokerId, brokerAddr);
                                break;
                            }
                        }

                        if (brokerData.getBrokerAddrs().isEmpty()) {
                            removeBrokerName = true;
                            itBrokerAddrTable.remove();
                            log.info("remove brokerName[{}] from brokerAddrTable, because channel destroyed",
                                brokerData.getBrokerName());
                        }
                    }

                    // condition 1: the brokerName owning this broker address was found
                    // condition 2: removeBrokerName == true means every node under that brokerName is offline,
                    // so the brokerName must be removed from its cluster
                    if (brokerNameFound != null && removeBrokerName) {
                        Iterator<Entry<String, Set<String>>> it = this.clusterAddrTable.entrySet().iterator();
                        while (it.hasNext()) {
                            Entry<String, Set<String>> entry = it.next();
                            String clusterName = entry.getKey();
                            Set<String> brokerNames = entry.getValue();
                            boolean removed = brokerNames.remove(brokerNameFound);
                            if (removed) {
                                log.info("remove brokerName[{}], clusterName[{}] from clusterAddrTable, because channel destroyed",
                                    brokerNameFound, clusterName);

                                if (brokerNames.isEmpty()) {
                                    log.info("remove the clusterName[{}] from clusterAddrTable, because channel destroyed and no broker in this cluster",
                                        clusterName);
                                    it.remove();
                                }

                                break;
                            }
                        }
                    }

                    // removeBrokerName == true: every node under that brokerName is offline
                    if (removeBrokerName) {
                        Iterator<Entry<String, List<QueueData>>> itTopicQueueTable =
                            this.topicQueueTable.entrySet().iterator();
                        // iterate the topic-queue table and remove the queue info of topics on the destroyed broker
                        while (itTopicQueueTable.hasNext()) {
                            Entry<String, List<QueueData>> entry = itTopicQueueTable.next();
                            String topic = entry.getKey();
                            List<QueueData> queueDataList = entry.getValue();

                            Iterator<QueueData> itQueueData = queueDataList.iterator();
                            while (itQueueData.hasNext()) {
                                QueueData queueData = itQueueData.next();
                                if (queueData.getBrokerName().equals(brokerNameFound)) {
                                    itQueueData.remove();
                                    log.info("remove topic[{} {}], from topicQueueTable, because channel destroyed",
                                        topic, queueData);
                                }
                            }

                            if (queueDataList.isEmpty()) {
                                itTopicQueueTable.remove();
                                log.info("remove topic[{}] all queue, from topicQueueTable, because channel destroyed",
                                    topic);
                            }
                        }
                    }
                } finally {
                    // release the write lock
                    this.lock.writeLock().unlock();
                }
            } catch (Exception e) {
                log.error("onChannelDestroy Exception", e);
            }
        }
    }

Route discovery

RocketMQ route discovery is not real-time. When a topic's routing information changes, NameServer does not push the change to clients; instead, clients periodically pull the latest routes for their topics. The request code for pulling routing information by topic name is GET_ROUTEINFO_BY_TOPIC.

Again, the code is extracted from the request header; if it identifies the pull-route-by-topic operation, that logic is executed.

getRouteInfoByTopic then calls into the concrete route-lookup logic.

The communication-layer details are covered in later chapters. All you need to know here is that the request carries the topic name; the TopicRouteData is looked up by that name, encoded as JSON into the response body, and returned to the client. If nothing is found, the response carries the protocol code TOPIC_NOT_EXIST.
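
A lightly trimmed version of the handler (DefaultRequestProcessor.getRouteInfoByTopic() in RocketMQ 4.x; the remark string is shortened):

    public RemotingCommand getRouteInfoByTopic(ChannelHandlerContext ctx,
            RemotingCommand request) throws RemotingCommandException {
        final RemotingCommand response = RemotingCommand.createResponseCommand(null);
        final GetRouteInfoRequestHeader requestHeader =
            (GetRouteInfoRequestHeader) request.decodeCommandCustomHeader(GetRouteInfoRequestHeader.class);

        // look up the route data by topic name
        TopicRouteData topicRouteData = this.namesrvController.getRouteInfoManager()
            .pickupTopicRouteData(requestHeader.getTopic());

        if (topicRouteData != null) {
            if (this.namesrvController.getNamesrvConfig().isOrderMessageEnable()) {
                // ordered-message configuration comes from kvConfig
                String orderTopicConf = this.namesrvController.getKvConfigManager().getKVConfig(
                    NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG, requestHeader.getTopic());
                topicRouteData.setOrderTopicConf(orderTopicConf);
            }
            // encode the route data as JSON into the response body
            byte[] content = topicRouteData.encode();
            response.setBody(content);
            response.setCode(ResponseCode.SUCCESS);
            response.setRemark(null);
            return response;
        }

        // no route found for this topic
        response.setCode(ResponseCode.TOPIC_NOT_EXIST);
        response.setRemark("No topic route info in name server for the topic: " + requestHeader.getTopic());
        return response;
    }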

The data held in TopicRouteData is also very simple:

private String orderTopicConf;
private List<QueueData> queueDatas;
private List<BrokerData> brokerDatas;
private HashMap<String/* brokerAddr */, List<String>/* Filter Server */> filterServerTable;

  • orderTopicConf: ordered-message configuration, sourced from kvConfig (covered in the ordered-message chapter)
  • queueDatas: the topic's queue metadata, e.g. which brokers the topic's queues are distributed across
  • brokerDatas: metadata of the brokers the topic is distributed on, e.g. broker names and broker addresses
  • filterServerTable: the filter-server lists on the brokers

The fields inside QueueData:
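
These are the fields as defined in RocketMQ 4.x, reproduced for reference:

    private String brokerName;     // which broker group the queues live on
    private int readQueueNums;     // number of readable queues
    private int writeQueueNums;    // number of writable queues
    private int perm;              // read/write permission of the topic on this broker
    private int topicSynFlag;      // topic sync flag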

The fields inside BrokerData:
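
Again from RocketMQ 4.x, for reference:

    private String cluster;     // owning cluster name
    private String brokerName;  // broker group name
    private HashMap<Long/* brokerId */, String/* broker address */> brokerAddrs; // 0 = master, >0 = slave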

Here is a simple example to make this concrete.

Suppose RocketMQ is again deployed as 2 masters and 2 slaves.

Suppose some Topic has 16 queues — 8 readable and 8 writable — with 4 of the write queues and 4 of the read queues on Broker-a, and the other 4 write queues and 4 read queues on Broker-b.

Then, under normal conditions, the QueueData and BrokerData returned to the client look like this:
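
A hypothetical rendering (topic name and addresses are made up):

    queueDatas:
    [ { "brokerName": "broker-a", "readQueueNums": 4, "writeQueueNums": 4, "perm": 6, "topicSynFlag": 0 },
      { "brokerName": "broker-b", "readQueueNums": 4, "writeQueueNums": 4, "perm": 6, "topicSynFlag": 0 } ]

    brokerDatas:
    [ { "cluster": "DefaultCluster", "brokerName": "broker-a",
        "brokerAddrs": { 0: "192.168.0.1:10911", 1: "192.168.0.2:10911" } },
      { "cluster": "DefaultCluster", "brokerName": "broker-b",
        "brokerAddrs": { 0: "192.168.0.3:10911", 1: "192.168.0.4:10911" } } ]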

Analysis of the pickupTopicRouteData() method

Flow summary (reads from the mapping tables are performed under the read lock):

  1. Fetch the List<QueueData> from the topicQueueTable by topic name.
  2. Collect every BrokerName found in the QueueData entries; these are the brokers the topic is distributed on.
  3. For each of these brokerNames, fetch the brokerData from the brokerAddrTable.
  4. Assemble the data and return the TopicRouteData.
    // param: the topic name
    public TopicRouteData pickupTopicRouteData(final String topic) {
        // create the topic route data object
        TopicRouteData topicRouteData = new TopicRouteData();
        boolean foundQueueData = false;
        boolean foundBrokerData = false;
        // set of brokerNames the topic lives on
        Set<String> brokerNameSet = new HashSet<String>();
        // list of broker data to return
        List<BrokerData> brokerDataList = new LinkedList<BrokerData>();

        topicRouteData.setBrokerDatas(brokerDataList);
        // class-mode filtering related
        HashMap<String, List<String>> filterServerMap = new HashMap<String, List<String>>();
        topicRouteData.setFilterServerTable(filterServerMap);

        try {
            try {
                // take the read lock
                this.lock.readLock().lockInterruptibly();
                // fetch the topic's queue data, i.e. which brokers its queues are distributed on
                List<QueueData> queueDataList = this.topicQueueTable.get(topic);
                if (queueDataList != null) {
                    // record the queue data that was found
                    topicRouteData.setQueueDatas(queueDataList);
                    foundQueueData = true;

                    // iterate the queue data and collect the brokerNames involved
                    Iterator<QueueData> it = queueDataList.iterator();
                    while (it.hasNext()) {
                        QueueData qd = it.next();
                        brokerNameSet.add(qd.getBrokerName());
                    }

                    // iterate the brokerNames that hold queue data for this topic
                    for (String brokerName : brokerNameSet) {
                        // fetch the brokerData
                        BrokerData brokerData = this.brokerAddrTable.get(brokerName);
                        if (null != brokerData) {
                            // clone a copy of the brokerData
                            BrokerData brokerDataClone = new BrokerData(brokerData.getCluster(), brokerData.getBrokerName(), (HashMap<Long, String>) brokerData
                                .getBrokerAddrs().clone());
                            brokerDataList.add(brokerDataClone);
                            foundBrokerData = true;
                            for (final String brokerAddr : brokerDataClone.getBrokerAddrs().values()) {
                                List<String> filterServerList = this.filterServerTable.get(brokerAddr);
                                filterServerMap.put(brokerAddr, filterServerList);
                            }
                        }
                    }
                }
            } finally {
                this.lock.readLock().unlock();
            }
        } catch (Exception e) {
            log.error("pickupTopicRouteData Exception", e);
        }

        log.debug("pickupTopicRouteData {} {}", topic, topicRouteData);

        // route data is returned only when both flags are true; otherwise null
        if (foundBrokerData && foundQueueData) {
            return topicRouteData;
        }

        return null;
    }

Analysis of the NameServer startup flow

The core of startup is creating the classes introduced above. Because this is network programming, it also starts a Netty server that by default listens on port 9876 and waits for client connections, and it registers a few scheduled tasks. Since the logic is simple, the annotated code is shown directly.

The entry point is NamesrvStartup, so we analyze that class directly.

NamesrvStartup#createNamesrvController logic

   public static NamesrvController main0(String[] args) {

        try {
            // parse the configuration and create the NamesrvController
            NamesrvController controller = createNamesrvController(args);
            // run the NamesrvController initialize() and start() logic
            start(controller);
            String tip = "The Name Server boot success. serializeType=" + RemotingCommand.getSerializeTypeConfigInThisServer();
            log.info(tip);
            System.out.printf("%s%n", tip);
            return controller;
        } catch (Throwable e) {
            e.printStackTrace();
            System.exit(-1);
        }

        return null;
    }
    public static NamesrvController createNamesrvController(String[] args) throws IOException, JoranException {
        // set the rocketmq.remoting.version system property,
        // i.e. record the RocketMQ version
        System.setProperty(RemotingCommand.REMOTING_VERSION_KEY, Integer.toString(MQVersion.CURRENT_VERSION));
        //PackageConflictDetect.detectFastjson();

        // build the command-line options
        Options options = ServerUtil.buildCommandlineOptions(new Options());
        // parse the command line into a CommandLine object used to handle CLI flags
        commandLine = ServerUtil.parseCmdLine("mqnamesrv", args, buildCommandlineOptions(options), new PosixParser());
        if (null == commandLine) {
            System.exit(-1);
            return null;
        }

        // create the NameServer business configuration class
        final NamesrvConfig namesrvConfig = new NamesrvConfig();
        // create the NameServer network configuration class
        final NettyServerConfig nettyServerConfig = new NettyServerConfig();
        // set the listen port to 9876
        nettyServerConfig.setListenPort(9876);
        // true: the command line passed the -c option, i.e. -c <configFile path>
        if (commandLine.hasOption('c')) {
            // the config file path
            String file = commandLine.getOptionValue('c');
            if (file != null) {
                InputStream in = new BufferedInputStream(new FileInputStream(file));
                // create the Properties object
                properties = new Properties();
                // load the configuration
                properties.load(in);
                // copy matching properties onto namesrvConfig
                MixAll.properties2Object(properties, namesrvConfig);
                // copy matching properties onto nettyServerConfig
                MixAll.properties2Object(properties, nettyServerConfig);

                // remember the config file path
                namesrvConfig.setConfigStorePath(file);

                System.out.printf("load config properties file OK, %s%n", file);
                in.close();
            }
        }

        // true: the command line passed the -p option;
        // print the configuration and exit
        if (commandLine.hasOption('p')) {
            InternalLogger console = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_CONSOLE_NAME);
            MixAll.printObjectProperties(console, namesrvConfig);
            MixAll.printObjectProperties(console, nettyServerConfig);
            System.exit(0);
        }

        // copy any parameters given on the command line onto namesrvConfig
        MixAll.properties2Object(ServerUtil.commandLine2Properties(commandLine), namesrvConfig);

        // exit if ROCKETMQ_HOME is not configured, since it is required
        if (null == namesrvConfig.getRocketmqHome()) {
            System.out.printf("Please set the %s variable in your environment to match the location of the RocketMQ installation%n", MixAll.ROCKETMQ_HOME_ENV);
            System.exit(-2);
        }

        // logging setup
        LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
        JoranConfigurator configurator = new JoranConfigurator();
        configurator.setContext(lc);
        lc.reset();
        configurator.doConfigure(namesrvConfig.getRocketmqHome() + "/conf/logback_namesrv.xml");

        log = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_LOGGER_NAME);

        MixAll.printObjectProperties(log, namesrvConfig);
        MixAll.printObjectProperties(log, nettyServerConfig);

        // create the NamesrvController
        // param 1: namesrv config object
        // param 2: netty server config object
        final NamesrvController controller = new NamesrvController(namesrvConfig, nettyServerConfig);

        // remember all configs to prevent discard
        controller.getConfiguration().registerConfig(properties);

        return controller;
    }
    public NamesrvController(NamesrvConfig namesrvConfig, NettyServerConfig nettyServerConfig) {
        this.namesrvConfig = namesrvConfig;
        this.nettyServerConfig = nettyServerConfig;
        this.kvConfigManager = new KVConfigManager(this);
        this.routeInfoManager = new RouteInfoManager();
        this.brokerHousekeepingService = new BrokerHousekeepingService(this);
        this.configuration = new Configuration(
            log,
            this.namesrvConfig, this.nettyServerConfig
        );
        this.configuration.setStorePathFromConfig(this.namesrvConfig, "configStorePath");
    }

NamesrvStartup#start logic

Flow summary:

  • Call NamesrvController.initialize()
  • Add a shutdown hook to the JVM; when the JVM shuts down, the hook runs NamesrvController.shutdown()
  • Call NamesrvController.start()
    public static NamesrvController start(final NamesrvController controller) throws Exception {

        if (null == controller) {
            throw new IllegalArgumentException("NamesrvController is null");
        }

        // run controller.initialize()
        // return value: true means initialization succeeded, false means it failed
        boolean initResult = controller.initialize();

        // true: initialization failed
        if (!initResult) {
            controller.shutdown();
            System.exit(-3);
        }

        // add a shutdown hook to the JVM; when the JVM shuts down, ShutdownHookThread.run() executes,
        // which in turn invokes Callable.call()
        Runtime.getRuntime().addShutdownHook(new ShutdownHookThread(log, new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                // executed when the JVM shuts down, used to release resources
                controller.shutdown();
                return null;
            }
        }));

        // start the NamesrvController,
        // which mainly starts the internal remotingServer object
        controller.start();

        return controller;
    }

Analysis of the NamesrvController#initialize() method

It mainly does the following:

  • Load the KV configuration
  • Create the NettyRemotingServer object (detailed in the communication chapter)
  • Register the default request processor (detailed in the communication chapter)
  • Register two scheduled tasks:
    • every 10 seconds, scan for Brokers that meet the expiry condition
    • every 10 minutes, print all configuration
    /**
     * Some of the objects below were already created when the NamesrvController was constructed.
     * @return the initialization result
     */
    public boolean initialize() {

        // load the KV configuration
        this.kvConfigManager.load();

        // create the network-layer server object
        // param 1: netty server config class
        // param 2: the housekeeping service, which listens for different channel events and handles them
        this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.brokerHousekeepingService);

        // create the network-layer thread pool
        this.remotingExecutor =
            Executors.newFixedThreadPool(nettyServerConfig.getServerWorkerThreads(), new ThreadFactoryImpl("RemotingExecutorThread_"));

        // register the request processor
        this.registerProcessor();

        // scheduled task: first run after 5 seconds, then every 10 seconds
        // purpose: scan for offline brokers
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {

            @Override
            public void run() {
                NamesrvController.this.routeInfoManager.scanNotActiveBroker();
            }
        }, 5, 10, TimeUnit.SECONDS);

        // scheduled task: first run after 1 minute, then every 10 minutes
        // purpose: print all configuration
        this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {

            @Override
            public void run() {
                NamesrvController.this.kvConfigManager.printAllPeriodically();
            }
        }, 1, 10, TimeUnit.MINUTES);

        // TLS related: register a listener to reload the SslContext; not our focus here
        if (TlsSystemConfig.tlsMode != TlsMode.DISABLED) {
            // Register a listener to reload SslContext
            try {
                fileWatchService = new FileWatchService(
                    new String[] {
                        TlsSystemConfig.tlsServerCertPath,
                        TlsSystemConfig.tlsServerKeyPath,
                        TlsSystemConfig.tlsServerTrustCertPath
                    },
                    new FileWatchService.Listener() {
                        boolean certChanged, keyChanged = false;
                        @Override
                        public void onChanged(String path) {
                            if (path.equals(TlsSystemConfig.tlsServerTrustCertPath)) {
                                log.info("The trust certificate changed, reload the ssl context");
                                reloadServerSslContext();
                            }
                            if (path.equals(TlsSystemConfig.tlsServerCertPath)) {
                                certChanged = true;
                            }
                            if (path.equals(TlsSystemConfig.tlsServerKeyPath)) {
                                keyChanged = true;
                            }
                            if (certChanged && keyChanged) {
                                log.info("The certificate and private key changed, reload the ssl context");
                                certChanged = keyChanged = false;
                                reloadServerSslContext();
                            }
                        }
                        private void reloadServerSslContext() {
                            ((NettyRemotingServer) remotingServer).loadSslContext();
                        }
                    });
            } catch (Exception e) {
                log.warn("FileWatchService created error, can't load the certificate dynamically");
            }
        }

        return true;
    }
    private void registerProcessor() {
        // for cluster testing only; false by default
        if (namesrvConfig.isClusterTest()) {

            this.remotingServer.registerDefaultProcessor(new ClusterTestRequestProcessor(this, namesrvConfig.getProductEnvName()),
                this.remotingExecutor);
        } else {
            // the default path:
            // register a default processor with the network-layer server object
            // param 1: the default request processor
            // param 2: the network-layer thread pool that runs the processor's logic
            this.remotingServer.registerDefaultProcessor(new DefaultRequestProcessor(this), this.remotingExecutor);
        }
    }

Analysis of the NamesrvController#start method

This mainly calls NettyRemotingServer.start(), which starts the Netty server: it configures the ChannelHandlers, thread pools, and other parameters, and listens on port 9876. The concrete logic is explained in the communication-layer chapter.

The code is shown here without detailed commentary.

  public void start() throws Exception {
        // start the network server object
        this.remotingServer.start();

        if (this.fileWatchService != null) {
            this.fileWatchService.start();
        }
    }

this.remotingServer.start();

    @Override
    public void start() {
        // create the default event executor group; its threads execute the ChannelHandler logic
        this.defaultEventExecutorGroup = new DefaultEventExecutorGroup(
            nettyServerConfig.getServerWorkerThreads(),
            new ThreadFactory() {

                private AtomicInteger threadIndex = new AtomicInteger(0);

                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "NettyServerCodecThread_" + this.threadIndex.incrementAndGet());
                }
            });

        // create the shared Netty ChannelHandler objects
        prepareSharableHandlers();



        ServerBootstrap childHandler =
            // set the boss and worker groups: the boss group handles accept events,
            // the worker group handles read and write events
            this.serverBootstrap.group(this.eventLoopGroupBoss, this.eventLoopGroupSelector)
                 // pick the server socket channel type used for accept (epoll on Linux, NIO otherwise)
                .channel(useEpoll() ? EpollServerSocketChannel.class : NioServerSocketChannel.class)
                 // The server handles client connection requests sequentially; when it has no spare
                 // threads for new requests, pending connections are put into a queue,
                 // and SO_BACKLOG caps the size of that queue.
                 // This corresponds to the TCP/IP protocol, implemented by the OS protocol stack:
                 // the backlog parameter of the listen() call initializes the server's connection queues.
                 // The kernel maintains two queues: the SYN queue (at least one client SYN has arrived,
                 // but the three-way handshake is unfinished) and the accept queue (handshake completed,
                 // but the socket library's accept() has not yet been called).
                 // When synQueue + acceptQueue > SO_BACKLOG, new connections are rejected by the TCP kernel.
                .option(ChannelOption.SO_BACKLOG, nettyServerConfig.getServerSocketBacklog())
                 // SO_REUSEADDR: address reuse. Normally a released port only becomes usable again
                 // after about two minutes; SO_REUSEADDR allows it to be rebound immediately.
                .option(ChannelOption.SO_REUSEADDR, true)
                 // TCP keepalive: with SO_KEEPALIVE on, when a client sends to a server and gets no reply,
                 // the client cannot tell whether the server has died; the fix is that after a timeout the
                 // idle side automatically sends an empty probe and waits for a response -
                 // essentially a transport-level heartbeat.
                 // Why turn it off here? Several reasons:
                 // 1. The default timeout is 2 hours, meaning a silent server would only be noticed
                 //    after 2 hours (the parameter is configurable).
                 // 2. Keepalive belongs to the transport layer; when a dead connection is found,
                 //    no application-level logic can be run.
                 // 3. Namesrv implements an application-level heartbeat to handle broker offline detection.
                 // 4. It can only tell whether a connection is alive, not whether it is usable; if the
                 //    other end of a TCP connection suddenly loses power, the peer has no way to know.
                .option(ChannelOption.SO_KEEPALIVE, false)
                 // childOption applies to the worker-group channels, i.e. connections that have
                 // completed the three-way handshake and entered the read/write phase.
                 // TCP_NODELAY controls the Nagle algorithm, which improves throughput on slow WANs:
                 // plainly put, Nagle reduces the number of transmissions by batching data into fewer,
                 // larger packets; setting TCP_NODELAY to true disables that batching.
                .childOption(ChannelOption.TCP_NODELAY, true)
                 // bind the accept listen port, 9876
                .localAddress(new InetSocketAddress(this.nettyServerConfig.getListenPort()))
                 // configure the ChannelHandler classes
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    public void initChannel(SocketChannel ch) throws Exception {
                        // handshakeHandler, encoder, connectionManageHandler, and serverHandler are all
                        // annotated @Sharable: they are shared handlers and need not be created per connection
                        // inboundHandler: handshakeHandler -> decoder -> idleState -> connectManager -> serverHandler
                        // outboundHandler: encoder -> idleState -> connectManager -> serverHandler
                        ch.pipeline()
                             // handshakeHandler: when the client configures useTls = true, it dynamically
                             // installs the SSL handler on the server side and removes itself;
                             // TLS-related, out of scope here
                            .addLast(defaultEventExecutorGroup, HANDSHAKE_HANDLER_NAME, handshakeHandler)
                            .addLast(defaultEventExecutorGroup,
                                encoder,
                                new NettyDecoder(),
                                // idle detection, used to check whether the peer is alive; configured to 120s.
                                // Simply put, when there is no read or write within 120s,
                                // IdleStateHandler propagates userEventTriggered() down the pipeline
                                // with an IdleStateEvent payload;
                                // connectionManageHandler receives and handles that event
                                new IdleStateHandler(0, 0, nettyServerConfig.getServerChannelMaxIdleTimeSeconds()),
                                // closes the channel when the peer has done no read/write within the window
                                connectionManageHandler,
                                // the core RocketMQ logic handler: every request is wrapped as a
                                // RemotingCommand object and then processed by serverHandler
                                serverHandler
                            );
                    }
                });
        //  apply the configured socket send/receive buffer sizes and write-buffer
        //  high/low watermarks below, if they are set
        if (nettyServerConfig.getServerSocketSndBufSize() > 0) {
            log.info("server set SO_SNDBUF to {}", nettyServerConfig.getServerSocketSndBufSize());
            childHandler.childOption(ChannelOption.SO_SNDBUF, nettyServerConfig.getServerSocketSndBufSize());
        }
        if (nettyServerConfig.getServerSocketRcvBufSize() > 0) {
            log.info("server set SO_RCVBUF to {}", nettyServerConfig.getServerSocketRcvBufSize());
            childHandler.childOption(ChannelOption.SO_RCVBUF, nettyServerConfig.getServerSocketRcvBufSize());
        }
        if (nettyServerConfig.getWriteBufferLowWaterMark() > 0 && nettyServerConfig.getWriteBufferHighWaterMark() > 0) {
            log.info("server set netty WRITE_BUFFER_WATER_MARK to {},{}",
                    nettyServerConfig.getWriteBufferLowWaterMark(), nettyServerConfig.getWriteBufferHighWaterMark());
            childHandler.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK, new WriteBufferWaterMark(
                    nettyServerConfig.getWriteBufferLowWaterMark(), nettyServerConfig.getWriteBufferHighWaterMark()));
        }

        // enable Netty pooled ByteBuf allocation
        if (nettyServerConfig.isServerPooledByteBufAllocatorEnable()) {
            childHandler.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
        }

        try {
            // bind the port; sync() waits for the bind to complete
            ChannelFuture sync = this.serverBootstrap.bind().sync();

            InetSocketAddress addr = (InetSocketAddress) sync.channel().localAddress();
            // record the bound port
            this.port = addr.getPort();
        } catch (InterruptedException e1) {
            throw new RuntimeException("this.serverBootstrap.bind().sync() InterruptedException", e1);
        }


        if (this.channelEventListener != null) {
            this.nettyEventExecutor.start();
        }

        // scheduled task: first run after 3 seconds, then every 1 second
        this.timer.scheduleAtFixedRate(new TimerTask() {

            @Override
            public void run() {
                try {
                    // periodically scan the response table for expired requests
                    NettyRemotingServer.this.scanResponseTable();
                } catch (Throwable e) {
                    log.error("scanResponseTable exception", e);
                }
            }
        }, 1000 * 3, 1000);
    }

Summary

That covers NameServer route registration, route discovery, and route deletion — the core of NameServer has now been explained. A few places touching network interaction were not covered in depth; in the next chapter I will walk through the communication module, i.e. the Remoting module of the RocketMQ source.

To recap route discovery: NameServer never pushes route changes; clients periodically pull the latest routes for their topics (every 30s by default) and refresh their local caches.

That concludes the introduction to NameServer. Notice that this architecture implies the following situation:

NameServer may need up to 120s to remove a failed Broker from its routing table. If, during that window, a producer resolves its topic to the Broker that has already gone down, message sending will fail — doesn't that make sending not highly available? This is addressed in detail in the later producer chapters.

Source: juejin.im/post/7086112241124114463