The previous post covered NodeManager initialization; this one walks through the code that starts its services:
@Override
protected void serviceStart() throws Exception {
  try {
    doSecureLogin();
  } catch (IOException e) {
    throw new YarnRuntimeException("Failed NodeManager login", e);
  }
  super.serviceStart();
}
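It looks trivially simple, but super.serviceStart() actually takes every service that was added to serviceList during initialization and calls serviceStart on each of them in turn, in the order they were added.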
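The start-in-insertion-order behaviour can be illustrated with a minimal sketch. The classes SketchService and SketchCompositeService below are hypothetical stand-ins written for this post, not Hadoop's AbstractService or CompositeService; they only model the ordering, not the state machine or error handling of the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service; records its name when started.
class SketchService {
    private final String name;
    private final List<String> startLog;

    SketchService(String name, List<String> startLog) {
        this.name = name;
        this.startLog = startLog;
    }

    void start() {
        startLog.add(name);
    }
}

// Hypothetical stand-in for a composite service:
// children are started in the order they were added.
class SketchCompositeService {
    private final List<SketchService> serviceList = new ArrayList<>();

    void addService(SketchService service) {
        serviceList.add(service);
    }

    void start() {
        for (SketchService service : serviceList) {
            service.start();
        }
    }

    // Adds three children and returns the order in which they started.
    static List<String> demoStartOrder() {
        List<String> log = new ArrayList<>();
        SketchCompositeService nm = new SketchCompositeService();
        nm.addService(new SketchService("DeletionService", log));
        nm.addService(new SketchService("ContainerManager", log));
        nm.addService(new SketchService("NodeStatusUpdater", log));
        nm.start();
        return log;
    }
}
```

In this sketch, whichever service is added last is also started last, which is the property the rest of this walkthrough relies on.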
The first service added is:
DeletionService del = createDeletionService(exec);
addService(del);
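This is the periodic cleanup service. It has no serviceStart of its own, because its initialization already creates the scheduled thread pool that does the work: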
@Override
protected void serviceInit(Configuration conf) throws Exception {
  ThreadFactory tf = new ThreadFactoryBuilder()
      .setNameFormat("DeletionService #%d")
      .build();
  if (conf != null) {
    sched = new ScheduledThreadPoolExecutor(
        conf.getInt(YarnConfiguration.NM_DELETE_THREAD_COUNT,
            YarnConfiguration.DEFAULT_NM_DELETE_THREAD_COUNT), tf);
    debugDelay = conf.getInt(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 0);
  } else {
    sched = new ScheduledThreadPoolExecutor(
        YarnConfiguration.DEFAULT_NM_DELETE_THREAD_COUNT, tf);
  }
  sched.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
  sched.setKeepAliveTime(60L, SECONDS);
  if (stateStore.canRecover()) {
    recover(stateStore.loadDeletionServiceState());
  }
  super.serviceInit(conf);
}
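To show why a scheduled pool is enough and no separate start step is needed, here is a minimal sketch of a delayed delete built on ScheduledThreadPoolExecutor. The class DelayedDeleteSketch and its methods are hypothetical; the real DeletionService schedules its own task type and handles users, base directories and recovery, none of which is modelled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a delete-after-delay service.
class DelayedDeleteSketch {
    private final ScheduledThreadPoolExecutor sched;
    private final long debugDelaySec;

    DelayedDeleteSketch(int threads, long debugDelaySec) {
        this.sched = new ScheduledThreadPoolExecutor(threads);
        // Tasks still waiting on their delay are dropped at shutdown.
        this.sched.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
        this.debugDelaySec = debugDelaySec;
    }

    // Schedules the path for deletion once the configured delay has elapsed.
    ScheduledFuture<Boolean> delete(Path path) {
        Callable<Boolean> task = () -> Files.deleteIfExists(path);
        return sched.schedule(task, debugDelaySec, TimeUnit.SECONDS);
    }

    void stop() {
        sched.shutdown();
    }

    // Creates a temp file, deletes it through the pool, and reports whether it is gone.
    static boolean demo() {
        DelayedDeleteSketch del = new DelayedDeleteSketch(1, 0);
        try {
            Path tmp = Files.createTempFile("deletion-sketch", ".tmp");
            del.delete(tmp).get(10, TimeUnit.SECONDS);
            return !Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            del.stop();
        }
    }
}
```

The pool's worker threads are created on demand when the first task is scheduled, so the service is usable as soon as it has been constructed.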
Next, look at this part:
nodeHealthChecker = new NodeHealthCheckerService();
addService(nodeHealthChecker);
dirsHandler = nodeHealthChecker.getDiskHandler();
Its initialization is already complete, but it likewise has no serviceStart method of its own. Where it really gets used is here:
nodeStatusUpdater = createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);
Here is the body of the node status updater's serviceStart method:
// NodeManager is the last service to start, so NodeId is available.
this.nodeId = this.context.getNodeId();
this.httpPort = this.context.getHttpPort();
this.nodeManagerVersionId = YarnVersionInfo.getVersion();
try {
  // Registration has to be in start so that ContainerManager can get the
  // perNM tokens needed to authenticate ContainerTokens.
  this.resourceTracker = getRMClient();
  registerWithRM();
  super.serviceStart();
  startStatusUpdater();
} catch (Exception e) {
  String errorMessage = "Unexpected error starting NodeStatusUpdater";
  LOG.error(errorMessage, e);
  throw new YarnRuntimeException(e);
}
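Reading the code up to this point, the context's nodeId and httpPort do not seem to have been set anywhere yet, which looks like a bug. Is it? No. Look at where the service is actually registered:
addService(nodeStatusUpdater);
((NMContext) context).setNodeStatusUpdater(nodeStatusUpdater);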
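nodeStatusUpdater is the last service added to the service list, so it is also the last one to be started, after the services ahead of it have run. We will therefore skip it for now and look at what comes next:
NodeResourceMonitor nodeResourceMonitor = createNodeResourceMonitor();
addService(nodeResourceMonitor);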
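This piece still puzzles me: it appears to do nothing at all. It has no initialization logic and no serviceStart of its own, so it just runs the default methods. I won't spend more time on it. Next:
containerManager = createContainerManager(context, exec, del,
    nodeStatusUpdater, this.aclsManager, dirsHandler);
addService(containerManager);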
Let's look at the serviceStart method of this containerManager:
final InetSocketAddress initialAddress = conf.getSocketAddr(
    YarnConfiguration.NM_BIND_HOST,
    YarnConfiguration.NM_ADDRESS,
    YarnConfiguration.DEFAULT_NM_ADDRESS,
    YarnConfiguration.DEFAULT_NM_PORT);
boolean usingEphemeralPort = (initialAddress.getPort() == 0);
if (context.getNMStateStore().canRecover() && usingEphemeralPort) {
  throw new IllegalArgumentException("Cannot support recovery with an "
      + "ephemeral server port. Check the setting of "
      + YarnConfiguration.NM_ADDRESS);
}
// If recovering then delay opening the RPC service until the recovery
// of resources and containers have completed, otherwise requests from
// clients during recovery can interfere with the recovery process.
final boolean delayedRpcServerStart =
    context.getNMStateStore().canRecover();

Configuration serverConf = new Configuration(conf);

// always enforce it to be token-based.
serverConf.set(
    CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
    SaslRpcServer.AuthMethod.TOKEN.toString());

YarnRPC rpc = YarnRPC.create(conf);

server = rpc.getServer(ContainerManagementProtocol.class, this,
    initialAddress, serverConf, this.context.getNMTokenSecretManager(),
    conf.getInt(YarnConfiguration.NM_CONTAINER_MGR_THREAD_COUNT,
        YarnConfiguration.DEFAULT_NM_CONTAINER_MGR_THREAD_COUNT));
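Clearly this code sets up an RPC server. Note that the default IPC port is 0. That is not an invalid value: port 0 is an ephemeral port, meaning the operating system picks a free port at bind time, so an NM without an explicit port still starts. The one exception is the check at the top of the code above, which throws IllegalArgumentException when recovery is enabled and the port is ephemeral; in that case NM_ADDRESS must be configured with a fixed port: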
/** address of node manager IPC. */
public static final String NM_ADDRESS = NM_PREFIX + "address";
public static final int DEFAULT_NM_PORT = 0;
public static final String DEFAULT_NM_ADDRESS = "0.0.0.0:" + DEFAULT_NM_PORT;
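What port 0 means in practice can be checked with plain java.net, independent of Hadoop. The sketch below binds a ServerSocket to port 0 and reads back the port the operating system assigned, and repeats the recovery guard from the quoted code as a standalone method. EphemeralPortSketch and both method names are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class EphemeralPortSketch {
    // Binds to the requested port on loopback and returns the port actually bound.
    // Requesting port 0 lets the operating system choose a free port.
    static int bindAndGetPort(int requestedPort) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress("127.0.0.1", requestedPort));
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same condition as the guard in the quoted serviceStart code:
    // recovery cannot be combined with an ephemeral port.
    static void checkRecoveryPort(boolean canRecover, int configuredPort) {
        boolean usingEphemeralPort = (configuredPort == 0);
        if (canRecover && usingEphemeralPort) {
            throw new IllegalArgumentException(
                "Cannot support recovery with an ephemeral server port");
        }
    }
}
```

Calling bindAndGetPort(0) returns some positive port number that differs from run to run, which is presumably why a recovering NM needs a fixed port instead: an address that changes on every start cannot stay stable across a restart.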
Next, let's see where nodeId actually comes from:
// setup node ID
InetSocketAddress connectAddress;
if (delayedRpcServerStart) {
  connectAddress = NetUtils.getConnectAddress(initialAddress);
} else {
  server.start();
  connectAddress = NetUtils.getConnectAddress(server);
}
NodeId nodeId = buildNodeId(connectAddress, hostOverride);
((NodeManager.NMContext) context).setNodeId(nodeId);
this.context.getNMTokenSecretManager().setNodeId(nodeId);
this.context.getContainerTokenSecretManager().setNodeId(nodeId);
Given connectAddress, a NodeId is built from it:
private NodeId buildNodeId(InetSocketAddress connectAddress,
    String hostOverride) {
  if (hostOverride != null) {
    connectAddress = NetUtils.getConnectAddress(
        new InetSocketAddress(hostOverride, connectAddress.getPort()));
  }
  return NodeId.newInstance(
      connectAddress.getAddress().getCanonicalHostName(),
      connectAddress.getPort());
}

@Private
@Unstable
public static NodeId newInstance(String host, int port) {
  NodeId nodeId = Records.newRecord(NodeId.class);
  nodeId.setHost(host);
  nodeId.setPort(port);
  nodeId.build();
  return nodeId;
}
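The shape of the result is easier to see with a small standalone sketch. NodeIdSketch below is a hypothetical value type, not YARN's NodeId record, and it skips the NetUtils.getConnectAddress normalisation step; it only shows that the identifier boils down to a canonical host name plus the RPC port.

```java
import java.net.InetSocketAddress;

// Hypothetical value type standing in for YARN's NodeId record.
final class NodeIdSketch {
    final String host;
    final int port;

    NodeIdSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Builds an id from the connect address, optionally replacing the host.
    // Assumes hostOverride, when given, resolves to an address.
    static NodeIdSketch build(InetSocketAddress connectAddress, String hostOverride) {
        if (hostOverride != null) {
            connectAddress =
                new InetSocketAddress(hostOverride, connectAddress.getPort());
        }
        return new NodeIdSketch(
            connectAddress.getAddress().getCanonicalHostName(),
            connectAddress.getPort());
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}
```

The host part depends on how the local resolver answers the canonical-name lookup, so only the port part is predictable from configuration alone.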
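That is how the NodeId is produced.
LOG.info("ContainerManager started at " + connectAddress);
LOG.info("ContainerManager bound to " + initialAddress);
The method ends with these log statements, so the NM log shows the address from which the NodeId was built. The next service is the web server: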
WebServer webServer = createWebServer(context,
    containerManager.getContainersMonitor(), this.aclsManager, dirsHandler);
addService(webServer);
Here is how the NM monitoring web app starts:
@Override
protected void serviceStart() throws Exception {
  String bindAddress = WebAppUtils.getWebAppBindURL(getConfig(),
      YarnConfiguration.NM_BIND_HOST,
      WebAppUtils.getNMWebAppURLWithoutScheme(getConfig()));

  LOG.info("Instantiating NMWebApp at " + bindAddress);
  try {
    this.webApp =
        WebApps
            .$for("node", Context.class, this.nmContext, "ws")
            .at(bindAddress)
            .with(getConfig())
            .withHttpSpnegoPrincipalKey(
                YarnConfiguration.NM_WEBAPP_SPNEGO_USER_NAME_KEY)
            .withHttpSpnegoKeytabKey(
                YarnConfiguration.NM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY)
            .start(this.nmWebApp);
    this.port = this.webApp.httpServer().getConnectorAddress(0).getPort();
  } catch (Exception e) {
    String msg = "NMWebapps failed to start.";
    LOG.error(msg, e);
    throw new YarnRuntimeException(msg, e);
  }
  super.serviceStart();
}
There is not much to say here; the main thing to note is where the address and port are loaded from:
/** NM Webapp address. **/
public static final String NM_WEBAPP_ADDRESS = NM_PREFIX + "webapp.address";
public static final int DEFAULT_NM_WEBAPP_PORT = 8042;
public static final String DEFAULT_NM_WEBAPP_ADDRESS = "0.0.0.0:" + DEFAULT_NM_WEBAPP_PORT;
These settings are all defined in YarnConfiguration.
With all of that done, let's come back to:
addService(nodeStatusUpdater);

// NodeManager is the last service to start, so NodeId is available.
this.nodeId = this.context.getNodeId();
this.httpPort = this.context.getHttpPort();
this.nodeManagerVersionId = YarnVersionInfo.getVersion();
try {
  // Registration has to be in start so that ContainerManager can get the
  // perNM tokens needed to authenticate ContainerTokens.
  this.resourceTracker = getRMClient();
  registerWithRM();
  super.serviceStart();
  startStatusUpdater();
} catch (Exception e) {
  String errorMessage = "Unexpected error starting NodeStatusUpdater";
  LOG.error(errorMessage, e);
  throw new YarnRuntimeException(e);
}
Now it is clear: by the time this runs, nodeId and httpPort have already been set. The part to focus on is startStatusUpdater, and specifically the code inside its try block:
NodeStatus nodeStatus = getNodeStatus(lastHeartBeatID);
This method collects the basic status of the NM node:
private NodeStatus getNodeStatus(int responseId) throws IOException {
  NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus();
  nodeHealthStatus.setHealthReport(healthChecker.getHealthReport());
  nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy());
  nodeHealthStatus.setLastHealthReportTime(
      healthChecker.getLastHealthReportTime());
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy()
        + ", " + nodeHealthStatus.getHealthReport());
  }
  List<ContainerStatus> containersStatuses = getContainerStatuses();
  NodeStatus nodeStatus = NodeStatus.newInstance(nodeId, responseId,
      containersStatuses, createKeepAliveApplicationList(), nodeHealthStatus);
  return nodeStatus;
}
The health values themselves come from healthChecker, which is the NodeHealthCheckerService we saw earlier:
nodeHealthChecker = new NodeHealthCheckerService();
addService(nodeHealthChecker);
nodeStatusUpdater = createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);
Its methods are all much alike, so let's look at just one of them:
/**
 * @return the reporting string of health of the node
 */
String getHealthReport() {
  String scriptReport = (nodeHealthScriptRunner == null) ? ""
      : nodeHealthScriptRunner.getHealthReport();
  if (scriptReport.equals("")) {
    return dirsHandler.getDisksHealthReport(false);
  } else {
    return scriptReport.concat(SEPARATOR + dirsHandler.getDisksHealthReport(false));
  }
}
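When no health script is configured, the work is handed entirely to dirsHandler; otherwise the script's report is placed in front of the disk report. I won't go into more detail here.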
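The merging rule in getHealthReport can be isolated as a pure function. In the sketch below, HealthReportSketch, combine, and the ";" value of SEPARATOR are assumptions made for illustration; the quoted source only shows that a constant named SEPARATOR exists, not its value.

```java
class HealthReportSketch {
    // Assumed separator value, chosen for this illustration only.
    static final String SEPARATOR = ";";

    // Returns the disk report alone when there is no script report,
    // otherwise the script report followed by the disk report.
    static String combine(String scriptReport, String disksReport) {
        String script = (scriptReport == null) ? "" : scriptReport;
        if (script.isEmpty()) {
            return disksReport;
        }
        return script + SEPARATOR + disksReport;
    }
}
```

Passing null for the script report plays the role of the nodeHealthScriptRunner == null branch in the original method.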
Once all of these services have started, the NodeManager is ready for use.