Continuing from part two of this series.
With ApplicationSubmissionContext covered, let's move on:
@Override
public YarnClientApplication createApplication()
    throws YarnException, IOException {
  ApplicationSubmissionContext context =
      Records.newRecord(ApplicationSubmissionContext.class);
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  context.setApplicationId(appId);
  return new YarnClientApplication(newApp, context);
}
There is actually a question worth digging into here, namely how the newRecord method works, but that deserves an article of its own, so I won't expand on it now.
First, look at the getNewApplication method:
private GetNewApplicationResponse getNewApplication()
    throws YarnException, IOException {
  GetNewApplicationRequest request =
      Records.newRecord(GetNewApplicationRequest.class);
  return rmClient.getNewApplication(request);
}
As expected, it uses the RPC connection established earlier.
One addition here: what exactly is the concrete implementation behind rmClient? ApplicationClientProtocol has many implementation classes; which one actually does the work when we submit a job?
/**
 * Delegate responsible for communicating with the Resource Manager's
 * {@link ApplicationClientProtocol}.
 *
 * @param conf the configuration object.
 */
public ResourceMgrDelegate(YarnConfiguration conf) {
  super(ResourceMgrDelegate.class.getName());
  this.conf = conf;
  this.client = YarnClient.createYarnClient();
  init(conf);
  start();
}
Look closely at this constructor: what gets passed in is a YarnConfiguration, not the plain Configuration we used earlier. Open it up and you'll find it packed with YARN-related configuration keys:
@Override
protected void serviceStart() throws Exception {
  try {
    rmClient = ClientRMProxy.createRMProxy(getConfig(),
        ApplicationClientProtocol.class);
    if (historyServiceEnabled) {
      historyClient.start();
    }
    if (timelineServiceEnabled) {
      timelineClient.start();
    }
  } catch (IOException e) {
    throw new YarnRuntimeException(e);
  }
  super.serviceStart();
}
Here too, the configuration passed in at service start is a YarnConfiguration. Now look at the createRMProxy method:
/**
 * Create a proxy for the specified protocol. For non-HA,
 * this is a direct connection to the ResourceManager address. When HA is
 * enabled, the proxy handles the failover between the ResourceManagers as
 * well.
 */
@Private
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy instance) throws IOException {
  YarnConfiguration conf = (configuration instanceof YarnConfiguration)
      ? (YarnConfiguration) configuration
      : new YarnConfiguration(configuration);
  RetryPolicy retryPolicy = createRetryPolicy(conf);
  if (HAUtil.isHAEnabled(conf)) {
    RMFailoverProxyProvider<T> provider =
        instance.createRMFailoverProxyProvider(conf, protocol);
    return (T) RetryProxy.create(protocol, provider, retryPolicy);
  } else {
    InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
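The retry-wrapping idea behind RetryProxy.create can be sketched with a plain JDK dynamic proxy. The RMClient interface, the withRetry helper, and the fixed-attempt loop below are illustrative stand-ins, not YARN's actual classes (Hadoop's RetryPolicy machinery is far richer):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RetrySketch {
    // hypothetical protocol interface standing in for ApplicationClientProtocol
    interface RMClient {
        String getNewApplication();
    }

    // wrap the real target in a dynamic proxy that retries failed calls
    static RMClient withRetry(RMClient target, int maxAttempts) {
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (Exception e) {
                    // reflective calls wrap the real failure; unwrap it
                    last = (e.getCause() == null) ? e : e.getCause();
                }
            }
            throw last; // all attempts exhausted
        };
        return (RMClient) Proxy.newProxyInstance(
                RMClient.class.getClassLoader(),
                new Class<?>[] { RMClient.class }, handler);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // fails twice, then succeeds - the proxy hides the transient failures
        RMClient flaky = () -> {
            if (calls[0]++ < 2) throw new RuntimeException("RM not ready");
            return "application_1_0001";
        };
        System.out.println(withRetry(flaky, 5).getNewApplication());
    }
}
```

The HA branch in the real code swaps the single target for a failover provider, but the caller still sees one plain interface, which is the point of the proxy design.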
I won't go further into the details here; because a YarnConfiguration is in play, the implementation class that eventually gets loaded is ApplicationClientProtocolPBClientImpl:
@Override
public GetNewApplicationResponse getNewApplication(
    GetNewApplicationRequest request) throws YarnException, IOException {
  GetNewApplicationRequestProto requestProto =
      ((GetNewApplicationRequestPBImpl) request).getProto();
  try {
    return new GetNewApplicationResponsePBImpl(
        proxy.getNewApplication(null, requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
Here we can see the request being serialized, using the Protocol Buffers mechanism.
That wraps up the client side; now let's look at how the server handles the corresponding call:
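The `(GetNewApplicationRequestPBImpl) request).getProto()` cast illustrates the *PBImpl wrapper pattern: the API-level record object carries a protobuf message underneath, and getProto() hands back the wire form. A minimal sketch of that shape, with the generated protobuf class simulated by a plain String (both classes below are illustrative shells, not Hadoop's real ones):

```java
public class PBWrapperSketch {
    // stand-in for the abstract API-level record type
    interface GetNewApplicationRequest { }

    // stand-in for the PB-backed implementation the RPC layer expects
    static class GetNewApplicationRequestPBImpl
            implements GetNewApplicationRequest {
        private final String proto; // simulates GetNewApplicationRequestProto

        GetNewApplicationRequestPBImpl(String proto) {
            this.proto = proto;
        }

        String getProto() {
            return proto; // the serialized form that crosses the wire
        }
    }

    public static void main(String[] args) {
        GetNewApplicationRequest request =
                new GetNewApplicationRequestPBImpl("serialized-bytes");
        // the client stub downcasts to the PB impl to reach the wire form,
        // exactly as ApplicationClientProtocolPBClientImpl does above
        String wire = ((GetNewApplicationRequestPBImpl) request).getProto();
        System.out.println(wire);
    }
}
```

Callers program against the abstract record; only the RPC stub ever touches the protobuf representation.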
/**
 * The client interface to the Resource Manager. This module handles all the rpc
 * interfaces to the resource manager from the client.
 */
public class ClientRMService extends AbstractService implements
    ApplicationClientProtocol
This class lives under org.apache.hadoop.yarn.server.resourcemanager and is where the ResourceManager's client-facing handling resides. Let's look at how it processes getNewApplication:
@Override
public GetNewApplicationResponse getNewApplication(
    GetNewApplicationRequest request) throws YarnException {
  GetNewApplicationResponse response = recordFactory
      .newRecordInstance(GetNewApplicationResponse.class);
  response.setApplicationId(getNewApplicationId());
  // Pick up min/max resource from scheduler...
  response.setMaximumResourceCapability(scheduler
      .getMaximumResourceCapability());
  return response;
}
Now look at the getNewApplicationId method:
ApplicationId getNewApplicationId() {
  ApplicationId applicationId = org.apache.hadoop.yarn.server.utils.BuilderUtils
      .newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),
          applicationCounter.incrementAndGet());
  LOG.info("Allocated new applicationId: " + applicationId.getId());
  return applicationId;
}
This is how a newly created Application obtains its id.
/**
 * <p><code>ApplicationId</code> represents the <em>globally unique</em>
 * identifier for an application.</p>
 *
 * <p>The globally unique nature of the identifier is achieved by using the
 * <em>cluster timestamp</em> i.e. start-time of the
 * <code>ResourceManager</code> along with a monotonically increasing counter
 * for the application.</p>
 */
@Public
@Stable
public abstract class ApplicationId implements Comparable<ApplicationId>
Here is how ApplicationId builds a globally unique identifier:
@Private
@Unstable
public static ApplicationId newInstance(long clusterTimestamp, int id) {
  ApplicationId appId = Records.newRecord(ApplicationId.class);
  appId.setClusterTimestamp(clusterTimestamp);
  appId.setId(id);
  appId.build();
  return appId;
}
The logic is simple; what matters is that it again relies on Protocol Buffers, which is what the build method is about. This is a hallmark of the 2.x line: serialization uses Google's Protocol Buffers.
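To make the uniqueness argument concrete, here is a minimal sketch. The class and its constants are hypothetical; what it mirrors from YARN is the recipe of cluster timestamp (RM start time) plus an atomic counter, and the familiar `application_<clusterTimestamp>_<sequence>` rendering:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AppIdSketch {
    // assumed RM start time; in YARN this is ResourceManager.getClusterTimeStamp()
    static final long CLUSTER_TIMESTAMP = 1700000000000L;
    // mirrors ClientRMService's applicationCounter
    static final AtomicInteger COUNTER = new AtomicInteger(0);

    // YARN renders ids as application_<clusterTimestamp>_<zero-padded sequence>
    static String formatAppId(long clusterTimestamp, int id) {
        return String.format("application_%d_%04d", clusterTimestamp, id);
    }

    static String newAppId() {
        return formatAppId(CLUSTER_TIMESTAMP, COUNTER.incrementAndGet());
    }

    public static void main(String[] args) {
        // two submissions within one RM lifetime differ only in the counter
        System.out.println(newAppId()); // application_1700000000000_0001
        System.out.println(newAppId()); // application_1700000000000_0002
    }
}
```

The timestamp disambiguates across RM restarts (the counter resets, but the timestamp changes), while the counter disambiguates within a single RM lifetime - together they give global uniqueness.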
Once this completes, the client holds a globally unique ApplicationId. That concludes the JobClient's interaction with the ResourceManager for obtaining an id; next, let's return to the submitJobInternal logic in JobSubmitter.
A new question arises: the JobClient now has an ApplicationId, but how does the ResourceManager side keep this application going? How does it continue requesting resources and keep the job running until completion?
Let's find an entry point and work through it.
status = submitClient.submitJob(
    jobId, submitJobDir.toString(), job.getCredentials());
After obtaining the unique ApplicationId, the next step hands the submission directory over to the ResourceManager so it can carry on executing the job. Let's look at this method:
/**
 * Submit a Job for execution. Returns the latest profile for
 * that job.
 */
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException;
The javadoc is straightforward: submit the job directory, the job id, and the credentials to the RM. First, the client-side handling:
@Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException {
  addHistoryToken(ts);

  // Construct necessary information to start the MR AM
  ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

  // Submit to ResourceManager
  try {
    ApplicationId applicationId =
        resMgrDelegate.submitApplication(appContext);

    ApplicationReport appMaster =
        resMgrDelegate.getApplicationReport(applicationId);
    String diagnostics = (appMaster == null
        ? "application report is null" : appMaster.getDiagnostics());
    if (appMaster == null
        || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
        || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
      throw new IOException("Failed to run job : " + diagnostics);
    }
    return clientCache.getClient(jobId).getJobStatus(jobId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
This method matters. Let's start with createApplicationSubmissionContext, which lives in YARNRunner and contains a lot of code; it is worth reading in full, but here are the important pieces.
A quick aside first: the initial submission that obtained the ApplicationId only sent a very minimal ApplicationSubmissionContext, essentially telling the ResourceManager "I have a new job to submit". This time the job is actually about to run, so the content uploaded is far more detailed.
capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB,
    MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES,
    MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));
These two settings configure the memory and the number of cores for the ApplicationMaster. If specified on the command line, they are written into conf and picked up here.
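The getInt(key, default) fallback pattern is what makes command-line overrides work. A minimal sketch (ConfSketch is a hypothetical stand-in for Hadoop's Configuration; the key and the default of 1536 MB are taken from MRJobConfig.MR_AM_VMEM_MB / DEFAULT_MR_AM_VMEM_MB as I understand them):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSketch {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    // return the configured value, or the caller-supplied default if unset
    int getInt(String key, int defaultValue) {
        String v = props.get(key);
        return (v == null) ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        // nothing set on the command line: the default wins
        System.out.println(
            conf.getInt("yarn.app.mapreduce.am.resource.mb", 1536)); // 1536
        // e.g. -Dyarn.app.mapreduce.am.resource.mb=2048 writes into conf
        conf.set("yarn.app.mapreduce.am.resource.mb", "2048");
        System.out.println(
            conf.getInt("yarn.app.mapreduce.am.resource.mb", 1536)); // 2048
    }
}
```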
Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);
This is the working directory for the submitted files; its origin can be traced back to the getStagingDir method in JobSubmissionFiles:
/**
 * Initializes the staging directory and returns the path. It also
 * keeps track of all necessary ownership & permissions
 * @param cluster
 * @param conf
 */
public static Path getStagingDir(Cluster cluster, Configuration conf)
    throws IOException, InterruptedException {
The remaining logic can be followed in JobSubmitter; roughly, it resolves the directory that this submission's files should go to.
localResources.put(MRJobConfig.JOB_CONF_FILE,
    createApplicationResource(defaultFileContext, jobConfPath,
        LocalResourceType.FILE));
The working directory typically lives on HDFS, and entries are loaded into localResources, which is a Map; nothing remarkable there.
// Setup the command to run the AM
List<String> vargs = new ArrayList<String>(8);
vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME)
    + "/bin/java");

// TODO: why do we use 'conf' some places and 'jobConf' others?
long logSize = jobConf.getLong(MRJobConfig.MR_AM_LOG_KB,
    MRJobConfig.DEFAULT_MR_AM_LOG_KB) << 10;
String logLevel = jobConf.get(
    MRJobConfig.MR_AM_LOG_LEVEL, MRJobConfig.DEFAULT_MR_AM_LOG_LEVEL);
int numBackups = jobConf.getInt(MRJobConfig.MR_AM_LOG_BACKUPS,
    MRJobConfig.DEFAULT_MR_AM_LOG_BACKUPS);
MRApps.addLog4jSystemProperties(logLevel, logSize, numBackups, vargs, conf);

// Check for Java Lib Path usage in MAP and REDUCE configs
warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map",
    MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""),
    "map", MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
    MRJobConfig.MAPRED_ADMIN_USER_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""), "reduce",
    MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""),
    "reduce", MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
    MRJobConfig.MAPRED_ADMIN_USER_ENV);

// Add AM admin command opts before user command opts
// so that it can be overridden by user
String mrAppMasterAdminOptions = conf.get(
    MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
    MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterAdminOptions, "app master",
    MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.MR_AM_ADMIN_USER_ENV);
vargs.add(mrAppMasterAdminOptions);

// Add AM user command opts
String mrAppMasterUserOptions = conf.get(MRJobConfig.MR_AM_COMMAND_OPTS,
    MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterUserOptions, "app master",
    MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
vargs.add(mrAppMasterUserOptions);
This block uses conf and the job's own jobConf to assemble the launch parameters for the ApplicationMaster. One line in particular stands out:
vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
The value added here is org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
This ties back to the ApplicationMaster introduced in part one of the series: every submitted job gets its own ApplicationMaster. When the program starts, you may notice this line:
LOG.debug("Command to launch container for ApplicationMaster is : " + mergedCommand);
The log prints the command being executed.
ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
    localResources, environment, vargsFinal, null, securityTokens, acls);
This deserves a close look, both the class and the method:
/**
 * <p><code>ContainerLaunchContext</code> represents all of the information
 * needed by the <code>NodeManager</code> to launch a container.</p>
 *
 * <p>It includes details such as:
 * <ul>
 *   <li>{@link ContainerId} of the container.</li>
 *   <li>{@link Resource} allocated to the container.</li>
 *   <li>User to whom the container is allocated.</li>
 *   <li>Security tokens (if security is enabled).</li>
 *   <li>{@link LocalResource} necessary for running the container such
 *       as binaries, jar, shared-objects, side-files etc.</li>
 *   <li>Optional, application-specific binary service data.</li>
 *   <li>Environment variables for the launched process.</li>
 *   <li>Command to launch the container.</li>
 * </ul>
 * </p>
 *
 * @see ContainerManagementProtocol#startContainers(org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest)
 */
@Public
@Stable
public abstract class ContainerLaunchContext
As the class comment says, it carries all the information the NodeManager needs to launch a container, in our case the container that starts the ApplicationMaster.
Reading this code also makes the concept of dynamically carved-out Containers easier to grasp: a Container's resources are occupied by starting a process inside it, and here that process is the ApplicationMaster.
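The shape of that launch information can be sketched as a plain value class. The field names echo the parameters passed to ContainerLaunchContext.newInstance above, but the class itself, and all paths and values in it, are illustrative:

```java
import java.util.List;
import java.util.Map;

public class LaunchContextSketch {
    final Map<String, String> localResources; // resource name -> location
    final Map<String, String> environment;    // env vars for the process
    final List<String> commands;              // command line to launch

    LaunchContextSketch(Map<String, String> localResources,
                        Map<String, String> environment,
                        List<String> commands) {
        this.localResources = localResources;
        this.environment = environment;
        this.commands = commands;
    }

    public static void main(String[] args) {
        // hypothetical AM launch context: job.xml from the staging dir,
        // JAVA_HOME in the env, and the MRAppMaster main class to run
        LaunchContextSketch ctx = new LaunchContextSketch(
                Map.of("job.xml", "hdfs:///staging/job_0001/job.xml"),
                Map.of("JAVA_HOME", "/usr/lib/jvm/default"),
                List.of("$JAVA_HOME/bin/java",
                        "org.apache.hadoop.mapreduce.v2.app.MRAppMaster"));
        System.out.println(ctx.commands.get(1));
    }
}
```

The NodeManager's job is then mechanical: localize the resources, export the environment, and exec the command, which is why a single context object is enough to describe any container, AM or task alike.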
Finally, we have assembled the ApplicationMaster's launch parameters, defined a Container for it, and returned an ApplicationSubmissionContext.
This ApplicationSubmissionContext is then submitted to the RM over RPC:
@Override
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException, IOException {
  SubmitApplicationRequestProto requestProto =
      ((SubmitApplicationRequestPBImpl) request).getProto();
  try {
    return new SubmitApplicationResponsePBImpl(
        proxy.submitApplication(null, requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
The code is clear; now let's see how the RM side handles this ApplicationSubmissionContext.
@Override
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException {
  ApplicationSubmissionContext submissionContext = request
      .getApplicationSubmissionContext();
  ApplicationId applicationId = submissionContext.getApplicationId();

  // ApplicationSubmissionContext needs to be validated for safety - only
  // those fields that are independent of the RM's configuration will be
  // checked here, those that are dependent on RM configuration are validated
  // in RMAppManager.

  String user = null;
  try {
    // Safety
    user = UserGroupInformation.getCurrentUser().getShortUserName();
  } catch (IOException ie) {
    LOG.warn("Unable to get the current user.", ie);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        ie.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw RPCUtil.getRemoteException(ie);
  }

  // Check whether app has already been put into rmContext,
  // If it is, simply return the response
  if (rmContext.getRMApps().get(applicationId) != null) {
    LOG.info("This is an earlier submitted application: " + applicationId);
    return SubmitApplicationResponse.newInstance();
  }

  if (submissionContext.getQueue() == null) {
    submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
  }
  if (submissionContext.getApplicationName() == null) {
    submissionContext.setApplicationName(
        YarnConfiguration.DEFAULT_APPLICATION_NAME);
  }
  if (submissionContext.getApplicationType() == null) {
    submissionContext
        .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
  } else {
    if (submissionContext.getApplicationType().length() >
        YarnConfiguration.APPLICATION_TYPE_LENGTH) {
      submissionContext.setApplicationType(submissionContext
          .getApplicationType().substring(0,
              YarnConfiguration.APPLICATION_TYPE_LENGTH));
    }
  }

  try {
    // call RMAppManager to submit application directly
    rmAppManager.submitApplication(submissionContext,
        System.currentTimeMillis(), user);

    LOG.info("Application with id " + applicationId.getId()
        + " submitted by user " + user);
    RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
        "ClientRMService", applicationId);
  } catch (YarnException e) {
    LOG.info("Exception in submitting application with id "
        + applicationId.getId(), e);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        e.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw e;
  }

  SubmitApplicationResponse response = recordFactory
      .newRecordInstance(SubmitApplicationResponse.class);
  return response;
}
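The defaulting and truncation at the heart of this validation can be distilled into a few lines. The constants below are assumed to match YarnConfiguration's values (DEFAULT_APPLICATION_TYPE = "YARN", APPLICATION_TYPE_LENGTH = 20); the class itself is just a sketch:

```java
public class SubmissionDefaults {
    // assumed value of YarnConfiguration.APPLICATION_TYPE_LENGTH
    static final int APPLICATION_TYPE_LENGTH = 20;
    // assumed value of YarnConfiguration.DEFAULT_APPLICATION_TYPE
    static final String DEFAULT_APPLICATION_TYPE = "YARN";

    // null -> default type; over-long -> truncated, mirroring the
    // applicationType branch of submitApplication above
    static String normalizeType(String type) {
        if (type == null) {
            return DEFAULT_APPLICATION_TYPE;
        }
        return (type.length() > APPLICATION_TYPE_LENGTH)
                ? type.substring(0, APPLICATION_TYPE_LENGTH)
                : type;
    }

    public static void main(String[] args) {
        System.out.println(normalizeType(null));        // YARN
        System.out.println(normalizeType("MAPREDUCE")); // MAPREDUCE
    }
}
```

Note that only fields independent of RM configuration are normalized here; everything configuration-dependent is deferred to RMAppManager, exactly as the source comment says.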
After this series of checks, attention shifts to the rmAppManager submission call.
RMAppManager keeps a record of every Application submitted to the RM.
As for how the RM talks to the NM and gets the ApplicationMaster's Container launched on a NodeManager - that is for the next article.