On the YARN Source Code (Part 3)

Continuing from Part 2 of this series.

With ApplicationSubmissionContext covered, let's keep going:

@Override
  public YarnClientApplication createApplication()
      throws YarnException, IOException {
    ApplicationSubmissionContext context = Records.newRecord
        (ApplicationSubmissionContext.class);
    GetNewApplicationResponse newApp = getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    context.setApplicationId(appId);
    return new YarnClientApplication(newApp, context);
  }

There is actually a question worth exploring in more depth here, namely how the newRecord method works. I won't go into it in this post; it deserves an article of its own.

First, the getNewApplication method:

private GetNewApplicationResponse getNewApplication()
      throws YarnException, IOException {
    GetNewApplicationRequest request =
        Records.newRecord(GetNewApplicationRequest.class);
    return rmClient.getNewApplication(request);
  }

Sure enough, as expected, it is built on top of the RPC connection that has already been established.

Let me fill in a detail here: what is the concrete implementation behind rmClient? ApplicationClientProtocol has many implementing classes, so which one actually does the work when we submit a job?

/**
	 * Delegate responsible for communicating with the Resource Manager's
	 * {@link ApplicationClientProtocol}.
	 * 
	 * @param conf
	 *            the configuration object.
	 */
	public ResourceMgrDelegate(YarnConfiguration conf) {
		super(ResourceMgrDelegate.class.getName());
		this.conf = conf;
		this.client = YarnClient.createYarnClient();
		init(conf);
		start();
	}

Look closely at this constructor: what gets passed in is actually a YarnConfiguration, not the plain Configuration we used earlier. Open YarnConfiguration and you will find it packed with YARN-related configuration keys.
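As a side note, here is a minimal hedged sketch (standard YarnConfiguration usage, not code from ResourceMgrDelegate) of what wrapping a plain Configuration in a YarnConfiguration buys you:

// Illustrative only: YarnConfiguration layers yarn-default.xml / yarn-site.xml
// on top of an ordinary Configuration, so YARN keys resolve.
Configuration base = new Configuration();
YarnConfiguration yarnConf = new YarnConfiguration(base);
String rmAddress = yarnConf.get(YarnConfiguration.RM_ADDRESS,
    YarnConfiguration.DEFAULT_RM_ADDRESS);   // "0.0.0.0:8032" unless overridden

Back in ResourceMgrDelegate, the RM client proxy is created when the service starts: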

@Override
  protected void serviceStart() throws Exception {
    try {
      rmClient = ClientRMProxy.createRMProxy(getConfig(),
          ApplicationClientProtocol.class);
      if (historyServiceEnabled) {
        historyClient.start();
      }
      if (timelineServiceEnabled) {
        timelineClient.start();
      }
    } catch (IOException e) {
      throw new YarnRuntimeException(e);
    }
    super.serviceStart();
  }

Here too, the configuration handed over at service start time is a YarnConfiguration. Now look at the createRMProxy method:

/**
   * Create a proxy for the specified protocol. For non-HA,
   * this is a direct connection to the ResourceManager address. When HA is
   * enabled, the proxy handles the failover between the ResourceManagers as
   * well.
   */
  @Private
  protected static <T> T createRMProxy(final Configuration configuration,
      final Class<T> protocol, RMProxy instance) throws IOException {
    YarnConfiguration conf = (configuration instanceof YarnConfiguration)
        ? (YarnConfiguration) configuration
        : new YarnConfiguration(configuration);
    RetryPolicy retryPolicy = createRetryPolicy(conf);
    if (HAUtil.isHAEnabled(conf)) {
      RMFailoverProxyProvider<T> provider =
          instance.createRMFailoverProxyProvider(conf, protocol);
      return (T) RetryProxy.create(protocol, provider, retryPolicy);
    } else {
      InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
      LOG.info("Connecting to ResourceManager at " + rmAddress);
      T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
      return (T) RetryProxy.create(protocol, proxy, retryPolicy);
    }
  }

I won't go further into the lower-level details here. Thanks to the YarnConfiguration that is in play, the implementation class that ultimately gets loaded is ApplicationClientProtocolPBClientImpl. Its getNewApplication looks like this:

@Override
  public GetNewApplicationResponse getNewApplication(
      GetNewApplicationRequest request) throws YarnException,
      IOException {
    GetNewApplicationRequestProto requestProto =
        ((GetNewApplicationRequestPBImpl) request).getProto();
    try {
      return new GetNewApplicationResponsePBImpl(proxy.getNewApplication(null,
        requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }

Here we can see the request being serialized, using the Protocol Buffers serialization mechanism.
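For orientation, here is a short hedged sketch of how ordinary application code reaches this whole chain through the public YarnClient API (standard API usage, not an excerpt from the classes above):

// YarnClient is the user-facing entry point; createApplication() is where
// this walkthrough started, and it ends up in rmClient.getNewApplication(...).
YarnConfiguration conf = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();
YarnClientApplication app = yarnClient.createApplication();
ApplicationId appId = app.getApplicationSubmissionContext().getApplicationId();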

With that, the client-side code comes to a pause. Let's see how the server handles the corresponding call:

/**
 * The client interface to the Resource Manager. This module handles all the rpc
 * interfaces to the resource manager from the client.
 */
public class ClientRMService extends AbstractService implements
    ApplicationClientProtocol 

This class lives under org.apache.hadoop.yarn.server.resourcemanager and is where the ResourceManager's handling logic resides. Let's look at how it processes getNewApplication.

@Override
  public GetNewApplicationResponse getNewApplication(
      GetNewApplicationRequest request) throws YarnException {
    GetNewApplicationResponse response = recordFactory
        .newRecordInstance(GetNewApplicationResponse.class);
    response.setApplicationId(getNewApplicationId());
    // Pick up min/max resource from scheduler...
    response.setMaximumResourceCapability(scheduler
        .getMaximumResourceCapability());       
    
    return response;
  }

Now for the getNewApplicationId method:

 ApplicationId getNewApplicationId() {
    ApplicationId applicationId = org.apache.hadoop.yarn.server.utils.BuilderUtils
        .newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),
            applicationCounter.incrementAndGet());
    LOG.info("Allocated new applicationId: " + applicationId.getId());
    return applicationId;
  }
With that, a newly created application obtains a fresh id.

/**
 * <p><code>ApplicationId</code> represents the <em>globally unique</em> 
 * identifier for an application.</p>
 * 
 * <p>The globally unique nature of the identifier is achieved by using the 
 * <em>cluster timestamp</em> i.e. start-time of the 
 * <code>ResourceManager</code> along with a monotonically increasing counter
 * for the application.</p>
 */
@Public
@Stable
public abstract class ApplicationId implements Comparable<ApplicationId> 

Here we can look at how ApplicationId is generated, that is, how a globally unique ApplicationId comes about:

 @Private
  @Unstable
  public static ApplicationId newInstance(long clusterTimestamp, int id) {
    ApplicationId appId = Records.newRecord(ApplicationId.class);
    appId.setClusterTimestamp(clusterTimestamp);
    appId.setId(id);
    appId.build();
    return appId;
  }

The logic is simple. What matters is that it again relies on the Protocol Buffers machinery, namely the build() call above. This is a hallmark of the 2.x releases: serialization is done with Google's Protocol Buffers.
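To make that concrete, here is a small hedged illustration of what the generated id looks like when printed (the timestamp value is made up):

// toString() yields the familiar form application_<clusterTimestamp>_<sequence>,
// e.g. application_1526450000000_0001 for the values below.
ApplicationId id = ApplicationId.newInstance(1526450000000L, 1);
System.out.println(id);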

Once this completes, the client holds a globally unique ApplicationId. Next we return to the submitJobInternal logic in JobSubmitter.

That covers the interaction between the JobClient and the ResourceManager so far. So what comes next?

A new question arises: the JobClient now has an ApplicationId, but how does the ResourceManager side keep this application going? How does it continue to request resources and keep the services running until execution finishes?

Let's find an entry point and work through it.

 status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());

After obtaining the unique ApplicationId, we find the next piece of logic: the submit directory is handed over to the ResourceManager so the program can keep running. Let's look at this method:

/**
   * Submit a Job for execution.  Returns the latest profile for
   * that job.
   */
  public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
      throws IOException, InterruptedException;

The Javadoc is brief: submit the job directory, the job itself, and the credentials to the RM. Let's first look at the client-side handling:

@Override
	public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
			throws IOException, InterruptedException {

		addHistoryToken(ts);

		// Construct necessary information to start the MR AM
		ApplicationSubmissionContext appContext = createApplicationSubmissionContext(
				conf, jobSubmitDir, ts);

		// Submit to ResourceManager
		try {
			ApplicationId applicationId = resMgrDelegate
					.submitApplication(appContext);
			ApplicationReport appMaster = resMgrDelegate
					.getApplicationReport(applicationId);
			String diagnostics = (appMaster == null ? "application report is null"
					: appMaster.getDiagnostics());
			if (appMaster == null
					|| appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
					|| appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
				throw new IOException("Failed to run job : " + diagnostics);
			}
			return clientCache.getClient(jobId).getJobStatus(jobId);
		} catch (YarnException e) {
			throw new IOException(e);
		}
	}

This method matters, so let's start with createApplicationSubmissionContext. It contains a lot of code and lives in YARNRunner; read it in full if you like, but here I will pick out the important parts:

A quick aside: in the first submission, the one that fetched the ApplicationId, only a very bare ApplicationSubmissionContext was sent, essentially telling the ResourceManager that a new application is about to be submitted. This time the job is really about to run, so the content that has to be uploaded is much more detailed.

capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB,
				MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
		capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES,
				MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));

Here are two settings, the memory and the number of cores required by the AM. If they are specified on the command line, they are written into the conf and read back at this point.
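As a hedged illustration, these are the knobs a user could set before submission; the constants map to yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.resource.cpu-vcores, and the values here are just examples:

conf.setInt(MRJobConfig.MR_AM_VMEM_MB, 2048);    // AM container memory, in MB
conf.setInt(MRJobConfig.MR_AM_CPU_VCORES, 2);    // AM container virtual cores

The next line of interest is the path of the job configuration file: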

Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);

This is the working directory for the submitted files. Its origin can be traced back to the getStagingDir method in JobSubmissionFiles:

/**
   * Initializes the staging directory and returns the path. It also
   * keeps track of all necessary ownership & permissions
   * @param cluster
   * @param conf
   */
  public static Path getStagingDir(Cluster cluster, Configuration conf) 
  throws IOException,InterruptedException {

The remaining logic can be followed in JobSubmitter; roughly speaking, it resolves the path of the directory that holds the files for this submission.
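As a hedged aside, the root of that staging area normally comes from a single setting, with per-user and per-job subdirectories appended by the submission code:

// yarn.app.mapreduce.am.staging-dir; defaults to /tmp/hadoop-yarn/staging
String stagingRoot = conf.get(MRJobConfig.MR_AM_STAGING_DIR,
    MRJobConfig.DEFAULT_MR_AM_STAGING_DIR);
Path stagingRootDir = new Path(stagingRoot);

Back in YARNRunner, the job configuration file is registered as a local resource: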

localResources.put(
				MRJobConfig.JOB_CONF_FILE,
				createApplicationResource(defaultFileContext, jobConfPath,
						LocalResourceType.FILE));

Typically the working directory is on HDFS, and its contents are loaded into localResources, which is a Map; nothing surprising there.
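As a hedged sketch (not YARNRunner's exact createApplicationResource helper), turning a file on HDFS into a LocalResource entry looks roughly like this:

FileStatus stat = fs.getFileStatus(jobConfPath);         // fs: the job's FileSystem
LocalResource rsrc = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(jobConfPath),      // where the NM downloads from
    LocalResourceType.FILE,                              // a plain file, not an archive
    LocalResourceVisibility.APPLICATION,                 // visible only to this application
    stat.getLen(), stat.getModificationTime());          // used to validate the localized copy
localResources.put(MRJobConfig.JOB_CONF_FILE, rsrc);

Next comes the construction of the AM launch command: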

// Setup the command to run the AM
		List<String> vargs = new ArrayList<String>(8);
		vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME)
				+ "/bin/java");

		// TODO: why do we use 'conf' some places and 'jobConf' others?
		long logSize = jobConf.getLong(MRJobConfig.MR_AM_LOG_KB,
				MRJobConfig.DEFAULT_MR_AM_LOG_KB) << 10;
		String logLevel = jobConf.get(MRJobConfig.MR_AM_LOG_LEVEL,
				MRJobConfig.DEFAULT_MR_AM_LOG_LEVEL);
		int numBackups = jobConf.getInt(MRJobConfig.MR_AM_LOG_BACKUPS,
				MRJobConfig.DEFAULT_MR_AM_LOG_BACKUPS);
		MRApps.addLog4jSystemProperties(logLevel, logSize, numBackups, vargs,
				conf);

		// Check for Java Lib Path usage in MAP and REDUCE configs
		warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map",
				MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
		warnForJavaLibPath(
				conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""), "map",
				MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
				MRJobConfig.MAPRED_ADMIN_USER_ENV);
		warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""),
				"reduce", MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
		warnForJavaLibPath(
				conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""),
				"reduce", MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
				MRJobConfig.MAPRED_ADMIN_USER_ENV);

		// Add AM admin command opts before user command opts
		// so that it can be overridden by user
		String mrAppMasterAdminOptions = conf.get(
				MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
				MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
		warnForJavaLibPath(mrAppMasterAdminOptions, "app master",
				MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
				MRJobConfig.MR_AM_ADMIN_USER_ENV);
		vargs.add(mrAppMasterAdminOptions);

		// Add AM user command opts
		String mrAppMasterUserOptions = conf.get(
				MRJobConfig.MR_AM_COMMAND_OPTS,
				MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
		warnForJavaLibPath(mrAppMasterUserOptions, "app master",
				MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
		vargs.add(mrAppMasterUserOptions);

This block uses both the cluster conf and the job's own conf to set up the parameters for launching the ApplicationMaster. One line stands out:

		vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);

The class added here is org.apache.hadoop.mapreduce.v2.app.MRAppMaster.

This brings us to the ApplicationMaster mentioned in Part 1 of this series: every submitted job has its own ApplicationMaster. When the program starts you may notice this statement:

LOG.debug("Command to launch container for ApplicationMaster is : "
				+ mergedCommand);

The command to be executed is printed in the log.
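Here is a condensed, hedged sketch of how the vargs list assembled above ends up as that single command string (the real YARNRunner code is slightly longer):

Vector<String> vargsFinal = new Vector<String>(8);
StringBuilder mergedCommand = new StringBuilder();
for (CharSequence str : vargs) {
  mergedCommand.append(str).append(" ");   // join the java binary, JVM opts and main class
}
vargsFinal.add(mergedCommand.toString());  // becomes the command in the ContainerLaunchContext below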

ContainerLaunchContext amContainer = ContainerLaunchContext
				.newInstance(localResources, environment, vargsFinal, null,
						securityTokens, acls);

This deserves a careful look. We need to examine this class and this method:

/**
 * <p><code>ContainerLaunchContext</code> represents all of the information
 * needed by the <code>NodeManager</code> to launch a container.</p>
 * 
 * <p>It includes details such as:
 *   <ul>
 *     <li>{@link ContainerId} of the container.</li>
 *     <li>{@link Resource} allocated to the container.</li>
 *     <li>User to whom the container is allocated.</li>
 *     <li>Security tokens (if security is enabled).</li>
 *     <li>
 *       {@link LocalResource} necessary for running the container such
 *       as binaries, jar, shared-objects, side-files etc. 
 *     </li>
 *     <li>Optional, application-specific binary service data.</li>
 *     <li>Environment variables for the launched process.</li>
 *     <li>Command to launch the container.</li>
 *   </ul>
 * </p>
 * 
 * @see ContainerManagementProtocol#startContainers(org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest)
 */
@Public
@Stable
public abstract class ContainerLaunchContext

As the class comment says, it contains all the information the NodeManager needs in order to provide a container and launch the ApplicationMaster inside it.

Reading this code makes the idea of dynamic container allocation easier to grasp: a container's resources are actually occupied by starting a process in it, and here that process is the ApplicationMaster we launch.

Finally, we have set up the parameters for launching our ApplicationMaster, defined a container for it, and returned an ApplicationSubmissionContext.
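As a condensed, hedged sketch of that assembly (the real createApplicationSubmissionContext sets more fields than shown here):

ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(applicationId);   // the id obtained from the RM earlier
appContext.setQueue(jobConf.get(MRJobConfig.QUEUE_NAME,
    YarnConfiguration.DEFAULT_QUEUE_NAME));
appContext.setApplicationName(jobConf.get(MRJobConfig.JOB_NAME,
    YarnConfiguration.DEFAULT_APPLICATION_NAME));
appContext.setAMContainerSpec(amContainer);   // the ContainerLaunchContext built above
appContext.setResource(capability);           // AM memory / vcores from earlier
appContext.setApplicationType(MRJobConfig.MR_APPLICATION_TYPE);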

This ApplicationSubmissionContext is then submitted to the RM over RPC.

@Override
  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException,
      IOException {
    SubmitApplicationRequestProto requestProto =
        ((SubmitApplicationRequestPBImpl) request).getProto();
    try {
      return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
        requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }

The code is clear. Now let's see how the RM side handles this ApplicationSubmissionContext.

@Override
  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException {
    ApplicationSubmissionContext submissionContext = request
        .getApplicationSubmissionContext();
    ApplicationId applicationId = submissionContext.getApplicationId();

    // ApplicationSubmissionContext needs to be validated for safety - only
    // those fields that are independent of the RM's configuration will be
    // checked here, those that are dependent on RM configuration are validated
    // in RMAppManager.

    String user = null;
    try {
      // Safety
      user = UserGroupInformation.getCurrentUser().getShortUserName();
    } catch (IOException ie) {
      LOG.warn("Unable to get the current user.", ie);
      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
          ie.getMessage(), "ClientRMService",
          "Exception in submitting application", applicationId);
      throw RPCUtil.getRemoteException(ie);
    }

    // Check whether app has already been put into rmContext,
    // If it is, simply return the response
    if (rmContext.getRMApps().get(applicationId) != null) {
      LOG.info("This is an earlier submitted application: " + applicationId);
      return SubmitApplicationResponse.newInstance();
    }

    if (submissionContext.getQueue() == null) {
      submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
    }
    if (submissionContext.getApplicationName() == null) {
      submissionContext.setApplicationName(
          YarnConfiguration.DEFAULT_APPLICATION_NAME);
    }
    if (submissionContext.getApplicationType() == null) {
      submissionContext
        .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
    } else {
      if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
        submissionContext.setApplicationType(submissionContext
          .getApplicationType().substring(0,
            YarnConfiguration.APPLICATION_TYPE_LENGTH));
      }
    }

    try {
      // call RMAppManager to submit application directly
      rmAppManager.submitApplication(submissionContext,
          System.currentTimeMillis(), user);

      LOG.info("Application with id " + applicationId.getId() + 
          " submitted by user " + user);
      RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
          "ClientRMService", applicationId);
    } catch (YarnException e) {
      LOG.info("Exception in submitting application with id " +
          applicationId.getId(), e);
      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
          e.getMessage(), "ClientRMService",
          "Exception in submitting application", applicationId);
      throw e;
    }

    SubmitApplicationResponse response = recordFactory
        .newRecordInstance(SubmitApplicationResponse.class);
    return response;
  }

After a series of checks, the interesting part is the rmAppManager.submitApplication call.

The RMAppManager here keeps a record of every application submitted to the RM.

How the RM then talks to the NodeManager and gets the ApplicationMaster's container launched on it will be covered in the next post.

Reposted from blog.csdn.net/u013384984/article/details/80241962