In this post, let's work out how a submitted MapReduce job actually gets up and running.

It will try to answer the questions raised in Part (1) of this series, and will raise some new ones along the way.
1: How exactly does a MapReduce program we submit end up running on the YARN framework?

Let's pin this down step by step.
First, we need to set mapreduce.framework.name to yarn in mapred-site.xml. This value will, without question, be used later while the job runs:
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
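For reference, client code picks this value up from a plain configuration object at runtime. A minimal sketch (the key string mirrors the MRConfig.FRAMEWORK_NAME constant we will meet later; the fallback to "local" matches the LocalClientProtocolProvider behavior discussed below):

```java
import org.apache.hadoop.mapred.JobConf;

public class FrameworkNameCheck {
  public static void main(String[] args) {
    // JobConf pulls in mapred-site.xml from the classpath in addition
    // to the core configuration files.
    JobConf conf = new JobConf();
    // "local" is the effective default when the key is unset.
    String framework = conf.get("mapreduce.framework.name", "local");
    System.out.println("mapreduce.framework.name = " + framework);
  }
}
```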
With the YARN resource-scheduling framework specified here, let's go find some code to study:
Tracing back to the source: even in Hadoop 2.x.x, jobs are still submitted via `hadoop jar`. So what does the hadoop shell script actually do?
elif [ "$COMMAND" = "jar" ] ; then CLASS=org.apache.hadoop.util.RunJar
Next, let's open the RunJar class and look at its main method:
Following the flow, we see that it uses reflection to invoke the main method of the Main-Class recorded in the jar's manifest. The code:
```java
/**
 * Run a Hadoop job jar. If the main class is not in the jar's manifest,
 * then it must be provided on the command line.
 */
public static void main(String[] args) throws Throwable {
  new RunJar().run(args);
}
```

```java
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
  mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();
```

```java
Thread.currentThread().setContextClassLoader(loader);
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
    Array.newInstance(String.class, 0).getClass() });
String[] newArgs = Arrays.asList(args).subList(firstArg, args.length)
    .toArray(new String[0]);
try {
  main.invoke(null, new Object[] { newArgs });
} catch (InvocationTargetException e) {
  throw e.getTargetException();
}
```
All of this lives inside RunJar, so I won't belabor it.
So, since what actually runs is a plain main method, let's take the mapreduce-examples programs as our template — WordCount in particular — and dig in.
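For context, here is a minimal driver, sketched after the standard hadoop-mapreduce-examples WordCount (the TokenizerMapper/IntSumReducer inner classes are elided, so this is an illustration rather than the exact shipped source):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    // The standard example sets its inner classes here:
    // job.setMapperClass(TokenizerMapper.class);
    // job.setCombinerClass(IntSumReducer.class);
    // job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```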
The line that matters to us is the final one:

```java
System.exit(job.waitForCompletion(true) ? 0 : 1);
```
Having seen this line, let's look at the waitForCompletion method:
```java
/**
 * Submit the job to the cluster and wait for it to finish.
 *
 * @param verbose print the progress to the user
 * @return true if the job succeeded
 * @throws IOException thrown if the communication with the
 *         <code>JobTracker</code> is lost
 */
public boolean waitForCompletion(boolean verbose)
    throws IOException, InterruptedException, ClassNotFoundException {
  if (state == JobState.DEFINE) {
    submit();
  }
  if (verbose) {
    monitorAndPrintJob();
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
        Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();
}
```
A quick aside: Hadoop 2.x.x is designed around state machines plus a service library. Every component carries its own well-defined state and transitions between states in response to events, which makes the code considerably easier to follow.
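To make the idea concrete, here is a deliberately toy sketch of that service-lifecycle pattern (the real thing is org.apache.hadoop.service.AbstractService, which we will meet shortly; everything below is simplified and hypothetical):

```java
// Toy version of the Hadoop service lifecycle: a service walks through a
// fixed set of states, and each transition calls a hook that subclasses
// override (mirroring serviceInit/serviceStart in AbstractService).
abstract class ToyService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final void init() throws Exception {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("cannot init from " + state);
    }
    serviceInit();   // subclass hook
    state = State.INITED;
  }

  public final void start() throws Exception {
    if (state != State.INITED) {
      throw new IllegalStateException("cannot start from " + state);
    }
    serviceStart();  // subclass hook
    state = State.STARTED;
  }

  protected abstract void serviceInit() throws Exception;
  protected abstract void serviceStart() throws Exception;
}
```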
From here, we clearly need to look at the submit method:
```java
/**
 * Submit the job to the cluster and return immediately.
 *
 * @throws IOException
 */
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run()
        throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
As the javadoc says, this submits the job and returns immediately, without waiting for completion. The important player here is the JobSubmitter, which performs the actual submission. Let's first see how it is created, in getJobSubmitter:
Following the chain, we land here:
```java
JobSubmitter(FileSystem submitFs, ClientProtocol submitClient)
    throws IOException {
  this.submitClient = submitClient;
  this.jtFs = submitFs;
}
```
So the JobSubmitter is built from a given FileSystem and a submitClient. When we submit a job, both come from the cluster, so we need to look at that cluster:
In fact, this Cluster comes from the Job itself — it is a member variable of Job, set when the Job is constructed:
```java
Job(JobConf conf) throws IOException {
  super(conf, null);
  // propagate existing user credentials to job
  this.credentials.mergeAll(this.ugi.getCredentials());
  this.cluster = null;
}
```
At construction time, then, cluster is simply null.

So when does it actually get assigned?

Look back at submit and note these two calls:
```java
setUseNewAPI();
connect();
```
Let's take them one at a time:
```java
/**
 * Default to the new APIs unless they are explicitly set or the old mapper
 * or reduce attributes are used.
 *
 * @throws IOException if the configuration is inconsistant
 */
private void setUseNewAPI() throws IOException {
  int numReduces = conf.getNumReduceTasks();
  String oldMapperClass = "mapred.mapper.class";
  String oldReduceClass = "mapred.reducer.class";
  conf.setBooleanIfUnset("mapred.mapper.new-api",
      conf.get(oldMapperClass) == null);
  if (conf.getUseNewMapper()) {
    String mode = "new map API";
    ensureNotSet("mapred.input.format.class", mode);
    ensureNotSet(oldMapperClass, mode);
    if (numReduces != 0) {
      ensureNotSet("mapred.partitioner.class", mode);
    } else {
      ensureNotSet("mapred.output.format.class", mode);
    }
  } else {
    String mode = "map compatability";
    ensureNotSet(INPUT_FORMAT_CLASS_ATTR, mode);
    ensureNotSet(MAP_CLASS_ATTR, mode);
    if (numReduces != 0) {
      ensureNotSet(PARTITIONER_CLASS_ATTR, mode);
    } else {
      ensureNotSet(OUTPUT_FORMAT_CLASS_ATTR, mode);
    }
  }
  if (numReduces != 0) {
    conf.setBooleanIfUnset("mapred.reducer.new-api",
        conf.get(oldReduceClass) == null);
    if (conf.getUseNewReducer()) {
      String mode = "new reduce API";
      ensureNotSet("mapred.output.format.class", mode);
      ensureNotSet(oldReduceClass, mode);
    } else {
      String mode = "reduce compatability";
      ensureNotSet(OUTPUT_FORMAT_CLASS_ATTR, mode);
      ensureNotSet(REDUCE_CLASS_ATTR, mode);
    }
  }
}
```
From this we can see that the new API is the default, and since our configuration sets none of the old attributes, the new API is what gets used. On to the connect method:
```java
private synchronized void connect()
    throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
      public Cluster run()
          throws IOException, InterruptedException, ClassNotFoundException {
        return new Cluster(getConfiguration());
      }
    });
  }
}
```
So connect builds a new Cluster from the Configuration (the single-argument Cluster constructor delegates to the two-argument one below, with a null jobTrackAddr). Let's look at how Cluster initializes:
```java
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  initialize(jobTrackAddr, conf);
}
```
The heart of it is the initialize method:
```java
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  synchronized (frameworkLoader) {
    for (ClientProtocolProvider provider : frameworkLoader) {
      LOG.debug("Trying ClientProtocolProvider : "
          + provider.getClass().getName());
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName()
            + " due to error: " + e.getMessage());
      }
    }
  }
  if (null == clientProtocolProvider || null == client) {
    throw new IOException(
        "Cannot initialize Cluster. Please check your configuration for "
            + MRConfig.FRAMEWORK_NAME
            + " and the correspond server addresses.");
  }
}
```
Our jobTrackAddr here is null, so each provider (of type ClientProtocolProvider) is asked to create the ClientProtocol via create(conf). Let's look at that create method.
The loop above keeps the first non-null clientProtocol a provider returns. Checking the source, ClientProtocolProvider has only two implementations:
- LocalClientProtocolProvider
- YarnClientProtocolProvider
Here is YarnClientProtocolProvider's create method:

```java
@Override
public ClientProtocol create(Configuration conf) throws IOException {
  if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
    return new YARNRunner(conf);
  }
  return null;
}
```
Both implementations follow the same pattern: they inspect MRConfig.FRAMEWORK_NAME in the conf and only return a protocol when it names their framework. So what is MRConfig? It lives in the org.apache.hadoop.mapreduce package:
```java
public static final String FRAMEWORK_NAME = "mapreduce.framework.name";
```
The class defines many of the constants MapReduce uses, and here is the one we care about. Everything snaps into focus: because we set mapreduce.framework.name=yarn, the clientProtocol we end up with is a YARNRunner, which submits the job to the YARN cluster on our behalf.
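One detail worth pausing on: how does initialize discover these providers at all? The frameworkLoader it iterates is a java.util.ServiceLoader, which instantiates implementations listed in META-INF/services files on the classpath. A minimal, hypothetical sketch of the same pattern (Provider here is a stand-in for ClientProtocolProvider, not a real Hadoop type):

```java
import java.util.ServiceLoader;

// Stand-in for ClientProtocolProvider: returns null when it does not
// handle the requested framework, mirroring the real create(conf).
interface Provider {
  String create(String frameworkName);
}

public class ServiceLoaderDemo {
  public static void main(String[] args) {
    // Implementations are listed, one class name per line, in a
    // classpath resource named META-INF/services/<interface FQN>;
    // ServiceLoader instantiates them lazily as we iterate.
    ServiceLoader<Provider> loader = ServiceLoader.load(Provider.class);
    for (Provider p : loader) {
      String protocol = p.create("yarn");
      if (protocol != null) {
        System.out.println("Picked " + p.getClass().getName());
        break; // first provider that accepts the framework wins
      }
    }
  }
}
```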
Onward with our exploration — here is Job's submit method once more:
```java
/**
 * Submit the job to the cluster and return immediately.
 *
 * @throws IOException
 */
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run()
        throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
With submit fresh in mind, the next stop is submitJobInternal. The method is long, so we'll pull out its more important pieces:
```java
JobID jobId = submitClient.getNewJobID();
```
This obtains the jobId for this submission. So what is submitClient? It is exactly our ClientProtocol — or rather its implementation here, YARNRunner. Let's see how it comes up with a JobID.
We glossed over something along the way: the initialization of YARNRunner. Here are its member variables and constructor:
```java
private final RecordFactory recordFactory =
    RecordFactoryProvider.getRecordFactory(null);
private ResourceMgrDelegate resMgrDelegate;
private ClientCache clientCache;
private Configuration conf;
private final FileContext defaultFileContext;

/**
 * Yarn runner incapsulates the client interface of yarn
 *
 * @param conf the configuration object for the client
 */
public YARNRunner(Configuration conf) {
  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
}
```
Focus on the constructor: it is handed a ResourceMgrDelegate. What is that? As the name suggests, it is essentially a client-side proxy for the ResourceManager.
Here is ResourceMgrDelegate's constructor:
```java
/**
 * Delegate responsible for communicating with the Resource Manager's
 * {@link ApplicationClientProtocol}.
 *
 * @param conf the configuration object.
 */
public ResourceMgrDelegate(YarnConfiguration conf) {
  super(ResourceMgrDelegate.class.getName());
  this.conf = conf;
  this.client = YarnClient.createYarnClient();
  init(conf);
  start();
}
```
Clearly, three calls matter here: createYarnClient, init, and start. In order:
```java
/**
 * Create a new instance of YarnClient.
 */
@Public
public static YarnClient createYarnClient() {
  YarnClient client = new YarnClientImpl();
  return client;
}
```
This returns a YarnClientImpl, so the YarnClient held by ResourceMgrDelegate is in fact a YarnClientImpl.
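As an aside, this same client API can be driven directly from user code; here is a small sketch of the create/init/start lifecycle we are about to trace (standard YarnClient API, illustrative output only):

```java
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientDemo {
  public static void main(String[] args) throws Exception {
    // The same lifecycle ResourceMgrDelegate drives internally:
    // create -> init -> start, then talk to the ResourceManager.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Asks the RM for a fresh application id, just as getNewJobID
      // will do further down.
      YarnClientApplication app = yarnClient.createApplication();
      System.out.println("new application id: "
          + app.getApplicationSubmissionContext().getApplicationId());
    } finally {
      yarnClient.stop();
    }
  }
}
```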
Next, the init method. It is defined in AbstractService, the (abstract) base implementation of a service. Its init looks like this:
```java
/**
 * {@inheritDoc}
 * This invokes {@link #serviceInit}
 * @param conf the configuration of the service. This must not be null
 * @throws ServiceStateException if the configuration was null,
 *         the state change not permitted, or something else went wrong
 */
@Override
public void init(Configuration conf) {
  if (conf == null) {
    throw new ServiceStateException("Cannot initialize service "
        + getName() + ": null configuration");
  }
  if (isInState(STATE.INITED)) {
    return;
  }
  synchronized (stateChangeLock) {
    if (enterState(STATE.INITED) != STATE.INITED) {
      setConfig(conf);
      try {
        serviceInit(config);
        if (isInState(STATE.INITED)) {
          // if the service ended up here during init,
          // notify the listeners
          notifyListeners();
        }
      } catch (Exception e) {
        noteFailure(e);
        ServiceOperations.stopQuietly(LOG, this);
        throw ServiceStateException.convert(e);
      }
    }
  }
}
```
The first important call in there is serviceInit, which ResourceMgrDelegate overrides. The dispatch is a little winding, so keep the call chain straight. Here is ResourceMgrDelegate's serviceInit:
```java
@Override
protected void serviceInit(Configuration conf) throws Exception {
  client.init(conf);
  super.serviceInit(conf);
}
```
At this point client is a YarnClientImpl. It has no init method of its own — that is inherited from AbstractService — which in turn dispatches to YarnClientImpl's own serviceInit:
@SuppressWarnings("deprecation") @Override protected void serviceInit(Configuration conf) throws Exception { asyncApiPollIntervalMillis = conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS); asyncApiPollTimeoutMillis = conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS); submitPollIntervalMillis = asyncApiPollIntervalMillis; if (conf.get(YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS) != null) { submitPollIntervalMillis = conf.getLong( YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS); } if (conf.getBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED, YarnConfiguration.DEFAULT_APPLICATION_HISTORY_ENABLED)) { historyServiceEnabled = true; historyClient = AHSClient.createAHSClient(); historyClient.init(conf); } if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { timelineServiceEnabled = true; timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineDTRenewer = getTimelineDelegationTokenRenewer(conf); timelineService = TimelineUtils.buildTimelineTokenService(conf); } super.serviceInit(conf); }
It looks like a wall of code, but it is mostly variable initialization, capped by a call to AbstractService's serviceInit.
Next comes the start method. Again it lives in AbstractService and dispatches to the subclass's serviceStart; let's see what ResourceMgrDelegate's serviceStart does:
```java
@Override
protected void serviceStart() throws Exception {
  client.start();
  super.serviceStart();
}
```
It mainly starts the YarnClientImpl. Around and around we go — so on to YarnClientImpl's serviceStart:
```java
@Override
protected void serviceStart() throws Exception {
  try {
    rmClient = ClientRMProxy.createRMProxy(getConfig(),
        ApplicationClientProtocol.class);
    if (historyServiceEnabled) {
      historyClient.start();
    }
    if (timelineServiceEnabled) {
      timelineClient.start();
    }
  } catch (IOException e) {
    throw new YarnRuntimeException(e);
  }
  super.serviceStart();
}
```
Here we can glimpse what is really happening: an RPC proxy is created (Hadoop IPC itself is out of scope for this post). Tracing to this point, we see the RPC connection is established, and the methods to be served are those of ApplicationClientProtocol.
While we're here, let's see what ApplicationClientProtocol is for:
```java
/**
 * <p>The protocol between clients and the <code>ResourceManager</code>
 * to submit/abort jobs and to get information on applications, cluster metrics,
 * nodes, queues and ACLs.</p>
 */
```
As the comment says, this is the actual protocol between clients and the ResourceManager, used to submit jobs and similar tasks. Setting the concrete implementation aside, let's return to where we left off:
```java
public JobID getNewJobID() throws IOException, InterruptedException {
  try {
    this.application =
        client.createApplication().getApplicationSubmissionContext();
    this.applicationId = this.application.getApplicationId();
    return TypeConverter.fromYarn(applicationId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
```
We hadn't finished with this code. After that long detour we finally can: initialization is complete, the link to the ResourceManager is in place, and createApplication gets executed:
```java
@Override
public YarnClientApplication createApplication()
    throws YarnException, IOException {
  ApplicationSubmissionContext context =
      Records.newRecord(ApplicationSubmissionContext.class);
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  context.setApplicationId(appId);
  return new YarnClientApplication(newApp, context);
}
```
A few more words about ApplicationSubmissionContext; its javadoc reads:
```java
/**
 * <p><code>ApplicationSubmissionContext</code> represents all of the
 * information needed by the <code>ResourceManager</code> to launch
 * the <code>ApplicationMaster</code> for an application.</p>
 *
 * <p>It includes details such as:
 * <ul>
 *   <li>{@link ApplicationId} of the application.</li>
 *   <li>Application user.</li>
 *   <li>Application name.</li>
 *   <li>{@link Priority} of the application.</li>
 *   <li>
 *     {@link ContainerLaunchContext} of the container in which the
 *     <code>ApplicationMaster</code> is executed.
 *   </li>
 *   <li>maxAppAttempts. The maximum number of application attempts.
 *     It should be no larger than the global number of max attempts in the
 *     Yarn configuration.</li>
 *   <li>attemptFailuresValidityInterval. The default value is -1.
 *     when attemptFailuresValidityInterval in milliseconds is set to > 0,
 *     the failure number will no take failures which happen out of the
 *     validityInterval into failure count. If failure count reaches to
 *     maxAppAttempts, the application will be failed.
 *   </li>
 *   <li>Optional, application-specific {@link LogAggregationContext}</li>
 * </ul>
 * </p>
 *
 * @see ContainerLaunchContext
 * @see ApplicationClientProtocol#submitApplication(org.apache.hadoop.yarn.api.protocolrecords.SubmitApplicationRequest)
 */
```
It carries everything an application submission needs — all the information the ResourceManager requires to launch a new ApplicationMaster.
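To give a feel for what "all the information" means in practice, here is a hedged sketch of how a YARN client typically fills in this context before submitting. The setters are the public ApplicationSubmissionContext/Resource/Records APIs; every concrete value (name, queue, memory) is made up for illustration:

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmissionContextSketch {
  // Populate the context the RM will use to launch the ApplicationMaster.
  static ApplicationSubmissionContext fill(YarnClientApplication app) {
    ApplicationSubmissionContext context =
        app.getApplicationSubmissionContext();   // appId already set
    context.setApplicationName("demo-app");      // made-up name
    context.setQueue("default");
    context.setPriority(Priority.newInstance(0));
    context.setMaxAppAttempts(2);

    // Resources for the AM container (illustrative values).
    Resource amResource = Records.newRecord(Resource.class);
    amResource.setMemory(1024);                  // MB
    amResource.setVirtualCores(1);
    context.setResource(amResource);

    // The command line, environment and local resources that actually
    // launch the AM go into a ContainerLaunchContext (left empty here).
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    context.setAMContainerSpec(amContainer);
    return context;
  }
}
```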
This post has run long enough; Part (3) of the series picks up from here.