In this post, let's work out how a submitted MapReduce job actually gets up and running.

It will try to answer the questions raised in Part (1) of this series, and will raise some new ones along the way.
1: How exactly does a MapReduce program we submit end up running on the YARN framework?

Let's pin this down step by step.
First, we need to set mapreduce.framework.name to yarn in mapred-site.xml. This value will, without question, be used later while the job runs:
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
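For reference, client code picks this value up from a plain configuration object at runtime. A minimal sketch (the key string mirrors the MRConfig.FRAMEWORK_NAME constant we will meet later; the fallback to "local" matches the LocalClientProtocolProvider behavior discussed below):

```java
import org.apache.hadoop.mapred.JobConf;

public class FrameworkNameCheck {
  public static void main(String[] args) {
    // JobConf pulls in mapred-site.xml from the classpath in addition
    // to the core configuration files.
    JobConf conf = new JobConf();
    // "local" is the effective default when the key is unset.
    String framework = conf.get("mapreduce.framework.name", "local");
    System.out.println("mapreduce.framework.name = " + framework);
  }
}
```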
With the YARN resource-scheduling framework specified here, let's go find some code to study:
Tracing back to the source: even in Hadoop 2.x.x, jobs are still submitted via `hadoop jar`. So what does the hadoop shell script actually do?
elif [ "$COMMAND" = "jar" ] ; then CLASS=org.apache.hadoop.util.RunJar
Next, let's open the RunJar class and look at its main method:
Following the flow, we see that it uses reflection to invoke the main method of the Main-Class recorded in the jar's manifest. The code:
```java
/**
 * Run a Hadoop job jar. If the main class is not in the jar's manifest,
 * then it must be provided on the command line.
 */
public static void main(String[] args) throws Throwable {
  new RunJar().run(args);
}
```

```java
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
  mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();
```

```java
Thread.currentThread().setContextClassLoader(loader);
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
    Array.newInstance(String.class, 0).getClass() });
String[] newArgs = Arrays.asList(args).subList(firstArg, args.length)
    .toArray(new String[0]);
try {
  main.invoke(null, new Object[] { newArgs });
} catch (InvocationTargetException e) {
  throw e.getTargetException();
}
```
All of this lives inside RunJar, so I won't belabor it.
So, since what actually runs is a plain main method, let's take the mapreduce-examples programs as our template — WordCount in particular — and dig in.
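For context, here is a minimal driver, sketched after the standard hadoop-mapreduce-examples WordCount (the TokenizerMapper/IntSumReducer inner classes are elided, so this is an illustration rather than the exact shipped source):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    // The standard example sets its inner classes here:
    // job.setMapperClass(TokenizerMapper.class);
    // job.setCombinerClass(IntSumReducer.class);
    // job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```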
The line that matters to us is the final one:

```java
System.exit(job.waitForCompletion(true) ? 0 : 1);
```
Having seen this line, let's look at the waitForCompletion method:
```java
/**
 * Submit the job to the cluster and wait for it to finish.
 *
 * @param verbose print the progress to the user
 * @return true if the job succeeded
 * @throws IOException thrown if the communication with the
 *         <code>JobTracker</code> is lost
 */
public boolean waitForCompletion(boolean verbose)
    throws IOException, InterruptedException, ClassNotFoundException {
  if (state == JobState.DEFINE) {
    submit();
  }
  if (verbose) {
    monitorAndPrintJob();
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
        Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();
}
```
A quick aside: Hadoop 2.x.x is designed around state machines plus a service library. Every component carries its own well-defined state and transitions between states in response to events, which makes the code considerably easier to follow.
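To make the idea concrete, here is a deliberately toy sketch of that service-lifecycle pattern (the real thing is org.apache.hadoop.service.AbstractService, which we will meet shortly; everything below is simplified and hypothetical):

```java
// Toy version of the Hadoop service lifecycle: a service walks through a
// fixed set of states, and each transition calls a hook that subclasses
// override (mirroring serviceInit/serviceStart in AbstractService).
abstract class ToyService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final void init() throws Exception {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("cannot init from " + state);
    }
    serviceInit();   // subclass hook
    state = State.INITED;
  }

  public final void start() throws Exception {
    if (state != State.INITED) {
      throw new IllegalStateException("cannot start from " + state);
    }
    serviceStart();  // subclass hook
    state = State.STARTED;
  }

  protected abstract void serviceInit() throws Exception;
  protected abstract void serviceStart() throws Exception;
}
```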
From here, we clearly need to look at the submit method:
```java
/**
 * Submit the job to the cluster and return immediately.
 *
 * @throws IOException
 */
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run()
        throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
As the javadoc says, this submits the job and returns immediately, without waiting for completion. The important player here is the JobSubmitter, which performs the actual submission. Let's first see how it is created, in getJobSubmitter:
Following the chain, we land here:
```java
JobSubmitter(FileSystem submitFs, ClientProtocol submitClient)
    throws IOException {
  this.submitClient = submitClient;
  this.jtFs = submitFs;
}
```
So the JobSubmitter is built from a given FileSystem and a submitClient. When we submit a job, both come from the cluster, so we need to look at that cluster:
In fact, this Cluster comes from the Job itself — it is a member variable of Job, set when the Job is constructed:
```java
Job(JobConf conf) throws IOException {
  super(conf, null);
  // propagate existing user credentials to job
  this.credentials.mergeAll(this.ugi.getCredentials());
  this.cluster = null;
}
```
At construction time, then, cluster is simply null.

So when does it actually get assigned?

Look back at submit and note these two calls:
```java
setUseNewAPI();
connect();
```
Let's take them one at a time:
```java
/**
 * Default to the new APIs unless they are explicitly set or the old mapper
 * or reduce attributes are used.
 *
 * @throws IOException if the configuration is inconsistant
 */
private void setUseNewAPI() throws IOException {
  int numReduces = conf.getNumReduceTasks();
  String oldMapperClass = "mapred.mapper.class";
  String oldReduceClass = "mapred.reducer.class";
  conf.setBooleanIfUnset("mapred.mapper.new-api",
      conf.get(oldMapperClass) == null);
  if (conf.getUseNewMapper()) {
    String mode = "new map API";
    ensureNotSet("mapred.input.format.class", mode);
    ensureNotSet(oldMapperClass, mode);
    if (numReduces != 0) {
      ensureNotSet("mapred.partitioner.class", mode);
    } else {
      ensureNotSet("mapred.output.format.class", mode);
    }
  } else {
    String mode = "map compatability";
    ensureNotSet(INPUT_FORMAT_CLASS_ATTR, mode);
    ensureNotSet(MAP_CLASS_ATTR, mode);
    if (numReduces != 0) {
      ensureNotSet(PARTITIONER_CLASS_ATTR, mode);
    } else {
      ensureNotSet(OUTPUT_FORMAT_CLASS_ATTR, mode);
    }
  }
  if (numReduces != 0) {
    conf.setBooleanIfUnset("mapred.reducer.new-api",
        conf.get(oldReduceClass) == null);
    if (conf.getUseNewReducer()) {
      String mode = "new reduce API";
      ensureNotSet("mapred.output.format.class", mode);
      ensureNotSet(oldReduceClass, mode);
    } else {
      String mode = "reduce compatability";
      ensureNotSet(OUTPUT_FORMAT_CLASS_ATTR, mode);
      ensureNotSet(REDUCE_CLASS_ATTR, mode);
    }
  }
}
```
From this we can see that the new API is the default, and since our configuration sets none of the old attributes, the new API is what gets used. On to the connect method:
```java
private synchronized void connect()
    throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
      public Cluster run()
          throws IOException, InterruptedException, ClassNotFoundException {
        return new Cluster(getConfiguration());
      }
    });
  }
}
```
So connect builds a new Cluster from the Configuration (the single-argument Cluster constructor delegates to the two-argument one below, with a null jobTrackAddr). Let's look at how Cluster initializes:
```java
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  initialize(jobTrackAddr, conf);
}
```
The heart of it is the initialize method:
```java
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  synchronized (frameworkLoader) {
    for (ClientProtocolProvider provider : frameworkLoader) {
      LOG.debug("Trying ClientProtocolProvider : "
          + provider.getClass().getName());
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName()
            + " due to error: " + e.getMessage());
      }
    }
  }
  if (null == clientProtocolProvider || null == client) {
    throw new IOException(
        "Cannot initialize Cluster. Please check your configuration for "
            + MRConfig.FRAMEWORK_NAME
            + " and the correspond server addresses.");
  }
}
```
Our jobTrackAddr here is null, so each provider (of type ClientProtocolProvider) is asked to create the ClientProtocol via create(conf). Let's look at that create method.
The loop above keeps the first non-null clientProtocol a provider returns. Checking the source, ClientProtocolProvider has only two implementations:
- LocalClientProtocolProvider
- YarnClientProtocolProvider
Here is YarnClientProtocolProvider's create method:

```java
@Override
public ClientProtocol create(Configuration conf) throws IOException {
  if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
    return new YARNRunner(conf);
  }
  return null;
}
```
Both implementations follow the same pattern: they inspect MRConfig.FRAMEWORK_NAME in the conf and only return a protocol when it names their framework. So what is MRConfig? It lives in the org.apache.hadoop.mapreduce package:
```java
public static final String FRAMEWORK_NAME = "mapreduce.framework.name";
```
The class defines many of the constants MapReduce uses, and here is the one we care about. Everything snaps into focus: because we set mapreduce.framework.name=yarn, the clientProtocol we end up with is a YARNRunner, which submits the job to the YARN cluster on our behalf.
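One detail worth pausing on: how does initialize discover these providers at all? The frameworkLoader it iterates is a java.util.ServiceLoader, which instantiates implementations listed in META-INF/services files on the classpath. A minimal, hypothetical sketch of the same pattern (Provider here is a stand-in for ClientProtocolProvider, not a real Hadoop type):

```java
import java.util.ServiceLoader;

// Stand-in for ClientProtocolProvider: returns null when it does not
// handle the requested framework, mirroring the real create(conf).
interface Provider {
  String create(String frameworkName);
}

public class ServiceLoaderDemo {
  public static void main(String[] args) {
    // Implementations are listed, one class name per line, in a
    // classpath resource named META-INF/services/<interface FQN>;
    // ServiceLoader instantiates them lazily as we iterate.
    ServiceLoader<Provider> loader = ServiceLoader.load(Provider.class);
    for (Provider p : loader) {
      String protocol = p.create("yarn");
      if (protocol != null) {
        System.out.println("Picked " + p.getClass().getName());
        break; // first provider that accepts the framework wins
      }
    }
  }
}
```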
Onward with our exploration — here is Job's submit method once more:
```java
/**
 * Submit the job to the cluster and return immediately.
 *
 * @throws IOException
 */
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run()
        throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
With submit fresh in mind, the next stop is submitJobInternal. The method is long, so we'll pull out its more important pieces:
```java
JobID jobId = submitClient.getNewJobID();
```
This obtains the jobId for this submission. So what is submitClient? It is exactly our ClientProtocol — or rather its implementation here, YARNRunner. Let's see how it comes up with a JobID.
We glossed over something along the way: the initialization of YARNRunner. Here are its member variables and constructor:
```java
private final RecordFactory recordFactory =
    RecordFactoryProvider.getRecordFactory(null);
private ResourceMgrDelegate resMgrDelegate;
private ClientCache clientCache;
private Configuration conf;
private final FileContext defaultFileContext;

/**
 * Yarn runner incapsulates the client interface of yarn
 *
 * @param conf the configuration object for the client
 */
public YARNRunner(Configuration conf) {
  this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
}
```
Focus on the constructor: it is handed a ResourceMgrDelegate. What is that? As the name suggests, it is essentially a client-side proxy for the ResourceManager.
Here is ResourceMgrDelegate's constructor:
```java
/**
 * Delegate responsible for communicating with the Resource Manager's
 * {@link ApplicationClientProtocol}.
 *
 * @param conf the configuration object.
 */
public ResourceMgrDelegate(YarnConfiguration conf) {
  super(ResourceMgrDelegate.class.getName());
  this.conf = conf;
  this.client = YarnClient.createYarnClient();
  init(conf);
  start();
}
```
Clearly, three calls matter here: createYarnClient, init, and start. In order:
```java
/**
 * Create a new instance of YarnClient.
 */
@Public
public static YarnClient createYarnClient() {
  YarnClient client = new YarnClientImpl();
  return client;
}
```
This returns a YarnClientImpl, so the YarnClient held by ResourceMgrDelegate is in fact a YarnClientImpl.
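As an aside, this same client API can be driven directly from user code; here is a small sketch of the create/init/start lifecycle we are about to trace (standard YarnClient API, illustrative output only):

```java
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientDemo {
  public static void main(String[] args) throws Exception {
    // The same lifecycle ResourceMgrDelegate drives internally:
    // create -> init -> start, then talk to the ResourceManager.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Asks the RM for a fresh application id, just as getNewJobID
      // will do further down.
      YarnClientApplication app = yarnClient.createApplication();
      System.out.println("new application id: "
          + app.getApplicationSubmissionContext().getApplicationId());
    } finally {
      yarnClient.stop();
    }
  }
}
```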
Next, the init method. It is defined in AbstractService, the (abstract) base implementation of a service. Its init looks like this:
```java
/**
 * {@inheritDoc}
 * This invokes {@link #serviceInit}
 * @param conf the configuration of the service. This must not be null
 * @throws ServiceStateException if the configuration was null,
 *         the state change not permitted, or something else went wrong
 */
@Override
public void init(Configuration conf) {
  if (conf == null) {
    throw new ServiceStateException("Cannot initialize service "
        + getName() + ": null configuration");
  }
  if (isInState(STATE.INITED)) {
    return;
  }
  synchronized (stateChangeLock) {
    if (enterState(STATE.INITED) != STATE.INITED) {
      setConfig(conf);
      try {
        serviceInit(config);
        if (isInState(STATE.INITED)) {
          // if the service ended up here during init,
          // notify the listeners
          notifyListeners();
        }
      } catch (Exception e) {
        noteFailure(e);
        ServiceOperations.stopQuietly(LOG, this);
        throw ServiceStateException.convert(e);
      }
    }
  }
}
```
The first important call in there is serviceInit, which ResourceMgrDelegate overrides. The dispatch is a little winding, so keep the call chain straight. Here is ResourceMgrDelegate's serviceInit:
```java
@Override
protected void serviceInit(Configuration conf) throws Exception {
  client.init(conf);
  super.serviceInit(conf);
}
```
At this point client is a YarnClientImpl. It has no init method of its own — that is inherited from AbstractService — which in turn dispatches to YarnClientImpl's own serviceInit:
@SuppressWarnings("deprecation") @Override protected void serviceInit(Configuration conf) throws Exception { asyncApiPollIntervalMillis = conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS); asyncApiPollTimeoutMillis = conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS); submitPollIntervalMillis = asyncApiPollIntervalMillis; if (conf.get(YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS) != null) { submitPollIntervalMillis = conf.getLong( YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS, YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS); } if (conf.getBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED, YarnConfiguration.DEFAULT_APPLICATION_HISTORY_ENABLED)) { historyServiceEnabled = true; historyClient = AHSClient.createAHSClient(); historyClient.init(conf); } if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { timelineServiceEnabled = true; timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineDTRenewer = getTimelineDelegationTokenRenewer(conf); timelineService = TimelineUtils.buildTimelineTokenService(conf); } super.serviceInit(conf); }
It looks like a wall of code, but it is mostly variable initialization, capped by a call to AbstractService's serviceInit.
Next comes the start method. Again it lives in AbstractService and dispatches to the subclass's serviceStart; let's see what ResourceMgrDelegate's serviceStart does:
```java
@Override
protected void serviceStart() throws Exception {
  client.start();
  super.serviceStart();
}
```
It mainly starts the YarnClientImpl. Around and around we go — so on to YarnClientImpl's serviceStart:
```java
@Override
protected void serviceStart() throws Exception {
  try {
    rmClient = ClientRMProxy.createRMProxy(getConfig(),
        ApplicationClientProtocol.class);
    if (historyServiceEnabled) {
      historyClient.start();
    }
    if (timelineServiceEnabled) {
      timelineClient.start();
    }
  } catch (IOException e) {
    throw new YarnRuntimeException(e);
  }
  super.serviceStart();
}
```
Here we can glimpse what is really happening: an RPC proxy is created (Hadoop IPC itself is out of scope for this post). Tracing to this point, we see the RPC connection is established, and the methods to be served are those of ApplicationClientProtocol.
While we're here, let's see what ApplicationClientProtocol is for:
```java
/**
 * <p>The protocol between clients and the <code>ResourceManager</code>
 * to submit/abort jobs and to get information on applications, cluster metrics,
 * nodes, queues and ACLs.</p>
 */
```
As the comment says, this is the actual protocol between clients and the ResourceManager, used to submit jobs and similar tasks. Setting the concrete implementation aside, let's return to where we left off:
```java
public JobID getNewJobID() throws IOException, InterruptedException {
  try {
    this.application =
        client.createApplication().getApplicationSubmissionContext();
    this.applicationId = this.application.getApplicationId();
    return TypeConverter.fromYarn(applicationId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
```
We hadn't finished with this code. After that long detour we finally can: initialization is complete, the link to the ResourceManager is in place, and createApplication gets executed:
```java
@Override
public YarnClientApplication createApplication()
    throws YarnException, IOException {
  ApplicationSubmissionContext context =
      Records.newRecord(ApplicationSubmissionContext.class);
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  context.setApplicationId(appId);
  return new YarnClientApplication(newApp, context);
}
```
A few more words about ApplicationSubmissionContext; its javadoc reads:
```java
/**
 * <p><code>ApplicationSubmissionContext</code> represents all of the
 * information needed by the <code>ResourceManager</code> to launch
 * the <code>ApplicationMaster</code> for an application.</p>
 *
 * <p>It includes details such as:
 * <ul>
 *   <li>{@link ApplicationId} of the application.</li>
 *   <li>Application user.</li>
 *   <li>Application name.</li>
 *   <li>{@link Priority} of the application.</li>
 *   <li>
 *     {@link ContainerLaunchContext} of the container in which the
 *     <code>ApplicationMaster</code> is executed.
 *   </li>
 *   <li>maxAppAttempts. The maximum number of application attempts.
 *     It should be no larger than the global number of max attempts in the
 *     Yarn configuration.</li>
 *   <li>attemptFailuresValidityInterval. The default value is -1.
 *     when attemptFailuresValidityInterval in milliseconds is set to > 0,
 *     the failure number will no take failures which happen out of the
 *     validityInterval into failure count. If failure count reaches to
 *     maxAppAttempts, the application will be failed.
 *   </li>
 *   <li>Optional, application-specific {@link LogAggregationContext}</li>
 * </ul>
 * </p>
 *
 * @see ContainerLaunchContext
 * @see ApplicationClientProtocol#submitApplication(org.apache.hadoop.yarn.api.protocolrecords.SubmitApplicationRequest)
 */
```
It carries everything an application submission needs — all the information the ResourceManager requires to launch a new ApplicationMaster.
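To give a feel for what "all the information" means in practice, here is a hedged sketch of how a YARN client typically fills in this context before submitting. The setters are the public ApplicationSubmissionContext/Resource/Records APIs; every concrete value (name, queue, memory) is made up for illustration:

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmissionContextSketch {
  // Populate the context the RM will use to launch the ApplicationMaster.
  static ApplicationSubmissionContext fill(YarnClientApplication app) {
    ApplicationSubmissionContext context =
        app.getApplicationSubmissionContext();   // appId already set
    context.setApplicationName("demo-app");      // made-up name
    context.setQueue("default");
    context.setPriority(Priority.newInstance(0));
    context.setMaxAppAttempts(2);

    // Resources for the AM container (illustrative values).
    Resource amResource = Records.newRecord(Resource.class);
    amResource.setMemory(1024);                  // MB
    amResource.setVirtualCores(1);
    context.setResource(amResource);

    // The command line, environment and local resources that actually
    // launch the AM go into a ContainerLaunchContext (left empty here).
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    context.setAMContainerSpec(amContainer);
    return context;
  }
}
```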
This post has run long enough; Part (3) of the series picks up from here.