Introduction
To analyze the Hadoop YARN source code more thoroughly and meaningfully, I decided to follow a job from submission to the end of its life cycle. We will skip over the MapReduce-specific parts, since our focus here is the YARN source; the MapReduce flow and its source code will be analyzed separately later. The version analyzed is the latest Hadoop 3.1.0.
In this section we analyze the "landing" of a job. By landing, we mean a job's journey from the node where the client submits it (which may be an NM node or the RM node) to the RM.
We take the simplest WordCount job as our example. Here is the code of WordCount's main method:
WordCount.main()
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job,
new Path(otherArgs[otherArgs.length - 1]));
//the job submission process
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
The code above mostly sets static job metadata, which we will not analyze in detail. The submission itself happens in job.waitForCompletion(true): it returns true on success, so main exits with 0 to tell the shell the job succeeded, and with 1 otherwise. The code of this key method is:
WordCount.main() -> Job.waitForCompletion()
public boolean waitForCompletion(boolean verbose
) throws IOException, InterruptedException,
ClassNotFoundException {
if (state == JobState.DEFINE) {
submit();
}
if (verbose) {
monitorAndPrintJob();
} else {
// get the completion poll interval from the client.
int completionPollIntervalMillis =
Job.getCompletionPollInterval(cluster.getConf());
while (!isComplete()) {
try {
Thread.sleep(completionPollIntervalMillis);
} catch (InterruptedException ie) {
}
}
}
return isSuccessful();
}
The job is submitted only while JobState is DEFINE, which guarantees a single submission; after submission JobState becomes RUNNING. The submit() method runs the entire submission flow. If verbose mode is on, the job's progress is reported periodically; otherwise the client simply polls at intervals to check whether the job has finished.
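The non-verbose branch above is nothing more than a sleep-and-poll loop. Below is a minimal standalone sketch of the same pattern; the JobHandle interface is illustrative, standing in for the real Job's isComplete()/isSuccessful() methods:

```java
// A minimal sketch of the sleep-and-poll pattern used by
// Job.waitForCompletion() on its non-verbose branch. JobHandle is a
// hypothetical stand-in for the real Job class.
public class PollLoopDemo {
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
    }

    // Poll until the job reports completion, sleeping between checks,
    // then return the success flag -- mirroring waitForCompletion().
    static boolean waitForCompletion(JobHandle job, long pollIntervalMillis)
            throws InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(pollIntervalMillis);
        }
        return job.isSuccessful();
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake job that completes after three polls.
        JobHandle fake = new JobHandle() {
            int polls = 0;
            public boolean isComplete() { return ++polls >= 3; }
            public boolean isSuccessful() { return true; }
        };
        System.out.println(waitForCompletion(fake, 5L)); // prints "true"
    }
}
```

The real loop additionally reads the poll interval from the cluster configuration via Job.getCompletionPollInterval(), as shown in the excerpt above.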
Next, let's look at the key Job.submit() method:
WordCount.main() -> Job.waitForCompletion() -> Job.submit()
public void submit()
throws IOException, InterruptedException, ClassNotFoundException {
//confirm again that the job is not submitted twice
ensureState(JobState.DEFINE);
//decide whether the new API is being used
setUseNewAPI();
connect();
final JobSubmitter submitter =
getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
public JobStatus run() throws IOException, InterruptedException,
ClassNotFoundException {
return submitter.submitJobInternal(Job.this, cluster);
}
});
state = JobState.RUNNING;
LOG.info("The url to track the job: " + getTrackingURL());
}
The old API is kept for backward compatibility: not only the submission API but also the MapReduce API comes in old and new flavors, which we will cover in detail when analyzing the MapReduce code. Both the new and the old submission APIs end up in Job.submit(). The connect() method establishes contact with the cluster, preparing the job's landing; the submitter is the liaison in charge of that landing, which it carries out via submitter.submitJobInternal(Job.this, cluster).
First, let's see what preparation connect() does for the landing:
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect()
private synchronized void connect()
throws IOException, InterruptedException, ClassNotFoundException {
if (cluster == null) {
cluster =
ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
public Cluster run()
throws IOException, InterruptedException,
ClassNotFoundException {
return new Cluster(getConfiguration());
}
});
}
}
We can see that connect() does only one thing: it guarantees that a Cluster object exists, creating one if there is none. So let's see what happens when a Cluster is constructed:
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() ->Cluster()
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
throws IOException {
this.conf = conf;
this.ugi = UserGroupInformation.getCurrentUser();
initialize(jobTrackAddr, conf);
}
Either constructor ends up calling initialize():
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() ->Cluster() -> Cluster.initialize():
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
throws IOException {
initProviderList();
final IOException initEx = new IOException(
"Cannot initialize Cluster. Please check your configuration for "
+ MRConfig.FRAMEWORK_NAME
+ " and the correspond server addresses.");
if (jobTrackAddr != null) {
LOG.info(
"Initializing cluster for Job Tracker=" + jobTrackAddr.toString());
}
for (ClientProtocolProvider provider : providerList) {
LOG.debug("Trying ClientProtocolProvider : "
+ provider.getClass().getName());
ClientProtocol clientProtocol = null;
try {
if (jobTrackAddr == null) {
clientProtocol = provider.create(conf);
} else {
clientProtocol = provider.create(jobTrackAddr, conf);
}
if (clientProtocol != null) {
clientProtocolProvider = provider;
client = clientProtocol;
LOG.debug("Picked " + provider.getClass().getName()
+ " as the ClientProtocolProvider");
break;
} else {
LOG.debug("Cannot pick " + provider.getClass().getName()
+ " as the ClientProtocolProvider - returned null protocol");
}
} catch (Exception e) {
final String errMsg = "Failed to use " + provider.getClass().getName()
+ " due to error: ";
initEx.addSuppressed(new IOException(errMsg, e));
LOG.info(errMsg, e);
}
}
if (null == clientProtocolProvider || null == client) {
throw initEx;
}
}
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> Job.connect() ->Cluster() -> Cluster.initialize() -> ClientProtocolProvider.create()
public ClientProtocol create(Configuration conf) throws IOException {
if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
return new YARNRunner(conf);
}
return null;
}
The client type is chosen according to the configuration key public static final String FRAMEWORK_NAME = "mapreduce.framework.name", and assigned via client = clientProtocol. We consider the YARN case here, so the ClientProtocol created for the client is a YARNRunner.
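Cluster.initialize() simply tries each ClientProtocolProvider in order and keeps the first one whose create() returns a non-null protocol. A stripped-down, pure-JDK sketch of that selection loop; the Provider interface and the provider objects below are illustrative stand-ins:

```java
import java.util.Arrays;
import java.util.List;

public class ProviderPickDemo {
    // Illustrative stand-in for ClientProtocolProvider: create() returns a
    // client object, or null when the provider does not match the config.
    interface Provider {
        Object create(String frameworkName);
        String name();
    }

    // Mirrors the loop in Cluster.initialize(): try providers in order and
    // pick the first one that yields a non-null client.
    static String pick(List<Provider> providers, String frameworkName) {
        for (Provider p : providers) {
            Object client = p.create(frameworkName);
            if (client != null) {
                return p.name(); // Cluster keeps both the provider and the client
            }
        }
        throw new IllegalStateException("Cannot initialize Cluster");
    }

    public static void main(String[] args) {
        Provider local = new Provider() {
            public Object create(String fw) { return "local".equals(fw) ? new Object() : null; }
            public String name() { return "LocalClientProtocolProvider"; }
        };
        Provider yarn = new Provider() {
            public Object create(String fw) { return "yarn".equals(fw) ? new Object() : null; }
            public String name() { return "YarnClientProtocolProvider"; }
        };
        // With mapreduce.framework.name=yarn, the YARN provider wins.
        System.out.println(pick(Arrays.asList(local, yarn), "yarn"));
    }
}
```

In the real code the provider list is populated by initProviderList() through Java's ServiceLoader mechanism, which is why adding a framework only requires dropping a new provider on the classpath.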
This completes the analysis of connect(). Back in Job.submit(), the next key call is: WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> JobSubmitter.submitJobInternal()
This method is long and crucial; the job is submitted to the cluster through it, so we analyze it in parts:
1.
//check the output specification and other configuration
checkSpecs(job);
Configuration conf = job.getConfiguration();
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
//save submission-related information into conf
InetAddress ip = InetAddress.getLocalHost();
if (ip != null) {
submitHostAddress = ip.getHostAddress();
submitHostName = ip.getHostName();
conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
}
//generate a job ID and store it in the job object
JobID jobId = submitClient.getNewJobID();
job.setJobID(jobId);
//the temporary subdirectory named after the job ID
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
JobStatus status = null;
try {
conf.set(MRJobConfig.USER_NAME,
UserGroupInformation.getCurrentUser().getShortUserName());
conf.set("hadoop.http.filter.initializers",
"org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
LOG.debug("Configuring job " + jobId + " with " + submitJobDir
+ " as the submit dir");
// obtain delegation tokens for the submit directory
TokenCache.obtainTokensForNamenodes(job.getCredentials(),
new Path[] { submitJobDir }, conf);
populateTokenCache(conf, job.getCredentials());
// generate the secret key used by the shuffle phase
if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
KeyGenerator keyGen;
try {
keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
keyGen.init(SHUFFLE_KEY_LENGTH);
} catch (NoSuchAlgorithmException e) {
throw new IOException("Error generating shuffle secret key", e);
}
SecretKey shuffleKey = keyGen.generateKey();
TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
job.getCredentials());
}
if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
"data spill is enabled");
}
The call Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf) obtains the staging directory. The relevant key is public static final String MR_AM_STAGING_DIR = MR_AM_PREFIX + "staging-dir"; if it is not configured, the default value is "/tmp/hadoop-yarn/staging".
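The shuffle secret generated in the excerpt above uses the standard javax.crypto KeyGenerator; in my reading of JobSubmitter, SHUFFLE_KEYGEN_ALGORITHM is "HmacSHA1" and SHUFFLE_KEY_LENGTH is 64 bits. A self-contained sketch of that key-generation step:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

public class ShuffleKeyDemo {
    // Mirrors the key-generation step in JobSubmitter: produce an HMAC key
    // that later authenticates shuffle transfers between map and reduce tasks.
    static byte[] generateShuffleKey() throws NoSuchAlgorithmException {
        KeyGenerator keyGen = KeyGenerator.getInstance("HmacSHA1");
        keyGen.init(64); // key length in bits, matching SHUFFLE_KEY_LENGTH
        SecretKey shuffleKey = keyGen.generateKey();
        return shuffleKey.getEncoded(); // what TokenCache.setShuffleSecretKey stores
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(generateShuffleKey().length * 8 + "-bit key"); // 64-bit key
    }
}
```

The generated bytes travel with the job's Credentials, so every task of the job can verify shuffle data without any extra key exchange.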
2.
//copy the executable files etc. to HDFS
copyAndConfigureFiles(job, submitJobDir);
Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
// split the input files and write the splits to the temporary directory
LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
int maps = writeSplits(job, submitJobDir);
conf.setInt(MRJobConfig.NUM_MAPS, maps);
LOG.info("number of splits:" + maps);
int maxMaps = conf.getInt(MRJobConfig.JOB_MAX_MAP,
MRJobConfig.DEFAULT_JOB_MAX_MAP);
if (maxMaps >= 0 && maxMaps < maps) {
throw new IllegalArgumentException("The number of map tasks " + maps +
" exceeded limit " + maxMaps);
}
// write "queue admins of the queue to which job is being submitted"
// to job file.
String queue = conf.get(MRJobConfig.QUEUE_NAME,
JobConf.DEFAULT_QUEUE_NAME);
AccessControlList acl = submitClient.getQueueAdmins(queue);
conf.set(toFullPropertyName(queue,
QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());
// removing jobtoken referrals before copying the jobconf to HDFS
// as the tasks don't need this setting, actually they may break
// because of it if present as the referral will point to a
// different job.
TokenCache.cleanUpTokenReferral(conf);
if (conf.getBoolean(
MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
// Add HDFS tracking ids
ArrayList<String> trackingIds = new ArrayList<String>();
for (Token<? extends TokenIdentifier> t :
job.getCredentials().getAllTokens()) {
trackingIds.add(t.decodeIdentifier().getTrackingId());
}
conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
trackingIds.toArray(new String[trackingIds.size()]));
}
// Set reservation info if it exists
ReservationId reservationId = job.getReservationId();
if (reservationId != null) {
conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
}
// Write job file to submit dir
writeConf(conf, submitJobFile);
3.
//
// this is where the job is actually submitted
//
printTokens(jobId, job.getCredentials());
status = submitClient.submitJob(
jobId, submitJobDir.toString(), job.getCredentials());
if (status != null) {
return status;
} else {
throw new IOException("Could not launch job");
}
} finally {
if (status == null) {
LOG.info("Cleaning up the staging area " + submitJobDir);
if (jtFs != null && submitJobDir != null)
jtFs.delete(submitJobDir, true);
}
}
Since the job is submitted to a YARN cluster, the submitClient here is a YARNRunner object:
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob():
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
throws IOException, InterruptedException {
addHistoryToken(ts);
ApplicationSubmissionContext appContext =
createApplicationSubmissionContext(conf, jobSubmitDir, ts);
// submission to the RM
try {
ApplicationId applicationId =
resMgrDelegate.submitApplication(appContext);
ApplicationReport appMaster = resMgrDelegate
.getApplicationReport(applicationId);
String diagnostics =
(appMaster == null ?
"application report is null" : appMaster.getDiagnostics());
if (appMaster == null
|| appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
|| appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
throw new IOException("Failed to run job : " +
diagnostics);
}
return clientCache.getClient(jobId).getJobStatus(jobId);
} catch (YarnException e) {
throw new IOException(e);
}
}
ApplicationSubmissionContext appContext is what the RM actually needs from the end user: it carries all the information the RM requires to launch the application's AM. The key fields of ApplicationSubmissionContext (shown here through its PB implementation) are:
public class ApplicationSubmissionContextPBImpl
extends ApplicationSubmissionContext {
//RPC protocol part
ApplicationSubmissionContextProto proto =
ApplicationSubmissionContextProto.getDefaultInstance();
ApplicationSubmissionContextProto.Builder builder = null;
boolean viaProto = false;
//job ID, priority, and related information
private ApplicationId applicationId = null;
private Priority priority = null;
//information the RM uses when launching the AM
private ContainerLaunchContext amContainer = null;
private Resource resource = null;
private Set<String> applicationTags = null;
private List<ResourceRequest> amResourceRequests = null;
private LogAggregationContext logAggregationContext = null;
private ReservationId reservationId = null;
private Map<ApplicationTimeoutType, Long> applicationTimeouts = null;
private Map<String, String> schedulingProperties = null;
The most important field is ContainerLaunchContext amContainer, abbreviated CLC: the context the RM needs for this application's AM, which governs the execution of all of the application's tasks. Let's look at the code of the CLC class:
public class ContainerLaunchContextPBImpl
extends ContainerLaunchContext {
//this is the RPC protocol part
ContainerLaunchContextProto proto =
ContainerLaunchContextProto.getDefaultInstance();
ContainerLaunchContextProto.Builder builder = null;
boolean viaProto = false;
private Map<String, LocalResource> localResources = null;
private ByteBuffer tokens = null;
private ByteBuffer tokensConf = null;
private Map<String, ByteBuffer> serviceData = null;
private Map<String, String> environment = null;
private List<String> commands = null;
private Map<ApplicationAccessType, String> applicationACLS = null;
private ContainerRetryContext containerRetryContext = null;
There is a lot of information here; the most critical part is List<String> commands: the shell command lines executed on the NM node chosen by the RM, which start a JVM running MRAppMaster.class, i.e. the application's AM.
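The actual command is assembled in YARNRunner.createApplicationSubmissionContext and is far longer; the sketch below is only an illustrative reconstruction of its shape, with the heap size and log redirection as simplified placeholders:

```java
import java.util.ArrayList;
import java.util.List;

public class AmCommandDemo {
    // Simplified sketch of the single shell command YARNRunner places in
    // ContainerLaunchContext.setCommands(); the real one adds many JVM
    // options, log settings, and admin/user command opts.
    static List<String> buildAmCommand() {
        List<String> vargs = new ArrayList<>();
        vargs.add("$JAVA_HOME/bin/java");                  // launch a JVM on the NM node
        vargs.add("-Xmx1024m");                            // illustrative heap setting
        vargs.add("org.apache.hadoop.mapreduce.v2.app.MRAppMaster"); // the AM main class
        vargs.add("1><LOG_DIR>/stdout");                   // redirect stdout to container log dir
        vargs.add("2><LOG_DIR>/stderr");                   // redirect stderr to container log dir
        List<String> commands = new ArrayList<>();
        commands.add(String.join(" ", vargs));             // the CLC takes one joined line
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(buildAmCommand().get(0));
    }
}
```

When the RM later grants a container for the AM, the NM simply executes this line, which is why launching an AM looks no different to the NM than launching any other container.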
Having seen the information submitted to the RM, we return to the final step of the submission above, resMgrDelegate.submitApplication(appContext):
public class ResourceMgrDelegate extends YarnClient {
private static final Logger LOG =
LoggerFactory.getLogger(ResourceMgrDelegate.class);
private YarnConfiguration conf;
private ApplicationSubmissionContext application;
private ApplicationId applicationId;
protected YarnClient client;
private Text rmDTService;
public ResourceMgrDelegate(YarnConfiguration conf) {
super(ResourceMgrDelegate.class.getName());
this.conf = conf;
this.client = YarnClient.createYarnClient();
init(conf);
start();
}
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
return client.submitApplication(appContext);
}
We can see that the submission is carried out by client.submitApplication(appContext); the client is initialized in ResourceMgrDelegate's constructor via YarnClient.createYarnClient():
public static YarnClient createYarnClient() {
YarnClient client = new YarnClientImpl();
return client;
}
Now let's look at the client-side submission source:
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob() -> ResourceMgrDelegate.submitApplication() -> YarnClientImpl.submitApplication():
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
ApplicationId applicationId = appContext.getApplicationId();
if (applicationId == null) {
throw new ApplicationIdNotProvidedException(
"ApplicationId is not provided in ApplicationSubmissionContext");
}
//wrap appContext into a SubmitApplicationRequest
SubmitApplicationRequest request =
Records.newRecord(SubmitApplicationRequest.class);
request.setApplicationSubmissionContext(appContext);
// Automatically add the timeline DT into the CLC
// Only when the security and the timeline service are both enabled
if (isSecurityEnabled() && timelineV1ServiceEnabled) {
addTimelineDelegationToken(appContext.getAMContainerSpec());
}
//the actual landing submission
rmClient.submitApplication(request);
int pollCount = 0;
long startTime = System.currentTimeMillis();
EnumSet<YarnApplicationState> waitingStates =
EnumSet.of(YarnApplicationState.NEW,
YarnApplicationState.NEW_SAVING,
YarnApplicationState.SUBMITTED);
EnumSet<YarnApplicationState> failToSubmitStates =
EnumSet.of(YarnApplicationState.FAILED,
YarnApplicationState.KILLED);
while (true) {
try {
ApplicationReport appReport = getApplicationReport(applicationId);
//get the state of the corresponding application on the RM side
YarnApplicationState state = appReport.getYarnApplicationState();
if (!waitingStates.contains(state)) {
if(failToSubmitStates.contains(state)) {
throw new YarnException("Failed to submit " + applicationId +
" to YARN : " + appReport.getDiagnostics());
}
//no longer in a waiting state, so break out of the loop
LOG.info("Submitted application " + applicationId);
break;
}
long elapsedMillis = System.currentTimeMillis() - startTime;
if (enforceAsyncAPITimeout() &&
elapsedMillis >= asyncApiPollTimeoutMillis) {
throw new YarnException("Timed out while waiting for application " +
applicationId + " to be submitted successfully");
}
// log a message every 10 polls
if (++pollCount % 10 == 0) {
LOG.info("Application submission is not finished, " +
"submitted application " + applicationId +
" is still in " + state);
}
try {
Thread.sleep(submitPollIntervalMillis);
} catch (InterruptedException ie) {
String msg = "Interrupted while waiting for application "
+ applicationId + " to be successfully submitted.";
LOG.error(msg);
throw new YarnException(msg, ie);
}
} catch (ApplicationNotFoundException ex) {
// FailOver or RM restart happens before RMStateStore saves
// ApplicationState
LOG.info("Re-submit application " + applicationId + "with the " +
"same ApplicationSubmissionContext");
rmClient.submitApplication(request);
}
}
return applicationId;
}
Before the landing submission, appContext is first wrapped into a SubmitApplicationRequest record, and the request is submitted through rmClient.submitApplication(). The while loop handles the case where the other side did not receive the request, or other exceptions, by resubmitting. rmClient is an ApplicationClientProtocol object: an RPC protocol interface, whose details we will cover in a separate section on RPC. Hadoop RPC uses Protocol Buffers for serialization, and the implementation class of the ApplicationClientProtocol interface, i.e. rmClient's actual class, is ApplicationClientProtocolPBClientImpl. So the concrete landing submission is done by ApplicationClientProtocolPBClientImpl.submitApplication():
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob() -> ResourceMgrDelegate.submitApplication() -> YarnClientImpl.submitApplication() -> ApplicationClientProtocolPBClientImpl.submitApplication():
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException,
IOException {
//produce the Protocol Buffer serialized submission message
SubmitApplicationRequestProto requestProto =
((SubmitApplicationRequestPBImpl) request).getProto();
try {
//return a message that Protocol Buffer can deserialize
return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
requestProto));
} catch (ServiceException e) {
RPCUtil.unwrapAndThrowException(e);
return null;
}
It is proxy.submitApplication(null, requestProto) that delivers the final kick and puts the job ashore. So where does proxy come from?
public class ApplicationClientProtocolPBClientImpl implements ApplicationClientProtocol,
Closeable {
private ApplicationClientProtocolPB proxy;
public ApplicationClientProtocolPBClientImpl(long clientVersion,
InetSocketAddress addr, Configuration conf) throws IOException {
RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
ProtobufRpcEngine.class);
proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
}
//the key innermost call is RPC.getProxy() -> return:
return getProtocolEngine(protocol, conf).getProxy(protocol, clientVersion,
addr, ticket, conf, factory, rpcTimeout, connectionRetryPolicy,
fallbackToSimpleAuth);
//different serialization protocols yield different RpcEngines; here we get a ProtobufRpcEngine
static synchronized RpcEngine getProtocolEngine(Class<?> protocol,
Configuration conf) {
RpcEngine engine = PROTOCOL_ENGINES.get(protocol);
if (engine == null) {
Class<?> impl = conf.getClass(ENGINE_PROP+"."+protocol.getName(),
WritableRpcEngine.class);
engine = (RpcEngine)ReflectionUtils.newInstance(impl, conf);
PROTOCOL_ENGINES.put(protocol, engine);
}
return engine;
}
//then the RpcEngine's getProxy() method is called to obtain the proxy object:
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
InetSocketAddress addr, UserGroupInformation ticket, Configuration conf,
SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy,
AtomicBoolean fallbackToSimpleAuth) throws IOException {
final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory,
rpcTimeout, connectionRetryPolicy, fallbackToSimpleAuth);
return new ProtocolProxy<T>(protocol, (T) Proxy.newProxyInstance(
protocol.getClassLoader(), new Class[]{protocol}, invoker), false);
}
Next, let's lay out the client-side call hierarchy from top to bottom:
1. YARNRunner.submitJob() //the topmost application layer
2. ResourceMgrDelegate.submitApplication() //the RM's delegate
3. YarnClientImpl.submitApplication() //the client side of the YARN framework
4. ApplicationClientProtocolPBClientImpl.submitApplication() //the ApplicationClientProtocol interface
5. proxy.submitApplication() //the ApplicationClientProtocolPB interface
6. the calls inside the RPC machinery
7. socket communication
We skip the low-level socket communication. The RPC server-side counterpart of the client-side proxy.submitApplication() is
ApplicationClientProtocolPBServiceImpl.submitApplication(); they are symmetric, both implementing the ApplicationClientProtocolPB interface:
public class ApplicationClientProtocolPBServiceImpl implements ApplicationClientProtocolPB {
private ApplicationClientProtocol real;
public ApplicationClientProtocolPBServiceImpl(ApplicationClientProtocol impl) {
this.real = impl;
}
public SubmitApplicationResponseProto submitApplication(RpcController arg0,
SubmitApplicationRequestProto proto) throws ServiceException {
SubmitApplicationRequestPBImpl request = new SubmitApplicationRequestPBImpl(proto);
try {
SubmitApplicationResponse response = real.submitApplication(request);
return ((SubmitApplicationResponsePBImpl)response).getProto();
} catch (YarnException e) {
throw new ServiceException(e);
} catch (IOException e) {
throw new ServiceException(e);
}
}
We see that ApplicationClientProtocolPBServiceImpl.submitApplication() in turn calls real.submitApplication(), where the real object implements ApplicationClientProtocol, the counterpart of the client-side ApplicationClientProtocolPBClientImpl. real is assigned in ApplicationClientProtocolPBServiceImpl's constructor; so who constructs ApplicationClientProtocolPBServiceImpl?
It is ClientRMService, the RM-side service dedicated to serving clients, covering client job submission, job queries, and so on. The RM's serviceInit() calls createClientRMService to create the ClientRMService object. Next let's see how ClientRMService ends up creating the ApplicationClientProtocolPBServiceImpl object, starting with ClientRMService.serviceStart():
protected void serviceStart() throws Exception {
Configuration conf = getConfig();
YarnRPC rpc = YarnRPC.create(conf);
//create an RPC-level Server
this.server =
rpc.getServer(ApplicationClientProtocol.class, this,
clientBindAddress,
conf, this.rmDTSecretManager,
conf.getInt(YarnConfiguration.RM_CLIENT_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_CLIENT_THREAD_COUNT));
// Enable service authorization?
if (conf.getBoolean(
CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
false)) {
InputStream inputStream =
this.rmContext.getConfigurationProvider()
.getConfigurationInputStream(conf,
YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
if (inputStream != null) {
conf.addResource(inputStream);
}
refreshServiceAcls(conf, RMPolicyProvider.getInstance());
}
this.displayPerUserApps = conf.getBoolean(
YarnConfiguration.DISPLAY_APPS_FOR_LOGGED_IN_USER,
YarnConfiguration.DEFAULT_DISPLAY_APPS_FOR_LOGGED_IN_USER);
this.server.start();
clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_ADDRESS,
server.getListenerAddress());
super.serviceStart();
}
ClientRMService.serviceStart() -> HadoopYarnProtoRPC.getServer():
public Server getServer(Class protocol, Object instance,
InetSocketAddress addr, Configuration conf,
SecretManager<? extends TokenIdentifier> secretManager,
int numHandlers, String portRangeConfig) {
LOG.debug("Creating a HadoopYarnProtoRpc server for protocol " + protocol +
" with " + numHandlers + " handlers");
return RpcFactoryProvider.getServerFactory(conf).getServer(protocol,
instance, addr, conf, secretManager, numHandlers, portRangeConfig);
}
ClientRMService.serviceStart() -> HadoopYarnProtoRPC.getServer() -> RpcServerFactoryPBImpl.getServer():
public Server getServer(Class<?> protocol, Object instance,
InetSocketAddress addr, Configuration conf,
SecretManager<? extends TokenIdentifier> secretManager, int numHandlers,
String portRangeConfig) {
//first check serviceCache for a cached constructor of the class implementing this protocol
Constructor<?> constructor = serviceCache.get(protocol);
if (constructor == null) {
Class<?> pbServiceImplClazz = null;
try {
//get the Class object for the fully qualified name "org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl"
pbServiceImplClazz = conf
.getClassByName(getPbServiceImplClassName(protocol));
} catch (ClassNotFoundException e) {
throw new YarnRuntimeException("Failed to load class: ["
+ getPbServiceImplClassName(protocol) + "]", e);
}
try {
//get this class's constructor
constructor = pbServiceImplClazz.getConstructor(protocol);
constructor.setAccessible(true);
//cache it in serviceCache for convenient lookup later
serviceCache.putIfAbsent(protocol, constructor);
} catch (NoSuchMethodException e) {
throw new YarnRuntimeException("Could not find constructor with params: "
+ Long.TYPE + ", " + InetSocketAddress.class + ", "
+ Configuration.class, e);
}
}
Object service = null;
try {
//invoke the constructor, passing in instance
service = constructor.newInstance(instance);
} catch (InvocationTargetException e) {
throw new YarnRuntimeException(e);
} catch (IllegalAccessException e) {
throw new YarnRuntimeException(e);
} catch (InstantiationException e) {
throw new YarnRuntimeException(e);
}
Class<?> pbProtocol = service.getClass().getInterfaces()[0];
Method method = protoCache.get(protocol);
if (method == null) {
Class<?> protoClazz = null;
try {
protoClazz = conf.getClassByName(getProtoClassName(protocol));
} catch (ClassNotFoundException e) {
throw new YarnRuntimeException("Failed to load class: ["
+ getProtoClassName(protocol) + "]", e);
}
try {
method = protoClazz.getMethod("newReflectiveBlockingService",
pbProtocol.getInterfaces()[0]);
method.setAccessible(true);
protoCache.putIfAbsent(protocol, method);
} catch (NoSuchMethodException e) {
throw new YarnRuntimeException(e);
}
}
return createServer(pbProtocol, addr, conf, secretManager, numHandlers,
(BlockingService)method.invoke(null, service), portRangeConfig);
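The factory above boils down to a reflection pattern: resolve the implementation class by name, cache its Constructor, and instantiate it with the real service as the argument. A compact pure-JDK illustration of that lookup-and-cache pattern; the Wrapper class is a made-up stand-in for ApplicationClientProtocolPBServiceImpl:

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConstructorCacheDemo {
    // Illustrative stand-in for ApplicationClientProtocolPBServiceImpl:
    // it wraps the "real" implementation passed into its constructor.
    public static class Wrapper {
        final Runnable real;
        public Wrapper(Runnable real) { this.real = real; }
    }

    // Mirrors RpcServerFactoryPBImpl: look the constructor up once,
    // cache it, then instantiate per call with the real instance.
    static final ConcurrentMap<Class<?>, Constructor<?>> cache = new ConcurrentHashMap<>();

    static Object newService(Class<?> implClazz, Class<?> paramType, Object instance)
            throws Exception {
        Constructor<?> c = cache.get(implClazz);
        if (c == null) {
            c = implClazz.getConstructor(paramType);   // e.g. (ApplicationClientProtocol)
            cache.putIfAbsent(implClazz, c);           // keep it for later getServer() calls
        }
        return c.newInstance(instance);                // service = constructor.newInstance(instance)
    }

    public static void main(String[] args) throws Exception {
        Runnable real = () -> {};
        Wrapper w = (Wrapper) newService(Wrapper.class, Runnable.class, real);
        System.out.println(w.real == real); // prints "true"
    }
}
```

This indirection lets the RPC layer stay generic: it never names ApplicationClientProtocolPBServiceImpl in code, only derives the class name from the protocol interface and builds it reflectively.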
Returning to the ClientRMService class: in the call service = constructor.newInstance(instance) above, the instance passed in is ClientRMService itself:
protected void serviceStart() throws Exception {
Configuration conf = getConfig();
YarnRPC rpc = YarnRPC.create(conf);
this.server =
rpc.getServer(ApplicationClientProtocol.class, this,
clientBindAddress,
conf, this.rmDTSecretManager,
conf.getInt(YarnConfiguration.RM_CLIENT_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_CLIENT_THREAD_COUNT));
//the real parameter passed to ApplicationClientProtocolPBServiceImpl's constructor is the ClientRMService object
public ApplicationClientProtocolPBServiceImpl(ApplicationClientProtocol impl) {
this.real = impl;
}
In other words, when real submits, it is ClientRMService that submits:
ApplicationClientProtocolPBServiceImpl.submitApplication() -> ClientRMService.submitApplication():
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException, IOException {
//some validation omitted: queue checks, application name checks, etc.
try {
// hand the submission straight to the RMAppManager object rmAppManager, using the current time as the submission time
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId, callerContext);
} catch (YarnException e) {
LOG.info("Exception in submitting " + applicationId, e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId, callerContext);
throw e;
}
return recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
}
ClientRMService.submitApplication() hands the job directly to the RMAppManager object rmAppManager for submission. With this, the job has finally landed; from here on, it is the RM's business.
Summary:
The overall call hierarchy of job submission is:
Client side (the machine the client submits from):
WordCount.main() -> Job.waitForCompletion() -> Job.submit() -> JobSubmitter.submitJobInternal() -> YARNRunner.submitJob() -> ResourceMgrDelegate.submitApplication() -> YarnClientImpl.submitApplication() -> ApplicationClientProtocolPBClientImpl.submitApplication()
Server side (the machine the RM runs on):
ApplicationClientProtocolPBServiceImpl.submitApplication() -> ClientRMService.submitApplication() -> RMAppManager.submitApplication()
The job has landed at last; in the next section we will follow where it goes after landing.