yarn process

concepts and flow
An application submission client submits an application to the YARN ResourceManager (RM). This is done by setting up a YarnClient object. Once the YarnClient is started, the client sets up the application context, prepares the application's first container, which holds the ApplicationMaster (AM), and then submits the application. You need to provide details such as the local files/jars the application needs, the actual command to execute (with the necessary command-line arguments), any OS environment settings (optional), and so on. In effect, you describe the Unix process(es) to be launched for your ApplicationMaster.
The YARN RM then launches the ApplicationMaster in an allocated container. The ApplicationMaster communicates with the YARN cluster and handles application execution, performing its operations asynchronously. During application startup, the main tasks of the ApplicationMaster are: a) communicating with the RM to negotiate and allocate resources for future containers; b) after container allocation, communicating with the YARN NodeManagers (NMs) to launch application containers on them. Task a) can be performed asynchronously through an AMRMClientAsync object, with the event-handling methods specified in an event handler of type AMRMClientAsync.CallbackHandler. The event handler must be set on the client explicitly. Task b) can be performed by launching a runnable object that starts containers once they are allocated. As part of launching a container, the AM must specify a ContainerLaunchContext, which holds launch information such as the command line and the environment.
During application execution, the AM communicates with the NMs through an NMClientAsync object. All container events are handled by the NMClientAsync.CallbackHandler associated with the NMClientAsync. A typical callback handler handles container start, stop, status update, and error events. The AM also reports execution progress to the RM by implementing the getProgress() method of AMRMClientAsync.CallbackHandler.
Besides the asynchronous clients, there are synchronous versions for certain workflows (AMRMClient and NMClient). The asynchronous clients are recommended because they are (subjectively) simpler to use, and this article mainly covers the asynchronous clients. Refer to AMRMClient and NMClient for more information on the synchronous clients.
interfaces
- Client<–>RM : YarnClient
- AM<–>RM : AMRMClientAsync
- AM<–>NM : NMClientAsync
  Note:
- The three main protocols for YARN applications (ApplicationClientProtocol, ApplicationMasterProtocol and ContainerManagementProtocol) are still preserved. The three clients wrap these three protocols to provide a simpler programming model for YARN applications.
- In very rare circumstances, a programmer may want to use the three protocols directly to implement an application. However, note that this is no longer encouraged for general use cases.
Writing a simple YARN application
- Writing a simple client
  - The first step for the client is to initialize and start a YarnClient.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
  - Once the client is set up, it needs to create an application and get its application ID.
    YarnClientApplication app = yarnClient.createApplication();
    GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
  - The response from the YarnClientApplication for a new application also contains information about the cluster, such as its minimum and maximum resource capabilities. This is needed to set the container specification for the AM correctly. Refer to GetNewApplicationResponse for more details.
  - The crux of the client is setting up the ApplicationSubmissionContext, which defines all the information the RM needs to launch the AM. The client needs to set the following in the context:
    - Application info: id, name
    - Queue and priority info: the queue to which the application will be submitted and the priority assigned to the application
    - User: the user submitting the application
    - ContainerLaunchContext: defines the container in which the AM is launched and run. It holds all the information required to run the application, such as local resources (binaries, jars, files, etc.), environment settings (CLASSPATH, etc.), the command to be executed, and security tokens (RECT).
      // set the application submission context
      ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
      ApplicationId appId = appContext.getApplicationId();

      appContext.setKeepContainersAcrossApplicationAttempts(keepContainers);
      appContext.setApplicationName(appName);

      // set local resources for the application master
      // local files or archives as needed
      // In this scenario, the jar file for the application master is part of the local resources
      Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();

      LOG.info("Copy App Master jar from local filesystem and add to local environment");
      // Copy the application master jar to the filesystem
      // Create a local resource to point to the destination jar path
      FileSystem fs = FileSystem.get(conf);
      addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(),
          localResources, null);

      // Set the log4j properties if needed
      if (!log4jPropFile.isEmpty()) {
        addToLocalResources(fs, log4jPropFile, log4jPath, appId.toString(),
            localResources, null);
      }

      // The shell script has to be made available on the final container(s)
      // where it will be executed.
      // To do this, we need to first copy into the filesystem that is visible
      // to the yarn framework.
      // We do not need to set this as a local resource for the application
      // master as the application master does not need it.
      String hdfsShellScriptLocation = "";
      long hdfsShellScriptLen = 0;
      long hdfsShellScriptTimestamp = 0;
      if (!shellScriptPath.isEmpty()) {
        Path shellSrc = new Path(shellScriptPath);
        String shellPathSuffix =
            appName + "/" + appId.toString() + "/" + SCRIPT_PATH;
        Path shellDst =
            new Path(fs.getHomeDirectory(), shellPathSuffix);
        fs.copyFromLocalFile(false, true, shellSrc, shellDst);
        hdfsShellScriptLocation = shellDst.toUri().toString();
        FileStatus shellFileStatus = fs.getFileStatus(shellDst);
        hdfsShellScriptLen = shellFileStatus.getLen();
        hdfsShellScriptTimestamp = shellFileStatus.getModificationTime();
      }

      if (!shellCommand.isEmpty()) {
        addToLocalResources(fs, null, shellCommandPath, appId.toString(),
            localResources, shellCommand);
      }

      if (shellArgs.length > 0) {
        addToLocalResources(fs, null, shellArgsPath, appId.toString(),
            localResources, StringUtils.join(shellArgs, " "));
      }

      // Set the env variables to be setup in the env where the application master will be run
      LOG.info("Set the environment for the application master");
      Map<String, String> env = new HashMap<String, String>();

      // put location of shell script into env
      // using the env info, the application master will create the correct local resource for the
      // eventual containers that will be launched to execute the shell scripts
      env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation);
      env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp));
      env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen));

      // Add AppMaster.jar location to classpath
      // At some point we should not be required to add
      // the hadoop specific classpaths to the env.
      // It should be provided out of the box.
      // For now setting all required classpaths including
      // the classpath to "." for the application jar
      StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$$())
          .append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*");
      for (String c : conf.getStrings(
          YarnConfiguration.YARN_APPLICATION_CLASSPATH,
          YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH)) {
        classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR);
        classPathEnv.append(c.trim());
      }
      classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(
          "./log4j.properties");

      // Set the necessary command to execute the application master
      Vector<CharSequence> vargs = new Vector<CharSequence>(30);

      // Set java executable command
      LOG.info("Setting up app master command");
      vargs.add(Environment.JAVA_HOME.$$() + "/bin/java");
      // Set Xmx based on am memory size
      vargs.add("-Xmx" + amMemory + "m");
      // Set class name
      vargs.add(appMasterMainClass);
      // Set params for Application Master
      vargs.add("--container_memory " + String.valueOf(containerMemory));
      vargs.add("--container_vcores " + String.valueOf(containerVirtualCores));
      vargs.add("--num_containers " + String.valueOf(numContainers));
      vargs.add("--priority " + String.valueOf(shellCmdPriority));

      for (Map.Entry<String, String> entry : shellEnv.entrySet()) {
        vargs.add("--shell_env " + entry.getKey() + "=" + entry.getValue());
      }
      if (debugFlag) {
        vargs.add("--debug");
      }

      vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stdout");
      vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stderr");

      // Get final command
      StringBuilder command = new StringBuilder();
      for (CharSequence str : vargs) {
        command.append(str).append(" ");
      }

      LOG.info("Completed setting up app master command " + command.toString());
      List<String> commands = new ArrayList<String>();
      commands.add(command.toString());

      // Set up the container launch context for the application master
      ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
          localResources, env, commands, null, null, null);

      // Set up resource type requirements
      // For now, both memory and vcores are supported, so we set memory and
      // vcores requirements
      Resource capability = Resource.newInstance(amMemory, amVCores);
      appContext.setResource(capability);

      // Service data is a binary blob that can be passed to the application
      // Not needed in this scenario
      // amContainer.setServiceData(serviceData);

      // Setup security tokens
      if (UserGroupInformation.isSecurityEnabled()) {
        // Note: Credentials class is marked as LimitedPrivate for HDFS and MapReduce
        Credentials credentials = new Credentials();
        String tokenRenewer = conf.get(YarnConfiguration.RM_PRINCIPAL);
        if (tokenRenewer == null || tokenRenewer.length() == 0) {
          throw new IOException(
              "Can't get Master Kerberos principal for the RM to use as renewer");
        }

        // For now, only getting tokens for the default file-system.
        final Token<?> tokens[] =
            fs.addDelegationTokens(tokenRenewer, credentials);
        if (tokens != null) {
          for (Token<?> token : tokens) {
            LOG.info("Got dt for " + fs.getUri() + "; " + token);
          }
        }
        DataOutputBuffer dob = new DataOutputBuffer();
        credentials.writeTokenStorageToStream(dob);
        ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
        amContainer.setTokens(fsTokens);
      }

      appContext.setAMContainerSpec(amContainer);
  - After the setup is complete, the client is ready to submit the application with the specified priority and queue.
    // Set the priority for the application master
    Priority pri = Priority.newInstance(amPriority);
    appContext.setPriority(pri);

    // Set the queue to which this application is to be submitted in the RM
    appContext.setQueue(amQueue);

    // Submit the application to the applications manager
    // SubmitApplicationResponse submitResp = applicationsManager.submitApplication(appRequest);
    yarnClient.submitApplication(appContext);
  - At this point, the RM has accepted the application. In the background, it allocates a container with the required specification and then sets up and launches the AM in that container.
  - There are several ways for the client to track the progress of the actual task.
    - The client can communicate with the RM and request an application report through the getApplicationReport() method of YarnClient.
      // Get application report for the appId we are interested in
      ApplicationReport report = yarnClient.getApplicationReport(appId);
    - The ApplicationReport received from the RM consists of the following:
      - General application information: the application id, the queue to which the application was submitted, the user who submitted it, and the application start time.
      - AM details: the host on which the AM is running, the RPC port (if any) on which it listens for client requests, and the token the client needs to communicate with the AM.
      - Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL, which is available via ApplicationReport's getTrackingUrl() method and which the client can use to monitor progress.
      - Application state: the state of the application as seen by the RM is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check the actual success or failure of the application task itself. In case of failure, ApplicationReport#getDiagnostics may shed more light on the cause.
    - If the AM supports it, the client can directly query the AM itself for progress updates, through the host:rpcport information obtained from the application report. If available, the tracking url obtained from the report can also be used.
  - In certain situations, for example if the application is taking too long, the client may want to kill it. YarnClient supports the killApplication call, which lets the client send a kill signal to the AM via the RM.
    yarnClient.killApplication(appId);
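
The monitoring logic described above (poll the report, stop on a terminal state, then check the final status) can be sketched without a cluster. This is a minimal model, not Hadoop code: `State`, `FinalStatus`, `Report` and `monitor` are hypothetical stand-ins for YarnApplicationState, FinalApplicationStatus, ApplicationReport and a client loop built around `yarnClient.getApplicationReport(appId)`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins for the YARN report types.
final class MonitorSketch {
    enum State { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
    enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    static final class Report {
        final State state;
        final FinalStatus finalStatus;

        Report(State state, FinalStatus finalStatus) {
            this.state = state;
            this.finalStatus = finalStatus;
        }
    }

    /**
     * Polls the report source until a terminal state is seen or maxPolls
     * reports have been read. Returns true only for FINISHED + SUCCEEDED.
     */
    static boolean monitor(Supplier<Report> reports, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            Report report = reports.get();
            switch (report.state) {
                case FINISHED:
                    // FINISHED only means the AM exited; the final status
                    // says whether the job itself succeeded.
                    return report.finalStatus == FinalStatus.SUCCEEDED;
                case FAILED:
                case KILLED:
                    return false;
                default:
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
            }
        }
        // Gave up waiting; a real client might call killApplication here.
        return false;
    }

    public static void main(String[] args) {
        Iterator<Report> scripted = Arrays.asList(
                new Report(State.ACCEPTED, FinalStatus.UNDEFINED),
                new Report(State.RUNNING, FinalStatus.UNDEFINED),
                new Report(State.FINISHED, FinalStatus.SUCCEEDED)).iterator();
        System.out.println("succeeded = " + monitor(scripted::next, 10, 5L));
    }
}
```

In a real client the supplier would wrap the report call, and the timeout branch is where a kill request would typically go.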
- Writing an ApplicationMaster
  - The AM is the actual owner of the job. It is launched by the RM and, via the client, is given all the information and resources it needs for the job it has been tasked to oversee and complete.
  - Because the AM is launched inside a container that may share a physical host with other containers, given the multi-tenant nature of the cluster, it cannot make assumptions such as a pre-configured port being available to listen on.
  - When the AM starts up, several parameters are made available to it via the environment. These include the ContainerId of the AM container, the application submission time, and details about the NM host running the AM. Refer to ApplicationConstants for the parameter names.
  - All interactions with the RM require an ApplicationAttemptId (there can be multiple attempts per application in case of failures). The ApplicationAttemptId can be obtained from the AM's container ID. Helper APIs convert the values obtained from the environment into objects.
    Map<String, String> envs = System.getenv();
    String containerIdString =
        envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
    if (containerIdString == null) {
      // container id should always be set in the env by the framework
      throw new IllegalArgumentException(
          "ContainerId not set in the environment");
    }
    ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
    ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
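
To make the relationship between a container ID and its application attempt more concrete, here is a small string-level sketch. It assumes the commonly seen textual layouts `container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>` (optionally with an `e<epoch>` field after `container`) and `appattempt_<clusterTimestamp>_<appSeq>_<attempt>`. Those layouts and the `toAttemptId` helper are assumptions for illustration only; production code should rely on the helper APIs shown above rather than parsing strings.

```java
import java.util.Locale;

// Illustrative only: derives an attempt ID string from a container ID string.
final class ContainerIdSketch {
    static String toAttemptId(String containerId) {
        String[] parts = containerId.split("_");
        // An optional epoch field such as "e17" shifts the other fields by one.
        int offset = (parts.length == 6 && parts[1].startsWith("e")) ? 2 : 1;
        if (!"container".equals(parts[0]) || parts.length != offset + 4) {
            throw new IllegalArgumentException("Invalid ContainerId: " + containerId);
        }
        long clusterTimestamp = Long.parseLong(parts[offset]);
        int appSeq = Integer.parseInt(parts[offset + 1]);
        int attempt = Integer.parseInt(parts[offset + 2]);
        // Assumed padding: 4 digits for the application, 6 for the attempt.
        return String.format(Locale.ROOT, "appattempt_%d_%04d_%06d",
                clusterTimestamp, appSeq, attempt);
    }

    public static void main(String[] args) {
        System.out.println(toAttemptId("container_1410901177871_0001_01_000005"));
    }
}
```

The point of the sketch is that the attempt is fully determined by fields already embedded in the container ID, which is why no extra lookup against the RM is needed.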
  - After the AM has initialized itself completely, it can start two clients: one to the RM and one to the NMs. Each is set up with a custom event handler.
    AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
    amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
    amRMClient.init(conf);
    amRMClient.start();

    containerListener = createNMCallbackHandler();
    nmClientAsync = new NMClientAsyncImpl(containerListener);
    nmClientAsync.init(conf);
    nmClientAsync.start();
  - The AM has to send heartbeats to the RM so that the RM knows the AM is alive and still running. The expiry interval at the RM is defined by the configuration setting YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, with the default given by YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. The AM needs to register itself with the RM to start heartbeating.
    // Register self with ResourceManager
    // This will start heartbeating to the RM
    appMasterHostname = NetUtils.getHostname();
    RegisterApplicationMasterResponse response = amRMClient
        .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
            appMasterTrackingUrl);
  - The registration response includes the maximum resource capability of the cluster. You may want to use it to check the application's requests.
```
// Dump out information about cluster capability as seen by the
// resource manager
int maxMem = response.getMaximumResourceCapability().getMemory();
LOG.info("Max mem capability of resources in this cluster " + maxMem);

int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
LOG.info("Max vcores capability of resources in this cluster " + maxVCores);

// A resource ask cannot exceed the max.
if (containerMemory > maxMem) {
  LOG.info("Container memory specified above max threshold of cluster."
      + " Using max value." + ", specified=" + containerMemory + ", max="
      + maxMem);
  containerMemory = maxMem;
}

if (containerVirtualCores > maxVCores) {
  LOG.info("Container virtual cores specified above max threshold of cluster."
      + " Using max value." + ", specified=" + containerVirtualCores + ", max="
      + maxVCores);
  containerVirtualCores = maxVCores;
}

List<Container> previousAMRunningContainers =
    response.getContainersFromPreviousAttempts();
LOG.info("Received " + previousAMRunningContainers.size()
    + " previous AM's running containers on AM registration.");
```
  - Based on the task requirements, the AM can ask for a set of containers to run its tasks on. We can now calculate how many containers we need and request that many.
    List<Container> previousAMRunningContainers =
        response.getContainersFromPreviousAttempts();
    LOG.info("Received " + previousAMRunningContainers.size()
        + " previous AM's running containers on AM registration.");

    int numTotalContainersToRequest =
        numTotalContainers - previousAMRunningContainers.size();
    // Setup ask for containers from RM
    // Send request for containers to RM
    // Until we get our fully allocated quota, we keep on polling RM for
    // containers
    // Keep looping until all the containers are launched and shell script
    // executed on them ( regardless of success/failure).
    for (int i = 0; i < numTotalContainersToRequest; ++i) {
      ContainerRequest containerAsk = setupContainerAskForRM();
      amRMClient.addContainerRequest(containerAsk);
    }
  - In setupContainerAskForRM(), the following two things need to be set up:
    - Resource capability: Currently, YARN supports memory-based resource requirements, so the request should define how much memory is needed. The value is in MB, must be less than the maximum capability of the cluster, and must be an exact multiple of the minimum capability. Memory resources correspond to the physical memory limits imposed on the task containers. CPU (virtual cores) can be requested as well, as the code shows.
    - Priority: When asking for sets of containers, the AM may define different priorities for each set. For example, the MapReduce AM may assign a higher priority to the containers needed for map tasks and a lower priority to the containers for reduce tasks.
      private ContainerRequest setupContainerAskForRM() {
        // setup requirements for hosts
        // using * as any host will do for the distributed shell app
        // set the priority for the request
        Priority pri = Priority.newInstance(requestPriority);

        // Set up resource type requirements
        // For now, memory and CPU are supported so we set memory and cpu requirements
        Resource capability = Resource.newInstance(containerMemory,
            containerVirtualCores);

        ContainerRequest request = new ContainerRequest(capability, null, null,
            pri);
        LOG.info("Requested container ask: " + request.toString());
        return request;
      }
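
The memory constraint described above (a multiple of the minimum capability, bounded by the maximum) can be checked on the AM side before building the request. The helper below is a hypothetical pre-check, not part of the YARN API: `normalizeMemoryMb` and its parameters are names invented for this sketch. It rounds a request up to the next multiple of the minimum and caps it at the largest multiple that fits under the maximum.

```java
// Hypothetical AM-side helper; minMb and maxMb would come from the
// cluster capabilities reported to the application.
final class ResourceAskSketch {
    static int normalizeMemoryMb(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("Invalid cluster limits");
        }
        // Round up to the next multiple of the minimum (at least one unit).
        long units = Math.max(1L, ((long) requestedMb + minMb - 1) / minMb);
        long rounded = units * minMb;
        // Cap at the largest multiple of the minimum that fits under the maximum.
        long cap = (long) (maxMb / minMb) * minMb;
        return (int) Math.min(rounded, cap);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMemoryMb(1500, 1024, 8192));
    }
}
```

Whether the cluster would also adjust an unaligned request on its own is not covered here; the sketch only shows how to produce a value that satisfies the stated rule.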
  - After the AM has sent the container allocation requests, containers are launched asynchronously by the event handler of the AMRMClientAsync client. The handler should implement the AMRMClientAsync.CallbackHandler interface.
    - When containers are allocated, the handler sets up a thread that runs the code to launch them. Here we use the name LaunchContainerRunnable to demonstrate. The LaunchContainerRunnable class is discussed later in this article.
```
public void onContainersAllocated(List<Container> allocatedContainers) {
  LOG.info("Got response from RM for container ask, allocatedCnt="
      + allocatedContainers.size());
  numAllocatedContainers.addAndGet(allocatedContainers.size());
  for (Container allocatedContainer : allocatedContainers) {
    LaunchContainerRunnable runnableLaunchContainer =
        new LaunchContainerRunnable(allocatedContainer, containerListener);
    Thread launchThread = new Thread(runnableLaunchContainer);

    // launch and start the container on a separate thread to keep
    // the main thread unblocked
    // as all containers may not be allocated at one go.
    launchThreads.add(launchThread);
    launchThread.start();
  }
}
```
    - On each heartbeat, the event handler also reports the progress of the application.
```
public float getProgress() {
  // set progress to deliver to RM on next heartbeat
  float progress = (float) numCompletedContainers.get()
      / numTotalContainers;
  return progress;
}
```
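The two callbacks above work together through shared atomic counters. The following self-contained model shows that interaction without any Hadoop classes. `CallbackSketch` and its methods are hypothetical; containers are represented by plain strings, and each launch thread simply reports its container as completed, which stands in for the real launch and completion events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the allocation/progress callbacks; it assumes the
// allocation callback is always invoked from a single thread.
final class CallbackSketch {
    private final int numTotalContainers;
    private final AtomicInteger numAllocatedContainers = new AtomicInteger();
    private final AtomicInteger numCompletedContainers = new AtomicInteger();
    private final List<Thread> launchThreads = new ArrayList<>();

    CallbackSketch(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    // Mirrors onContainersAllocated: one launcher thread per container.
    void onContainersAllocated(List<String> containerIds) {
        numAllocatedContainers.addAndGet(containerIds.size());
        for (String id : containerIds) {
            // Simulation: the "container" finishes as soon as it is launched.
            Thread launchThread =
                    new Thread(() -> onContainersCompleted(1), "launch-" + id);
            launchThreads.add(launchThread);
            launchThread.start();
        }
    }

    void onContainersCompleted(int count) {
        numCompletedContainers.addAndGet(count);
    }

    int allocated() {
        return numAllocatedContainers.get();
    }

    // Mirrors getProgress, with a guard against a zero total.
    float getProgress() {
        if (numTotalContainers <= 0) {
            return 0f;
        }
        return (float) numCompletedContainers.get() / numTotalContainers;
    }

    // Waits for all launcher threads so that progress can be read reliably.
    void awaitLaunchThreads() {
        for (Thread t : launchThreads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Using atomic counters lets the progress method be called from the heartbeat thread while launcher threads update the counts, without extra locking.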
  - The container launch thread is what actually launches the container on the NM. After a container has been allocated to the AM, the AM follows a process similar to the one the client followed: it sets up a ContainerLaunchContext for the eventual task that will run in the allocated container. Once the ContainerLaunchContext is defined, the AM can start the container through the NMClientAsync.
    // Set the necessary command to execute on the allocated container
    Vector<CharSequence> vargs = new Vector<CharSequence>(5);

    // Set executable command
    vargs.add(shellCommand);
    // Set shell script path
    if (!scriptPath.isEmpty()) {
      vargs.add(Shell.WINDOWS ? ExecBatScripStringtPath
          : ExecShellStringPath);
    }

    // Set args for the shell command if any
    vargs.add(shellArgs);
    // Add log redirect params
    vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout");
    vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr");

    // Get final command
    StringBuilder command = new StringBuilder();
    for (CharSequence str : vargs) {
      command.append(str).append(" ");
    }

    List<String> commands = new ArrayList<String>();
    commands.add(command.toString());

    // Set up ContainerLaunchContext, setting local resource, environment,
    // command and token for constructor.

    // Note for tokens: Set up tokens for the container too. Today, for normal
    // shell commands, the container in distribute-shell doesn't need any
    // tokens. We are populating them mainly for NodeManagers to be able to
    // download anyfiles in the distributed file-system. The tokens are
    // otherwise also useful in cases, for e.g., when one is running a
    // "hadoop dfs" command inside the distributed shell.
    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        localResources, shellEnv, commands, null, allTokens.duplicate(), null);
    containerListener.addContainer(container.getId(), container);
    nmClientAsync.startContainerAsync(container, ctx);
  - The NMClientAsync object, together with its event handler, handles container events, including container start, stop, status update, and error.
  - After the AM determines that the work is done, it needs to unregister itself through the AM-RM client and then stop the client.
    try {
      amRMClient.unregisterApplicationMaster(appStatus, appMessage, null);
    } catch (YarnException ex) {
      LOG.error("Failed to unregister application", ex);
    } catch (IOException e) {
      LOG.error("Failed to unregister application", e);
    }

    amRMClient.stop();
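
The text does not say how `appStatus` and `appMessage` are chosen before unregistering. One plausible rule, sketched below under that assumption, is to report success only when every requested container completed and none failed. `FinishSketch`, its `FinalStatus` enum and both methods are hypothetical names for this illustration, not Hadoop classes.

```java
// Hypothetical decision rule for the final status reported at unregistration.
final class FinishSketch {
    enum FinalStatus { SUCCEEDED, FAILED }

    static FinalStatus decide(int total, int completed, int failed) {
        // Success requires all containers to have completed with no failures.
        return (failed == 0 && completed == total)
                ? FinalStatus.SUCCEEDED
                : FinalStatus.FAILED;
    }

    static String diagnostics(int total, int completed, int failed) {
        // A short message that could accompany the final status.
        return "total=" + total + ", completed=" + completed + ", failed=" + failed;
    }

    public static void main(String[] args) {
        System.out.println(decide(4, 4, 0) + " (" + diagnostics(4, 4, 0) + ")");
    }
}
```

The resulting status and message would then be passed as the first two arguments of the unregister call shown above.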
sample code
