[In-Depth Explanation of YARN Architecture and Implementation] 4-4 Application Management in the RM

In YARN, an Application is a program submitted to the cluster, and it may start multiple running instances. Each running instance consists of an ApplicationMaster and a group of tasks started by that ApplicationMaster. An Application has attributes such as name, queue, and priority. It is a fairly broad concept: it can be a MapReduce job, a DAG application, and so on. Application management in YARN covers access control, startup and shutdown, life cycle management, and more. This section introduces only the most basic parts, namely access control and startup and shutdown; life cycle management is covered in the next section.

I. ApplicationACLsManager

ApplicationACLsManager manages access permissions for applications. There are two kinds of permission:

  • View permissions
    • Basic information of the program: running time, priority, etc.
  • Modify permissions
    • Modify program priority, kill applications
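
The two permission types can be illustrated with a minimal sketch. This is a simplified, hypothetical model rather than the real ApplicationACLsManager: the class name `SimpleAppAclManager`, the use of plain strings for users and application IDs, and the rule that administrators and the application owner always pass are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; NOT the real ApplicationACLsManager API.
class SimpleAppAclManager {
    // The two kinds of permission described above.
    enum AccessType { VIEW_APP, MODIFY_APP }

    // appId -> (access type -> users granted that access)
    private final Map<String, Map<AccessType, Set<String>>> acls = new HashMap<>();
    private final Set<String> admins;

    SimpleAppAclManager(Set<String> admins) {
        this.admins = admins;
    }

    // Register the ACLs of a newly submitted application.
    void addApplication(String appId, Map<AccessType, Set<String>> appAcls) {
        acls.put(appId, new HashMap<>(appAcls));
    }

    // Forget the ACLs once the application is no longer tracked.
    void removeApplication(String appId) {
        acls.remove(appId);
    }

    // Assumption of this sketch: admins and the owner always have access.
    boolean checkAccess(String caller, AccessType type, String owner, String appId) {
        if (admins.contains(caller) || caller.equals(owner)) {
            return true;
        }
        Map<AccessType, Set<String>> appAcls = acls.get(appId);
        if (appAcls == null) {
            return false;
        }
        return appAcls.getOrDefault(type, Collections.emptySet()).contains(caller);
    }
}
```

In this sketch a user who is only on the VIEW_APP list can read basic information but cannot kill the application or change its priority.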

II. RMAppManager

RMAppManager is responsible for starting and shutting down applications. Below, with reference to the source code, we look at the two operations in turn: start and end.

1. Start

As mentioned in "4-1 ResourceManager function overview", ClientRMService handles the various RPC requests from clients, such as submitting an application, terminating it, and querying its running status.
After ClientRMService receives an application submitted by a client, it calls RMAppManager#submitApplication to create an RMApp object, which maintains the application's entire life cycle.

// Excerpt: parameters and the surrounding logic are omitted.
protected void submitApplication() {
    // Create the app and add it to RMActiveServiceContext.applications
    RMAppImpl application =
        createAndPopulateNewRMApp(submissionContext, submitTime, user, false);

    // Send the app START event; other event handlers take over from here
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
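
The pattern in this excerpt (record the application first, then hand control to the event system) can be sketched in isolation. The following is a minimal model under stated assumptions, not Hadoop code: `SubmitFlowSketch`, `App`, `AppEvent`, and the `Consumer`-based event handler are hypothetical stand-ins for RMAppManager, RMApp, RMAppEvent, and the dispatcher.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-ins for RMAppManager, RMApp and the dispatcher.
class SubmitFlowSketch {
    enum AppEventType { START }

    static final class AppEvent {
        final String appId;
        final AppEventType type;

        AppEvent(String appId, AppEventType type) {
            this.appId = appId;
            this.type = type;
        }
    }

    static final class App {
        final String id;
        final String user;
        final long submitTime;

        App(String id, String user, long submitTime) {
            this.id = id;
            this.user = user;
            this.submitTime = submitTime;
        }
    }

    // Plays the role of the applications map held by the RM context.
    private final Map<String, App> applications = new ConcurrentHashMap<>();
    // Plays the role of the dispatcher's event handler.
    private final Consumer<AppEvent> eventHandler;

    SubmitFlowSketch(Consumer<AppEvent> eventHandler) {
        this.eventHandler = eventHandler;
    }

    void submitApplication(String appId, String user, long submitTime) {
        // Step 1: create the app object and start tracking it.
        App app = new App(appId, user, submitTime);
        if (applications.putIfAbsent(appId, app) != null) {
            throw new IllegalStateException("Application " + appId + " already exists");
        }
        // Step 2: emit START; the rest of the life cycle is event-driven.
        eventHandler.accept(new AppEvent(appId, AppEventType.START));
    }

    boolean isTracked(String appId) {
        return applications.containsKey(appId);
    }
}
```

Keeping submission this thin means the manager only registers the application, while every later state change is driven by events.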

2. End

When an RMApp finishes running, an RMAppManagerEventType.APP_COMPLETED event is sent to RMAppManager. As the source code shows, handling this event performs three operations:

public void handle(RMAppManagerEvent event) {
    ApplicationId applicationId = event.getApplicationId();
    LOG.debug("RMAppManager processing event for "
        + applicationId + " of type " + event.getType());
    switch (event.getType()) {
      case APP_COMPLETED:
      {
        finishApplication(applicationId);
        logApplicationSummary(applicationId);
        checkAppNumCompletedLimit();
      }
      break;
      // (remaining cases omitted in this excerpt)
    }
}

  • finishApplication()
    • Puts the application into the completed list completedApps, so that users can still query the execution information of past applications (for example, through the YARN web UI).
  • logApplicationSummary()
    • Prints a summary of the application to the log.
  • checkAppNumCompletedLimit()
    • The completedApps list mentioned above has a limited capacity: 10000 by default, configurable through yarn.resourcemanager.max-completed-applications. When the limit is exceeded, the oldest applications are removed from the list; they can still be viewed later from the History Server.
    • Removes the application from RMStateStore. RMStateStore records the operation logs of running applications. When the cluster restarts after a failure, the RM can use these logs to restore the running state of applications and avoid rerunning all of them. Once an application has finished, these logs are no longer useful, so they can be deleted.
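
The bounded completed list can be sketched as follows. This is a simplified illustration of the behavior described in the bullets above, not the Hadoop implementation: the class `CompletedAppsSketch`, the string application IDs, and the in-memory set standing in for RMStateStore are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the completed-apps bookkeeping.
class CompletedAppsSketch {
    private final int maxCompletedApps;
    // Oldest completed application sits at the head of the deque.
    private final Deque<String> completedApps = new ArrayDeque<>();
    // In-memory stand-in for the records kept in RMStateStore.
    private final Set<String> stateStore = new HashSet<>();

    CompletedAppsSketch(int maxCompletedApps) {
        this.maxCompletedApps = maxCompletedApps;
    }

    void appStarted(String appId) {
        stateStore.add(appId);
    }

    // Remember the finished app so it can still be queried.
    void finishApplication(String appId) {
        completedApps.addLast(appId);
    }

    // Evict the oldest apps beyond the limit and drop their stored state.
    List<String> checkAppNumCompletedLimit() {
        List<String> evicted = new ArrayList<>();
        while (completedApps.size() > maxCompletedApps) {
            String oldest = completedApps.removeFirst();
            stateStore.remove(oldest);
            evicted.add(oldest);
        }
        return evicted;
    }

    boolean isQueryable(String appId) {
        return completedApps.contains(appId);
    }

    boolean hasStoredState(String appId) {
        return stateStore.contains(appId);
    }
}
```

With a limit of 2, finishing a third application evicts the first one from both the list and the stand-in state store, while the two most recent applications remain queryable.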

III. ContainerAllocationExpirer

After the AM obtains a Container, it must start that Container on the corresponding NM within a certain period of time (10 minutes by default, configurable through yarn.resourcemanager.rm.container-allocation.expiry-interval-ms); otherwise the RM forcibly reclaims the Container. YARN does not allow an AM to hold a Container for a long time without using it, because that lowers the utilization of the whole cluster.

protected void expire(AllocationExpirationInfo allocationExpirationInfo) {
    dispatcher.handle(new ContainerExpiredSchedulerEvent(
        allocationExpirationInfo.getContainerId(),
        allocationExpirationInfo.isIncrease()));
}

This class also inherits from the abstract class AbstractLivelinessMonitor, which has been mentioned before, so I won't repeat it here.
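
As a brief reminder of the idea, the expiry check can be sketched like this. It is a simplified model and not the real AbstractLivelinessMonitor: `AllocationExpirerSketch` is a hypothetical class, time is passed in explicitly instead of being read by a background thread, and container IDs are plain strings.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of an allocation expirer; time is injected by the caller.
class AllocationExpirerSketch {
    private final long expireIntervalMs;
    // Called for each expired container, like the expire() callback above.
    private final Consumer<String> onExpire;
    // containerId -> time at which the container was handed to the AM
    private final Map<String, Long> allocatedAt = new HashMap<>();

    AllocationExpirerSketch(long expireIntervalMs, Consumer<String> onExpire) {
        this.expireIntervalMs = expireIntervalMs;
        this.onExpire = onExpire;
    }

    // Start watching a container when it is allocated to an AM.
    void register(String containerId, long nowMs) {
        allocatedAt.put(containerId, nowMs);
    }

    // Stop watching once the container has been launched on its NM.
    void unregister(String containerId) {
        allocatedAt.remove(containerId);
    }

    // Expire every container that was not launched within the interval.
    void checkExpired(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = allocatedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expireIntervalMs) {
                String containerId = entry.getKey();
                it.remove();
                onExpire.accept(containerId);
            }
        }
    }
}
```

A container that is unregistered in time never reaches the callback; one that is still registered after the interval is reported exactly once.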

Origin blog.csdn.net/shuofxz/article/details/128649128