[In-Depth Yarn Architecture and Implementation] 4-2 How RM Manages the Application Master

The previous article described the overall architecture and functions of the ResourceManager (RM). This article takes a closer look at the part of RM that manages the Application Master (AM).
It first walks through the overall communication flow between RM and AM, and then explains in detail the RM services involved.
To get the most out of this article, it helps to be familiar with the topics below. If any of them are unclear, refer to the corresponding earlier articles:

  • RPC (2-2 Yarn base libraries: the underlying RPC communication library)
  • Event handlers (2-3 Yarn base libraries: the service library and the event library)
  • AM program execution flow (3-3 Writing a Yarn Application Master)

1. AM Execution Process

After the client submits an application to RM, the flow from launching the AM to the application finishing is as follows:
[Figure: flow from AM launch to application completion]

For the details of each step, see the descriptions of the individual services below.

2. Main components of AM management

AM management in RM consists mainly of three services, which together manage the life cycle of an application's AM.
(Each service below has a class of the same name in the source code, where you can read its concrete implementation.)

1) ApplicationMasterLauncher

  • Acts as both a service and an event handler; it handles the AM's LAUNCH and CLEANUP events.
  • The source code shows that, on receiving an AM event, the EventHandler's handle method creates a Runnable object and puts it into the blocking queue masterEvents. The thread launcherHandlingThread continuously takes events from that queue and submits them to the thread pool launcherPool for processing. (The flowchart is shown below.)

[Figure: ApplicationMasterLauncher event-handling flowchart]
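The queue-plus-thread-pool flow described above can be sketched roughly as follows. This is a simplified model, not the Hadoop class: only the names masterEvents, launcherHandlingThread, and launcherPool come from the text, while LauncherSketch, EventType, the processed list, the pool size of 2, and demo() are hypothetical names and values chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the launcher pattern (not the real Hadoop class).
class LauncherSketch {
    enum EventType { LAUNCH, CLEANUP }

    // Events wrapped as Runnables wait here until the handling thread picks them up.
    private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
    // Worker pool that actually runs the launch / cleanup work (size 2 is an assumption).
    private final ExecutorService launcherPool = Executors.newFixedThreadPool(2);
    private final Thread launcherHandlingThread;
    // Records what was processed, only so the demo can observe the result.
    final List<String> processed = new CopyOnWriteArrayList<>();

    LauncherSketch() {
        launcherHandlingThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Runnable task = masterEvents.take(); // blocks until an event arrives
                    launcherPool.execute(task);          // hand it to the pool
                }
            } catch (InterruptedException e) {
                // Interrupted while waiting: exit the loop and let the thread end.
            }
        }, "launcher-handling-thread");
        launcherHandlingThread.setDaemon(true);
        launcherHandlingThread.start();
    }

    // Counterpart of handle(): wrap the event in a Runnable and enqueue it.
    void handle(EventType type, String appAttemptId, CountDownLatch done) {
        masterEvents.add(() -> {
            processed.add(type + ":" + appAttemptId);
            done.countDown();
        });
    }

    void stop() {
        launcherHandlingThread.interrupt();
        launcherPool.shutdown();
    }

    // Sends one LAUNCH and one CLEANUP event and waits for both to be processed.
    static List<String> demo() {
        LauncherSketch launcher = new LauncherSketch();
        CountDownLatch done = new CountDownLatch(2);
        launcher.handle(EventType.LAUNCH, "attempt_1", done);
        launcher.handle(EventType.CLEANUP, "attempt_1", done);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        launcher.stop();
        return launcher.processed;
    }
}
```

Because the handler only enqueues a Runnable, the caller of handle returns immediately, and the slow launch or cleanup work runs on the pool. In this sketch the two events may finish in either order, since the pool has more than one worker.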

2) AMLivelinessMonitor

  • Checks whether the AM is alive (that is, whether heartbeats are still arriving).
  • Inherits from the abstract class AbstractLivelinessMonitor, which already implements the liveness check: if no heartbeat is reported within the expiry interval, the AM is considered expired. AMLivelinessMonitor only has to define what happens when an AM is considered expired.
  • On expiry, it sends an RMAppAttemptEvent of type EXPIRE.

A brief look at the abstract class AbstractLivelinessMonitor:

public abstract class AbstractLivelinessMonitor<O> extends AbstractService {

  // (fields and other methods omitted)

  // The most important part: the check routine.
  // It periodically walks the recorded entries to see whether any have timed out.
  // The check interval defaults to 1/3 of the expiry interval.
  private class PingChecker implements Runnable {

    @Override
    public void run() {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        synchronized (AbstractLivelinessMonitor.this) {
          Iterator<Map.Entry<O, Long>> iterator = 
            running.entrySet().iterator();

          //avoid calculating current time everytime in loop
          long currentTime = clock.getTime();

          while (iterator.hasNext()) {
            Map.Entry<O, Long> entry = iterator.next();
            // Expired if the last ping is older than expireInterval
            if (currentTime > entry.getValue() + expireInterval) {
              iterator.remove();
              expire(entry.getKey());
              LOG.info("Expired:" + entry.getKey().toString() + 
                      " Timed out after " + expireInterval/1000 + " secs");
            }
          }
        }
        try {
          Thread.sleep(monitorInterval);
        } catch (InterruptedException e) {
          LOG.info(getName() + " thread interrupted");
          break;
        }
      }
    }
  }
}
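To see how registration, heartbeats, and the check above fit together, here is a minimal single-threaded sketch. It is a model under stated assumptions rather than the Hadoop implementation: the class LivelinessSketch, the explicit `now` parameter (used instead of a clock so the result is deterministic), the checkOnce method, and the 600 ms expiry interval are all hypothetical. The expire callback simply records the expired key, where AMLivelinessMonitor would dispatch the EXPIRE event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded model of the register / ping / expire cycle.
class LivelinessSketch<O> {
    // Maps each monitored object to the time of its last heartbeat.
    private final Map<O, Long> running = new HashMap<>();
    private final long expireInterval;
    final List<O> expired = new ArrayList<>();

    LivelinessSketch(long expireIntervalMs) {
        this.expireInterval = expireIntervalMs;
    }

    // Start monitoring an object; `now` stands in for the clock (assumption).
    synchronized void register(O o, long now) {
        running.put(o, now);
    }

    // A heartbeat refreshes the timestamp of an object that is still monitored.
    synchronized void receivedPing(O o, long now) {
        if (running.containsKey(o)) {
            running.put(o, now);
        }
    }

    // One pass of the check loop: same comparison as in PingChecker.run().
    synchronized void checkOnce(long now) {
        Iterator<Map.Entry<O, Long>> it = running.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<O, Long> entry = it.next();
            if (now > entry.getValue() + expireInterval) {
                it.remove();
                expire(entry.getKey());
            }
        }
    }

    // Subclass hook; here it only records the key instead of dispatching an event.
    protected void expire(O o) {
        expired.add(o);
    }

    static List<String> demo() {
        LivelinessSketch<String> monitor = new LivelinessSketch<>(600);
        monitor.register("attempt_1", 0);
        monitor.register("attempt_2", 0);
        monitor.receivedPing("attempt_1", 500); // only attempt_1 sends a heartbeat
        monitor.checkOnce(700);                 // 700 > 0 + 600, so attempt_2 expires
        return monitor.expired;                 // contains only "attempt_2"
    }
}
```

In the demo, attempt_1 survives the check because its last heartbeat at 500 plus the 600 ms interval is still later than 700, while attempt_2 never pinged after registration and is removed.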

3) ApplicationMasterService

  • It is the implementation class of ApplicationMasterProtocol.
  • Receives and processes requests from the AM, mainly registration, heartbeat, and cleanup.
  • The heartbeat is implemented by the AM calling the ApplicationMasterProtocol#allocate method periodically. Its main functions are:
    • Request resources
    • Get newly allocated resources
    • Tell RM periodically that it is alive (heartbeat)
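The three roles of the periodic allocate call can be illustrated with a toy model. Everything here is hypothetical and is not the ApplicationMasterProtocol API: FakeRm, its allocate(attemptId, newAsks) signature, the container naming, and the rule of granting at most one container per heartbeat are assumptions made only to keep the sketch small and self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AllocateHeartbeatSketch {
    // Hypothetical stand-in for the RM side of the allocate call.
    static class FakeRm {
        final Map<String, Integer> pendingAsks = new HashMap<>();
        final Map<String, Long> lastHeartbeat = new HashMap<>();
        long now = 0; // simulated clock in milliseconds

        List<String> allocate(String attemptId, int newAsks) {
            // Role 3: every call doubles as a heartbeat and refreshes liveness.
            lastHeartbeat.put(attemptId, now);
            // Role 1: record any newly requested containers.
            int pending = pendingAsks.getOrDefault(attemptId, 0) + newAsks;
            // Role 2: return containers allocated since the last call
            // (assumption: at most one per heartbeat).
            List<String> allocated = new ArrayList<>();
            if (pending > 0) {
                allocated.add("container_" + now);
                pending--;
            }
            pendingAsks.put(attemptId, pending);
            return allocated;
        }
    }

    // The AM side: asks for two containers once, then keeps heartbeating.
    static List<String> demo() {
        FakeRm rm = new FakeRm();
        List<String> received = new ArrayList<>();
        int toAsk = 2;
        for (int beat = 1; beat <= 3; beat++) {
            rm.now = beat * 1000L;
            received.addAll(rm.allocate("attempt_1", toAsk));
            toAsk = 0; // later calls carry no new asks, they only heartbeat
        }
        return received; // [container_1000, container_2000]
    }
}
```

The point of the sketch is that the third call carries no request and returns nothing, yet it still updates lastHeartbeat, which is what keeps the AM from being expired by the liveliness monitor.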

3. Summary

This article covered how RM manages the AM. It first introduced the interaction flow between the relevant RM components and the AM, and then described the execution logic and RPC calls of each service. Only ApplicationMasterLauncher was explained in detail with a flowchart; interested readers can work through the other services in the same way.
When studying this material, it is recommended to read through the source code alongside it to better understand the flow.

Origin blog.csdn.net/shuofxz/article/details/128424638