Flink Analysis and Usage, Part 6: JobMaster

一、JobMaster and JobManager

The previous article focused on how the job graph is imported and distributed. Because of version iteration, JobMaster and the old JobManager implement essentially the same logic, so only JobMaster is covered here; the legacy JobManager is not discussed again. As mentioned before, the job graph is handed over and distributed by creating a JobManagerRunner, which delivers it to the JobMaster; from there it is delivered to the Tasks via the ExecutionGraph.
In the code seen earlier:

public abstract class Dispatcher extends FencedRpcEndpoint<DispatcherId> implements
	DispatcherGateway, LeaderContender, SubmittedJobGraphStore.SubmittedJobGraphListener {
......

	private final Map<JobID, CompletableFuture<JobManagerRunner>> jobManagerRunnerFutures;

	private final LeaderElectionService leaderElectionService;

	private final ArchivedExecutionGraphStore archivedExecutionGraphStore;

  // the factory object used to create JobManagerRunner instances
	private final JobManagerRunnerFactory jobManagerRunnerFactory;
  ......
}
// Here the JobManagerRunner is obtained; note that ArchivedExecutionGraph and ExecutionGraph both implement the common interface AccessExecutionGraph
private JobManagerRunner startJobManagerRunner(JobManagerRunner jobManagerRunner) throws Exception {
  final JobID jobId = jobManagerRunner.getJobGraph().getJobID();
  jobManagerRunner.getResultFuture().whenCompleteAsync(
    (ArchivedExecutionGraph archivedExecutionGraph, Throwable throwable) -> {
      // check if we are still the active JobManagerRunner by checking the identity
      //noinspection ObjectEquality
      if (jobManagerRunner == jobManagerRunnerFutures.get(jobId).getNow(null)) {
        if (archivedExecutionGraph != null) {
          jobReachedGloballyTerminalState(archivedExecutionGraph);
        } else {
          final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
......
        }
      } else {
        log.debug("There is a newer JobManagerRunner for the job {}.", jobId);
      }
    }, getMainThreadExecutor());

  jobManagerRunner.start();

  return jobManagerRunner;
}

The two snippets above show where these two classes are used; now let's look at how they are implemented:


public class JobManagerRunner implements LeaderContender, OnCompletionActions, AutoCloseableAsync {

	private static final Logger log = LoggerFactory.getLogger(JobManagerRunner.class);

	// ------------------------------------------------------------------------

	/** Lock to ensure that this runner can deal with leader election event and job completion notifies simultaneously. */
	private final Object lock = new Object();

	/** The job graph needs to run. */
	private final JobGraph jobGraph;

	/** Used to check whether a job needs to be run. */
	private final RunningJobsRegistry runningJobsRegistry;

	/** Leader election for this job. */
	private final LeaderElectionService leaderElectionService;

	private final LibraryCacheManager libraryCacheManager;

	private final Executor executor;

	private final JobMasterService jobMasterService;

	private final FatalErrorHandler fatalErrorHandler;

	private final CompletableFuture<ArchivedExecutionGraph> resultFuture;

	private final CompletableFuture<Void> terminationFuture;

	private CompletableFuture<Void> leadershipOperation;

	/** flag marking the runner as shut down. */
	private volatile boolean shutdown;

	private volatile CompletableFuture<JobMasterGateway> leaderGatewayFuture;

	// ------------------------------------------------------------------------

	/**
	 * Exceptions that occur while creating the JobManager or JobManagerRunner are directly
	 * thrown and not reported to the given {@code FatalErrorHandler}.
	 *
	 * @throws Exception Thrown if the runner cannot be set up, because either one of the
	 *                   required services could not be started, or the Job could not be initialized.
	 */
	public JobManagerRunner(
			final JobGraph jobGraph,
			final JobMasterServiceFactory jobMasterFactory,
			final HighAvailabilityServices haServices,
			final LibraryCacheManager libraryCacheManager,
			final Executor executor,
			final FatalErrorHandler fatalErrorHandler) throws Exception {

		this.resultFuture = new CompletableFuture<>();
		this.terminationFuture = new CompletableFuture<>();
		this.leadershipOperation = CompletableFuture.completedFuture(null);

		// make sure we cleanly shut down out JobManager services if initialization fails
		try {
			this.jobGraph = checkNotNull(jobGraph);
			this.libraryCacheManager = checkNotNull(libraryCacheManager);
			this.executor = checkNotNull(executor);
			this.fatalErrorHandler = checkNotNull(fatalErrorHandler);

			checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");

			// libraries and class loader first
			try {
				libraryCacheManager.registerJob(
						jobGraph.getJobID(), jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths());
			} catch (IOException e) {
				throw new Exception("Cannot set up the user code libraries: " + e.getMessage(), e);
			}

			final ClassLoader userCodeLoader = libraryCacheManager.getClassLoader(jobGraph.getJobID());
			if (userCodeLoader == null) {
				throw new Exception("The user code class loader could not be initialized.");
			}

			// high availability services next
			this.runningJobsRegistry = haServices.getRunningJobsRegistry();
			this.leaderElectionService = haServices.getJobManagerLeaderElectionService(jobGraph.getJobID());

			this.leaderGatewayFuture = new CompletableFuture<>();

			// now start the JobManager
			this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
		}
		catch (Throwable t) {
			terminationFuture.completeExceptionally(t);
			resultFuture.completeExceptionally(t);

			throw new JobExecutionException(jobGraph.getJobID(), "Could not set up JobManager", t);
		}
	}
......

	public void start() throws Exception {
		try {
			leaderElectionService.start(this);
		} catch (Exception e) {
			log.error("Could not start the JobManager because the leader election service did not start.", e);
			throw new Exception("Could not start the leader election service.", e);
		}
	}
......

	private void setNewLeaderGatewayFuture() {
		final CompletableFuture<JobMasterGateway> oldLeaderGatewayFuture = leaderGatewayFuture;

		leaderGatewayFuture = new CompletableFuture<>();

		if (!oldLeaderGatewayFuture.isDone()) {
			leaderGatewayFuture.whenComplete(
				(JobMasterGateway jobMasterGateway, Throwable throwable) -> {
					if (throwable != null) {
						oldLeaderGatewayFuture.completeExceptionally(throwable);
					} else {
						oldLeaderGatewayFuture.complete(jobMasterGateway);
					}
				});
		}
	}
......
}
// The JobManagerRunner is created by this factory in the org.apache.flink.runtime.dispatcher package
public enum DefaultJobManagerRunnerFactory implements JobManagerRunnerFactory {
	INSTANCE;

	@Override
	public JobManagerRunner createJobManagerRunner(
			JobGraph jobGraph,
			Configuration configuration,
			RpcService rpcService,
			HighAvailabilityServices highAvailabilityServices,
			HeartbeatServices heartbeatServices,
			JobManagerSharedServices jobManagerServices,
			JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
			FatalErrorHandler fatalErrorHandler) throws Exception {
......

		return new JobManagerRunner(
			jobGraph,
			jobMasterFactory,
			highAvailabilityServices,
			jobManagerServices.getLibraryCacheManager(),
			jobManagerServices.getScheduledExecutorService(),
			fatalErrorHandler);
	}
}

// Below is the class for the execution graph that gets distributed
public class ExecutionGraph implements AccessExecutionGraph {

......

	/** Job specific information like the job id, job name, job configuration, etc. */
	private final JobInformation jobInformation;

	/** Serialized job information or a blob key pointing to the offloaded job information. */
	private final Either<SerializedValue<JobInformation>, PermanentBlobKey> jobInformationOrBlobKey;

	/** The executor which is used to execute futures. */
	private final ScheduledExecutorService futureExecutor;

	/** The executor which is used to execute blocking io operations. */
	private final Executor ioExecutor;

	/** Executor that runs tasks in the job manager's main thread. */
	@Nonnull
	private ComponentMainThreadExecutor jobMasterMainThreadExecutor;

	/** {@code true} if all source tasks are stoppable. */
	private boolean isStoppable = true;

	/** All job vertices that are part of this graph. */
	private final ConcurrentHashMap<JobVertexID, ExecutionJobVertex> tasks;

	/** All vertices, in the order in which they were created. **/
	private final List<ExecutionJobVertex> verticesInCreationOrder;

	/** All intermediate results that are part of this graph. */
	private final ConcurrentHashMap<IntermediateDataSetID, IntermediateResult> intermediateResults;

	/** The currently executed tasks, for callbacks. */
	private final ConcurrentHashMap<ExecutionAttemptID, Execution> currentExecutions;

	/** Listeners that receive messages when the entire job switches it status
	 * (such as from RUNNING to FINISHED). */
	private final List<JobStatusListener> jobStatusListeners;

	/** Listeners that receive messages whenever a single task execution changes its status. */
	private final List<ExecutionStatusListener> executionListeners;

	/** The implementation that decides how to recover the failures of tasks. */
	private final FailoverStrategy failoverStrategy;
......

	/** Current status of the job execution. */
	private volatile JobStatus state = JobStatus.CREATED;

	/** A future that completes once the job has reached a terminal state. */
	private volatile CompletableFuture<JobStatus> terminationFuture;

	/** On each global recovery, this version is incremented. The version breaks conflicts
	 * between concurrent restart attempts by local failover strategies. */
	private volatile long globalModVersion;

	/** The exception that caused the job to fail. This is set to the first root exception
	 * that was not recoverable and triggered job failure. */
	private volatile Throwable failureCause;

	/** The extended failure cause information for the job. This exists in addition to 'failureCause',
	 * to let 'failureCause' be a strong reference to the exception, while this info holds no
	 * strong reference to any user-defined classes.*/
	private volatile ErrorInfo failureInfo;

	/**
	 * Future for an ongoing or completed scheduling action.
	 */
	@Nullable
	private volatile CompletableFuture<Void> schedulingFuture;

	// ------ Fields that are relevant to the execution and need to be cleared before archiving  -------

	/** The coordinator for checkpoints, if snapshot checkpoints are enabled. */
	private CheckpointCoordinator checkpointCoordinator;

	/** Checkpoint stats tracker separate from the coordinator in order to be
	 * available after archiving. */
	private CheckpointStatsTracker checkpointStatsTracker;

	// ------ Fields that are only relevant for archived execution graphs ------------
	private String jsonPlan;

	@VisibleForTesting
	ExecutionGraph(
			ScheduledExecutorService futureExecutor,
			Executor ioExecutor,
			JobID jobId,
			String jobName,
			Configuration jobConfig,
			SerializedValue<ExecutionConfig> serializedConfig,
			Time timeout,
			RestartStrategy restartStrategy,
			SlotProvider slotProvider) throws IOException {

		this(
			new JobInformation(
				jobId,
				jobName,
				serializedConfig,
				jobConfig,
				Collections.emptyList(),
				Collections.emptyList()),
			futureExecutor,
			ioExecutor,
			timeout,
			restartStrategy,
			slotProvider);
	}

......
}

The basic roles of these two classes are now clear: when the Dispatcher creates a JobManagerRunner, the runner internally starts a service via createJobMasterService and hands the job down to the JobMaster; the ExecutionGraph is then distributed from the JobMaster down to the Tasks. The JobMaster is therefore a relay. Let's look at its definition.

二、The structure of JobMaster

This is the class that manages a job, and it is rather large:

public class JobMaster extends FencedRpcEndpoint<JobMasterId> implements JobMasterGateway, JobMasterService {

	/** Default names for Flink's distributed components. */
	public static final String JOB_MANAGER_NAME = "jobmanager";
	public static final String ARCHIVE_NAME = "archive";

	// ------------------------------------------------------------------------

	private final JobMasterConfiguration jobMasterConfiguration;

	private final ResourceID resourceId;

	private final JobGraph jobGraph;

	private final Time rpcTimeout;

	private final HighAvailabilityServices highAvailabilityServices;

	private final BlobWriter blobWriter;

	private final JobManagerJobMetricGroupFactory jobMetricGroupFactory;

	private final HeartbeatManager<AccumulatorReport, Void> taskManagerHeartbeatManager;

	private final HeartbeatManager<Void, Void> resourceManagerHeartbeatManager;

	private final ScheduledExecutorService scheduledExecutorService;

	private final OnCompletionActions jobCompletionActions;

	private final FatalErrorHandler fatalErrorHandler;

	private final ClassLoader userCodeLoader;

	private final SlotPool slotPool;

	private final Scheduler scheduler;

	private final RestartStrategy restartStrategy;

	// --------- BackPressure --------

	private final BackPressureStatsTracker backPressureStatsTracker;

	// --------- ResourceManager --------

	private final LeaderRetrievalService resourceManagerLeaderRetriever;

	// --------- TaskManagers --------

	private final Map<ResourceID, Tuple2<TaskManagerLocation, TaskExecutorGateway>> registeredTaskManagers;

	// -------- Mutable fields ---------

	private ExecutionGraph executionGraph;

	@Nullable
	private JobManagerJobStatusListener jobStatusListener;

	@Nullable
	private JobManagerJobMetricGroup jobManagerJobMetricGroup;

	@Nullable
	private String lastInternalSavepoint;

	@Nullable
	private ResourceManagerAddress resourceManagerAddress;

	@Nullable
	private ResourceManagerConnection resourceManagerConnection;

	@Nullable
	private EstablishedResourceManagerConnection establishedResourceManagerConnection;

	private Map<String, Object> accumulators;

	// ------------------------------------------------------------------------

	public JobMaster(
			RpcService rpcService,
			JobMasterConfiguration jobMasterConfiguration,
			ResourceID resourceId,
			JobGraph jobGraph,
			HighAvailabilityServices highAvailabilityService,
			SlotPoolFactory slotPoolFactory,
			SchedulerFactory schedulerFactory,
			JobManagerSharedServices jobManagerSharedServices,
			HeartbeatServices heartbeatServices,
			JobManagerJobMetricGroupFactory jobMetricGroupFactory,
			OnCompletionActions jobCompletionActions,
			FatalErrorHandler fatalErrorHandler,
			ClassLoader userCodeLoader) throws Exception {

		super(rpcService, AkkaRpcServiceUtils.createRandomName(JOB_MANAGER_NAME));

		final JobMasterGateway selfGateway = getSelfGateway(JobMasterGateway.class);

		this.jobMasterConfiguration = checkNotNull(jobMasterConfiguration);
		this.resourceId = checkNotNull(resourceId);
		this.jobGraph = checkNotNull(jobGraph);
		this.rpcTimeout = jobMasterConfiguration.getRpcTimeout();
		this.highAvailabilityServices = checkNotNull(highAvailabilityService);
		this.blobWriter = jobManagerSharedServices.getBlobWriter();
		this.scheduledExecutorService = jobManagerSharedServices.getScheduledExecutorService();
		this.jobCompletionActions = checkNotNull(jobCompletionActions);
		this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
		this.userCodeLoader = checkNotNull(userCodeLoader);
		this.jobMetricGroupFactory = checkNotNull(jobMetricGroupFactory);

		this.taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
			resourceId,
			new TaskManagerHeartbeatListener(selfGateway),
			rpcService.getScheduledExecutor(),
			log);

		this.resourceManagerHeartbeatManager = heartbeatServices.createHeartbeatManager(
				resourceId,
				new ResourceManagerHeartbeatListener(),
				rpcService.getScheduledExecutor(),
				log);

		final String jobName = jobGraph.getName();
		final JobID jid = jobGraph.getJobID();

		log.info("Initializing job {} ({}).", jobName, jid);

		final RestartStrategies.RestartStrategyConfiguration restartStrategyConfiguration =
				jobGraph.getSerializedExecutionConfig()
						.deserializeValue(userCodeLoader)
						.getRestartStrategy();

		this.restartStrategy = RestartStrategyResolving.resolve(restartStrategyConfiguration,
			jobManagerSharedServices.getRestartStrategyFactory(),
			jobGraph.isCheckpointingEnabled());

		log.info("Using restart strategy {} for {} ({}).", this.restartStrategy, jobName, jid);

		resourceManagerLeaderRetriever = highAvailabilityServices.getResourceManagerLeaderRetriever();

		this.slotPool = checkNotNull(slotPoolFactory).createSlotPool(jobGraph.getJobID());

		this.scheduler = checkNotNull(schedulerFactory).createScheduler(slotPool);

		this.registeredTaskManagers = new HashMap<>(4);

		this.backPressureStatsTracker = checkNotNull(jobManagerSharedServices.getBackPressureStatsTracker());
		this.lastInternalSavepoint = null;

		this.jobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
		this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup);
		this.jobStatusListener = null;

		this.resourceManagerConnection = null;
		this.establishedResourceManagerConnection = null;

		this.accumulators = new HashMap<>();
	}



	@Override
	public CompletableFuture<Acknowledge> rescaleOperators(
			Collection<JobVertexID> operators,
			int newParallelism,
			RescalingBehaviour rescalingBehaviour,
			Time timeout) {

		if (newParallelism <= 0) {
			return FutureUtils.completedExceptionally(
				new JobModificationException("The target parallelism of a rescaling operation must be larger than 0."));
		}

		// 1. Check whether we can rescale the job & rescale the respective vertices
		try {
			rescaleJobGraph(operators, newParallelism, rescalingBehaviour);
		} catch (FlinkException e) {
			final String msg = String.format("Cannot rescale job %s.", jobGraph.getName());

			log.info(msg, e);
			return FutureUtils.completedExceptionally(new JobModificationException(msg, e));
		}

		final ExecutionGraph currentExecutionGraph = executionGraph;

		final JobManagerJobMetricGroup newJobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
		final ExecutionGraph newExecutionGraph;

		try {
			newExecutionGraph = createExecutionGraph(newJobManagerJobMetricGroup);
		} catch (JobExecutionException | JobException e) {
			return FutureUtils.completedExceptionally(
				new JobModificationException("Could not create rescaled ExecutionGraph.", e));
		}

		// 3. disable checkpoint coordinator to suppress subsequent checkpoints
		final CheckpointCoordinator checkpointCoordinator = currentExecutionGraph.getCheckpointCoordinator();
		checkpointCoordinator.stopCheckpointScheduler();

		// 4. take a savepoint
		final CompletableFuture<String> savepointFuture = getJobModificationSavepoint(timeout);

		final CompletableFuture<ExecutionGraph> executionGraphFuture = restoreExecutionGraphFromRescalingSavepoint(
			newExecutionGraph,
			savepointFuture)
			.handleAsync(
				(ExecutionGraph executionGraph, Throwable failure) -> {
					if (failure != null) {
						// in case that we couldn't take a savepoint or restore from it, let's restart the checkpoint
						// coordinator and abort the rescaling operation
						if (checkpointCoordinator.isPeriodicCheckpointingConfigured()) {
							checkpointCoordinator.startCheckpointScheduler();
						}

						throw new CompletionException(ExceptionUtils.stripCompletionException(failure));
					} else {
						return executionGraph;
					}
				},
				getMainThreadExecutor());

		// 5. suspend the current job
		final CompletableFuture<JobStatus> terminationFuture = executionGraphFuture.thenComposeAsync(
			(ExecutionGraph ignored) -> {
				suspendExecutionGraph(new FlinkException("Job is being rescaled."));
				return currentExecutionGraph.getTerminationFuture();
			},
			getMainThreadExecutor());

		final CompletableFuture<Void> suspendedFuture = terminationFuture.thenAccept(
			(JobStatus jobStatus) -> {
				if (jobStatus != JobStatus.SUSPENDED) {
					final String msg = String.format("Job %s rescaling failed because we could not suspend the execution graph.", jobGraph.getName());
					log.info(msg);
					throw new CompletionException(new JobModificationException(msg));
				}
			});

		// 6. resume the new execution graph from the taken savepoint
		final CompletableFuture<Acknowledge> rescalingFuture = suspendedFuture.thenCombineAsync(
			executionGraphFuture,
			(Void ignored, ExecutionGraph restoredExecutionGraph) -> {
				// check if the ExecutionGraph is still the same
				if (executionGraph == currentExecutionGraph) {
					clearExecutionGraphFields();
					assignExecutionGraph(restoredExecutionGraph, newJobManagerJobMetricGroup);
					scheduleExecutionGraph();

					return Acknowledge.get();
				} else {
					throw new CompletionException(new JobModificationException("Detected concurrent modification of ExecutionGraph. Aborting the rescaling."));
				}

			},
			getMainThreadExecutor());

		rescalingFuture.whenCompleteAsync(
			(Acknowledge ignored, Throwable throwable) -> {
				if (throwable != null) {
					// fail the newly created execution graph
					newExecutionGraph.failGlobal(
						new SuppressRestartsException(
							new FlinkException(
								String.format("Failed to rescale the job %s.", jobGraph.getJobID()),
								throwable)));
				}
			}, getMainThreadExecutor());

		return rescalingFuture;
	}



	@Override
	public CompletableFuture<Collection<SlotOffer>> offerSlots(
			final ResourceID taskManagerId,
			final Collection<SlotOffer> slots,
			final Time timeout) {

		Tuple2<TaskManagerLocation, TaskExecutorGateway> taskManager = registeredTaskManagers.get(taskManagerId);

		if (taskManager == null) {
			return FutureUtils.completedExceptionally(new Exception("Unknown TaskManager " + taskManagerId));
		}

		final TaskManagerLocation taskManagerLocation = taskManager.f0;
		final TaskExecutorGateway taskExecutorGateway = taskManager.f1;

		final RpcTaskManagerGateway rpcTaskManagerGateway = new RpcTaskManagerGateway(taskExecutorGateway, getFencingToken());

		return CompletableFuture.completedFuture(
			slotPool.offerSlots(
				taskManagerLocation,
				rpcTaskManagerGateway,
				slots));
	}

	@Override
	public void failSlot(
			final ResourceID taskManagerId,
			final AllocationID allocationId,
			final Exception cause) {

		if (registeredTaskManagers.containsKey(taskManagerId)) {
			internalFailAllocation(allocationId, cause);
		} else {
			log.warn("Cannot fail slot " + allocationId + " because the TaskManager " +
			taskManagerId + " is unknown.");
		}
	}


	@Override
	public CompletableFuture<RegistrationResponse> registerTaskManager(
			final String taskManagerRpcAddress,
			final TaskManagerLocation taskManagerLocation,
			final Time timeout) {

		final ResourceID taskManagerId = taskManagerLocation.getResourceID();

		if (registeredTaskManagers.containsKey(taskManagerId)) {
			final RegistrationResponse response = new JMTMRegistrationSuccess(resourceId);
			return CompletableFuture.completedFuture(response);
		} else {
			return getRpcService()
				.connect(taskManagerRpcAddress, TaskExecutorGateway.class)
				.handleAsync(
					(TaskExecutorGateway taskExecutorGateway, Throwable throwable) -> {
						if (throwable != null) {
							return new RegistrationResponse.Decline(throwable.getMessage());
						}

						slotPool.registerTaskManager(taskManagerId);
						registeredTaskManagers.put(taskManagerId, Tuple2.of(taskManagerLocation, taskExecutorGateway));

						// monitor the task manager as heartbeat target
						taskManagerHeartbeatManager.monitorTarget(taskManagerId, new HeartbeatTarget<Void>() {
							@Override
							public void receiveHeartbeat(ResourceID resourceID, Void payload) {
								// the task manager will not request heartbeat, so this method will never be called currently
							}

							@Override
							public void requestHeartbeat(ResourceID resourceID, Void payload) {
								taskExecutorGateway.heartbeatFromJobManager(resourceID);
							}
						});

						return new JMTMRegistrationSuccess(resourceId);
					},
					getMainThreadExecutor());
		}
	}



	private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {

		validateRunsInMainThread();

		checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");

		if (Objects.equals(getFencingToken(), newJobMasterId)) {
			log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);

			return Acknowledge.get();
		}

		setNewFencingToken(newJobMasterId);

		startJobMasterServices();

		log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);

		resetAndScheduleExecutionGraph();

		return Acknowledge.get();
	}

	private void startJobMasterServices() throws Exception {
		// start the slot pool make sure the slot pool now accepts messages for this leader
		slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());
		scheduler.start(getMainThreadExecutor());

		//TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
		// try to reconnect to previously known leader
		reconnectToResourceManager(new FlinkException("Starting JobMaster component."));

		// job is ready to go, try to establish connection with resource manager
		//   - activate leader retrieval for the resource manager
		//   - on notification of the leader, the connection will be established and
		//     the slot pool will start requesting slots
		resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
	}

	private void setNewFencingToken(JobMasterId newJobMasterId) {
		if (getFencingToken() != null) {
			log.info("Restarting old job with JobMasterId {}. The new JobMasterId is {}.", getFencingToken(), newJobMasterId);

			// first we have to suspend the current execution
			suspendExecution(new FlinkException("Old job with JobMasterId " + getFencingToken() +
				" is restarted with a new JobMasterId " + newJobMasterId + '.'));
		}

		// set new leader id
		setFencingToken(newJobMasterId);
	}


	private void assignExecutionGraph(
			ExecutionGraph newExecutionGraph,
			JobManagerJobMetricGroup newJobManagerJobMetricGroup) {
		validateRunsInMainThread();
		checkState(executionGraph.getState().isTerminalState());
		checkState(jobManagerJobMetricGroup == null);

		executionGraph = newExecutionGraph;
		jobManagerJobMetricGroup = newJobManagerJobMetricGroup;
	}


	private void scheduleExecutionGraph() {
		checkState(jobStatusListener == null);
		// register self as job status change listener
		jobStatusListener = new JobManagerJobStatusListener();
		executionGraph.registerJobStatusListener(jobStatusListener);

		try {
			executionGraph.scheduleForExecution();
		}
		catch (Throwable t) {
			executionGraph.failGlobal(t);
		}
	}

	private ExecutionGraph createAndRestoreExecutionGraph(JobManagerJobMetricGroup currentJobManagerJobMetricGroup) throws Exception {

		ExecutionGraph newExecutionGraph = createExecutionGraph(currentJobManagerJobMetricGroup);

		final CheckpointCoordinator checkpointCoordinator = newExecutionGraph.getCheckpointCoordinator();

		if (checkpointCoordinator != null) {
			// check whether we find a valid checkpoint
			if (!checkpointCoordinator.restoreLatestCheckpointedState(
				newExecutionGraph.getAllVertices(),
				false,
				false)) {

				// check whether we can restore from a savepoint
				tryRestoreExecutionGraphFromSavepoint(newExecutionGraph, jobGraph.getSavepointRestoreSettings());
			}
		}

		return newExecutionGraph;
	}

	private ExecutionGraph createExecutionGraph(JobManagerJobMetricGroup currentJobManagerJobMetricGroup) throws JobExecutionException, JobException {
		return ExecutionGraphBuilder.buildGraph(
			null,
			jobGraph,
			jobMasterConfiguration.getConfiguration(),
			scheduledExecutorService,
			scheduledExecutorService,
			scheduler,
			userCodeLoader,
			highAvailabilityServices.getCheckpointRecoveryFactory(),
			rpcTimeout,
			restartStrategy,
			currentJobManagerJobMetricGroup,
			blobWriter,
			jobMasterConfiguration.getSlotRequestTimeout(),
			log);
	}

......

	private CompletableFuture<ExecutionGraph> restoreExecutionGraphFromRescalingSavepoint(ExecutionGraph newExecutionGraph, CompletableFuture<String> savepointFuture) {
		return savepointFuture
			.thenApplyAsync(
				(@Nullable String savepointPath) -> {
					if (savepointPath != null) {
						try {
							tryRestoreExecutionGraphFromSavepoint(newExecutionGraph, SavepointRestoreSettings.forPath(savepointPath, false));
						} catch (Exception e) {
							final String message = String.format("Could not restore from temporary rescaling savepoint. This might indicate " +
									"that the savepoint %s got corrupted. Deleting this savepoint as a precaution.",
								savepointPath);

							log.info(message);

							CompletableFuture
								.runAsync(
									() -> {
										if (savepointPath.equals(lastInternalSavepoint)) {
											lastInternalSavepoint = null;
										}
									},
									getMainThreadExecutor())
								.thenRunAsync(
									() -> disposeSavepoint(savepointPath),
									scheduledExecutorService);

							throw new CompletionException(new JobModificationException(message, e));
						}
					} else {
						// No rescaling savepoint, restart from the initial savepoint or none
						try {
							tryRestoreExecutionGraphFromSavepoint(newExecutionGraph, jobGraph.getSavepointRestoreSettings());
						} catch (Exception e) {
							final String message = String.format("Could not restore from initial savepoint. This might indicate " +
								"that the savepoint %s got corrupted.", jobGraph.getSavepointRestoreSettings().getRestorePath());

							log.info(message);

							throw new CompletionException(new JobModificationException(message, e));
						}
					}

					return newExecutionGraph;
				}, scheduledExecutorService);
	}

	private CompletableFuture<String> getJobModificationSavepoint(Time timeout) {
		return triggerSavepoint(
......
				getMainThreadExecutor());
	}

	private void rescaleJobGraph(Collection<JobVertexID> operators, int newParallelism, RescalingBehaviour rescalingBehaviour) throws FlinkException {
		for (JobVertexID jobVertexId : operators) {
			final JobVertex jobVertex = jobGraph.findVertexByID(jobVertexId);

			// update max parallelism in case that it has not been configured
			final ExecutionJobVertex executionJobVertex = executionGraph.getJobVertex(jobVertexId);

			if (executionJobVertex != null) {
				jobVertex.setMaxParallelism(executionJobVertex.getMaxParallelism());
			}

			rescalingBehaviour.accept(jobVertex, newParallelism);
		}
	}

	@Override
	public JobMasterGateway getGateway() {
		return getSelfGateway(JobMasterGateway.class);
	}

	private class ResourceManagerLeaderListener implements LeaderRetrievalListener {

		@Override
		public void notifyLeaderAddress(final String leaderAddress, final UUID leaderSessionID) {
			runAsync(
				() -> notifyOfNewResourceManagerLeader(
					leaderAddress,
					ResourceManagerId.fromUuidOrNull(leaderSessionID)));
		}

		@Override
		public void handleError(final Exception exception) {
			handleJobMasterError(new Exception("Fatal error in the ResourceManager leader service", exception));
		}
	}

	private class ResourceManagerConnection
			extends RegisteredRpcConnection<ResourceManagerId, ResourceManagerGateway, JobMasterRegistrationSuccess> {
		private final JobID jobID;
......
	}

	//----------------------------------------------------------------------------------------------

	private class JobManagerJobStatusListener implements JobStatusListener {

		private volatile boolean running = true;
......
	}

	private class TaskManagerHeartbeatListener implements HeartbeatListener<AccumulatorReport, Void> {

		private final JobMasterGateway jobMasterGateway;
......
	}

	private class ResourceManagerHeartbeatListener implements HeartbeatListener<Void, Void> {
......
		}

}

This class is quite long, but looking at its fields and member functions, it mainly provides the following capabilities:
1. Scheduling, executing and managing the job graph
2. Resource management (leader election, gateways, heartbeats, etc.)
3. Task management
4. Scheduling and slot allocation
5. Back-pressure tracking
The focus here is on scheduling and managing the job graph. In the JobMaster constructor, a large number of related services are wired in, the JobGraph's ID is obtained, and the leader information is retrieved. Other basic state and bookkeeping data structures are also created from the configuration; the most typical examples are createSlotPool and createScheduler — see the constructor code above for the details. Note that the constructor also creates the ExecutionGraph mentioned earlier; if you look closely, this field runs through almost the entire JobMaster class, and its creation and use are analyzed in more detail below.
The JobMaster essentially does two things: it processes and distributes the JobGraph (via the ExecutionGraph), and it listens for and handles the results and status of the tasks it has dispatched. To keep this efficient, asynchronous communication is used throughout, so it is worth being clear about how the two basic Java classes CompletableFuture and CompletionStage are used; a small sketch follows.
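
To make the pattern concrete, here is a minimal, self-contained sketch (my own example, not Flink code) of the composition style used throughout JobMaster: futures are chained with thenComposeAsync/whenCompleteAsync, and a single-threaded executor stands in for getMainThreadExecutor().

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FutureCompositionSketch {

	public static void main(String[] args) throws Exception {
		// stands in for getMainThreadExecutor(): all callbacks run on a single thread
		ExecutorService mainThread = Executors.newSingleThreadExecutor();

		CompletableFuture<String> savepointFuture =
			CompletableFuture.supplyAsync(() -> "file:///tmp/savepoint-1");

		// chain a dependent asynchronous step, as restoreExecutionGraphFromRescalingSavepoint does
		CompletableFuture<String> restoredFuture = savepointFuture.thenComposeAsync(
			path -> CompletableFuture.supplyAsync(() -> "graph restored from " + path),
			mainThread);

		// observe success or failure, as the whenCompleteAsync in startJobManagerRunner does
		restoredFuture.whenCompleteAsync(
			(result, failure) -> {
				if (failure != null) {
					System.err.println("restore failed: " + failure);
				} else {
					System.out.println(result);
				}
			},
			mainThread)
			.get();

		mainThread.shutdown();
	}
}
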
Among the member functions, the most basic control entry points appear first: start, suspend, onStop, cancel and stop. What they do is relatively simple — start brings up the RPC service and asynchronously kicks off job execution, while cancel and stop operate on the ExecutionGraph. rescaleJob and rescaleOperators involve the JobVertex, which, as mentioned earlier, is composed of multiple operators. A JobVertex corresponds to an ExecutionJobVertex, which in turn corresponds to ExecutionVertex instances. Put another way: to exploit parallelism, each JobGraph maps to a parallelized ExecutionGraph — the JobMaster's most important data structure — and each ExecutionVertex is one parallel subtask of an ExecutionJobVertex.
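
To make that fan-out concrete, here is a hedged sketch that walks an already-built ExecutionGraph and prints one line per parallel subtask; the accessors used (getVerticesTopologically, getTaskVertices, getTaskNameWithSubtaskIndex) are assumed to exist as named in the version being read.

import org.apache.flink.runtime.executiongraph.ExecutionGraph;
import org.apache.flink.runtime.executiongraph.ExecutionJobVertex;
import org.apache.flink.runtime.executiongraph.ExecutionVertex;

public class GraphFanOutSketch {

	// hedged sketch: JobVertex -> ExecutionJobVertex -> ExecutionVertex
	static void printFanOut(ExecutionGraph executionGraph) {
		// one ExecutionJobVertex per JobVertex of the JobGraph
		for (ExecutionJobVertex ejv : executionGraph.getVerticesTopologically()) {
			System.out.printf("job vertex %s, parallelism %d%n",
				ejv.getJobVertexId(), ejv.getParallelism());

			// one ExecutionVertex per parallel subtask of that job vertex
			for (ExecutionVertex ev : ejv.getTaskVertices()) {
				System.out.println("  subtask: " + ev.getTaskNameWithSubtaskIndex());
			}
		}
	}
}
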
In rescaleOperators, an important aspect is that the rescaling action triggers re-handling of checkpoints and savepoints, to keep the streaming computation safe and up to date. updateTaskExecutionState is driven by the Task side described later, and requestNextInputSplit obtains the next input split for a Task; inside that function, once the data is located, the corresponding Execution is looked up for the current attempt. Note that the job side and the task side are linked through the ExecutionAttemptID. Further down come the two checkpoint-related functions declineCheckpoint and acknowledgeCheckpoint.
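
The linking role of ExecutionAttemptID can be shown with a hedged sketch, simplified from what requestNextInputSplit does (error handling and the actual split assignment are omitted; getRegisteredExecutions is assumed to expose the attempt-to-Execution map):

import org.apache.flink.runtime.executiongraph.Execution;
import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
import org.apache.flink.runtime.executiongraph.ExecutionGraph;

class AttemptLookupSketch {

	// hedged sketch: tie a TaskManager request back to an Execution in the graph
	static Execution lookupAttempt(ExecutionGraph executionGraph, ExecutionAttemptID attemptId) {
		// currentExecutions maps attempt IDs to the Execution of the running attempt
		Execution execution = executionGraph.getRegisteredExecutions().get(attemptId);
		if (execution == null) {
			throw new IllegalStateException("Unknown execution attempt " + attemptId);
		}
		return execution;
	}
}
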
The JobMaster also contains an implementation of the KvStateRegistryGateway interface, which comes in through the interfaces the JobMaster implements; as the name suggests, it is essentially a lookup table of objects, and after a lookup a notification action is performed. Skipping over some of the slot handling below, look at the registerTaskManager function, which registers a TaskManager with the slot pool via RPC. The requestJobDetails function returns the intermediate status information mentioned earlier, obtained through concrete real-time control (only the final completion is called the result).
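
As a hedged illustration of that lookup-table idea, the location of a registered queryable-state instance is resolved roughly like this (simplified; the real method also validates the JobID and returns a CompletableFuture):

import org.apache.flink.runtime.executiongraph.ExecutionGraph;
import org.apache.flink.runtime.query.KvStateLocation;
import org.apache.flink.runtime.query.KvStateLocationRegistry;

class KvStateLookupSketch {

	// hedged sketch: resolve a queryable-state location by registration name
	static KvStateLocation lookup(ExecutionGraph executionGraph, String registrationName) {
		// the registry lives on the ExecutionGraph and maps registration names to locations
		KvStateLocationRegistry registry = executionGraph.getKvStateLocationRegistry();
		KvStateLocation location = registry.getKvStateLocation(registrationName);
		if (location == null) {
			throw new IllegalStateException("Unknown KvState: " + registrationName);
		}
		return location;
	}
}
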
startCheckpointScheduler triggers checkpoint scheduling. startJobExecution starts job execution internally. startJobMasterServices starts the job-related services. setNewFencingToken binds the fencing token to the JobMaster ID. assignExecutionGraph installs an execution graph, effectively fixing the execution plan to use. resetAndScheduleExecutionGraph re-plans and re-schedules the execution graph; this is very useful because in practice part of a job may have to be re-executed again and again.
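
Based on the function names above, the reset-and-reschedule flow can be paraphrased as follows. This is a hedged sketch built only from the members already shown in the excerpt, not the literal Flink implementation, which is asynchronous and waits for the old graph to terminate:

	// hedged paraphrase of the reset-and-reschedule idea inside JobMaster
	private void resetAndScheduleExecutionGraphSketch() throws Exception {
		if (executionGraph.getState() != JobStatus.CREATED) {
			// tear down the current graph and its metric group first
			suspendExecutionGraph(new FlinkException("ExecutionGraph is being reset."));
			clearExecutionGraphFields();

			// build a fresh graph, restoring from checkpoint/savepoint if available
			JobManagerJobMetricGroup newGroup = jobMetricGroupFactory.create(jobGraph);
			ExecutionGraph newGraph = createAndRestoreExecutionGraph(newGroup);
			assignExecutionGraph(newGraph, newGroup);
		}

		// finally hand the (new or freshly created) graph to the scheduler
		scheduleExecutionGraph();
	}
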
scheduleExecutionGraph actually schedules the execution graph so the tasks can run. createAndRestoreExecutionGraph and createExecutionGraph are both about building and managing the ExecutionGraph. jobStatusChanged, as the name says, reacts to changes of the job status; the status itself is driven by the ExecutionGraph — see the definition of the JobStatus enum for details.
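
For orientation, the job lifecycle can be sketched as a simplified enum (not the literal Flink JobStatus, which also carries a terminal-state flag and a few additional states):

// simplified sketch of the job lifecycle states tracked by the ExecutionGraph
enum JobStatusSketch {
	CREATED,      // graph built, nothing scheduled yet
	RUNNING,      // tasks are being deployed / are running
	FAILING,      // a non-recoverable failure occurred, tasks are being cancelled
	FAILED,       // globally terminal: the job failed
	CANCELLING,   // cancel requested, tasks are being cancelled
	CANCELED,     // globally terminal: the job was cancelled
	FINISHED,     // globally terminal: all tasks finished successfully
	RESTARTING,   // the restart strategy is restarting the job
	SUSPENDED     // locally terminal: this master lost leadership, another may resume
}
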
restoreExecutionGraphFromRescalingSavepoint restores the graph from a savepoint. rescaleJobGraph is called by rescaleOperators and re-applies the new parallelism to the operators.
From the analysis above, requestNextInputSplit and scheduleExecutionGraph are the two functions that essentially hook the ExecutionGraph up to the Tasks, while rescaleOperators provides a very flexible invocation mechanism.

三、Summary

Flink's JobMaster does a good job of carrying the job graph from the job level down to the tasks. There are still some of Java's less pleasant aspects, though: the inheritance hierarchy is messy and hard to navigate, which probably gets easier with familiarity. At this point the hand-over of the job graph is basically complete; next comes task startup and the related processing.

Reposted from blog.csdn.net/fpcc/article/details/91126031