Apache Flink Task Class Source Code Analysis

1. Introduction

Apache Flink uses two kinds of runtime JVM processes to manage the compute resources of a distributed cluster.

  • The JobManager process is responsible for distributed task management, such as task scheduling, checkpointing, and failure recovery. In a high-availability (HA) deployment there are multiple JobManagers: one leader and several standbys. The JobManager is the master in Flink's master-slave architecture.
  • The TaskManager process executes the task threads (i.e., the subtasks) and buffers and exchanges the data streams. The TaskManager is the slave in Flink's master-slave architecture.

The Task class (Task.java) represents an operator subtask executed on a TaskManager. Operator subtasks execute independently of one another, in different threads and possibly on different physical machines or in different containers.

Each operator subtask is run by a dedicated thread.

2. Code Analysis

The org.apache.flink.runtime.taskmanager.Task class has 1,645 lines in Flink 1.8. It is a very long class, and it implements the Runnable, TaskActions, and CheckpointListener interfaces.

public interface TaskActions {

	/**
	 * Check the execution state of the execution producing a result partition.
	 *
	 * @param jobId ID of the job the partition belongs to.
	 * @param intermediateDataSetId ID of the parent intermediate data set.
	 * @param resultPartitionId ID of the result partition to check. This
	 * identifies the producing execution and partition.
	 */
	void triggerPartitionProducerStateCheck(
		JobID jobId,
		IntermediateDataSetID intermediateDataSetId,
		ResultPartitionID resultPartitionId);

	/**
	 * Fail the owning task with the given throwable.
	 *
	 * @param cause of the failure
	 */
	void failExternally(Throwable cause);
}

The TaskActions interface defines the actions that can be performed on a Task. It currently contains two methods (a minimal implementation sketch follows the list):

  • triggerPartitionProducerStateCheck: checks the execution state of the execution that produces a given result partition
  • failExternally: fails the owning Task with the given Throwable
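
For illustration, a no-op implementation of this interface might look like the following sketch. This is a hypothetical class written for this article; it is not part of the code being analyzed:

import org.apache.flink.api.common.JobID;
import org.apache.flink.runtime.io.network.partition.ResultPartitionID;
import org.apache.flink.runtime.jobgraph.IntermediateDataSetID;

// Hypothetical no-op TaskActions implementation, for illustration only.
public class NoOpTaskActions implements TaskActions {

    @Override
    public void triggerPartitionProducerStateCheck(
            JobID jobId,
            IntermediateDataSetID intermediateDataSetId,
            ResultPartitionID resultPartitionId) {
        // a real implementation (Task) asks whether the producing execution is still alive
    }

    @Override
    public void failExternally(Throwable cause) {
        // a real implementation (Task) transitions the task into the FAILED state with this cause
    }
}
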
public interface CheckpointListener {

	/**
	 * This method is called as a notification once a distributed checkpoint has been completed.
	 * 
	 * Note that any exception during this method will not cause the checkpoint to
	 * fail any more.
	 * 
	 * @param checkpointId The ID of the checkpoint that has been completed.
	 * @throws Exception
	 */
	void notifyCheckpointComplete(long checkpointId) throws Exception;
}

The CheckpointListener interface defines the notification callback that is invoked after a checkpoint has completed.
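
User functions and operators can also implement this interface to be notified when a checkpoint completes, for example to publish buffered side effects only after the state they depend on has become durable. Below is a hedged sketch; the BufferingCommitter class and its fields are made up for illustration, and a production-grade exactly-once sink needs considerably more care:

import java.util.ArrayList;
import java.util.List;

// Hypothetical example: buffer records and only "commit" them once the
// checkpoint that covers them has completed. Illustration only.
public class BufferingCommitter implements CheckpointListener {

    private final List<String> pendingRecords = new ArrayList<>();

    public void add(String record) {
        pendingRecords.add(record);
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // the checkpoint is durable, so it is now safe to publish the buffered side effects
        System.out.println("checkpoint " + checkpointId + " completed, committing "
                + pendingRecords.size() + " buffered records");
        pendingRecords.clear();
    }
}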

2.1 Constructor

    public Task(
        JobInformation jobInformation,
        TaskInformation taskInformation,
        ExecutionAttemptID executionAttemptID,
        AllocationID slotAllocationId,
        int subtaskIndex,
        int attemptNumber,
        Collection<ResultPartitionDeploymentDescriptor> resultPartitionDeploymentDescriptors,
        Collection<InputGateDeploymentDescriptor> inputGateDeploymentDescriptors,
        int targetSlotNumber,
        MemoryManager memManager,
        IOManager ioManager,
        NetworkEnvironment networkEnvironment,
        BroadcastVariableManager bcVarManager,
        TaskStateManager taskStateManager,
        TaskManagerActions taskManagerActions,
        InputSplitProvider inputSplitProvider,
        CheckpointResponder checkpointResponder,
        GlobalAggregateManager aggregateManager,
        BlobCacheService blobService,
        LibraryCacheManager libraryCache,
        FileCache fileCache,
        TaskManagerRuntimeInfo taskManagerConfig,
        @Nonnull TaskMetricGroup metricGroup,
        ResultPartitionConsumableNotifier resultPartitionConsumableNotifier,
        PartitionProducerStateChecker partitionProducerStateChecker,
        Executor executor) {

        Preconditions.checkNotNull(jobInformation);
        Preconditions.checkNotNull(taskInformation);

        Preconditions.checkArgument(0 <= subtaskIndex, "The subtask index must be positive.");
        Preconditions.checkArgument(0 <= attemptNumber, "The attempt number must be positive.");
        Preconditions.checkArgument(0 <= targetSlotNumber, "The target slot number must be positive.");

        this.taskInfo = new TaskInfo(
                taskInformation.getTaskName(),
                taskInformation.getMaxNumberOfSubtaks(),
                subtaskIndex,
                taskInformation.getNumberOfSubtasks(),
                attemptNumber,
                String.valueOf(slotAllocationId));

        this.jobId = jobInformation.getJobId();
        this.vertexId = taskInformation.getJobVertexId();
        this.executionId  = Preconditions.checkNotNull(executionAttemptID);
        this.allocationId = Preconditions.checkNotNull(slotAllocationId);
        this.taskNameWithSubtask = taskInfo.getTaskNameWithSubtasks();
        this.jobConfiguration = jobInformation.getJobConfiguration();
        this.taskConfiguration = taskInformation.getTaskConfiguration();
        this.requiredJarFiles = jobInformation.getRequiredJarFileBlobKeys();
        this.requiredClasspaths = jobInformation.getRequiredClasspathURLs();
        this.nameOfInvokableClass = taskInformation.getInvokableClassName();
        this.serializedExecutionConfig = jobInformation.getSerializedExecutionConfig();

        Configuration tmConfig = taskManagerConfig.getConfiguration();
        this.taskCancellationInterval = tmConfig.getLong(TaskManagerOptions.TASK_CANCELLATION_INTERVAL);
        this.taskCancellationTimeout = tmConfig.getLong(TaskManagerOptions.TASK_CANCELLATION_TIMEOUT);

        this.memoryManager = Preconditions.checkNotNull(memManager);
        this.ioManager = Preconditions.checkNotNull(ioManager);
        this.broadcastVariableManager = Preconditions.checkNotNull(bcVarManager);
        this.taskStateManager = Preconditions.checkNotNull(taskStateManager);
        this.accumulatorRegistry = new AccumulatorRegistry(jobId, executionId);

        this.inputSplitProvider = Preconditions.checkNotNull(inputSplitProvider);
        this.checkpointResponder = Preconditions.checkNotNull(checkpointResponder);
        this.aggregateManager = Preconditions.checkNotNull(aggregateManager);
        this.taskManagerActions = checkNotNull(taskManagerActions);

        this.blobService = Preconditions.checkNotNull(blobService);
        this.libraryCache = Preconditions.checkNotNull(libraryCache);
        this.fileCache = Preconditions.checkNotNull(fileCache);
        this.network = Preconditions.checkNotNull(networkEnvironment);
        this.taskManagerConfig = Preconditions.checkNotNull(taskManagerConfig);

        this.metrics = metricGroup;

        this.partitionProducerStateChecker = Preconditions.checkNotNull(partitionProducerStateChecker);
        this.executor = Preconditions.checkNotNull(executor);

        // create the reader and writer structures

        final String taskNameWithSubtaskAndId = taskNameWithSubtask + " (" + executionId + ')';

        // Produced intermediate result partitions
        this.producedPartitions = new ResultPartition[resultPartitionDeploymentDescriptors.size()];

        int counter = 0;

        for (ResultPartitionDeploymentDescriptor desc: resultPartitionDeploymentDescriptors) {
            ResultPartitionID partitionId = new ResultPartitionID(desc.getPartitionId(), executionId);

            this.producedPartitions[counter] = new ResultPartition(
                taskNameWithSubtaskAndId,
                this,
                jobId,
                partitionId,
                desc.getPartitionType(),
                desc.getNumberOfSubpartitions(),
                desc.getMaxParallelism(),
                networkEnvironment.getResultPartitionManager(),
                resultPartitionConsumableNotifier,
                ioManager,
                desc.sendScheduleOrUpdateConsumersMessage());

            ++counter;
        }

        // Consumed intermediate result partitions
        this.inputGates = new SingleInputGate[inputGateDeploymentDescriptors.size()];
        this.inputGatesById = new HashMap<>();

        counter = 0;

        for (InputGateDeploymentDescriptor inputGateDeploymentDescriptor: inputGateDeploymentDescriptors) {
            SingleInputGate gate = SingleInputGate.create(
                taskNameWithSubtaskAndId,
                jobId,
                executionId,
                inputGateDeploymentDescriptor,
                networkEnvironment,
                this,
                metricGroup.getIOMetricGroup());

            inputGates[counter] = gate;
            inputGatesById.put(gate.getConsumedResultId(), gate);

            ++counter;
        }

        invokableHasBeenCanceled = new AtomicBoolean(false);

        // finally, create the executing thread, but do not start it
        executingThread = new Thread(TASK_THREADS_GROUP, this, taskNameWithSubtask);
    }

The constructor is long, but its logic is fairly simple:

  • First, initialize the instance fields of the class from the constructor arguments.
  • Initialize the array of produced result partitions (and the input gates) from the deployment descriptors.
  • Create the executing thread, but do not start it. The thread is started from the TaskManager's submitTask method (the "create now, start later" pattern is sketched below).
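
The deferred-start pattern can be sketched as follows. This is an illustration only, not the actual Flink code; the class name and the println body are made up, while startTaskThread mirrors the name of the real entry point:

    // Minimal sketch of the "create the thread eagerly, start it later" pattern used by Task.
    // Illustrative only; not the actual Flink implementation.
    public class DeferredStartTask implements Runnable {

        private static final ThreadGroup TASK_THREADS_GROUP = new ThreadGroup("Flink Task Threads");

        private final String name;
        private final Thread executingThread;

        public DeferredStartTask(String name) {
            this.name = name;
            // the thread is created in the constructor ...
            this.executingThread = new Thread(TASK_THREADS_GROUP, this, name);
        }

        /** ... but only started later, when the TaskManager submits the task. */
        public void startTaskThread() {
            executingThread.start();
        }

        @Override
        public void run() {
            System.out.println("running subtask " + name);
        }
    }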

2.2 run

    public void run() {

        // ----------------------------
        //  Initial State transition
        // ----------------------------
        while (true) {
            ExecutionState current = this.executionState;
            if (current == ExecutionState.CREATED) {
                if (transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)) {
                    // success, we can start our work
                    break;
                }
            }
            else if (current == ExecutionState.FAILED) {
                // we were immediately failed. tell the TaskManager that we reached our final state
                notifyFinalState();
                if (metrics != null) {
                    metrics.close();
                }
                return;
            }
            else if (current == ExecutionState.CANCELING) {
                if (transitionState(ExecutionState.CANCELING, ExecutionState.CANCELED)) {
                    // we were immediately canceled. tell the TaskManager that we reached our final state
                    notifyFinalState();
                    if (metrics != null) {
                        metrics.close();
                    }
                    return;
                }
            }
            else {
                if (metrics != null) {
                    metrics.close();
                }
                throw new IllegalStateException("Invalid state for beginning of operation of task " + this + '.');
            }
        }

        // all resource acquisitions and registrations from here on
        // need to be undone in the end
        Map<String, Future<Path>> distributedCacheEntries = new HashMap<>();
        AbstractInvokable invokable = null;

        try {
            // ----------------------------
            //  Task Bootstrap - We periodically
            //  check for canceling as a shortcut
            // ----------------------------

            // activate safety net for task thread
            LOG.info("Creating FileSystem stream leak safety net for task {}", this);
            FileSystemSafetyNet.initializeSafetyNetForThread();

            blobService.getPermanentBlobService().registerJob(jobId);

            // first of all, get a user-code classloader
            // this may involve downloading the job's JAR files and/or classes
            LOG.info("Loading JAR files for task {}.", this);

            userCodeClassLoader = createUserCodeClassloader();
            final ExecutionConfig executionConfig = serializedExecutionConfig.deserializeValue(userCodeClassLoader);

            if (executionConfig.getTaskCancellationInterval() >= 0) {
                // override task cancellation interval from Flink config if set in ExecutionConfig
                taskCancellationInterval = executionConfig.getTaskCancellationInterval();
            }

            if (executionConfig.getTaskCancellationTimeout() >= 0) {
                // override task cancellation timeout from Flink config if set in ExecutionConfig
                taskCancellationTimeout = executionConfig.getTaskCancellationTimeout();
            }

            if (isCanceledOrFailed()) {
                throw new CancelTaskException();
            }

            // ----------------------------------------------------------------
            // register the task with the network stack
            // this operation may fail if the system does not have enough
            // memory to run the necessary data exchanges
            // the registration must also strictly be undone
            // ----------------------------------------------------------------

            LOG.info("Registering task at network: {}.", this);

            network.registerTask(this);

            // add metrics for buffers
            this.metrics.getIOMetricGroup().initializeBufferMetrics(this);

            // register detailed network metrics, if configured
            if (taskManagerConfig.getConfiguration().getBoolean(TaskManagerOptions.NETWORK_DETAILED_METRICS)) {
                // similar to MetricUtils.instantiateNetworkMetrics() but inside this IOMetricGroup
                MetricGroup networkGroup = this.metrics.getIOMetricGroup().addGroup("Network");
                MetricGroup outputGroup = networkGroup.addGroup("Output");
                MetricGroup inputGroup = networkGroup.addGroup("Input");

                // output metrics
                for (int i = 0; i < producedPartitions.length; i++) {
                    ResultPartitionMetrics.registerQueueLengthMetrics(
                        outputGroup.addGroup(i), producedPartitions[i]);
                }

                for (int i = 0; i < inputGates.length; i++) {
                    InputGateMetrics.registerQueueLengthMetrics(
                        inputGroup.addGroup(i), inputGates[i]);
                }
            }

            // next, kick off the background copying of files for the distributed cache
            try {
                for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
                        DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
                    LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
                    Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId, executionId);
                    distributedCacheEntries.put(entry.getKey(), cp);
                }
            }
            catch (Exception e) {
                throw new Exception(
                    String.format("Exception while adding files to distributed cache of task %s (%s).", taskNameWithSubtask, executionId), e);
            }

            if (isCanceledOrFailed()) {
                throw new CancelTaskException();
            }

            // ----------------------------------------------------------------
            //  call the user code initialization methods
            // ----------------------------------------------------------------

            TaskKvStateRegistry kvStateRegistry = network.createKvStateTaskRegistry(jobId, getJobVertexId());

            Environment env = new RuntimeEnvironment(
                jobId,
                vertexId,
                executionId,
                executionConfig,
                taskInfo,
                jobConfiguration,
                taskConfiguration,
                userCodeClassLoader,
                memoryManager,
                ioManager,
                broadcastVariableManager,
                taskStateManager,
                aggregateManager,
                accumulatorRegistry,
                kvStateRegistry,
                inputSplitProvider,
                distributedCacheEntries,
                producedPartitions,
                inputGates,
                network.getTaskEventDispatcher(),
                checkpointResponder,
                taskManagerConfig,
                metrics,
                this);

            // now load and instantiate the task's invokable code
            invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env);

            // ----------------------------------------------------------------
            //  actual task core work
            // ----------------------------------------------------------------

            // we must make strictly sure that the invokable is accessible to the cancel() call
            // by the time we switched to running.
            this.invokable = invokable;

            // switch to the RUNNING state, if that fails, we have been canceled/failed in the meantime
            if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) {
                throw new CancelTaskException();
            }

            // notify everyone that we switched to running
            taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING));

            // make sure the user code classloader is accessible thread-locally
            executingThread.setContextClassLoader(userCodeClassLoader);

            // run the invokable
            invokable.invoke();

            // make sure, we enter the catch block if the task leaves the invoke() method due
            // to the fact that it has been canceled
            if (isCanceledOrFailed()) {
                throw new CancelTaskException();
            }

            // ----------------------------------------------------------------
            //  finalization of a successful execution
            // ----------------------------------------------------------------

            // finish the produced partitions. if this fails, we consider the execution failed.
            for (ResultPartition partition : producedPartitions) {
                if (partition != null) {
                    partition.finish();
                }
            }

            // try to mark the task as finished
            // if that fails, the task was canceled/failed in the meantime
            if (!transitionState(ExecutionState.RUNNING, ExecutionState.FINISHED)) {
                throw new CancelTaskException();
            }
        }
        catch (Throwable t) {

            // unwrap wrapped exceptions to make stack traces more compact
            if (t instanceof WrappingRuntimeException) {
                t = ((WrappingRuntimeException) t).unwrap();
            }

            // ----------------------------------------------------------------
            // the execution failed. either the invokable code properly failed, or
            // an exception was thrown as a side effect of cancelling
            // ----------------------------------------------------------------

            try {
                // check if the exception is unrecoverable
                if (ExceptionUtils.isJvmFatalError(t) ||
                        (t instanceof OutOfMemoryError && taskManagerConfig.shouldExitJvmOnOutOfMemoryError())) {

                    // terminate the JVM immediately
                    // don't attempt a clean shutdown, because we cannot expect the clean shutdown to complete
                    try {
                        LOG.error("Encountered fatal error {} - terminating the JVM", t.getClass().getName(), t);
                    } finally {
                        Runtime.getRuntime().halt(-1);
                    }
                }

                // transition into our final state. we should be either in DEPLOYING, RUNNING, CANCELING, or FAILED
                // loop for multiple retries during concurrent state changes via calls to cancel() or
                // to failExternally()
                while (true) {
                    ExecutionState current = this.executionState;

                    if (current == ExecutionState.RUNNING || current == ExecutionState.DEPLOYING) {
                        if (t instanceof CancelTaskException) {
                            if (transitionState(current, ExecutionState.CANCELED)) {
                                cancelInvokable(invokable);
                                break;
                            }
                        }
                        else {
                            if (transitionState(current, ExecutionState.FAILED, t)) {
                                // proper failure of the task. record the exception as the root cause
                                failureCause = t;
                                cancelInvokable(invokable);

                                break;
                            }
                        }
                    }
                    else if (current == ExecutionState.CANCELING) {
                        if (transitionState(current, ExecutionState.CANCELED)) {
                            break;
                        }
                    }
                    else if (current == ExecutionState.FAILED) {
                        // in state failed already, no transition necessary any more
                        break;
                    }
                    // unexpected state, go to failed
                    else if (transitionState(current, ExecutionState.FAILED, t)) {
                        LOG.error("Unexpected state in task {} ({}) during an exception: {}.", taskNameWithSubtask, executionId, current);
                        break;
                    }
                    // else fall through the loop and
                }
            }
            catch (Throwable tt) {
                String message = String.format("FATAL - exception in exception handler of task %s (%s).", taskNameWithSubtask, executionId);
                LOG.error(message, tt);
                notifyFatalError(message, tt);
            }
        }
        finally {
            try {
                LOG.info("Freeing task resources for {} ({}).", taskNameWithSubtask, executionId);

                // clear the reference to the invokable. this helps guard against holding references
                // to the invokable and its structures in cases where this Task object is still referenced
                this.invokable = null;

                // stop the async dispatcher.
                // copy dispatcher reference to stack, against concurrent release
                ExecutorService dispatcher = this.asyncCallDispatcher;
                if (dispatcher != null && !dispatcher.isShutdown()) {
                    dispatcher.shutdownNow();
                }

                // free the network resources
                network.unregisterTask(this);

                // free memory resources
                if (invokable != null) {
                    memoryManager.releaseAll(invokable);
                }

                // remove all of the tasks library resources
                libraryCache.unregisterTask(jobId, executionId);
                fileCache.releaseJob(jobId, executionId);
                blobService.getPermanentBlobService().releaseJob(jobId);

                // close and de-activate safety net for task thread
                LOG.info("Ensuring all FileSystem streams are closed for task {}", this);
                FileSystemSafetyNet.closeSafetyNetAndGuardedResourcesForThread();

                notifyFinalState();
            }
            catch (Throwable t) {
                // an error in the resource cleanup is fatal
                String message = String.format("FATAL - exception in resource cleanup of task %s (%s).", taskNameWithSubtask, executionId);
                LOG.error(message, t);
                notifyFatalError(message, t);
            }

            // un-register the metrics at the end so that the task may already be
            // counted as finished when this happens
            // errors here will only be logged
            try {
                metrics.close();
            }
            catch (Throwable t) {
                LOG.error("Error during metrics de-registration of task {} ({}).", taskNameWithSubtask, executionId, t);
            }
        }
    }

The run method is the core method that starts and executes the task.

2.2.1 State Initialization

  • Spin until the state has been successfully switched from CREATED to DEPLOYING, then break out of the loop (see the CAS sketch after this list).
  • If the task's current state is not CREATED (it is already FAILED or CANCELING), report the final state and return from run; any other state is illegal and throws an IllegalStateException.
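
The transition itself is a compare-and-swap on the executionState field. A minimal sketch of this spin-and-CAS pattern, using a plain AtomicReference as a stand-in for the volatile field and atomic updater in the real class:

    import java.util.concurrent.atomic.AtomicReference;

    // Minimal sketch of the spin-and-CAS state transition at the start of Task#run().
    class StateMachineSketch {

        enum ExecutionState { CREATED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED }

        private final AtomicReference<ExecutionState> state =
                new AtomicReference<>(ExecutionState.CREATED);

        /** Atomically switch from 'expected' to 'next'; false means another thread changed the state first. */
        boolean transitionState(ExecutionState expected, ExecutionState next) {
            return state.compareAndSet(expected, next);
        }

        void beginDeploying() {
            while (true) {
                ExecutionState current = state.get();
                if (current == ExecutionState.CREATED) {
                    if (transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)) {
                        break; // success, the task can start its work
                    }
                    // lost the CAS against a concurrent cancel()/failExternally(); re-read and retry
                } else {
                    return; // already canceled or failed before the thread got to run
                }
            }
        }
    }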

2.2.2 Startup

  • Create a per-job (jobId-scoped) class loader and download the missing jar files. The class-loading isolation provided by separate class loaders allows the runtime environments of different tasks to be isolated within a single JVM process.
  • Register the current task instance with the network stack; the registration may fail if there is not enough available memory.
  • Start copying the distributed cache files in the background.
  • Load and instantiate the task's invokable code (a reflection sketch follows this list).
  • Switch the subtask state from DEPLOYING to RUNNING.
  • Notify the subscribers (the TaskManager) that the subtask state has changed.
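
Loading the invokable boils down to resolving the class name against the user-code class loader and invoking a constructor that accepts the runtime environment. A simplified, hedged sketch of that reflection pattern (this is not the exact signature of Task#loadAndInstantiateInvokable, which also handles several error cases):

    // Simplified sketch of turning a class name into an instantiated invokable via the
    // user-code class loader. Generic types are used for illustration; the real method
    // loads an AbstractInvokable with an Environment constructor argument.
    static <T> T loadAndInstantiate(
            ClassLoader userCodeClassLoader,
            String className,
            Class<T> expectedSuperType,
            Class<?> constructorParamType,
            Object constructorArg) throws Exception {

        // resolve the class against the per-job class loader, not the system class loader
        Class<? extends T> clazz = Class
                .forName(className, true, userCodeClassLoader)
                .asSubclass(expectedSuperType);

        // the invokable is expected to expose a constructor taking the runtime environment
        java.lang.reflect.Constructor<? extends T> constructor =
                clazz.getConstructor(constructorParamType);

        return constructor.newInstance(constructorArg);
    }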

2.2.3 Running

  • Set the executing thread's context class loader to the user-code class loader created during startup (the set-and-restore idiom is sketched below).
  • Call the invokable's invoke method, which runs the user code.
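
Setting the context class loader matters because libraries invoked from user code often load classes via Thread.currentThread().getContextClassLoader(). A minimal sketch of the general set-and-restore idiom (the try/finally restore is a general-purpose addition here; Task#run() itself does not need to restore it, since the dedicated thread ends with the task):

    // Illustrative idiom: run user code with the user-code class loader installed as the
    // thread context class loader, restoring the previous loader afterwards.
    static void runWithUserClassLoader(ClassLoader userCodeClassLoader, Runnable userCode) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(userCodeClassLoader);
        try {
            userCode.run();
        } finally {
            current.setContextClassLoader(previous);
        }
    }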

2.2.4 Finishing

  • Call finish on each produced result partition, one by one.
  • Switch the subtask state from RUNNING to FINISHED.

2.2.5 Exception Handling

  • If the throwable is a JVM-fatal error, or an OutOfMemoryError while the jvm-exit-on-oom option is set to true, call Runtime.getRuntime().halt(-1) to terminate the JVM immediately (a sketch of this pattern follows this list).
  • Otherwise, spin until the subtask state has been switched to CANCELED or FAILED, cancelling the invokable along the way.
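
halt() is used instead of System.exit() because after a fatal JVM error a clean shutdown (shutdown hooks, pending I/O) cannot be expected to complete. A hedged sketch of the log-then-halt pattern; shouldExitJvmOnOutOfMemoryError stands in for the jvm-exit-on-oom option, and the fatal-error check is simplified to a few well-known error types:

    // Illustrative sketch of the "log, then halt" handling of unrecoverable errors.
    // The set of errors treated as JVM-fatal is simplified for the example.
    static void handleIfFatal(Throwable t, boolean shouldExitJvmOnOutOfMemoryError) {
        boolean jvmFatal = t instanceof InternalError
                || t instanceof UnknownError
                || t instanceof ThreadDeath;

        if (jvmFatal || (t instanceof OutOfMemoryError && shouldExitJvmOnOutOfMemoryError)) {
            try {
                System.err.println("Encountered fatal error " + t.getClass().getName() + " - terminating the JVM");
            } finally {
                // halt() skips shutdown hooks; a clean shutdown cannot be expected to complete
                Runtime.getRuntime().halt(-1);
            }
        }
    }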

2.3 createUserCodeClassloader

    private ClassLoader createUserCodeClassloader() throws Exception {
        long startDownloadTime = System.currentTimeMillis();

        // triggers the download of all missing jar files from the job manager
        libraryCache.registerTask(jobId, executionId, requiredJarFiles, requiredClasspaths);

        LOG.debug("Getting user code class loader for task {} at library cache manager took {} milliseconds",
                executionId, System.currentTimeMillis() - startDownloadTime);

        ClassLoader userCodeClassLoader = libraryCache.getClassLoader(jobId);
        if (userCodeClassLoader == null) {
            throw new Exception("No user code classloader available.");
        }
        return userCodeClassLoader;
    }

The createUserCodeClassloader method first registers the task with the library cache manager, which triggers the download of any missing jar files, and then calls the BlobLibraryCacheManager's getClassLoader method to obtain the user-code class loader.

  /** Registered entries per job */
  private final Map<JobID, LibraryCacheEntry> cacheEntries = new HashMap<>();

The BlobLibraryCacheManager class internally maintains a HashMap that caches the class loader of each job.
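
A simplified, hedged sketch of this per-job caching idea is shown below: one URLClassLoader per job, created on first registration and reused afterwards. The real BlobLibraryCacheManager additionally reference-counts the registered tasks and resolves the jar files through the blob cache; a plain String job id and local jar URLs are used as stand-ins here:

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch of a per-job class loader cache. The real BlobLibraryCacheManager
    // keys the map by JobID, reference-counts the entries and builds the URLs from blobs.
    class PerJobClassLoaderCache {

        private final Map<String, URLClassLoader> cacheEntries = new HashMap<>();

        /** Returns the cached class loader for the job, creating it on first access. */
        synchronized ClassLoader registerJob(String jobId, URL[] requiredJarFiles) {
            return cacheEntries.computeIfAbsent(
                    jobId,
                    id -> new URLClassLoader(requiredJarFiles, getClass().getClassLoader()));
        }

        /** Removes the entry and closes the class loader once the job has been released. */
        synchronized void releaseJob(String jobId) throws Exception {
            URLClassLoader loader = cacheEntries.remove(jobId);
            if (loader != null) {
                loader.close();
            }
        }
    }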

3. Summary

  • The Flink Task class (Task.java) implements the Runnable interface; the executing thread runs the Task's run method.
  • Flink's task threading model does not use a thread pool; it only uses a built-in thread group, with one dedicated thread per subtask.
  • A class loader is created for each job and set as the context class loader of the executing thread.

Reposted from blog.csdn.net/a860MHz/article/details/91877325