Flink Principles (3): Task, Operator Chain, and Slot (Resources)

This article was written with reference to the official documentation, combined with my own understanding. Cited sources are indicated; if anything here infringes on your work, please leave a comment and I will delete it immediately. Also, if anything is expressed poorly, comments and corrections are welcome.


Foreword

  The previous post, Flink Principles (2): Resources, briefly described how resources are allocated in a Flink cluster. This post tries to explain what happens after the operators are defined: how tasks are assigned, and how those tasks use resources.

I. Task and Operator Chains

  When Flink generates the JobGraph, it optimizes the program: operators that can be chained are merged into an operator chain, which is placed into a single task (one thread) for execution. This reduces the overhead of thread-to-thread handover and buffering, increasing overall throughput and decreasing latency. The example from the official website illustrates this, as shown in Figure 1:

   In the figure, the source, map, and [keyBy | window | apply] operators each have parallelism 2, while the sink operator has parallelism 1. After Flink's optimization, the source and map operators form an operator chain and run as one task on one thread. The condensed view of the figure shows this chaining, and the parallelized view shows the resulting parallel subtasks. Two operators can be chained only if all of the following conditions are met:

  • The upstream and downstream operators have the same parallelism
  • The downstream node has an in-degree of 1 (it has exactly one input)
  • The upstream and downstream nodes are in the same slot sharing group
  • The downstream node's chaining strategy is ALWAYS (it can chain with both upstream and downstream nodes; map, flatMap, filter, etc. default to ALWAYS)
  • The upstream node's chaining strategy is ALWAYS or HEAD (HEAD can chain only with downstream nodes, not with upstream ones; a Source defaults to HEAD)
  • The data partitioning between the two nodes is forward
  • The user has not disabled chaining (for example, by calling disableChaining() in the code)
  [If anything here is unclear, see the Operator Chains article]
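
  The conditions above can be read as one boolean check. The following is only a minimal sketch in plain Java, not Flink's actual implementation: the Node class, the two enums, and the isChainable method are hypothetical stand-ins invented for illustration, and the example values mirror the operators in Figure 1.

```java
import java.util.Objects;

/** Simplified model of the chaining conditions; NOT Flink's real code. */
final class ChainCheckSketch {

    // Hypothetical enums mirroring the strategies and partitioning named in the text.
    enum ChainingStrategy { ALWAYS, HEAD, NEVER }
    enum Partitioning { FORWARD, HASH, REBALANCE }

    /** Hypothetical stand-in for an operator node in the job graph. */
    static final class Node {
        final String name;
        final int parallelism;
        final int inDegree;              // number of input edges
        final String slotSharingGroup;
        final ChainingStrategy strategy;

        Node(String name, int parallelism, int inDegree,
             String slotSharingGroup, ChainingStrategy strategy) {
            this.name = name;
            this.parallelism = parallelism;
            this.inDegree = inDegree;
            this.slotSharingGroup = slotSharingGroup;
            this.strategy = strategy;
        }
    }

    /** Returns true only if every condition from the list holds. */
    static boolean isChainable(Node up, Node down, Partitioning edge, boolean chainingEnabled) {
        return up.parallelism == down.parallelism                       // same parallelism
            && down.inDegree == 1                                       // single input
            && Objects.equals(up.slotSharingGroup, down.slotSharingGroup)
            && down.strategy == ChainingStrategy.ALWAYS
            && (up.strategy == ChainingStrategy.ALWAYS
                || up.strategy == ChainingStrategy.HEAD)
            && edge == Partitioning.FORWARD                             // forward partitioning
            && chainingEnabled;                                         // user did not disable chaining
    }

    public static void main(String[] args) {
        Node source = new Node("source", 2, 0, "default", ChainingStrategy.HEAD);
        Node map = new Node("map", 2, 1, "default", ChainingStrategy.ALWAYS);
        Node window = new Node("keyBy/window/apply", 2, 1, "default", ChainingStrategy.ALWAYS);
        Node sink = new Node("sink", 1, 1, "default", ChainingStrategy.ALWAYS);

        System.out.println(isChainable(source, map, Partitioning.FORWARD, true));    // true
        System.out.println(isChainable(map, window, Partitioning.HASH, true));       // false
        System.out.println(isChainable(window, sink, Partitioning.REBALANCE, true)); // false
    }
}
```

  In this model, source and map chain because every condition holds, map and window do not because keyBy repartitions the data by key, and window and sink do not because their parallelism differs.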

II. Task Slots and Resources

  Combining this with the previous post on resources, the tasks above would be distributed across the Flink cluster as shown in Figure 2:

   In the figure there are two nodes (TaskManagers, i.e. two processes). Each node has three slots, and each task (one thread) runs in its own slot.
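
  The arithmetic behind this layout can be sketched as follows. This is a toy calculation in plain Java, not scheduler code; the class and method names are made up for illustration, and the task list assumes the chaining result from Figure 1 (source and map chained into one task).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy slot arithmetic for the one-subtask-per-slot layout; not Flink scheduler code. */
final class NoSlotSharingSketch {

    /** Without slot sharing, every parallel subtask needs its own slot. */
    static int slotsNeeded(Map<String, Integer> parallelismByTask) {
        int sum = 0;
        for (int parallelism : parallelismByTask.values()) {
            sum += parallelism;
        }
        return sum;
    }

    /** Total slots offered by the cluster. */
    static int slotsAvailable(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    public static void main(String[] args) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("source -> map", 2);       // chained into one task
        tasks.put("keyBy/window/apply", 2);
        tasks.put("sink", 1);

        System.out.println("slots needed:    " + slotsNeeded(tasks));    // 5
        System.out.println("slots available: " + slotsAvailable(2, 3));  // 6
    }
}
```

  Under these assumptions the job occupies five of the six slots, one thread per slot.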

  In fact, by default Flink allows subtasks to share a slot as long as they come from the same job, even when they are subtasks of different tasks (a subtask is one parallel instance of a task such as source/map or window). This has two advantages:

  1) A Flink cluster needs exactly as many slots as the highest parallelism used in the job, so there is no need to calculate how many tasks a program contains in total;

  2) Better resource utilization. Without slot sharing, operators that are not very resource intensive, such as source/map (the official website calls them non-intensive), occupy the same amount of resources (one slot) as very resource-intensive operators such as window, as shown in Figure 2. With slot sharing, the cluster from Figure 2 can run the job at a maximum parallelism of 6, as shown in Figure 3:

  When slots can be shared, the resource consumption of the subtasks is distributed more evenly across the TaskManagers of the Flink cluster. What does that mean? In Figure 3, resource-heavy operators such as window are spread evenly over every slot, whereas in Figure 2 they occupy only two slots. Figure 3 also shows that a single slot can run multiple threads.
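
  To make the shared layout concrete, here is a small sketch in plain Java. It assumes the parallelism values of the Figure 3 example (6 for source/map and for window, 1 for sink) and uses a deliberately naive placement rule, putting subtask i of every task into slot i. That rule and all names are assumptions for illustration; Flink's real scheduler is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of slot sharing; the placement rule is an assumption, not Flink's scheduler. */
final class SlotSharingSketch {

    /** With slot sharing, the job needs as many slots as its highest parallelism. */
    static int slotsNeeded(Map<String, Integer> parallelismByTask) {
        int max = 0;
        for (int parallelism : parallelismByTask.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }

    /** Naive placement: subtask i of each task goes into slot i. */
    static Map<Integer, List<String>> assign(Map<String, Integer> parallelismByTask) {
        Map<Integer, List<String>> slots = new TreeMap<>();
        for (Map.Entry<String, Integer> task : parallelismByTask.entrySet()) {
            for (int i = 0; i < task.getValue(); i++) {
                slots.computeIfAbsent(i, k -> new ArrayList<>())
                     .add(task.getKey() + "[" + i + "]");
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("source -> map", 6);
        tasks.put("keyBy/window/apply", 6);
        tasks.put("sink", 1);

        System.out.println("slots needed: " + slotsNeeded(tasks)); // 6
        for (Map.Entry<Integer, List<String>> slot : assign(tasks).entrySet()) {
            // Each entry in the list corresponds to one thread running in that slot.
            System.out.println("slot " + slot.getKey() + ": " + slot.getValue());
        }
    }
}
```

  In this model every slot hosts one source/map subtask and one window subtask, so the heavy window work is spread over all six slots, and the slot that also hosts the sink runs three threads.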

 
  In summary: once the operators are defined, they are optimized into operator chains according to the conditions above, forming individual subtasks; finally, depending on whether slots can be shared, the subtasks are distributed to the slots of the TaskManagers for execution. The details will be covered in the next post.
 


Origin www.cnblogs.com/love-yh/p/11298144.html