Flink Task Performance Optimization

How to improve the performance of Flink tasks

1. Operator Chain

For more efficient distributed execution, Flink chains operator subtasks together into tasks wherever possible, and each task executes in one thread. Chaining operators into tasks is a very effective optimization: it reduces thread-to-thread handover, cuts message serialization/deserialization, and avoids exchanging data through buffers, which lowers latency while increasing overall throughput.

During JobGraph generation, Flink merges chainable operators into an operator chain (Operator Chains) that is executed by a single task (one thread), reducing the overhead of thread switching and buffering and thereby improving overall throughput and latency. The example below from the official documentation illustrates this.

In the figure above, the source, map, [keyBy | window | apply], and sink operators have parallelism 2, 2, 2, and 1 respectively. After Flink's optimization, the source and map operators form an operator chain and run as one task on one thread; this is the condensed view in the diagram, while the parallelized view shows the individual parallel subtasks. Two operators can be chained together only if all of the following conditions are met:

- the upstream and downstream operators have the same parallelism;

- the downstream node has exactly one input;

- the upstream and downstream nodes are in the same slot sharing group;

- the chaining strategy of the downstream node is ALWAYS;

- the chaining strategy of the upstream node is ALWAYS or HEAD;

- the data partitioning between the two nodes is Forward;

- the user has not disabled chaining.
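The checks above can be sketched as a single predicate. This is a plain-Java illustration with made-up names (`ChainCheck`, the parameter list), not Flink's actual API; in Flink's source the equivalent check runs during JobGraph generation.

```java
// Hypothetical sketch of the chaining decision; names are illustrative.
enum ChainingStrategy { ALWAYS, HEAD, NEVER }

class ChainCheck {
    // Returns true if the edge between an upstream and a downstream
    // operator satisfies all seven chaining conditions listed above.
    static boolean isChainable(int upstreamParallelism, int downstreamParallelism,
                               int downstreamInputCount,
                               String upstreamSlotGroup, String downstreamSlotGroup,
                               ChainingStrategy upstream, ChainingStrategy downstream,
                               boolean isForwardPartitioner, boolean chainingEnabled) {
        return chainingEnabled                                      // user did not disable chaining
                && isForwardPartitioner                             // Forward partitioning between nodes
                && downstreamInputCount == 1                        // downstream has exactly one input
                && upstreamParallelism == downstreamParallelism     // same parallelism
                && upstreamSlotGroup.equals(downstreamSlotGroup)    // same slot sharing group
                && downstream == ChainingStrategy.ALWAYS            // downstream strategy is ALWAYS
                && (upstream == ChainingStrategy.ALWAYS
                    || upstream == ChainingStrategy.HEAD);          // upstream is ALWAYS or HEAD
    }
}
```

For example, a source (parallelism 2, strategy HEAD) followed by a map (parallelism 2, strategy ALWAYS) with a Forward edge in the same slot group passes every check and is chained, which matches the source+map chain in the figure.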

2. Slot Sharing

Slot Sharing means that SubTasks from different Tasks of the same Job that have the same slotSharingGroup name (default: default) can share one Slot. This gives a single Slot the chance to hold an entire pipeline of the Job, which is why, under the default slot sharing conditions mentioned above, the number of Slots a Job needs at startup equals the Job's maximum operator parallelism. Slot Sharing can further improve the Job's runtime performance: with the Slot count unchanged, the maximum parallelism of operators can be increased, so that resource-hungry Tasks such as windows are distributed across different TaskManagers, while relatively lightweight operations such as map and filter do not monopolize Slot resources, reducing the chance of wasted resources.

The figure contains three kinds of Tasks: source-map [parallelism 6], keyBy/window/apply [parallelism 6], and sink [parallelism 1], occupying six Slots in total. The first Slot from the left runs three SubTasks [3 threads] and holds a complete pipeline of the Job; the remaining five Slots each run two SubTasks [2 threads], whose data is finally sent over the network to the Sink to complete the processing.
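The slot arithmetic above can be sketched in a few lines. This is an illustrative helper (class and method names are made up, not a Flink API): under default slot sharing, the slots a job needs equal its maximum operator parallelism, not the sum.

```java
import java.util.Map;

// Hypothetical sketch: with default slot sharing, every operator's
// subtasks may share slots, so required slots = max operator parallelism.
class SlotEstimate {
    static int requiredSlots(Map<String, Integer> operatorParallelism) {
        return operatorParallelism.values().stream()
                .max(Integer::compare)
                .orElse(0);
    }
}
```

With the parallelisms from the figure (source-map: 6, keyBy/window/apply: 6, sink: 1) this yields 6 slots rather than the 13 that strictly separate slots would require.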

3. Flink Async I/O

In stream computing we often need to interact with external systems, and with synchronous access over a single connection, most of the time per request is spent waiting for the communication round trip to complete. The following compares the two modes:

The brown bars in the figure represent the long waiting time; clearly, network latency severely limits both throughput and latency. To solve the problem of synchronous access, asynchronous mode handles multiple requests and replies concurrently. That is, requests for users a, b, c, and so on can be sent to the database continuously, and whichever reply returns first is processed first, so there is no blocking wait between consecutive requests, as shown on the right of the figure above. This is exactly the principle behind Async I/O.
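A minimal plain-Java sketch of the same idea using `CompletableFuture` (no Flink APIs; the external lookup is simulated with a short sleep): all requests are fired concurrently and their waits overlap instead of accumulating.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AsyncLookupDemo {
    // Simulated external lookup: each call "waits" on I/O for ~50 ms.
    static String lookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + "-value";
    }

    // Send all requests concurrently and collect the replies, instead of
    // blocking on each request before sending the next one.
    static List<String> asyncLookupAll(List<String> keys) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, keys.size()));
        List<CompletableFuture<String>> futures = keys.stream()
                .map(k -> CompletableFuture.supplyAsync(() -> lookup(k), pool))
                .collect(Collectors.toList());
        List<String> results = futures.stream()
                .map(CompletableFuture::join)   // waits overlap across requests
                .collect(Collectors.toList());
        pool.shutdown();
        return results;
    }
}
```

With three keys the total wall time is roughly one lookup's latency rather than three; Flink's AsyncFunction applies the same pattern per stream element.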

4. Checkpoint Optimization

Flink implements a powerful checkpoint mechanism that achieves high throughput while also guaranteeing fast recovery with Exactly Once semantics.

The first consideration for improving per-node checkpoint performance is the efficiency of the storage engine. Of the three checkpoint state storage options Flink officially supports, Memory is only suitable for debugging and cannot recover data after a failure. The other two are HDFS and RocksDB; when the checkpointed state is large, consider using RocksDB as the checkpoint storage to improve efficiency.

The next idea is resource allocation. Checkpointing runs on every task, so with the total state size fixed, the key to speeding up checkpoint execution is distributing the state so that each individual task has less checkpoint data of its own.

Finally, incremental snapshots. Without them, every checkpoint contains the job's entire state. In most scenarios, however, only a small part of the state changes between consecutive checkpoints, so enabling incremental checkpoints stores and computes only the difference between the previous checkpoint and the current one, reducing checkpoint time.
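A sketch of the corresponding flink-conf.yaml settings (the checkpoint path is a placeholder; verify the option names against the Flink version in use):

```yaml
# Use RocksDB for large state; it keeps state on local disk
# instead of holding everything on the JVM heap.
state.backend: rocksdb

# Durable storage for completed checkpoints (HDFS in this example).
state.checkpoints.dir: hdfs:///flink/checkpoints

# Incremental checkpoints: only upload what changed since the last checkpoint.
state.backend.incremental: true
```

The same options can also be set programmatically on the StreamExecutionEnvironment when per-job configuration is preferred.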

Summary

Operator Chain links multiple Operators together into one Task and applies only to Operators. Slot Sharing runs multiple Tasks in one Slot and applies to the Tasks produced after Operator Chaining. Both optimizations make full use of compute resources, cut unnecessary overhead, and improve the Job's runtime performance. Async I/O solves the problem of efficiently accessing other systems and improves task execution performance. Checkpoint optimization is an optimization of the cluster configuration, improving the processing capacity of the cluster itself.

 

References:

https://www.infoq.cn/article/ZmL7TCcEchvANY-9jG1H

https://blog.icocoro.me/2019/06/10/1906-apache-flink-asyncio/

 


Origin www.cnblogs.com/luxiaoxun/p/12114728.html