[Translation] Flink Table API & SQL - Configuration

This is a translation of the Configuration page from the official website: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/config.html

By default, the Table & SQL API is preconfigured to produce accurate results with acceptable performance.

Depending on the requirements of a table program, it might be necessary to adjust certain parameters for optimization. For example, unbounded streaming programs may need to ensure that the required state size is capped (see streaming concepts).

Overview

In every table environment, the TableConfig offers options for configuring the current session.

For common or important configuration options, the TableConfig provides getter and setter methods with detailed inline documentation.
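
As a minimal sketch of the setter style, the snippet below caps state growth for a streaming query by setting an idle state retention time through the TableConfig. It assumes Flink 1.9 with the Blink planner on the classpath; the 12 and 24 hour values are arbitrary placeholders, not recommendations.

```scala
import org.apache.flink.api.common.time.Time
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

// Assumption: Flink 1.9, Blink planner, streaming mode.
val settings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inStreamingMode()
  .build()
val tEnv = TableEnvironment.create(settings)

// Setter on TableConfig: state of a key is kept for at least 12 hours of
// inactivity and removed after at most 24 hours (illustrative values).
tEnv.getConfig.setIdleStateRetentionTime(Time.hours(12), Time.hours(24))
```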

For more advanced configuration, users can directly access the underlying key-value map. The following sections list all available options that can be used to adjust Flink Table & SQL API programs.

Note: Because options are read at different points in time when performing operations, it is recommended to set configuration options early, right after instantiating a table environment.

// instantiate table environment
val tEnv: TableEnvironment = ...

// access flink configuration
val configuration = tEnv.getConfig().getConfiguration()
// set low-level key-value options
configuration.setString("table.exec.mini-batch.enabled", "true")
configuration.setString("table.exec.mini-batch.allow-latency", "5 s")
configuration.setString("table.exec.mini-batch.size", "5000")

Note: Currently, key-value options are only supported by the Blink planner.

Execution Options
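
Since key-value options depend on the Blink planner, here is a hedged sketch of creating a Blink batch environment and setting options on it right away. It assumes Flink 1.9; the keys are taken from the option lists in this article, and the values are only illustrative.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

// Assumption: Flink 1.9; the Blink planner is selected explicitly.
val settings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inBatchMode()
  .build()
val tEnv = TableEnvironment.create(settings)

// Set options right after creating the environment, before defining queries.
val configuration = tEnv.getConfig.getConfiguration
configuration.setString("table.exec.sort.default-limit", "100") // illustrative value
configuration.setString("table.exec.shuffle-mode", "batch")     // the documented default
```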

The following options can be used to tune the performance of query execution.

 

Key Default Description
table.exec.async-lookup.buffer-capacity

Batch Streaming
100  The maximum number of async I/O operations that the async lookup join can trigger.
table.exec.async-lookup.timeout

Batch Streaming
"3 min" The timeout for an asynchronous operation to complete.
table.exec.disabled-operators

Batch
(none)

Mainly for testing. A comma-separated list of operator names; each name represents a kind of disabled operator.

Operators that can be disabled include "NestedLoopJoin", "ShuffleHashJoin", "BroadcastHashJoin", "SortMergeJoin", "HashAgg", and "SortAgg". By default, no operator is disabled.

table.exec.mini-batch.allow-latency

Streaming
"-1 ms"

The maximum latency that MiniBatch may use to buffer input records. MiniBatch is an optimization that buffers input records to reduce state access.

MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached.

Note: If table.exec.mini-batch.enabled is set to true, this value must be greater than zero.

table.exec.mini-batch.enabled

Streaming
false

Specifies whether to enable the MiniBatch optimization. MiniBatch is an optimization that buffers input records to reduce state access.

This feature is disabled by default. To enable it, set this option to true.

Note: If mini-batch is enabled, "table.exec.mini-batch.allow-latency" and "table.exec.mini-batch.size" must also be set.

table.exec.mini-batch.size

Streaming
-1

The maximum number of input records that can be buffered for MiniBatch. MiniBatch is an optimization that buffers input records to reduce state access.

MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached. Note: MiniBatch currently only works for non-windowed aggregations.

If table.exec.mini-batch.enabled is set to true, this value must be positive.

table.exec.resource.default-parallelism

Batch Streaming
-1

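Sets the default parallelism for all operators (such as aggregate, join, filter) to run with parallel instances.

This option has a higher priority than the parallelism of the StreamExecutionEnvironment (in effect, it overrides the parallelism of the StreamExecutionEnvironment).

A value of -1 means that no default parallelism is set, in which case the parallelism of the StreamExecutionEnvironment is used as a fallback.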

table.exec.resource.external-buffer-memory

Batch
"10 mb" Sets the external buffer memory size used in sort merge join, nested join, and over window.
table.exec.resource.hash-agg.memory

Batch
"128 mb" Sets the managed memory size of the hash aggregate operator.
table.exec.resource.hash-join.memory

Batch
"128 mb" Sets the managed memory of the hash join operator. It defines the lower limit.
table.exec.resource.sort.memory

Batch
"128 mb" Sets the managed buffer memory size of the sort operator.
table.exec.shuffle-mode

Batch
"batch"

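Sets the execution shuffle mode. Only batch or pipeline can be set. batch: the job runs stage by stage.

pipeline: the job runs in streaming mode, but this may cause a resource deadlock when the receiver waits for resources to start while the sender holds resources and waits to send data to the receiver.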

table.exec.sort.async-merge-enabled

Batch
true Whether to merge sorted spill files asynchronously.
table.exec.sort.default-limit

Batch
-1 The default limit applied when the user does not set a limit after ORDER BY. -1 means this option is ignored.
table.exec.sort.max-num-file-handles

Batch
128

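The maximum fan-in for external merge sort. It limits the number of file handles per operator.

If it is too small, it may cause intermediate merging.

If it is too large, too many files are opened at the same time, which consumes memory and leads to random reads.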

table.exec.source.idle-timeout

Streaming
"-1 ms"

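When a source does not receive any elements within the timeout, it is marked as temporarily idle.

This allows downstream tasks to advance their watermarks without waiting for watermarks from this source while it is idle.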

table.exec.spill-compression.block-size

Batch
"64 kb"

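The memory size used for compression when spilling data.

The larger the memory, the higher the compression ratio, but the job consumes more memory.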

table.exec.spill-compression.enabled

Batch
true

Whether to compress spilled data.

Currently, compressing spilled data is only supported for the sort, hash-agg, and hash-join operators.

table.exec.window-agg.buffer-size-limit

Batch
100000 Sets the buffer size limit for window elements used in the group window agg operator.
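
To make the MiniBatch entries above more concrete, here is a small self-contained sketch of why buffering records reduces state access. It is a toy model, not Flink's implementation: CountingState, MiniBatchSum, and MiniBatchDemo are hypothetical names, only the size-based trigger is modeled, and the latency-based trigger is left out.

```scala
import scala.collection.mutable

// Hypothetical stand-in for keyed state; it counts every write so the effect is visible.
final class CountingState {
  private val sums = mutable.Map.empty[String, Long]
  var accesses: Int = 0

  def add(key: String, delta: Long): Unit = {
    accesses += 1
    sums(key) = sums.getOrElse(key, 0L) + delta
  }

  def get(key: String): Long = sums.getOrElse(key, 0L)
}

// Size-triggered mini-batch: pre-aggregates per key in a buffer and
// writes to state only once `size` records have been buffered.
final class MiniBatchSum(state: CountingState, size: Int) {
  private val buffer = mutable.Map.empty[String, Long]
  private var buffered = 0

  def process(key: String, value: Long): Unit = {
    buffer(key) = buffer.getOrElse(key, 0L) + value
    buffered += 1
    if (buffered >= size) flush()
  }

  def flush(): Unit = {
    buffer.foreach { case (k, v) => state.add(k, v) }
    buffer.clear()
    buffered = 0
  }
}

object MiniBatchDemo {
  val records: Seq[(String, Long)] =
    Seq("a" -> 1L, "a" -> 2L, "b" -> 3L, "b" -> 4L, "a" -> 5L, "a" -> 6L)

  // Returns (state accesses without batching, state accesses with batches of 3).
  def run(): (Int, Int) = {
    val direct = new CountingState
    records.foreach { case (k, v) => direct.add(k, v) }

    val batched = new CountingState
    val miniBatch = new MiniBatchSum(batched, size = 3)
    records.foreach { case (k, v) => miniBatch.process(k, v) }
    miniBatch.flush() // flush any remainder

    (direct.accesses, batched.accesses)
  }
}
```

In this toy run, records that share a key within one batch are folded into a single state write, which is the saving the option descriptions refer to. The trade-off is added latency, which is why a real deployment also bounds the waiting time with table.exec.mini-batch.allow-latency.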

 

Optimizer Options

The following options can be used to adjust the behavior of the query optimizer to get a better execution plan.

 

Key Default Description
table.optimizer.agg-phase-strategy

Batch Streaming
"AUTO"

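The strategy for the aggregate phase. Only AUTO, TWO_PHASE, or ONE_PHASE can be set.

AUTO: no special enforcer for the aggregate stage. Whether two-stage or one-stage aggregation is chosen depends on cost.

TWO_PHASE: enforces two-stage aggregation with localAggregate and globalAggregate. Note that if an aggregate call cannot be optimized into two phases, one-stage aggregation is still used.

ONE_PHASE: enforces one-stage aggregation with only CompleteGlobalAggregate.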

table.optimizer.distinct-agg.split.bucket-num

Streaming
1024

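Configures the number of buckets when splitting distinct aggregations.

The number is used in the first-level aggregation to compute a bucket key "hash_code(distinct_key) % BUCKET_NUM", which serves as an additional group key after splitting.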

table.optimizer.distinct-agg.split.enabled

Streaming
false

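Tells the optimizer whether to split distinct aggregations (e.g. COUNT(DISTINCT col), SUM(DISTINCT col)) into two levels.

The first aggregation is shuffled by an additional key, which is computed from the hash code of distinct_key and the number of buckets.

This optimization is very useful when there is data skew in a distinct aggregation, and it makes it possible to scale up the job. Default is false.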

table.optimizer.join-reorder-enabled

Batch Streaming
false Enables join reordering in the optimizer. Disabled by default.
table.optimizer.join.broadcast-threshold

Batch
1048576

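Configures the maximum size in bytes of a table that will be broadcast to all worker nodes when performing a join.

Set this value to -1 to disable broadcasting.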

table.optimizer.reuse-source-enabled

Batch Streaming
true

When true, the optimizer tries to find duplicated table sources and reuse them.

This only takes effect when table.optimizer.reuse-sub-plan-enabled is true.

table.optimizer.reuse-sub-plan-enabled

Batch Streaming
true When true, the optimizer tries to find duplicated sub-plans and reuse them.
table.optimizer.source.predicate-pushdown-enabled

Batch Streaming
true When true, the optimizer pushes predicates down into the FilterableTableSource. Default is true.
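
The two-level split described for table.optimizer.distinct-agg.split.enabled can be sketched in plain Scala. This is only an illustration of the idea for COUNT(DISTINCT col), not the optimizer's actual rewrite: SplitDistinctSketch and its methods are hypothetical names, and Math.floorMod is used here as an assumption to keep the bucket key non-negative.

```scala
object SplitDistinctSketch {
  // Matches the documented default of table.optimizer.distinct-agg.split.bucket-num.
  val BucketNum: Int = 1024

  // Bucket key in the spirit of hash_code(distinct_key) % BUCKET_NUM.
  def bucketOf(distinctKey: String): Int =
    Math.floorMod(distinctKey.hashCode, BucketNum)

  // rows are (groupKey, distinctKey) pairs; the result is COUNT(DISTINCT distinctKey) per groupKey.
  def countDistinctTwoLevel(rows: Seq[(String, String)]): Map[String, Long] = {
    // Level 1: group by (groupKey, bucket) and count distinct keys inside each bucket.
    // The extra bucket key spreads a skewed groupKey across many groups.
    val partial: Map[(String, Int), Long] =
      rows
        .groupBy { case (g, d) => (g, bucketOf(d)) }
        .map { case (k, rs) => k -> rs.map(_._2).distinct.size.toLong }

    // Level 2: sum the partial counts per groupKey. This is correct because a given
    // distinct key always lands in the same bucket, so buckets never overlap.
    partial
      .groupBy { case ((g, _), _) => g }
      .map { case (g, parts) => g -> parts.values.sum }
  }
}
```

For example, calling SplitDistinctSketch.countDistinctTwoLevel on the rows ("g","a"), ("g","a"), ("g","b"), ("h","a") yields 2 for "g" and 1 for "h", the same result a single-level COUNT(DISTINCT) would give.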

 


Origin www.cnblogs.com/Springmoon-venn/p/11982345.html