This is a translation of the Configuration page from the official website: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/config.html
By default, the Table & SQL API is preconfigured to produce accurate results with acceptable performance.
Depending on the requirements of a table program, it may be necessary to adjust certain parameters for optimization. For example, an unbounded streaming program may need to ensure that the required state size is capped (see the streaming concepts).
Overview
In every table environment, the TableConfig provides options for configuring the current session.
For common or important configuration options, the TableConfig provides getter and setter methods with detailed inline documentation.
For more advanced configuration, users can directly access the underlying key-value map. The following sections list all available options that can be used to adjust Flink Table & SQL API programs.
Note: Because options are read at different points in time when operations are performed, it is recommended to set configuration options as early as possible after instantiating a table environment.
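As a small illustration of the typed setters, the sketch below caps state growth for an unbounded query by setting an idle state retention time on the TableConfig. It assumes Flink 1.9 with the Blink planner and the Scala Table API on the classpath; the object name and the 12 hour / 24 hour values are illustrative choices, not recommendations.

```scala
import org.apache.flink.api.common.time.Time
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object TypedConfigSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a streaming table environment backed by the Blink planner.
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val tEnv = TableEnvironment.create(settings)

    // Typed setter on TableConfig: idle state is kept for at least 12 hours
    // and cleaned up after at most 24 hours.
    tEnv.getConfig.setIdleStateRetentionTime(Time.hours(12), Time.hours(24))
  }
}
```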
    // instantiate table environment
    val tEnv: TableEnvironment = ...

    // access flink configuration
    val configuration = tEnv.getConfig().getConfiguration()

    // set low-level key-value options
    configuration.setString("table.exec.mini-batch.enabled", "true")
    configuration.setString("table.exec.mini-batch.allow-latency", "5 s")
    configuration.setString("table.exec.mini-batch.size", "5000")
Note: Currently, key-value configuration options are only supported by the Blink planner.
Execution Options
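Since the key-value options depend on the Blink planner, it may help to see an environment created with that planner explicitly. The following is a minimal sketch, assuming Flink 1.9 with the Blink planner dependency available; the object name and the chosen values ("100", "256 mb") are hypothetical and only show the mechanics.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object BlinkBatchConfigSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a batch table environment backed by the Blink planner.
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inBatchMode()
      .build()
    val tEnv = TableEnvironment.create(settings)

    // Set options right after instantiation, before defining any queries.
    val configuration = tEnv.getConfig.getConfiguration
    // Illustrative values only; both keys are batch options listed on this page.
    configuration.setString("table.exec.sort.default-limit", "100")
    configuration.setString("table.exec.resource.sort.memory", "256 mb")
  }
}
```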
The following options can be used to tune the performance of query execution.
Key | Default | Description |
---|---|---|
table.exec.async-lookup.buffer-capacity (Batch, Streaming) | 100 | The maximum number of async I/O operations that the async lookup join can trigger. |
table.exec.async-lookup.timeout (Batch, Streaming) | "3 min" | The timeout for an asynchronous operation to complete. |
table.exec.disabled-operators (Batch) | (none) | Mainly for testing. A comma-separated list of operator names, where each name represents a kind of disabled operator. Operators that can be disabled include "NestedLoopJoin", "ShuffleHashJoin", "BroadcastHashJoin", "SortMergeJoin", "HashAgg", "SortAgg". By default, no operator is disabled. |
table.exec.mini-batch.allow-latency (Streaming) | "-1 ms" | The maximum latency that MiniBatch may use to buffer input records. MiniBatch is an optimization that buffers input records to reduce state access. MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached. Note: If table.exec.mini-batch.enabled is set to true, this value must be greater than zero. |
table.exec.mini-batch.enabled (Streaming) | false | Specifies whether the MiniBatch optimization is enabled. MiniBatch is an optimization that buffers input records to reduce state access. It is disabled by default. To enable it, set this option to true. Note: If mini-batch is enabled, "table.exec.mini-batch.allow-latency" and "table.exec.mini-batch.size" must be set. |
table.exec.mini-batch.size (Streaming) | -1 | The maximum number of input records that can be buffered for MiniBatch. MiniBatch is an optimization that buffers input records to reduce state access. MiniBatch is triggered at the allowed latency interval and when the maximum number of buffered records is reached. Note: MiniBatch currently works only for non-windowed aggregations. If table.exec.mini-batch.enabled is set to true, this value must be positive. |
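table.exec.resource.default-parallelism (Batch, Streaming) | -1 | Sets the default parallelism for all operators (such as aggregate, join, filter) to run with parallel instances. This option has a higher priority than the parallelism of StreamExecutionEnvironment (in effect, it overrides the parallelism of StreamExecutionEnvironment). A value of -1 means that no default parallelism is set, in which case the parallelism of StreamExecutionEnvironment is used as a fallback. |
table.exec.resource.external-buffer-memory (Batch) | "10 mb" | Sets the external buffer memory size used in sort-merge join, nested join, and over window. |
table.exec.resource.hash-agg.memory (Batch) | "128 mb" | Sets the managed memory size of the hash aggregate operator. |
table.exec.resource.hash-join.memory (Batch) | "128 mb" | Sets the managed memory of the hash join operator. It defines the lower limit. |
table.exec.resource.sort.memory (Batch) | "128 mb" | Sets the managed buffer memory size of the sort operator. |
table.exec.shuffle-mode (Batch) | "batch" | Sets the execution shuffle mode. Only batch or pipeline can be set. batch: the job runs stage by stage. pipeline: the job runs in streaming mode, but this may cause a resource deadlock in which the receiver waits for resources to start while the sender holds resources and waits to send data to the receiver. |
table.exec.sort.async-merge-enabled (Batch) | true | Whether to merge sorted spill files asynchronously. |
table.exec.sort.default-limit (Batch) | -1 | The default limit used when the user does not set a limit after ORDER BY. -1 means that this option is ignored. |
table.exec.sort.max-num-file-handles (Batch) | 128 | The maximum fan-in for external merge sort. It limits the number of file handles per operator. If it is too small, it may cause intermediate merging. If it is too large, too many files are open at the same time, which consumes memory and leads to random reads. |
table.exec.source.idle-timeout (Streaming) | "-1 ms" | When a source does not receive any elements within the timeout, it is marked as temporarily idle. This allows downstream tasks to advance their watermarks without waiting for watermarks from this source while it is idle. |
table.exec.spill-compression.block-size (Batch) | "64 kb" | The memory size used for compression when spilling data. The larger the memory, the higher the compression ratio, but the job consumes more memory resources. |
table.exec.spill-compression.enabled (Batch) | true | Whether to compress spilled data. Currently, compressing spilled data is supported only for the sort, hash-agg, and hash-join operators. |
table.exec.window-agg.buffer-size-limit (Batch) | 100000 | Sets the limit on the window element buffer size used in the group window aggregate operator. |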
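The three mini-batch options in the table are interdependent: once the feature is enabled, both the latency and the size must be positive. The sketch below expresses that rule as a plain Scala check that could be run before submitting a job. `MiniBatchCheck` and `validate` are hypothetical helpers written for this page, not part of Flink, and the latency is assumed to have already been converted to milliseconds.

```scala
object MiniBatchCheck {
  // Hypothetical helper, not part of Flink: returns the problems found in a
  // set of mini-batch settings. An empty list means the settings are consistent.
  def validate(enabled: Boolean, allowLatencyMs: Long, size: Long): List[String] = {
    if (!enabled) Nil
    else List(
      if (allowLatencyMs <= 0)
        Some("table.exec.mini-batch.allow-latency must be greater than zero")
      else None,
      if (size <= 0)
        Some("table.exec.mini-batch.size must be positive")
      else None
    ).flatten
  }

  def main(args: Array[String]): Unit = {
    // Defaults from the table (disabled, "-1 ms", -1): nothing to check.
    println(validate(enabled = false, allowLatencyMs = -1L, size = -1L))
    // Enabling mini-batch without setting the other two keys is flagged.
    validate(enabled = true, allowLatencyMs = -1L, size = -1L).foreach(println)
    // The values from the overview example (5 s, 5000) pass the check.
    println(validate(enabled = true, allowLatencyMs = 5000L, size = 5000L))
  }
}
```

Returning a list of messages rather than throwing keeps the check usable in both tests and startup code; whether Flink itself rejects inconsistent settings at the same point is not something this sketch claims.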
Optimizer Options
The following options can be used to adjust the behavior of the query optimizer to obtain a better execution plan.
Key | Default | Description |
---|---|---|
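table.optimizer.agg-phase-strategy (Batch, Streaming) | "AUTO" | The strategy for the aggregate phase. Only AUTO, TWO_PHASE, or ONE_PHASE can be set. AUTO: no particular strategy is enforced for the aggregate phase; whether two-phase or one-phase aggregation is chosen depends on cost. TWO_PHASE: enforces two-phase aggregation with localAggregate and globalAggregate. Note that if an aggregate call does not support being optimized into two phases, one-phase aggregation is still used. ONE_PHASE: enforces one-phase aggregation with only CompleteGlobalAggregate. |
table.optimizer.distinct-agg.split.bucket-num (Streaming) | 1024 | Configures the number of buckets when splitting a distinct aggregation. The number is used in the first-level aggregation to calculate a bucket key "hash_code(distinct_key) % BUCKET_NUM", which is used as an additional group key after splitting. |
table.optimizer.distinct-agg.split.enabled (Streaming) | false | Tells the optimizer whether to split distinct aggregations (e.g. COUNT(DISTINCT col), SUM(DISTINCT col)) into two levels. The first aggregation is shuffled by an additional key that is calculated from the hash code of distinct_key and the number of buckets. This optimization is very useful when there is data skew in a distinct aggregation, and it makes it possible to scale up the job. The default is false. |
table.optimizer.join-reorder-enabled (Batch, Streaming) | false | Enables join reordering in the optimizer. Disabled by default. |
table.optimizer.join.broadcast-threshold (Batch) | 1048576 | Configures the maximum size in bytes of a table that will be broadcast to all worker nodes when performing a join. Setting this value to -1 disables broadcasting. |
table.optimizer.reuse-source-enabled (Batch, Streaming) | true | When true, the optimizer tries to find duplicated table sources and reuse them. This works only when table.optimizer.reuse-sub-plan-enabled is true. |
table.optimizer.reuse-sub-plan-enabled (Batch, Streaming) | true | When true, the optimizer tries to find duplicated sub-plans and reuse them. |
table.optimizer.source.predicate-pushdown-enabled (Batch, Streaming) | true | When true, the optimizer pushes predicates down into the FilterableTableSource. The default is true. |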
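The distinct-aggregation split described in the table can be hard to picture from the option text alone. The sketch below imitates the idea on an in-memory collection using plain Scala: a first level groups by the original key plus a bucket key derived from the distinct value, and a second level sums the partial counts. It is not Flink's implementation; `DistinctSplitSketch`, `bucketKey`, and `twoLevelCountDistinct` are hypothetical names, and the use of `Math.floorMod` to keep the bucket key non-negative is an assumption of this sketch.

```scala
object DistinctSplitSketch {
  // Bucket key in the spirit of "hash_code(distinct_key) % BUCKET_NUM".
  // Assumption: floorMod keeps the key non-negative; Flink's internal
  // expression may differ in detail.
  def bucketKey(distinctKey: Any, bucketNum: Int): Int =
    Math.floorMod(distinctKey.hashCode, bucketNum)

  // COUNT(DISTINCT value) GROUP BY key, computed in two levels.
  def twoLevelCountDistinct(rows: Seq[(String, String)], bucketNum: Int): Map[String, Long] = {
    // Level 1: group by (key, bucket) and count distinct values per bucket.
    val partial: Map[(String, Int), Long] = rows
      .groupBy { case (k, v) => (k, bucketKey(v, bucketNum)) }
      .map { case (groupKey, rs) => groupKey -> rs.map(_._2).distinct.size.toLong }
    // Level 2: group by key alone and sum the partial counts.
    partial
      .groupBy { case ((k, _), _) => k }
      .map { case (k, parts) => k -> parts.values.sum }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a" -> "x", "a" -> "y", "a" -> "x", "b" -> "z")
    println(twoLevelCountDistinct(rows, bucketNum = 4))
  }
}
```

Because every distinct value lands in exactly one bucket, summing the per-bucket distinct counts gives the same result as a single-level COUNT(DISTINCT), while the first level spreads a hot key across up to BUCKET_NUM groups. In a real job the split would be switched on by setting "table.optimizer.distinct-agg.split.enabled" to "true" through the key-value configuration.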