3. Which execution modes can be selected for Flink computing tasks at runtime

Table of contents

1. What are bounded and unbounded streams?

2. What are batch execution mode and streaming execution mode?

3. How to choose the execution mode?

4. How to configure the execution mode?

Method 1: Specify it through a parameter when submitting the job (recommended; this method is more flexible)

Method 2: Configure the execution mode in the driver program (not recommended)

5. A complete introductory example


1. What are bounded and unbounded streams?

Bounded stream:

        The data stream has a defined start and a defined end. For a computing task, all input data is known before the computation begins, and no new data will arrive.

Unbounded stream:

        The data stream has a defined start but no defined end. For a computing task, the full input is not known before the computation begins, and new data keeps arriving.
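
To make the distinction concrete, here is a minimal sketch in plain Java (JDK only, no Flink dependency). The class and method names are hypothetical and chosen only for illustration: a List stands in for a bounded stream, and an infinite Stream.iterate stands in for an unbounded one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Flink.
class BoundedVsUnbounded {

    // Bounded input: every element is known up front, so a final total exists.
    static int sumBounded(List<Integer> input) {
        int sum = 0;
        for (int value : input) {
            sum += value;
        }
        return sum;
    }

    // Unbounded input: the sequence 1, 2, 3, ... has no last element,
    // so we can only ever inspect a finite prefix of it (the first n elements).
    static List<Integer> firstElements(int n) {
        Stream<Integer> unbounded = Stream.iterate(1, x -> x + 1); // never ends on its own
        return unbounded.limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumBounded(Arrays.asList(1, 2, 3, 4))); // prints 10
        System.out.println(firstElements(3));                      // prints [1, 2, 3]
    }
}
```

A "final" sum is only meaningful for the bounded input; for the unbounded sequence, any result is necessarily a result "so far".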


2. What are batch execution mode and streaming execution mode?

Flink jobs support different execution modes at runtime (batch execution mode and streaming execution mode); you can choose between them according to business requirements and job characteristics.

BATCH - batch execution mode:

        Data is processed in a way similar to MapReduce and Spark. This mode suits a known, fixed input: the result is computed once, and the job then stops rather than running continuously.

STREAMING - streaming execution mode:

        Data is processed continuously and incrementally, and the job runs indefinitely on unbounded data.
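
The difference in output behaviour can be sketched in plain Java (JDK only). This is a simplified model, not Flink's actual runtime, and the class and method names are hypothetical. For a sum over the input, the batch-style method emits a single final result after reading everything, while the streaming-style method emits an updated result for every incoming element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; a simplified model, not Flink code.
class BatchVsStreamingSum {

    // Batch style: read the whole (bounded) input, then emit one final result.
    static List<Integer> batchSum(List<Integer> input) {
        int sum = 0;
        for (int value : input) {
            sum += value;
        }
        List<Integer> emitted = new ArrayList<>();
        emitted.add(sum);
        return emitted;
    }

    // Streaming style: emit an updated running result after each element,
    // because the end of the input may never arrive.
    static List<Integer> streamingSum(List<Integer> input) {
        List<Integer> emitted = new ArrayList<>();
        int sum = 0;
        for (int value : input) {
            sum += value;
            emitted.add(sum);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(batchSum(input));     // prints [6]
        System.out.println(streamingSum(input)); // prints [1, 3, 6]
    }
}
```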


3. How to choose the execution mode?

BATCH - batch execution mode:
        can only handle bounded data streams, not unbounded streams

STREAMING - streaming execution mode:
        can handle both bounded and unbounded data streams

Selection principle:
        If the data source to be processed is a bounded stream:

                        Prefer the BATCH execution mode, which is more efficient (the runtime optimizes join and aggregation operations)

        If the data source to be processed is an unbounded stream:

                        If you need continuously updated, real-time results, you must select the STREAMING execution mode

                        If you only need a snapshot of the results at a certain moment, BATCH execution mode becomes an option only after the input has been bounded (for example, by reading up to a fixed end position), because BATCH cannot run on an unbounded source
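
The selection principle above can be written as a small decision function. This plain-Java sketch uses a hypothetical Mode enum and chooseMode helper (neither is a Flink API). It follows the same idea as Flink's AUTOMATIC setting: choose BATCH only when every source is bounded, otherwise STREAMING.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; Mode and chooseMode are not Flink APIs.
class ModeSelection {

    enum Mode { BATCH, STREAMING }

    // Each list entry says whether one source of the job is bounded.
    // A single unbounded source forces STREAMING; otherwise BATCH is preferred.
    static Mode chooseMode(List<Boolean> sourceIsBounded) {
        for (boolean bounded : sourceIsBounded) {
            if (!bounded) {
                return Mode.STREAMING;
            }
        }
        return Mode.BATCH;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(Arrays.asList(true, true)));  // prints BATCH
        System.out.println(chooseMode(Arrays.asList(true, false))); // prints STREAMING
    }
}
```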


4. How to configure the execution mode?

Method 1: Specify it through a parameter when submitting the job (recommended; this method is more flexible)

# Submit the Flink job and specify the execution mode (the default is streaming execution mode)
# Choose exactly one of BATCH, STREAMING, or AUTOMATIC
bin/flink run -Dexecution.runtime-mode=BATCH|STREAMING|AUTOMATIC <jarFile>

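Method 2: Configure the execution mode in the driver program (not recommended)

  test("Batch execution mode & streaming execution mode") {
    // Get the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // A real job needs only one of the three calls below; all are listed for reference
    // Set batch execution mode
    env.setRuntimeMode(RuntimeExecutionMode.BATCH)
    // Set streaming execution mode (the default)
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
    // Set automatic mode (Flink picks the mode based on the boundedness of the data sources)
    env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC)
  }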

Official website link: Execution mode


5. A complete introductory example

Development language: Java 1.8

Flink version: Flink 1.17

package com.baidu.datastream.env;


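/*
 * TODO When running a Flink job, Flink provides two execution modes (batch execution mode & streaming execution mode)
 *      Batch execution mode: the received data is computed only once, and the job stops when the computation finishes
 *      Streaming execution mode: incoming data is processed continuously, and the job runs indefinitely on unbounded data
 * Question: how do you choose the execution mode?
 *      Generally, choose it based on the characteristics of the data source
 *      Bounded stream: prefer BATCH execution mode, which is more efficient (the runtime optimizes join and aggregation operations)
 *      Unbounded stream: only STREAMING execution mode can be used
 * Important:
 *      The execution mode is rarely set in code; it is usually specified through a parameter when submitting the job (this is more flexible)
 *      bin/flink run -Dexecution.runtime-mode=BATCH|STREAMING|AUTOMATIC <jarFile>
 *
 * */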

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExecutionMode {
    public static void main(String[] args) throws Exception {
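        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set batch execution mode (bounded sources only)
        //env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // Set streaming execution mode (the default)
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
        // Set automatic mode (Flink picks the mode based on the boundedness of the data sources)
        //env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        // Note: BATCH mode cannot be used when the data source is an unbounded stream.
        // socketTextStream is unbounded, so this job runs in STREAMING mode;
        // with BATCH enabled above, Flink rejects the job.
        env.socketTextStream("127.0.0.1", 9999).print();

        env.execute();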
    }
}

Origin blog.csdn.net/weixin_42845827/article/details/131406991