Flink Task parallelism

  1. Parallel data streams

    • A Flink program consists of multiple tasks (transformations/operators, data sources, and sinks). Flink programs are inherently parallel and distributed.

    • During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks.

    • Operator subtasks are independent of one another and can execute in different threads, and those threads may run on different machines or containers.

    • The number of operator subtasks is the parallelism of that particular operator.

    • The parallelism of a stream is always that of the operator that produces it.

    • Different operators of the same program may have different levels of parallelism.

    • Schematic diagram:

      [Figure: image-20191113083419692]

    • Streams can transport data between two operators in a one-to-one (forwarding) pattern or in a redistributing pattern:

      • One-to-one streams
        • For example, between the Source and map operators in the figure above.
        • They preserve the partitioning and ordering of the elements.
        • This means that subtask [1] of the map operator sees the same elements, in the same order, as they were produced by subtask [1] of the Source operator.
      • Redistributing streams
        • For example, between map and keyBy/window, and between keyBy/window and Sink, in the figure above.
        • Each operator subtask sends data to different target subtasks, depending on the selected transformation.
        • In the figure, the data is redistributed according to the keyBy operator.
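
The two patterns above can be illustrated with a small plain-Java sketch that does not depend on Flink. The class `StreamRouting` and its methods are hypothetical names chosen for illustration. The hash-modulo rule is a simplifying assumption: Flink's own keyBy assigns keys to subtasks through key groups, but the visible effect is the same, namely that all records with the same key reach the same downstream subtask.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of how records move between operator subtasks (not Flink code). */
final class StreamRouting {

    /** One-to-one (forwarding): upstream subtask i feeds downstream subtask i. */
    static int forwardTarget(int upstreamSubtask) {
        return upstreamSubtask;
    }

    /**
     * Redistribution by key: every record with the same key goes to the same
     * downstream subtask. Plain hash-modulo is an assumption made for brevity.
     */
    static int keyedTarget(Object key, int downstreamParallelism) {
        return Math.floorMod(key.hashCode(), downstreamParallelism);
    }

    public static void main(String[] args) {
        int parallelism = 2;

        // One inbox per downstream subtask.
        List<List<String>> inbox = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            inbox.add(new ArrayList<>());
        }

        // Route each word to the subtask selected by its key (the word itself).
        String[] words = {"flink", "task", "flink", "stream", "task"};
        for (String word : words) {
            inbox.get(keyedTarget(word, parallelism)).add(word);
        }

        for (int i = 0; i < parallelism; i++) {
            System.out.println("subtask[" + i + "] <- " + inbox.get(i));
        }
    }
}
```

Whatever the actual distribution, both occurrences of "flink" end up in the same inbox, which is what allows a keyed window to aggregate per key.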
  2. Task parallelism settings

    • Operator level

      • The parallelism of a single operator, data source, or sink can be defined by calling its setParallelism() method.

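                import org.apache.flink.api.java.tuple.Tuple2;
                import org.apache.flink.streaming.api.datastream.DataStream;
                import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
                import org.apache.flink.streaming.api.windowing.time.Time;

                // 1. Initialize the execution environment
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                // 2. Read the data source and apply the transformations
                //    (Splitter is a user-defined FlatMapFunction, not shown here)
                DataStream<Tuple2<String, Integer>> dataStream = env
                        .socketTextStream("ronnie01", 9999)
                        .flatMap(new Splitter())
                        .keyBy(0)
                        // Trigger a computation every 5 seconds
                        .timeWindow(Time.seconds(5))
                        // Set the parallelism of the sum operator to 3
                        .sum(1).setParallelism(3);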
    • Execution environment level

      • The execution environment parallelism is the default parallelism for all operators, data sources, and data sinks of the job.

      • The execution environment parallelism can be overridden by explicitly configuring the parallelism of an operator.

                // 1. Initialize the execution environment
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        
                env.setParallelism(3);
    • Client level

      • When submitting a job to Flink, the parallelism can be set on the client side with the -p parameter.
      • For example:
        • ./bin/flink run -p 10 ../examples/WordCount-java.jar
    • System level

      • A system-wide default parallelism can be defined by setting the parallelism.default configuration item in the FLINK_HOME/conf/flink-conf.yaml configuration file.
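
When several of the four levels are set at once, the most specific setting wins: operator, then execution environment, then client, then system default. The plain-Java sketch below models that lookup order. It is not Flink code; `ParallelismResolver` and `resolve` are hypothetical names, and `null` is assumed to mean "not set at this level".

```java
/** Simplified model of the parallelism precedence described above (not Flink code). */
final class ParallelismResolver {

    /**
     * Returns the effective parallelism of an operator.
     * A null argument means that no value was configured at that level.
     */
    static int resolve(Integer operatorLevel, Integer environmentLevel,
                       Integer clientLevel, int systemDefault) {
        if (operatorLevel != null) {
            return operatorLevel;      // setParallelism() on the operator
        }
        if (environmentLevel != null) {
            return environmentLevel;   // env.setParallelism()
        }
        if (clientLevel != null) {
            return clientLevel;        // flink run -p
        }
        return systemDefault;          // parallelism.default in flink-conf.yaml
    }

    public static void main(String[] args) {
        // The operator asks for 3, the client passes -p 10, the system default is 1.
        System.out.println(resolve(3, null, 10, 1)); // prints 3
        // Nothing is set on the operator or the environment, so -p 10 applies.
        System.out.println(resolve(null, null, 10, 1)); // prints 10
    }
}
```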


Origin www.cnblogs.com/ronnieyuan/p/11846623.html