Parallel dataflows
A Flink program consists of multiple tasks (transformations/operators, data sources, and sinks), and Flink programs are inherently parallel and distributed.
During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks.
Operator subtasks are independent of one another; they can execute in different threads, and those threads may run on different machines or containers.
The number of operator subtasks is the parallelism of that particular operator.
The parallelism of a stream is always that of the operator that produces it.
Different operators of the same program may have different levels of parallelism.
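The relationship between partitions, subtasks, and parallelism can be sketched in plain Java. This is only an illustration and does not use the Flink API: the class SubtaskSketch and the method runOperator are hypothetical names, and a thread pool stands in for the threads that Flink would spread across machines or containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java illustration only; this is not Flink code.
final class SubtaskSketch {

    // Runs one "operator" over the given stream partitions (at least one is
    // assumed). Each partition is handled by its own subtask, so the
    // operator's parallelism equals the number of partitions.
    static <I, O> List<List<O>> runOperator(List<List<I>> partitions, Function<I, O> fn) {
        int parallelism = partitions.size();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<List<O>>> futures = new ArrayList<>();
            for (List<I> partition : partitions) {
                // One independent subtask per partition, each in its own thread.
                futures.add(pool.submit(() -> {
                    List<O> out = new ArrayList<>();
                    for (I element : partition) {
                        out.add(fn.apply(element));
                    }
                    return out;
                }));
            }
            List<List<O>> result = new ArrayList<>();
            for (Future<List<O>> future : futures) {
                // Output partition i is whatever subtask i produced.
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("subtask failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> source = List.of(List.of(1, 2), List.of(3, 4), List.of(5));
        List<List<Integer>> mapped = runOperator(source, x -> x * 10);
        System.out.println(mapped); // prints [[10, 20], [30, 40], [50]]
    }
}
```

Three input partitions give three subtasks, so this toy operator runs with a parallelism of 3, and its output stream has the same parallelism.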
Schematic diagram:
A stream can transport data between two operators in a one-to-one (forwarding) pattern or in a redistributing pattern:
- One-to-one streams
  - For example, between the Source and map operators in the figure above.
  - They preserve the partitioning and ordering of the elements.
  - This means that subtask [1] of the map operator sees the same elements, in the same order, as they were produced by subtask [1] of the Source operator.
- Redistributing streams
  - For example, between map and keyBy/window, and between keyBy/window and Sink, in the figure above.
  - They change the partitioning of the stream: each operator subtask sends data to different target subtasks, depending on the selected transformation.
  - In the figure, data is redistributed according to the keyBy operator.
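The difference between the two patterns can be sketched in plain Java. This is a simplified illustration, not Flink code: KeyRoutingSketch and its methods are hypothetical names, and the bare hash-modulo rule is an assumption made for readability. Flink's actual key-to-subtask assignment is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Flink's real key-to-subtask assignment is
// more involved than a bare hash modulo.
final class KeyRoutingSketch {

    // One-to-one (forwarding): subtask i of the upstream operator sends
    // everything to subtask i of the downstream operator.
    static int forwardTarget(int sourceSubtask) {
        return sourceSubtask;
    }

    // Redistribution by key (simplified): the target subtask depends on the
    // element's key, not on which upstream subtask produced the element.
    static int keyByTarget(Object key, int downstreamParallelism) {
        return Math.floorMod(key.hashCode(), downstreamParallelism);
    }

    // Distributes the words emitted by one upstream subtask over the
    // downstream subtasks.
    static List<List<String>> redistribute(List<String> words, int downstreamParallelism) {
        List<List<String>> targets = new ArrayList<>();
        for (int i = 0; i < downstreamParallelism; i++) {
            targets.add(new ArrayList<>());
        }
        for (String word : words) {
            targets.get(keyByTarget(word, downstreamParallelism)).add(word);
        }
        return targets;
    }

    public static void main(String[] args) {
        // "a", "b", "c" hash to 97, 98, 99, so with a downstream parallelism
        // of 3 they are routed to subtasks 1, 2, and 0 respectively.
        System.out.println(redistribute(List.of("a", "b", "a", "c", "b"), 3));
        // prints [[c], [a, a], [b, b]]
    }
}
```

All elements with the same key end up in the same downstream subtask, which is what makes per-key aggregation (such as the windowed sum later in this post) possible.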
Setting task parallelism
Operator level
The parallelism of an individual operator, data source, or sink can be defined by calling its setParallelism() method.
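    // 1. Initialize the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // 2. Read the data source and apply the transformations
    // (Splitter is a user-defined FlatMapFunction that is not shown here)
    DataStream<Tuple2<String, Integer>> dataStream = env
        .socketTextStream("ronnie01", 9999)
        .flatMap(new Splitter())
        .keyBy(0)
        // Trigger a computation every 5 seconds
        .timeWindow(Time.seconds(5))
        // Set the parallelism of this operator
        .sum(1).setParallelism(3);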
Execution environment level
The execution environment level parallelism is the default parallelism for all operators, data sources, and data sinks that the environment executes.
It can be overridden by explicitly setting the parallelism of an individual operator.
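    // 1. Initialize the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Set the default parallelism for everything executed in this environment
    env.setParallelism(3);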
Client level
- When submitting a job to Flink, the parallelism can be set on the client side with the -p parameter.
- For example:
- ./bin/flink run -p 10 ../examples/WordCount-java.jar
System level
- A system-wide default parallelism can be defined by setting the parallelism.default configuration item in the flink_home/conf/flink-conf.yaml configuration file.
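How the four levels combine can be sketched in plain Java. The text above only states that an operator setting overrides the execution environment setting; the rest of the order used here (execution environment over client, client over system default) is an assumption based on how the levels are usually described, and EffectiveParallelismSketch with its resolve method is a hypothetical helper, not part of Flink.

```java
import java.util.OptionalInt;

// Plain-Java illustration only; this is not a Flink API.
final class EffectiveParallelismSketch {

    // Returns the parallelism an operator would run with, assuming the most
    // specific setting wins: operator, then execution environment, then
    // client (-p), then the system default (parallelism.default).
    static int resolve(OptionalInt operatorLevel,
                       OptionalInt environmentLevel,
                       OptionalInt clientLevel,
                       int systemDefault) {
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();
        }
        return systemDefault;
    }

    public static void main(String[] args) {
        // setParallelism(3) on the operator wins over -p 10 on the client.
        System.out.println(resolve(OptionalInt.of(3), OptionalInt.empty(), OptionalInt.of(10), 1)); // prints 3
        // Nothing set in code: the client value applies.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.of(10), 1)); // prints 10
        // Nothing set anywhere: fall back to the system default.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.empty(), 1)); // prints 1
    }
}
```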
Flink Task parallelism
Origin www.cnblogs.com/ronnieyuan/p/11846623.html