Data transfer modes between Flink operators

Data is transmitted between operators in one of two modes: one-to-one (forwarding) or redistributing. Which mode applies depends on the type of the operator.

  • One-to-one: The stream (for example, between the source and the map operator) preserves the partitioning and the order of its elements. That is, each subtask of the map operator sees the same elements, in the same order, as the corresponding subtask of the source operator produced them. map, filter, flatMap and similar operators are all one-to-one. This relationship is similar to a narrow dependency in Spark.
  • Redistributing: The partitioning of the stream changes (for example, between map() and keyBy/window, or between keyBy/window and the sink). Each operator subtask sends data to different target subtasks according to the selected transformation. For example, keyBy() repartitions by the hash of the key, broadcast() sends every element to all downstream subtasks, and rebalance() spreads elements evenly across the downstream subtasks in round-robin fashion. All of these operators trigger a redistribution, which is similar to the shuffle process in Spark (a wide dependency).
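
To make the difference between the two modes concrete, the following is a minimal toy model in plain Java of how a sending subtask could pick its target subtask(s). It is not Flink's code: the class `PartitionSketch` and its methods are hypothetical names, the keyBy rule is simplified (Flink itself hashes keys into key groups before assigning them to subtasks), and the round-robin here always starts at subtask 0.

```java
import java.util.Arrays;

/** Toy model of target-subtask selection; an illustration only, not Flink's implementation. */
final class PartitionSketch {

    /** One-to-one (forward): sender subtask i always feeds downstream subtask i. */
    static int forward(int senderIndex) {
        return senderIndex;
    }

    /** Simplified keyBy: equal keys always land on the same downstream subtask. */
    static int keyBy(Object key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Simplified rebalance: the n-th record (0-based) goes to subtasks in round-robin order. */
    static int rebalance(long recordIndex, int parallelism) {
        return (int) (recordIndex % parallelism);
    }

    /** Broadcast: every downstream subtask receives a copy of the record. */
    static int[] broadcast(int parallelism) {
        int[] targets = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            targets[i] = i;
        }
        return targets;
    }

    public static void main(String[] args) {
        int parallelism = 4; // assumed downstream parallelism for the demo
        System.out.println("forward from subtask 2 -> " + forward(2));
        System.out.println("keyBy(\"user-1\") -> " + keyBy("user-1", parallelism));
        System.out.println("rebalance, record #5 -> " + rebalance(5, parallelism));
        System.out.println("broadcast -> " + Arrays.toString(broadcast(parallelism)));
    }
}
```

Only `forward` leaves the partitioning untouched; the other three rules can route a record to a subtask other than the sender's own index, which is what makes them redistributing.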

For one-to-one operations with the same parallelism, Flink chains the connected operators together into a single task, and each original operator becomes a part of that task. Chaining operators into tasks is a very effective optimization: it reduces handovers between threads and buffer-based data exchange, which improves throughput while lowering latency.
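
The effect of chaining can be sketched with a small toy model in plain Java. It is only an analogy under stated assumptions, not Flink's runtime: `ChainSketch`, `runUnchained`, `runChained` and `bufferedRecords` are hypothetical names, and a simple in-memory queue stands in for the buffers that unchained operators exchange data through.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.IntUnaryOperator;

/** Toy comparison of two operators run with and without chaining; not Flink's runtime. */
final class ChainSketch {

    /** Records that went through the intermediate buffer in the last unchained run. */
    static int bufferedRecords = 0;

    /** Unchained: the first operator writes into a buffer, the second reads from it. */
    static List<Integer> runUnchained(List<Integer> input,
                                      IntUnaryOperator first,
                                      IntUnaryOperator second) {
        Queue<Integer> buffer = new ArrayDeque<>();
        bufferedRecords = 0;
        for (int value : input) {
            buffer.add(first.applyAsInt(value)); // handover through the buffer
            bufferedRecords++;
        }
        List<Integer> output = new ArrayList<>();
        while (!buffer.isEmpty()) {
            output.add(second.applyAsInt(buffer.poll()));
        }
        return output;
    }

    /** Chained: both operators are fused into one call per record, with no buffer in between. */
    static List<Integer> runChained(List<Integer> input,
                                    IntUnaryOperator first,
                                    IntUnaryOperator second) {
        IntUnaryOperator chain = first.andThen(second);
        List<Integer> output = new ArrayList<>();
        for (int value : input) {
            output.add(chain.applyAsInt(value));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(runUnchained(input, x -> x + 1, x -> x * 2));
        System.out.println("records buffered: " + bufferedRecords);
        System.out.println(runChained(input, x -> x + 1, x -> x * 2));
    }
}
```

Both variants produce the same results; the chained one simply skips the intermediate handover. In a real Flink job, chaining is applied automatically where possible and can be tuned with the DataStream API methods `startNewChain()` and `disableChaining()` on an operator, or `disableOperatorChaining()` on the `StreamExecutionEnvironment`.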

Origin blog.csdn.net/lp284558195/article/details/114974760