Apache Flink 1.17


1. Flink 1.17 Overview

Flink 1.17 completed 7 FLIPs, with 170+ contributors participating, 600+ issues resolved, and 1100+ commits merged; overall, it is a fairly large release.
Looking at the distribution of issues, version 1.17 made many improvements at the Runtime and Table layers: roughly 170+ issues at the Runtime layer and about 120 at the Table layer. In addition, many enhancements and improvements were made around Checkpoint & State, the APIs, and the connectors.
[Figure: FLIPs completed in Flink 1.17]
The FLIPs completed in version 1.17 are shown in the figure above. They are:

  • FLIP-256: Extends the REST API to support specifying parameters when submitting jobs, bringing it largely in line with the Flink CLI.

  • FLIP-265: Marks the Scala API as deprecated. Flink has two sets of APIs, Scala and Java. As the community developed and evolved, various problems surfaced with the Scala API, such as the difficulty of upgrading the Scala version: in Flink 1.15, upgrading from Scala 2.12.7 to 2.12.15 required compatibility-breaking changes. On the other hand, the Java API evolves faster in the community than the Scala API and gains more features, and the community has relatively few contributors familiar with the Scala technology stack. The community therefore decided to gradually remove the Scala API and focus more on the Java API.

  • FLIP-266: Simplifies the TaskManager's network-layer configuration considerably and adds a number of new core features to improve the out-of-the-box behavior of the network stack at the runtime level. Users can obtain better job performance with less configuration.

  • FLIP-280: Introduces the PLAN_ADVICE feature at the SQL level to help users check plan correctness and optimize their SQL, for example whether an aggregation should be split into two phases, or whether non-deterministic columns can lead to incorrect results, and prompts users to rewrite and optimize the SQL.

  • FLIP-281: Sinks support speculative execution for batch jobs. Speculative execution was implemented gradually across three FLIPs: the first covered the operators in a job other than Source and Sink, the second covered Source operators, and FLIP-281 is the last one, covering Sink operators. Sinks are special: in a Flink job topology they flush data to an external system, and having multiple task attempts write to that system in coordination poses great challenges to data consistency. With FLIP-281 adding speculative execution for sinks, the entire pipeline of a batch job now supports speculative execution.

  • FLIP-282: Introduces the Delete and Update APIs. As Flink evolves from stream processing toward the Streaming Warehouse, some APIs need to be tailored for the warehouse scenario, such as row-level Delete and Update APIs, to make it easier to integrate with other connectors (see the sketch after this list).

  • FLIP-283: Makes the adaptive batch scheduler the default scheduler. Version 1.16 already introduced the Adaptive Batch Scheduler, but it was not the default; version 1.17 sets it as the default scheduler for batch jobs.
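
A hedged sketch of what the FLIP-282 row-level statements look like from SQL, assuming batch execution mode and a hypothetical orders table whose connector implements the new delete/update abilities:

    -- Row-level statements run in batch mode against a connector that supports them.
    SET 'execution.runtime-mode' = 'batch';

    -- Row-level DELETE: remove cancelled orders.
    DELETE FROM orders WHERE status = 'CANCELLED';

    -- Row-level UPDATE: apply a price correction in place.
    UPDATE orders SET price = price * 0.9 WHERE dt = '2023-06-01';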

2. Flink 1.17 Overall Story

Flink version 1.17 is a big step towards Streaming Warehouse.
[Figure: Streaming Warehouse architecture]
As shown in the figure, once Flink moves from stream processing to the Streaming Warehouse, we no longer need a separate batch pipeline, nor do we need to split the stream pipeline: the batch and stream pipelines are unified, and stream and batch are integrated.

Data flows between the layers of the warehouse in real time through Flink, and the data at each layer can be queried in real time. The data in the lake storage can also be queried by other engines. The lake storage can be Paimon (an Apache incubator project that grew out of the Flink Table Store sub-project), Hudi, and the like, providing truly streaming services.

The advantage of this architecture is that two separate systems are no longer needed, so the architecture is simpler. Offline and real-time processing are unified, only one copy of the storage is needed, and the cost is lower. Processing is done by Flink's unified stream-batch SQL engine, so semantics and data stay consistent. Vertically, the data at each layer can be queried in real time, and the architecture is transparent and open.
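
As a hedged illustration of that unified engine, the same query text can serve both the real-time path and a batch backfill by flipping a single setting (the table names here are hypothetical):

    -- Run the same INSERT in streaming mode, or switch to 'batch' for a backfill.
    SET 'execution.runtime-mode' = 'streaming';
    INSERT INTO dwd_orders SELECT * FROM ods_orders WHERE amount > 0;
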
To move further toward the streaming data warehouse, we have made many enhancements on the batch side.

  • Streaming Warehouse: introduces the Delete and Update APIs, and provides syntax for adding/modifying/dropping columns, primary keys, and watermarks (see the sketch after this list).
  • Batch performance optimization: speculative execution, the adaptive batch scheduler, hybrid shuffle mode, and a new join-reorder algorithm.
  • Submission tools: the SQL Client supports gateway mode and supports managing Flink jobs through SQL statements (also sketched below).

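A hedged sketch of the new syntax and job-management statements, assuming the 1.17 grammar and hypothetical table and job identifiers:

    -- Schema evolution: add/modify/drop columns, primary keys, and watermarks.
    ALTER TABLE user_events ADD (session_id STRING);
    ALTER TABLE user_events MODIFY (session_id STRING NOT NULL);
    ALTER TABLE user_events DROP (session_id);
    ALTER TABLE user_events ADD PRIMARY KEY (user_id) NOT ENFORCED;
    ALTER TABLE user_events ADD WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND;

    -- Managing jobs from the SQL Client (the job id below is made up).
    SHOW JOBS;
    STOP JOB '228d70913eab60dda85c5e7f78b5782c' WITH SAVEPOINT;
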
The streaming side is also constantly evolving:

  • Streaming SQL semantics enhancements: fixed plan errors caused by non-deterministic operations, introduced PLAN_ADVICE to provide SQL optimization suggestions and correctness warnings, and improved watermark alignment.
  • Checkpoint improvements: introduced the Generic Incremental Checkpoint, which mainly improves checkpoint speed and stability; at the same time, Unaligned Checkpoint is now production-ready.
  • State backend upgrade: the bundled FRocksDB version has been upgraded, bringing more features and adding support for Apple silicon such as the Mac M1.

3. Flink 1.17 Key Features

We have optimized the end-to-end performance of batch processing, covering the whole pipeline of SQL planning, runtime operators, and scheduling.

  • Speculative execution in the runtime: now supports the Sink operator, and slow-task detection has been improved. Previously only the execution time of tasks was considered; now the amount of data they process is also taken into account.
  • Adaptive batch scheduler: now the default scheduler for batch jobs. It automatically sets the parallelism of each job vertex according to the amount of data it processes, which is more intelligent, and its configuration has been simplified to improve overall usability.
  • Hybrid shuffle: a new shuffle mode that combines the advantages of blocking and pipelined shuffle. In version 1.17 it works with the adaptive batch scheduler and speculative execution, and supports reusing intermediate data to improve performance. Its stability in large-scale production environments has also been further improved. (A configuration sketch follows this list.)
  • Optimizations at the SQL level: the planner introduces a dynamic-programming join-reorder algorithm. The plan tree produced by the previous join-reorder algorithm was essentially a left-deep tree, often leaving only two paths to process concurrently, whereas the dynamic-programming algorithm produces a more balanced tree with more parallelism. At the operator level, dynamic local hash aggregation is implemented through codegen: during a count aggregation, for example, local aggregation can be skipped where the data is sparse, improving performance. Some virtual function calls in operators are also eliminated, further improving performance.

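A hedged configuration sketch for switching these batch features on in the SQL Client; the key names are taken from the 1.17 configuration documentation, so verify them against your version:

    -- Hybrid shuffle for all exchanges (full spilling variant).
    SET 'execution.batch-shuffle-mode' = 'ALL_EXCHANGES_HYBRID_FULL';

    -- Speculative execution for batch jobs.
    SET 'execution.batch.speculative.enabled' = 'true';

    -- Join reorder in the planner; the dynamic-programming (bushy) reorder
    -- kicks in below the configured input-count threshold.
    SET 'table.optimizer.join-reorder-enabled' = 'true';
    SET 'table.optimizer.bushy-join-reorder-threshold' = '12';
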
[Figure: TPC-DS benchmark results, Flink 1.16 vs. Flink 1.17]
After the optimizations at all of the above layers, Flink 1.17's overall TPC-DS performance improved by 26% compared to Flink 1.16: Flink 1.16 took close to 7,000 seconds, while 1.17 dropped to 5,000+ seconds. As can be seen from the figure above, some queries improved dramatically; for example, Q58 dropped from 150+ seconds to tens of seconds.
In addition, we have made many improvements to Checkpoint and State.

For example, the speed of the Generic Incremental Checkpoint (GIC) has improved greatly. With GIC enabled, checkpoint performance for WordCount and Window jobs improved by 4.23x and 38.39x respectively, the WordCount job's checkpoint completion time dropped by nearly 90%, and checkpoint time fell from 130 s to 1.58 s.
For streaming jobs, enabling the generic incremental checkpoint brings a qualitative improvement in both speed and stability.
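
A hedged sketch of enabling GIC (the changelog state backend) together with unaligned checkpoints from the SQL Client; the DFS path below is a placeholder:

    -- Generic Incremental Checkpoint: continuously persist state changes to a changelog.
    SET 'state.backend.changelog.enabled' = 'true';
    SET 'state.backend.changelog.storage' = 'filesystem';
    SET 'dstl.dfs.base-path' = 'hdfs:///flink/changelog';

    -- Unaligned checkpoints, production-ready as of 1.17.
    SET 'execution.checkpointing.unaligned' = 'true';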

[Figure: checkpoint duration over time, with and without GIC]
We have also improved GIC's stability. As shown in the figure above, the red line represents checkpoint durations with the generic incremental checkpoint enabled: they are shorter and show fewer spikes, indicating that the stability of the WordCount and Window jobs improved significantly. Without the generic incremental checkpoint, a Window job's checkpoint could take as long as 400 s and was extremely unstable.
[Figure: EXPLAIN PLAN_ADVICE output with a correctness warning]
After a user writes a SQL query, the query may contain dual-stream joins, aggregations, and dimension-table lookups. How, then, can the user judge whether the query has problems?

For this purpose, we provide the PLAN_ADVICE feature: the EXPLAIN statement now supports a PLAN_ADVICE option. For example, you can run an EXPLAIN on a query before executing it to get suggestions.
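
A minimal sketch of the FLIP-280 syntax, using a hypothetical query; appending PLAN_ADVICE to EXPLAIN prints the available advice and warnings alongside the plan:

    EXPLAIN PLAN_ADVICE
    SELECT user_id, COUNT(*) AS cnt
    FROM clicks
    GROUP BY user_id;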

As shown in the figure above, the warning indicates that current_timestamp is a non-deterministic function and the data in the source table is a changelog stream. Because the primary keys of the source table and the result table are inconsistent, a SinkUpsertMaterializer operator is generated to materialize the input in state and emit correct results to the sink; however, SinkUpsertMaterializer requires that its input contain no non-deterministic updates. With PLAN_ADVICE, users get a corresponding suggestion and can avoid this class of correctness problem. The community also plans to let SinkUpsertMaterializer support an upsertKey mode, so this problem can be solved on the framework side in a later version.

[Figure: PLAN_ADVICE output recommending two-phase aggregation]
Besides plan-correctness advice, PLAN_ADVICE also provides SQL optimization advice. As shown in the figure above, PLAN_ADVICE recommends enabling local/global two-phase aggregation to improve SQL performance.
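
A hedged sketch of acting on that advice in the SQL Client; two-phase aggregation requires mini-batch execution to be enabled first:

    -- Mini-batch is a prerequisite for two-phase aggregation.
    SET 'table.exec.mini-batch.enabled' = 'true';
    SET 'table.exec.mini-batch.allow-latency' = '2s';
    SET 'table.exec.mini-batch.size' = '5000';

    -- Force local/global two-phase aggregation.
    SET 'table.optimizer.agg-phase-strategy' = 'TWO_PHASE';
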
For job monitoring, Flink 1.17 refines the flame graph down to the task level, which provides more help for online job tuning and problem diagnosis. For example, you can view the detailed time breakdown of each task thread.

4. Summary

In general, the work of Flink 1.17 mainly includes the following five aspects:

  • To move further toward the Streaming Warehouse, the related Streaming Warehouse APIs have been proposed one after another.
  • Batch performance was optimized and its stability improved.
  • The semantics of Streaming SQL were enhanced and refined.
  • Checkpoint speed and stability were further improved.
  • The SQL Client and SQL Gateway tools were further extended.

Work on Flink 1.18 has started: the feature freeze is expected on July 11, and the release is expected at the end of September. Users can follow the progress of specific features and FLIPs in the community.

The key work of Flink 1.18 will be carried out in the following four directions:

  • Completing the Streaming Warehouse APIs.
  • Batch performance optimization and ecosystem expansion.
  • Semantics and usability improvements for Streaming SQL.
  • Evolving the checkpoint architecture toward storage-compute separation.

5. Q&A

Q: Does Flink CDC support Delta Lake?
A: Flink CDC is mainly a source, while Delta Lake would be written to through a sink. The data captured by CDC can be written to any downstream system Flink supports, so Delta Lake is possible as well.

Q: The new version optimizes batch processing performance; what improvements does this bring in application scenarios?
A: These are general-purpose performance optimizations, which improve the performance and stability of batch jobs across the board.

Q: What impact does the hybrid shuffle mode improvement have on production environments?
A: It gives users more flexible choices that can be matched to their own business scenarios and available machine resources.

Q: Is there a way to build a real-time wide table, for example a join of ten streams with no time window?
A: You can do a multi-stream join and configure different state TTLs according to the update policy of each stream.

Q: Does Flink support Elasticsearch and ClickHouse?
A: Both of these data sources are supported.

Q: When a fact table left-joins multiple dimension tables, is there any effective optimization to reduce state size and latency?
A: SQL optimizations such as pushing filters forward, and setting operator-level state TTLs (supported in 1.18).

Q: In which scenarios is codegen used for optimization?
A: Some SQL operators, UDF calls, and SQL expressions all use codegen.

Q: Can Flink resources be scaled dynamically? For example, using more resources during peak hours and automatically returning resources to YARN during off-peak hours.
A: You can look into the Flink Kubernetes operator.
