Apache Flink 1.17 Release Announcement

Author: Xu Bangjiang (Xue Jin) @ Alibaba Cloud

The Apache Flink PMC (Project Management Committee) is pleased to announce the release of Apache Flink 1.17.0. Apache Flink is a leading stream processing standard, and its concept of unified stream and batch data processing has been recognized by more and more companies. Thanks to our great community and contributors, Apache Flink continues to grow rapidly and remains one of the most active projects in the Apache community. Flink 1.17 had 172 contributors enthusiastically participating, completing 7 FLIPs and more than 600 issues and bringing many exciting new features and improvements to the community.

Towards Streaming Warehouse

In order to achieve more efficient processing in the field of streaming data warehouses, Flink 1.17 has made substantial improvements to the performance and semantics of batch and stream processing. These enhancements represent a major step toward the goal of creating a more efficient and streamlined data warehouse capable of processing large volumes of data in real time.

For batch processing, this release includes several new features and improvements described below.

  • Streaming Warehouse API: FLIP-282 introduces new Delete and Update APIs in Flink SQL, which work in batch mode. On this basis, external storage systems such as Flink Table Store can implement row-level deletion and update through these new APIs. In addition, the ALTER TABLE syntax has been enhanced to support ADD/MODIFY/DROP of columns, primary keys, and watermarks, making it easier for users to maintain metadata.
  • Batch performance optimization: In Flink 1.17, the execution of batch jobs has been significantly improved in terms of performance, stability, and usability. Performance-wise, strategy and operator optimizations such as the new join-reorder algorithm, adaptive local hash aggregation, Hive aggregation function improvements, and hybrid shuffle mode optimizations bring a 26% TPC-DS performance improvement. Stability-wise, speculative execution now supports all operators, and the adaptive batch scheduler handles data skew scenarios better. Usability-wise, the tuning effort required for batch jobs has been greatly reduced: the adaptive batch scheduler is enabled by default, hybrid shuffle mode is now compatible with speculative execution and the adaptive batch scheduler, and the required configuration has been simplified.
  • SQL Client/Gateway: Apache Flink 1.17 supports a gateway mode for the SQL Client, allowing users to submit SQL to a remote SQL Gateway. Users can also use SQL statements in the SQL Client to manage jobs, including querying job information and stopping running jobs. With this, the SQL Client/Gateway has evolved into a job management and submission tool.

For stream processing, Flink 1.17 delivers the following features and improvements:

  • Streaming SQL semantics enhancement: Non-deterministic operations may lead to incorrect results or exceptions, which is a very challenging topic in Streaming SQL. Flink 1.17 fixes incorrect optimization plans and functions, and introduces the experimental feature PLAN_ADVICE, which warns SQL users about potential correctness risks and provides SQL optimization suggestions.
  • Checkpoint improvements: Generic Incremental Checkpoint (GIC) improves the speed and stability of checkpointing, and the stability of Unaligned Checkpoint (UC) under job backpressure has been raised to a production-ready level in Flink 1.17. In addition, this version introduces a new REST API that allows users to manually trigger checkpoints with a custom checkpoint type.
  • Watermark alignment enhancement: Efficient watermark processing directly affects the execution efficiency of event-time jobs. In Flink 1.17, FLIP-217 improves watermark alignment by aligning data emission across the splits inside a source operator. This makes watermark progress across sources more coordinated, reducing excessive data caching in downstream operators and enhancing the overall efficiency of streaming job execution.
  • StateBackend upgrade: This release upgrades FRocksDB to 6.20.3-ververica-2.0, bringing many improvements to RocksDBStateBackend, such as sharing memory between slots and supporting Apple Silicon chipsets such as the M1. Flink 1.17 also provides parameters to expand the scope of memory shared between the slots of a TaskManager, improving memory efficiency when slot memory usage is uneven.

Batch Processing

As a unified stream and batch computing engine, Apache Flink continues to lead in the field of stream processing. To further strengthen its batch processing capabilities, community contributors have put great effort into batch performance optimization and ecosystem improvements in Flink 1.17. This makes it easier for users to build a streaming warehouse on Flink.

Speculative Execution

In this release, speculative execution supports sink operators. In previous versions, speculative execution was not applied to sinks in order to avoid instability or incorrect results. Flink 1.17 enriches the context information available to sinks, so that both new-style Sinks and OutputFormat sinks can obtain the attempt number of the current execution instance. With this attempt number, a sink can isolate the data produced by different execution instances of the same subtask, even when those instances run concurrently. The FinalizeOnMaster interface has also been improved so that OutputFormat sinks can learn which attempts produced data successfully and commit the resulting data correctly. Once the developer of a sink has ensured that it can safely support multiple concurrent execution attempts, the sink can implement the marker interface SupportsConcurrentExecutionAttempts to allow speculative execution, as sketched below. Several built-in sinks already support speculative execution, including DiscardingSink, PrintSinkFunction, PrintSink, FileSink, FileSystemOutputFormat, and HiveTableSink.
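As an illustration, here is a minimal sketch of a sink opting in to speculative execution, assuming the 1.17 Sink V2 interfaces (the sink itself and its isolation strategy are hypothetical, and the exact context method names are worth verifying against the 1.17 API):

import java.io.IOException;
import org.apache.flink.api.common.SupportsConcurrentExecutionAttempts;
import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.api.connector.sink2.SinkWriter;

// Opts in to speculative execution via the marker interface
// SupportsConcurrentExecutionAttempts; a real sink must make sure that
// concurrently running attempts cannot corrupt each other's output.
public class AttemptAwarePrintSink implements Sink<String>, SupportsConcurrentExecutionAttempts {

    @Override
    public SinkWriter<String> createWriter(InitContext context) throws IOException {
        // The attempt number distinguishes concurrent execution instances
        // of the same subtask, so their output can be isolated.
        final int attempt = context.getAttemptNumber();
        return new SinkWriter<String>() {
            @Override
            public void write(String element, Context ctx) {
                System.out.println("attempt-" + attempt + ": " + element);
            }

            @Override
            public void flush(boolean endOfInput) {
                // nothing is buffered in this sketch
            }

            @Override
            public void close() {
            }
        };
    }
}

In practice, each attempt would typically write to a separate location, and only the output of the attempt that finishes successfully would be committed.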

Additionally, the detection of slow tasks that are speculatively executed has been improved. Previously, only the execution time of tasks was considered when deciding which tasks were slow. The slow task detector now also takes into account the task's input data volume. Tasks that take longer to execute may not necessarily be considered slow tasks if they consume more data. This improvement helps to eliminate the negative impact of data skew on slow task detection.

Adaptive Batch Scheduler

With this release, the adaptive batch scheduler becomes the default scheduler for batch jobs. The scheduler automatically sets an appropriate parallelism for each job vertex based on the amount of data the vertex processes. It is also the only scheduler that supports speculative execution.

The configuration of the adaptive batch scheduler has also been simplified for ease of use. Users no longer need to explicitly set the global default parallelism to -1 to enable automatic parallelism derivation. Instead, if a global default parallelism is set, it is used as an upper bound when deriving parallelism. Some configuration options have also been renamed to be easier to understand.
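As a sketch, assuming the renamed 1.17 configuration keys (worth double-checking against the documentation), a batch job can rely on derived parallelism and only set bounds:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AdaptiveBatchJob {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Upper bound for the automatically derived parallelism (example value).
        conf.setString("execution.batch.adaptive.auto-parallelism.max-parallelism", "128");
        // Target amount of data per task used when deriving parallelism (example value).
        conf.setString("execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task", "16m");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... define the batch topology here, then call env.execute()
    }
}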

Additionally, the capabilities of the adaptive batch scheduler have been enhanced. Now it can distribute data more evenly to downstream tasks based on fine-grained data distribution information. The automatically derived parallelism is now also no longer limited to powers of 2.

Hybrid Shuffle Mode

In this release, the Hybrid Shuffle mode brings several important improvements:

  • Hybrid Shuffle mode now supports Adaptive Batch Scheduler and Speculative Execution.
  • Hybrid Shuffle mode now supports reuse of intermediate data, which brings significant performance improvements.
  • Improved stability, avoiding issues previously observed in large-scale production environments.

More details can be found in the hybrid shuffle section of the Flink documentation.
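A minimal sketch of opting a batch job into hybrid shuffle, assuming the documented execution.batch-shuffle-mode option (values as named in the Flink docs):

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridShuffleJob {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Switch all exchanges from the default blocking shuffle to hybrid
        // shuffle; ALL_EXCHANGES_HYBRID_SELECTIVE spills less data to disk.
        conf.setString("execution.batch-shuffle-mode", "ALL_EXCHANGES_HYBRID_FULL");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... build and execute the batch job as usual
    }
}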

TPC-DS

Since Flink 1.16, the community has continued to optimize the performance of the batch engine. Flink 1.16 introduced dynamic partition pruning, but not all TPC-DS queries could benefit from it; Flink 1.17 improves the optimization algorithm so that most TPC-DS queries are now covered. In addition, Flink 1.17 introduces a dynamic-programming join-reorder algorithm, which produces better plans than the previous algorithm at the cost of a larger search space. The optimizer automatically selects the appropriate join-reorder algorithm based on the number of joins in a query, so users do not need to care about its details (note that join reorder is not enabled by default and needs to be enabled explicitly when running TPC-DS, as sketched below). At the operator level, Flink 1.17 introduces a dynamic local hash aggregation strategy, which decides at runtime, based on the data distribution, whether local aggregation is worthwhile. At the runtime level, this release removes some unnecessary virtual function calls to speed up execution. Overall, on the 10 TB data set with partitioned tables, Flink 1.17 shows a 26% performance improvement over Flink 1.16.
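For reference, a sketch of enabling join reorder for such benchmark runs, assuming the documented table.optimizer.join-reorder-enabled option:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JoinReorderSetup {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Join reorder is off by default and must be enabled explicitly,
        // e.g. when running TPC-DS.
        tEnv.getConfig().set("table.optimizer.join-reorder-enabled", "true");
        // The optimizer then chooses between the greedy and the new
        // dynamic-programming reorder algorithm based on the join count.
    }
}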

SQL Client/Gateway

Apache Flink 1.17 introduces a new feature called "gateway mode", which allows users to submit SQL queries from the SQL Client to a remote SQL Gateway and use the Gateway's features just as in embedded mode. This new mode makes the SQL Gateway much more convenient to use.

Additionally, the SQL Client/SQL Gateway now supports managing job lifecycles through SQL statements. Users can list the job information stored in the JobManager with SHOW JOBS and stop a running job by its unique job ID with STOP JOB. With this new feature, the SQL Client/SQL Gateway offers almost the same functionality as the Flink CLI and becomes another powerful tool for managing Flink jobs.

SQL API

In modern big data workflows, the row-level delete and update capabilities of SQL engines are becoming increasingly important. Use cases include deleting a specific set of rows to comply with regulatory requirements or updating rows for data correction. Many popular engines such as Trino and Hive already provide such support. Flink 1.17 introduces new Delete and Update APIs for batch mode and exposes them to connectors, so that external storage systems can implement row-level updates and deletes on top of them. Additionally, this release extends the ALTER TABLE syntax with the ability to ADD/MODIFY/DROP columns, primary keys, and watermarks. These enhancements give users the flexibility to maintain metadata on demand.
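A short sketch of these capabilities from the Table API, assuming a connector that implements the new row-level modification interfaces (the orders table and its columns are hypothetical):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RowLevelModifications {
    public static void main(String[] args) {
        // DELETE and UPDATE only work in batch mode and require a connector
        // that supports the new row-level modification APIs.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        tEnv.executeSql("DELETE FROM orders WHERE order_status = 'CANCELLED'");
        tEnv.executeSql("UPDATE orders SET price = price * 0.9 WHERE order_id = 42");
        // Extended ALTER TABLE syntax: add, modify, and drop columns.
        tEnv.executeSql("ALTER TABLE orders ADD discount DOUBLE");
        tEnv.executeSql("ALTER TABLE orders MODIFY discount DECIMAL(10, 2)");
        tEnv.executeSql("ALTER TABLE orders DROP discount");
    }
}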


Hive Compatibility

Apache Flink 1.17 brings a series of improvements to the Hive connector to make it more production-ready. In previous versions, automatic file merging for Hive writes was only supported in streaming mode, not in batch mode. Starting from Flink 1.17, files can also be merged automatically in batch mode, which greatly reduces the number of small files. In addition, for scenarios that use Hive built-in functions by loading the HiveModule, this release adds native implementations of Hive aggregation functions such as SUM/COUNT/AVG/MIN/MAX to the HiveModule. These functions can be executed with the hash-based aggregation operator, resulting in significant performance improvements.
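A sketch of this setup, assuming the HiveModule constructor and the native-aggregation switch as we understand them from the 1.17 docs (the Hive version, table, and query are examples):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.module.hive.HiveModule;

public class HiveNativeAggregates {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Make Hive built-in functions available (version string is an example).
        tEnv.loadModule("hive", new HiveModule("3.1.3"));
        // Assumed switch for running SUM/COUNT/AVG/MIN/MAX as native
        // functions in the hash-based aggregation operator (off by default).
        tEnv.getConfig().set("table.exec.hive.native-agg-function.enabled", "true");
        // hive_orders is a hypothetical Hive table.
        tEnv.executeSql("SELECT user_id, SUM(amount) FROM hive_orders GROUP BY user_id");
    }
}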

Stream Processing

Flink 1.17 resolves several difficult Streaming SQL semantics and correctness issues, optimizes checkpoint performance, improves the watermark alignment mechanism, extends the Streaming FileSink, and upgrades Calcite and FRocksDB to newer versions. These improvements further consolidate Flink's leading position in stream processing.

Streaming SQL Semantic Improvement

To address correctness issues and improve Streaming SQL semantics, Flink 1.17 introduces an experimental feature called PLAN_ADVICE, which detects potential correctness risks in user SQL and provides optimization suggestions. For example, if the EXPLAIN PLAN_ADVICE command discovers an NDU (non-deterministic update) problem in a query, the optimizer appends an advice item at the end of the physical plan output, tags the advice on the relevant plan node, and prompts the user to update the query or configuration. By providing these specific suggestions, the optimizer helps users improve the correctness of their query results.
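For example, a query of roughly the following shape (the clicks table is a stand-in defined on the datagen connector) may surface an NDU warning like the one shown below:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PlanAdviceExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        tEnv.executeSql(
                "CREATE TABLE clicks (id INT, cnt BIGINT) WITH ('connector' = 'datagen')");
        // Deriving a grouping column from CURRENT_TIMESTAMP is
        // non-deterministic; PLAN_ADVICE can flag this in the plan output.
        tEnv.executeSql(
                "EXPLAIN PLAN_ADVICE "
                        + "SELECT id, DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd') AS `day`, SUM(cnt) "
                        + "FROM clicks GROUP BY id, DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd')")
                .print();
    }
}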

== Optimized Physical Plan With Advice ==
...
advice[1]: [WARNING] The column(s): day(generated by non-deterministic function: CURRENT_TIMESTAMP ) can not satisfy the determinism requirement for correctly processing update message('UB'/'UA'/'D' in changelogMode, not 'I' only), this usually happens when input node has no upsertKey(upsertKeys=[{}]) or current node outputs non-deterministic update messages. Please consider removing these non-deterministic columns or making them deterministic by using deterministic functions.

The PLAN_ADVICE feature can also help users improve query performance and efficiency. For example, if the optimizer detects that an aggregation can be optimized into a more efficient local-global aggregation, it provides the corresponding suggestion:

== Optimized Physical Plan With Advice ==
...
advice[1]: [ADVICE] You might want to enable local-global two-phase optimization by configuring ('table.optimizer.agg-phase-strategy' to 'AUTO').

In addition, Flink 1.17 fixes several plan optimization issues that could affect data correctness, such as FLINK-29849, FLINK-30006, and FLINK-30841.

Watermark alignment enhancements

In an earlier version, FLIP-182 proposed a solution called watermark alignment to address source data skew in event-time jobs. That scheme had a limitation, however: the source parallelism had to match the number of partitions, because when one source subtask reads multiple partitions and one partition emits data faster than the others, a large amount of data has to be buffered. To remove this limitation, Flink 1.17 introduces FLIP-217, which enhances watermark alignment to align data emission across the partitions within a single source operator, taking watermark boundaries into account. This ensures that watermark progress in sources is more coordinated, avoids excessive buffering in downstream operators, and improves the execution efficiency of streaming jobs.
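A minimal sketch of enabling alignment on a source, assuming the FLIP-182 WatermarkStrategy API (the Event type is hypothetical); in 1.17 the same setting also aligns the individual splits read by a single subtask:

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class AlignedWatermarks {
    // Hypothetical event type; alignment operates on record timestamps.
    public static class Event {}

    public static WatermarkStrategy<Event> alignedStrategy() {
        return WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // alignment group, maximum allowed watermark drift, update interval
                .withWatermarkAlignment("aligned-sources", Duration.ofSeconds(20), Duration.ofSeconds(1));
    }
}

All sources in the same alignment group pause emitting from partitions or splits whose watermark runs too far ahead of the group's minimum.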

Streaming FileSink extension

After adding ABFS support, the Streaming FileSink now supports five different file systems: HDFS, S3, OSS, ABFS, and local. This covers the mainstream file systems and provides users with more choices and greater flexibility.

Checkpoint improvements

Generic Incremental Checkpoint (GIC) is designed to improve the speed and stability of the checkpointing process. Some experimental results from a WordCount case are shown below. Please refer to the benchmark article for more details; it combines theoretical analysis and practical results to show the benefits and costs of GIC.

Table-1: Benefits after enabling GIC in WordCount

Table-2: Overhead after enabling GIC in WordCount

Unaligned Checkpoint (UC) greatly improves the completion rate of checkpoints under backpressure. Previous UC versions wrote too many small files, which could overload the HDFS NameNode. The community fixed this issue in 1.17, making UC more usable in production.
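A sketch of enabling both mechanisms discussed above, assuming the changelog state backend and unaligned checkpoint keys as documented for 1.17 (the path and values are examples):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Generic Incremental Checkpoint: record state changes in a changelog
        // that is persisted continuously alongside the state backend.
        conf.setString("state.backend.changelog.enabled", "true");
        conf.setString("state.backend.changelog.storage", "filesystem");
        conf.setString("dstl.dfs.base-path", "hdfs:///flink/changelog");
        // Unaligned checkpoints, whose production readiness 1.17 improves.
        conf.setString("execution.checkpointing.unaligned", "true");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... job definition
    }
}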

Flink 1.17 provides a REST API through which users can manually trigger a checkpoint with a custom checkpoint type while the job is running. For example, for a job using incremental checkpoints, users can periodically or manually trigger a full checkpoint to break the chain across incremental checkpoints, thereby avoiding references to files created long ago.
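A sketch of calling this endpoint with plain Java, assuming the payload shape described for the 1.17 REST API (host, port, and job ID are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerFullCheckpoint {
    public static void main(String[] args) throws Exception {
        String jobId = "00000000000000000000000000000000"; // placeholder job ID
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs/" + jobId + "/checkpoints"))
                .header("Content-Type", "application/json")
                // FULL breaks the incremental chain; the payload shape should
                // be verified against the 1.17 REST API documentation.
                .POST(HttpRequest.BodyPublishers.ofString("{\"checkpointType\": \"FULL\"}"))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}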

RocksDBStateBackend upgrade

Flink 1.17 upgrades FRocksDB to version 6.20.3-ververica-2.0, which brings some improvements to RocksDBStateBackend:

  1. Support for building FRocksDB Java on Apple silicon
  2. Improve Compaction Filter performance by avoiding expensive ToString() operations
  3. Upgrade FRocksDB's ZLIB version to avoid memory corruption
  4. Add periodic_compaction_seconds option for RocksJava

You can refer to FLINK-30836 for more details.

Flink 1.17 also provides parameters to expand the scope of memory shared between the slots of a TaskManager, which improves memory efficiency when slot memory usage is uneven. With the parameters adjusted accordingly, overall memory consumption can be reduced, at the cost of weaker resource isolation between slots. Please refer to state.backend.rocksdb.memory.fixed-per-tm for more information.
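A minimal sketch of that trade-off, assuming the option named above (the value is an example):

import org.apache.flink.configuration.Configuration;

public class SharedRocksDbMemory {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Cap RocksDB memory per TaskManager instead of per slot; all slots
        // share this budget, trading isolation for a lower total footprint.
        conf.setString("state.backend.rocksdb.memory.fixed-per-tm", "1gb");
    }
}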

Calcite Upgrade

Flink 1.17 upgrades Calcite to version 1.29.0 to improve the performance and efficiency of the Flink SQL system. Flink 1.16 used Calcite 1.26.0, which has serious issues such as incorrect RexNode simplification caused by the SEARCH operator. These issues can produce wrong data after query optimization, as reported in CALCITE-4325 and CALCITE-4352. By upgrading Calcite, Flink can take advantage of its fixes and new features in Flink SQL. This not only resolves multiple bugs but also speeds up query processing.

Others

PyFlink

PyFlink, the Python language interface of Apache Flink, has also seen several improvements in Flink 1.17. Important ones include support for Python 3.10 and support for running PyFlink on Apple Silicon machines such as the M1 and M2. This version also includes smaller optimizations, such as more stable cross-process communication between Java and Python processes, the ability to declare a Python UDF's result type as a string, and access to job parameters from Python UDFs. Overall, this release focuses on improving PyFlink's usability rather than introducing new features, and these improvements should give users a better experience and let them process data more efficiently.

Performance Monitoring Benchmark

During this release cycle, we also added daily performance monitoring reports to the Slack channel (#flink-dev-benchmarks) to help developers quickly discover performance regressions, which is very valuable for code quality assurance. After discovering a performance regression through the Slack channel or the Speed Center, developers can follow the Benchmark wiki to deal with it.

Task-Level Flame Graphs

Starting from Flink 1.17, the Flame Graph feature provides task-level visualization, allowing users to understand the performance of individual tasks in more detail. This is a significant improvement over previous versions of the feature, as users can now select a subtask of interest and view the corresponding flame graph. In this way, users can pinpoint where a task may have performance issues and then take steps to resolve them, significantly improving the overall efficiency of their data processing pipelines.
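Note that flame graphs are disabled by default; a sketch of switching them on in the cluster configuration, assuming the rest.flamegraph.enabled option:

import org.apache.flink.configuration.Configuration;

public class EnableFlameGraphs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Off by default because sampling adds overhead; enabling it makes
        // the task-level flame graph view available in the web UI.
        conf.setString("rest.flamegraph.enabled", "true");
    }
}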

Universal Token Mechanism

Prior to Flink 1.17, Flink only supported Kerberos authentication and Hadoop-based tokens. With the implementation of FLIP-272, Flink's delegation token framework becomes more general, so its authentication protocol is no longer limited to Hadoop. This will allow contributors to add support for non-Hadoop frameworks whose authentication protocols are not based on Kerberos in the future. Additionally, FLIP-211 improves Flink's interaction with Kerberos, reducing the number of requests required to exchange delegation tokens.

Upgrade Instructions

The Apache Flink community has worked hard to ensure that the upgrade process is as smooth as possible, but upgrading to version 1.17 may require users to make some adjustments to existing applications. Please refer to the Release Notes for a more detailed list of required changes and possible issues when upgrading.

List of Contributors

The Apache Flink community would like to thank each of the contributors who contributed to this release:

Ahmed Hamdy, Aitozi, Alexander Pilipenko, Alexander Fedulov, Alexander Preuß, Anton Kalashnikov, Arvid Heise, Bo Cui, Brayno, Carlos Castro, ChangZhuo Chen, Chen Qin, Chesnay Schepler, Clemens, ConradJam, Danny Cranmer, Dawid Wysakowicz, Dian Fu, Dong Lin, Dongjoon Hyun, Elphas Toringepi, Eric Xiao, Fabian Paul, Ferenc Csaky, Gabor Somogyi, Gen Luo, Gunnar Morling, Gyula Fora, Hangxiang Yu, Hong Liang Teoh, HuangXingBo, Jacky Lau, Jane Chan, Jark Wu, Jiale, Jin, Jing Ge, Jinzhong Li, Joao Boto, John Roesler, Jun He, JunRuiLee, Junrui Lee, Juntao Hu, Krzysztof Chmielewski, Leonard Xu, Licho, Lijie Wang, Mark Canlas, Martijn Visser, MartijnVisser, Martin Liu, Marton Balassi, Mason Chen, Matt, Matthias Pohl, Maximilian Michels, Mingliang Liu, Mulavar, Nico Kruber, Noah, Paul Lin, Peter Huang, Piotr Nowojski, Qing Lim, QingWei, Qingsheng Ren, Rakesh, Ran Tao, Robert Metzger, Roc Marshal, Roman Khachatryan, Ron, Rui Fan, Ryan Skraba, Salva Alcántara, Samrat, Samrat Deb, Samrat002, Sebastian Mattheis, Sergey Nuyanzin, Seth Saperstein, Shengkai, Shuiqiang Chen, Smirnov Alexander Ganesh, Steven van Rossum, Tartarus0zm, Timo Walther, Venkata krishnan Sowrirajan, Wei Zhong, Weihua Hu, Weijie Guo, Xianxun Ye, Xintong Song, Yash Mayya, YasuoStudyJava, Yu Chen, Yubin Li, Yufan Sheng, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhenqiu Huang, Zhu Zhu, ZmmBigdata, bzhaoopenstack, chengshuo.cs, chenxujun, chenyuzhi, chenyuzhi459, chenzihao, dependabot[bot], fanrui, fengli, frankeshi, fredia, godfreyhe, gongzhonghe201ang, hiscat, huangxingbo, hunter-cloud09, ifndef-SleePy, jeremyber-aws, jiangjiguang, jingge, kevin.cyj, kristoffSC, kurt, laughingman7743, libowen, lincoln lee, lincoln.lil, liujiangang, liujingmao, liuyongvs, liuzhuang2017, luoyuxia, mas-chen, moqimoqidea, muggleChen, noelo, ouyangwulin, ramkrish86, saikikun, sammieliu, shihong90, shuiqiangchen, snuyanzin, sunxia, sxnan, tison, todd5167, tonyzhu918, wangfeifan, wenbingshen, xuyang, yiksanchan, yunfengzhou-hub, yunhong, yuxia Luo, yuzelin, zhangjingcun, zhangmang, zhengyunhong.zyh, zhouli, zoucao, Shen Jiaqi
