Flink in Practice (11): Exactly-Once Semantics and Two-Phase Commit

0 Outline

Apache Flink 1.4.0, released in December 2017, introduced a milestone feature for stream processing: TwoPhaseCommitSinkFunction. It extracts the common logic of the two-phase commit protocol, making it possible to build end-to-end Exactly-Once programs with Flink together with:

  • data sources
  • data sinks (output terminals)

including Apache Kafka 0.11 and higher. It provides an abstraction layer, so users only need to implement a few methods to achieve end-to-end Exactly-Once semantics.

This article covers the new feature and Flink's implementation logic:

  • Describes how Flink's checkpoint mechanism ensures that Flink program results are exactly once
  • Shows how Flink interacts with data sources and data sinks via a two-phase commit protocol to provide end-to-end Exactly-Once guarantees
  • Walks through a simple example of using TwoPhaseCommitSinkFunction to implement an Exactly-Once file sink

1 Exactly-Once semantics in Flink applications

Exactly-Once means that each input event affects the final result exactly once. Even if a machine or software fails, there is neither data duplication nor data loss.

Flink has provided Exactly-Once semantics for a long time, and the checkpoint mechanism is the core of Flink's ability to provide them.

A checkpoint is a consistent snapshot of:

  • The current state of the application
  • The position of the input stream

Flink generates checkpoints at a fixed, configurable interval and writes the checkpoint data to a persistent storage system such as S3 or HDFS. Writing the checkpoint data to persistent storage happens asynchronously, meaning the Flink application can continue to process data during the checkpoint process.
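A minimal sketch of this configuration, assuming a 60-second interval and an HDFS checkpoint path (both values are illustrative):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // take a checkpoint every 60 seconds
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // Write checkpoint data to durable storage such as HDFS or S3.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        // ... define sources, transformations, sinks, then env.execute(...) ...
    }
}
```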

If a machine or software failure occurs, after restarting, the Flink application will resume processing from the latest checkpoint; Flink will restore the application state, roll back the input stream to the location where the last checkpoint was saved, and then restart running. This means that Flink can compute results as if the failure never occurred.

Before Flink 1.4.0, Exactly-Once semantics were limited to the inside of a Flink application and did not extend to the external systems that data was sent to after processing. Flink applications interact with a variety of data sinks, and developers had to maintain the context of those components themselves to guarantee Exactly-Once semantics.

To provide end-to-end Exactly-Once semantics (that is, semantics that also cover the external systems Flink writes to, not just the inside of the Flink application), these external systems must provide a means to commit or roll back writes, which is then coordinated with Flink's checkpoint mechanism.

In distributed systems, a common approach to coordinating commits and rollbacks is the two-phase commit (2PC) protocol. The following sections discuss how Flink's TwoPhaseCommitSinkFunction leverages 2PC to provide end-to-end Exactly-Once semantics.

2 End-to-end Exactly-Once semantics for Flink applications

Kafka is often used with Flink. Kafka 0.11 added transaction support, which means there is now the necessary support for reading from and writing to Kafka with Flink while providing end-to-end Exactly-Once semantics.

Flink's support for end-to-end Exactly-Once semantics is not limited to Kafka; it can be used with any source/sink that provides the necessary coordination mechanism. For example, Pravega, an open-source streaming storage system from Dell/EMC, can also support end-to-end Exactly-Once semantics through Flink's TwoPhaseCommitSinkFunction.


The example program includes (a code sketch follows the list):

  • A data source that reads from Kafka (Flink's built-in Kafka consumer)
  • A window aggregation
  • A data sink that writes data back to Kafka (Flink's built-in Kafka producer)
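A minimal sketch of such a pipeline, using the Flink 1.4-era Kafka 0.11 connector classes (FlinkKafkaConsumer011 / FlinkKafkaProducer011; import paths varied slightly across Flink versions). The topic names, bootstrap address, and the reduce-based window aggregation are illustrative assumptions:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class KafkaExactlyOncePipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoints drive the two-phase commit

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");

        // Source: Flink's Kafka consumer saves its read offsets in checkpoints.
        DataStream<String> input = env.addSource(
                new FlinkKafkaConsumer011<>("input-topic", new SimpleStringSchema(), props));

        // A placeholder window aggregation (internal state managed by Flink).
        DataStream<String> aggregated = input
                .keyBy(value -> value)
                .timeWindow(Time.seconds(10))
                .reduce((a, b) -> a + b);

        // Sink: the Kafka producer pre-commits/commits one transaction per checkpoint.
        aggregated.addSink(new FlinkKafkaProducer011<>(
                "output-topic",
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute("kafka-exactly-once-example");
    }
}
```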

To provide an Exactly-Once guarantee on the data sink side, all data must be written to Kafka through a transaction, which bundles together all the data written between two checkpoints. This ensures that written data can be rolled back in the event of a failure. However, in a distributed system there are usually multiple write tasks running concurrently, and all components must agree on committing or rolling back to ensure a consistent result. Flink uses 2PC, with a pre-commit phase, to solve this problem.

Pre-commit

The start of a checkpoint marks the "pre-commit" phase of 2PC. When a checkpoint starts, Flink's JobManager injects a checkpoint barrier (which separates the records in the data stream into those belonging to the current checkpoint and those belonging to the next) into the data stream.

The barrier is passed from operator to operator. At each operator it triggers a snapshot of that operator's state, which is written to the state backend.

The data source saves the offsets at which it is consuming Kafka, and then passes the checkpoint barrier to the next operator.

This approach works as-is only if the operator has purely "internal" state.

Internal state

Internal state is state that is stored and managed by Flink's state backend, for example the sum computed by the window aggregation in the second operator. When a process has only internal state, nothing more needs to be done in the pre-commit phase beyond writing its state changes to the state backend before the checkpoint.

Flink is responsible for correctly committing these writes if the checkpoint succeeds or aborting them if the checkpoint fails.


3 The Flink application's pre-commit phase

When a process has "external" state, additional handling is required. External state usually comes in the form of writes to an external system such as Kafka. In that case, to provide an Exactly-Once guarantee, the external system must support transactions so that it can be integrated with the two-phase commit protocol.
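What "supporting transactions" means can be illustrated with Kafka 0.11's raw producer API, which is roughly what Flink's Kafka sink drives under the hood. The bootstrap address, transactional id, and topic below are illustrative assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaTransactionDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "demo-txn-id"); // enables transactions
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value"));
            producer.commitTransaction(); // or abortTransaction() to roll back
        }
    }
}
```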

The example needs to write data to Kafka, so the data sink has external state. In the pre-commit phase, the data sink must therefore:

  • write its state to the state backend, as before
  • additionally pre-commit its external transaction

The pre-commit phase ends when the checkpoint barrier has been passed to all operators and the triggered snapshot callbacks have completed successfully. All of the triggered state snapshots are then considered part of this checkpoint. A checkpoint is a snapshot of the entire application state, including pre-committed external state; in the event of a failure, the application can roll back to the point at which the last snapshot completed successfully.

The next step is to notify all operators that the checkpoint has succeeded. This is the commit phase of 2PC: the JobManager issues a checkpoint-completed callback to each operator in the application.

The data source and the window operator have no external state, so these operators do not have to do anything during the commit phase. The data sink, however, does have external state, and it commits its external transaction at this point.

To summarize:

  • Once all operators have completed their pre-commit, a commit is issued.
  • If at least one pre-commit fails, all others are aborted and we roll back to the last successfully completed checkpoint.
  • After a successful pre-commit, the commit must be guaranteed to eventually succeed; both the operators and the external system need to guarantee this. If a commit fails (for example, because of an intermittent network issue), the entire Flink application fails, restarts according to the user's restart strategy, and another commit attempt is made (a restart-strategy sketch follows this list). This process is critical, because if the commit does not eventually succeed, data is lost.
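A sketch of configuring such a restart strategy on the StreamExecutionEnvironment; the three attempts and ten-second delay are illustrative assumptions:

```java
// Requires: org.apache.flink.api.common.restartstrategy.RestartStrategies,
//           org.apache.flink.api.common.time.Time, java.util.concurrent.TimeUnit.
// On failure (including a failed commit), restart up to 3 times, 10 seconds apart;
// after each restart, Flink restores the latest checkpoint and retries pending commits.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
        3, Time.of(10, TimeUnit.SECONDS)));
```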

Therefore, we can be certain that all operators agree on the final outcome of the checkpoint: either they all agree that the data was committed, or the commit was aborted and rolled back.

4 Implementing the two-phase commit operator in Flink

The complete implementation of the two-phase commit protocol can be a bit complex, which is why Flink extracts its common logic into the abstract class TwoPhaseCommitSinkFunction.

Next, we explain how to use TwoPhaseCommitSinkFunction, based on a simple example that writes output to files. To implement Exactly-Once semantics on the data sink side, users only need to implement four methods (a sketch follows the list):

  • beginTransaction – to begin a transaction, we create a temporary file in a temporary directory on the target file system. As we process data, we write it to this file.
  • preCommit – in the pre-commit phase, we flush the file, close it, and never write to it again. We also start a new transaction for any subsequent writes that belong to the next checkpoint.
  • commit – in the commit phase, we atomically move the pre-committed file to the real target directory. Note that this increases the latency with which output data becomes visible.
  • abort – in the abort phase, we delete the temporary file.
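A minimal sketch of such a file sink, assuming a simple FileTransaction holder class of our own; the class and field names are illustrative, and the exact TwoPhaseCommitSinkFunction constructor and method signatures have varied slightly across Flink versions:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

public class TransactionalFileSink
        extends TwoPhaseCommitSinkFunction<String, TransactionalFileSink.FileTransaction, Void> {

    private final String tmpDir;
    private final String targetDir;

    public TransactionalFileSink(String tmpDir, String targetDir) {
        super(new KryoSerializer<>(FileTransaction.class, new ExecutionConfig()),
                VoidSerializer.INSTANCE);
        this.tmpDir = tmpDir;
        this.targetDir = targetDir;
    }

    /** Holds everything needed to commit or abort one transaction after a restart. */
    public static class FileTransaction implements Serializable {
        String tempFilePath;             // checkpointed: enough to commit/abort later
        transient BufferedWriter writer; // not checkpointed: only used while writing
    }

    @Override
    protected FileTransaction beginTransaction() throws Exception {
        // One fresh temporary file per transaction, i.e. per checkpoint interval.
        FileTransaction txn = new FileTransaction();
        txn.tempFilePath = tmpDir + "/" + UUID.randomUUID();
        txn.writer = Files.newBufferedWriter(Paths.get(txn.tempFilePath));
        return txn;
    }

    @Override
    protected void invoke(FileTransaction txn, String value, Context context) throws Exception {
        txn.writer.write(value);
        txn.writer.newLine();
    }

    @Override
    protected void preCommit(FileTransaction txn) throws Exception {
        // Flush and close the file; it is never written to again.
        txn.writer.flush();
        txn.writer.close();
    }

    @Override
    protected void commit(FileTransaction txn) {
        // Idempotent: if a previous attempt already moved the file, do nothing.
        Path src = Paths.get(txn.tempFilePath);
        try {
            if (Files.exists(src)) {
                Files.move(src, Paths.get(targetDir).resolve(src.getFileName()),
                        StandardCopyOption.ATOMIC_MOVE);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    protected void abort(FileTransaction txn) {
        // Roll back by deleting the temporary file.
        try {
            Files.deleteIfExists(Paths.get(txn.tempFilePath));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note how commit and abort rely only on the checkpointed tempFilePath, so they can still be executed after a restart, and how the existence check in commit makes repeated commit attempts idempotent.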

We know that if any failure occurs, Flink will restore the state of the application to the latest checkpoint. An extreme case is that the pre-commit succeeds, but a failure occurs before the commit notification reaches the operator. In this case, Flink will restore the operator's state to a state that has been pre-committed but not yet actually committed.

We need to save enough information to the checkpoint state during the pre-commit phase so that the transaction can be correctly aborted or committed after restarting. In this example, the information is the path to the temporary file and the target directory.

TwoPhaseCommitSinkFunction takes this situation into account and issues a commit first when restoring state from a checkpoint. We need to implement the commit in an idempotent way, which is generally not difficult. In this example, we can recognize the situation where the temporary file is no longer in the temporary directory because it has already been moved to the target directory.

TwoPhaseCommitSinkFunction also takes a number of other edge cases into account; please refer to the Flink documentation for more information.

FAQ

Q: When a checkpoint barrier arrives at a Flink sink, the sink stores its state. Does this happen in parallel with ordinary writes, or serially?

In Flink's checkpoint mechanism, when a checkpoint barrier arrives at the sink, it triggers a snapshot of the sink's state. With asynchronous snapshots, which the common state backends support, the bulk of this snapshot work proceeds in parallel with ordinary write processing.

Specifically:

  • Flink's checkpoint mechanism works by injecting checkpoint barriers into the data stream.

  • When the source emits a checkpoint barrier, the barrier is passed downstream to the transformations and the sink.

  • When the sink receives the barrier, its state snapshot is triggered. A short synchronous part runs on the task's main thread, after which the main thread resumes processing ordinary writes.

  • With asynchronous snapshots, materializing the state to the checkpoint storage is handed off to a background thread.

  • In this way, the state snapshot and normal write processing largely proceed in parallel.

Whether the snapshot is taken asynchronously or synchronously is a property of the state backend rather than of the sink itself. There are two modes:

  1. Asynchronous snapshots: the state is materialized in the background while processing continues (supported by the common state backends)

  2. Synchronous snapshots: the task thread blocks until the snapshot has been written

In summary, with asynchronous snapshots the state snapshot and ordinary write operations execute in parallel; the behavior can be changed through the state backend configuration and balanced against actual needs.
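For example, the filesystem state backend exposes this choice as a constructor flag (the checkpoint path is an illustrative assumption):

```java
// FsStateBackend(checkpointDataUri, asynchronousSnapshots):
//   true  - materialize the snapshot in a background thread, parallel to writes
//   false - block the task thread until the snapshot has been written
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints", true));
```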

Summary

  • Flink's checkpoint mechanism is the foundation for supporting the two-phase commit protocol and providing end-to-end Exactly-Once semantics.
  • An advantage of this approach is that Flink does not transmit data over the network for storage the way some other systems do; there is no need to write every stage of the computation to disk as in most batch programs.
  • Flink's TwoPhaseCommitSinkFunction extracts the common logic of the two-phase commit protocol. Building on it, Flink can be combined with external systems that support transactions, making end-to-end Exactly-Once applications possible.
  • Starting with Flink 1.4.0, both the Pravega and Kafka 0.11 producers provide Exactly-Once semantics; Kafka 0.11 introduced transactions for the first time, which is what makes Exactly-Once semantics possible when using a Kafka producer in Flink programs.
  • Flink's Kafka 0.11 producer is implemented on top of TwoPhaseCommitSinkFunction and adds only very low overhead compared to an at-least-once producer.


Reprinted from: my.oschina.net/u/3494859/blog/10151137