How is Flink's checkpoint implemented?

Analysis & Answer

Checkpoint Introduction

The checkpoint fault-tolerance mechanism is the cornerstone of Flink's reliability. It ensures that when an operator fails for some reason (such as an abnormal exit), the Flink cluster can restore the entire application's flow graph to a consistent state taken before the failure, preserving the consistency of the application's state. Flink's checkpoint mechanism is based on the Chandy-Lamport distributed snapshot algorithm.
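As a minimal sketch of what this looks like from the user's side, checkpointing is enabled on the StreamExecutionEnvironment; the interval, mode, and timeout below are illustrative values, not recommendations:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableCheckpointing {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 10 seconds with exactly-once guarantees.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        // Consider a checkpoint failed if it has not completed within one minute.
        env.getCheckpointConfig().setCheckpointTimeout(60_000);

        // ... define sources, transformations and sinks, then env.execute("checkpointed job");
    }
}
```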

Barriers

The core element of Flink's distributed snapshots is the stream barrier. Barriers are injected into the stream and flow with it as part of the stream. A barrier separates the records of the data flow into those that belong to the current snapshot and those that belong to the next snapshot. Each barrier carries the ID of its snapshot, and the records belonging to that snapshot are always ahead of the barrier. Barriers are very lightweight and do not interrupt the flow of the stream, so multiple checkpoints can be in flight concurrently.
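As a rough mental model, a barrier only needs to carry the ID of its snapshot plus a trigger timestamp. The class below is a hypothetical, simplified illustration, not Flink's internal runtime class (which additionally carries checkpoint options):

```java
// Hypothetical, simplified model of what a checkpoint barrier carries.
public final class Barrier {
    private final long checkpointId; // identifies which snapshot this barrier belongs to
    private final long timestamp;    // when the checkpoint was triggered

    public Barrier(long checkpointId, long timestamp) {
        this.checkpointId = checkpointId;
        this.timestamp = timestamp;
    }

    public long getCheckpointId() { return checkpointId; }
    public long getTimestamp()    { return timestamp; }
}
```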

Barriers are injected into the parallel data streams at the sources. The point at which the barriers for snapshot n are injected (call it Sn) is a position in the source data; for Kafka, it is the offset of the last record consumed in a partition. This position Sn is reported to the checkpoint coordinator in the JobManager, which coordinates the checkpoint. Barriers then flow downstream with the stream; when an intermediate operator has received the checkpoint n barrier from all of its input streams, it emits the barrier to all of its downstream operators. Once the barrier reaches the end of the DAG, each sink reports its state handle for this snapshot to the JobManager's checkpoint coordinator. When every sink has received the checkpoint n barrier from all of its input streams and acknowledged it, the JobManager produces the completed checkpoint metadata and marks checkpoint n as complete. The state itself is stored in the configured state backend.
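To make the "position Sn" concrete, here is a sketch of a user-defined source that exposes its offset to checkpoints through Flink's CheckpointedFunction interface. OffsetSource and its single long offset are hypothetical; built-in connectors such as the Kafka source handle this internally:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical counter source: its "position" is a single long offset,
// snapshotted when the checkpoint barrier is injected at the source.
public class OffsetSource implements SourceFunction<Long>, CheckpointedFunction {
    private volatile boolean running = true;
    private long offset = 0L;                 // the position Sn described above
    private transient ListState<Long> offsetState;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Hold the checkpoint lock so emitting and snapshotting do not interleave.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(offset);
                offset++;
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Called when checkpoint n is triggered for this source: record the current offset.
        offsetState.clear();
        offsetState.add(offset);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        offsetState = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("offset", Types.LONG));
        // On recovery, resume from the last checkpointed offset.
        for (Long restored : offsetState.get()) {
            offset = restored;
        }
    }
}
```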

Barrier Alignment

When an operator has multiple input streams, the checkpoint n barriers must be aligned: on inputs whose barrier has already arrived, incoming records are buffered while the operator waits for the barriers from the remaining inputs. Once the barrier has arrived on every input, the operator snapshots its state and broadcasts the barrier downstream. Exactly-once semantics rely on this alignment; in at-least-once mode barriers are not aligned, so some records may be processed again after a recovery.
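A heavily simplified sketch of the alignment logic for one operator with several input channels (a hypothetical BarrierAligner, not Flink's internal implementation):

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of barrier alignment: block channels that have already
// delivered barrier n, buffer their records, and release once all are aligned.
public class BarrierAligner {
    private final int numChannels;
    private final Set<Integer> blockedChannels = new HashSet<>();
    private final Queue<Object> bufferedRecords = new ArrayDeque<>();

    public BarrierAligner(int numChannels) { this.numChannels = numChannels; }

    /** A data record arrived on a channel. Returns true if it may be processed now. */
    public boolean onRecord(int channel, Object record) {
        if (blockedChannels.contains(channel)) {
            // This channel already delivered barrier n: the record belongs to
            // snapshot n + 1, so hold it back until alignment finishes.
            bufferedRecords.add(record);
            return false;
        }
        return true; // still part of snapshot n, process immediately
    }

    /** Barrier n arrived on a channel. Returns true when all channels are aligned. */
    public boolean onBarrier(int channel) {
        blockedChannels.add(channel);
        if (blockedChannels.size() == numChannels) {
            // All inputs delivered barrier n: snapshot the operator state,
            // broadcast the barrier downstream, then replay the buffered records.
            blockedChannels.clear();
            return true;
        }
        return false;
    }

    public Queue<Object> drainBufferedRecords() { return bufferedRecords; }
}
```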

Checkpoint Data Structure

When an operator has received the checkpoint n barrier from all of its upstream inputs, it takes a snapshot of its state, saving values such as source offsets. By default this snapshot is kept in the JobManager's memory; since the state can be fairly large, a different state backend can be configured, and in production it is recommended to store checkpoints in HDFS.
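A sketch of redirecting checkpoint storage to HDFS instead of the JobManager's heap; the path is a placeholder, and setCheckpointStorage is the newer (Flink 1.13+) API, with older versions using FsStateBackend for the same purpose:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointStorageExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000);

        // Write checkpoint data to a durable filesystem instead of JobManager memory.
        // The HDFS path below is a placeholder.
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");

        // ... define sources, operators and sinks, then env.execute(...);
    }
}
```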

The complete data structure of the finished checkpoint snapshot resembles a table: each operator fills in its own part once it has processed the barrier, and the result is stored in the state backend for use during failover.

Reflect & Expand

Internal implementation of Flink's fault-tolerance mechanism (checkpointing)

When an application with checkpointing enabled starts, Flink's JobManager creates a CheckpointCoordinator for it; the CheckpointCoordinator is fully responsible for producing the application's snapshots. The process then runs as follows:

1. The CheckpointCoordinator periodically sends barriers to all source operators of the streaming application.

2. When a source operator receives a barrier, it pauses data processing, takes a snapshot of its current state and saves it to the designated persistent storage, reports the result of the snapshot to the CheckpointCoordinator, broadcasts the barrier to all of its downstream operators, and then resumes data processing.

3. When a downstream operator receives the barrier, it likewise pauses its own data processing, takes a snapshot of its relevant state and saves it to the designated persistent storage, reports the result to the CheckpointCoordinator, broadcasts the barrier to all of its own downstream operators, and then resumes data processing.

4. Every operator keeps taking snapshots and broadcasting the barrier downstream as in step 3 until the barrier reaches the sink operators and the snapshot is complete.

5. When the CheckpointCoordinator has received the acknowledgements from all operators, it considers the snapshot for this cycle to have succeeded; if it does not receive all acknowledgements within the configured timeout, it considers the snapshot for this cycle to have failed (a simplified sketch of this bookkeeping follows the list).
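The sketch below is a hypothetical, heavily simplified model of that acknowledgement bookkeeping; Flink's real CheckpointCoordinator additionally collects state handles, handles declined checkpoints, and supports concurrent checkpoints:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of step 5: a checkpoint completes only when every task
// has acknowledged, and is considered failed if the timeout fires first.
public class PendingCheckpoint {
    private final long checkpointId;
    private final long deadlineMillis;
    private final Set<String> unacknowledgedTasks;

    public PendingCheckpoint(long checkpointId, Set<String> allTasks, long timeoutMillis) {
        this.checkpointId = checkpointId;
        this.deadlineMillis = System.currentTimeMillis() + timeoutMillis;
        this.unacknowledgedTasks = new HashSet<>(allTasks);
    }

    /** A task reports that it finished its snapshot for this checkpoint. */
    public void acknowledge(String taskId) {
        unacknowledgedTasks.remove(taskId);
    }

    /** Complete only once all operators have acknowledged. */
    public boolean isComplete() {
        return unacknowledgedTasks.isEmpty();
    }

    /** If not all acknowledgements arrive in time, the checkpoint has failed. */
    public boolean isExpired() {
        return !isComplete() && System.currentTimeMillis() > deadlineMillis;
    }

    public long getCheckpointId() { return checkpointId; }
}
```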

How RocksDB implements incremental checkpoints:

The RocksDB state backend is currently the only state backend in Flink that supports incremental checkpoints. The idea is that RocksDB writes its data into SST files that, once generated, are never modified again. When a checkpoint is taken, its set of SST files is compared with the set from the previous checkpoint, and only the newly created SST files need to be uploaded.
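A sketch of enabling incremental checkpoints with the RocksDB state backend; EmbeddedRocksDBStateBackend is the class name used in recent Flink versions (the same behavior can also be enabled via the state.backend.incremental configuration key), and the HDFS path is a placeholder:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalRocksDbCheckpoint {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000);

        // true = incremental checkpoints: only newly created SST files are uploaded.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Checkpoint data still needs durable storage; the path is a placeholder.
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");

        // ... define the job, then env.execute(...);
    }
}
```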

