[Flink] fault tolerance

How to save and restore the state

1. Timing production distributed snapshot of the program state backup
2. failure:
the entire job all the task are rolled back to the last successful checkpoint, and then started from that point;
3. Prerequisites: support data retransmission
4. conformance statement: exactly once, at least once

checkpoint enforcement mechanisms

1.checkpoint coordinate sent to all the Trigger checkpoint Source
2. All task after receiving the barrier, will take a snapshot, and pass their output to the new barrier, their status will persist.
3. After the backup when the task is completed, the data will address notification checkpoint coordinate.

Guess you like

Origin www.cnblogs.com/yankang/p/11576103.html