Checkpointing is Flink's core mechanism for fault tolerance. It periodically takes a snapshot of the state of every operator in the data-processing job and persists it. If the Flink program goes down, its state can be restored from these snapshots. A checkpoint proceeds as follows:
1. The checkpoint coordinator thread periodically generates a barrier and sends it to each source.
2. Each source takes a snapshot of its current state (the snapshot can be saved to HDFS).
3. The source confirms to the coordinator that its snapshot is complete.
4. The source forwards the barrier downstream to the transformation operators.
5. Each transformation operator repeats what the source did (snapshot, confirm, forward), and so on until the sink operator has also confirmed its snapshot to the coordinator.
6. The coordinator confirms that the snapshot for the current cycle is complete.
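The steps above can be sketched as a toy simulation. This is not Flink code: `Operator`, `PendingCheckpoint`, `onBarrier` and `runCheckpoint` are hypothetical names invented for illustration, the "state" is a single number, and the job is assumed to be a simple linear chain, so barrier alignment across multiple inputs is left out.

```scala
import scala.collection.mutable

// Toy model of one checkpoint cycle. All names are hypothetical, not Flink APIs.
object CheckpointSketch {
  // An operator holds some state and knows its downstream operators.
  final class Operator(val name: String, var state: Long, val downstream: List[Operator])

  // Coordinator-side bookkeeping for one checkpoint cycle.
  final class PendingCheckpoint(val id: Long, expected: Set[String]) {
    val snapshots: mutable.Map[String, Long] = mutable.Map.empty

    // Steps 3 and 5: an operator confirms that its snapshot is done.
    def acknowledge(op: Operator, snapshot: Long): Unit =
      snapshots(op.name) = snapshot

    // Step 6: complete once every expected operator has confirmed.
    def isComplete: Boolean = snapshots.keySet.toSet == expected
  }

  // What an operator does when the barrier reaches it.
  def onBarrier(op: Operator, pending: PendingCheckpoint): Unit = {
    val snapshot = op.state                          // step 2: snapshot current state
    pending.acknowledge(op, snapshot)                // step 3: confirm to the coordinator
    op.downstream.foreach(onBarrier(_, pending))     // steps 4-5: forward the barrier
  }

  // Step 1: the coordinator sends the barrier to every source.
  def runCheckpoint(sources: List[Operator], pending: PendingCheckpoint): Boolean = {
    sources.foreach(onBarrier(_, pending))
    pending.isComplete
  }

  def main(args: Array[String]): Unit = {
    val sink = new Operator("sink", 7L, Nil)
    val transform = new Operator("map", 42L, List(sink))
    val source = new Operator("source", 100L, List(transform))
    val pending = new PendingCheckpoint(1L, Set("source", "map", "sink"))
    val done = runCheckpoint(List(source), pending)
    println(s"checkpoint ${pending.id} complete: $done")
  }
}
```

In this sketch the coordinator treats the cycle as complete only when the set of acknowledging operators matches the set it expects, which mirrors step 6. If an operator never confirmed, `isComplete` would stay false.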
Example configuration in code:
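// Imports for the Flink 1.x Scala API
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// The job's execution environment (the original snippet assumes env already exists)
val env = StreamExecutionEnvironment.getExecutionEnvironment

// Trigger a checkpoint every 5 seconds (5000 ms)
env.enableCheckpointing(5000)
// Use exactly-once checkpointing semantics
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
// Minimum pause between the end of one checkpoint and the start of the next (1000 ms)
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(1000)
// Checkpoint timeout: abort a checkpoint that takes longer than 60 seconds
env.getCheckpointConfig.setCheckpointTimeout(60000)
// Maximum number of checkpoints allowed to run concurrently
env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
// Retain the externalized checkpoint when the job is cancelled
env.getCheckpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
// Set the checkpoint storage location
env.setStateBackend(new FsStateBackend("hdfs://cdh1:8020/flink-checkpoint/"))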