Flink fault tolerance (checkpoint)

Flink's checkpoint is its core fault-tolerance mechanism. It periodically takes a snapshot (Snapshot) of each data-processing operator's state and stores it. If the Flink program goes down, the job can be restored from these snapshots.
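How quickly the job recovers from such a failure also depends on the configured restart strategy. A minimal sketch, assuming a fixed-delay strategy and an existing StreamExecutionEnvironment named env:

import org.apache.flink.api.common.restartstrategy.RestartStrategies

// restart the job up to 3 times after a failure, waiting 10 seconds between attempts;
// each restart restores operator state from the latest completed checkpoint
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L))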

1. The checkpoint coordinator thread periodically generates a barrier and sends it to every source operator.

2. Each source takes a snapshot of its current state (the snapshot can be stored on HDFS, for example; see the state sketch after this list).

3. Each source acknowledges to the coordinator that its snapshot is complete.

4. Each source then forwards the barrier downstream to the transformation operators.

5. Each transformation operator repeats steps 2-4, and so on until the sink operator acknowledges its completed snapshot to the coordinator.

6. The coordinator then marks the checkpoint for the current cycle as complete.
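What each snapshot actually captures is the operators' state. Below is a minimal, hypothetical sketch (the socket source with its host and port is only a placeholder, and env is assumed to be a StreamExecutionEnvironment with checkpointing enabled as configured below) of a keyed running count whose per-key state is written into every checkpoint and restored after a failure:

import org.apache.flink.streaming.api.scala._

// word count per key; the count lives in Flink keyed state, so it is
// snapshotted with every checkpoint and restored after a failure
val counts = env
  .socketTextStream("localhost", 9999)   // placeholder source
  .map(word => (word, 1L))
  .keyBy(_._1)
  .mapWithState[(String, Long), Long] { (in, state) =>
    val updated = state.getOrElse(0L) + in._2
    ((in._1, updated), Some(updated))
  }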

Checkpoint configuration example:

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig

// start a checkpoint every 5 seconds
env.enableCheckpointing(5000)

// use exactly-once checkpointing semantics
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

// minimum pause between two consecutive checkpoints (milliseconds)
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(1000)

// checkpoint timeout (milliseconds)
env.getCheckpointConfig.setCheckpointTimeout(60000)

// maximum number of checkpoints allowed to run concurrently
env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)

// retain the externalized checkpoint data when the job is cancelled
env.getCheckpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

// state backend: where checkpoint data is stored
env.setStateBackend(new FsStateBackend("hdfs://cdh1:8020/flink-checkpoint/"))
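Usage note: because RETAIN_ON_CANCELLATION is used above, the checkpoint data on HDFS is kept even after the job is cancelled, so the job can later be resubmitted from it, for example with flink run -s <checkpointMetaDataPath> (where the path points at a retained checkpoint directory under hdfs://cdh1:8020/flink-checkpoint/).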
