Flink checkpoints and savepoints: can operators be modified after a Flink job has been deployed? (state restoration)

1. Checkpoint and savepoint

Checkpointing is Flink's fault-tolerance mechanism. The system automatically backs up the program's computation state at regular intervals, according to the configured checkpoint settings. If the program fails during computation, the system recovers from the most recent completed checkpoint.

A savepoint is an operations and maintenance tool. The user has to trigger the state backup manually, and under the hood it is essentially a checkpoint. For example, the following command cancels a job and writes a savepoint to HDFS:

./bin/flink cancel -m centos:8081 -s hdfs:///savepoints f21795e74312eb06fbf0d48cb8d90489

The prerequisites for achieving failure recovery:

  • A persistent data source that can replay records for a certain period of time (for example, FlinkKafkaConsumer)
  • Persistent storage for state, usually a distributed file system (for example, HDFS)
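
The second prerequisite can be sketched in code as follows. This is a minimal sketch, assuming a Flink 1.x release in which FsStateBackend is still available; the object name and the HDFS directory hdfs:///flink-checkpoints are hypothetical and not taken from the original job.

```scala
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala._

object StateBackendSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Keep checkpoint data on a distributed file system so it survives a failed node.
    // The directory below is a hypothetical example path.
    env.setStateBackend(new FsStateBackend("hdfs:///flink-checkpoints"))
  }
}
```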

The checkpoint configuration is shown below. Jobs differ in computational complexity and resource usage, so there is no single best set of values for these parameters. Tune them according to the performance of the servers and the actual behaviour of the job.

Principle: checkpointing costs some real-time performance (because of the barrier mechanism), so as long as checkpoints complete successfully, the checkpoint interval should not be set too short.

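import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Enable checkpointing: one checkpoint every 5000 ms, with exactly-once semantics
env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE)
// A checkpoint must complete within 2000 ms, otherwise it is aborted
env.getCheckpointConfig.setCheckpointTimeout(2000)
// Minimum pause (ms) between the end of one checkpoint and the start of the next; keep it <= the checkpoint interval
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(5)
// Maximum number of concurrent checkpoints (defaults to 1 if not set)
env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
// If a checkpoint fails, the task fails as well
env.getCheckpointConfig.setFailOnCheckpointingErrors(true)
// Externalize checkpoints to the state backend's storage (filesystem, rocksdb) and retain them when the job is cancelled
env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)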
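
To see how the checkpoint interval and the minimum pause interact, here is a small, simplified model in plain Scala. It is not Flink code: the object CheckpointTiming and its function are hypothetical, and the model assumes a single concurrent checkpoint and ignores scheduling jitter. The next checkpoint starts no earlier than one interval after the previous start, and no earlier than the minimum pause after the previous checkpoint finished.

```scala
object CheckpointTiming {
  /** Earliest start time (ms) of the next checkpoint in this simplified model.
    *
    * @param lastStart    start time of the previous checkpoint (ms)
    * @param lastDuration how long the previous checkpoint took (ms)
    * @param interval     configured checkpoint interval (ms)
    * @param minPause     configured minimum pause between checkpoints (ms)
    */
  def nextCheckpointStart(lastStart: Long, lastDuration: Long, interval: Long, minPause: Long): Long = {
    val byInterval = lastStart + interval                // periodic trigger
    val byMinPause = lastStart + lastDuration + minPause // end of previous checkpoint plus pause
    math.max(byInterval, byMinPause)
  }

  def main(args: Array[String]): Unit = {
    // Values from the configuration above: interval 5000 ms, min pause 5 ms
    println(nextCheckpointStart(0L, 1200L, 5000L, 5L))    // 5000
    // A hypothetical larger min pause of 4000 ms pushes the next checkpoint back
    println(nextCheckpointStart(0L, 1500L, 5000L, 4000L)) // 5500
  }
}
```

With a minimum pause of 5 ms the interval always dominates. The pause only matters when checkpoints take long relative to the interval, which is the situation the principle above warns about.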

2. Can operators be modified after a Flink job has been deployed? (state restoration)

First of all, Spark does not allow this (the blogger will cover Spark in a later post), because Spark persists code fragments in its checkpoints: once the code is modified, the checkpoint must be deleted. Flink, by contrast, stores only the computation state of each operator. To keep that state restorable after the code is modified, the user needs to set the uid attribute on every stateful operator. Flink uses the uid to match stored state to an operator, and without an explicit uid the operator ID is generated from the job topology and can change when the code changes.

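import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// props is a java.util.Properties holding the Kafka connection settings, defined elsewhere
env.addSource(new FlinkKafkaConsumer[String]("topic01", new SimpleStringSchema(), props))
.uid("kakfa-consumer") // uid of the stateful Kafka source
.flatMap(line => line.split("\\s+"))
.map((_, 1))
.keyBy(0) // key by tuple field 0 (the word)
.sum(1)
.uid("word-count") // any value works as long as it is unique within the job
.map(t => t._1 + "->" + t._2)
.print()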
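
As an illustration of a change that keeps the state restorable, the sketch below adds a stateless filter to the job above and leaves both uid values untouched. It assumes the same Flink 1.x Scala API and Kafka connector as the snippet above; the object name WordCountV2, the job name, and the Kafka properties (broker address and group id) are hypothetical.

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object WordCountV2 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical Kafka settings; replace with the real cluster's values
    val props = new Properties()
    props.setProperty("bootstrap.servers", "centos:9092")
    props.setProperty("group.id", "g1")

    env.addSource(new FlinkKafkaConsumer[String]("topic01", new SimpleStringSchema(), props))
      .uid("kakfa-consumer")          // same uid as before, so the source state can be matched
      .flatMap(line => line.split("\\s+"))
      .filter(word => word.nonEmpty)  // new stateless operator added in this version
      .map((_, 1))
      .keyBy(0)                       // key unchanged, so the keyed state stays compatible
      .sum(1)
      .uid("word-count")              // same uid as before, so the running counts can be matched
      .map(t => t._1 + "->" + t._2)
      .print()

    env.execute("word-count-v2")
  }
}
```

The modified job would then be started from the savepoint with ./bin/flink run -s <savepoint path> followed by the usual job arguments. If a stateful operator has been removed, so that some state in the savepoint no longer maps to any operator, the --allowNonRestoredState flag of flink run is needed as well.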

Origin blog.csdn.net/qq_44962429/article/details/112912144