Flink fault tolerance and state

Brief introduction

  • Apache Flink provides a fault-tolerance mechanism that can consistently restore the state of data streaming applications.
  • This mechanism ensures that even if a failure occurs, after recovery the program's state is restored to what it was before the failure.
  • Flink supports both at-least-once and exactly-once semantics.
  • Flink implements recovery and fault tolerance by taking checkpoints periodically; the mechanism continuously draws snapshots of the data stream and has little impact on performance.
  • The state of the streaming application is stored in a configurable location (e.g., the master node or HDFS).
  • If the program fails (due to a machine, network, or software failure), Flink stops the distributed data stream.
  • It then restarts the operators and resets them to the most recent successful checkpoint.
  • Important points:
    • Checkpointing is disabled by default.
    • For fault tolerance to work, the data source must be able to rewind the stream to a specified point.
    • Apache Kafka, for example, supports this: Flink's Kafka connector can reset a topic's offsets so that data is re-read (a minimal sketch follows this list).
    • Because Flink checkpoints are implemented as distributed snapshots, "checkpoint" and "snapshot" are used interchangeably below.
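  • As a minimal sketch of a replayable source (assuming Flink's Kafka connector; the broker address, group id, and topic name below are made up for illustration), the job enables checkpointing so that the consumed Kafka offsets become part of the checkpointed state and can be rewound on recovery:

    import java.util.Properties;
    
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
    
    public class ReplayableSourceSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Checkpoint every 5 seconds; the consumed Kafka offsets are stored with each checkpoint.
            env.enableCheckpointing(5000);
    
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka01:9092"); // hypothetical broker
            props.setProperty("group.id", "flink-checkpoint-demo"); // hypothetical group id
    
            // On recovery, the connector resets the topic offsets to the ones recorded
            // in the checkpoint and re-reads the data from there.
            FlinkKafkaConsumer<String> source =
                    new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);
    
            env.addSource(source).print();
            env.execute("replayable-source-sketch");
        }
    }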

Checkpoint

  • The core of Flink's fault tolerance is drawing consistent snapshots of the distributed data stream and of the operator state.
  • These snapshots serve as checkpoints to which the system can roll back when a failure occurs.
  • Distributed snapshots are implemented with the Chandy-Lamport algorithm.
  • Barriers: lightweight markers that are injected into the data stream and flow with the records, separating the records that belong to the current snapshot from those that belong to the next one.

Recovery

  • Flink's recovery mechanism in case of failure is straightforward: when the system fails, Flink selects the most recently completed checkpoint k, redeploys the entire dataflow graph, and resets each operator to its state from checkpoint k.
  • The sources are set to start reading the data stream from the position set Sk.
  • For example, when recovering with Apache Kafka, the system tells the consumers to start fetching data from offset Sk (see the sketch below).
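  • A minimal sketch of the restart side (the attempt count and delay below are illustrative): with a restart strategy configured, a failed job is redeployed automatically, its operators are reset to the latest completed checkpoint k, and its sources resume from the recorded positions Sk.

    import java.util.concurrent.TimeUnit;
    
    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    
    public class RecoverySketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Checkpoints must be enabled for there to be a checkpoint k to fall back to.
            env.enableCheckpointing(5000);
    
            // Restart the job up to 3 times, waiting 10 seconds between attempts.
            // Each restart redeploys the dataflow graph and restores operator state
            // from the most recent completed checkpoint.
            env.setRestartStrategy(
                    RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
        }
    }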

Prerequisites

  • In general, Flink's checkpoint mechanism requires:
    • A persistent, replayable data source
      • Such as message queues (Apache Kafka, RabbitMQ) or file systems (e.g., HDFS, Amazon S3, GFS, NFS, Ceph, ...).
    • Persistent state storage
      • Usually a distributed file system (HDFS, Amazon S3, GFS, ...)

Enabling and configuring checkpoints

  • By default, Flink disables checkpointing.

  • Enable checkpointing by calling env.enableCheckpointing(n), where n is the checkpoint interval in milliseconds.

  • The most common checkpoint-related parameters are shown in the sketch below and in the full example under "Using checkpoints".
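  • A compact sketch of the most common settings on CheckpointConfig (the interval and thresholds below are example values; the full program under "Using checkpoints" shows them in context):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.CheckpointConfig;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    
    public class CheckpointConfigSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Enable checkpointing; the argument is the interval between checkpoints in ms.
            env.enableCheckpointing(5000);
    
            CheckpointConfig config = env.getCheckpointConfig();
    
            // Checkpointing semantics: EXACTLY_ONCE or AT_LEAST_ONCE.
            config.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
            // Minimum pause between the end of one checkpoint and the start of the next.
            config.setMinPauseBetweenCheckpoints(500);
            // Abort a checkpoint if it does not complete within this timeout (ms).
            config.setCheckpointTimeout(60000);
            // How many checkpoints may be in progress at the same time.
            config.setMaxConcurrentCheckpoints(1);
        }
    }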

State Backends

  • The following stream-processing scenarios need to keep state:

    • Window operations
    • Functions that use keyed (KV) state
    • Functions that implement the CheckpointedFunction interface
  • When the checkpoint mechanism is enabled, this state is persisted at checkpoints to protect against data loss and to allow recovery.

  • How the state is represented internally, and how and where it is persisted at checkpoints, depends on the selected State Backend.

  • For storing state, Flink supports three backends:

    • MemoryStateBackend (memory state backend)
    • FsStateBackend (file-system state backend)
    • RocksDBStateBackend (RocksDB state backend)
  • If nothing else is configured, the system uses MemoryStateBackend by default.

  • MemoryStateBackend

    • This backend stores data on the Java heap, e.g. the hash tables that hold the values of KV state or of window operators.

    • When a checkpoint is taken, this backend snapshots the state and sends the snapshot to the JobManager as part of the checkpoint; the JobManager also keeps it on its heap.

    • MemoryStateBackend can take snapshots asynchronously; asynchronous snapshots are encouraged because they avoid blocking, and they are now the default.

    • Important points:

      • With asynchronous snapshots, an operator keeps processing newly arriving data while the snapshot is taken; this is the default.
      • With synchronous snapshots, an operator does not process newly arriving data while it takes the snapshot, which increases processing latency.
    • If you do not want asynchronous snapshots, pass false at construction time, as follows:

      new MemoryStateBackend(MAX_MEM_STATE_SIZE, false);
    • Limitations of this backend:

      • By default the size of a single state is limited to 5 MB; this value can be changed via the constructor.
      • Regardless of the configured maximum state size, a state cannot be larger than the Akka frame size.
      • The aggregated state must fit into the JobManager's memory.
    • Suitable scenarios:

      • Local development and debugging
      • Jobs with very little state
  • FsStateBackend

    • It is configured with a file-system URL, for example:

      • hdfs://namenode:40010/flink/checkpoints
      • file:///data/flink/checkpoints
    • When FsStateBackend is selected, the working state is initially kept in the TaskManager's memory.

      • At checkpoint time the state is written to snapshot files stored in the configured file system.

      • A small amount of metadata is kept in the JobManager's memory.

      • By default, FsStateBackend writes state snapshots asynchronously in order to avoid blocking the processing pipeline.

      • This can be disabled by setting the corresponding boolean constructor flag to false:

        new FsStateBackend(path, false);
      • Suitable scenarios:

        • Relatively large state, long windows, large keyed (KV) state
        • Scenarios that require high availability (HA)
  • RocksDBStateBackend

    • It is also configured with a file-system URL, for example:

      • hdfs://namenode:40010/flink/checkpoints
      • file:///data/flink/checkpoints
    • With this backend, KV state is managed by a RocksDB database, which is the biggest difference from the memory and file backends.

    • RocksDBStateBackend stores the data in a RocksDB database located in the TaskManager's data directory.

    • Note: RocksDB is a high-performance key-value store. Data is first kept in memory and, when certain conditions are met, flushed to files on disk.

    • At checkpoint time, a snapshot of the entire RocksDB database is taken and written to the configured file system (usually HDFS).

    • Meanwhile, Flink keeps a minimal amount of metadata in the JobManager's memory, or in ZooKeeper in high-availability setups.

    • RocksDBStateBackend performs asynchronous snapshots by default.

    • Suitable scenarios:

      • RocksDBStateBackend is the only state backend that supports incremental checkpoints for streaming applications.
      • Note: an incremental checkpoint only stores the difference from the previous snapshot instead of the full state.
      • The amount of state RocksDBStateBackend can hold is limited only by the available disk space.
      • Compared with FsStateBackend, which keeps the working state in memory, this allows very large state.
      • However, it also means that the maximum throughput of this backend is lower.
    • Code:

      // By default, state is kept in memory; the state of a single snapshot is limited to 10 MB,
      // and snapshots are taken synchronously.
      env.setStateBackend(new MemoryStateBackend(10*1024*1024, false));
      
      // Store state with FsStateBackend, taking snapshots synchronously.
      env.setStateBackend(new FsStateBackend("hdfs://namenode....", false));
      
      try {
          // Store state with RocksDBStateBackend, using incremental snapshots.
          env.setStateBackend(new RocksDBStateBackend("hdfs://namenode....", true));
      } catch (IOException e) {
          e.printStackTrace();
      }

Using checkpoints

  • While the program runs, a checkpoint snapshot is taken at every interval configured via env.enableCheckpointing(5000).

  • Each checkpoint stores the snapshot's state data in HDFS.

  • If the program fails, you can specify the snapshot from which to recover when restarting it:

    flink-1.9.1/bin/flink run -s hdfs://ronnie01:8020/data/flink-checkpoint/xxxxxxxxxxxxxxx(job ID hash)/chk-xxx/_metadata -c com.ronnie.flink.stream.test.CheckPointTest flink-test.jar
  • Code:

    package com.ronnie.flink.stream.test;
    
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.CheckpointConfig;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;
    
    import java.io.IOException;
    
    public class CheckPointTest {
    
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Enable checkpointing and set how often a checkpoint is taken, i.e. the interval between two checkpoints (ms)
            env.enableCheckpointing(5000);
    
            env.setParallelism(1);
    
            CheckpointConfig checkpointConfig = env.getCheckpointConfig();
    
            env.setRestartStrategy(RestartStrategies.fallBackRestart());
    
            /* Set the checkpointing semantics; EXACTLY_ONCE is used in most cases.
               AT_LEAST_ONCE is generally used in scenarios that require very low latency. */
            checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    
            /*
                Minimum pause between checkpoints.
                To make sure the streaming application makes some progress between checkpoints,
                you can define how much time must pass between them.
                If this value is set to 500, for example, the next checkpoint will start no earlier
                than 500 ms after the previous one completed, regardless of checkpoint duration
                and checkpoint interval.
                Note that this means the effective checkpoint interval is never smaller than this parameter.
             */
            checkpointConfig.setMinPauseBetweenCheckpoints(500);
    
            // Set the checkpoint timeout; if a checkpoint takes longer than this, it is aborted
            checkpointConfig.setCheckpointTimeout(60000);
    
            /*
                How many checkpoints may be in progress at the same time.
                By default, the system will not trigger another checkpoint while one is still in progress.
             */
            checkpointConfig.setMaxConcurrentCheckpoints(1);
    
            /* Enable externalized checkpoints; they are not cleaned up automatically when the job fails,
               so the state has to be cleaned up manually.
               DELETE_ON_CANCELLATION: the external state is deleted automatically when the job is canceled,
               but it is retained if the job FAILED;
               RETAIN_ON_CANCELLATION: the state data is retained when the job is canceled.
             */
            checkpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    
            // By default, state values are stored in memory. The state of a single snapshot is limited to 10 MB, and snapshots are taken synchronously.
            env.setStateBackend(new MemoryStateBackend(10*1024*1024, false));
    
            // Store state with FsStateBackend, taking snapshots synchronously
            env.setStateBackend(new FsStateBackend("hdfs://ronnie01:8020/data/flink-checkpoint",false));
    
            try {
                env.setStateBackend(new RocksDBStateBackend("hdfs://ronnie:8020/data/flink-checkpoint", true));
            } catch (IOException e) {
                e.printStackTrace();
            }
    //        DataStreamSource<String> dataStreamSource = env.socketTextStream("ronnie01",9999);
    //
    //        SingleOutputStreamOperator<Tuple2<String, Integer>> pairStream = dataStreamSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
    //            @Override
    //            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
    //                String[] split = value.split(" ");
    //                for (String word : split) {
    //                    System.out.println("--------- lala --------");
    //                    out.collect(new Tuple2<String, Integer>(word, 1));
    //                }
    //            }
    //        });
    //
    //
    //        KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = pairStream.keyBy(0);
    //
    //
    //        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = keyedStream.sum(1);
    //
    //
    //        sum.print();
    //
    //
    //        try {
    //            // Transformation operators are executed lazily; env.execute() must be called explicitly at the end
    //            env.execute("checkpoint-test");
    //        } catch (Exception e) {
    //            e.printStackTrace();
    //        }
    
        }
    }
    

Savepoint

  • The difference between Flink's savepoints and checkpoints is similar to the difference between backups and recovery logs in traditional database systems.

  • The main purpose of checkpoints is to provide a recovery mechanism for when a job fails unexpectedly.

  • A checkpoint's life cycle is managed by Flink: Flink creates, owns, and releases checkpoints without user interaction.

  • As a recovery mechanism that is triggered regularly, a checkpoint's main design goals are:

    • Lightweight creation
    • Recovery that is as fast as possible
  • Savepoints, by contrast, are created, owned, and deleted by the user.

  • They are intended for planned, manual backup and recovery.

  • For example, when the Flink version needs to be upgraded, the stream processing logic changes, the parallelism is changed, and so on.

  • In these cases the job is usually stopped first, so the state of the stream must be stored and restored when the job is redeployed later.

  • Conceptually, creating and restoring savepoints may be more expensive; the focus is on portability and on supporting the job changes mentioned above (a sketch of one practice that helps, assigning stable operator IDs, follows the commands below).

  • Usage:

    • Command:

      flink savepoint jobID target_directory
    • Save the state of the current stream to the specified directory:

      bin/flink savepoint xxxxxxxx(job ID hash) hdfs://ronnie01:8020/data/flink/savepoint
    • Restart and recover the data stream:

      flink-1.9.1/bin/flink run -s hdfs://ronnie01:8020/data/flink/savepoint/savepoint-xxxxx-xxxxxxxxx -c com.ronnie.flink.stream.test.CheckPointTest flink-test.jar
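  • One practice that makes savepoints easier to use across job changes: assign stable IDs to operators with uid(), so the state stored in a savepoint can be mapped back onto the operators after the job is modified. A minimal word-count sketch (host, port, and uid strings are illustrative):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;
    
    public class SavepointFriendlyJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(5000);
    
            DataStream<Tuple2<String, Integer>> counts = env
                    .socketTextStream("ronnie01", 9999).uid("socket-source")
                    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                        @Override
                        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
                            for (String word : value.split(" ")) {
                                out.collect(new Tuple2<>(word, 1));
                            }
                        }
                    }).uid("tokenizer")
                    .keyBy(0)
                    // The running counts are keyed state; with a stable uid this state can be
                    // restored from a savepoint even after the surrounding job graph changes.
                    .sum(1).uid("word-count");
    
            counts.print();
            env.execute("savepoint-friendly-job");
        }
    }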


Origin www.cnblogs.com/ronnieyuan/p/11852116.html