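Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.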
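Basic concepts
- A key feature of Flink is that it preserves a job's state. If the job fails, it resumes from the most recent snapshot instead of starting over, so each record is reflected in the state exactly once.
- How it works: at a regular interval, Flink takes a snapshot of the computation state. Each such snapshot is called a checkpoint.
- A savepoint is triggered manually and also stores the job's computation state. Both serve the same purpose: capturing state that the job can later be restored from.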
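To make the snapshot-and-restore idea concrete, here is a minimal sketch in plain Java. It is a toy model only, not Flink's actual mechanism: the class `CheckpointToy`, the `Snapshot` type, and the `interval` and `failAt` parameters are all invented for illustration. It counts words, snapshots the counts together with the input position every few records, and after one simulated crash restores the last snapshot and replays the input from that position.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of checkpoint-based recovery; hypothetical, not Flink code. */
class CheckpointToy {
    /** A snapshot pairs the state with the input position it reflects. */
    static final class Snapshot {
        final int offset;
        final Map<String, Integer> counts;

        Snapshot(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts); // defensive copy of the state
        }
    }

    /**
     * Counts words and takes a snapshot after every `interval` records.
     * If failAt >= 0, simulates a single crash just before record failAt:
     * the state is restored from the last snapshot and the input is replayed
     * from the snapshot's offset.
     */
    static Map<String, Integer> run(List<String> input, int interval, int failAt) {
        Map<String, Integer> counts = new HashMap<>();
        Snapshot last = new Snapshot(0, counts);
        boolean crashed = false;
        int i = 0;
        while (i < input.size()) {
            if (!crashed && i == failAt) {
                crashed = true;
                counts = new HashMap<>(last.counts); // restore the state
                i = last.offset;                      // rewind the source
                continue;
            }
            counts.merge(input.get(i), 1, Integer::sum);
            i++;
            if (i % interval == 0) {
                last = new Snapshot(i, counts);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a", "b", "c");
        Map<String, Integer> clean = run(input, 3, -1);
        Map<String, Integer> recovered = run(input, 3, 5);
        System.out.println("same result after recovery: " + clean.equals(recovered));
    }
}
```

Because the snapshot stores the state and the input position together, the records processed between the snapshot and the crash are replayed without being counted twice.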
Code practice
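import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.StateBackend;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.Collector;

/**
 * Created by shuiyu lei
 * date 2019/6/21
 */
public class TestRock {
    // "throws Exception" lets a failed backend setup or a job failure abort the program
    // instead of continuing with a null state backend
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment en = StreamExecutionEnvironment.getExecutionEnvironment();
        en.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
        // take a checkpoint every 5000 ms
        en.enableCheckpointing(5000);
        // use RocksDB as the state backend, store checkpoints on HDFS,
        // and enable incremental checkpoints (second argument)
        RocksDBStateBackend rock = new RocksDBStateBackend("hdfs://centos-6:8020/flink/ch/", true);
        en.setStateBackend((StateBackend) rock);
        // KafkaUtil is the author's own wrapper (not shown here);
        // it is assumed to return a consumer of String records
        KafkaUtil util = new KafkaUtil();
        FlinkKafkaConsumer011<String> consumer = util.getConsumer("dsf", "te");
        en.addSource(consumer).flatMap(new Tokenizer())
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.seconds(10)))
            .sum(1)
            .print();
        en.execute("print dwf log");
    }

    /**
     * Implements the string tokenizer that splits sentences into words as a user-defined
     * FlatMapFunction. The function takes a line (String) and splits it into
     * multiple pairs in the form of "(word,1)" ({@code Tuple2<String, Integer>}).
     */
    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");
            // emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}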
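As a rough illustration of the keyed state this job maintains, the sketch below reproduces the same computation in plain Java without Flink. It tokenizes each line the way Tokenizer does, assigns each record to a 10-second tumbling window by its timestamp, and sums the counts per word and window. The class `TumblingCountToy` and its methods are hypothetical names, and the window-start formula assumes non-negative millisecond timestamps and no window offset.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of "tokenize, key by word, 10 s tumbling window, sum"; not Flink code. */
class TumblingCountToy {
    /** Start (ms) of the tumbling window that contains ts; assumes ts >= 0 and zero offset. */
    static long windowStart(long ts, long sizeMs) {
        return ts - (ts % sizeMs);
    }

    /** Counts words per window; result keys have the form "<windowStart>:<word>". */
    static Map<String, Integer> count(long[] timestamps, String[] lines, long sizeMs) {
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            long start = windowStart(timestamps[i], sizeMs);
            // same normalization and split as the Tokenizer
            for (String token : lines[i].toLowerCase().split("\\W+")) {
                if (token.length() > 0) {
                    result.merge(start + ":" + token, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ts = {1_000L, 9_999L, 10_000L};
        String[] lines = {"Flink state", "flink", "Flink"};
        System.out.println(count(ts, lines, 10_000L));
    }
}
```

The per-word, per-window counts in this map are the kind of data a checkpoint has to capture so that a restarted job can continue from where it left off.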
Testing recovery
- Start the job
bin/flink run -m yarn-cluster xx.jar
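- Inspect the checkpoints. 7c50 is the end of the job ID, which you can get from the Flink web UI. The checkpoint directories are named chk-<n>, where n keeps increasing, which shows that state is being written there
- Check the job status
- Kill the job
yarn application -kill appid
- Restart the job; you can see that it recovers from its previous state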
bin/flink run -m yarn-cluster -s hdfs://centos-6:8020/flink/89f98ed4e6b2b163bc6be6cd750f7c50/chk-47 ./exec-jar/flink-test-1.0-SNAPSHOT.jar
- Trigger a savepoint
bin/flink savepoint 4b2e0be6e97b2e6dca866d6486d3ca0f hdfs://centos-6:8020/flink/save -yid application_1562553530220_0018
- Inspect the savepoint
- Restore from the savepoint
bin/flink run -m yarn-cluster -s hdfs://centos-6:8020/flink/save/savepoint-4b2e0b-7b0a742460e7 xx.jar
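The restore commands above differ only in the path passed to -s. The following small helper is a sketch of how such a command could be assembled. The class `SnapshotPaths` and its methods are hypothetical, and the `<dir>/<job-id>/chk-<n>` layout is taken from the checkpoint path used in the restore command above.

```java
/** Hypothetical helper that assembles restore commands like the ones above. */
class SnapshotPaths {
    /** Builds <checkpointDir>/<jobId>/chk-<n>, the layout used in the restore command above. */
    static String checkpointPath(String checkpointDir, String jobId, long n) {
        String base = checkpointDir.endsWith("/") ? checkpointDir : checkpointDir + "/";
        return base + jobId + "/chk-" + n;
    }

    /** Builds the "flink run" command that restores from a checkpoint or savepoint path. */
    static String restoreCommand(String snapshotPath, String jar) {
        return "bin/flink run -m yarn-cluster -s " + snapshotPath + " " + jar;
    }

    public static void main(String[] args) {
        String path = checkpointPath(
            "hdfs://centos-6:8020/flink", "89f98ed4e6b2b163bc6be6cd750f7c50", 47);
        System.out.println(restoreCommand(path, "./exec-jar/flink-test-1.0-SNAPSHOT.jar"));
    }
}
```

The same `restoreCommand` works for a savepoint: pass the savepoint directory as the path instead of a chk-<n> directory.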
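Summary
- Savepoints must be triggered manually. They are mainly used to take a deliberate snapshot of a job, for example before a system upgrade
- Checkpoints are triggered automatically
- Savepoints are always full snapshots, whereas checkpoints can be incremental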
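The difference between a full and an incremental snapshot can be sketched in plain Java. This is a toy model under simplifying assumptions: the class `IncrementalToy` and its methods are invented for illustration, state is a simple map, and deletions are ignored. Flink's RocksDB incremental checkpoints work on RocksDB's on-disk files, not on key-level diffs like this.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy contrast of full and incremental snapshots; hypothetical, not Flink code. */
class IncrementalToy {
    /** Returns the entries of `current` that are new or changed relative to `previous`. */
    static Map<String, Integer> delta(Map<String, Integer> previous, Map<String, Integer> current) {
        Map<String, Integer> changed = new HashMap<>();
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    /** Rebuilds the state from a full base snapshot plus one delta (deletions not modeled). */
    static Map<String, Integer> restore(Map<String, Integer> base, Map<String, Integer> delta) {
        Map<String, Integer> state = new HashMap<>(base);
        state.putAll(delta);
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> second = new HashMap<>(first);
        second.put("b", 3);
        second.put("c", 1);

        Map<String, Integer> d = delta(first, second);
        System.out.println("full snapshot entries: " + second.size());
        System.out.println("incremental snapshot entries: " + d.size());
        System.out.println("restored correctly: " + restore(first, d).equals(second));
    }
}
```

An incremental snapshot is smaller to write but depends on earlier snapshots to be restorable, while a full snapshot is self-contained.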