State Processor API

Not long ago, the Flink community released Flink 1.9, which includes a very important new feature: the
State Processor API. This framework supports operating on checkpoints and savepoints, including
reading, changing, and writing them. Here is a concrete example that illustrates how to use this framework.
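For reference, the API ships as a separate module. A minimal sketch of the Maven dependency, assuming Flink 1.9.0 built against Scala 2.11:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-state-processor-api_2.11</artifactId>
        <version>1.9.0</version>
    </dependency>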

1. First, create a sample job to generate a savepoint
Main class code:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60 * 1000);
DataStream<Tuple2<Integer, Integer>> kafkaDataStream =
        env.addSource(new SourceFunction<Tuple2<Integer, Integer>>() {
            private boolean running = true;
            private int key;
            private int value;
            private Random random = new Random();

            @Override
            public void run(SourceContext<Tuple2<Integer, Integer>> sourceContext) throws Exception {
                while (running) {
                    key = random.nextInt(5);
                    sourceContext.collect(new Tuple2<>(key, value++));
                    Thread.sleep(100);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        }).name("source").uid("source");

kafkaDataStream
        .keyBy(tuple -> tuple.f0)
        .map(new StateTest.StateMap()).name("map").uid("map")
        .print().name("print").uid("print");
In the code above, the only thing to pay attention to is the custom source, which emits Tuple2 messages. What matters for the savepoint is the state, which lives in the StateMap class, shown below:
public static class StateMap extends RichMapFunction<Tuple2<Integer, Integer>, String> {
    private transient ListState<Integer> listState;

    @Override
    public void open(Configuration parameters) throws Exception {
        ListStateDescriptor<Integer> lsd =
                new ListStateDescriptor<>("list", TypeInformation.of(Integer.class));
        listState = getRuntimeContext().getListState(lsd);
    }

    @Override
    public String map(Tuple2<Integer, Integer> value) throws Exception {
        listState.add(value.f1);
        return value.f0 + "-" + value.f1;
    }

    @Override
    public void close() throws Exception {
        listState.clear();
    }
}
In the map function above, a ListState is first declared in open(); the message-processing logic is also very simple: it just puts the value of each Tuple2 into listState. Submit the job, let it run for a while, then trigger a savepoint and record the savepoint path. This completes the data preparation for verifying the State Processor API.
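The savepoint can be triggered with the Flink CLI; a minimal sketch, where the job id and the target directory are placeholders:

    bin/flink savepoint <jobId> hdfs://xxx/savepoints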

2. Reading the savepoint with the State Processor API
This step simply verifies that the savepoint can be read correctly. The code is as follows:
public class ReadListState {
    protected static final Logger logger = LoggerFactory.getLogger(ReadListState.class);

    public static void main(String[] args) throws Exception {
        final String operatorUid = "map";
        final String savepointPath =
                "hdfs://xxx/savepoint-41b05d-d517cafb61ba";

        final String checkpointPath = "hdfs://xxx/checkpoints";

        // set up the batch execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        RocksDBStateBackend db = new RocksDBStateBackend(checkpointPath);
        DataSet<String> dataSet = Savepoint
                .load(env, savepointPath, db)
                .readKeyedState(operatorUid, new ReaderFunction())
                .flatMap(new FlatMapFunction<KeyedListState, String>() {
                    @Override
                    public void flatMap(KeyedListState keyedListState, Collector<String> collector) throws Exception {
                        keyedListState.value.forEach(new Consumer<Integer>() {
                            @Override
                            public void accept(Integer integer) {
                                collector.collect(keyedListState.key + "-" + integer);
                            }
                        });
                    }
                });

        dataSet.writeAsText("hdfs://xxx/test/savepoint/bravo");

        // execute program
        env.execute("read the list state");
    }

    static class KeyedListState {
        Integer key;
        List<Integer> value;
    }

    static class ReaderFunction extends KeyedStateReaderFunction<Integer, KeyedListState> {
        private transient ListState<Integer> listState;

        @Override
        public void open(Configuration parameters) {
            ListStateDescriptor<Integer> lsd =
                    new ListStateDescriptor<>("list", TypeInformation.of(Integer.class));
            listState = getRuntimeContext().getListState(lsd);
        }

        @Override
        public void readKey(
                Integer key,
                Context ctx,
                Collector<KeyedListState> out) throws Exception {
            List<Integer> li = new ArrayList<>();
            listState.get().forEach(new Consumer<Integer>() {
                @Override
                public void accept(Integer integer) {
                    li.add(integer);
                }
            });

            KeyedListState kl = new KeyedListState();
            kl.key = key;
            kl.value = li;

            out.collect(kl);
        }
    }
}
After successfully reading the state from the savepoint, the job writes it out as a file; each row of the file is a key-value pair. (The original post showed a screenshot of part of the file here.)
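Illustrative, hypothetical rows (keys are drawn at random from [0, 5) and values increment, so the rows group by key):

    0-3
    0-8
    0-12
    1-6
    1-11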

3. Rewriting the savepoint with the State Processor API
A savepoint freezes a program's state at a point in time so that the job can conveniently resume from it when resubmitted. Sometimes, however, the state in a savepoint needs to be rewritten so the job can start from a specific, modified state.
public class ReorganizeListState {
    protected static final Logger logger = LoggerFactory.getLogger(ReorganizeListState.class);

    public static void main(String[] args) throws Exception {
        final String operatorUid = "map";
        final String savepointPath =
                "hdfs://xxx/savepoint-41b05d-d517cafb61ba";

        final String checkpointPath = "hdfs://xxx/checkpoints";

        // set up the batch execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        RocksDBStateBackend db = new RocksDBStateBackend(checkpointPath);
        DataSet<KeyedListState> dataSet = Savepoint
                .load(env, savepointPath, db)
                .readKeyedState(operatorUid, new ReaderFunction())
                .flatMap(new FlatMapFunction<KeyedListState, KeyedListState>() {
                    @Override
                    public void flatMap(KeyedListState keyedListState, Collector<KeyedListState> collector) throws Exception {
                        KeyedListState newState = new KeyedListState();
                        newState.value = keyedListState.value.stream()
                                .map(x -> x + 10000).collect(Collectors.toList());
                        newState.key = keyedListState.key;
                        collector.collect(newState);
                    }
                });

        BootstrapTransformation<KeyedListState> transformation = OperatorTransformation
                .bootstrapWith(dataSet)
                .keyBy(acc -> acc.key)
                .transform(new KeyedListStateBootstrapper());

        Savepoint.create(db, 128)
                .withOperator(operatorUid, transformation)
                .write("hdfs://xxx/test/savepoint/");

        // execute program
        env.execute("read the list state");
    }

    static class KeyedListState {
        Integer key;
        List<Integer> value;
    }

    static class ReaderFunction extends KeyedStateReaderFunction<Integer, KeyedListState> {
        private transient ListState<Integer> listState;

        @Override
        public void open(Configuration parameters) {
            ListStateDescriptor<Integer> lsd =
                    new ListStateDescriptor<>("list", TypeInformation.of(Integer.class));
            listState = getRuntimeContext().getListState(lsd);
        }

        @Override
        public void readKey(
                Integer key,
                Context ctx,
                Collector<KeyedListState> out) throws Exception {
            List<Integer> li = new ArrayList<>();
            listState.get().forEach(new Consumer<Integer>() {
                @Override
                public void accept(Integer integer) {
                    li.add(integer);
                }
            });

            KeyedListState kl = new KeyedListState();
            kl.key = key;
            kl.value = li;

            out.collect(kl);
        }
    }

    static class KeyedListStateBootstrapper extends KeyedStateBootstrapFunction<Integer, KeyedListState> {
        private transient ListState<Integer> listState;

        @Override
        public void open(Configuration parameters) {
            ListStateDescriptor<Integer> lsd =
                    new ListStateDescriptor<>("list", TypeInformation.of(Integer.class));
            listState = getRuntimeContext().getListState(lsd);
        }

        @Override
        public void processElement(KeyedListState value, Context ctx) throws Exception {
            listState.addAll(value.value);
        }
    }
}
The key steps here: read the state into a DataSet as in step 2; during the flatMap transformation, add 10000 to every accumulated value; use that DataSet as input to build a BootstrapTransformation; then create an empty savepoint and write the state for the given operatorUid into it. Once the write succeeds, we have a new savepoint whose
state values have changed compared to the original ones.
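The job can then be resubmitted from the new savepoint via the Flink CLI; a minimal sketch, where the jar, the entry class, and the generated savepoint directory name are placeholders:

    bin/flink run -s hdfs://xxx/test/savepoint/savepoint-xxxxxx -c com.example.StateTest state-test.jar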

4. Verifying that the newly produced savepoint is usable
The state being verified is a ListState, in other words a KeyedState, and KeyedState is Flink managed state, meaning Flink itself controls the logic for saving and restoring it. So, to verify that the job starts correctly from the new savepoint, the StateMap from before is rewritten as follows:
public static class StateMap extends RichMapFunction<Tuple2<Integer, Integer>, String> {
    // logger declaration added here; the original snippet referenced an
    // undeclared "log", likely supplied by Lombok's @Slf4j in the source project
    private static final Logger log = LoggerFactory.getLogger(StateMap.class);
    private transient ListState<Integer> listState;

    @Override
    public void open(Configuration parameters) throws Exception {
        ListStateDescriptor<Integer> lsd =
                new ListStateDescriptor<>("list", TypeInformation.of(Integer.class));
        listState = getRuntimeContext().getListState(lsd);
    }

    @Override
    public String map(Tuple2<Integer, Integer> value) throws Exception {
        listState.add(value.f1);
        log.info("get value:{}-{}", value.f0, value.f1);
        StringBuilder sb = new StringBuilder();
        listState.get().forEach(new Consumer<Integer>() {
            @Override
            public void accept(Integer integer) {
                sb.append(integer).append(";");
            }
        });
        log.info("***********************taskNameAndSubTask:{},restored value:{}",
                getRuntimeContext().getTaskNameWithSubtasks(), sb.toString());
        return value.f0 + "-" + value.f1;
    }

    @Override
    public void close() throws Exception {
        listState.clear();
    }
}
The restored state cannot be inspected directly right after recovery, so instead, each time a message arrives, the map function logs the current contents of the state; this serves as an indirect check of
whether the restore succeeded. (The original post showed a screenshot of the log output here.)
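An illustrative, hypothetical log line (actual task names and values depend on the run; the rewritten values carry the +10000 offset, followed by newly arrived values):

    ***********************taskNameAndSubTask:map -> Sink: Print to Std. Out (1/4),restored value:10002;10007;10012;3;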

Comparing the logged output for key 4 with the rewritten savepoint, the restored values are the modified ones, so the verification succeeded.
5. Conclusion
Flink divides state into KeyedState, OperatorState, and BroadcastState, and the State Processor API provides corresponding processing interfaces for each.
In addition, for KeyedState: what happens if the job's parallelism changes? What if the keys change? These questions need further exploration.
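For reference, ExistingSavepoint also exposes readers for the other state kinds. A minimal sketch, where the uid and state names are placeholder assumptions and env, savepointPath, and db are reused from the examples above:

    // read non-keyed operator ListState from the loaded savepoint
    DataSet<Integer> operatorState = Savepoint
            .load(env, savepointPath, db)
            .readListState("my-uid", "list-state", TypeInformation.of(Integer.class));

    // read broadcast state as (key, value) pairs
    DataSet<Tuple2<String, Integer>> broadcastState = Savepoint
            .load(env, savepointPath, db)
            .readBroadcastState("my-uid", "broadcast-state",
                    TypeInformation.of(String.class), TypeInformation.of(Integer.class));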

See the official documentation:
https://flink.apache.org/feature/2019/09/13/state-processor-api.html
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html


Originally published at www.cnblogs.com/029zz010buct/p/11900302.html