Recently my company had a requirement that called for Flink stateful computation: collect the newly added data from a database.
Sounds simple, right? I thought so at first, but it turns out to amount to reading the database dynamically.
Because the data keeps growing, the result of each collection has to be recorded so that the next run can pick up where the last one stopped. That is what state is for.
Without further ado, let's get to the substance.
As for what stateful computation in Flink is, the official answer is: the intermediate results produced during a computation are stored inside the Flink program and made available to later Functions or operators.
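To make the definition concrete, here is a minimal plain-Java sketch of the incremental collection requirement described above. It does not use any Flink API; the class name `IncrementalCollector` and the idea of tracking a numeric row id are assumptions made purely for illustration. The stored `lastSeenId` plays the role of the intermediate result that a later run reads back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Flink APIs involved.
final class IncrementalCollector {
    // The "state": the largest row id seen by previous runs.
    private long lastSeenId = 0L;

    // Returns only the ids newer than the stored state, then advances the state.
    List<Long> collectNew(List<Long> allIds) {
        List<Long> fresh = new ArrayList<>();
        long max = lastSeenId;
        for (long id : allIds) {
            if (id > lastSeenId) {
                fresh.add(id);
                max = Math.max(max, id);
            }
        }
        lastSeenId = max;
        return fresh;
    }

    long lastSeenId() {
        return lastSeenId;
    }
}
```

Calling `collectNew` twice on a growing list returns only the rows added since the previous call, which is the behavior a stateless computation cannot provide.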
With the definition understood, let's move on to the next topic.
1. State Types
Flink divides state into two types, Keyed State and Operator State, according to whether the data set is partitioned by key.
(1) Keyed State
Keyed State is state associated with a key. It can only be used in Functions and Operators that run on a KeyedStream. Keyed State is a special case of Operator State; the difference is that with Keyed State the data set is first partitioned by key, and each piece of Keyed State corresponds to exactly one combination of Operator and Key. Keyed State is managed through Key Groups, which are mainly used to redistribute Keyed State data automatically when the parallelism of an operator changes. At runtime, one instance of a keyed operator may handle the keys of one or more Key Groups.
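The role of Key Groups during rescaling can be sketched in a few lines of plain Java. This is a simplified illustration, not Flink's actual implementation: the class and method names are hypothetical, and a plain `hashCode()` stands in for the additional hashing Flink applies to keys. The point it shows is that a key always maps to the same key group, while the key group's owner depends on the current parallelism.

```java
// Simplified sketch of key-group routing; not Flink's actual implementation.
final class KeyGroupSketch {
    // Map a key to one of maxParallelism key groups.
    // Assumption: a plain hashCode() stands in for the extra hashing Flink applies.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Map a key group to an operator instance so that each instance
    // owns a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```

With a maximum parallelism of 128, key group 32 belongs to instance 1 at parallelism 4 and to instance 2 at parallelism 8. The state of every key in that group moves together, so redistribution works on whole key groups rather than on individual keys.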
(2) Operator State
2. Managed Keyed State
(1) Defining a Stateful Function
The following complete example shows how to use ValueState in a RichFlatMapFunction to track the minimum value seen so far in the incoming data.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Create the input data set.
    DataStream<Tuple2<Integer, Long>> inputStream =
            env.fromElements(Tuple2.of(2, 21L), Tuple2.of(4, 1L), Tuple2.of(5, 4L));

    inputStream
            .keyBy(t -> t.f0)
            // First type parameter is the input type, second is the output type.
            .flatMap(new RichFlatMapFunction<Tuple2<Integer, Long>, Tuple3<Integer, Long, Long>>() {
                private transient ValueState<Long> leastValueState;

                @Override
                public void open(Configuration parameters) {
                    ValueStateDescriptor<Long> leastValueStateDescriptor =
                            new ValueStateDescriptor<>("leastValueState", Long.class);
                    leastValueState = getRuntimeContext().getState(leastValueStateDescriptor);
                }

                @Override
                public void flatMap(Tuple2<Integer, Long> t,
                                    Collector<Tuple3<Integer, Long, Long>> collector) throws Exception {
                    // value() returns null until the state has been set for this key.
                    Long leastValue = leastValueState.value();
                    if (leastValue != null && t.f1 > leastValue) {
                        collector.collect(Tuple3.of(t.f0, t.f1, leastValue));
                    } else {
                        // The current element is the new minimum for this key.
                        leastValueState.update(t.f1);
                        collector.collect(Tuple3.of(t.f0, t.f1, t.f1));
                    }
                }
            });
3. Managed Operator State
Operator State is non-keyed state that is bound to a parallel operator instance. For example, in the Kafka Connector, each parallel Kafka consumer instance corresponds to a partition in Kafka and maintains the topic partitions and offsets as its Operator State. In Flink, a function can define Managed Operator State by implementing either of two interfaces: CheckpointedFunction or ListCheckpointed.
(1) Operating on Operator State through the CheckpointedFunction interface
The CheckpointedFunction interface is defined as follows:
    public interface CheckpointedFunction {
        // Called when a checkpoint is triggered.
        void snapshotState(FunctionSnapshotContext context) throws Exception;

        // Called each time the user-defined function is initialized.
        void initializeState(FunctionInitializationContext context) throws Exception;
    }
In each operator, Managed Operator State is stored as a List, and the state data of the parallel operator instances is independent of one another. A List is well suited to redistributing state data. Flink currently supports two redistribution strategies for Managed Operator State: Even-split Redistribution and Union Redistribution.
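The difference between the two strategies is easiest to see in a small simulation. The sketch below is plain Java with no Flink APIs; the class and method names are hypothetical, and the round-robin assignment in `evenSplit` is an assumption, since which element lands on which instance is an implementation detail. What it illustrates is that even-split hands each new instance a disjoint share of the concatenated list, while union hands every instance the complete list.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the two strategies; no Flink APIs are used.
final class RedistributionSketch {
    // Concatenate the lists checkpointed by all old operator instances.
    static List<Long> concat(List<List<Long>> oldStates) {
        List<Long> all = new ArrayList<>();
        for (List<Long> s : oldStates) {
            all.addAll(s);
        }
        return all;
    }

    // Even-split: each new instance receives a disjoint share of the elements.
    // Assumption: round-robin assignment stands in for the real splitting logic.
    static List<List<Long>> evenSplit(List<List<Long>> oldStates, int newParallelism) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        List<Long> all = concat(oldStates);
        for (int i = 0; i < all.size(); i++) {
            result.get(i % newParallelism).add(all.get(i));
        }
        return result;
    }

    // Union: each new instance receives the complete list and keeps what it needs.
    static List<List<Long>> union(List<List<Long>> oldStates, int newParallelism) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(concat(oldStates));
        }
        return result;
    }
}
```

Rescaling three old instances holding six elements in total down to two instances gives each new instance three elements under even-split and all six under union.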
By implementing both FlatMapFunction and CheckpointedFunction, a function can count the input elements per key and per operator instance.
The initializeState() method creates two states, keyedState and operatorState, which store the count associated with each key and the count for the operator instance respectively.
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.util.Collector;

    public class CheckpointCount
            implements FlatMapFunction<Tuple2<Integer, Long>, Tuple3<Integer, Long, Long>>,
                       CheckpointedFunction {
        // Local variable of the operator instance: number of elements it has processed.
        private long operatorCount;
        // Keyed state: the count associated with the current key.
        private transient ValueState<Long> keyedState;
        // Operator state: the count stored for the operator instance.
        private transient ListState<Long> operatorState;

        @Override
        public void flatMap(Tuple2<Integer, Long> t,
                            Collector<Tuple3<Integer, Long, Long>> collector) throws Exception {
            // value() returns null the first time a key is seen.
            Long previous = keyedState.value();
            long keyedCount = (previous == null ? 0L : previous) + 1;
            // Update the per-key count.
            keyedState.update(keyedCount);
            // Update the local per-operator count.
            operatorCount = operatorCount + 1;
            // Emit the id, the count for that id, and the count for this operator instance.
            collector.collect(Tuple3.of(t.f0, keyedCount, operatorCount));
        }

        // Initialize the state data.
        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            // Define and obtain keyedState.
            ValueStateDescriptor<Long> keyedDescriptor =
                    new ValueStateDescriptor<>("keyedState", Long.class);
            keyedState = context.getKeyedStateStore().getState(keyedDescriptor);
            // Define and obtain operatorState.
            ListStateDescriptor<Long> operatorDescriptor =
                    new ListStateDescriptor<>("operatorState", Long.class);
            operatorState = context.getOperatorStateStore().getListState(operatorDescriptor);
            // Restore logic: rebuild operatorCount from the entries in operatorState.
            if (context.isRestored()) {
                operatorCount = 0L;
                for (Long count : operatorState.get()) {
                    operatorCount += count;
                }
            }
        }

        // When a snapshot is taken, replace the old entry with the current operatorCount.
        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            operatorState.clear();
            operatorState.add(operatorCount);
        }
    }
As the code above shows, the snapshotState() method first clears the data that the previous checkpoint stored in operatorState, and then adds the operatorCount value that the current checkpoint needs to record. When the job restarts, the initializeState method is called to restore keyedState and operatorState, and operatorCount is recovered from the latest operatorState.
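The clear-then-add pattern and the restore step can be traced without a running cluster. The following plain-Java stand-in is only a sketch: the class `CountingOperatorSketch` is hypothetical, and an ordinary list plays the role that ListState plays in a real job. It mirrors the two steps described above, replacing the stored value on every snapshot and summing the restored entries on recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the lifecycle; the list field plays the role of ListState.
final class CountingOperatorSketch {
    // Working value updated for every record.
    private long operatorCount = 0L;
    // What a checkpoint would persist.
    private final List<Long> operatorState = new ArrayList<>();

    void process() {
        operatorCount += 1;
    }

    // Mirrors snapshotState(): drop the previous checkpoint's value, store the current one.
    List<Long> snapshot() {
        operatorState.clear();
        operatorState.add(operatorCount);
        return new ArrayList<>(operatorState);
    }

    // Mirrors the isRestored branch of initializeState(): sum whatever entries are handed back.
    void restore(List<Long> restored) {
        operatorState.clear();
        operatorState.addAll(restored);
        operatorCount = 0L;
        for (long c : restored) {
            operatorCount += c;
        }
    }

    long count() {
        return operatorCount;
    }
}
```

Without the `clear()` call, every checkpoint would append another entry, and a later restore would add up stale values from earlier checkpoints. Summing on restore also covers the case where a smaller parallelism hands one instance the entries of several former instances.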
(2) Defining Operator State through the ListCheckpointed interface
Compared with CheckpointedFunction, the ListCheckpointed interface is less flexible: it supports only List-type state, and only the even-split redistribution strategy when data is restored.
The following two methods must be implemented to operate on Operator State:
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
    void restoreState(List<T> state) throws Exception;
The snapshotState method defines the logic for storing data elements in the List when a checkpoint is taken, and the restoreState method defines the logic for recovering state from a checkpoint.
    public class NumberRecordsCount
            implements FlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>,
                       ListCheckpointed<Long> {
        private long numberRecords = 0L;

        @Override
        public void flatMap(Tuple2<String, Long> t, Collector<Tuple2<String, Long>> collector) {
            // Count each incoming record and emit the running total.
            numberRecords += 1;
            collector.collect(Tuple2.of(t.f0, numberRecords));
        }

        @Override
        public List<Long> snapshotState(long checkpointId, long timestamp) {
            return Collections.singletonList(numberRecords);
        }

        @Override
        public void restoreState(List<Long> list) {
            numberRecords = 0L;
            for (Long count : list) {
                // Recover numberRecords from the restored state.
                numberRecords += count;
            }
        }
    }