Flink state topics: Keyed State and Operator State

        As we all know, Flink is a stateful computation engine, so learning Flink means understanding state.

        Recently my company had a requirement that calls for Flink state: collect the newly added data from a database.

        Sounds simple, right? At first I thought so too, but it turns out to amount to a dynamic read, which is far from trivial.

  Because the data keeps growing, you need to record the result of each collection run so that the next run can pick up where the last one left off. That is what state is for.

  Enough preamble, on to the substance.

  As for what stateful computation in Flink means, the official answer is: the intermediate results produced during computation are stored inside the Flink program and made available to later Functions or operators.

With the definition understood, let's move on to the next topic.
  

  1. State Types

    Depending on whether the data set is partitioned by key, Flink divides state into two types: Keyed State and Operator State.

    (1) Keyed State

      Keyed State is state associated with a key. It can only be used in Functions and Operators applied to a KeyedStream. Keyed State is a special case of Operator State; the difference is that with Keyed State the data set is first partitioned by key, so each piece of Keyed State corresponds to exactly one combination of Operator and key. Keyed State is managed through Key Groups, which are mainly used to redistribute Keyed State data automatically when operator parallelism changes. At runtime, one instance of a keyed operator may handle the keys of one or more Key Groups.
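
The role of Key Groups during rescaling can be sketched in plain Java with no Flink dependency. This is a simplified model under stated assumptions: the class and method names (KeyGroupSketch, keyGroupFor, operatorIndexFor, assign) are hypothetical, and the hash step is reduced to hashCode() modulo the maximum parallelism, whereas Flink additionally applies a murmur hash to the key's hash code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone model of key-group assignment (not Flink code).
final class KeyGroupSketch {

    // Simplification: map the key's hashCode directly onto a key group.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Each parallel instance owns a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Lists which key groups each operator instance owns at a given parallelism.
    static Map<Integer, List<Integer>> assign(int maxParallelism, int parallelism) {
        Map<Integer, List<Integer>> owned = new TreeMap<>();
        for (int kg = 0; kg < maxParallelism; kg++) {
            int index = operatorIndexFor(kg, maxParallelism, parallelism);
            owned.computeIfAbsent(index, i -> new ArrayList<>()).add(kg);
        }
        return owned;
    }

    public static void main(String[] args) {
        int maxParallelism = 8; // number of key groups, fixed for the job
        System.out.println(assign(maxParallelism, 2)); // {0=[0, 1, 2, 3], 1=[4, 5, 6, 7]}
        System.out.println(assign(maxParallelism, 4)); // {0=[0, 1], 1=[2, 3], 2=[4, 5], 3=[6, 7]}
    }
}
```

Because a key always maps to the same Key Group, changing parallelism only changes which instance owns each group; the state moves group by group rather than key by key.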

    (2) Operator State

      Unlike Keyed State, Operator State is bound only to a parallel operator instance and is independent of the keys in the data elements. Each operator instance holds the state for its share of the data elements. Operator State also supports automatic redistribution of state data when operator parallelism changes.

    In Flink, both Keyed State and Operator State come in two forms: managed state and raw state. (I won't go on about the differences between the two here; it gives me a headache.)
 

  2. Managed Keyed State

    Flink provides several Managed Keyed State types, including ValueState<T>, ListState<T>, and MapState<K, V>.

    (1) Defining a Stateful Function

    The complete example that follows shows how to use ValueState in a RichFlatMapFunction to track the minimum value seen so far in the incoming data.

    

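import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Create a data set of (key, value) elements
DataStream<Tuple2<Integer, Long>> inputStream =
    env.fromElements(Tuple2.of(2, 21L), Tuple2.of(4, 1L), Tuple2.of(5, 4L));
inputStream
    .keyBy(t -> t.f0)
    // Define and create the RichFlatMapFunction: the first type parameter is the
    // input type, the second is the output type (key, value, minimum so far)
    .flatMap(new RichFlatMapFunction<Tuple2<Integer, Long>, Tuple3<Integer, Long, Long>>() {
        private transient ValueState<Long> leastValueState;

        @Override
        public void open(Configuration parameters) {
            ValueStateDescriptor<Long> leastValueStateDescriptor =
                new ValueStateDescriptor<>("leastValueState", Long.class);
            leastValueState = getRuntimeContext().getState(leastValueStateDescriptor);
        }

        @Override
        public void flatMap(Tuple2<Integer, Long> t,
                            Collector<Tuple3<Integer, Long, Long>> collector) throws Exception {
            Long leastValue = leastValueState.value();
            // value() returns null until the first update for the current key
            if (leastValue == null || t.f1 < leastValue) {
                leastValue = t.f1;
                leastValueState.update(leastValue);
            }
            collector.collect(Tuple3.of(t.f0, t.f1, leastValue));
        }
    });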
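
To see what the function above computes without starting a Flink job, the per-key behavior can be imitated in plain Java. This is only a sketch under stated assumptions: a HashMap stands in for ValueState, the class name KeyedMinSketch and its process method are hypothetical, and the extra input elements with key 2 are added purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java imitation of per-key minimum tracking (not Flink code).
final class KeyedMinSketch {
    // Stand-in for ValueState<Long>: one slot per key.
    private final Map<Integer, Long> leastValueByKey = new HashMap<>();

    // Returns {key, value, minimum seen so far for that key}.
    long[] process(int key, long value) {
        Long least = leastValueByKey.get(key); // null plays the role of "no state yet"
        if (least == null || value < least) {
            least = value;
            leastValueByKey.put(key, least);
        }
        return new long[] {key, value, least};
    }

    public static void main(String[] args) {
        KeyedMinSketch sketch = new KeyedMinSketch();
        // First three elements come from the example; the last two are illustrative.
        long[][] input = {{2, 21}, {4, 1}, {5, 4}, {2, 7}, {2, 30}};
        for (long[] element : input) {
            System.out.println(Arrays.toString(sketch.process((int) element[0], element[1])));
        }
    }
}
```

Each key keeps its own minimum, so the element (2, 30) still reports 7 as the minimum for key 2 while keys 4 and 5 are unaffected.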

  3. Managed Operator State

  Operator State is non-keyed state associated with a parallel operator instance. For example, in the Kafka Connector, each Kafka consumer operator instance corresponds to a partition in Kafka and maintains the topic partition and offsets as the operator's Operator State. In Flink, a function can define Managed Operator State by implementing one of two interfaces: CheckpointedFunction or ListCheckpointed.

  (1) Operating on Operator State through the CheckpointedFunction interface

        The CheckpointedFunction interface is defined as:

public interface CheckpointedFunction {
  // Called when a checkpoint is triggered
  void snapshotState(FunctionSnapshotContext context) throws Exception;
  // Called each time the user-defined function is initialized
  void initializeState(FunctionInitializationContext context) throws Exception;
}

  In each operator, Managed Operator State is stored as a List. The state data of different operator instances is independent, and a List is well suited to redistributing data. Flink currently supports two redistribution strategies for Managed Operator State: Even-split Redistribution and Union Redistribution.

  By implementing both FlatMapFunction and CheckpointedFunction, a function can count the number of elements per key in the input data as well as the number of elements processed by each operator instance.

  The initializeState() method creates two states, keyedState and operatorState, which store the count associated with each key and the count for the operator instance, respectively.
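
The difference between the two strategies can be illustrated with a plain-Java simulation. This is a rough sketch, not Flink's implementation: the names RedistributionSketch, evenSplit and union are hypothetical, and the round-robin placement used for the even split is an assumption (only the fact that the combined list is divided evenly, versus handed out in full, is the point).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of list-state redistribution on rescale (not Flink code).
final class RedistributionSketch {

    // Even-split: concatenate all old lists, then divide the elements evenly.
    static List<List<Long>> evenSplit(List<List<Long>> oldState, int newParallelism) {
        List<Long> all = new ArrayList<>();
        oldState.forEach(all::addAll);
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < all.size(); i++) {
            result.get(i % newParallelism).add(all.get(i)); // assumed placement
        }
        return result;
    }

    // Union: every new instance receives the complete concatenated list.
    static List<List<Long>> union(List<List<Long>> oldState, int newParallelism) {
        List<Long> all = new ArrayList<>();
        oldState.forEach(all::addAll);
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three old instances, each holding one count; rescale down to two.
        List<List<Long>> oldState = List.of(List.of(10L), List.of(20L), List.of(30L));
        System.out.println(evenSplit(oldState, 2)); // [[10, 30], [20]]
        System.out.println(union(oldState, 2));     // [[10, 20, 30], [10, 20, 30]]
    }
}
```

With even-split, summing the restored list on each instance keeps the overall total at 60. With union, every instance sees all three counts, so the restore logic has to pick out what it needs rather than sum blindly.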

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.util.Collector;

public class CheckpointCount
        implements FlatMapFunction<Tuple2<Integer, Long>, Tuple3<Integer, Long, Long>>,
                   CheckpointedFunction {
    // Local variable of the operator instance: number of elements it has processed
    private long operatorCount = 0L;
    // keyedState stores the count associated with each key
    private transient ValueState<Long> keyedState;
    // operatorState stores the operator-level count
    private transient ListState<Long> operatorState;

    @Override
    public void flatMap(Tuple2<Integer, Long> t,
                        Collector<Tuple3<Integer, Long, Long>> collector) throws Exception {
        Long previous = keyedState.value(); // null until the first update for this key
        long keyedCount = (previous == null ? 0L : previous) + 1;
        // Update the count in keyedState
        keyedState.update(keyedCount);
        // Update the local operatorCount
        operatorCount = operatorCount + 1;
        // Emit the id, the count for that id, and the count of elements seen by this operator instance
        collector.collect(Tuple3.of(t.f0, keyedCount, operatorCount));
    }

    // Initialize the state data
    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Define and obtain keyedState
        ValueStateDescriptor<Long> keyedDescriptor =
            new ValueStateDescriptor<>("keyedState", Long.class);
        keyedState = context.getKeyedStateStore().getState(keyedDescriptor);
        // Define and obtain operatorState
        ListStateDescriptor<Long> operatorDescriptor =
            new ListStateDescriptor<>("operatorState", Long.class);
        operatorState = context.getOperatorStateStore().getListState(operatorDescriptor);
        // Restore logic: recover operatorCount from operatorState
        if (context.isRestored()) {
            operatorCount = 0L;
            for (Long count : operatorState.get()) {
                operatorCount += count;
            }
        }
    }

    // When a snapshot is taken, store operatorCount in operatorState
    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        operatorState.clear();
        operatorState.add(operatorCount);
    }
}

As the code above shows, the snapshotState() method clears the data that operatorState stored at the previous checkpoint and then adds the operatorCount value to be saved with the current checkpoint. When the job restarts, initializeState() is called to restore keyedState and operatorState, and operatorCount is recovered from the latest operatorState.

(2) Defining Operator State through the ListCheckpointed interface

    Compared with CheckpointedFunction, the ListCheckpointed interface is less flexible: it supports only List-type state, and only the even-split redistribution strategy during recovery.

  It requires implementing the following two methods to operate on Operator State:

  

List<T> snapshotState(long checkpointId,long timestamp) throws Exception;
void restoreState(List<T> state) throws Exception;

  The snapshotState method defines the logic for storing data elements in the checkpoint as a List, and the restoreState method defines the logic for recovering state from the checkpoint.

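import java.util.Collections;
import java.util.List;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.util.Collector;

public class NumberRecordsCount
        implements FlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>,
                   ListCheckpointed<Long> {
    private long numberRecords = 0L;

    @Override
    public void flatMap(Tuple2<String, Long> t, Collector<Tuple2<String, Long>> collector) {
        // Count each incoming record and emit the running total
        numberRecords += 1;
        collector.collect(Tuple2.of(t.f0, numberRecords));
    }

    @Override
    public List<Long> snapshotState(long checkpointId, long timestamp) {
        // Store the current count in the checkpoint as a single-element list
        return Collections.singletonList(numberRecords);
    }

    @Override
    public void restoreState(List<Long> list) {
        numberRecords = 0L;
        for (Long count : list) {
            // Recover numberRecords from the restored state
            numberRecords += count;
        }
    }
}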

 

  Code on its own doesn't stick, which is why I laid so much groundwork up front. I hope this summary is helpful to you.
  
  All of the above draws on the relevant chapter of Zhang Libing's "flink combat, summary and analysis". Note: the original text is written in Scala. Because my company uses Java, I rewrote all of the code in Java. If you find that inconvenient to read, you can refer to the original.
 
  As a Flink newcomer, I have only covered the basics of state here. There may be follow-ups on how state is stored and on Flink state optimization.
 
  Corrections are welcome.


Origin www.cnblogs.com/shaokai7878/p/11285893.html