Flink Detailed Explanation (2): Core Part II
22. You mentioned State just now, so let’s briefly talk about what State is.
In Flink, State is used to store intermediate calculation results or cached data. Depending on whether intermediate results need to be kept, computation can be divided into stateless computation and stateful computation.
- In stream computing, events are generated continuously. If each calculation is independent of the others and does not depend on earlier or later events, so that the same input always produces the same output, it is a stateless computation.
- If a calculation needs to depend on previous or subsequent events, it is a stateful computation.
Typical stateful computations include summation (sum) and data accumulation.
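The distinction can be sketched without Flink at all (the class and method names below are hypothetical, for illustration only): a stateless operation yields the same output for the same input, while a stateful one depends on accumulated intermediate results.

```java
public class StateDemo {
    // Stateless: each event is processed independently; the same
    // input always produces the same output.
    public static long doubleIt(long event) {
        return event * 2;
    }

    // Stateful: the output for each event depends on an intermediate
    // result (the running sum) carried over from previous events.
    public static long[] runningSum(long[] events) {
        long[] out = new long[events.length];
        long state = 0; // intermediate result kept between events
        for (int i = 0; i < events.length; i++) {
            state += events[i];
            out[i] = state;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(doubleIt(3)); // prints 6, no matter what came before
        System.out.println(java.util.Arrays.toString(runningSum(new long[]{1, 2, 3}))); // prints [1, 3, 6]
    }
}
```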
23. What are the states of Flink?
(1) Depending on whether the state is managed by the user or by Flink, state can be divided into Raw State and Managed State.
- Raw State: managed by the user.
- Managed State: managed by Flink itself.
The differences between the two:
- In terms of state management, Managed State is managed by the Flink Runtime: it is stored and restored automatically, and its memory management is optimized. Raw State must be managed and serialized by the user: Flink does not know what structure the data stored in the State has; only the user knows, and the state must ultimately be serialized into a storable form.
- In terms of state data structures, Managed State supports known data structures such as Value, List, and Map. Raw State only supports byte arrays, so all state must be converted to binary byte arrays.
- In terms of recommended usage scenarios, Managed State covers most cases; Raw State is used only when Managed State is not sufficient, for example when a custom operator is required. In actual production, only Managed State is recommended.
(2) Depending on whether the state is scoped to a key, state is divided into KeyedState and OperatorState.
KeyedState Features
- It can only be used in operators on a KeyedStream, and the state is bound to a specific key.
- Each key on the KeyedStream corresponds to one state object. If an operator instance processes multiple keys, it accesses the corresponding multiple state objects.
- KeyedState is stored in the StateBackend.
- It is accessed through the RuntimeContext, so the operator must implement the Rich Function interface.
- It supports multiple data structures: ValueState, ListState, ReducingState, AggregatingState, and MapState.
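As a Flink-free illustration of KeyedState semantics (the class below is hypothetical), each key maps to its own state object, much like a per-key ValueState holding a counter:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedCountDemo {
    // One state object per key, mirroring how each key on a
    // KeyedStream corresponds to its own state.
    private final Map<String, Long> statePerKey = new HashMap<>();

    // Processes one event and returns the updated count for that key.
    public long process(String key) {
        long updated = statePerKey.getOrDefault(key, 0L) + 1;
        statePerKey.put(key, updated); // state is bound to this specific key
        return updated;
    }

    public static void main(String[] args) {
        KeyedCountDemo demo = new KeyedCountDemo();
        System.out.println(demo.process("a")); // prints 1
        System.out.println(demo.process("a")); // prints 2
        System.out.println(demo.process("b")); // prints 1: key "b" has its own state
    }
}
```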
OperatorState Features
- It can be used in all operators, but the entire operator corresponds to only one state.
- When parallelism changes, there are several redistribution schemes: (1) even redistribution; (2) merging all state and giving each instance the full amount.
- The operator must implement the CheckpointedFunction or ListCheckpointed interface.
- Currently only the ListState data structure is supported.
For example, fromElements calls the FromElementsFunction class, which internally uses an operator state of type ListState.
24. Do you understand Flink's broadcast state?
In Flink, the broadcast state is called BroadcastState and is used in the broadcast state pattern. In this pattern, the data of one stream is broadcast to all downstream tasks and stored locally in the operator; the processing of the other stream then depends on the broadcast data. The broadcast state pattern is illustrated below with an example.
The example in the figure above contains two streams. One is a Kafka model stream: the model, trained by machine learning or deep learning, is broadcast to all downstream rule operators, which cache it in Flink's local memory. The other is a Kafka data stream, which receives the test set; it depends on the model from the model stream and uses that model to complete inference over the test set.
The broadcast state must be of type MapState, and the broadcast state pattern is used together with a broadcast process function (such as BroadcastProcessFunction), which provides interfaces for processing both the broadcast stream and the ordinary data stream.
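The pattern can be sketched without Flink (hypothetical class; the two methods loosely mirror the processBroadcastElement/processElement pair of a broadcast process function): rules arriving on the broadcast side are cached in a map, and ordinary events are evaluated against that cache.

```java
import java.util.HashMap;
import java.util.Map;

public class BroadcastDemo {
    // Local cache of broadcast data, mirroring the MapState requirement
    // of Flink's BroadcastState.
    private final Map<String, Double> broadcastState = new HashMap<>();

    // Broadcast side: update the cached rule set.
    public void onBroadcast(String ruleName, double threshold) {
        broadcastState.put(ruleName, threshold);
    }

    // Data side: evaluate an ordinary event against the cached rules.
    public boolean matches(String ruleName, double value) {
        Double threshold = broadcastState.get(ruleName);
        return threshold != null && value >= threshold;
    }

    public static void main(String[] args) {
        BroadcastDemo demo = new BroadcastDemo();
        demo.onBroadcast("min-score", 0.8);      // rule broadcast to this task
        System.out.println(demo.matches("min-score", 0.9)); // prints true
        System.out.println(demo.matches("min-score", 0.5)); // prints false
    }
}
```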
25. What are the Flink state interfaces?
Using state in Flink involves two kinds of state interfaces:
- State operation interface: used to store, write, and update data through the state object itself.
- State access interface (state provider): used to obtain the state object itself from the StateBackend.
1. State operation interface
The state operation interface in Flink serves two kinds of users: application developers and the Flink framework itself, so Flink provides two sets of interfaces.
(1) State interface for developers
The developer-facing State interface only provides basic operations for adding, deleting, and modifying data in the State; users cannot access other runtime information of the state through it. The interface hierarchy is as follows:
(2) Internal State interface
The internal State interface is used by the Flink framework itself. It provides more State methods and can be flexibly extended as needed. Besides access to the data in the State, it also exposes internal runtime information, such as the serializer for the State data, the namespace, the namespace's serializer, and an interface for merging namespaces. The internal State interfaces are named following the pattern InternalXxxState.
2. State access interface
After having the state, how should the developer access the state when customizing UDF (UserDefineFunction, user-defined function)?
State is saved in a StateBackend, but there are different types of StateBackend. Flink therefore abstracts two state access interfaces, OperatorStateStore and KeyedStateStore, so that users writing UDFs do not need to care which StateBackend type is used underneath.
(1) OperatorStateStore interface principle
OperatorState data is stored in memory in the form of a Map; it does not use RocksDBStateBackend or HeapKeyedStateBackend.
(2) KeyedStateStore interface principle
KeyedStateStore data is stored using RocksDBStateBackend or HeapKeyedStateBackend, and the creation and retrieval of state in KeyedStateStore is delegated to the concrete StateBackend; KeyedStateStore itself acts more like a proxy.
26. How is the Flink state stored?
In Flink, state storage is called StateBackend , which has two capabilities:
- Provides the ability to access State during the calculation process, and developers can use the interface of StateBackend to read and write data when writing business logic.
- Ability to persist State to external storage to provide fault tolerance.
Flink state provides three storage methods:
- Memory type: MemoryStateBackend, suitable for verification and testing; not recommended for production use.
- File type: FSStateBackend, suitable for long-running jobs with large-scale state.
- RocksDB type: RocksDBStateBackend, suitable for long-running jobs with large-scale state.
The StateBackends mentioned above are user-facing. Their relationship in Flink is as follows:
At runtime, both MemoryStateBackend and FSStateBackend keep local State in the TaskManager's memory, so both rely on HeapKeyedStateBackend underneath. HeapKeyedStateBackend is internal to the Flink engine, and users do not need to be aware of it.
1. MemoryStateBackend
With MemoryStateBackend, all State data required at runtime is stored in memory on the TaskManager's JVM heap; KV-type State, window operator State, triggers, and so on are stored in hash tables. When a checkpoint is performed, the snapshot of the State is saved in the memory of the JobManager process.
MemoryStateBackend can take snapshots asynchronously (or synchronously; asynchronous is recommended) to avoid blocking operators from processing data.
The memory-based StateBackend is not recommended in production; it can be used for local development, debugging, and testing. Note the following points:
- State is stored in the JobManager's memory and is limited by the JobManager's memory size.
- Each State defaults to 5 MB, adjustable via the MemoryStateBackend constructor.
- Each State cannot exceed the Akka frame size.
2. FSStateBackend
With FSStateBackend, all State data required at runtime is stored in the TaskManager's memory; when a checkpoint is executed, the snapshot of the State is saved to the configured file system.
It can be a distributed or a local file system; example paths:
- HDFS path: " hdfs://namenode:40010/flink/checkpoints "
- Local path: " file:///data/flink/checkpoints "
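As a hedged configuration sketch (using the FsStateBackend class from Flink's pre-1.13 API, whose naming matches this article, and the illustrative HDFS path above):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Configuration fragment: point the file-type state backend at a
// checkpoint directory (path is the illustrative one from the article).
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));
```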
FSStateBackend is suitable for stateful processing tasks with large state, long windows, or large key-value states. Note the following points:
- State data is first stored in the TaskManager's memory.
- The State size cannot exceed the TaskManager's memory.
- The TaskManager asynchronously writes State data to external storage.
Both MemoryStateBackend and FSStateBackend rely on HeapKeyedStateBackend, which holds the State data in memory.
3. RocksDBStateBackend
RocksDBStateBackend is different from the memory and file types. It uses the embedded local database RocksDB to store the streaming computation state on the local disk, so it is not limited by the TaskManager's memory size. When a checkpoint is performed, the state stored in the entire RocksDB is persisted, fully or incrementally, to the configured file system, and a small amount of checkpoint metadata is stored in the JobManager's memory. RocksDB overcomes the limitation of State being bounded by memory and can also persist to a remote file system, making it well suited for production use.
Disadvantages: compared with the memory-based StateBackends, RocksDBStateBackend has a much higher cost for accessing State, which may cause a sharp drop in throughput, possibly to around 1/10 of the original.
Applicable scenarios
- Best suited for stateful processing tasks with large state, long windows, or large key-value states.
- Well suited for high-availability scenarios.
- RocksDBStateBackend is currently the only backend that supports incremental checkpoints, which are ideal for very large state.
Important points
- The total state size is limited by the disk size, not by memory.
- RocksDBStateBackend also requires an external file system to be configured for centrally saving the State.
- RocksDB's JNI API is based on byte arrays, so a single key or a single value cannot exceed 2^31 bytes.
- For applications that use state with merge operations, such as ListState, the state may over time accumulate to more than 2^31 bytes in size, which will cause subsequent queries to fail.
27. How is the Flink state persisted?
First of all, Flink's state must ultimately be persisted to third-party storage to ensure that it can be recovered after a cluster failure or a job crash. RocksDBStateBackend has two persistence strategies:
- Full persistence strategy: RocksFullSnapshotStrategy
- Incremental persistence strategy: RocksIncrementalSnapshotStrategy
1. Full persistence strategy
The full amount of State is written to the state store (such as HDFS) each time. The memory, file, and RocksDB StateBackends all support the full persistence strategy.
The persistence strategy runs asynchronously: each operator starts one independent thread to write its own state to distributed, reliable storage. Since the state may keep being modified during persistence, the memory-based state backend uses CopyOnWriteStateTable to guarantee thread safety, while RocksDBStateBackend relies on RocksDB's snapshot mechanism for thread safety.
2. Incremental persistence strategy
Incremental persistence means that only the incremental part of the State is persisted each time; only RocksDBStateBackend supports incremental persistence.
Flink's incremental checkpoint is based on RocksDB, a KV store built on an LSM-Tree. New data is kept in memory in a structure called the memtable. If two writes have the same key, the later data overwrites the earlier data. Once a memtable is full, RocksDB compresses it and writes it to disk. After the memtable's data has been persisted to disk, it becomes an immutable sstable.
Because sstables are immutable, Flink can determine what has changed in the state by comparing which sstable files have been created and deleted since the previous checkpoint.
To make sure the sstables reflect the current state, Flink triggers a flush operation in RocksDB to force the memtable to be flushed to disk. When Flink performs a checkpoint, it persists the new sstables to HDFS while keeping references to those already uploaded. Flink does not persist all local sstables in this process, because part of the local history has already been persisted in previous checkpoints; it only needs to increase the reference count on those files.
In the background, RocksDB merges sstables and removes duplicate data, then deletes the original sstables and replaces them with the newly merged ones. The new sstable contains the information from the deleted ones. Merging historical sstables into a new one and deleting the old files reduces the number of checkpoint history files and avoids producing a large number of small files.
28. How to clean up the Flink state after it expires?
1. State expires in DataStream
In the DataStream API, a cleanup policy (StateTtlConfig) can be set for each state. The following can be configured:
- Expiration time: if a state has not been accessed for a given time, it is considered expired, similar to a cache.
- Expiration-time update strategy: update on create and write (OnCreateAndWrite), or update on read and write (OnReadAndWrite).
- State visibility: whether an expired but not-yet-cleaned-up value may still be returned (ReturnExpiredIfNotCleanedUp) or never (NeverReturnExpired).
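A minimal sketch of such a configuration, assuming Flink's StateTtlConfig builder API (available since Flink 1.6); the 12-hour TTL and the state name "count" are illustrative:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Configuration fragment: expire state 12 hours after it is created or
// updated, and never return values that have already expired.
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.hours(12))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();

ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("count", Long.class);
descriptor.enableTimeToLive(ttlConfig);
```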
2. State expiration in Flink SQL
Flink SQL generally uses State in stream join and aggregation scenarios. If State is not cleaned up regularly, it will grow too large and cause memory overflow. The cleanup policy is configured as follows:
StreamQueryConfig qConfig = ...
// set the idle state retention time: min = 12 hours, max = 24 hours
qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));