This series includes:
- [Big Data] Detailed Explanation of Flink (1): Basics
- [Big Data] Detailed Explanation of Flink (2): Core Part I
- [Big Data] Detailed Explanation of Flink (3): Core Part II
- [Big Data] Detailed Explanation of Flink (4): Core Part III
- [Big Data] Detailed Explanation of Flink (5): Core Part IV
- [Big Data] Detailed Explanation of Flink (6): Source Code Part I
Detailed Explanation of Flink (5): Core Part IV
- 45. Do you understand the Flink broadcast mechanism?
- 46. Do you understand Flink backpressure?
- 47. What are the effects of Flink back pressure?
- 48. How to solve Flink back pressure?
- 49. What data types does Flink support?
- 50. How does Flink perform serialization and deserialization?
- 51. Why does Flink use autonomous memory instead of JVM memory management?
- 52. How does Flink's autonomous memory manage objects?
- 53. Tell me about the Flink memory model?
- 54. How does Flink manage resources?
45. Do you understand the Flink broadcast mechanism?
As the figure shows, a broadcast variable is a shared variable. It is stored in the memory of the TaskManager, so it should not be too large. Once a data set has been broadcast, every Task on a node can read it, and each node holds only one copy. Without broadcasting, each Task keeps its own copy of the data set, which wastes memory.
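The saving can be made concrete with a little arithmetic. The sketch below is not Flink code; the cluster layout and the 50 MB data set size are hypothetical numbers chosen only to compare one copy per Task with one copy per TaskManager.

```python
def dataset_copies(tasks_per_taskmanager, use_broadcast):
    """Count in-memory copies of a shared data set across the cluster.

    tasks_per_taskmanager: number of Tasks running on each TaskManager.
    """
    if use_broadcast:
        # One shared copy per TaskManager that hosts at least one Task.
        return sum(1 for n in tasks_per_taskmanager if n > 0)
    # Without broadcasting, every Task holds its own copy.
    return sum(tasks_per_taskmanager)


layout = [4, 4, 2]   # hypothetical: 3 TaskManagers running 10 Tasks in total
size_mb = 50         # hypothetical data set size

print(dataset_copies(layout, use_broadcast=False) * size_mb)  # 500
print(dataset_copies(layout, use_broadcast=True) * size_mb)   # 150
```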
46. Do you understand Flink backpressure?
Backpressure is a very common problem in real-time computing applications, especially in stream processing. It means that some node in the data pipeline has become a bottleneck: its processing rate cannot keep up with the rate at which the upstream sends data, so the upstream has to be throttled. Real-time applications usually use a message queue to decouple producers from consumers, and the consumer side reads its data source in a pull-based manner, so backpressure usually propagates from the bottleneck node back to the data source and lowers the ingestion rate of the source (for example, a Kafka consumer).
Put simply, the downstream cannot consume data as fast as the upstream sends it. Once the queue between them is full, upstream production blocks, and eventually ingestion at the data source blocks as well.
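A tick-based toy model can make this behaviour visible. It is only a sketch of a bounded buffer between a fast producer and a slow consumer, not Flink's network stack; the rates and the buffer capacity are hypothetical.

```python
from collections import deque


def simulate(ticks, produce_per_tick, consume_per_tick, capacity):
    """Model a producer and a consumer joined by a bounded buffer.

    Returns (records_consumed, emit_attempts_that_had_to_wait).
    """
    buffer = deque()
    consumed = 0
    blocked = 0
    for _ in range(ticks):
        # Upstream tries to emit; anything that does not fit is held back.
        for _ in range(produce_per_tick):
            if len(buffer) < capacity:
                buffer.append(1)
            else:
                blocked += 1
        # Downstream drains the buffer at its own rate.
        for _ in range(min(consume_per_tick, len(buffer))):
            buffer.popleft()
            consumed += 1
    return consumed, blocked


print(simulate(ticks=10, produce_per_tick=3, consume_per_tick=1, capacity=5))  # (10, 16)
print(simulate(ticks=10, produce_per_tick=1, consume_per_tick=3, capacity=5))  # (10, 0)
```

In the first run the buffer fills after two ticks, and from then on the producer is effectively limited to the consumer's rate of one record per tick.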
47. What are the effects of Flink back pressure?
Backpressure affects two metrics: checkpoint duration and state size.
(1) Checkpoint duration grows because a checkpoint barrier never overtakes ordinary data. When data processing is blocked, the barrier also takes longer to flow through the whole data pipeline, so the overall checkpoint time (End to End Duration) becomes longer.
(2) State size grows because, to guarantee EOS (Exactly-Once Semantics), an Operator with two or more input channels must align the checkpoint barriers (Alignment). After the barrier of a faster input channel arrives, the data behind it is cached but not processed until the barrier of the slower input channel also arrives. This cached data is put into state, so the state becomes larger.
These two effects are very dangerous for production jobs, because checkpoints are the key to data consistency. A longer checkpoint may time out and fail, and a larger state may also slow the checkpoint down, or even cause stability problems such as an OOM (when using a heap-based StateBackend) or physical memory usage that exceeds the container's resources (when using RocksDBStateBackend).
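The barrier alignment in point (2) can be sketched as a small replay function. This is a simplified model rather than Flink's implementation: the event format, the `BARRIER` marker, and the `SNAPSHOT` placeholder are hypothetical, and the function only tracks how many records are held back while the operator waits for the slower channel.

```python
BARRIER = "BARRIER"


def align(events, num_channels):
    """Replay (channel, item) arrivals through aligned-checkpoint logic.

    Returns (processing_order, peak_buffered): the order in which items reach
    the operator and the largest number of records held back at once.
    """
    blocked = set()    # channels whose barrier has already arrived
    buffered = []      # records held back from blocked channels
    processed = []
    peak = 0
    for channel, item in events:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                # All barriers arrived: the snapshot happens here, then the
                # held-back records are released in arrival order.
                processed.append("SNAPSHOT")
                processed.extend(buffered)
                buffered.clear()
                blocked.clear()
        elif channel in blocked:
            buffered.append(item)
            peak = max(peak, len(buffered))
        else:
            processed.append(item)
    return processed, peak


events = [(0, "a1"), (1, "b1"), (0, BARRIER), (0, "a2"), (0, "a3"),
          (1, "b2"), (1, BARRIER), (0, "a4")]
order, peak = align(events, num_channels=2)
print(order)  # ['a1', 'b1', 'b2', 'SNAPSHOT', 'a2', 'a3', 'a4']
print(peak)   # 2
```

The longer the slower channel's barrier is delayed, for example by backpressure, the more records accumulate in `buffered`.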
48. How to solve Flink back pressure?
The Flink community proposed FLIP-76, which introduces unaligned checkpoints to decouple the checkpoint mechanism from the backpressure mechanism.
To solve the back pressure, the first thing to do is to locate the node that caused the back pressure. There are two main methods:
- Through the back pressure monitoring panel that comes with Flink Web UI
- Flink Task Metrics
(1) Back pressure monitoring panel
The backpressure monitoring in the Flink Web UI works at SubTask level. It periodically samples the stack traces of the Task thread and measures how often the thread is blocked while requesting a buffer (meaning it is blocked by the downstream queue). This frequency is used to judge whether the node is under backpressure. In the default configuration, a frequency below 0.1 is OK, 0.1 to 0.5 is LOW, and more than 0.5 is HIGH.
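The classification can be written down as a small function. This is a sketch of the thresholds quoted above, not the Web UI's actual code; how the exact boundary values 0.1 and 0.5 are treated is an assumption, and `blocked_ratio` is a hypothetical helper that turns stack-trace samples into a frequency.

```python
def blocked_ratio(samples):
    """samples: booleans, True when a sampled stack trace showed the Task
    thread blocked while requesting a buffer."""
    return sum(samples) / len(samples)


def backpressure_level(ratio):
    """Map a blocked ratio in [0, 1] to the status label."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if ratio < 0.1:
        return "OK"
    if ratio <= 0.5:   # assumption: 0.1 and 0.5 themselves count as LOW
        return "LOW"
    return "HIGH"


samples = [True] * 30 + [False] * 70   # hypothetical: 30 of 100 samples blocked
print(backpressure_level(blocked_ratio(samples)))  # LOW
```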
(2) Task Metrics
The Task Metrics provided by Flink are a better means of monitoring backpressure.
- If the send-side Buffer usage of a Subtask is high, the Subtask is being throttled by downstream backpressure.
- If the receive-side Buffer usage of a Subtask is high, the Subtask is passing backpressure on to its upstream.
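Combining the two rules gives a simple way to locate the bottleneck: look for the Subtask whose receive side is full while its send side is not. The sketch below assumes the usage values have already been collected into plain dictionaries. The keys `inPoolUsage` and `outPoolUsage` follow the names of Flink's buffer pool metrics, and the 0.8 threshold is a hypothetical cut-off, not a Flink default.

```python
def find_bottleneck(subtasks, high=0.8):
    """Return the name of the first Subtask that looks like the backpressure source.

    subtasks: dicts ordered from source to sink, each with 'name',
    'inPoolUsage' and 'outPoolUsage' values in [0, 1].
    """
    for task in subtasks:
        input_full = task["inPoolUsage"] >= high
        output_full = task["outPoolUsage"] >= high
        # Full input but free output: this Subtask cannot keep up with its
        # input, and nothing downstream is holding it back.
        if input_full and not output_full:
            return task["name"]
    return None


pipeline = [
    {"name": "source", "inPoolUsage": 0.0, "outPoolUsage": 1.0},
    {"name": "map",    "inPoolUsage": 1.0, "outPoolUsage": 1.0},
    {"name": "window", "inPoolUsage": 1.0, "outPoolUsage": 0.1},
    {"name": "sink",   "inPoolUsage": 0.0, "outPoolUsage": 0.0},
]
print(find_bottleneck(pipeline))  # window
```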
49. What data types does Flink support?
The data types supported by Flink are shown in the figure below:
From the figure, we can see that Flink types can be divided into basic types (Basic), arrays (Arrays), composite types (Composite), auxiliary types (Auxiliary), and generic and other types (Generic). Flink supports arbitrary Java or Scala types.
50. How does Flink perform serialization and deserialization?
Serialization and deserialization mean the following:
- Serialization: Converting an in-memory object into a binary string, forming a data stream that can be transmitted over the network or persisted.
- Deserialization: Converting a binary string back into an in-memory object.
TypeInformation is the core class of the Flink type system.
In Flink, when data needs to be serialized, the createSerializer() method of the corresponding TypeInformation is called to create a TypeSerializer, and the TypeSerializer provides the serialization and deserialization capabilities.
The serialization process of Flink is shown in the following figure:
For most data types, Flink can automatically generate the corresponding serializers, which serialize and deserialize data sets very efficiently, as shown in the following figure:
Examples are BasicTypeInfo and WritableTypeInfo. For the GenericTypeInfo type, however, Flink uses Kryo for serialization and deserialization. Tuple, Pojo, and CaseClass types are composite types that may nest one or more other data types. Their serializers are therefore composite as well: they delegate the serialization of each nested type to that type's serializer.
The following case illustrates Flink serialization and deserialization:
As shown in the figure above, a Tuple3 object is created with three fields: one of type int, one of type double, and one of type Person. The Person object contains two fields: id of type int and name of type String.
- During serialization, each field is delegated to its own serializer. As the figure shows, Tuple3 serializes the int field with IntSerializer, and the int occupies only four bytes. The Person class is treated as a Pojo object and handled by PojoSerializer, which stores some attribute information in one byte. Its fields are likewise serialized with their corresponding serializers. In the serialized result, all the data is backed by MemorySegment.
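The delegation pattern of the Tuple3 case can be imitated in a few lines. This is an illustrative sketch, not Flink's serializers: the class names only mirror the roles described above, and the byte layout (big-endian values, a 4-byte string length prefix, a zero header byte for the Person) is an assumption made for the example.

```python
import struct


class IntSerializer:
    def serialize(self, value):
        return struct.pack(">i", value)             # 4 bytes

    def deserialize(self, data, pos):
        return struct.unpack_from(">i", data, pos)[0], pos + 4


class DoubleSerializer:
    def serialize(self, value):
        return struct.pack(">d", value)             # 8 bytes

    def deserialize(self, data, pos):
        return struct.unpack_from(">d", data, pos)[0], pos + 8


class StringSerializer:
    def serialize(self, value):
        raw = value.encode("utf-8")
        return struct.pack(">i", len(raw)) + raw    # length prefix + bytes

    def deserialize(self, data, pos):
        (length,) = struct.unpack_from(">i", data, pos)
        start = pos + 4
        return data[start:start + length].decode("utf-8"), start + length


class CompositeSerializer:
    """Delegates every field to that field's own serializer."""

    def __init__(self, field_serializers, header=b""):
        self.field_serializers = field_serializers
        self.header = header

    def serialize(self, fields):
        parts = [s.serialize(v) for s, v in zip(self.field_serializers, fields)]
        return self.header + b"".join(parts)

    def deserialize(self, data, pos=0):
        pos += len(self.header)
        values = []
        for s in self.field_serializers:
            value, pos = s.deserialize(data, pos)
            values.append(value)
        return tuple(values), pos


# Person(id: int, name: String) with a 1-byte header, nested in a Tuple3.
person = CompositeSerializer([IntSerializer(), StringSerializer()], header=b"\x00")
tuple3 = CompositeSerializer([IntSerializer(), DoubleSerializer(), person])

data = tuple3.serialize((1, 2.5, (7, "Ann")))
print(len(data))                    # 24 = 4 + 8 + (1 + 4 + 4 + 3)
print(tuple3.deserialize(data)[0])  # (1, 2.5, (7, 'Ann'))
```

The outer serializer never needs to know how a Person is laid out; it only calls the nested serializer and continues at the position that serializer returns.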
What role does MemorySegment play?
In Flink, objects are serialized into pre-allocated memory blocks. Each block is a MemorySegment, which represents a fixed-length piece of memory with a default size of 32 KB. MemorySegment is the smallest unit of memory allocation in Flink and is equivalent to a byte array in Java. Each record is stored in serialized form in one or more MemorySegments.
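How a record ends up in one or more fixed-size segments can be sketched with plain byte arrays. `SegmentPool` and `write_record` are hypothetical names used for illustration; Flink's real MemorySegment API is different and considerably richer.

```python
SEGMENT_SIZE = 32 * 1024   # 32 KB, the default size mentioned above


class SegmentPool:
    """Hands out fixed-size bytearrays that stand in for MemorySegment."""

    def __init__(self, num_segments, segment_size=SEGMENT_SIZE):
        self.segment_size = segment_size
        self.free = [bytearray(segment_size) for _ in range(num_segments)]

    def write_record(self, payload):
        """Copy serialized bytes into as many segments as needed.

        Returns a list of (segment, bytes_used) pairs.
        """
        needed = -(-len(payload) // self.segment_size)   # ceiling division
        if needed > len(self.free):
            raise MemoryError("not enough free segments")
        used = []
        for i in range(needed):
            chunk = payload[i * self.segment_size:(i + 1) * self.segment_size]
            segment = self.free.pop()
            segment[:len(chunk)] = chunk
            used.append((segment, len(chunk)))
        return used


pool = SegmentPool(num_segments=4)
parts = pool.write_record(b"x" * 70000)
print([n for _, n in parts])   # [32768, 32768, 4464]
print(len(pool.free))          # 1
```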
51. Why does Flink use autonomous memory instead of JVM memory management?
Because the JVM faces several problems when large amounts of data are kept in memory (for caching and efficient processing), including the following:
- Java objects have low storage density. A Java object in memory consists of 3 main parts: object header, instance data, and alignment padding. For example, an object containing only a boolean attribute occupies 16 bytes: the object header takes 8 bytes, the boolean attribute takes 1 byte, and 7 more bytes are added to align to a multiple of 8. In fact, only 1 bit (1/8 byte) is needed.
- Full GC can greatly affect performance. Especially for a JVM that has been given a large memory space to handle larger data, a GC (Garbage Collection) can take seconds or even minutes.
- OOM issues affect stability. OutOfMemoryError is a problem often encountered in distributed computing frameworks. When the size of all objects in the JVM exceeds the memory allocated to the JVM, an OutOfMemoryError occurs and the JVM crashes, which affects both the robustness and the performance of the distributed framework.
- Cache miss problem. When the CPU performs calculations, it obtains data from the CPU cache. Modern CPUs have multi-level caches and load data in units of a Cache Line. If objects are stored contiguously, Cache Misses are greatly reduced, so the CPU can focus on business processing instead of idling.
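The storage-density example is simple arithmetic and can be checked directly. The function below is a sketch that uses the 8-byte header and 8-byte alignment from the example above; actual header sizes depend on the JVM and its settings.

```python
def object_size(header_bytes, field_bytes, alignment=8):
    """Return (total_bytes, padding_bytes) after padding to the alignment."""
    raw = header_bytes + field_bytes
    padding = (-raw) % alignment
    return raw + padding, padding


size, padding = object_size(header_bytes=8, field_bytes=1)
print(size, padding)   # 16 7
print(size * 8)        # 128 bits occupied to carry 1 bit of information
```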
52. How does Flink's autonomous memory manage objects?
Flink does not keep a large number of objects in heap memory. Instead, it serializes objects into pre-allocated memory blocks called MemorySegment. A MemorySegment represents a fixed-length piece of memory (32 KB by default), is the smallest unit of memory allocation in Flink, and provides very efficient read and write methods. Many operations can work directly on the binary data without deserializing it. Each record is stored in serialized form in one or more MemorySegments. If more data needs to be processed than fits in memory, Flink's operators spill part of the data to disk.
53. Tell me about the Flink memory model?
Flink's overall memory class diagram is as follows:
It mainly includes the JobManager memory model and the TaskManager memory model.
(1) JobManager memory model
In 1.10, Flink unified memory management and configuration on the TM (TaskManager) side. Correspondingly, in 1.11, Flink reworked the memory configuration on the JM (JobManager) side so that its options and configuration methods are consistent with those on the TM side.
(2) TaskManager memory model
Flink 1.10 made major changes to the TaskManager memory model and to the configuration options of Flink applications, allowing users to control their memory overhead more tightly.
- JVM Heap (JVM heap memory)
  - Framework Heap Memory (framework heap memory): The memory used by the Flink framework itself, that is, the on-heap memory occupied by the TaskManager itself. It is not counted in the resources of the Slot. Configuration parameter: taskmanager.memory.framework.heap.size = 128MB, default 128 MB.
  - Task Heap Memory (Task heap memory): The on-heap memory used by a Task when executing user code. Configuration parameter: taskmanager.memory.task.heap.size.
- Off-Heap Memory (off-heap memory)
  - Direct Memory (direct memory)
    - Framework Off-Heap Memory: The memory used by the Flink framework itself, that is, the off-heap memory occupied by the TaskManager itself. It is not counted in the resources of the Slot. Configuration parameter: taskmanager.memory.framework.off-heap.size = 128MB, default 128 MB.
    - Task Off-Heap Memory (Task off-heap memory): The off-heap memory used by a Task when executing user code. Configuration parameter: taskmanager.memory.task.off-heap.size = 0, default 0.
    - Network Memory: The off-heap memory used for network data exchange, such as the network exchange buffers.
  - Managed Memory: The off-heap memory managed by Flink. It is used for sorting, hash tables, caching intermediate results, and the native memory of the RocksDB State Backend.
- JVM Specific Memory (the memory used by the JVM itself)
  - JVM Metaspace (JVM metaspace)
  - JVM Overhead (JVM execution overhead): The memory required by the JVM itself during execution, including thread stacks, IO, and compilation caches. Configuration parameters: taskmanager.memory.jvm-overhead.min = 192MB, taskmanager.memory.jvm-overhead.max = 1GB, taskmanager.memory.jvm-overhead.fraction = 0.1.
- Overall memory
  - Total Process Memory: The total memory consumed by the Flink Java application (including user code) and by the JVM that runs the whole process. Total process memory = memory used by Flink + JVM metaspace + JVM execution overhead. Configuration item: taskmanager.memory.process.size: 1728m.
  - Total Flink Memory: The memory consumed by the Flink Java application only, including user code but excluding the memory the JVM allocates for its own operation. Total Flink memory = framework heap and off-heap memory + Task heap and off-heap memory + Network memory + Managed memory.
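The two formulas above can be turned into a small calculator that derives every component from the total process size. It is a simplified sketch, not Flink's resolution logic, and covers only the case where nothing but the process size is configured. The framework and JVM overhead values come from the list above. The metaspace size of 256 MB, the Network fraction of 0.1 with bounds of 64 MB and 1 GB, and the Managed fraction of 0.4 are assumptions that should be checked against the defaults of the Flink version in use.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def taskmanager_memory(process_mb,
                       metaspace_mb=256,                            # assumed default
                       overhead_fraction=0.1, overhead_min_mb=192, overhead_max_mb=1024,
                       framework_heap_mb=128, framework_off_heap_mb=128,
                       task_off_heap_mb=0,
                       network_fraction=0.1,                        # assumed default
                       network_min_mb=64, network_max_mb=1024,      # assumed bounds
                       managed_fraction=0.4):                       # assumed default
    """Derive each TaskManager memory component (in MB) from the process size."""
    overhead = clamp(round(process_mb * overhead_fraction),
                     overhead_min_mb, overhead_max_mb)
    # Total Flink memory = total process memory - metaspace - JVM overhead.
    flink_total = process_mb - metaspace_mb - overhead
    network = clamp(round(flink_total * network_fraction),
                    network_min_mb, network_max_mb)
    managed = round(flink_total * managed_fraction)
    # Task heap is whatever remains of the total Flink memory.
    task_heap = (flink_total - framework_heap_mb - framework_off_heap_mb
                 - task_off_heap_mb - network - managed)
    if task_heap < 0:
        raise ValueError("process size too small for the configured components")
    return {
        "jvm_metaspace": metaspace_mb,
        "jvm_overhead": overhead,
        "flink_total": flink_total,
        "framework_heap": framework_heap_mb,
        "task_heap": task_heap,
        "framework_off_heap": framework_off_heap_mb,
        "task_off_heap": task_off_heap_mb,
        "network": network,
        "managed": managed,
    }


for name, mb in taskmanager_memory(1728).items():
    print(f"{name}: {mb} MB")
```

With a process size of 1728 MB and these assumptions, the JVM overhead is lifted to its 192 MB minimum, total Flink memory comes to 1280 MB, and 384 MB remain for the Task heap.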
54. How does Flink manage resources?
Flink manages resources at two layers: cluster resources and its own resources. For cluster resources, Flink supports the mainstream resource management systems, such as Yarn, Mesos, and K8s, as well as independently started Standalone clusters. Flink's own resources concern the resource usage of each subtask of a task and are maintained by Flink itself.
1. Analysis of Cluster Architecture
A running Flink deployment consists mainly of a Client, a JobManager (hereinafter referred to as JM), and one or more TaskManagers (referred to as TM or Worker).
- Client: The client is mainly used to submit tasks to the cluster. In Session or Per Job mode, the client program is also responsible for parsing the user code and generating the JobGraph; in Application mode, it only needs to submit the user jar and the execution parameters. The client generally supports two modes: in detached mode, the client exits automatically after submitting; in attached mode, the client blocks after submitting and exits only once the task has finished.
- JobManager: JM decides when the tasks of an application are scheduled and what to do when a task finishes or fails, and it coordinates checkpoints and failure recovery. The process mainly consists of the following parts:
  - ResourceManager: Responsible for requesting and releasing resources and for managing slots (the finest-grained resource management unit in a Flink cluster). Flink provides several RM implementations to fit different resource management frameworks, such as Yarn, Mesos, K8s, or Standalone. In Standalone mode, the RM can only allocate slots and cannot start new TMs. Note: the RM here is not the same thing as the RM in Yarn; it is an independent service inside JM.
  - Dispatcher: Provides a REST interface for submitting tasks to Flink, starts a new JobMaster for each submitted task, provides the Web UI for all tasks, and answers queries about task execution status.
  - JobMaster: Responsible for managing and executing a single JobGraph. Multiple tasks can run in a cluster at the same time, each with its own JobMaster. Note the difference between JobMaster and JobManager.
- TaskManager: TM, also known as worker, executes the tasks of the data flow graph and caches and exchanges data. A cluster has at least one TM. The smallest resource management unit in a TM is the slot. Each slot can execute one task, so the number of slots in a TM is the number of tasks it can execute at the same time.
2. Slot and resource management
Each TM is an independent JVM process that executes one or more tasks in separate threads. To control the execution resources of each task, the TM uses task slots. Each task slot represents a fixed share of the resources in the TM. For example, if a TM has 3 slots, each slot gets 1/3 of the memory resources. Tasks in different slots do not preempt each other's resources. Note that the CPU is not currently isolated; at present, slots can only divide memory resources.
For example, after the following data flow graph is expanded into a parallel flow graph, a task may be split into multiple subtasks that execute in parallel in the cluster. An operator chain can combine several different subtasks so that they execute in one thread, without frequently releasing and requesting threads. An operator chain can also buffer data uniformly, which increases processing throughput and reduces processing latency.
In Flink, several conditions need to be met to chain different subtasks:
- The downstream node has exactly 1 incoming edge (which guarantees that there is no data shuffle);
- The upstream and downstream subtasks are not empty;
- The chaining strategy is ALWAYS;
- The partitioner type is ForwardPartitioner;
- The parallelism is consistent;
- The Chain feature is currently enabled in Flink.
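The conditions above translate directly into a predicate. The sketch below uses hypothetical `Operator` and `Edge` records rather than Flink's job graph classes. Checking the ALWAYS strategy on the downstream operator while only rejecting NEVER on the upstream one is an assumption about how the strategy condition is applied.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int
    chaining_strategy: str = "ALWAYS"   # "ALWAYS", "HEAD" or "NEVER"


@dataclass
class Edge:
    upstream: Operator
    downstream: Operator
    partitioner: str                    # e.g. "FORWARD", "HASH", "REBALANCE"
    downstream_in_edges: int = 1        # number of inputs of the downstream node


def can_chain(edge, chaining_enabled=True):
    """Return True when the edge satisfies every chaining condition."""
    up, down = edge.upstream, edge.downstream
    return (
        chaining_enabled
        and up is not None and down is not None
        and edge.downstream_in_edges == 1
        and down.chaining_strategy == "ALWAYS"
        and up.chaining_strategy != "NEVER"      # assumption, see lead-in
        and edge.partitioner == "FORWARD"
        and up.parallelism == down.parallelism
    )


source = Operator("source", parallelism=4, chaining_strategy="HEAD")
mapper = Operator("map", parallelism=4)
keyed = Operator("window", parallelism=4)

print(can_chain(Edge(source, mapper, "FORWARD")))   # True
print(can_chain(Edge(mapper, keyed, "HASH")))       # False: data is shuffled
```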
The execution graph in the cluster may be as follows:
Flink also supports slot sharing: based on the dependencies between tasks, different tasks are assigned to the same slot. This brings several benefits: it is easy to determine the maximum resource configuration the current job needs (the maximum parallelism of any subtask), and it avoids excessive requesting and releasing of slots, which improves slot utilization.
With slot sharing, a single slot may contain a complete task execution pipeline.
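The effect of slot sharing on the number of slots a job needs can be expressed in one function. This is a sketch under the simplifying assumption that all operators belong to one sharing group; the operator names and parallelism values are hypothetical.

```python
def slots_required(operator_parallelism, slot_sharing=True):
    """operator_parallelism: dict mapping operator name to its parallelism."""
    if slot_sharing:
        # One slot can host one subtask of every operator, so the widest
        # operator determines how many slots the job needs.
        return max(operator_parallelism.values())
    # Without sharing, every subtask needs a slot of its own.
    return sum(operator_parallelism.values())


job = {"source": 2, "map": 4, "window": 4, "sink": 1}
print(slots_required(job))                       # 4
print(slots_required(job, slot_sharing=False))   # 11
```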
3. Application execution
A Flink application is a user-written main function, which may contain one or more Flink tasks. These tasks can be executed locally or on a remote cluster, and the cluster can be long-running or started independently. The following task submission modes are currently supported:
- Session Cluster
  - Lifecycle: The cluster is created in advance and runs for a long time, and the client connects to it when submitting a task. Even after all tasks have finished, the cluster keeps running until it is stopped manually, so the lifecycle of the cluster is independent of the tasks.
  - Resource isolation: TM slots are requested by the RM and released automatically when the tasks above finish. Because multiple tasks share the same cluster, they compete for resources such as network bandwidth. If a TM dies, all tasks on it fail.
  - Other aspects: With a cluster created in advance, you do not have to deal with cluster setup every time you use it. This mode suits scenarios where the execution time is very short and the startup time is relatively long, such as interactive query analysis.
- Per Job Cluster
  - Lifecycle: A separate cluster is created for each submitted task. When the client submits a task, it communicates directly with the ClusterManager to request a JM, which runs the submitted task internally. TMs are requested lazily according to the resources the task needs. Once the task has finished, the cluster is reclaimed.
  - Resource isolation: If a task hits a fatal problem, only that task is affected.
  - Other aspects: Because the RM has to request resources and wait for them, the startup time is slightly longer. This mode suits tasks that are relatively large, run for a long time, need long-term stability, and are not sensitive to startup time.
- Application Cluster
  - Lifecycle: Similar to Per Job, except that the main method runs in the cluster. Task submission is very simple: there is no need to start or connect to a cluster. The application is packaged and handed directly to the resource management system, which starts the corresponding EntryPoint. The EntryPoint calls the main method of the user program, parses it and generates the JobGraph, and then starts execution. The lifecycle of the cluster is the same as that of the application.
  - Resource isolation: RM and Dispatcher are application level.