Flink task submission parameters

Advice found online about these parameters is inconsistent and often impractical.

  1. Parameter settings for submitting a task to YARN:

Parameter: -n (number of TaskManagers)

Suggested value: number of nodes * (4 to 8). Obsolete as of Flink 1.10.

Description: This parameter sets the number of Flink TaskManagers. A running Flink cluster consists of one JobManager and several TaskManagers, and each TaskManager is an independent process. When a Flink application runs, it is assigned to a TaskManager at random, so adding TaskManagers lets more Flink applications run in parallel without interfering with each other. Cluster resources are limited, however: the more TaskManagers there are, the fewer resources each one can be given, which degrades Flink performance. As an empirical value, set the number of TaskManagers to between 4 and 8 times the number of cluster nodes.

Parameter: -s (slots)

Suggested value: configure according to the actual scenario.

Description: This parameter sets the number of slots owned by a single TaskManager. After a Flink task is submitted, the total number of slots available in the cluster is slots * TaskManagers. Each slot gives its TaskManager one unit of parallelism. For example, with slot=30 a TaskManager can run at most 30 parallel tasks. In practice you may run only 20, in which case the remaining 10 slots sit idle. Idle slots do not consume CPU, but they do hold memory. Adjust this parameter to match the parallelism actually in use; when memory is plentiful, a somewhat larger value is acceptable.

Parameter: -tm (TaskManager memory)

Suggested value: 30000

Description: This parameter sets the memory allocated to a single TaskManager. Once a TaskManager has enough memory for its workload, allocating more does not directly improve performance. After the JobManager memory has been set aside, the remaining memory can all go to the TaskManagers. As an empirical value, allocate about 30000 MB.

Parameter: -jm (JobManager memory)

Suggested value: 5000

Description: This parameter sets the memory allocated to the JobManager. It has almost no effect on overall performance and does not need to be large. As an empirical value, allocate 5000 MB to 15000 MB.
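The arithmetic behind the -n and -s descriptions can be sketched in a few lines. This is only an illustration of the rules of thumb stated above; the class and method names are hypothetical and are not part of Flink.

```java
// Hypothetical helper; illustrates the sizing rules from the text, not a Flink API.
final class SlotMath {
    /** Empirical TaskManager range from the text: nodes * 4 up to nodes * 8. */
    static int[] taskManagerRange(int nodes) {
        return new int[] {nodes * 4, nodes * 8};
    }

    /** Total slots in the cluster = slots per TaskManager * number of TaskManagers. */
    static int totalSlots(int slotsPerTaskManager, int taskManagers) {
        return slotsPerTaskManager * taskManagers;
    }

    /** Slots left idle on one TaskManager; they use no CPU but still hold memory. */
    static int idleSlots(int slotsPerTaskManager, int usedParallelism) {
        if (usedParallelism > slotsPerTaskManager) {
            throw new IllegalArgumentException("parallelism exceeds available slots");
        }
        return slotsPerTaskManager - usedParallelism;
    }

    public static void main(String[] args) {
        int[] range = taskManagerRange(3);
        System.out.println("TaskManagers for 3 nodes: " + range[0] + " to " + range[1]);
        System.out.println("Total slots (30 slots, 12 TaskManagers): " + totalSlots(30, 12));
        System.out.println("Idle slots (30 slots, 20 in use): " + idleSlots(30, 20));
    }
}
```

With slot=30 and 20 parallel tasks, as in the example in the text, the helper reports 10 idle slots.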

 

  1. Tuning
    1. Increase CPU usage while reducing additional performance overhead
      1. Choose a garbage collector that suits the business model, analyze the GC log, and set a reasonable heap partition size and number of GC threads to reduce full GC operations.
      2. Set a reasonable number of TaskManagers and a reasonable number of slots per TaskManager, so that each node runs an appropriate degree of task parallelism. Do not set the number of slots too high, or the extra threads add overhead.
      3. Set an appropriate number of partitions for each operator so that memory pressure does not cause frequent GC.
      4. To prevent data skew, use interfaces such as rebalance to distribute the data evenly.
      5. Set the number of network buffers with "taskmanager.network.numberOfBuffers" according to the volume of data being processed.
      6. Optimize the business logic to reduce computation and IO. Filter out unnecessary data early, reuse memory where possible, and avoid repeated computation.
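For the "taskmanager.network.numberOfBuffers" setting, older Flink documentation gave a rule of thumb: slots per TaskManager squared, times the number of TaskManagers, times 4. The sketch below only evaluates that rule. The class and method names are hypothetical, and newer Flink versions size network memory as a fraction with minimum and maximum bounds instead, so treat the result as a starting point rather than a recommended value.

```java
// Hypothetical helper; evaluates the rule of thumb from older Flink documentation.
final class NetworkBuffers {
    /** slotsPerTaskManager^2 * taskManagers * 4, computed in long to avoid overflow. */
    static long suggestedBuffers(int slotsPerTaskManager, int taskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4L;
    }

    public static void main(String[] args) {
        // Example: 4 slots per TaskManager, 10 TaskManagers.
        System.out.println("Suggested buffers: " + suggestedBuffers(4, 10));
    }
}
```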

 

  1. JVM GC memory
    1. GC configuration: in the client's "flink-conf.yaml" configuration file, add the following parameters to the "env.java.opts" configuration item. These are JDK 8 logging flags; JDK 9 and later replace them with the unified -Xlog:gc* option.

"-Xloggc:<LOG_DIR>/gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M"

 

  1. Resource parameter configuration
    1. core:heap_memory = 1:4, that is, 1 core corresponds to 4G of memory. For example:
      • If the core parameter is 1CU and the heap memory parameter is 3G, the final resource allocation is 1CU+4G.
      • If the core parameter is 1CU and the heap memory parameter is 5G, the final resource allocation is 1.25CU+5G.
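One reading of the two examples above is that whichever side falls short of the 1:4 ratio is rounded up to match the other. The sketch below implements that reading. The rule is inferred from the examples rather than taken from a specification, and the class and method names are hypothetical.

```java
// Hypothetical helper; the padding rule is inferred from the two examples in the text.
final class ResourceRatio {
    static final double GB_PER_CORE = 4.0;

    /** Returns {cores, memoryGb} after raising whichever side is short of the 1:4 ratio. */
    static double[] allocate(double requestedCores, double requestedHeapGb) {
        double cores = Math.max(requestedCores, requestedHeapGb / GB_PER_CORE);
        double memoryGb = Math.max(requestedHeapGb, requestedCores * GB_PER_CORE);
        return new double[] {cores, memoryGb};
    }

    public static void main(String[] args) {
        double[] a = allocate(1.0, 3.0);
        double[] b = allocate(1.0, 5.0);
        System.out.println(a[0] + "CU+" + a[1] + "G");
        System.out.println(b[0] + "CU+" + b[1] + "G");
    }
}
```

A request of 1CU with 3G comes out as 1CU+4G, and 1CU with 5G comes out as 1.25CU+5G, matching both examples.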
  2. Optimize DataStream operations
    1. In real-time programs, optimize the partitioning or grouping operations on the DataStream. When partitioning causes data skew, reconsider how the data is partitioned.
    2. Avoid non-parallel operations. Some DataStream operations, such as windowAll, run with a parallelism of 1.
    3. Avoid using String keys in keyBy where possible.
  1. Parallelism
    1. Increase the parallelism of tasks to make full use of the computing power of the cluster machines. Generally, set the parallelism to 2-3 times the total number of CPU cores in the cluster.
    2. The parallelism of a Source is tied to the number of upstream partitions. For example, if the upstream has 16 partitions, the Source parallelism can be 16, 8, 4, and so on, but must not exceed 16. (The parallelism of a Source cannot be greater than the number of partitions it reads.)
    3. Operator-level parallelism: the parallelism of an individual operator, data source, or sink can be specified by calling its setParallelism() method.
    4. Execution-environment parallelism: Flink programs run in an execution environment, which defines a default parallelism for all the operators, data sources, and data sinks it executes. This default can be specified by calling setParallelism() on the execution environment.
    5. Client-level parallelism: the parallelism can be set when the client submits the job to Flink. For the CLI client, use the "-p" parameter.
    6. System-level parallelism: set the parallelism.default parameter in flink-conf.yaml.
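When several of these levels are set at once, Flink uses the most specific one: operator, then execution environment, then client, then the system default. The sketch below models that lookup and also lists the Source parallelisms that divide a partition count evenly. It is a standalone illustration with hypothetical names, not Flink code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper; models which parallelism setting wins, not a Flink API.
final class ParallelismResolver {
    /** Most specific setting wins: operator > environment > client > system default. */
    static int effective(OptionalInt operator, OptionalInt environment,
                         OptionalInt client, int systemDefault) {
        if (operator.isPresent()) return operator.getAsInt();
        if (environment.isPresent()) return environment.getAsInt();
        if (client.isPresent()) return client.getAsInt();
        return systemDefault;
    }

    /** Source parallelisms that divide the partition count evenly and never exceed it. */
    static List<Integer> evenSourceParallelisms(int partitions) {
        List<Integer> result = new ArrayList<>();
        for (int p = partitions; p >= 1; p--) {
            if (partitions % p == 0) result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the client (-p 8) and the system default (1) are set here.
        int p = effective(OptionalInt.empty(), OptionalInt.empty(), OptionalInt.of(8), 1);
        System.out.println("Effective parallelism: " + p);
        System.out.println("Even Source parallelisms for 16 partitions: "
                + evenSourceParallelisms(16));
    }
}
```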
  2. Task parameter configuration
    1. With the yarn-session command, add the "-jm MEM" parameter to set the JobManager memory
    2. With the yarn-cluster command, add the "-yjm MEM" parameter to set the JobManager memory
    3. With the yarn-session command, add the "-n NUM" parameter to set the number of TaskManagers (obsolete as of Flink 1.10, as noted above)
    4. With the yarn-cluster command, add the "-yn NUM" parameter to set the number of TaskManagers (obsolete as of Flink 1.10)
    5. With the yarn-session command, add the "-s NUM" parameter to set the number of slots
    6. With the yarn-cluster command, add the "-ys NUM" parameter to set the number of slots
    7. With the yarn-session command, add the "-tm MEM" parameter to set the TaskManager memory
    8. With the yarn-cluster command, add the "-ytm MEM" parameter to set the TaskManager memory
  3. Partition settings
    1. Random partition: distribute elements randomly: dataStream.shuffle();
    2. Rebalance: distribute elements round-robin so that the load is balanced across all partitions. Use this to resolve data skew: dataStream.rebalance();
    3. Rescale: distribute elements round-robin to a subset of the downstream operations. Use this when you want to spread the load from each parallel instance of a source across a subset of mappers without the full repartitioning that rebalance() performs: dataStream.rescale();
    4. Broadcast: send each element to all partitions: dataStream.broadcast();
    5. Custom partition: use a custom Partitioner to choose the target task for each element, partitioning by some characteristic of the data to optimize task execution: dataStream.partitionCustom()
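The effect of these strategies on skewed input can be seen in a small simulation. The sketch below does not use Flink: KeyPartitioner is a local stand-in with the same shape as a custom partitioner (key and partition count in, target channel out), and the round-robin counter is a simplified model of what rebalance() achieves. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Standalone simulation; none of these types are Flink classes.
final class PartitionSketch {
    /** Local stand-in for a custom partitioner: maps a key to a target channel. */
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Counts elements per channel when each element is routed by the given partitioner. */
    static <K> int[] countByPartitioner(List<K> keys, int numPartitions,
                                        KeyPartitioner<K> partitioner) {
        int[] counts = new int[numPartitions];
        for (K key : keys) {
            counts[partitioner.partition(key, numPartitions)]++;
        }
        return counts;
    }

    /** Counts elements per channel under round-robin routing, which ignores the key. */
    static <K> int[] countRoundRobin(List<K> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        int next = 0;
        for (K ignored : keys) {
            counts[next]++;
            next = (next + 1) % numPartitions;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Skewed input: most records share the key "hot".
        List<String> keys = Arrays.asList("hot", "hot", "hot", "hot", "hot", "hot", "a", "b");
        KeyPartitioner<String> byHash = (key, n) -> Math.floorMod(key.hashCode(), n);
        System.out.println("by key hash: " + Arrays.toString(countByPartitioner(keys, 4, byHash)));
        System.out.println("round-robin: " + Arrays.toString(countRoundRobin(keys, 4)));
    }
}
```

Routing by key sends every "hot" record to the same channel, while round-robin spreads the eight records evenly over the four channels.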
  1. Checkpoint settings

https://ververica.cn/developers/introduction-to-state-management-and-fault-tolerance/

 

  1. Flink state

https://ververica.cn/developers/flink-state-best-practices/

  1. Official parameter configuration and tuning:
    1. https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/config.html
    2. https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/memory/mem_tuning.html

 

Origin blog.csdn.net/qq_34387470/article/details/115366459