Flink on YARN deployment: Session-Cluster and Per-Job-Cluster


The Flink on YARN mode relies on YARN to schedule Flink tasks and is currently the most common deployment in enterprises. Its advantage is that it makes full use of cluster resources and improves machine utilization: a single Hadoop cluster can run MapReduce, Spark, and Flink jobs, which keeps operation and maintenance simple. Flink on YARN requires a Hadoop cluster, and the Hadoop version must be 2.2 or above.

Principle introduction

1) When a new Flink YARN client session is started, the client first checks whether the requested resources (containers and memory) are available. It then uploads the Flink configuration and JAR files to HDFS (you can verify this upload; see the sketch after this list).
2) The client requests a YARN container in which to start the ApplicationMaster. The JobManager and the ApplicationMaster (AM) run in the same container. Once they have started, the AM knows the JobManager's address and generates a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager), which is also uploaded to HDFS. The AM container also serves Flink's web interface; the port Flink uses is the one configured by the user plus an offset derived from the application ID, which allows users to run multiple YARN sessions in parallel.
3) The AM then allocates containers for Flink's TaskManagers, which download the JAR files and the modified configuration from HDFS. Once these steps are complete, Flink is up and ready to accept jobs.
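
For illustration, a quick way to see step 1) in action is to list what the client staged on HDFS. This is a minimal sketch: the path assumes the default staging location under the submitting user's HDFS home directory and reuses the application ID of the session shown later in this post; adjust both to your environment.

# List the Flink config and JAR files the YARN client uploaded to HDFS.
# /user/root/.flink/<application-id> is the assumed default staging layout.
hdfs dfs -ls /user/root/.flink/application_1597991682418_0002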

Flink on YARN can be used in two ways: Session-Cluster and Per-Job-Cluster.
Session-Cluster

In this mode, a Flink cluster (the Flink yarn-session) is initialized in YARN ahead of time, reserving a fixed amount of resources, and all future Flink jobs are submitted into it. This Flink cluster stays resident in the YARN cluster until it is stopped manually. A cluster created this way monopolizes its resources: whether or not any Flink job is running, other YARN applications cannot use them.

Per-Job-Cluster

In this mode, a new Flink cluster is created for every Flink job that is submitted. Jobs are independent of each other and do not interfere, which makes them easy to manage. When a job finishes, its Flink cluster disappears and no longer occupies resources; resources are used on demand, which maximizes utilization. This mode is recommended for production use.

Flink on Yarn mode

The Flink on YARN mode requires the following (a combined sketch follows the list):
1) Install and configure the Hadoop cluster, see: https://blog.csdn.net/zhangxm_qz/article/details/106695347
2) Configure the HADOOP_HOME environment variable.
3) Download the Flink-to-Hadoop connector (a jar package) and copy the jar into Flink's lib directory.
My Hadoop version is 2.6.2; the jar package can be downloaded from: https://download.csdn.net/download/zhangxm_qz/12737715
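
For reference, steps 2) and 3) together might look like the following. This is a minimal sketch for my environment: the install paths and the exact connector jar file name are assumptions, so adjust them to your Hadoop and Flink versions.

# Step 2): point the shell at Hadoop; also set HADOOP_CONF_DIR, which the
# Flink YARN client uses to load the Hadoop configuration.
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Step 3): copy the downloaded Flink/Hadoop connector jar into Flink's lib directory.
cp flink-shaded-hadoop-2-uber-*.jar /opt/flink/lib/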

Session-Cluster mode

Start the hadoop cluster

[root@server01 hadoop]# sbin/start-all.sh 
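
Before starting Flink it is worth a quick check that the HDFS and YARN daemons are actually up; a minimal sanity check (which daemons appear in jps depends on each node's role):

[root@server01 hadoop]# jps              # expect NameNode/DataNode and ResourceManager/NodeManager on the relevant nodes
[root@server01 hadoop]# yarn node -list  # lists the NodeManagers registered with the ResourceManager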

Start the Flink cluster with the yarn-session.sh command

[root@server01 flink]# bin/yarn-session.sh -n 3 -s 3 -nm bjsxt1

The yarn-session.sh parameters are described below (a fuller example follows the list):

 -n,--container <arg>            Number of containers to allocate (i.e., the number of TaskManagers).
 -D <arg>                        Dynamic properties.
 -d,--detached                   Run the session detached, in the background.
 -jm,--jobManagerMemory <arg>    Memory for the JobManager, in MB.
 -nm,--name                      Set a custom name for the application on YARN.
 -q,--query                      Display the resources (memory, CPU cores) available in YARN.
 -qu,--queue <arg>               Specify the YARN queue.
 -s,--slots <arg>                Number of slots per TaskManager.
 -tm,--taskManagerMemory <arg>   Memory per TaskManager, in MB.
 -z,--zookeeperNamespace <arg>   Namespace to create on ZooKeeper for HA mode.
 -id,--applicationId <yarnAppId> ID of the YARN application to attach to, for a yarn session running detached in the background.
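
For example, the same session can be started detached with explicit memory settings (the memory sizes here are only illustrative, not tuned values):

# 3 TaskManagers with 3 slots each, 1024 MB for the JobManager and each TaskManager, run detached.
[root@server01 flink]# bin/yarn-session.sh -n 3 -s 3 -jm 1024 -tm 1024 -nm bjsxt1 -d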

Problem
I kept hitting the following error during startup:
Container [] is running beyond virtual memory limits. Current Usage: 200MB of 1GB physical memory used; 2.6GB of 2.1GB virtual memory used. Killing container.
This says the container's virtual memory usage exceeded its limit.
The reason, found online:
YARN allocated 1 GB of physical memory, and the default ratio of virtual to physical memory is 2.1, so only 2.1 GB of virtual memory is allowed and that is not enough; the fix is to raise the virtual-to-physical memory ratio.
Modify Hadoop's yarn-site.xml, add the following property, and restart the Hadoop cluster and Flink (in my case it did not take effect right away; you can check whether the change is active at http://server01:8088/conf):

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>
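
One way to confirm the new value is actually in effect, instead of guessing whether the restart took, is to query the ResourceManager's conf endpoint mentioned above (the grep pattern just assumes the property name appears in the XML output):

[root@server01 hadoop]# curl -s http://server01:8088/conf | grep -A1 vmem-pmem-ratio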

Visit the Flink web UI
After the command above starts the Flink cluster, the web UI address is printed in the log and the Flink web UI can be reached at that address:

JobManager Web Interface: http://server03:36941

In the YARN web UI you can see the application we submitted.

Submit a task
After Flink has started successfully, a temporary properties file appears in the local file system; with this file, jobs can be submitted to the session on YARN:

[root@server01 hadoop]# cat  /tmp/.yarn-properties-root 
#Generated YARN properties file
#Fri Aug 21 02:39:00 EDT 2020
parallelism=9
dynamicPropertiesString=
applicationID=application_1597991682418_0002
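
The client picks this file up automatically, but if it is missing, or you want to target a specific session, you can attach explicitly by application ID, for example:

# Attach the job to a specific running yarn-session via its application id.
[root@server01 flink]# bin/flink run -yid application_1597991682418_0002 -c com.test.flink.wc.StreamWordCount ./appjars/test-1.0-SNAPSHOT.jar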

Command to submit task

[root@server01 flink]# bin/flink run -d -c com.test.flink.wc.StreamWordCount ./appjars/test-1.0-SNAPSHOT.jar
2020-08-21 05:40:25,763 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-root.
2020-08-21 05:40:25,763 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-root.
2020-08-21 05:40:27,148 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 9
2020-08-21 05:40:27,148 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 9
YARN properties set default parallelism to 9
2020-08-21 05:40:27,217 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at server01/192.168.204.10:8032
2020-08-21 05:40:27,445 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-08-21 05:40:27,445 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-08-21 05:40:27,454 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-08-21 05:40:27,560 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'server03' and port '36941' from supplied application id 'application_1597991682418_0002'
Starting execution of program
Job has been submitted with JobID 714f6e03bf0dd2dba7d23db45a5961ab

Through the web UI you can see that the job is running normally.
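
You can also check from the command line; bin/flink list asks the session's JobManager for the running jobs (it locates the session through the same properties file):

[root@server01 flink]# bin/flink list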

Per-Job-Cluster mode

Stop the Flink Session-Cluster with the following command:

[root@server01 flink]# yarn application -kill application_1597991682418_0002
20/08/21 05:47:19 INFO client.RMProxy: Connecting to ResourceManager at server01/192.168.204.10:8032
Killing application application_1597991682418_0002
20/08/21 05:47:20 INFO impl.YarnClientImpl: Killed application application_1597991682418_0002
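
If you do not remember the application ID, you can list the running YARN applications first and pick the session from there:

# Shows running applications with their ids, names (e.g. bjsxt1) and tracking URLs.
[root@server01 flink]# yarn application -list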

Submit task

bin/flink run -m yarn-cluster -yn 3 -ys 3 -ynm bjsxt02  -c com.test.flink.wc.StreamWordCount ./appjars/test-1.0-SNAPSHOT.jar

Parameter description: compared with the yarn-session parameters, most options simply gain a leading y. Details follow (a fuller example comes after the list):

 -yn,--container <arg>             Number of containers to allocate, i.e. the number of TaskManagers.
 -d,--detached                     Run in detached mode (background).
 -yjm,--jobManagerMemory <arg>     Memory for the JobManager, in MB.
 -ytm,--taskManagerMemory <arg>    Memory per TaskManager, in MB.
 -ynm,--name                       Name for this Flink application on YARN.
 -yq,--query                       Display the resources (memory, CPU cores) available in YARN.
 -yqu,--queue <arg>                Specify the YARN resource queue.
 -ys,--slots <arg>                 Number of slots per TaskManager.
 -yz,--zookeeperNamespace <arg>    Namespace to create on ZooKeeper for HA mode.
 -yid,--applicationId <yarnAppId>  ID of the YARN application to attach to, for a YARN session running detached in the background.
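
As a fuller example, a per-job submission with explicit memory settings might look like the following (the memory values are illustrative only):

# Per-job cluster: 3 TaskManagers, 3 slots each, 1024 MB for the JobManager and each TaskManager.
bin/flink run -m yarn-cluster -yn 3 -ys 3 -yjm 1024 -ytm 1024 -ynm bjsxt02 -c com.test.flink.wc.StreamWordCount ./appjars/test-1.0-SNAPSHOT.jar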

After the task is submitted, you can see the application in the YARN web UI.


Origin blog.csdn.net/zhangxm_qz/article/details/108152008