Flink Stream-Batch Integrated Computing (5): Deployment and Operation Mode

Table of contents

Cluster operation modes

1. Local mode

2. Standalone mode

3. Flink on YARN mode

Local mode

Standalone mode

Flink on YARN mode


Cluster operation modes

Like Spark, Flink supports several operating modes; three are commonly used: local mode, standalone mode, and Flink on YARN mode.

Each mode suits a specific scenario; let's look at them in turn.

1. Local mode

Local mode is good for testing and debugging. Flink runs on Linux, macOS, and Windows; the only requirement for a local-mode installation is Java 8 or later. A JVM is started at runtime, and everything runs on a single machine, which makes this mode well suited to debugging code.

2. Standalone mode

Standalone mode applies when Flink manages its own resources. Flink ships with its own cluster mode, standalone, in which resource scheduling and management are handled by the Flink cluster itself. A standalone cluster can have one or more master nodes (JobManagers; multiple in HA mode, responsible for resource management and scheduling, task management, task division, and so on) and multiple worker nodes (TaskManagers, which execute the tasks the JobManager hands out).

3. Flink on YARN mode

This mode uses YARN to schedule and manage resources uniformly. In general, local mode is fine for study and research or when resources are scarce; Flink on YARN mode is the most common choice in production environments.

See the next section for the workflow of Flink on YARN task submission.

Local mode

Flink's local mode deployment and installation

Local mode requires no external cluster: local threads simulate the Flink processes, which makes it suitable for testing, development, and debugging. No configuration changes are needed; the only requirement is a properly installed JDK 8.

Prerequisites:

Java 1.8+

Deployment steps:

1. Download a recent stable release and decompress it:

# wget https://archive.apache.org/dist/flink/flink-1.16.0/flink-1.16.0-bin-scala_2.12.tgz

Decompress:

# tar -zxf flink-1.16.0-bin-scala_2.12.tgz

2. Start with the bundled script

In local mode Flink needs no configuration changes; it can be started right after decompression.

Run the following commands to start local mode:

cd /data-ext/flink-1.16.0
bin/start-cluster.sh

To stop local mode:

cd /data-ext/flink-1.16.0
bin/stop-cluster.sh

3. Verify after startup

Run jps to confirm that two processes have started:

# jps

23792 TaskManagerRunner

23514 StandaloneSessionClusterEntrypoint

Web UI access

Once both processes are running, open port 8081 to reach the Flink web management interface:

http://master:8081/#/overview
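
The web UI port also serves Flink's REST API, so you can check the cluster from the command line as well (a quick sanity check, assuming the default port 8081):

# curl http://master:8081/overview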

4. Run Flink's built-in example

On the master, use the Linux nc command to send some words to a socket.

nc is short for netcat, a powerful network tool with a reputation as the Swiss Army knife of networking. On many Linux systems the actual binary is ncat, and nc is a symbolic link to it.

# sudo yum -y install nc
# nc -lk 8000

Open another terminal on the master and start Flink's built-in word-count program, which reads the socket input and computes word statistics:

cd /data-ext/flink-1.16.0
bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname localhost --port 8000

View statistics:

The output of the built-in example is written under the log folder.

Run the following on the master to view the results:

cd /data-ext/flink-1.16.0/log
tail -200f flink-root-taskexecutor-0-VM-0-9-centos.out
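
As an illustration, typing a line such as the following into the nc window:

hello world hello

should make the tail output show per-window counts along these lines (the exact format depends on the example's version):

hello : 2
world : 1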

Standalone mode

Standalone mode is a cluster mode, but it is generally not used in production. Compared with the on-YARN mode:

Standalone mode is relatively simple to deploy and can support small-scale deployments with a small number of tasks;

Standalone mode lacks system-level management of the jobs in the cluster, which easily leads to uneven resource allocation;

Resource isolation is relatively crude, so resource contention between tasks is severe.

Prerequisites:

Prepare two servers: one to manage tasks (JobManager) and one to execute tasks (TaskManager).

One management server is enough; the servers that execute tasks can later be expanded without limit according to actual needs.

Install Java 1.8 on each server and set JAVA_HOME.

Set up passwordless SSH login between the servers.

Server list:

NAME     IP              OS-IMAGE      Java
master   192.168.0.220   el7.x86_64    1.8.0_291
node01   192.168.0.6     el7.x86_64    1.8.0_291
node02   192.168.0.8     el7.x86_64    1.8.0_291

Deployment steps:

1. Decompress the Flink 1.16.0 package

2. Configure system environment variables

# vim /etc/profile

export FLINK_HOME=/data-ext/flink-1.16.0

export PATH=$PATH:$FLINK_HOME/bin

Refresh the system environment variables to make them take effect

# source /etc/profile

3. Edit the conf file

Run cd flink-1.16.0/conf/ to enter the conf directory.

Run vim flink-conf.yaml to edit the core configuration file.

jobmanager.rpc.address

Configure the JobManager RPC address.

Select one node as the master node (JobManager) and set the jobmanager.rpc.address configuration item to that node's IP address or hostname.

Make sure all nodes use the same jobmanager.rpc.address value.

Modify the TaskManager memory size

taskmanager.memory.process.size: 2048m

taskmanager.numberOfTaskSlots

Modify the number of task slots per TaskManager. The number of slots for each server can be configured in the conf file; the default is 1.

We change it to 2. When this value is greater than 1, a TaskManager can use multiple CPU cores and run multiple functions or operators in parallel.

Modify parallelism

parallelism.default: 4
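
Putting these changes together, the edited portion of conf/flink-conf.yaml for this three-node layout might look like the sketch below (assuming master is the JobManager host):

jobmanager.rpc.address: master
taskmanager.memory.process.size: 2048m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 4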

4. Configure the masters file

vim masters

master:8081

5. Edit the workers file

Run vim workers to edit the file that configures the Flink cluster's worker nodes; the default is localhost.

Similar to the HDFS configuration, edit conf/workers and enter the IP address/hostname of each worker node, one per line. Each worker node will later run a TaskManager.

If the master carries a heavy load, you can choose not to use it as a TaskManager node (remove localhost from the file).

# vim workers

node01

node02

6. Distribute the configuration files

Distribute the configuration files to the worker servers via scp.
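
For example, assuming Flink is installed at the same path on every node:

# scp -r /data-ext/flink-1.16.0/conf node01:/data-ext/flink-1.16.0/
# scp -r /data-ext/flink-1.16.0/conf node02:/data-ext/flink-1.16.0/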

7. Start and stop the service

Start the cluster:

bin/start-cluster.sh

Check the processes with jps.
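
If the cluster came up correctly, the processes should be distributed as follows (PIDs will differ):

On master:
# jps
StandaloneSessionClusterEntrypoint

On node01 and node02:
# jps
TaskManagerRunner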

Shut down the cluster:

bin/stop-cluster.sh

8. HA configuration

8.1 Server node layout

Master node   Slave node   Deployment method
master        node01       Standalone-HA

8.2 Configure environment variables

# vim /etc/profile

export HADOOP_HOME=/data-ext/hadoop-3.2.4

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Refresh the environment variables to make them take effect:

# source /etc/profile

8.3 Edit conf/flink-conf.yaml to configure Flink

# vim conf/flink-conf.yaml
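
The HA-related entries below are a minimal sketch, assuming ZooKeeper listens on master:2181 and HDFS is reachable at master:8020; adjust the addresses and paths to your environment:

high-availability: zookeeper
high-availability.storageDir: hdfs://master:8020/flink/ha/
high-availability.zookeeper.quorum: master:2181
high-availability.cluster-id: /standalone-ha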

8.3.1 Configure ZooKeeper

Create a snapshot storage directory; run the following in the FLINK_HOME directory:

# mkdir -p tmp/zookeeper

Modify the zoo.cfg configuration under conf

# vim zoo.cfg

# The directory where the snapshot is stored.

dataDir=/data-ext/flink-1.16.0/tmp/zookeeper

# The port at which the clients will connect

clientPort=2181

# ZooKeeper quorum peers

server.1=master:2888:3888
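
Flink ships with helper scripts that start and stop the ZooKeeper quorum defined in conf/zoo.cfg. Run them from the FLINK_HOME directory:

# bin/start-zookeeper-quorum.sh
# bin/stop-zookeeper-quorum.sh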

Flink on YARN mode

In Flink on YARN mode, YARN schedules the Flink tasks; this mode is widely used in enterprises today. Its advantage is that it makes full use of cluster resources and improves machine utilization: a single Hadoop cluster can run MR and Spark jobs as well as Flink jobs, which keeps operation convenient and maintenance simple. Flink on YARN mode depends on a Hadoop cluster, and the Hadoop version must be 2.2 or above.

When starting a new Flink YARN Client session, the client first checks whether the requested resources (containers and memory) are available. Afterwards, it uploads the Flink configuration and JAR files to HDFS.

Next, the client requests a YARN container in which to start the ApplicationMaster. The JobManager and the ApplicationMaster (AM) run in the same container. Once they have started successfully, the AM knows the JobManager's address and generates a new Flink configuration file for the TaskManagers (so they can connect to the JobManager); this file is also uploaded to HDFS. In addition, the AM container serves Flink's web interface. The ports Flink uses are offset by the user and application ID, which allows users to run multiple YARN sessions in parallel.

The AM then allocates containers for Flink's TaskManagers, which download the JAR files and the modified configuration from HDFS. Once these steps are complete, Flink is installed and ready to accept jobs.

Deployment steps:

1. Modify the conf/flink-conf.yaml configuration and add the following two items:

#Number of restart attempts when a submitted job fails

yarn.application-attempts: 4

#Spread tasks evenly across all nodes

cluster.evenly-spread-out-slots: true

2. Download the Hadoop dependency package and copy it into Flink's lib directory:

 flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar
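
Alternatively, recent Flink versions recommend exposing the Hadoop classpath instead of using the uber jar (assuming the hadoop command is on the PATH):

# export HADOOP_CLASSPATH=`hadoop classpath`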

3. Start the test (session mode)

Session-Cluster mode initializes a Flink cluster (called a Flink yarn-session) in YARN in advance, carving out the specified resources; all subsequent Flink jobs are submitted to it. This Flink cluster stays resident in YARN until it is stopped manually. A cluster created this way monopolizes its resources: whether or not any Flink job is running, other YARN applications cannot use them.

# Execute on the master node

bin/yarn-session.sh -d -jm 1024 -tm 1024 -s 1

-jm indicates the JobManager memory size (in MB by default)

-tm indicates the memory size of each TaskManager (in MB by default)

-s indicates the number of slots per TaskManager

-d means run as a background (detached) process

Note: jobs submitted from now on all execute through this session and do not request additional YARN resources per job.

View the list of running YARN applications:

yarn application -list

4. Test

1. Create a text file, words.txt:

we think we can do it by ourself.

we can not think we can guess ourself.

think think think

we can think, so we can do it.

2. Upload the file to HDFS:

# hdfs dfs -copyFromLocal words.txt /

3. Submit a job on YARN through the session:

# bin/flink run examples/batch/WordCount.jar --input hdfs://master:8020/words.txt
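
By default the WordCount example prints its result to stdout. To also write the result back to HDFS, pass --output as well (the output path here is an arbitrary example):

# bin/flink run examples/batch/WordCount.jar --input hdfs://master:8020/words.txt --output hdfs://master:8020/wordcount-result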

5. Close the session mode and kill the running application:

yarn application -kill <ApplicationId>


Origin blog.csdn.net/victory0508/article/details/131361901