Flink Standalone cluster installation and deployment

Flink architecture introduction

The installation and deployment of Flink are mainly divided into local (single-machine) mode and cluster mode. Local mode works after simply decompressing the archive, without modifying any parameters, and is generally used for simple tests; it will not be covered further in this course. The cluster modes include:
Standalone.
Flink on Yarn.
Mesos.
Docker.
Kubernetes.
AWS.
Google Compute Engine.
Here we first deploy a Standalone-mode cluster.

Basic structure

The entire Flink system is mainly composed of two components, the JobManager and the TaskManager, and the architecture follows the Master-Slave design principle: the JobManager is the Master node and the TaskManager is the Worker (Slave) node. All components communicate through the Akka framework, including task status and Checkpoint trigger information. The architecture diagram is as follows:
(Figure: Flink architecture diagram)
Client
The client is responsible for submitting tasks to the cluster. It builds an Akka connection to the JobManager, submits the task, and obtains the task execution status by interacting with the JobManager. Clients can submit tasks through the CLI or the Flink WebUI; you can also specify the JobManager's RPC address and port in the application to build an ExecutionEnvironment and submit the Flink application directly.
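For that third path, a minimal sketch of building a remote environment from code is shown below. It assumes the cluster endpoint is server01:8081 (the REST port configured in this article) and the jar path is illustrative; createRemoteEnvironment ships the listed jar to the cluster together with the job.

import org.apache.flink.streaming.api.scala._

object RemoteSubmitSketch {
  def main(args: Array[String]): Unit = {
    // Bind the environment to a remote JobManager instead of a local one;
    // the host, port, and jar path below are assumptions for this sketch.
    val env = StreamExecutionEnvironment.createRemoteEnvironment(
      "server01", 8081, "/opt/apps/flink/appjars/test-1.0-SNAPSHOT.jar")
    env.socketTextStream("server01", 8888).print()
    env.execute("remote-submit-sketch")
  }
}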
JobManager
JobManager is responsible for task scheduling and resource management of the entire Flink cluster. It obtains the submitted application from the client, allocates TaskSlot resources for it according to the TaskSlot usage on the TaskManagers in the cluster, and instructs the TaskManagers to start the application. The JobManager is the Master node of the cluster; there is only one active JobManager at a time, responsible for task management and resource management of the whole cluster.
The JobManager and the TaskManagers communicate through the Actor System to obtain task execution status, which the JobManager also sends on to the client through the Actor System. During task execution, the JobManager triggers Checkpoint operations; each TaskManager node performs its part of the Checkpoint after receiving the trigger instruction, while all Checkpoint coordination happens inside the JobManager. When a task completes, the JobManager feeds the execution result back to the client and releases the TaskManager resources for the next submission.
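From the application side, checkpointing only has to be switched on; the JobManager then does all the coordination described above. A minimal sketch, with an illustrative 5-second interval:

import org.apache.flink.streaming.api.scala._

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Ask the JobManager to trigger a checkpoint every 5000 ms; each
    // TaskManager snapshots its state when the trigger instruction arrives.
    env.enableCheckpointing(5000)
    env.socketTextStream("server01", 8888).print()
    env.execute("checkpoint-sketch")
  }
}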

TaskManager
TaskManager is the Slave node of the cluster, responsible for executing the concrete tasks on each node and for requesting and managing the resources those tasks need. The client compiles and packages the Flink application and submits it to the JobManager; the JobManager then assigns the tasks to TaskManager nodes that have free resources, according to the resource situation of the registered TaskManagers, and starts them. A TaskManager receives the tasks to deploy from the JobManager, starts them with its Slot resources, establishes network connections for data access, receives data, and begins processing; data exchange between TaskManagers happens as data streams.
Flink tasks actually run as multiple threads inside a TaskManager, which is very different from MapReduce's multi-JVM-process model and lets Flink use CPUs much more efficiently. Tasks share the system resources of a TaskManager through TaskSlots: each TaskManager manages a pool of TaskSlots, which allows resources to be managed effectively.
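How many Slots a job occupies follows from its parallelism. A minimal sketch, assuming the 3-Slots-per-TaskManager configuration used later in this article:

import org.apache.flink.streaming.api.scala._

object SlotSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With slot sharing (the default), a job needs about as many Slots as its
    // maximum operator parallelism: parallelism 9 fills 3 TaskManagers x 3 Slots.
    env.setParallelism(9)
    env.fromElements("a", "b", "a").map((_, 1)).print()
    env.execute("slot-sketch")
  }
}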

Installation and deployment

File preparation and deployment planning

We prepare three servers: server01, server02, and server03.
server01 acts as the JobManager (master); server01, server02, and server03 act as TaskManagers (slaves), so server01 is both a master and a slave.
Download the installation file flink-1.9.1-bin-scala_2.12.tgz. Download address: https://download.csdn.net/download/zhangxm_qz/12732760

Upload files to the server and unzip

Upload the file flink-1.9.1-bin-scala_2.12.tgz to the /opt/apps directory on server01 and unzip it, as follows:

[root@server01 apps]# ll
total 750912
lrwxrwxrwx.  1 root  root         11 Jun 14 22:54 flink -> flink-1.9.1
drwxr-xr-x. 10   502 games       156 Sep 30  2019 flink-1.9.1
-rw-r--r--.  1 root  root  246364329 Aug 20  2020 flink-1.9.1-bin-scala_2.12.tgz
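A minimal sketch of the commands that produce the layout above, assuming the archive was uploaded to /opt/apps (the flink symlink is just a convenience for a version-independent path):

[root@server01 apps]# tar -zxvf flink-1.9.1-bin-scala_2.12.tgz
[root@server01 apps]# ln -s flink-1.9.1 flink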

Modify the configuration file

Enter the conf directory under the flink directory and modify the configuration file flink-conf.yaml, changing the following two items (with three TaskManagers at 3 Slots each, the cluster will offer 3 × 3 = 9 Slots in total):

jobmanager.rpc.address: server01   # set to the master node; here server01 is the master
taskmanager.numberOfTaskSlots: 3   # default is 1; set to 3, meaning each TaskManager provides 3 Slots

Modify the contents of the slaves file as follows:

[root@server01 conf]# vi slaves
server01
server02
server03

Distribute installation files to other servers

Copy the modified flink directory to server02 and server03 with the following commands:

[root@server01 apps]# scp -r flink-1.9.1  root@server02:/opt/apps 
[root@server01 apps]# scp -r flink-1.9.1  root@server03:/opt/apps 

Start the cluster

Execute the start-cluster.sh script under bin to start the cluster (the script starts the worker daemons over SSH, so passwordless SSH from server01 to server02 and server03 should already be configured), as follows:

[root@server01 flink]# bin/start-cluster.sh 
Starting cluster.
Starting standalonesession daemon on host server01.
Starting taskexecutor daemon on host server01.
Starting taskexecutor daemon on host server02.
Starting taskexecutor daemon on host server03.
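You can verify the daemons with jps on each node; in Flink 1.9, StandaloneSessionClusterEntrypoint is the JobManager process and TaskManagerRunner is the TaskManager process (the PIDs below are illustrative):

[root@server01 flink]# jps
2720 StandaloneSessionClusterEntrypoint
3056 TaskManagerRunner
3321 Jps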

Visit the WebUI

After a successful startup, visit http://server01:8081 to access the Flink WebUI, as follows:
(Figure: Flink WebUI overview page)

Upload tasks to the cluster

Develop the example program WordCount

The code is as follows:

package com.test.flink.wc

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object StreamWordCount {

  def main(args: Array[String]): Unit = {

    val streamEnv: StreamExecutionEnvironment =
      StreamExecutionEnvironment.getExecutionEnvironment
    // Import the implicit conversions. Importing them here is recommended,
    // as it avoids misleading code-completion errors in IDEA.
    import org.apache.flink.streaming.api.scala._
    // Read the data
    val stream: DataStream[String] = streamEnv.socketTextStream("server01", 8888)
    // Transform and compute
    val result: DataStream[(String, Int)] = stream.flatMap(_.split(","))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    // Print the result to the console
    result.print()
    // Start the streaming job; without this line the program above will not run
    streamEnv.execute("wordcount")
  }
}

Upload tasks to the Flink cluster through commands

Upload the program jar package to server01, as follows:

[root@server01 flink]# ll appjars
total 24
-rw-r--r--. 1 root root 22361 Aug 20  2020 test-1.0-SNAPSHOT.jar
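The jar is simply the build output of the WordCount project above; assuming a standard Maven project, it would typically be built with the command below and then copied into appjars:

mvn clean package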

Execute the command to submit the task

Because the program connects to port 8888 on server01, you must first start the data-sending program with the following command, otherwise the task will fail to start due to a connection failure. Execute the following on server01 (if the command does not exist, run yum install -y nc to install it):

[root@server01 flink]# nc -lk 8888

Submit the jar to the Flink cluster:

[root@server01 flink]# bin/flink run -d -c com.test.flink.wc.StreamWordCount ./appjars/test-1.0-SNAPSHOT.jar 
Starting execution of program
Job has been submitted with JobID 75fd7304caa7d429b343b77dff4ce65d
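Once submitted, the job can also be inspected or cancelled from the CLI; the JobID below is the one printed above:

[root@server01 flink]# bin/flink list
[root@server01 flink]# bin/flink cancel 75fd7304caa7d429b343b77dff4ce65d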

By sending strings to the listener in the terminal, you can watch the task execute in the WebUI:

[root@server01 flink]# nc -lk 8888
a
a
a
a
a
a
a
a

(Figures: running job and task output shown in the Flink WebUI)

Upload tasks to the Flink cluster through the WebUI

You can also upload and submit the task jar through the WebUI, on the Submit New Job page:

(Figure: submitting a job through the WebUI)

Origin: blog.csdn.net/zhangxm_qz/article/details/108124130