Spark Kernel Analysis: Deployment Mode Analysis

1. Deployment mode analysis

1.1 Overview of deployment modes

Spark supports three main distributed deployment modes: standalone, Spark on Mesos, and Spark on YARN. Standalone mode, i.e. independent mode, ships with a complete set of services and can be deployed in a cluster on its own without relying on any other resource management system; it is the resource scheduling framework implemented by Spark itself, and its main nodes are the Client node, the Master node and the Worker nodes. YARN is a unified resource management mechanism on which multiple computing frameworks can run, such as MapReduce and Storm; depending on where the Driver is located in the cluster, Spark on YARN is divided into yarn-client and yarn-cluster modes. Mesos is a more powerful distributed resource management framework that allows a variety of different frameworks, including YARN, to be deployed on it. In general, Spark's running mode depends on the value of the MASTER environment variable (master URL) passed to SparkContext, and some modes also require auxiliary program interfaces. The currently supported Master strings and URLs include:

[Table: supported Master strings and URLs]
When a user submits a task to Spark, the following two parameters jointly determine how Spark runs (a minimal sketch follows the list).
· --master MASTER_URL: determines which cluster the Spark task is submitted to.
· --deploy-mode DEPLOY_MODE: determines where the Driver runs; the optional values are client and cluster.
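For illustration, here is a minimal, self-contained Scala sketch of how the master value reaches the SparkContext from application code. The class name is hypothetical, local[*] is chosen only so the sketch runs without a cluster, and in practice the master and deploy mode are usually supplied by spark-submit rather than hard-coded.

import org.apache.spark.{SparkConf, SparkContext}

object DeployModeDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DeployModeDemo")
      // Common master values (normally passed via spark-submit --master):
      //   local[*]              run locally with one thread per core
      //   spark://host:7077     standalone cluster
      //   yarn                  YARN cluster (--deploy-mode client or cluster)
      //   mesos://host:5050     Mesos cluster
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    println("running against master: " + sc.master)   // prints the resolved master URL
    sc.stop()
  }
}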

1.2 Standalone framework

A standalone cluster is composed of three different kinds of nodes:
1) The Master control node, the "chief helmsman" of the cluster. In the entire cluster there is at most one Master in the Active state.
2) The Worker nodes, the "sub-helmsmen". A cluster can have many Workers; with zero Workers, nothing can be done.
3) The Executors, which do the actual work and are controlled directly by a Worker. One Worker can start multiple Executors, and the number it can start is limited by the number of CPU cores on the machine.
Each of these three different types of nodes runs in its own JVM process.
In Standalone mode, the cluster starts with a Master and Workers. The Master is responsible for receiving jobs submitted by clients and for managing the Workers. Depending on how the job is submitted, the mode is divided into driver-on-client and driver-on-worker. As shown in Figure 7 below, the upper diagram shows the driver-on-worker mode and the lower diagram shows the driver-on-client mode; the main difference between the two is where the driver runs.
The standalone deployment mode is divided into client mode and cluster mode. In client mode, the driver and the client run in the same JVM, which is not started by a worker; this JVM process does not exit until the Spark application finishes its computation and returns the result, as shown below.

[Figure: standalone client mode deployment]
In cluster mode, the driver is started by a worker, and the client exits as soon as it confirms that the Spark application has been successfully submitted to the cluster, without waiting for the application's result to be returned, as shown below.

[Figure: standalone cluster mode deployment]
From the deployment diagrams, consider how the file dependencies of each JVM process are satisfied when it starts:
1) The Master process is the simplest: apart from the Spark jar package, it has no third-party library dependencies.
2) The Driver and Executors may have third-party package dependencies at runtime, so they are discussed separately.
3) The Driver is relatively simple: when submitting with spark-submit, you specify where the dependent jar files are read from.
4) The Executor is started by the Worker, so the Worker needs to download the jar files required to start the Executor; the question is where to download them from.

[Figures: how the Driver and Executor jar dependencies are distributed]
Spark standalone mode, i.e. independent mode, ships with complete services and can be deployed in a cluster on its own without relying on other resource management systems. In this mode, users start an independent cluster by manually starting the Master and Workers: the Master plays the role of resource manager and the Workers act as computing nodes. The Spark Driver program runs on the client, and the Executors run on the Worker nodes.
The following is a deployment architecture diagram of Spark task scheduling and interaction running in Standalone mode, with one Master node and two Worker nodes.

[Figure: Spark task scheduling and interaction in Standalone mode]
From the above Spark task scheduling process, we can see:
1) The entire cluster is divided into a Master node and Worker nodes, and the Driver program runs on the client. The Master node is responsible for allocating computing resources on the Worker nodes to tasks; the two communicate with each other to synchronize resource status, as shown by the red two-way arrow in the figure.
2) After the client starts the task, it runs the Driver program, which completes the initialization of the SparkContext object and registers with the Master.
3) Each Worker node runs one or more ExecutorBackend processes. Each process contains an Executor object, which holds a thread pool; each thread can execute one task. The ExecutorBackend process is also responsible for communicating with the Driver program on the client node and reporting task status (a simplified sketch of this thread-pool model follows).
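To illustrate point 3, here is a deliberately simplified, hypothetical Scala sketch of an executor running tasks on a thread pool. SimpleExecutor and its methods are illustrative names only, not Spark's actual classes, and the status report is a stand-in for the real StatusUpdate message.

import java.util.concurrent.Executors

class SimpleExecutor(numCores: Int) {
  // one worker thread per core: each thread can run one task at a time
  private val pool = Executors.newFixedThreadPool(numCores)

  // launch a task body on the pool and report its state when it finishes
  def launchTask(taskId: Long, body: () => Unit): Unit = {
    pool.submit(new Runnable {
      override def run(): Unit = {
        body()
        reportStatus(taskId, "FINISHED")   // stand-in for the StatusUpdate sent to the driver
      }
    })
  }

  def shutdown(): Unit = pool.shutdown()

  private def reportStatus(taskId: Long, state: String): Unit =
    println(s"task $taskId -> $state")
}

object SimpleExecutorDemo {
  def main(args: Array[String]): Unit = {
    val executor = new SimpleExecutor(numCores = 2)
    (1L to 4L).foreach(id => executor.launchTask(id, () => println(s"running task $id")))
    executor.shutdown()   // let queued tasks finish, then stop the pool
  }
}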

1.2.1 Task running process in Standalone mode

The above process reflects the overall interaction between the client, the Master and the Worker nodes in Spark standalone mode. The specific running process of a task requires a more detailed decomposition, shown by the numbered annotations in the figure.
1) The user starts the Driver process of the application through the bin/spark-submit deployment tool or bin/spark-class. The Driver process initializes the SparkContext object and registers with the Master node.
2) The Master node accepts the Driver program's registration, checks the Worker nodes it manages, and allocates the required Executor computing resources to the Driver program. After a Worker node finishes allocating the Executor, it reports the Executor's status to the Master.
3) After the ExecutorBackend process on the Worker node starts, it registers with the Driver process.
4) After the Driver process divides the job into tasks through the DAGScheduler (stage division), TaskScheduler and related components, it assigns the tasks to the ExecutorBackend processes on the Worker nodes.
5) The ExecutorBackend performs the task computation and reports the task status to the Driver until the task finishes.
6) After all tasks have been processed, the Driver process deregisters from the Master (a tiny driver illustrating this flow is sketched below).
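As a minimal end-to-end sketch, a tiny driver like the one below is enough to exercise this flow: creating the SparkContext triggers the registration of steps 1-3, and the count() action triggers the task division and execution of steps 4-6. The class name and master URL are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object TinyDriver {
  def main(args: Array[String]): Unit = {
    // Registration with the standalone Master happens when the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("TinyDriver")
      .setMaster("spark://master:7077")   // placeholder standalone master URL
    val sc = new SparkContext(conf)
    // The count() action is split into tasks by the DAGScheduler/TaskScheduler
    // and executed by the ExecutorBackends on the Worker nodes.
    val n = sc.parallelize(1 to 1000, numSlices = 4).count()
    println(s"counted $n elements")
    sc.stop()
  }
}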

1.2.2 Summary

Spark can run in standalone mode, which is the operating mode provided by Spark itself. Users can start an independent cluster by manually starting the master and worker processes, or run these daemon processes on a machine for testing. The standalone mode can be used in production environments, which effectively reduces the cost for users to learn and test the Spark framework.
Standalone mode currently only supports simple FIFO scheduling across applications. However, to allow multiple concurrent users, you can limit the maximum amount of resources used by each application; by default, an application requests all CPU cores in the cluster.
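To make that resource cap concrete, here is a minimal Scala sketch, assuming a placeholder standalone master URL; spark.cores.max and spark.executor.memory are standard Spark properties, and the sketch only builds and prints the configuration.

import org.apache.spark.SparkConf

object CappedApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CappedApp")
      .setMaster("spark://master:7077")    // placeholder standalone master URL
      .set("spark.cores.max", "8")         // cap on the total cores this application may take
      .set("spark.executor.memory", "2g")  // memory per executor
    conf.getAll.foreach { case (k, v) => println(s"$k = $v") }
  }
}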
By default, the standalone task scheduler tolerates worker failures (in which case it can transfer failed tasks to other workers). However, scheduling goes through the master, which creates a single point of failure: if the master crashes, no new applications can be created. To solve this problem, you can use ZooKeeper's election mechanism to start multiple masters in the cluster (via the spark.deploy.recoveryMode setting), or use local files to achieve single-node recovery.

1.3 YARN cluster mode

Apache YARN is part of the Apache Hadoop open source project. It was originally designed to solve the resource management problems of the MapReduce computing framework; Hadoop 2.0 uses YARN to separate MapReduce's distributed computing from resource management. Its introduction brought the Hadoop distributed computing system into the platform era: various computing frameworks can run in one cluster and are uniformly managed and scheduled by the resource management system YARN, sharing the whole cluster's resources and improving resource utilization.
YARN has a Master/Slave architecture: ResourceManager/NodeManager. The ResourceManager (RM) is responsible for the unified management and scheduling of the resources on each NodeManager (NM). The Container is the basic unit of resource allocation and scheduling; it encapsulates machine resources such as memory, CPU, disk and network. Each task is assigned a Container and can only execute in, and use the resources of, that Container. The role of the NodeManager is to receive and start application containers, and to report the running status and resource usage of the containers on its node to the RM. The ApplicationMaster is tied to a specific application and is mainly responsible for negotiating with the ResourceManager to obtain appropriate Containers, tracking the status of these Containers and monitoring their progress. The figure below shows the general model of a YARN cluster.
[Figure: general model of a YARN cluster]
There are two ways to deploy Spark on a YARN cluster: yarn-client (the driver runs on the client) and yarn-cluster (the driver runs inside the ApplicationMaster on the cluster). The driver-on-master (yarn-cluster) case is shown in the figure below.

[Figure: Spark on YARN, yarn-cluster mode]
(1) The Spark YARN Client submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, and the program to be run in the Executors.
(2) After the ResourceManager receives the request, it allocates a container for the application on one of the NodeManagers and asks it to start the application's ApplicationMaster in that container. The ApplicationMaster initializes the SparkContext and creates the DAGScheduler and TaskScheduler.
(3) The ApplicationMaster applies to the ResourceManager for containers based on the configuration in the SparkContext. At the same time, the ApplicationMaster registers with the ResourceManager so that users can check the running status of the application through the ResourceManager.
(4) The ResourceManager finds qualified containers in the cluster, and the corresponding NodeManagers start the containers on their nodes; the containers are asked to start Executors.
(5) After an Executor starts, it registers with the ApplicationMaster and receives the tasks assigned by the ApplicationMaster.
(6) After the application finishes, the ApplicationMaster applies to the ResourceManager to deregister and shuts itself down.
The driver-on-client (yarn-client) mode is shown in the figure below:
[Figure: Spark on YARN, yarn-client mode]
(1) The Spark YARN Client applies to YARN's ResourceManager to start the ApplicationMaster. At the same time, the DAGScheduler and TaskScheduler are created during SparkContext initialization.
(2) After receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container to the application, and asks it to start the application's ApplicationMaster in this Container. The difference from yarn-cluster is that this ApplicationMaster does not run the SparkContext; it only contacts the SparkContext for resource allocation.
(3) After the SparkContext in the client finishes initializing, it establishes communication with the ApplicationMaster, registers with the ResourceManager, and applies to the ResourceManager for resources (Containers) based on the task information.
(4) When the ApplicationMaster has obtained the resources, it communicates with the NodeManagers and asks them to start the Containers.
(5) After a Container starts, it registers with the SparkContext in the driver and applies for tasks.
(6) After the application finishes running, the client's SparkContext applies to the ResourceManager to deregister and closes itself.
As can be seen from Figure 11 below, which compares yarn-client and yarn-cluster modes: in yarn-client (driver on client), the ApplicationMaster only applies to YARN for resources for the Executors, after which the client communicates with the containers to schedule the work. If the client is far from the cluster, this mode is not recommended, but it is convenient for interactive operations.
[Figure 11: comparison of yarn-client and yarn-cluster modes]
Spark can run as a cluster, and the available cluster management systems include YARN, Mesos and so on. The core functions of a cluster manager are resource management and task scheduling. Take YARN as an example: YARN works in Master/Slave mode. The ResourceManager (RM) runs on the Master node and is responsible for managing and allocating the resources of the whole cluster. The NodeManager (NM), running on the Slave nodes, is the working node that actually owns resources in the cluster. After we submit a job, the tasks that make up the job are scheduled to the corresponding NodeManagers for execution. In addition, resources are abstracted as Containers on the NodeManager; a Container includes two resources, memory and CPU.
The following is a deployment architecture diagram of Spark task scheduling and interaction running on a YARN cluster, including one ResourceManager node and three NodeManager nodes (two acting as Worker nodes and one as the Master node).

[Figure: Spark task scheduling and interaction on a YARN cluster]
As can be seen from the Spark task scheduling process diagram above:
1) The entire cluster is divided into a Master node and Worker nodes, both of which exist on NodeManager nodes. When the client submits a task, the ResourceManager allocates resources uniformly. The node running the Driver program is called the Master node, and the nodes that perform the specific tasks are called Worker nodes. Changes in resources on a NodeManager node need to be reported to the ResourceManager in a timely manner, as shown by the red two-way arrow in the figure.
2) The Driver program resides on the Master node. The SparkContext object is created in the Driver program and is responsible for communicating with the ExecutorBackend processes on the Worker nodes, managing the tasks on the Worker nodes and synchronizing task progress. In fact, the NodeManagers in YARN are peers, so the Driver program can be scheduled onto any NodeManager node.
3) Each Worker node runs one or more ExecutorBackend processes. Each process contains an Executor object, which holds a thread pool; each thread can execute one task. The ExecutorBackend process is also responsible for communicating with the Driver program on the Master node and reporting task status.

1.3.1 Task running process on the YARN cluster

The above process reflects the overall interaction between the ResourceManager and NodeManager nodes, and between the Master and Worker in Spark cluster mode. The specific running process of a task requires a more detailed decomposition, shown by the numbered annotations in the figure.
1) Users submit applications to the Yarn cluster through the bin/spark-submit deployment tool or bin/spark-class.
2) The ResourceManager of the YARN cluster selects a NodeManager node for the submitted application, allocates the first container, and starts the SparkContext object in the container on that node.
3) The SparkContext object applies to the ResourceManager of the YARN cluster for resources to run Executors.
4) The ResourceManager of the YARN cluster allocates containers to the SparkContext object. The SparkContext communicates with the relevant NodeManagers and starts ExecutorBackend daemons in the obtained containers. After an ExecutorBackend starts, it registers with the SparkContext and applies for tasks.
5) The SparkContext assigns tasks to the ExecutorBackends for execution.
6) The ExecutorBackends execute the tasks and report the running status to the SparkContext in a timely manner.
7) After the tasks finish running, the SparkContext returns the resources to the NodeManagers and deregisters. (The executor resource settings involved in steps 3 and 4 are sketched below.)
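The container requests described in steps 3 and 4 are shaped by the application's executor settings. Below is a minimal sketch with illustrative values; spark.executor.instances, spark.executor.cores and spark.executor.memory are standard Spark-on-YARN properties, and the yarn master itself is normally supplied by spark-submit rather than set in code.

import org.apache.spark.SparkConf

object YarnResourceDemo {
  def main(args: Array[String]): Unit = {
    // These settings shape the Executor containers requested from the ResourceManager.
    val conf = new SparkConf()
      .setAppName("YarnResourceDemo")
      .set("spark.executor.instances", "2")  // number of executor containers to request
      .set("spark.executor.cores", "2")      // cores per executor container
      .set("spark.executor.memory", "2g")    // memory per executor container
    // When submitted with spark-submit --master yarn, a SparkContext built from this
    // conf asks the ResourceManager for the containers configured above.
    conf.getAll.foreach { case (k, v) => println(s"$k = $v") }
  }
}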

1.4 Mesos cluster mode

Mesos is an open source distributed resource management framework under Apache. It originated at the University of California, Berkeley, and was later popularized and used by Twitter. A variety of distributed frameworks can be deployed on Mesos. The architecture of Mesos is shown in Figure 12 below, where Framework refers to an external computing framework, such as Hadoop or Spark; these computing frameworks connect to Mesos by registering with it, so that Mesos can perform unified management and resource allocation.

[Figure 12: Mesos general deployment diagram]
The framework running on Mesos consists of two parts: one is the scheduler, which obtains cluster resources by registering with the Mesos master; the other is the executor processes running on the slave nodes, which execute the framework's tasks. The master decides how many resources to offer to each framework, and the framework's scheduler chooses which of the offered resources to use. When the framework accepts the offered resources, it passes its tasks through the master to run on the slaves that provided the resources. The resource allocation process of Mesos is shown in Figure 13.
[Figure 13: Mesos resource allocation diagram]
(1) Slave1 reports to the Master that it has 4 CPUs and 4 GB of memory available.
(2) The Master sends a Resource Offer to Framework1 describing how many resources Slave1 has available.
(3) The FW Scheduler in Framework1 replies to the Master that it has two tasks to run on Slave1: one task requires <2 CPUs, 1 GB memory> and the other requires <1 CPU, 2 GB memory>.
(4) Finally, the Master sends these tasks to Slave1. Slave1 then still has 1 CPU and 1 GB of memory unused, so the allocation module can offer these remaining resources to Framework2.
Spark can be deployed on Mesos as one of the distributed frameworks. The deployment diagram is similar to the general Mesos framework deployment diagram, as shown in Figure 14, and will not be repeated here.
[Figure 14: Spark on Mesos deployment diagram]

1.5 Differences between the three deployment modes of Spark

Among these three deployment modes, standalone, as Spark's built-in distributed deployment mode, is the simplest and most basic way to deploy Spark applications, so it is not covered further here. The differences between YARN and Mesos are as follows:
(1) As far as the two frameworks themselves are concerned, the YARN framework can be deployed on Mesos; YARN is the more general deployment framework and its technology is more mature.
(2) Mesos's two-level scheduling mechanism can support multiple scheduling modes, while YARN manages cluster resources through the ResourceManager and can only use one scheduling mode. Mesos's two-level scheduling mechanism works as follows: Mesos can host distributed deployment frameworks such as YARN, but it requires every framework it hosts to have a scheduler module responsible for task scheduling within that framework. When a framework wants to run on Mesos, it needs to modify its scheduler to register with Mesos and obtain the resources that Mesos allocates to it; the framework's scheduler then assigns these resources to the tasks inside the framework. In other words, the whole Mesos system adopts a two-level scheduling framework: at the first level, Mesos allocates resources to frameworks; at the second level, each framework's own scheduler allocates those resources to its own internal tasks.
(3) Mesos can implement both coarse-grained and fine-grained resource scheduling and can allocate resources dynamically, while YARN can only implement static resource allocation. Coarse-grained and fine-grained scheduling are defined as follows, with a configuration sketch after the definitions:
Coarse-grained mode: before the program runs, all required resources must be applied for up front (how many resources each executor occupies, and how many executors to run); they cannot be changed while the application is running.
Fine-grained mode: to avoid wasting resources, resources are allocated on demand. As in coarse-grained mode, the executors are started when the application starts, but each executor occupies only the resources it needs for its own operation, without reserving resources for the tasks it will run later. Afterwards, Mesos dynamically allocates resources to each executor; each allocation lets it run a new task, and as soon as a single task finishes, the corresponding resources are released immediately. Each task reports its status to the Mesos slave and Mesos master, which allows finer-grained management and fault tolerance. This scheduling mode is similar to the MapReduce scheduling model in that each task is completely independent. The advantage is that it makes resource control and isolation easy; the disadvantage is also obvious: short jobs have high running latency.
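As a rough sketch of how the mode is selected, the snippet below builds a SparkConf for a Mesos deployment. The Mesos master URL is a placeholder; spark.mesos.coarse and spark.cores.max are standard properties in the Spark versions this article describes (coarse-grained when spark.mesos.coarse is true, fine-grained when false).

import org.apache.spark.SparkConf

object MesosModeDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MesosModeDemo")
      .setMaster("mesos://mesos-master:5050")  // placeholder Mesos master URL
      .set("spark.mesos.coarse", "false")      // false = fine-grained, true = coarse-grained
      .set("spark.cores.max", "4")             // overall core cap, used mainly in coarse-grained mode
    conf.getAll.foreach { case (k, v) => println(s"$k = $v") }
  }
}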
From the differences between YARN and Mesos it can be seen that each has its own advantages and disadvantages. Which framework to choose in practice therefore depends on the company's actual needs, and the existing big data ecosystem can be taken into account. For example, our company deploys Spark with YARN because we already have a relatively mature Hadoop stack, and YARN mode is the more convenient choice.

1.6 Abnormal Scenario Analysis

The above describes the message flow among the nodes under normal circumstances. So if some nodes in the cluster have problems at runtime, can the entire cluster still process the Application's tasks normally?

1.6.1 Exception Analysis 1: Worker exits abnormally

[Figure: worker abnormal exit scenario]
A problem often encountered while Spark is running is that a worker exits abnormally. When a worker exits, what happens in the cluster? See the detailed description below.
1) The worker exits abnormally, for example when someone deliberately kills the worker process with the kill command.
2) Before the worker exits, it kills all the Executors under its control.
3) The worker is supposed to send heartbeat messages to the Master periodically. Now that the worker process is gone, there are no more heartbeats, so the Master notices through its timeout handling that one of its "sub-helmsmen" has left.
4) The saddened Master reports the situation to the corresponding Driver.
5) The Driver confirms that the Executors assigned to it are gone through two signals: one is the notification sent by the Master, and the other is that the Driver did not receive a StatusUpdate from the Executor within the specified time. The Driver therefore removes the registered Executors.

1.6.1.1 Consequence analysis

What are the impacts of a worker exiting abnormally?
1) The Executor exit causes the submitted tasks to fail to finish normally; they will be submitted again for execution.
2) If all workers exit abnormally, the entire cluster becomes unavailable.
3) A corresponding mechanism is needed to restart the worker process, for example using supervisord or runit.

1.6.1.2 Test steps

1) Start Master
2) Start worker
3) Start spark-shell
4) Manually kill the worker process
5) Use jps or ps -ef | grep -i java to view the running Java processes

1.6.1.3 Code processing for abnormal exit

The start function defined in ExecutorRunner.scala

def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() {
      fetchAndRunExecutor()
    }
  }
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = new Thread() {
    override def run() {
      killProcess(Some("Worker shutting down"))
    }
  }
  Runtime.getRuntime.addShutdownHook(shutdownHook)
}

The job of killProcess is to stop the process of the corresponding CoarseGrainedExecutorBackend.
When stopping a worker, the Executors it started must be stopped first.

1.6.1.4 Summary

It should be pointed out in particular that when the worker starts an Executor, it does so through an ExecutorRunner. The ExecutorRunner is an independent thread with a one-to-one relationship to the Executor, which is very important. The Executor runs as an independent process but is closely monitored by its ExecutorRunner.

1.6.2 Exception analysis 2: executor exits abnormally

[Figure: executor abnormal exit scenario]
As the lowest-level "employee" in the standalone cluster deployment mode, what happens if an Executor exits abnormally?
1) The Executor exits abnormally. The ExecutorRunner notices the exit and reports the situation to the Master through an ExecutorStateChanged message.
2) After the Master receives the notification, it is not happy that one of its "little brothers" has run off, so it asks the worker to which the Executor belonged to start a new one.
3) The worker receives the LaunchExecutor command and starts the Executor again.

1.6.2.1 Test steps

1) Start Master
2) Start Worker
3) Start spark-shell
4) Manually kill CoarseGrainedExecutorBackend

1.6.2.2 fetchAndRunExecutor

fetchAndRunExecutor is responsible for starting a specific Executor and monitoring its running status. The specific code logic is as follows

def fetchAndRunExecutor() {
  try {
    // Create the executor's working directory
    val executorDir = new File(workDir, appId + "/" + execId)
    if (!executorDir.mkdirs()) {
      throw new IOException("Failed to create directory " + executorDir)
    }

    // Launch the process
    val command = getCommandSeq
    logInfo("Launch command: " + command.mkString("\"", "\" \"", "\""))
    val builder = new ProcessBuilder(command: _*).directory(executorDir)
    val env = builder.environment()
    for ((key, value) <- appDesc.command.environment) {
      env.put(key, value)
    }
    process = builder.start()

    // Redirect the executor's stdout and stderr to files in its working directory
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    val stderr = new File(executorDir, "stderr")
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for the executor process to exit, then notify the worker of its final state
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker ! ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode))
  } catch {
    case interrupted: InterruptedException => {
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    }
    case e: Exception => {
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
    }
  }
}

1.6.3 Exception Analysis 3: master exits abnormally

[Figure: master abnormal exit scenario]
We have already covered the scenarios where workers and executors exit abnormally. That leaves the last situation: what happens if the Master dies?
What are the consequences if the "big boss" is gone?
1) The workers have no one to report to; if an Executor dies again, the worker will not restart it, because no instruction comes from the Master.
2) New tasks cannot be submitted to the cluster.
3) Even when old tasks finish, the resources they occupied cannot be reclaimed, because the resource cleanup instruction is issued by the Master.

2. A peek into the operating principle of the wordcount program

2.1 Implementing wordcount in Scala on Spark

Implementing wordcount (a model that counts word occurrences) in Scala on Spark is quite simple. The code is more concise than the Java version, and its functional programming style reflects the logic more intuitively.

package com.spark.firstApp

import org.apache.spark.{SparkContext, SparkConf}

/**
  * Created by atguigu, a Scala implementation of wordcount
  */
object WordCount1 {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: WordCount1 <file1>")
      System.exit(1)
    }
    /**
      * 1. Instantiate SparkConf.
      * 2. Build the SparkContext, the single entry point of a Spark application.
      * 3. Read the text file through SparkContext's textFile method.
      */
    val conf = new SparkConf().setAppName("WordCount1").setMaster("local")
    val sc = new SparkContext(conf)

    /**
      * 4. Use flatMap to split the words of each line (the separator is a space),
      *    use map to turn each word into the (K, V) form, then reduceByKey,
      *    and print the first 10 results.
      *    Functional programming reflects the logic of the computation more directly.
      */
    sc.textFile(args(0)).flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).take(10).foreach(println)
    sc.stop()
  }
}

2.2 Principle

The main business logic of running the wordcount program on a Spark cluster is relatively simple and covers the following three steps:
1) Read the text file from the storage medium (generally stored on HDFS).
2) Parse the contents of the text file and count the occurrences of each word.
3) Save the grouped results of step 2 back to the storage medium (generally HDFS or an RDBMS).
Although the business logic of wordcount is very simple, the way the application runs in Spark neatly reflects the core essence of Spark: resilient distributed datasets, in-memory iteration, and functional programming. The following figure analyses the running process of wordcount in a Spark cluster to deepen the understanding of how Spark works.

[Figure: running process of wordcount in a Spark cluster]
The figure is divided horizontally, with the Scala core implementation of wordcount given at the bottom. The running process of this program in the Spark cluster involves several core RDDs, mainly the textFile RDD, flatMap RDD, map RDD and shuffle RDD (reduceByKey).
The application reads the text file on HDFS through the textFile method. The data is split into partitions and loaded onto different physical nodes with the RDD as the unified abstraction, shown in the figure above as node 1, node 2 through node n. It then goes through a series of transformations: flatMap splits each line of the text file (words are separated by spaces) to form a new RDD whose elements are individual words; map turns each word into the (K, V) form required by reduceByKey; reduceByKey triggers a shuffle, causing identical words to be aggregated and counted on the corresponding nodes (in fact, identical words are merged and pre-aggregated locally before the data is shuffled across nodes), producing the wordcount statistics; finally, the data is saved to HDFS through the saveAsTextFile method. The detailed logic and flow are given in the figure above.
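The three-step flow above (read from HDFS, aggregate, write back to HDFS) differs slightly from the earlier WordCount1 example, which only prints the first 10 results. Below is a minimal sketch of the saveAsTextFile variant; the class name is illustrative, the input and output paths are taken from the command line, and the master is assumed to be supplied by spark-submit.

import org.apache.spark.{SparkConf, SparkContext}

object WordCountToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountToHdfs")  // master supplied by spark-submit
    val sc = new SparkContext(conf)
    sc.textFile(args(0))                 // 1) read the input text (e.g. an HDFS path)
      .flatMap(_.split(" "))             //    split each line into words
      .map(word => (word, 1))            //    form (K, V) pairs
      .reduceByKey(_ + _)                // 2) shuffle and sum the counts per word
      .saveAsTextFile(args(1))           // 3) write the result back to the storage medium
    sc.stop()
  }
}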


Origin blog.csdn.net/qq_44696532/article/details/135402631