Spark Core Overview

1. Spark core components


1.1 Cluster Manager (Master, ResourceManager)

The Cluster Manager is Spark's cluster manager and is primarily responsible for allocating and managing the resources of the entire cluster:

  • In YARN deployment mode it is the ResourceManager
  • In Mesos deployment mode it is the Mesos Master
  • In Standalone deployment mode it is the Master

The resources allocated by the Cluster Manager belong to the first-level (coarse-grained) allocation: it assigns the memory, CPU and other resources of each Worker to the Application as a whole, but it is not responsible for allocating resources to individual Executors.
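As a rough illustration of what an Application asks the Cluster Manager for, the sketch below sets the usual resource properties on a SparkConf. The master URL and resource values are illustrative assumptions, not values from this post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ResourceRequestExample {
  def main(args: Array[String]): Unit = {
    // The application declares the resources it needs; the Cluster Manager
    // (Master / ResourceManager) grants Worker memory and CPU to the
    // application as a whole.
    val conf = new SparkConf()
      .setAppName("resource-request-example")
      .setMaster("yarn")                     // or "spark://host:7077" for Standalone
      .set("spark.executor.instances", "3")  // assumed example values
      .set("spark.executor.memory", "2g")
      .set("spark.executor.cores", "2")

    val sc = new SparkContext(conf)
    // ... job code would go here ...
    sc.stop()
  }
}
```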

1.2 Worker (Worker, NodeManager)

The Worker is Spark's worker node. In YARN deployment mode it is replaced by the NodeManager. It is mainly responsible for the following:

  • Reporting its own memory, CPU and other resources to the Cluster Manager through a registration mechanism
  • Creating Executors
  • Further allocating resources and tasks to the Executors
  • Synchronizing resource information, Executor status and other information to the Cluster Manager

1.3  Driver

The Driver is Spark's driver node. It executes the main method of a Spark application and is responsible for running the actual user code. During job execution, the Driver is mainly responsible for the following (a minimal driver sketch follows the list):

  • Converting the user program into jobs (Job)
  • Scheduling tasks (Task) among the Executors
  • Tracking the execution of the Executors
  • Displaying the running status of the job through the UI
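A minimal sketch of the kind of main method the Driver executes; the input path is a hypothetical placeholder. Transformations only describe the job; the action is what makes the Driver submit tasks to the Executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverExample {
  // The Driver runs this main method: it builds RDDs lazily and only
  // submits a job to the Executors when an action such as count() is called.
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-example"))

    val lines  = sc.textFile("hdfs:///tmp/input.txt")  // hypothetical input path
    val longer = lines.filter(_.length > 10)           // transformation: no job yet

    val n = longer.count()                              // action: the Driver turns this
                                                        // into a job and schedules tasks
    println(s"lines longer than 10 chars: $n")
    sc.stop()
  }
}
```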

1.4  Executor

The Executor is the node responsible for running specific tasks within a Spark job; the tasks are independent of each other. When a Spark application starts, the Executor nodes are started at the same time and exist for the entire life cycle of the application. If an Executor node fails or crashes, the Spark application can still continue to execute: the tasks on the failed node are rescheduled to run on other Executor nodes. The Executor has two core functions:

  • Running the tasks that make up the Spark application and returning the results to the Driver
  • Providing in-memory storage, through its own Block Manager, for RDDs that the user program asks to cache. The RDD cache lives directly inside the Executor process, so tasks can use the cached data at run time to speed up computation (see the caching sketch below).
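A minimal caching sketch, assuming a hypothetical input path: persist() asks each Executor's Block Manager to keep the computed partitions in memory so that later actions can reuse them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-example"))

    val words = sc.textFile("hdfs:///tmp/words.txt")  // hypothetical path
      .flatMap(_.split("\\s+"))

    // persist() asks each Executor's Block Manager to keep the partitions it
    // computes in memory, so later actions reuse them instead of recomputing.
    words.persist(StorageLevel.MEMORY_ONLY)            // cache() is shorthand for this level

    val total    = words.count()             // first action: computes and caches partitions
    val distinct = words.distinct().count()  // second action: reads the cached partitions
    println(s"total=$total distinct=$distinct")
    sc.stop()
  }
}
```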

1.5  Application

An Application is a user program written with the APIs that Spark provides:

  • The Application uses the Spark API to build RDDs and turn them into a DAG, and is registered with the Cluster Manager through the Driver (as illustrated after this list).
  • Based on the Application's resource requirements, the Cluster Manager allocates Executors, memory, CPU and other resources to the Application through a first-level allocation.
  • The Driver then assigns the Executors and other resources to each task through a second-level allocation, and finally the Driver tells the Executors to run the Application's tasks.
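To see the DAG that an Application builds through the Spark API, a small sketch like the following can print an RDD's lineage with toDebugString; the data here is made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DagExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dag-example"))

    // Each transformation adds a node to the RDD DAG that the Application
    // builds through the Spark API; nothing runs until an action is called.
    val counts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    // toDebugString prints the lineage (DAG) of the RDD as Spark sees it
    println(counts.toDebugString)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```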

2. Overview of Spark's general running process


No matter which deployment mode Spark runs in, the general running process follows the same core steps:

  1. After the job is submitted, the Driver program is started first.
  2. The Driver then registers the application with the Cluster Manager.
  3. The Cluster Manager allocates and starts the Executors according to the task's configuration.
  4. Once all the resources the Driver needs are available, the Driver starts executing the main function. Spark's transformations are lazy: when an Action operator is reached, Spark traces the computation backwards and divides it into Stages according to wide dependencies. Each Stage then corresponds to a TaskSet, and a TaskSet contains multiple Tasks (see the sketch after this list).
  5. Following the data-locality principle, Tasks are dispatched to designated Executors for execution. While tasks are running, the Executors keep communicating with the Driver to report the status of task execution.
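The following sketch illustrates steps 4 and 5 with a word count, assuming a hypothetical HDFS path: the narrow transformations stay in one stage, reduceByKey introduces a shuffle (wide dependency) that splits the job into two stages, and nothing runs until the action is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-example"))

    val text = sc.textFile("hdfs:///tmp/input.txt", minPartitions = 4)  // hypothetical path

    // Narrow dependencies: flatMap and map stay inside the same stage
    val pairs = text.flatMap(_.split("\\s+")).map(w => (w, 1))

    // reduceByKey introduces a wide (shuffle) dependency, so the job is split
    // into two stages here; each stage becomes a TaskSet with one task per partition
    val counts = pairs.reduceByKey(_ + _)

    // Nothing has executed yet -- transformations are lazy. The action below
    // triggers the job; the Driver builds the stages from the DAG, sends tasks
    // to Executors (preferring data-local ones), and collects the result.
    counts.collect().foreach { case (w, n) => println(s"$w -> $n") }

    sc.stop()
  }
}
```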
