Spark's two submission modes on Yarn: Yarn-Client and Yarn-Cluster

Spark supports three cluster deployment options (Standalone, Mesos, and Yarn). In each of them a master service (the Spark Standalone Master, the Mesos Master, or the Yarn ResourceManager) decides which applications can run, on which nodes, and when. A worker service (on Yarn, the NodeManager) runs on every node; it controls the Executor processes on that node and monitors both the status of running jobs and their resource consumption. When Spark runs on Yarn there are two modes, Yarn-Client and Yarn-Cluster. Typically, Yarn-Cluster is used in production, and Yarn-Client for interactive work and debugging.
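As a minimal sketch, the two modes differ only in the `--deploy-mode` value passed to `spark-submit` together with `--master yarn`. The helper below only assembles the argument list and runs nothing; the function name `build_submit_command` and the application file `my_job.py` are hypothetical, and it assumes that `spark-submit` is on the PATH and that the Hadoop/Yarn configuration is visible to it.

```python
from typing import List


def build_submit_command(app: str, deploy_mode: str = "client") -> List[str]:
    """Assemble a spark-submit argument list for running on Yarn.

    `app` is a hypothetical application file; nothing is executed here.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(
            f"deploy_mode must be 'client' or 'cluster', got {deploy_mode!r}"
        )
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


if __name__ == "__main__":
    # Yarn-Client: typically used for interactive work and debugging.
    print(" ".join(build_submit_command("my_job.py", "client")))
    # Yarn-Cluster: typically used in production.
    print(" ".join(build_submit_command("my_job.py", "cluster")))
```

The resulting list could be handed to `subprocess.run` on a machine that has a Spark client installed.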

1.Application Master

In Yarn, every application has an Application Master (AM) process, which runs in the first container started for the application. It is responsible for requesting resources from the ResourceManager and, once they are allocated, for telling the NodeManagers to launch containers for the application. Because the Application Master takes over this work, no client has to stay active: after starting the application the client can exit at any time, and the processes managed by Yarn keep running in the cluster.
When Spark jobs run on Yarn, each Spark Executor runs as a Yarn container. Spark can run multiple tasks concurrently inside the same container, which saves task start-up time.
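To make the "multiple tasks in one container" point concrete, here is a rough calculation of how many tasks can run at once. It assumes the standard Spark settings `spark.executor.cores` and `spark.task.cpus` govern this; the function names and the example numbers are illustrative only.

```python
def task_slots_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Number of tasks one Executor container can run at the same time.

    executor_cores corresponds to spark.executor.cores,
    task_cpus corresponds to spark.task.cpus.
    """
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("core counts must be positive")
    return executor_cores // task_cpus


def total_task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Concurrent task slots across all Executor containers of one application."""
    return num_executors * task_slots_per_executor(executor_cores, task_cpus)


if __name__ == "__main__":
    # Hypothetical sizing: 10 Executors with 4 cores each, 1 core per task.
    print(total_task_slots(num_executors=10, executor_cores=4))  # 40
```

Since the same container is reused for task after task, the cost of starting it is paid once per Executor rather than once per task.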

2.Yarn-client

In Yarn-client mode, the AM only requests Executor resources from Yarn. The driver stays in the client, which communicates directly with the containers to schedule the job, so the client cannot exit while the application is running.

Execution flow:

  • The client submits the job to the ResourceManager (RM).
  • The RM allocates a container on one NodeManager (NM) and assigns the Application Master (AM) to that NM.
  • The NM receives the allocation from the RM, then initializes and starts the AM. In this mode the driver does not run in the AM; it stays in the client.
  • The AM requests resources from the RM and notifies the other NodeManagers to start Executors in the allocated containers.
  • Once started, each Executor registers with and reports to the driver running in the client, then completes the tasks assigned to it.

3.Yarn-Cluster

In Yarn-cluster mode, the driver runs inside the AM. The AM process is responsible both for driving the application and for requesting resources from Yarn, and it runs in a Yarn container. The client that launched the AM can therefore shut down immediately instead of staying alive for the whole application life cycle.

Execution flow:

  • The client generates the job information and submits it to the ResourceManager (RM).
  • The RM allocates a container on one NodeManager (NM), chosen by Yarn, and assigns the Application Master (AM) to that NM.
  • The NM receives the allocation from the RM, then initializes and starts the AM. Because the driver runs inside the AM, this NM is referred to as the Driver NM.
  • The AM requests resources from the RM and notifies the other NodeManagers to start Executors in the allocated containers.
  • Each Executor registers with and reports to the Application Master on the Driver NM, then completes the tasks assigned to it.
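The practical difference between the two modes can be summarized in a small lookup table. This is an illustrative model only, not Spark code; the `DeployMode` class, its field names, and the `client_can_exit` helper are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployMode:
    name: str
    driver_location: str   # where the Spark driver runs
    am_role: str           # what the Application Master does
    client_can_exit: bool  # may the submitting client quit after launch?


MODES = {
    "client": DeployMode(
        name="Yarn-Client",
        driver_location="client process",
        am_role="requests Executor containers only",
        client_can_exit=False,
    ),
    "cluster": DeployMode(
        name="Yarn-Cluster",
        driver_location="Application Master container",
        am_role="runs the driver and requests Executor containers",
        client_can_exit=True,
    ),
}


def client_can_exit(mode: str) -> bool:
    """Return True if the submitting client may exit once the job is launched."""
    return MODES[mode].client_can_exit


if __name__ == "__main__":
    for mode in MODES.values():
        print(f"{mode.name}: driver in {mode.driver_location}; "
              f"client can exit: {mode.client_can_exit}")
```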

Original: Big Box  Spark two kinds submission Yarn-client and Yarn-cluster



Origin www.cnblogs.com/petewell/p/11615101.html