Workflow of Spark Client and Cluster Deploy Modes

1. Client mode: In client mode, the driver is launched in the same process as the client that submits the application. That is to say, the Driver process is started on the submitting client itself, and the client process stays alive until the application finishes running.
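As a minimal sketch of what this means for application code (the app name and the toy job are placeholders), the program's main method is the Driver when the jar is submitted with spark-submit --deploy-mode client, so it runs inside the client JVM until the application stops:

```scala
import org.apache.spark.sql.SparkSession

object ClientModeApp {
  def main(args: Array[String]): Unit = {
    // Under `spark-submit --deploy-mode client`, this main method IS the
    // Driver: it runs in the same JVM as the submitting client process.
    val spark = SparkSession.builder()
      .appName("client-mode-demo") // the master URL is supplied by spark-submit
      .getOrCreate()

    val n = spark.sparkContext.parallelize(1 to 1000).count()
    println(s"count = $n")

    // The client process stays alive until the application ends here.
    spark.stop()
  }
}
```

If this JVM is killed before spark.stop(), the driver dies with it, which is why client mode suits interactive use from a machine close to the cluster.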

(Figure: client-mode workflow diagram)

The workflow is as follows:

           1. Start the master and worker processes. The master is responsible for resource management of the entire cluster; each worker manages the resources of its own node, monitoring its CPU, memory, and other information and reporting them to the master at regular intervals.

           2. Start the Driver process on the client and register it with the master.

           3. The master communicates with the workers through RPC and instructs them to start one or more executor processes.

           4. Each executor process registers with the Driver and reports its own information, such as the host of the node it runs on.

           5. The Driver divides each job into stages and further cuts the stages into tasks, encapsulating all the operations in one pipeline into a single task; it sends the tasks to the executors registered with it, where they are executed in task threads inside the executor process (see the stage-cutting sketch after this list).

           6. When the application finishes executing, the Driver process exits.
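To make step 5 concrete, here is a hedged sketch (placeholder input data, run against a local master) of where the Driver cuts stages: the narrow transformations flatMap and map are pipelined into one stage and executed in a single task thread, while reduceByKey introduces a shuffle, which is exactly where a new stage begins. toDebugString prints the resulting lineage with its stage boundaries:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageCutDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("stage-cut-demo").setMaster("local[2]"))

    val counts = sc.parallelize(Seq("a b", "b c", "a c"))
      .flatMap(_.split(" ")) // narrow dependency: pipelined into the same stage
      .map(w => (w, 1))      // still the same stage, same task thread
      .reduceByKey(_ + _)    // shuffle dependency: the Driver cuts a new stage here

    counts.collect().foreach(println)
    // The indentation in the debug string marks the shuffle (stage) boundary.
    println(counts.toDebugString)
    sc.stop()
  }
}
```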


2. Cluster mode: In cluster mode, the driver is launched from one of the worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application, without waiting for the application to finish. That is to say, the Driver process is started on a worker in the cluster, and the client process can exit once it has submitted the task.
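A submission of this kind is usually made with spark-submit --deploy-mode cluster; the sketch below uses Spark's programmatic SparkLauncher API to the same effect (the jar path, main class, and master URL are placeholders). Once startApplication() has handed the application off, this client JVM is free to exit while the driver runs on a worker inside the cluster:

```scala
import org.apache.spark.launcher.SparkLauncher

object ClusterModeSubmit {
  def main(args: Array[String]): Unit = {
    // Submit the application: the driver will be started on one of the
    // cluster's workers, not in this process.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")    // hypothetical application jar
      .setMainClass("com.example.MyApp")     // hypothetical driver main class
      .setMaster("spark://master-host:7077") // hypothetical master URL
      .setDeployMode("cluster")
      .startApplication()

    println(s"Submitted, current state: ${handle.getState}")
    // This client process may now exit without waiting for the app to finish.
  }
}
```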

(Figure: cluster-mode workflow diagram)

The workflow is as follows:

            1. On the cluster nodes, start the master and worker processes. Once a worker process starts successfully, it registers with the master.

            2. After the client submits the task, it obtains an ActorSelection (a reference to the master's actor) and sends a driver-registration request (RequestSubmitDriver) to the master through it.

            3. The master then instructs a worker node to start the driver process (the choice of worker is arbitrary, as long as it has enough resources). After the driver process starts successfully, it returns a registration-success message to the master.

            4. The master instructs the workers to start the executor processes.

            5. Once started, each executor process registers with the driver.

            6. The Driver divides each job into stages and further cuts the stages into tasks, encapsulating all the operations in one pipeline into a single task; it sends the tasks to the executors registered with it, where they are executed in task threads inside the executor process (see the stage-cutting sketch under the client-mode workflow above).

            7. After all tasks have executed, the program ends.


From the above description, we know that the master is responsible for resource management of the entire cluster. Each worker is responsible for managing the resources of its own node, regularly reporting its CPU, memory, and other information to the master, and creating the executor processes (the executor being the smallest unit of resource allocation). The Driver is responsible for dividing the application into jobs and cutting them into stages, for further cutting and optimizing the tasks, for distributing the tasks to task threads in the executor processes of the corresponding workers for execution, and for collecting the tasks' execution results. The Driver gets in touch with the Spark cluster through the SparkContext object: given the master's host, it can register itself with the master through RPC.
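A hedged sketch of that last point (the master URL is a placeholder): the master's host and port are carried in the SparkConf, and constructing the SparkContext is what connects to the master and registers the application with it:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverRegistrationDemo {
  def main(args: Array[String]): Unit = {
    // The master's host and port are the only contact information the
    // Driver needs; they travel in the SparkConf.
    val conf = new SparkConf()
      .setAppName("driver-registration-demo")
      .setMaster("spark://master-host:7077") // hypothetical master URL

    // Constructing the SparkContext connects to the master over RPC and
    // registers this application with it.
    val sc = new SparkContext(conf)
    println(s"Registered as application ${sc.applicationId}")
    sc.stop()
  }
}
```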


