[Big data] Spark On Yarn

Spark can run on YARN in two modes: yarn-cluster and yarn-client.

I. Yarn Cluster

In yarn-cluster mode, the Spark Driver runs inside an ApplicationMaster started on the YARN cluster. Each job the client submits to the ResourceManager is assigned its own ApplicationMaster on a worker node in the cluster, and that ApplicationMaster manages the application's entire life cycle. Because the Driver runs inside YARN, there is no need to start a Spark Master/Client in advance, and the application's results are not displayed on the client (they can be viewed in the history server). It is therefore best to save results to HDFS rather than write them to stdout; the YARN client terminal only shows a simple summary of the job's status.
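As a minimal sketch of that last point, the job below persists its result to HDFS instead of printing it, since stdout from the Driver is not visible on the client terminal in yarn-cluster mode. The class name, application name, and HDFS paths are assumptions for illustration, not from the original post.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a yarn-cluster job: the Driver runs inside the ApplicationMaster,
// so println output never reaches the client terminal. Persist results to HDFS.
// Submitted, for example, with:
//   spark-submit --master yarn --deploy-mode cluster --class ClusterModeJob app.jar
object ClusterModeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cluster-mode-example") // master/deploy mode are supplied by spark-submit
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("hdfs:///data/input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Save to HDFS rather than stdout; job status can be checked in the history server.
    counts.saveAsTextFile("hdfs:///data/wordcount-output") // hypothetical output path

    spark.stop()
  }
}
```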

II. Yarn Client

In yarn-client mode, the Driver runs on the client and requests resources from the ResourceManager through the ApplicationMaster. The local Driver is responsible for interacting with all executor containers and aggregating the final results. Closing the terminal is equivalent to killing the Spark application. In general, use this mode when you need the results returned directly to the terminal.
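By contrast, a sketch like the following could collect a small result back to the local Driver and print it on the terminal, which is only practical because the Driver runs on the client in this mode. The class name and input path are again illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a yarn-client job: the Driver runs in the local client process,
// so collected results can be printed straight to the terminal.
// Submitted, for example, with:
//   spark-submit --master yarn --deploy-mode client --class ClientModeJob app.jar
object ClientModeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("client-mode-example")
      .getOrCreate()

    val topWords = spark.sparkContext
      .textFile("hdfs:///data/input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .take(10)                             // small result comes back to the local Driver

    // The Driver aggregates executor results and can display them locally.
    topWords.foreach(println)

    spark.stop()
  }
}
```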
