Spark on YARN execution process and log analysis

Submit command

${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 1g \
    --executor-cores 4 \
    --queue default \
    ${SPARK_HOME}/examples/jars/spark-examples*.jar \
    10
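
The trailing argument (10) is the number of slices (partitions) the example splits the work into. For reference, the SparkPi example being submitted is roughly the following Monte Carlo estimate (a paraphrased sketch, not the verbatim source):

import scala.math.random
import org.apache.spark.sql.SparkSession

object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Spark Pi").getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2  // the "10" above
    val n = 100000 * slices
    // Scatter n random points in the square [-1, 1] x [-1, 1]; the fraction
    // that lands inside the unit circle approximates pi / 4.
    val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)  // "reduce at SparkPi.scala:38" in the logs below
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}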

Implementation process

  1. The client executes spark-submit to submit the application, registering with the ResourceManager and applying for resources.

  2. After the ResourceManager receives the request, it selects a NodeManager in the cluster, allocates the first container to the application, and starts the ApplicationMaster in it. The ApplicationMaster contains the driver, which starts executing (that is, parsing the program written by the user).

  3. The driver:

    (1) The driver will run the main method of the application.

    (2) The main method constructs the SparkContext object. This object is important: it is the entry point of every Spark program. Inside the SparkContext, two further objects are constructed: DAGScheduler and TaskScheduler.

    (3) The program typically involves a large number of RDD transformations, and eventually an action triggers actual execution. At that point a DAG (directed acyclic graph) is generated from the lineage of the RDDs in the code; the direction of the graph is the order in which the RDD operators are applied. The finished DAG is handed to the DAGScheduler object (a minimal example is sketched after this list).

    (4) After the DAGScheduler receives the DAG, it splits it into stages at wide (shuffle) dependencies. Each stage consists of many tasks that can run in parallel, grouped into a TaskSet. The DAGScheduler then submits the stages one by one, sending each stage's TaskSet to the TaskScheduler object.

    (5) After the TaskScheduler receives the TaskSets, it executes them according to the dependencies between stages. For each TaskSet, the TaskScheduler traverses the set and submits each task in turn to an executor for execution.

    The driver only breaks the job down into tasks; the actual execution happens in the YARN containers.

  4. The ApplicationMaster registers with the ResourceManager, so the execution status of the job can be viewed through the RM. At the same time, the AM applies for resources for the tasks and monitors their execution.

  5. After the AM obtains the resources (containers), it communicates with the NodeManagers and has them start CoarseGrainedExecutorBackend in the allocated containers. When CoarseGrainedExecutorBackend starts, it registers with the SparkContext in the AM and requests tasks.

  6. The SparkContext in the AM assigns tasks to CoarseGrainedExecutorBackend. While executing a task, CoarseGrainedExecutorBackend reports its progress and status to the AM, so the AM can track execution at any time and retry a task when it fails or is killed, for example when cluster resources are tight.

  7. When the job is complete, the AM sends a request to the RM to unregister itself and shuts down.
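
To make steps (3)-(5) concrete, here is a minimal word-count-style sketch (a hypothetical job; the names are made up): the map is a narrow dependency, reduceByKey introduces a shuffle (wide dependency) that makes the DAGScheduler cut the DAG into two stages, and nothing runs until the collect() action is called.

import org.apache.spark.sql.SparkSession

object StageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StageDemo").getOrCreate()
    val sc = spark.sparkContext

    // Narrow dependency: map stays inside one stage.
    val pairs = sc.parallelize(Seq("a", "b", "a", "c"), 2).map(w => (w, 1))

    // Wide dependency: reduceByKey shuffles by key, so DAGScheduler splits
    // the job into two stages at this boundary.
    val counts = pairs.reduceByKey(_ + _)

    // Nothing has executed yet. collect() is the action that builds the DAG,
    // hands it to DAGScheduler, and sends TaskSets to TaskScheduler.
    counts.collect().foreach(println)

    spark.stop()
  }
}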


Execution log (yarn-cluster)

22/11/19 17:42:18 WARN util.Utils: Your hostname, macdeMacBook-Pro-3.local resolves to a loopback address: 127.0.0.1; using 10.10.9.250 instead (on interface en0)
22/11/19 17:42:18 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/11/19 17:42:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/19 17:42:19 INFO client.RMProxy: Connecting to ResourceManager at sh01/172.16.99.214:8010
22/11/19 17:42:19 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
22/11/19 17:42:19 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/11/19 17:42:19 INFO yarn.Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
22/11/19 17:42:19 INFO yarn.Client: Setting up container launch context for our AM
22/11/19 17:42:19 INFO yarn.Client: Setting up the launch environment for our AM container
22/11/19 17:42:19 INFO yarn.Client: Preparing resources for our AM container
22/11/19 17:42:20 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/11/19 17:42:23 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c/__spark_libs__2899998199838240455.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/__spark_libs__2899998199838240455.zip
22/11/19 17:45:52 INFO yarn.Client: Uploading resource file:/usr/local/spark/examples/jars/spark-examples_2.11-2.4.8.jar -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/spark-examples_2.11-2.4.8.jar
22/11/19 17:45:54 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c/__spark_conf__8349177025085739013.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/__spark_conf__.zip
22/11/19 17:45:56 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 17:45:56 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 17:45:56 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 17:45:56 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 17:45:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 17:45:57 INFO yarn.Client: Submitting application application_1666603193487_2205 to ResourceManager
22/11/19 17:45:57 INFO impl.YarnClientImpl: Submitted application application_1666603193487_2205
22/11/19 17:45:58 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:45:58 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668851157430
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac
22/11/19 17:45:59 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:00 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:01 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:02 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:03 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:03 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: sh02
	 ApplicationMaster RPC port: 46195
	 queue: default
	 start time: 1668851157430
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac 
22/11/19 17:46:04 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:05 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:06 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:07 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:08 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:09 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:10 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:11 INFO yarn.Client: Application report for application_1666603193487_2205 (state: FINISHED)
22/11/19 17:46:11 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: sh02
	 ApplicationMaster RPC port: 46195
	 queue: default
	 start time: 1668851157430
	 final status: SUCCEEDED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac
22/11/19 17:46:12 INFO yarn.Client: Deleted staging directory hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205
22/11/19 17:46:12 INFO util.ShutdownHookManager: Shutdown hook called
22/11/19 17:46:12 INFO util.ShutdownHookManager: Deleting directory /private/var/folders/pc/mj2v_vln4x14q6jylbtnmvx40000gn/T/spark-b39d7673-82ac-471c-8f8a-f667b8b081f2
22/11/19 17:46:12 INFO util.ShutdownHookManager: Deleting directory /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c

4-6: Connect to the ResourceManager, request a new application on a cluster composed of two NodeManagers, and verify that the memory requested by the application does not exceed the cluster's maximum: 8192 MB per container.

8-14: Allocate a 4505 MB container for the ApplicationMaster. What does "including 409 MB overhead" mean? As mentioned above, the AM contains the driver, and we requested 4 G (4096 MB) of memory for the driver at submission time: 4505 - 4096 = 409, so the RM allocated more memory than we asked for (the arithmetic is sketched right below; we won't go deeper here). The next steps build the launch environment for the AM container and prepare its resources: the local Spark dependency libraries (about 244 MB when I checked), the application jar, and the Spark configuration files are packaged and uploaded to the HDFS staging directory hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205, where they wait for the application to run. The directory is deleted after completion.
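
The 409 MB figure matches Spark 2.x's default memory overhead of max(384 MB, 10% of the requested memory); here is a quick sketch of that arithmetic (assumed defaults, not read from this cluster's configuration):

// Assumed Spark 2.x default: container size = requested + max(384 MB, 10%).
def containerSizeMb(requestedMb: Int): Int =
  requestedMb + math.max(384, (requestedMb * 0.10).toInt)

println(containerSizeMb(4096)) // driver: 4096 + 409 = 4505 MB, as in line 8
println(containerSizeMb(512))  // client-mode AM: 512 + 384 = 896 MB (see the yarn-client log)
println(containerSizeMb(1024)) // executor: 1024 + 384 = 1408 MB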

15-19: Security setup (ACLs).

20-21: Submit the application to the RM. Note the application ID here, application_1666603193487_2205: it matches the name of the staging directory the resources were uploaded to on HDFS, so they evidently belong to the same application. (YARN application IDs have the form application_<RM start timestamp>_<sequence number>.)

22-36: The AM is applying to the RM for containers (resources) to run the job, which is why the state shows ACCEPTED. How can we tell? Because ACCEPTED sometimes lasts a long time when other jobs are already running on the cluster and no spare resources exist, from which we infer that the application is waiting for resources during this phase.

37-55: The job starts executing. From line 41 you can see that the AM container was assigned to the sh02 machine. (sh01: RM; sh02: NM; sh03: NM)

56-69: The job is complete; the staging files on HDFS are deleted, and the local temporary directories are deleted. The directory names match those created earlier.


What do client and driver refer to?

  • Client: The place where the spark-submit command is executed is called the client.

  • driver: The program submitted by the user runs as the driver.

Where is the driver? First imagine a few servers

Server  Role
sh01    ResourceManager
sh02    NodeManager
sh03    NodeManager
sh04    has the big data cluster's configuration (a client/edge node)

Submit the task from sh04:

  • yarn-cluster mode: this is the scenario described above. The driver is not on the client but inside the AM on sh02. During execution, the communication between the task containers and the AM (that is, the driver inside the AM), and between the AM and the RM, has nothing to do with the client; the client only receives what is sent back to stdout. Even if the client goes away, the job keeps running.

  • yarn-client mode: the driver lives on the client, and the job cannot run without the client. (For the remaining details, the log is at the end; you can analyze it yourself.)

In real deployments, sh01, sh02, and sh03 usually form the big data cluster, while sh04 is just an edge node used to submit jobs.


What is the relationship between YARN's container and executor?

In a YARN cluster, both the executors and the ApplicationMaster must run inside a "container". The container here does not refer to Docker: it stands for the storage and computing resources on a physical machine, supervised by the NM and scheduled by the RM. The YARN cluster allocates resources in units of containers. Executors and the ApplicationMaster are both processes, and they can only run once resources have been allocated to them. A small sketch follows that lists the executors the driver knows about.
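
As a small illustration (a sketch assuming the Spark 2.x SparkStatusTracker API), the driver can enumerate the executor processes it knows about; on YARN, each non-driver entry corresponds to one CoarseGrainedExecutorBackend process living inside one container on some NodeManager:

import org.apache.spark.sql.SparkSession

object ExecutorList {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ExecutorList").getOrCreate()
    // Each non-driver entry below is one CoarseGrainedExecutorBackend JVM,
    // running inside one YARN container supervised by a NodeManager.
    spark.sparkContext.statusTracker.getExecutorInfos.foreach { e =>
      println(s"executor JVM at ${e.host()}:${e.port()}")
    }
    spark.stop()
  }
}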


yarn-client log

${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 1g \
    --executor-cores 4 \
    --queue default \
    ${SPARK_HOME}/examples/jars/spark-examples*.jar \
    10
22/11/19 18:33:36 WARN util.Utils: Your hostname, macdeMacBook-Pro-3.local resolves to a loopback address: 127.0.0.1; using 10.10.9.250 instead (on interface en0)
22/11/19 18:33:36 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/11/19 18:33:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/19 18:33:36 INFO spark.SparkContext: Running Spark version 2.4.8
22/11/19 18:33:36 INFO spark.SparkContext: Submitted application: Spark Pi
22/11/19 18:33:36 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 18:33:36 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 18:33:36 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 18:33:36 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 18:33:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 18:33:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 53336.
22/11/19 18:33:37 INFO spark.SparkEnv: Registering MapOutputTracker
22/11/19 18:33:37 INFO spark.SparkEnv: Registering BlockManagerMaster
22/11/19 18:33:37 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/11/19 18:33:37 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/11/19 18:33:37 INFO storage.DiskBlockManager: Created local directory at /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/blockmgr-ea23e012-50a5-4ad2-a2c0-cf40ea020a9e
22/11/19 18:33:37 INFO memory.MemoryStore: MemoryStore started with capacity 2004.6 MB
22/11/19 18:33:37 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/11/19 18:33:37 INFO util.log: Logging initialized @2435ms to org.spark_project.jetty.util.log.Slf4jLog
22/11/19 18:33:37 INFO server.Server: jetty-9.4.z-SNAPSHOT; built: unknown; git: unknown; jvm 1.8.0_333-b02
22/11/19 18:33:37 INFO server.Server: Started @2564ms
22/11/19 18:33:37 INFO server.AbstractConnector: Started ServerConnector@62b3df3a{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/11/19 18:33:37 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@169da7f2{/jobs,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@757f675c{/jobs/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2617f816{/jobs/job,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d10455d{/jobs/job/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@535b8c24{/stages,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a951911{/stages/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55b62629{/stages/stage,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6759f091{/stages/stage/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33a053d{/stages/pool,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14a54ef6{/stages/pool/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20921b9b{/storage,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@867ba60{/storage/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ba745bc{/storage/rdd,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@654b72c0{/storage/rdd/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55b5e331{/environment,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6034e75d{/environment/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15fc442{/executors,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c7bdb{/executors/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@456abb66{/executors/threadDump,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a3a299{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7da10b5b{/static,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1da6ee17{/,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78d39a69{/api,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15f193b8{/jobs/job/kill,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2516fc68{/stages/stage/kill,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.10.9.250:4040
22/11/19 18:33:37 INFO spark.SparkContext: Added JAR file:/usr/local/spark/examples/jars/spark-examples_2.11-2.4.8.jar at spark://10.10.9.250:53336/jars/spark-examples_2.11-2.4.8.jar with timestamp 1668854017716
22/11/19 18:33:38 INFO client.RMProxy: Connecting to ResourceManager at sh01/172.16.99.214:8010
22/11/19 18:33:38 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
22/11/19 18:33:38 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/11/19 18:33:38 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/11/19 18:33:38 INFO yarn.Client: Setting up container launch context for our AM
22/11/19 18:33:38 INFO yarn.Client: Setting up the launch environment for our AM container
22/11/19 18:33:38 INFO yarn.Client: Preparing resources for our AM container
22/11/19 18:33:39 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/11/19 18:33:42 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25/__spark_libs__7614795133133378512.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2206/__spark_libs__7614795133133378512.zip
22/11/19 18:37:46 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25/__spark_conf__885526568489264491.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2206/__spark_conf__.zip
22/11/19 18:37:48 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 18:37:48 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 18:37:48 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 18:37:48 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 18:37:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 18:37:49 INFO yarn.Client: Submitting application application_1666603193487_2206 to ResourceManager
22/11/19 18:37:50 INFO impl.YarnClientImpl: Submitted application application_1666603193487_2206
22/11/19 18:37:50 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1666603193487_2206 and attemptId None
22/11/19 18:37:51 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:51 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668854270205
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2206/
	 user: mac
22/11/19 18:37:52 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:53 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:54 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:55 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:55 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sh01, PROXY_URI_BASES -> http://sh01:8012/proxy/application_1666603193487_2206), /proxy/application_1666603193487_2206
22/11/19 18:37:55 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
22/11/19 18:37:56 INFO yarn.Client: Application report for application_1666603193487_2206 (state: RUNNING)
22/11/19 18:37:56 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.16.99.116
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668854270205
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2206/
	 user: mac
22/11/19 18:37:56 INFO cluster.YarnClientSchedulerBackend: Application application_1666603193487_2206 has started running.
22/11/19 18:37:56 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54084.
22/11/19 18:37:56 INFO netty.NettyBlockTransferService: Server created on 10.10.9.250:54084
22/11/19 18:37:56 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/11/19 18:37:56 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.10.9.250:54084 with 2004.6 MB RAM, BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
22/11/19 18:37:56 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@238291d4{/metrics/json,null,AVAILABLE,@Spark}
22/11/19 18:37:56 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
22/11/19 18:37:57 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Missing parents: List()
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
22/11/19 18:37:57 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 2004.6 MB)
22/11/19 18:37:58 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1358.0 B, free 2004.6 MB)
22/11/19 18:37:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.10.9.250:54084 (size: 1358.0 B, free: 2004.6 MB)
22/11/19 18:37:58 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
22/11/19 18:37:58 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
22/11/19 18:37:58 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
22/11/19 18:37:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.99.116:48068) with ID 2
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sh02, executor 2, partition 0, PROCESS_LOCAL, 7741 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sh02, executor 2, partition 1, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sh02, executor 2, partition 2, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, sh02, executor 2, partition 3, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager sh02:44398 with 366.3 MB RAM, BlockManagerId(2, sh02, 44398, None)
22/11/19 18:38:02 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sh02:44398 (size: 1358.0 B, free: 366.3 MB)
22/11/19 18:38:02 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.97.106:57790) with ID 1
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, sh03, executor 1, partition 4, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, sh03, executor 1, partition 5, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, sh03, executor 1, partition 6, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, sh03, executor 1, partition 7, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, sh02, executor 2, partition 8, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, sh02, executor 2, partition 9, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2609 ms on sh02 (executor 2) (1/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 2608 ms on sh02 (executor 2) (2/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2622 ms on sh02 (executor 2) (3/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2645 ms on sh02 (executor 2) (4/10)
22/11/19 18:38:02 INFO storage.BlockManagerMasterEndpoint: Registering block manager sh03:45892 with 366.3 MB RAM, BlockManagerId(1, sh03, 45892, None)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 378 ms on sh02 (executor 2) (5/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 407 ms on sh02 (executor 2) (6/10)
22/11/19 18:38:04 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sh03:45892 (size: 1358.0 B, free: 366.3 MB)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 2762 ms on sh03 (executor 1) (7/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 2787 ms on sh03 (executor 1) (8/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 2794 ms on sh03 (executor 1) (9/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 2800 ms on sh03 (executor 1) (10/10)
22/11/19 18:38:05 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/11/19 18:38:05 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 8.174 s
22/11/19 18:38:05 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 8.233929 s
Pi is roughly 3.1405671405671405
22/11/19 18:38:05 INFO server.AbstractConnector: Stopped Spark@62b3df3a{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/11/19 18:38:05 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.9.250:4040
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/11/19 18:38:05 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/11/19 18:38:05 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Stopped
22/11/19 18:38:05 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/19 18:38:05 INFO memory.MemoryStore: MemoryStore cleared
22/11/19 18:38:05 INFO storage.BlockManager: BlockManager stopped
22/11/19 18:38:05 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/11/19 18:38:05 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/19 18:38:05 INFO spark.SparkContext: Successfully stopped SparkContext
22/11/19 18:38:05 INFO util.ShutdownHookManager: Shutdown hook called
22/11/19 18:38:05 INFO util.ShutdownHookManager: Deleting directory /private/var/folders/pc/mj2v_vln4x14q6jylbtnmvx40000gn/T/spark-5ece9ef1-aff6-451e-bf36-b637d4afb74d
22/11/19 18:38:05 INFO util.ShutdownHookManager: Deleting directory /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25

Reprinted from: blog.csdn.net/yy_diego/article/details/127953198