Preface
Parts of this article are translated from:
http://spark.apache.org/docs/latest/submitting-applications.html
Submitting Applications
The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface, so you don't have to configure your application specially for each one.
Bundling Your Application's Dependencies
If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or "uber" jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating the assembly jar, list Spark and Hadoop as provided dependencies; these do not need to be bundled, because they are provided by the cluster manager at runtime. Once you have an assembled jar, you can call the bin/spark-submit script, passing your jar, as shown below. For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files, we recommend packaging them into a .zip or .egg.
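As a rough sketch of that workflow (the application class, master URL and jar path below are placeholders, not taken from this article), building the assembly jar with sbt-assembly and submitting it might look like this; with Maven, the equivalent step is mvn package with the shade or assembly plugin:

# Build the assembly ("uber") jar with the sbt-assembly plugin.
# Spark and Hadoop should be declared with "provided" scope in build.sbt
# so that they are not bundled into the jar.
sbt assembly

# Submit the resulting jar (the exact path depends on your project name and Scala version)
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://207.184.161.138:7077 \
  target/scala-2.11/my-spark-app-assembly-1.0.jar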
Launching Applications with spark-submit
Once a user application is bundled, it can be launched with the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and supports the different cluster managers and deploy modes that Spark supports:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
Some of the commonly used options above are:
--class: the entry point for your application (e.g. org.apache.spark.examples.SparkPi).
--master: the master URL for the cluster (e.g. spark://23.195.26.187:7077).
--deploy-mode: whether to deploy the driver on the worker nodes (cluster) or locally as an external client (client) (default: client).
--conf: an arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes (as shown).
application-jar: path to the bundled jar containing your application and all of its dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: arguments passed to the main method of your main class, if any.
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (for example, the master node of a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console, so this mode is especially suitable for applications that involve a REPL (e.g. the Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (for example, locally on your laptop), cluster mode is commonly used to minimize network latency between the driver and the executors. Currently, standalone mode does not support cluster mode for Python applications.
For Python applications, simply pass a .py file in place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
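For illustration (the file names below are hypothetical), a Python submission with extra dependencies might look like:

./bin/spark-submit \
  --master local[4] \
  --py-files deps.zip \
  my_app.py arg1 arg2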
There are a few options that are specific to the cluster manager being used. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all options available to spark-submit, run it with --help.
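As an example (reusing this cluster's master URL and the example jar; adjust both for your own setup), a standalone cluster-mode submission with --supervise could look like:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.217.201:7077 \
  --deploy-mode cluster \
  --supervise \
  examples/jars/spark-examples_2.11-2.4.0.jar

And ./bin/spark-submit --help prints the full list of available options.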
Running Spark jobs in the various modes
local mode
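The command for this run, written out on separate lines for readability (the jar path matches the "Added JAR" line in the log below):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local \
  examples/jars/spark-examples_2.11-2.4.0.jar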
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jarsamples_2.11-2.4.0.jar 2019-02-21 22:13:58 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classepplicable 2019-02-21 22:14:00 INFO SparkContext:54 - Running Spark version 2.4.0 2019-02-21 22:14:00 INFO SparkContext:54 - Submitted application: Spark Pi 2019-02-21 22:14:00 INFO SecurityManager:54 - Changing view acls to: root 2019-02-21 22:14:00 INFO SecurityManager:54 - Changing modify acls to: root 2019-02-21 22:14:00 INFO SecurityManager:54 - Changing view acls groups to: 2019-02-21 22:14:00 INFO SecurityManager:54 - Changing modify acls groups to: 2019-02-21 22:14:00 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permiss(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 2019-02-21 22:14:01 INFO Utils:54 - Successfully started service 'sparkDriver' on port 48468. 2019-02-21 22:14:01 INFO SparkEnv:54 - Registering MapOutputTracker 2019-02-21 22:14:01 INFO SparkEnv:54 - Registering BlockManagerMaster 2019-02-21 22:14:01 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topologyion 2019-02-21 22:14:01 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up 2019-02-21 22:14:01 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-4ddfef66-4732-4271-b029-05332cfa70a9 2019-02-21 22:14:01 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB 2019-02-21 22:14:01 INFO SparkEnv:54 - Registering OutputCommitCoordinator 2019-02-21 22:14:02 INFO log:192 - Logging initialized @9713ms 2019-02-21 22:14:02 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 2019-02-21 22:14:02 INFO Server:419 - Started @9891ms 2019-02-21 22:14:02 INFO AbstractConnector:278 - Started ServerConnector@40e4ea87{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-21 22:14:02 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040. 
2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a38ba58{/jobs,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24b52d3e{/jobs/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15deb1dc{/jobs/job,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@57a4d5ee{/jobs/job/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5af5def9{/stages,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a45c42a{/stages/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36dce7ed{/stages/stage,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27a0a5a2{/stages/stage/json,null,AVAILABLE,@Sp 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7692cd34{/stages/pool,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33aa93c{/stages/pool/json,null,AVAILABLE,@Spar 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32c0915e{/storage,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@106faf11{/storage/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70f43b45{/storage/rdd,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26d10f2e{/storage/rdd/json,null,AVAILABLE,@Spa 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10ad20cb{/environment,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7dd712e8{/environment/json,null,AVAILABLE,@Spa 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c282004{/executors,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22ee2d0{/executors/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7bfc3126{/executors/threadDump,null,AVAILABLE, 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e792ce3{/executors/threadDump/json,null,AVAILrk} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@53bc1328{/static,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e041f0c{/,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6a175569{/api,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4102b1b1{/jobs/job/kill,null,AVAILABLE,@Spark} 2019-02-21 22:14:02 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@61a5b4ae{/stages/stage/kill,null,AVAILABLE,@Sp 2019-02-21 22:14:02 INFO SparkUI:54 - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040 2019-02-21 22:14:02 INFO SparkContext:54 - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4 spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550758442663 2019-02-21 
22:14:02 INFO Executor:54 - Starting executor ID driver on host localhost 2019-02-21 22:14:03 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on por 2019-02-21 22:14:03 INFO NettyBlockTransferService:54 - Server created on hadoop1.org.cn:37714 2019-02-21 22:14:03 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication polic 2019-02-21 22:14:03 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hadoop1.org.cn, 37714, None) 2019-02-21 22:14:03 INFO BlockManagerMasterEndpoint:54 - Registering block manager hadoop1.org.cn:37714 with 413.9 MB RAM, BlockMariver, hadoop1.org.cn, 37714, None) 2019-02-21 22:14:03 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hadoop1.org.cn, 37714, None) 2019-02-21 22:14:03 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hadoop1.org.cn, 37714, None) 2019-02-21 22:14:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@61a91912{/metrics/json,null,AVAILABLE,@Spark} 2019-02-21 22:14:05 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38 2019-02-21 22:14:06 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions 2019-02-21 22:14:06 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38) 2019-02-21 22:14:06 INFO DAGScheduler:54 - Parents of final stage: List() 2019-02-21 22:14:06 INFO DAGScheduler:54 - Missing parents: List() 2019-02-21 22:14:06 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has noparents 2019-02-21 22:14:06 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB) 2019-02-21 22:14:07 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 2019-02-21 22:14:07 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hadoop1.org.cn:37714 (size: 1256.0 B, free: 4 2019-02-21 22:14:07 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161 2019-02-21 22:14:07 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scfirst 15 tasks are for partitions Vector(0, 1)) 2019-02-21 22:14:07 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks 2019-02-21 22:14:07 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCE 7866 bytes) 2019-02-21 22:14:07 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0) 2019-02-21 22:14:07 INFO Executor:54 - Fetching spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar with timestamp 1553 2019-02-21 22:14:08 INFO TransportClientFactory:267 - Successfully created connection to hadoop1.org.cn/192.168.217.201:48468 afte(0 ms spent in bootstraps) 2019-02-21 22:14:08 INFO Utils:54 - Fetching spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar to /tmp/spark-e9e2c8bda-9d3d-4a4f9671b0d9/userFiles-e2c1980d-6d11-48f1-8422-2b637ce7a1fb/fetchFileTemp701584033085131304.tmp 2019-02-21 22:14:09 INFO Executor:54 - Adding file:/tmp/spark-e9e2c8b5-0a08-4dda-9d3d-4a4f9671b0d9/userFiles-e2c1980d-6d11-48f1-84e7a1fb/spark-examples_2.11-2.4.0.jar to class loader 2019-02-21 22:14:09 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 
867 bytes result sent to driver 2019-02-21 22:14:09 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCE 7866 bytes) 2019-02-21 22:14:09 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1) 2019-02-21 22:14:09 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver 2019-02-21 22:14:09 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 2360 ms on localhost (executor driver) (1/2 2019-02-21 22:14:10 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 180 ms on localhost (executor driver) (2/2) 2019-02-21 22:14:10 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool 2019-02-21 22:14:10 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.577 s 2019-02-21 22:14:10 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 4.506013 s Pi is roughly 3.142475712378562 2019-02-21 22:14:10 INFO AbstractConnector:318 - Stopped Spark@40e4ea87{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-21 22:14:10 INFO SparkUI:54 - Stopped Spark web UI at http://hadoop1.org.cn:4040 2019-02-21 22:14:10 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped! 2019-02-21 22:14:10 INFO MemoryStore:54 - MemoryStore cleared 2019-02-21 22:14:10 INFO BlockManager:54 - BlockManager stopped 2019-02-21 22:14:10 INFO BlockManagerMaster:54 - BlockManagerMaster stopped 2019-02-21 22:14:10 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped! 2019-02-21 22:14:11 INFO SparkContext:54 - Successfully stopped SparkContext 2019-02-21 22:14:11 INFO ShutdownHookManager:54 - Shutdown hook called 2019-02-21 22:14:11 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-3f8eab55-786c-4663-9f99-ca779610ee0d 2019-02-21 22:14:11 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e9e2c8b5-0a08-4dda-9d3d-4a4f9671b0d9
standalone mode
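The command for this run, with the master URL written out in full (the log below shows the client connecting to spark://192.168.217.201:7077):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.217.201:7077 \
  examples/jars/spark-examples_2.11-2.4.0.jar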
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.217 examples/jars/spark-examples_2.11-2.4.0.jar 2019-02-21 22:19:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classepplicable 2019-02-21 22:19:38 INFO SparkContext:54 - Running Spark version 2.4.0 2019-02-21 22:19:38 INFO SparkContext:54 - Submitted application: Spark Pi 2019-02-21 22:19:39 INFO SecurityManager:54 - Changing view acls to: root 2019-02-21 22:19:39 INFO SecurityManager:54 - Changing modify acls to: root 2019-02-21 22:19:39 INFO SecurityManager:54 - Changing view acls groups to: 2019-02-21 22:19:39 INFO SecurityManager:54 - Changing modify acls groups to: 2019-02-21 22:19:39 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permiss(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 2019-02-21 22:19:40 INFO Utils:54 - Successfully started service 'sparkDriver' on port 40178. 2019-02-21 22:19:40 INFO SparkEnv:54 - Registering MapOutputTracker 2019-02-21 22:19:40 INFO SparkEnv:54 - Registering BlockManagerMaster 2019-02-21 22:19:40 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topologyion 2019-02-21 22:19:40 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up 2019-02-21 22:19:40 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-07435023-9ee3-4394-9217-79c2902a5a4d 2019-02-21 22:19:40 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB 2019-02-21 22:19:40 INFO SparkEnv:54 - Registering OutputCommitCoordinator 2019-02-21 22:19:40 INFO log:192 - Logging initialized @9124ms 2019-02-21 22:19:40 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 2019-02-21 22:19:40 INFO Server:419 - Started @9269ms 2019-02-21 22:19:40 INFO AbstractConnector:278 - Started ServerConnector@3a7b503d{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-21 22:19:40 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040. 
2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6058e535{/jobs,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e9c413e{/jobs/json,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@57a4d5ee{/jobs/job,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a45c42a{/jobs/job/json,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36dce7ed{/stages,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@47a64f7d{/stages/json,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33d05366{/stages/stage,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33aa93c{/stages/stage/json,null,AVAILABLE,@Spa 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32c0915e{/stages/pool,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@106faf11{/stages/pool/json,null,AVAILABLE,@Spa 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70f43b45{/storage,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26d10f2e{/storage/json,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10ad20cb{/storage/rdd,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7dd712e8{/storage/rdd/json,null,AVAILABLE,@Spa 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c282004{/environment,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22ee2d0{/environment/json,null,AVAILABLE,@Spar 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7bfc3126{/executors,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e792ce3{/executors/json,null,AVAILABLE,@Spark 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@53bc1328{/executors/threadDump,null,AVAILABLE, 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26f143ed{/executors/threadDump/json,null,AVAILrk} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c1e3314{/static,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@11963225{/,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3f3c966c{/api,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a71c100{/jobs/job/kill,null,AVAILABLE,@Spark} 2019-02-21 22:19:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5b69fd74{/stages/stage/kill,null,AVAILABLE,@Sp 2019-02-21 22:19:41 INFO SparkUI:54 - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040 2019-02-21 22:19:41 INFO SparkContext:54 - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4 spark://hadoop1.org.cn:40178/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550758781088 2019-02-21 
22:19:41 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://192.168.217.201:7077... 2019-02-21 22:19:41 INFO TransportClientFactory:267 - Successfully created connection to /192.168.217.201:7077 after 83 ms (0 ms sootstraps) 2019-02-21 22:19:42 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20190221221942-0000 2019-02-21 22:19:42 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on por 2019-02-21 22:19:42 INFO NettyBlockTransferService:54 - Server created on hadoop1.org.cn:55988 2019-02-21 22:19:42 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication polic 2019-02-21 22:19:42 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190221221942-0000/0 on worker-2019022122063.217.202-51708 (192.168.217.202:51708) with 2 core(s) 2019-02-21 22:19:42 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190221221942-0000/0 on hostPort 192.168.217.202th 2 core(s), 1024.0 MB RAM 2019-02-21 22:19:42 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190221221942-0000/1 on worker-2019022122063.217.203-54960 (192.168.217.203:54960) with 2 core(s) 2019-02-21 22:19:42 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190221221942-0000/1 on hostPort 192.168.217.203th 2 core(s), 1024.0 MB RAM 2019-02-21 22:19:42 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190221221942-0000/2 on worker-2019022122063.217.201-45088 (192.168.217.201:45088) with 2 core(s) 2019-02-21 22:19:42 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190221221942-0000/2 on hostPort 192.168.217.201th 2 core(s), 1024.0 MB RAM 2019-02-21 22:19:42 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190221221942-0000/0 is now RUNNING 2019-02-21 22:19:42 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190221221942-0000/1 is now RUNNING 2019-02-21 22:19:42 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hadoop1.org.cn, 55988, None) 2019-02-21 22:19:43 INFO BlockManagerMasterEndpoint:54 - Registering block manager hadoop1.org.cn:55988 with 413.9 MB RAM, BlockMariver, hadoop1.org.cn, 55988, None) 2019-02-21 22:19:43 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190221221942-0000/2 is now RUNNING 2019-02-21 22:19:43 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hadoop1.org.cn, 55988, None) 2019-02-21 22:19:43 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hadoop1.org.cn, 55988, None) 2019-02-21 22:19:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@726a17c4{/metrics/json,null,AVAILABLE,@Spark} 2019-02-21 22:19:45 INFO StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisurcesRatio: 0.0 2019-02-21 22:19:48 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client:// (192.168.217.202:38810) with ID 0 2019-02-21 22:19:49 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client:// (192.168.217.203:47346) with ID 1 2019-02-21 22:19:49 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.217.202:50169 with 413.9 MB RAM, BlockM0, 192.168.217.202, 50169, None) 2019-02-21 22:19:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.217.203:47358 with 413.9 MB RAM, 
BlockM1, 192.168.217.203, 47358, None) 2019-02-21 22:19:55 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38 2019-02-21 22:19:56 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions 2019-02-21 22:19:56 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38) 2019-02-21 22:19:56 INFO DAGScheduler:54 - Parents of final stage: List() 2019-02-21 22:19:56 INFO DAGScheduler:54 - Missing parents: List() 2019-02-21 22:19:56 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has noparents 2019-02-21 22:20:03 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB) 2019-02-21 22:20:04 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 2019-02-21 22:20:04 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hadoop1.org.cn:55988 (size: 1256.0 B, free: 4 2019-02-21 22:20:05 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161 2019-02-21 22:20:05 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scfirst 15 tasks are for partitions Vector(0, 1)) 2019-02-21 22:20:05 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks 2019-02-21 22:20:07 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, 192.168.217.202, executor 0, partition 0, PROC, 7870 bytes) 2019-02-21 22:20:07 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, 192.168.217.203, executor 1, partition 1, PROC, 7870 bytes) 2019-02-21 22:20:12 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 192.168.217.203:47358 (size: 1256.0 B, free: 2019-02-21 22:20:12 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 192.168.217.202:50169 (size: 1256.0 B, free: 2019-02-21 22:20:15 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 8180 ms on 192.168.217.202 (executor 0) (1/ 2019-02-21 22:20:15 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 8179 ms on 192.168.217.203 (executor 1) (2/ 2019-02-21 22:20:15 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 16.998 s 2019-02-21 22:20:15 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool 2019-02-21 22:20:15 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 20.610491 s Pi is roughly 3.1427357136785683 2019-02-21 22:20:16 INFO AbstractConnector:318 - Stopped Spark@3a7b503d{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-21 22:20:16 INFO SparkUI:54 - Stopped Spark web UI at http://hadoop1.org.cn:4040 2019-02-21 22:20:17 INFO StandaloneSchedulerBackend:54 - Shutting down all executors 2019-02-21 22:20:17 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asking each executor to shut down 2019-02-21 22:20:18 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190221221942-0000/0 is now EXITED (Commanwith code 0) 2019-02-21 22:20:18 INFO StandaloneSchedulerBackend:54 - Executor app-20190221221942-0000/0 removed: Command exited with code 0 2019-02-21 22:20:18 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190221221942-0000/3 on worker-2019022122063.217.202-51708 (192.168.217.202:51708) with 2 core(s) 2019-02-21 22:20:18 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190221221942-0000/3 on hostPort 192.168.217.202th 2 core(s), 1024.0 MB RAM 2019-02-21 
22:20:18 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190221221942-0000/3 is now RUNNING 2019-02-21 22:20:18 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped! 2019-02-21 22:20:19 INFO MemoryStore:54 - MemoryStore cleared 2019-02-21 22:20:19 INFO BlockManager:54 - BlockManager stopped 2019-02-21 22:20:19 INFO BlockManagerMaster:54 - BlockManagerMaster stopped 2019-02-21 22:20:19 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped! 2019-02-21 22:20:19 INFO SparkContext:54 - Successfully stopped SparkContext 2019-02-21 22:20:20 INFO ShutdownHookManager:54 - Shutdown hook called 2019-02-21 22:20:20 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e9155121-edfe-4f64-b917-be3f9f62220a 2019-02-21 22:20:20 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-90e9b57b-c3d4-4572-8eec-906f121d6b98 [root@hadoop1 spark-2.4.0-bin-hadoop2.7]#
yarn-cluster mode
In YARN cluster mode, the Spark job is handed over to YARN, which runs it on the cluster. This requires adding export HADOOP_CONF_DIR=/usr/hdp/hadoop-2.8.3/etc/hadoop to the spark-env.sh file, after which the job can be submitted:
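The addition to spark-env.sh described above, as it would appear in the file (conf/spark-env.sh in a typical Spark installation; the Hadoop path is specific to this cluster):

# conf/spark-env.sh: point Spark at the Hadoop/YARN client configuration
export HADOOP_CONF_DIR=/usr/hdp/hadoop-2.8.3/etc/hadoop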
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.0.jar 1000 2019-02-22 00:07:32 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2019-02-22 00:07:34 INFO RMProxy:98 - Connecting to ResourceManager at hadoop1/192.168.217.201:8032 2019-02-22 00:07:36 INFO Client:54 - Requesting a new application from cluster with 2 NodeManagers 2019-02-22 00:07:36 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container) 2019-02-22 00:07:36 INFO Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead 2019-02-22 00:07:36 INFO Client:54 - Setting up container launch context for our AM 2019-02-22 00:07:37 INFO Client:54 - Setting up the launch environment for our AM container 2019-02-22 00:07:37 INFO Client:54 - Preparing resources for our AM container 2019-02-22 00:07:38 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 2019-02-22 00:07:59 INFO Client:54 - Uploading resource file:/tmp/spark-2b75f51c-ce24-4767-aa38-6d3262b1c7cb/__spark_libs__7092311691544510332.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/__spark_libs__7092311691544510332.zip 2019-02-22 00:08:26 INFO Client:54 - Uploading resource file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/spark-examples_2.11-2.4.0.jar 2019-02-22 00:08:27 INFO Client:54 - Uploading resource file:/tmp/spark-2b75f51c-ce24-4767-aa38-6d3262b1c7cb/__spark_conf__4473735302996115715.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/__spark_conf__.zip 2019-02-22 00:08:28 INFO SecurityManager:54 - Changing view acls to: root 2019-02-22 00:08:28 INFO SecurityManager:54 - Changing modify acls to: root 2019-02-22 00:08:28 INFO SecurityManager:54 - Changing view acls groups to: 2019-02-22 00:08:28 INFO SecurityManager:54 - Changing modify acls groups to: 2019-02-22 00:08:28 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 2019-02-22 00:08:31 INFO Client:54 - Submitting application application_1550757972410_0001 to ResourceManager 2019-02-22 00:08:34 INFO YarnClientImpl:273 - Submitted application application_1550757972410_0001 2019-02-22 00:08:36 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:36 INFO Client:54 - client token: N/A diagnostics: [星期五 二月 22 00:08:35 +0800 2019] Scheduler has assigned a container for AM, waiting for AM container to be launched ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1550765313392 final status: UNDEFINED tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/ user: root 2019-02-22 00:08:37 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:38 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:39 INFO Client:54 - Application report for application_1550757972410_0001 (state: 
ACCEPTED) 2019-02-22 00:08:40 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:41 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:42 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:43 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:44 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:45 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:46 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:47 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:48 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:49 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:50 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:51 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:52 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:53 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:54 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:55 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:56 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:57 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:58 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:08:59 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:00 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:01 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:02 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:03 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:04 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:05 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:06 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:07 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:08 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:09 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:10 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:11 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:12 INFO Client:54 - Application report 
for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:13 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:14 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:15 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:16 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:17 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:18 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:19 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:20 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:21 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:22 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:23 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:24 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:25 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:26 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:27 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:28 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:29 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:30 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:31 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:32 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:33 INFO Client:54 - Application report for application_1550757972410_0001 (state: ACCEPTED) 2019-02-22 00:09:34 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:34 INFO Client:54 - client token: N/A diagnostics: N/A ApplicationMaster host: hadoop2.org.cn ApplicationMaster RPC port: 43210 queue: default start time: 1550765313392 final status: UNDEFINED tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/ user: root 2019-02-22 00:09:35 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:36 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:37 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:38 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:39 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:40 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:41 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:42 INFO Client:54 - Application report 
for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:43 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:44 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:45 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:46 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:47 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:48 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:49 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:50 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:51 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:52 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:53 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:54 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:55 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:56 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:57 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:58 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:09:59 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:00 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:01 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:02 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:03 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:04 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:05 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:06 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:07 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:08 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:09 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:10 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:11 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:12 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:13 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:14 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:15 INFO Client:54 - 
Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:16 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:17 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:18 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:19 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:20 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:21 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:22 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:23 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:24 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:25 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:26 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:27 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:28 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:29 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:30 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:31 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:32 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:33 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:35 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:36 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:37 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:38 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:39 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:40 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:41 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:42 INFO Client:54 - Application report for application_1550757972410_0001 (state: RUNNING) 2019-02-22 00:10:43 INFO Client:54 - Application report for application_1550757972410_0001 (state: FINISHED) 2019-02-22 00:10:43 INFO Client:54 - client token: N/A diagnostics: N/A ApplicationMaster host: hadoop2.org.cn ApplicationMaster RPC port: 43210 queue: default start time: 1550765313392 final status: SUCCEEDED tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/ user: root 2019-02-22 00:10:44 INFO ShutdownHookManager:54 - Shutdown hook called 2019-02-22 00:10:44 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-afb05931-e273-4e9f-b38a-d3ca234dfb34 2019-02-22 00:10:44 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-2b75f51c-ce24-4767-aa38-6d3262b1c7cb [root@hadoop1 spark-2.4.0-bin-hadoop2.7]#
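Note that in cluster deploy mode the driver runs inside the YARN ApplicationMaster, so the "Pi is roughly ..." line is not printed to the submitting console as it was in the earlier runs; it ends up in the driver's container log. Assuming YARN log aggregation is enabled, one way to view it is:

yarn logs -applicationId application_1550757972410_0001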
The corresponding runs for Python applications and on Kubernetes will not be covered here.
yarn-client mode
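As the deprecation warning in the output below notes, --master yarn-client has been deprecated since Spark 2.0; the equivalent modern invocation is:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  examples/jars/spark-examples_2.11-2.4.0.jar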
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client examples/jars/spark-examples_2.11-2.4.0.jar Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead. 2019-02-22 00:31:15 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2019-02-22 00:31:16 INFO SparkContext:54 - Running Spark version 2.4.0 2019-02-22 00:31:16 INFO SparkContext:54 - Submitted application: Spark Pi 2019-02-22 00:31:16 INFO SecurityManager:54 - Changing view acls to: root 2019-02-22 00:31:16 INFO SecurityManager:54 - Changing modify acls to: root 2019-02-22 00:31:16 INFO SecurityManager:54 - Changing view acls groups to: 2019-02-22 00:31:16 INFO SecurityManager:54 - Changing modify acls groups to: 2019-02-22 00:31:16 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 2019-02-22 00:31:18 INFO Utils:54 - Successfully started service 'sparkDriver' on port 34169. 2019-02-22 00:31:18 INFO SparkEnv:54 - Registering MapOutputTracker 2019-02-22 00:31:18 INFO SparkEnv:54 - Registering BlockManagerMaster 2019-02-22 00:31:18 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 2019-02-22 00:31:18 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up 2019-02-22 00:31:18 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-f9baa979-e964-46a9-b034-475ba5148562 2019-02-22 00:31:18 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB 2019-02-22 00:31:18 INFO SparkEnv:54 - Registering OutputCommitCoordinator 2019-02-22 00:31:18 INFO log:192 - Logging initialized @9175ms 2019-02-22 00:31:19 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 2019-02-22 00:31:19 INFO Server:419 - Started @9359ms 2019-02-22 00:31:19 INFO AbstractConnector:278 - Started ServerConnector@47a64f7d{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-22 00:31:19 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040. 
2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@49ef32e0{/jobs,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3be8821f{/jobs/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@64b31700{/jobs/job,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@bae47a0{/jobs/job/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@74a9c4b0{/stages,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@85ec632{/stages/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1c05a54d{/stages/stage,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@214894fc{/stages/stage/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10567255{/stages/pool,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e362c57{/stages/pool/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1c4ee95c{/storage,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79c4715d{/storage/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5aa360ea{/storage/rdd,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6548bb7d{/storage/rdd/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e27ba81{/environment,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@54336c81{/environment/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1556f2dd{/executors,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@35e52059{/executors/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62577d6{/executors/threadDump,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@49bd54f7{/executors/threadDump/json,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b5f8707{/static,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@17ae98d7{/,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@59221b97{/api,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@704b2127{/jobs/job/kill,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3ee39da0{/stages/stage/kill,null,AVAILABLE,@Spark} 2019-02-22 00:31:19 INFO SparkUI:54 - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040 2019-02-22 00:31:19 INFO SparkContext:54 - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar at spark://hadoop1.org.cn:34169/jars/spark-examples_2.11-2.4.0.jar with 
timestamp 1550766679506 2019-02-22 00:31:22 INFO RMProxy:98 - Connecting to ResourceManager at hadoop1/192.168.217.201:8032 2019-02-22 00:31:22 INFO Client:54 - Requesting a new application from cluster with 2 NodeManagers 2019-02-22 00:31:22 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container) 2019-02-22 00:31:22 INFO Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead 2019-02-22 00:31:22 INFO Client:54 - Setting up container launch context for our AM 2019-02-22 00:31:22 INFO Client:54 - Setting up the launch environment for our AM container 2019-02-22 00:31:22 INFO Client:54 - Preparing resources for our AM container 2019-02-22 00:31:22 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 2019-02-22 00:31:38 INFO Client:54 - Uploading resource file:/tmp/spark-bc395b60-843f-4e24-841c-1fb09330b89f/__spark_libs__4384247224971462772.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0006/__spark_libs__4384247224971462772.zip 2019-02-22 00:31:53 INFO Client:54 - Uploading resource file:/tmp/spark-bc395b60-843f-4e24-841c-1fb09330b89f/__spark_conf__7312670304741942310.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0006/__spark_conf__.zip 2019-02-22 00:31:53 INFO SecurityManager:54 - Changing view acls to: root 2019-02-22 00:31:53 INFO SecurityManager:54 - Changing modify acls to: root 2019-02-22 00:31:53 INFO SecurityManager:54 - Changing view acls groups to: 2019-02-22 00:31:53 INFO SecurityManager:54 - Changing modify acls groups to: 2019-02-22 00:31:53 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 2019-02-22 00:31:56 INFO Client:54 - Submitting application application_1550757972410_0006 to ResourceManager 2019-02-22 00:31:56 INFO YarnClientImpl:273 - Submitted application application_1550757972410_0006 2019-02-22 00:31:56 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1550757972410_0006 and attemptId None 2019-02-22 00:31:57 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:31:57 INFO Client:54 - client token: N/A diagnostics: AM container is launched, waiting for AM container to Register with RM ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1550766716248 final status: UNDEFINED tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0006/ user: root 2019-02-22 00:31:58 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:31:59 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:00 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:01 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:02 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:03 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:04 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 
00:32:05 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:06 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:07 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:08 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:09 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:10 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:12 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:13 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:14 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:15 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:16 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:17 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:18 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:19 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:20 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:21 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:22 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:23 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:24 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:25 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:26 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:27 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:28 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:29 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:30 INFO Client:54 - Application report for application_1550757972410_0006 (state: ACCEPTED) 2019-02-22 00:32:31 INFO Client:54 - Application report for application_1550757972410_0006 (state: RUNNING) 2019-02-22 00:32:31 INFO Client:54 - client token: N/A diagnostics: N/A ApplicationMaster host: 192.168.217.203 ApplicationMaster RPC port: -1 queue: default start time: 1550766716248 final status: UNDEFINED tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0006/ user: root 2019-02-22 00:32:31 INFO YarnClientSchedulerBackend:54 - Application application_1550757972410_0006 has started running. 2019-02-22 00:32:31 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51287. 
2019-02-22 00:32:31 INFO NettyBlockTransferService:54 - Server created on hadoop1.org.cn:51287 2019-02-22 00:32:31 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 2019-02-22 00:32:31 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hadoop1.org.cn, 51287, None) 2019-02-22 00:32:31 INFO BlockManagerMasterEndpoint:54 - Registering block manager hadoop1.org.cn:51287 with 413.9 MB RAM, BlockManagerId(driver, hadoop1.org.cn, 51287, None) 2019-02-22 00:32:31 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hadoop1.org.cn, 51287, None) 2019-02-22 00:32:31 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hadoop1.org.cn, 51287, None) 2019-02-22 00:32:32 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1788cb61{/metrics/json,null,AVAILABLE,@Spark} 2019-02-22 00:32:32 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop1, PROXY_URI_BASES -> http://hadoop1:8088/proxy/application_1550757972410_0006), /proxy/application_1550757972410_0006 2019-02-22 00:32:32 INFO JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill, /metrics/json. 2019-02-22 00:32:32 INFO YarnClientSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms) 2019-02-22 00:32:33 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM) 2019-02-22 00:32:35 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38 2019-02-22 00:32:37 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions 2019-02-22 00:32:37 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38) 2019-02-22 00:32:37 INFO DAGScheduler:54 - Parents of final stage: List() 2019-02-22 00:32:37 INFO DAGScheduler:54 - Missing parents: List() 2019-02-22 00:32:37 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents 2019-02-22 00:32:47 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB) 2019-02-22 00:32:48 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB) 2019-02-22 00:32:48 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hadoop1.org.cn:51287 (size: 1256.0 B, free: 413.9 MB) 2019-02-22 00:32:49 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161 2019-02-22 00:32:49 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1)) 2019-02-22 00:32:49 INFO YarnScheduler:54 - Adding task set 0.0 with 2 tasks 2019-02-22 00:33:04 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 2019-02-22 00:33:10 INFO 
YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.217.202:43729) with ID 1 2019-02-22 00:33:11 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, hadoop2.org.cn, executor 1, partition 0, PROCESS_LOCAL, 7877 bytes) 2019-02-22 00:33:11 INFO BlockManagerMasterEndpoint:54 - Registering block manager hadoop2.org.cn:33875 with 413.9 MB RAM, BlockManagerId(1, hadoop2.org.cn, 33875, None) 2019-02-22 00:33:12 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hadoop2.org.cn:33875 (size: 1256.0 B, free: 413.9 MB) 2019-02-22 00:33:13 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, hadoop2.org.cn, executor 1, partition 1, PROCESS_LOCAL, 7877 bytes) 2019-02-22 00:33:13 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 2285 ms on hadoop2.org.cn (executor 1) (1/2) 2019-02-22 00:33:13 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 132 ms on hadoop2.org.cn (executor 1) (2/2) 2019-02-22 00:33:13 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 34.651 s 2019-02-22 00:33:13 INFO YarnScheduler:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool 2019-02-22 00:33:13 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 37.594449 s Pi is roughly 3.14281571407857 2019-02-22 00:33:13 INFO AbstractConnector:318 - Stopped Spark@47a64f7d{HTTP/1.1,[http/1.1]}{192.168.217.201:4040} 2019-02-22 00:33:13 INFO SparkUI:54 - Stopped Spark web UI at http://hadoop1.org.cn:4040 2019-02-22 00:33:13 INFO YarnClientSchedulerBackend:54 - Interrupting monitor thread 2019-02-22 00:33:13 INFO YarnClientSchedulerBackend:54 - Shutting down all executors 2019-02-22 00:33:13 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Asking each executor to shut down 2019-02-22 00:33:14 INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 2019-02-22 00:33:14 INFO YarnClientSchedulerBackend:54 - Stopped 2019-02-22 00:33:14 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped! 2019-02-22 00:33:14 INFO MemoryStore:54 - MemoryStore cleared 2019-02-22 00:33:14 INFO BlockManager:54 - BlockManager stopped 2019-02-22 00:33:14 INFO BlockManagerMaster:54 - BlockManagerMaster stopped 2019-02-22 00:33:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped! 2019-02-22 00:33:14 INFO SparkContext:54 - Successfully stopped SparkContext 2019-02-22 00:33:14 INFO ShutdownHookManager:54 - Shutdown hook called 2019-02-22 00:33:14 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-ea671cae-988b-4f5b-a85f-184ed2dba58d 2019-02-22 00:33:15 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bc395b60-843f-4e24-841c-1fb09330b89f
Master URLs
The master URL passed to Spark can be in one of the following formats (a few example invocations are sketched after this list):
- local: Run Spark locally with one worker thread (i.e. no parallelism at all).
- local[K]: Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
- local[K,F]: Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
- local[*]: Run Spark locally with as many worker threads as there are logical cores on your machine.
- local[*,F]: Run Spark locally with as many worker threads as there are logical cores on your machine, and with F maxFailures.
- spark://HOST:PORT: Connect to the given Spark standalone cluster master. The port must be whichever one the master is configured to use, which is 7077 by default.
- spark://HOST1:PORT1,HOST2:PORT2: Connect to the given Spark standalone cluster with standby masters using ZooKeeper. The list must contain all the master hosts in the high-availability cluster set up with ZooKeeper. The port must be whichever each master is configured to use, 7077 by default.
- mesos://HOST:PORT: Connect to the given Mesos cluster. The port must be whichever one it is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, HOST:PORT should be configured to connect to the MesosClusterDispatcher.
- yarn: Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
- k8s://HOST:PORT: Connect to a Kubernetes cluster in cluster mode. Client mode is currently unsupported and will be supported in a future release. HOST and PORT refer to the [Kubernetes API server](https://kubernetes.io/docs/reference/generated/kube-apiserver/). It connects using TLS by default. To force it to use an unsecured connection, you can use k8s://http://HOST:PORT.
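As a quick illustration, the invocations below submit the same example application against a few of these master URLs; the host names, ports, and jar path are placeholders for this cluster and would need to be adapted.

```bash
# Local mode, using as many worker threads as there are logical cores
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master local[*] \
  examples/jars/spark-examples_2.11-2.4.0.jar

# Spark standalone master (host name and default port 7077 are assumptions)
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop1:7077 \
  examples/jars/spark-examples_2.11-2.4.0.jar

# YARN in cluster deploy mode; requires HADOOP_CONF_DIR (or YARN_CONF_DIR) to be set
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  examples/jars/spark-examples_2.11-2.4.0.jar
```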
Loading Configuration from a File
The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it reads options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.
Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For example, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
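As a minimal sketch of that precedence, assume conf/spark-defaults.conf contains a spark.master entry; the property names below are real Spark settings, while the values and jar path are only examples.

```bash
# Example contents of conf/spark-defaults.conf:
#   spark.master            yarn
#   spark.executor.memory   1g

# With spark.master set in the defaults file, --master can be omitted:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.4.0.jar

# A flag passed on the command line still overrides the defaults file,
# and values set explicitly on SparkConf in the code override both:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master local[2] \
  examples/jars/spark-examples_2.11-2.4.0.jar
```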
If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
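For example, the run below prints the parsed arguments and the effective Spark properties before the job starts; the jar path is a placeholder.

```bash
./bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi \
  --master yarn \
  examples/jars/spark-examples_2.11-2.4.0.jar
```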
Advanced Dependency Management
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars; a sketch of the syntax follows.
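In the sketch below, org.example.MyApp, myapp.jar and the dependency jars are placeholder names, not anything taken from this article.

```bash
# The --jars list is comma-separated (no globs or directory expansion) and is
# added to both the driver and executor classpaths:
./bin/spark-submit --class org.example.MyApp \
  --master yarn \
  --jars /opt/libs/dep1.jar,/opt/libs/dep2.jar \
  myapp.jar
```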
Spark uses the following URL schemes to allow different strategies for disseminating jars (an example mixing several schemes follows the list):
- file: Absolute paths and file:// URIs are served by the driver's HTTP file server, and every executor pulls the file from the driver's HTTP server.
- hdfs:, http:, https:, ftp: These pull down files and JARs from the URI as expected.
- local: A URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and it works well for large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.
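The example below mixes several of these schemes in one --jars list; all hosts, paths and jar names are placeholders.

```bash
./bin/spark-submit --class org.example.MyApp \
  --master spark://hadoop1:7077 \
  --jars hdfs://hadoop1:9000/libs/dep1.jar,local:/opt/libs/dep2.jar,file:///home/user/dep3.jar \
  myapp.jar
```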
Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes. This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup is handled automatically; with Spark standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property.
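For a standalone cluster, a minimal sketch of enabling that cleanup in conf/spark-env.sh on each worker might look like this; the property names are real Spark worker settings, while the one-day TTL is only an example value.

```bash
# Enable periodic cleanup of old application work directories and
# delete anything older than 86400 seconds (1 day):
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400"
```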
Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
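For instance, a dependency could be pulled in by its Maven coordinates as sketched below; the spark-avro coordinate matches the Spark 2.4.0 line used in this article, while the application class, jar name and extra repository URL are purely placeholders.

```bash
./bin/spark-submit --class org.example.MyApp \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --repositories https://repo.example.com/maven2 \
  myapp.jar
```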
For Python, the equivalent --py-files option can be used to distribute .egg, .zip and .py libraries to executors.
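A corresponding sketch for a Python application; main.py, helper.py and deps.zip are placeholder file names.

```bash
./bin/spark-submit \
  --master yarn \
  --py-files deps.zip,helper.py \
  main.py
```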
More Information
Once you have deployed your application, the cluster mode overview describes the components involved in distributed execution, and how to monitor and debug applications.