Before I knew it, this is already the 19th post in the Spark series. The series is not very systematic — basically I write things down as I learn them rather than laying everything out from a well-digested big picture. I will reorganize these posts once I have a deeper understanding and firmer grasp of Spark; after all, I have only been working with Spark for ten days. Onward!
In the previous posts Spark was always used in the default pseudo-distributed way, without ever looking at it from a system-deployment angle; the current state of affairs is simply "Spark runs and the examples go through". Up to this point, the Spark configuration file looked like this:
export SCALA_HOME=/home/hadoop/software/scala-2.11.4
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
### does localhost mean the MASTER node is this machine?
export SPARK_MASTER=localhost
### what does this one mean?
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.0-bin-hadoop2.4
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
### what is this one for? if Spark runs standalone, the YARN-related options should not be needed
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Two deployment modes for Spark on YARN
- yarn-client
- yarn-cluster
1. yarn-client
In this mode the Spark driver runs on the client machine and asks YARN to launch executors that run the tasks. In other words, the driver and YARN are separate: the driver program acts as a client of the YARN cluster, a classic client/server arrangement.
2. yarn-cluster
In this mode the Spark driver is first started inside the YARN cluster as the ApplicationMaster, and the ApplicationMaster then asks the ResourceManager for resources to launch executors that run the tasks. That is, with this deployment the driver program runs on the YARN cluster.
When deploying a Spark application on YARN, it can be submitted with Spark's bin/spark-submit. Unlike Standalone or Mesos, there is no need to supply a URL as the value of the master parameter, because a Spark application can pick up the relevant addresses from the Hadoop configuration files; simply passing yarn-cluster or yarn-client as the master is enough. Precisely because Spark has to read that information from the Hadoop (more specifically, the YARN-related) configuration, the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR must be set.
So add one more entry to the configuration above in conf/spark-env.sh, and add the same line to /etc/profile as well:
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
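For reference, a cleaned-up conf/spark-env.sh for the YARN case might then look roughly like the sketch below. It only uses the paths already listed above; the two localhost entries questioned earlier are dropped here on the assumption that they only matter for standalone mode:
export SCALA_HOME=/home/hadoop/software/scala-2.11.4
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.0-bin-hadoop2.4
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
# where Spark finds the YARN/HDFS configuration; HADOOP_CONF_DIR works as well
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop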
yarn-client deployment
1. Submit command:
./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-client --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md
For comparison, this was the earlier submission in which Spark managed the compute resources itself:
./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master spark://hadoop.master:7077 --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md
2. Notes:
2.1 With yarn-client, the driver runs on the client, so its state can be watched through the web UI, by default at http://hadoop.master:4040, while YARN itself is reached at http://hadoop.master:8088.
2.2 Submitting a job produces the log below, and the whole process looks rather involved:
[hadoop@hadoop bin]$ sh submitSparkApplicationYarnClient.sh //submit the job in yarn-client mode
Delete the HDFS output directory //delete the HDFS output directory left over from the previous run
15/01/10 07:27:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/SortedWordCountRDDInSparkApplication
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/10 07:27:52 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:27:53 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/10 07:27:53 INFO Remoting: Starting remoting
15/01/10 07:27:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 35401.
15/01/10 07:27:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:35401]
15/01/10 07:27:54 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/10 07:27:54 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/10 07:27:54 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150110072754-dcdf
15/01/10 07:27:54 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/01/10 07:27:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/10 07:27:56 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8f55f6ec-399b-4371-9ab4-d648047381c5
15/01/10 07:27:56 INFO spark.HttpServer: Starting HTTP Server
15/01/10 07:27:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:52196
15/01/10 07:27:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 52196.
15/01/10 07:27:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/01/10 07:27:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/10 07:27:58 INFO ui.SparkUI: Started SparkUI at http://localhost:4040
15/01/10 07:27:58 INFO spark.SparkContext: Added JAR file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/SparkWordCount.jar at http://localhost:52196/jars/SparkWordCount.jar with timestamp 1420892878400
////////// at this point Spark's local setup is done and the job is handed over to YARN /////
15/01/10 07:28:00 INFO client.RMProxy: Connecting to ResourceManager at hadoop.master/192.168.26.136:8032 //connect to the ResourceManager
15/01/10 07:28:02 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers //request resources
15/01/10 07:28:02 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/10 07:28:02 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead //allocate one resource unit, the AM container
15/01/10 07:28:02 INFO yarn.Client: Setting up container launch context for our AM //set up the container
15/01/10 07:28:02 INFO yarn.Client: Preparing resources for our AM container
15/01/10 07:28:03 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0002/spark-assembly-1.2.0-hadoop2.4.0.jar ////so spark-assembly-1.2.0-hadoop2.4.0.jar gets uploaded to HDFS???
15/01/10 07:28:22 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/10 07:28:22 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:28:22 INFO yarn.Client: Submitting application 2 to ResourceManager ////submit the application
15/01/10 07:28:22 INFO impl.YarnClientImpl: Submitted application application_1420859110621_0002
15/01/10 07:28:23 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:23 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1420892902791 final status: UNDEFINED tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0002/ user: hadoop
///what is this pile below? reporting the status once per second? how much junk log does that generate?
15/01/10 07:28:24 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:26 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:27 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:28 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:29 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:30 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:31 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:32 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:34 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:35 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:36 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:37 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:38 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:39 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:40 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:41 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:42 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:43 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:44 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:45 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:46 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:47 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:48 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
///are we back in the Spark runtime environment now?
15/01/10 07:28:48 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://[email protected]:43444/user/YarnAM#-519598456]
15/01/10 07:28:48 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter.
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop.master, PROXY_URI_BASES -> http://hadoop.master:8088/proxy/application_1420859110621_0002), /proxy/application_1420859110621_0002
15/01/10 07:28:48 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
//back to the progress reports; by now the state has moved from ACCEPTED to RUNNING
15/01/10 07:28:49 INFO yarn.Client: Application report for application_1420859110621_0002 (state: RUNNING)
15/01/10 07:28:49 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: hadoop.master ApplicationMaster RPC port: 0 queue: default start time: 1420892902791 final status: UNDEFINED tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0002/ user: hadoop
15/01/10 07:28:49 INFO cluster.YarnClientSchedulerBackend: Application application_1420859110621_0002 has started running.
15/01/10 07:28:51 INFO netty.NettyBlockTransferService: Server created on 45652
15/01/10 07:28:51 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/10 07:28:51 INFO storage.BlockManagerMasterActor: Registering block manager localhost:45652 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 45652)
15/01/10 07:28:51 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/10 07:28:51 INFO scheduler.EventLoggingListener: Logging events to hdfs://hadoop.master:9000/user/hadoop/sparkevt/application_1420859110621_0002
15/01/10 07:28:51 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/01/10 07:28:52 INFO storage.MemoryStore: ensureFreeSpace(216263) called with curMem=0, maxMem=280248975
15/01/10 07:28:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.2 KB, free 267.1 MB)
15/01/10 07:28:52 INFO storage.MemoryStore: ensureFreeSpace(31667) called with curMem=216263, maxMem=280248975
15/01/10 07:28:52 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 30.9 KB, free 267.0 MB)
15/01/10 07:28:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45652 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:28:52 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/10 07:28:52 INFO spark.SparkContext: Created broadcast 0 from textFile at SparkWordCount.scala:41
15/01/10 07:28:52 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/10 07:28:53 INFO spark.SparkContext: Starting job: sortByKey at SparkWordCount.scala:44
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Registering RDD 3 (map at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Got job 0 (sortByKey at SparkWordCount.scala:44) with 2 output partitions (allowLocal=false)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Final stage: Stage 1(sortByKey at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:44), which has no missing parents
15/01/10 07:28:53 INFO storage.MemoryStore: ensureFreeSpace(3528) called with curMem=247930, maxMem=280248975
15/01/10 07:28:53 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 267.0 MB)
15/01/10 07:28:53 INFO storage.MemoryStore: ensureFreeSpace(2498) called with curMem=251458, maxMem=280248975
15/01/10 07:28:53 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.4 KB, free 267.0 MB)
15/01/10 07:28:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:45652 (size: 2.4 KB, free: 267.2 MB)
15/01/10 07:28:53 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/10 07:28:53 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
15/01/10 07:28:53 INFO util.RackResolver: Resolved hadoop.master to /default-rack
15/01/10 07:29:06 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:34326/user/Executor#-1519914394] with ID 1
15/01/10 07:29:06 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop.master, NODE_LOCAL, 1356 bytes)
15/01/10 07:29:06 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:59070/user/Executor#763574095] with ID 2
15/01/10 07:29:06 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop.master, NODE_LOCAL, 1356 bytes)
15/01/10 07:29:09 INFO storage.BlockManagerMasterActor: Registering block manager hadoop.master:44394 with 267.3 MB RAM, BlockManagerId(1, hadoop.master, 44394)
15/01/10 07:29:09 INFO storage.BlockManagerMasterActor: Registering block manager hadoop.master:52439 with 267.3 MB RAM, BlockManagerId(2, hadoop.master, 52439)
15/01/10 07:29:11 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop.master:52439 (size: 2.4 KB, free: 267.3 MB)
15/01/10 07:29:11 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop.master:44394 (size: 2.4 KB, free: 267.3 MB)
15/01/10 07:29:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:44394 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:29:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:52439 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 27521 ms on hadoop.master (1/2)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 27381 ms on hadoop.master (2/2)
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Stage 0 (map at SparkWordCount.scala:44) finished in 40.486 s
15/01/10 07:29:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/01/10 07:29:34 INFO scheduler.DAGScheduler: running: Set()
15/01/10 07:29:34 INFO scheduler.DAGScheduler: waiting: Set(Stage 1)
15/01/10 07:29:34 INFO scheduler.DAGScheduler: failed: Set()
15/01/10 07:29:34 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List()
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at sortByKey at SparkWordCount.scala:44), which is now runnable
15/01/10 07:29:34 INFO storage.MemoryStore: ensureFreeSpace(3072) called with curMem=253956, maxMem=280248975
15/01/10 07:29:34 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 267.0 MB)
15/01/10 07:29:34 INFO storage.MemoryStore: ensureFreeSpace(2122) called with curMem=257028, maxMem=280248975
15/01/10 07:29:34 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 267.0 MB)
15/01/10 07:29:34 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:45652 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:34 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/01/10 07:29:34 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at sortByKey at SparkWordCount.scala:44)
15/01/10 07:29:34 INFO cluster.YarnClientClusterScheduler: Adding task set 1.0 with 2 tasks
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:34 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop.master:52439 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:35 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop.master:44394 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:35 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:59070
15/01/10 07:29:35 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 158 bytes
15/01/10 07:29:35 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:34326
15/01/10 07:29:37 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 3297 ms on hadoop.master (1/2)
15/01/10 07:29:37 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 3303 ms on hadoop.master (2/2)
15/01/10 07:29:37 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/01/10 07:29:37 INFO scheduler.DAGScheduler: Stage 1 (sortByKey at SparkWordCount.scala:44) finished in 3.307 s
///Job 0 is done
15/01/10 07:29:37 INFO scheduler.DAGScheduler: Job 0 finished: sortByKey at SparkWordCount.scala:44, took 44.720124 s
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/01/10 07:29:38 INFO spark.SparkContext: Starting job: saveAsTextFile at SparkWordCount.scala:44
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Registering RDD 5 (map at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Got job 1 (saveAsTextFile at SparkWordCount.scala:44) with 2 output partitions (allowLocal=false)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Final stage: Stage 4(saveAsTextFile at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 3)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Missing parents: List(Stage 3)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[5] at map at SparkWordCount.scala:44), which has no missing parents
15/01/10 07:29:38 INFO storage.MemoryStore: ensureFreeSpace(2992) called with curMem=259150, maxMem=280248975
15/01/10 07:29:38 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.9 KB, free 267.0 MB)
15/01/10 07:29:38 INFO storage.MemoryStore: ensureFreeSpace(2168) called with curMem=262142, maxMem=280248975
15/01/10 07:29:38 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.1 KB, free 267.0 MB)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:45652 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/01/10 07:29:38 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 3 (MappedRDD[5] at map at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO cluster.YarnClientClusterScheduler: Adding task set 3.0 with 2 tasks
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, hadoop.master, PROCESS_LOCAL, 1101 bytes)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, hadoop.master, PROCESS_LOCAL, 1101 bytes)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop.master:52439 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop.master:44394 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 441 ms on hadoop.master (1/2)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 470 ms on hadoop.master (2/2)
15/01/10 07:29:38 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Stage 3 (map at SparkWordCount.scala:44) finished in 0.474 s
15/01/10 07:29:38 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/01/10 07:29:38 INFO scheduler.DAGScheduler: running: Set()
15/01/10 07:29:38 INFO scheduler.DAGScheduler: waiting: Set(Stage 4)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: failed: Set()
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Missing parents for Stage 4: List()
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[10] at saveAsTextFile at SparkWordCount.scala:44), which is now runnable
15/01/10 07:29:39 INFO storage.MemoryStore: ensureFreeSpace(113152) called with curMem=264310, maxMem=280248975
15/01/10 07:29:39 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 110.5 KB, free 266.9 MB)
15/01/10 07:29:39 INFO storage.MemoryStore: ensureFreeSpace(68432) called with curMem=377462, maxMem=280248975
15/01/10 07:29:39 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 66.8 KB, free 266.8 MB)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:45652 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:39 INFO storage.BlockManagerMaster: Updated info of block broadcast_4_piece0
15/01/10 07:29:39 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[10] at saveAsTextFile at SparkWordCount.scala:44)
15/01/10 07:29:39 INFO cluster.YarnClientClusterScheduler: Adding task set 4.0 with 2 tasks
15/01/10 07:29:39 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:39 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 7, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hadoop.master:52439 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hadoop.master:44394 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:40 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to [email protected]:34326
15/01/10 07:29:40 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 158 bytes
15/01/10 07:29:40 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to [email protected]:59070
15/01/10 07:29:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 7) in 3184 ms on hadoop.master (1/2)
15/01/10 07:29:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 3213 ms on hadoop.master (2/2)
15/01/10 07:29:42 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Stage 4 (saveAsTextFile at SparkWordCount.scala:44) finished in 3.215 s
////Job 1 is done; at this point all of the jobs have finished
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Job 1 finished: saveAsTextFile at SparkWordCount.scala:44, took 3.969958 s
////what is this pile below doing??
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
////the Spark web UI has been stopped; from here on only the history server can show it
15/01/10 07:29:42 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/10 07:29:42 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/10 07:29:43 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/01/10 07:29:43 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/10 07:29:44 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/10 07:29:44 INFO storage.MemoryStore: MemoryStore cleared
15/01/10 07:29:44 INFO storage.BlockManager: BlockManager stopped
15/01/10 07:29:44 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
///the above is the resource-release sequence: //1. executors stop 2. MemoryStore is cleared 3. BlockManager stops 4. BlockManagerMaster stops
15/01/10 07:29:44 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/10 07:29:44 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
////the SparkContext is shut down and the whole job is over
15/01/10 07:29:44 INFO spark.SparkContext: Successfully stopped SparkContext
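A side note on the log above: the Spark assembly jar is uploaded to the .sparkStaging directory on HDFS for every single submission. Assuming the spark.yarn.jar property works as documented for Spark 1.2, staging the assembly on HDFS once and pointing Spark at it should avoid that repeated upload — a sketch (the HDFS target directory below is made up for illustration):
# upload the assembly to HDFS once
hdfs dfs -mkdir -p /user/hadoop/spark-lib
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.2.0-hadoop2.4.0.jar /user/hadoop/spark-lib/
# then add to conf/spark-defaults.conf:
# spark.yarn.jar hdfs://hadoop.master:9000/user/hadoop/spark-lib/spark-assembly-1.2.0-hadoop2.4.0.jar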
Visit http://hadoop.master:8088 to check how the job ran. As the screenshot below shows, the Spark WordCount program does show up in Hadoop, and its application type is listed as SPARK.
Visiting http://hadoop.master:4040 fails, as expected: Spark shuts that service down automatically once the program has finished, which shows that this web UI is bound to the application, not to Spark itself.
Visit http://hadoop.master:18080 (the Spark history server).
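The job log above shows events being written to hdfs://hadoop.master:9000/user/hadoop/sparkevt, so the history server only needs to read from that same directory. A minimal sketch of the relevant settings, assuming the Spark 1.2 defaults:
# conf/spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop.master:9000/user/hadoop/sparkevt
spark.history.fs.logDirectory hdfs://hadoop.master:9000/user/hadoop/sparkevt
# then start the Spark history server (listens on 18080 by default)
$SPARK_HOME/sbin/start-history-server.sh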
yarn-cluster deployment
1. Submit command:
./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-cluster --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md
2. Job log
The log produced in yarn-cluster mode was another small surprise: unlike yarn-client, it does not spill out a long trace of the whole process. The log here is very plain and boils down to the application being accepted (ACCEPTED), running (RUNNING) and finishing (FINISHED), and nothing else — in particular, none of Spark's own log output shows up on the client (see the note right after this log for where it ends up)...
[hadoop@hadoop bin]$ sh submitSparkApplicationYarnCluster.sh ////submit the Spark application in yarn-cluster mode
Delete the HDFS output directory ///delete the HDFS output directory created by the previous run
15/01/10 07:56:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/SortedWordCountRDDInSparkApplication
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/10 07:56:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
///submit the application ///request resources
15/01/10 07:56:36 INFO client.RMProxy: Connecting to ResourceManager at hadoop.master/192.168.26.136:8032
15/01/10 07:56:38 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/01/10 07:56:38 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/10 07:56:38 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/10 07:56:38 INFO yarn.Client: Setting up container launch context for our AM
15/01/10 07:56:38 INFO yarn.Client: Preparing resources for our AM container
////spark-assembly-1.2.0-hadoop2.4.0.jar is shipped to HDFS yet again
15/01/10 07:56:39 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0003/spark-assembly-1.2.0-hadoop2.4.0.jar
////the application's own jar
15/01/10 07:56:49 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/SparkWordCount.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0003/SparkWordCount.jar
15/01/10 07:56:49 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/10 07:56:49 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:56:49 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:56:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
///submit the application to the ResourceManager
15/01/10 07:56:49 INFO yarn.Client: Submitting application 3 to ResourceManager
15/01/10 07:56:49 INFO impl.YarnClientImpl: Submitted application application_1420859110621_0003
15/01/10 07:56:50 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:50 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1420894609440 final status: UNDEFINED tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/ user: hadoop
////the application has been accepted
15/01/10 07:56:51 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:52 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:53 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:54 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:55 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:56 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:57 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:58 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:59 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:00 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:01 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:02 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:03 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:04 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:05 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:57:06 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:06 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: hadoop.master ApplicationMaster RPC port: 0 queue: default start time: 1420894609440 final status: UNDEFINED tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/ user: hadoop
////the application starts running
15/01/10 07:57:07 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:08 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:09 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:10 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:11 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:12 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:13 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:14 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:15 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:16 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:17 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:18 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:19 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:20 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:21 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:22 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:23 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:25 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:26 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:27 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:28 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:29 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:30 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:31 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:32 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:33 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:34 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:35 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:36 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:37 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:38 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:39 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:40 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:41 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
///the application has finished
15/01/10 07:57:42 INFO yarn.Client: Application report for application_1420859110621_0003 (state: FINISHED)
15/01/10 07:57:42 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: hadoop.master ApplicationMaster RPC port: 0 queue: default start time: 1420894609440 final status: SUCCEEDED tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/ user: hadoop
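Since the driver runs inside YARN in this mode, its own log output ends up in the ApplicationMaster container log rather than on the client. Assuming YARN log aggregation is enabled, it can be pulled back afterwards with something like:
# fetch the aggregated container logs for this application (id taken from the log above)
yarn logs -applicationId application_1420859110621_0003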
3. Checking the status
3.1 With yarn-cluster, the driver runs inside YARN, so to reach the driver's web UI you have to go through the Tracking UI of this job in YARN. Clicking History on the Tracking UI is supposed to open the application's history, but clicking it gives nothing, because
Hadoop's history server has not been started; it should be started with sbin/mr-jobhistory-daemon.sh under the Hadoop directory.
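That is, something along these lines (using the HADOOP_HOME path from earlier):
# start the Hadoop MapReduce job history server
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver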
3.2 With yarn-cluster, the driver runs inside YARN, so the program's results cannot be shown on the client; it is best to save the results to HDFS. What the client terminal shows is only the progress of the YARN job.
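Once the job has finished, the result can be inspected straight from HDFS; the output directory below is the one this WordCount example writes to (the part file name may differ):
hdfs dfs -ls /user/hadoop/SortedWordCountRDDInSparkApplication
hdfs dfs -cat /user/hadoop/SortedWordCountRDDInSparkApplication/part-00000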
3.3 Visit http://hadoop.master:8088 to check how the job went.
Reference: http://blog.csdn.net/book_mmicky/article/details/25714287