[Spark 19] Deploying Spark on YARN

Before I knew it, this is already the 19th post in the Spark series. The series is not very systematic: basically I write things down as I learn them, rather than laying them out from a position of mastery. I will reorganize these posts once I have a deeper understanding and grasp of Spark. After all, I have only been working with Spark for ten days, so onward!

In the previous posts Spark always ran in the default pseudo-distributed setup, and I never looked at Spark from a system-deployment point of view; the current state of affairs is simply being able to run Spark and get the examples working. Up to now, the Spark configuration file (conf/spark-env.sh) has contained:

export SCALA_HOME=/home/hadoop/software/scala-2.11.4
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67

###Does localhost here mean the master node is the local machine? (Note: the variable Spark's standalone scripts actually read is SPARK_MASTER_IP.)
export SPARK_MASTER=localhost

###What does this one mean? (SPARK_LOCAL_IP is the IP address that Spark binds its services to on this node.)
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.0-bin-hadoop2.4
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native

###What is this for? If Spark runs standalone, the YARN-related options should not be needed
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

 

The two modes for deploying Spark on YARN

  • yarn-client
  • yarn-cluster

 1. yarn-client

In this mode the Spark driver runs on the client machine and asks YARN to launch executors that run the tasks. In other words, the driver and YARN are separate: the driver program acts as a client of the YARN cluster, a classic client-server setup.

 2. yarn-cluster

In this mode the Spark driver is first started inside the YARN cluster as the ApplicationMaster, and the ApplicationMaster then requests resources from the ResourceManager to launch the executors that run the tasks. That is, with this deployment mode the driver program runs on the YARN cluster itself.

To deploy a Spark application on YARN, submit it with Spark's bin/spark-submit. Unlike Standalone or Mesos deployments, there is no need to supply a URL as the value of the master parameter, because the Spark application can obtain the relevant information from Hadoop's configuration files; it is enough to pass yarn-cluster or yarn-client as the master. And because Spark has to read that information from the Hadoop (or, more specifically, the YARN-related) configuration files, the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR must be set.
So add one more entry to the configuration above in conf/spark-env.sh, and add the same line to /etc/profile:

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
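A wrong or empty config directory is a common reason spark-submit cannot find the ResourceManager, so a quick sanity check can help. This is a sketch assuming a POSIX shell; check_yarn_conf is a hypothetical helper, not part of Spark or Hadoop:

```shell
# spark-submit locates the ResourceManager through yarn-site.xml in the
# directory named by YARN_CONF_DIR (or HADOOP_CONF_DIR), so verify that
# the file actually exists before submitting.
check_yarn_conf() {
    # $1: candidate Hadoop/YARN configuration directory
    if [ -f "$1/yarn-site.xml" ]; then
        echo "OK"
    else
        echo "MISSING"
    fi
}

# Check the directory this post exports (adjust the path to your install).
check_yarn_conf "${YARN_CONF_DIR:-/home/hadoop/software/hadoop-2.5.2/etc/hadoop}"
```

If this prints MISSING, fix the variable before wondering why the client hangs trying to reach the ResourceManager.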

 

Deploying with yarn-client

1. Submission command:

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-client --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md

 

Compare this with the earlier submission command, where Spark managed the compute resources itself:

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master spark://hadoop.master:7077 --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md

2. Notes:

2.1. With yarn-client the driver runs on the client, so its state can be watched through the web UI, by default at http://hadoop.master:4040; YARN itself is reachable at http://hadoop.master:8088.

2.2. Submitting a job produces a log that makes the whole process look rather involved:

[hadoop@hadoop bin]$ sh submitSparkApplicationYarnClient.sh //submit the job in yarn-client mode
Delete the HDFS output directory //delete the HDFS output directory produced by the previous run
15/01/10 07:27:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/SortedWordCountRDDInSparkApplication
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/10 07:27:52 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:27:53 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/10 07:27:53 INFO Remoting: Starting remoting
15/01/10 07:27:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 35401.
15/01/10 07:27:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:35401]
15/01/10 07:27:54 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/10 07:27:54 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/10 07:27:54 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150110072754-dcdf
15/01/10 07:27:54 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/01/10 07:27:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/10 07:27:56 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8f55f6ec-399b-4371-9ab4-d648047381c5
15/01/10 07:27:56 INFO spark.HttpServer: Starting HTTP Server
15/01/10 07:27:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:57 INFO server.AbstractConnector: Started [email protected]:52196
15/01/10 07:27:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 52196.
15/01/10 07:27:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:58 INFO server.AbstractConnector: Started [email protected]:4040
15/01/10 07:27:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/10 07:27:58 INFO ui.SparkUI: Started SparkUI at http://localhost:4040
15/01/10 07:27:58 INFO spark.SparkContext: Added JAR file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/SparkWordCount.jar at http://localhost:52196/jars/SparkWordCount.jar with timestamp 1420892878400
//////////At this point Spark's local setup is done and the job is handed over to YARN/////
15/01/10 07:28:00 INFO client.RMProxy: Connecting to ResourceManager at hadoop.master/192.168.26.136:8032 //connect to the ResourceManager
15/01/10 07:28:02 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers //request resources
15/01/10 07:28:02 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/10 07:28:02 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead //allocate one resource unit: the AM container
15/01/10 07:28:02 INFO yarn.Client: Setting up container launch context for our AM //set up the launch context for the AM container
15/01/10 07:28:02 INFO yarn.Client: Preparing resources for our AM container 
15/01/10 07:28:03 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0002/spark-assembly-1.2.0-hadoop2.4.0.jar
////The spark-assembly-1.2.0-hadoop2.4.0.jar is uploaded to HDFS?
15/01/10 07:28:22 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/10 07:28:22 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:28:22 INFO yarn.Client: Submitting application 2 to ResourceManager
////the application is submitted
15/01/10 07:28:22 INFO impl.YarnClientImpl: Submitted application application_1420859110621_0002
15/01/10 07:28:23 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:23 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1420892902791
	 final status: UNDEFINED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0002/
	 user: hadoop

///What is this block below? The client polls the application state roughly once per second (the interval is configurable via spark.yarn.report.interval), which produces a lot of repetitive log output
15/01/10 07:28:24 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:26 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:27 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:28 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:29 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:30 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:31 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:32 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:34 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:35 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:36 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:37 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:38 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:39 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:40 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:41 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:42 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:43 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:44 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:45 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:46 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:47 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:48 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
///Now we are back in the Spark runtime environment?
15/01/10 07:28:48 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://[email protected]:43444/user/YarnAM#-519598456]
15/01/10 07:28:48 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop.master, PROXY_URI_BASES -> http://hadoop.master:8088/proxy/application_1420859110621_0002), /proxy/application_1420859110621_0002
15/01/10 07:28:48 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
//Back to the progress reports; by now the application has moved from ACCEPTED to RUNNING
15/01/10 07:28:49 INFO yarn.Client: Application report for application_1420859110621_0002 (state: RUNNING)
15/01/10 07:28:49 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: hadoop.master
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1420892902791
	 final status: UNDEFINED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0002/
	 user: hadoop
15/01/10 07:28:49 INFO cluster.YarnClientSchedulerBackend: Application application_1420859110621_0002 has started running.
15/01/10 07:28:51 INFO netty.NettyBlockTransferService: Server created on 45652
15/01/10 07:28:51 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/10 07:28:51 INFO storage.BlockManagerMasterActor: Registering block manager localhost:45652 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 45652)
15/01/10 07:28:51 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/10 07:28:51 INFO scheduler.EventLoggingListener: Logging events to hdfs://hadoop.master:9000/user/hadoop/sparkevt/application_1420859110621_0002
15/01/10 07:28:51 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/01/10 07:28:52 INFO storage.MemoryStore: ensureFreeSpace(216263) called with curMem=0, maxMem=280248975
15/01/10 07:28:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.2 KB, free 267.1 MB)
15/01/10 07:28:52 INFO storage.MemoryStore: ensureFreeSpace(31667) called with curMem=216263, maxMem=280248975
15/01/10 07:28:52 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 30.9 KB, free 267.0 MB)
15/01/10 07:28:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45652 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:28:52 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/10 07:28:52 INFO spark.SparkContext: Created broadcast 0 from textFile at SparkWordCount.scala:41
15/01/10 07:28:52 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/10 07:28:53 INFO spark.SparkContext: Starting job: sortByKey at SparkWordCount.scala:44
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Registering RDD 3 (map at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Got job 0 (sortByKey at SparkWordCount.scala:44) with 2 output partitions (allowLocal=false)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Final stage: Stage 1(sortByKey at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:44), which has no missing parents
15/01/10 07:28:53 INFO storage.MemoryStore: ensureFreeSpace(3528) called with curMem=247930, maxMem=280248975
15/01/10 07:28:53 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 267.0 MB)
15/01/10 07:28:53 INFO storage.MemoryStore: ensureFreeSpace(2498) called with curMem=251458, maxMem=280248975
15/01/10 07:28:53 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.4 KB, free 267.0 MB)
15/01/10 07:28:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:45652 (size: 2.4 KB, free: 267.2 MB)
15/01/10 07:28:53 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/10 07:28:53 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/10 07:28:53 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:44)
15/01/10 07:28:53 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
15/01/10 07:28:53 INFO util.RackResolver: Resolved hadoop.master to /default-rack
15/01/10 07:29:06 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:34326/user/Executor#-1519914394] with ID 1
15/01/10 07:29:06 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop.master, NODE_LOCAL, 1356 bytes)
15/01/10 07:29:06 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:59070/user/Executor#763574095] with ID 2
15/01/10 07:29:06 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop.master, NODE_LOCAL, 1356 bytes)
15/01/10 07:29:09 INFO storage.BlockManagerMasterActor: Registering block manager hadoop.master:44394 with 267.3 MB RAM, BlockManagerId(1, hadoop.master, 44394)
15/01/10 07:29:09 INFO storage.BlockManagerMasterActor: Registering block manager hadoop.master:52439 with 267.3 MB RAM, BlockManagerId(2, hadoop.master, 52439)
15/01/10 07:29:11 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop.master:52439 (size: 2.4 KB, free: 267.3 MB)
15/01/10 07:29:11 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop.master:44394 (size: 2.4 KB, free: 267.3 MB)
15/01/10 07:29:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:44394 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:29:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:52439 (size: 30.9 KB, free: 267.2 MB)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 27521 ms on hadoop.master (1/2)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 27381 ms on hadoop.master (2/2)
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Stage 0 (map at SparkWordCount.scala:44) finished in 40.486 s
15/01/10 07:29:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/01/10 07:29:34 INFO scheduler.DAGScheduler: running: Set()
15/01/10 07:29:34 INFO scheduler.DAGScheduler: waiting: Set(Stage 1)
15/01/10 07:29:34 INFO scheduler.DAGScheduler: failed: Set()
15/01/10 07:29:34 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List()
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at sortByKey at SparkWordCount.scala:44), which is now runnable
15/01/10 07:29:34 INFO storage.MemoryStore: ensureFreeSpace(3072) called with curMem=253956, maxMem=280248975
15/01/10 07:29:34 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 267.0 MB)
15/01/10 07:29:34 INFO storage.MemoryStore: ensureFreeSpace(2122) called with curMem=257028, maxMem=280248975
15/01/10 07:29:34 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 267.0 MB)
15/01/10 07:29:34 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:45652 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:34 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/01/10 07:29:34 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:34 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at sortByKey at SparkWordCount.scala:44)
15/01/10 07:29:34 INFO cluster.YarnClientClusterScheduler: Adding task set 1.0 with 2 tasks
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:34 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:34 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop.master:52439 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:35 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop.master:44394 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:35 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:59070
15/01/10 07:29:35 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 158 bytes
15/01/10 07:29:35 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:34326
15/01/10 07:29:37 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 3297 ms on hadoop.master (1/2)
15/01/10 07:29:37 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 3303 ms on hadoop.master (2/2)
15/01/10 07:29:37 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/01/10 07:29:37 INFO scheduler.DAGScheduler: Stage 1 (sortByKey at SparkWordCount.scala:44) finished in 3.307 s
///Job 0 has finished
15/01/10 07:29:37 INFO scheduler.DAGScheduler: Job 0 finished: sortByKey at SparkWordCount.scala:44, took 44.720124 s
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/01/10 07:29:38 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/01/10 07:29:38 INFO spark.SparkContext: Starting job: saveAsTextFile at SparkWordCount.scala:44
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Registering RDD 5 (map at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Got job 1 (saveAsTextFile at SparkWordCount.scala:44) with 2 output partitions (allowLocal=false)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Final stage: Stage 4(saveAsTextFile at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 3)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Missing parents: List(Stage 3)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[5] at map at SparkWordCount.scala:44), which has no missing parents
15/01/10 07:29:38 INFO storage.MemoryStore: ensureFreeSpace(2992) called with curMem=259150, maxMem=280248975
15/01/10 07:29:38 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.9 KB, free 267.0 MB)
15/01/10 07:29:38 INFO storage.MemoryStore: ensureFreeSpace(2168) called with curMem=262142, maxMem=280248975
15/01/10 07:29:38 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.1 KB, free 267.0 MB)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:45652 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/01/10 07:29:38 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 3 (MappedRDD[5] at map at SparkWordCount.scala:44)
15/01/10 07:29:38 INFO cluster.YarnClientClusterScheduler: Adding task set 3.0 with 2 tasks
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, hadoop.master, PROCESS_LOCAL, 1101 bytes)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, hadoop.master, PROCESS_LOCAL, 1101 bytes)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop.master:52439 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop.master:44394 (size: 2.1 KB, free: 267.2 MB)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 441 ms on hadoop.master (1/2)
15/01/10 07:29:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 470 ms on hadoop.master (2/2)
15/01/10 07:29:38 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool 
15/01/10 07:29:38 INFO scheduler.DAGScheduler: Stage 3 (map at SparkWordCount.scala:44) finished in 0.474 s
15/01/10 07:29:38 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/01/10 07:29:38 INFO scheduler.DAGScheduler: running: Set()
15/01/10 07:29:38 INFO scheduler.DAGScheduler: waiting: Set(Stage 4)
15/01/10 07:29:38 INFO scheduler.DAGScheduler: failed: Set()
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Missing parents for Stage 4: List()
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[10] at saveAsTextFile at SparkWordCount.scala:44), which is now runnable
15/01/10 07:29:39 INFO storage.MemoryStore: ensureFreeSpace(113152) called with curMem=264310, maxMem=280248975
15/01/10 07:29:39 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 110.5 KB, free 266.9 MB)
15/01/10 07:29:39 INFO storage.MemoryStore: ensureFreeSpace(68432) called with curMem=377462, maxMem=280248975
15/01/10 07:29:39 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 66.8 KB, free 266.8 MB)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:45652 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:39 INFO storage.BlockManagerMaster: Updated info of block broadcast_4_piece0
15/01/10 07:29:39 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:838
15/01/10 07:29:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[10] at saveAsTextFile at SparkWordCount.scala:44)
15/01/10 07:29:39 INFO cluster.YarnClientClusterScheduler: Adding task set 4.0 with 2 tasks
15/01/10 07:29:39 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:39 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 7, hadoop.master, PROCESS_LOCAL, 1112 bytes)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hadoop.master:52439 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hadoop.master:44394 (size: 66.8 KB, free: 267.2 MB)
15/01/10 07:29:40 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to [email protected]:34326
15/01/10 07:29:40 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 158 bytes
15/01/10 07:29:40 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to [email protected]:59070
15/01/10 07:29:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 7) in 3184 ms on hadoop.master (1/2)
15/01/10 07:29:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 3213 ms on hadoop.master (2/2)
15/01/10 07:29:42 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool 
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Stage 4 (saveAsTextFile at SparkWordCount.scala:44) finished in 3.215 s
////Job 1 has finished; at this point all jobs are done
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Job 1 finished: saveAsTextFile at SparkWordCount.scala:44, took 3.969958 s

////What is the block below doing? (The driver is stopping the Jetty handlers that back the Spark web UI.)
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/10 07:29:42 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}

////The Spark web UI has been stopped; from here on only the history server view is available
15/01/10 07:29:42 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
15/01/10 07:29:42 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/10 07:29:42 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/10 07:29:43 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/01/10 07:29:43 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/10 07:29:44 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/10 07:29:44 INFO storage.MemoryStore: MemoryStore cleared
15/01/10 07:29:44 INFO storage.BlockManager: BlockManager stopped
15/01/10 07:29:44 INFO storage.BlockManagerMaster: BlockManagerMaster stopped

///The teardown above is the resource-release sequence:
//1. the executors stop  2. the MemoryStore is cleared  3. the BlockManager stops  4. the BlockManagerMaster stops
15/01/10 07:29:44 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/10 07:29:44 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

////The SparkContext is shut down and the whole job is over
15/01/10 07:29:44 INFO spark.SparkContext: Successfully stopped SparkContext

 

Visit http://hadoop.master:8088 to check on the job. As the screenshot showed, the Spark WordCount program really does appear in Hadoop's UI, with the application type shown as Spark.

 

Visiting http://hadoop.master:4040, unsurprisingly, fails: Spark shuts this service down automatically once the program completes. In other words, this web service is tied to the application, not to Spark itself.

Visit http://hadoop.master:18080, the Spark history server, to review the finished application.
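The history server at 18080 only has something to show if event logging was enabled for the application; the log above shows events going to hdfs://hadoop.master:9000/user/hadoop/sparkevt. A minimal sketch of the wiring, assuming that same HDFS path (adjust it to your cluster):

```shell
# conf/spark-defaults.conf -- write application events to HDFS
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://hadoop.master:9000/user/hadoop/sparkevt

# conf/spark-env.sh -- point the history server at the same directory
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop.master:9000/user/hadoop/sparkevt"

# then start it once: sbin/start-history-server.sh
```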

 

 
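One more thing the yarn-client log shows: the full spark-assembly jar was uploaded to .sparkStaging for this single submission (the "Uploading resource" line, which also explains the pause between 07:28:03 and 07:28:22). Spark 1.2 can instead reference an assembly already staged on HDFS through the spark.yarn.jar property; a sketch, with a hypothetical HDFS directory:

```shell
# One-time upload of the assembly (the HDFS directory is an example):
hadoop fs -mkdir -p /user/hadoop/spark-jars
hadoop fs -put $SPARK_HOME/lib/spark-assembly-1.2.0-hadoop2.4.0.jar /user/hadoop/spark-jars/

# conf/spark-defaults.conf -- subsequent submissions reuse the staged jar
# instead of re-uploading it:
spark.yarn.jar hdfs://hadoop.master:9000/user/hadoop/spark-jars/spark-assembly-1.2.0-hadoop2.4.0.jar
```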

Deploying with yarn-cluster

1. Submission command:

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-cluster --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md

2. Job log

The log produced in yarn-cluster mode was another small surprise: unlike yarn-client, it does not show a long, involved sequence of events. The log here is very simple, and boils down to the application being accepted (ACCEPTED), running (RUNNING), and finished (FINISHED), and nothing else. In particular, none of Spark's own log output appears, because the driver now runs inside the YARN cluster and its output ends up in the container logs there.

[hadoop@hadoop bin]$ sh submitSparkApplicationYarnCluster.sh  ////submit the Spark application in yarn-cluster mode
Delete the HDFS output directory ///delete the HDFS output directory created by the previous run
15/01/10 07:56:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/SortedWordCountRDDInSparkApplication
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/10 07:56:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
///submit the application
///request resources
15/01/10 07:56:36 INFO client.RMProxy: Connecting to ResourceManager at hadoop.master/192.168.26.136:8032
15/01/10 07:56:38 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/01/10 07:56:38 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/10 07:56:38 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/10 07:56:38 INFO yarn.Client: Setting up container launch context for our AM
15/01/10 07:56:38 INFO yarn.Client: Preparing resources for our AM container

////the spark-assembly-1.2.0-hadoop2.4.0.jar is shipped to HDFS yet again
15/01/10 07:56:39 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0003/spark-assembly-1.2.0-hadoop2.4.0.jar

////the application's own jar
15/01/10 07:56:49 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/SparkWordCount.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0003/SparkWordCount.jar
15/01/10 07:56:49 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/10 07:56:49 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:56:49 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:56:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
///the application is submitted to the ResourceManager
15/01/10 07:56:49 INFO yarn.Client: Submitting application 3 to ResourceManager
15/01/10 07:56:49 INFO impl.YarnClientImpl: Submitted application application_1420859110621_0003
15/01/10 07:56:50 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
15/01/10 07:56:50 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1420894609440
	 final status: UNDEFINED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/
	 user: hadoop

////the application has been accepted
15/01/10 07:56:51 INFO yarn.Client: Application report for application_1420859110621_0003 (state: ACCEPTED)
////the same ACCEPTED report repeats once per second through 07:57:05 (elided)
15/01/10 07:57:06 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
15/01/10 07:57:06 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: hadoop.master
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1420894609440
	 final status: UNDEFINED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/
	 user: hadoop
////the application starts running
15/01/10 07:57:07 INFO yarn.Client: Application report for application_1420859110621_0003 (state: RUNNING)
////the same RUNNING report repeats once per second through 07:57:41 (elided)

///the application has finished
15/01/10 07:57:42 INFO yarn.Client: Application report for application_1420859110621_0003 (state: FINISHED)
15/01/10 07:57:42 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: hadoop.master
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1420894609440
	 final status: SUCCEEDED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0003/
	 user: hadoop
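As the upload lines in the log show, the assembly jar is shipped to HDFS on every single submission. In Spark 1.x this repeated upload can be avoided by staging the jar on HDFS once and pointing `spark.yarn.jar` at it. A sketch (the HDFS path below is an example, not from the original setup):

```shell
# Stage the assembly jar on HDFS once (the target path is an example)
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.2.0-hadoop2.4.0.jar /user/spark/share/lib/

# Then add the following line to conf/spark-defaults.conf so that
# subsequent submissions reuse the staged jar instead of re-uploading it:
# spark.yarn.jar hdfs://hadoop.master:9000/user/spark/share/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
```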

 

3. Checking application status

3.1 In yarn-cluster mode the driver runs inside YARN, so to inspect the driver's status through a web UI you have to click the Tracking UI link for this job in the YARN web UI. Clicking History on the Tracking UI should open the application's history, but it was not reachable, because:

Hadoop's history server was not started; it should be started with sbin/mr-jobhistory-daemon.sh under the Hadoop directory.
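Concretely, the MapReduce JobHistory server can be started with the standard Hadoop 2.x daemon script (its web UI listens on port 19888 by default):

```shell
# Start the MapReduce JobHistory server (web UI defaults to port 19888)
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

# Stop it later with:
# $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
```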
3.2 In yarn-cluster mode the driver runs inside YARN, so the program's output is not displayed on the client. It is therefore best to save results to HDFS; the client terminal only shows the status of the YARN job.
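A minimal sketch of how a word-count job like SparkWordCount might persist its result to HDFS instead of printing it (this is an assumed reconstruction, not the original class; the output path is an example and must not already exist):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]) {
    // In yarn-cluster mode the master is taken from the submit command,
    // so no setMaster call is needed here
    val sc = new SparkContext(new SparkConf().setAppName("SparkWordCount"))
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    // Save to HDFS so the result survives the driver running inside YARN;
    // saveAsTextFile fails if the target directory already exists
    counts.saveAsTextFile("hdfs://hadoop.master:9000/user/hadoop/wordcount-output")
    sc.stop()
  }
}
```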

3.3 Visit http://hadoop.master:8088 to check the job's execution result.

 

 

Reference: http://blog.csdn.net/book_mmicky/article/details/25714287

Reprinted from bit1129.iteye.com/blog/2174677