Spark WordCount Execution Flow
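The console transcript below traces a full WordCount run on Spark 1.6.3. No --master flag is passed, so the job runs in local mode (confirmed later by "Starting executor ID driver on host localhost"). The source of WordCount.scala is not shown in the post, but the log pins down its shape: textFile at line 9, map and reduceByKey at line 10, the foreach action at line 11; the RDD numbering (MapPartitionsRDD[3] at map, ShuffledRDD[4] at reduceByKey) also implies a flatMap sits between textFile and map. A minimal reconstruction follows — variable names, the space delimiter, and the line layout are assumptions; the comments map each call to the line numbers the log reports:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("file:///home/hadoop/word.txt") // log line 9: HadoopRDD[0] + MapPartitionsRDD[1]
    val counts = lines
      .flatMap(_.split(" "))                                // MapPartitionsRDD[2]
      .map((_, 1))                                          // log line 10: MapPartitionsRDD[3]
      .reduceByKey(_ + _)                                   // ShuffledRDD[4], the shuffle boundary
    counts.foreach(println)                                 // log line 11: the action that triggers job 0
    // No explicit sc.stop() here: the log ends with "Invoking stop() from shutdown hook".
  }
}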

[hadoop@isstech001 ~]$ spark-submit --class WordCount /home/hadoop/WordCount.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/12/04 21:35:11 INFO SparkContext: Running Spark version 1.6.3
18/12/04 21:35:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/12/04 21:35:12 INFO SecurityManager: Changing view acls to: hadoop
18/12/04 21:35:12 INFO SecurityManager: Changing modify acls to: hadoop
18/12/04 21:35:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
18/12/04 21:35:12 INFO Utils: Successfully started service 'sparkDriver' on port 36068.
18/12/04 21:35:13 INFO Slf4jLogger: Slf4jLogger started
18/12/04 21:35:13 INFO Remoting: Starting remoting
18/12/04 21:35:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:33396]
18/12/04 21:35:13 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 33396.
18/12/04 21:35:13 INFO SparkEnv: Registering MapOutputTracker
18/12/04 21:35:13 INFO SparkEnv: Registering BlockManagerMaster
18/12/04 21:35:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-cc216cca-d8bf-4163-946d-1c11eb183fcb
18/12/04 21:35:13 INFO MemoryStore: MemoryStore started with capacity 517.4 MB
18/12/04 21:35:14 INFO SparkEnv: Registering OutputCommitCoordinator
18/12/04 21:35:42 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/12/04 21:36:10 INFO Utils: Successfully started service 'SparkUI' on port 4041.
18/12/04 21:36:10 INFO SparkUI: Started SparkUI at http://192.168.66.81:4041
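The WARN a few lines up shows port 4040 (the default SparkUI port) was already in use, typically because another Spark application was running on the host, so the UI fell back to 4041. If a predictable port is wanted, it can be requested through the spark.ui.port setting; a sketch, with the port number hypothetical:

val conf = new SparkConf()
  .setAppName("WordCount")
  .set("spark.ui.port", "4050") // request a fixed UI port instead of the 4040 default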
18/12/04 21:36:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7f6bd0a0-2d36-45a4-b9e4-33f4fbbbc378/httpd-dc31af63-8bdb-4e9f-979d-cf6205cbeacd
18/12/04 21:36:10 INFO HttpServer: Starting HTTP Server
18/12/04 21:36:10 INFO Utils: Successfully started service 'HTTP file server' on port 58873.
18/12/04 21:36:10 INFO SparkContext: Added JAR file:/home/hadoop/WordCount.jar at http://192.168.66.81:58873/jars/WordCount.jar with timestamp 1543930570661
18/12/04 21:36:10 INFO Executor: Starting executor ID driver on host localhost
18/12/04 21:36:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 48979.
18/12/04 21:36:10 INFO NettyBlockTransferService: Server created on 48979
18/12/04 21:36:10 INFO BlockManagerMaster: Trying to register BlockManager
18/12/04 21:36:10 INFO BlockManagerMasterEndpoint: Registering block manager localhost:48979 with 517.4 MB RAM, BlockManagerId(driver, localhost, 48979)
18/12/04 21:36:10 INFO BlockManagerMaster: Registered BlockManager
18/12/04 21:36:11 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 134.7 KB, free 517.3 MB)
18/12/04 21:36:11 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.6 KB, free 517.3 MB)
18/12/04 21:36:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:48979 (size: 14.6 KB, free: 517.4 MB)
18/12/04 21:36:11 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:9
18/12/04 21:36:11 INFO FileInputFormat: Total input paths to process : 1
18/12/04 21:36:11 INFO SparkContext: Starting job: foreach at WordCount.scala:11
18/12/04 21:36:11 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:10)
18/12/04 21:36:11 INFO DAGScheduler: Got job 0 (foreach at WordCount.scala:11) with 1 output partitions
18/12/04 21:36:11 INFO DAGScheduler: Final stage: ResultStage 1 (foreach at WordCount.scala:11)
18/12/04 21:36:11 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/12/04 21:36:11 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/12/04 21:36:12 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:10), which has no missing parents
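This is the key scheduling step: the DAGScheduler cuts the lineage at the reduceByKey shuffle, producing ShuffleMapStage 0 (textFile → flatMap → map, which writes shuffle output) and ResultStage 1 (shuffle read → foreach). The same stage structure can be inspected from the driver, assuming the counts RDD from the sketch above:

println(counts.toDebugString) // prints the RDD lineage; the shuffle dependency marks the stage boundary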
18/12/04 21:36:12 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 517.3 MB)
18/12/04 21:36:12 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 517.3 MB)
18/12/04 21:36:12 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:48979 (size: 2.3 KB, free: 517.4 MB)
18/12/04 21:36:12 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/12/04 21:36:12 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:10)
18/12/04 21:36:12 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
18/12/04 21:36:12 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2172 bytes)
18/12/04 21:36:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/12/04 21:36:12 INFO Executor: Fetching http://192.168.66.81:58873/jars/WordCount.jar with timestamp 1543930570661
18/12/04 21:36:12 INFO Utils: Fetching http://192.168.66.81:58873/jars/WordCount.jar to /tmp/spark-7f6bd0a0-2d36-45a4-b9e4-33f4fbbbc378/userFiles-304edb0f-728a-4f10-a0f8-fc78d84b19d3/fetchFileTemp2708984029694120651.tmp
18/12/04 21:36:12 INFO Executor: Adding file:/tmp/spark-7f6bd0a0-2d36-45a4-b9e4-33f4fbbbc378/userFiles-304edb0f-728a-4f10-a0f8-fc78d84b19d3/WordCount.jar to class loader
18/12/04 21:36:12 INFO HadoopRDD: Input split: file:/home/hadoop/word.txt:0+986
18/12/04 21:36:12 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/12/04 21:36:12 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/12/04 21:36:12 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/12/04 21:36:12 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/12/04 21:36:12 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
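The mapred.* deprecation notices are harmless: textFile reads through the old org.apache.hadoop.mapred API (hence the HadoopRDD above), and Hadoop flags the legacy property names. For reference, the same file could be read through the new mapreduce API instead — a sketch:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Equivalent of sc.textFile(...) via the new Hadoop API (no mapred.* notices):
val lines = sc
  .newAPIHadoopFile[LongWritable, Text, TextInputFormat]("file:///home/hadoop/word.txt")
  .map(_._2.toString) // keep only the line text, dropping the byte offset key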
18/12/04 21:36:12 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2253 bytes result sent to driver
18/12/04 21:36:12 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:10) finished in 0.383 s
18/12/04 21:36:12 INFO DAGScheduler: looking for newly runnable stages
18/12/04 21:36:12 INFO DAGScheduler: running: Set()
18/12/04 21:36:12 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 352 ms on localhost (1/1)
18/12/04 21:36:12 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/12/04 21:36:12 INFO DAGScheduler: waiting: Set(ResultStage 1)
18/12/04 21:36:12 INFO DAGScheduler: failed: Set()
18/12/04 21:36:12 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:10), which has no missing parents
18/12/04 21:36:12 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 517.3 MB)
18/12/04 21:36:12 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1571.0 B, free 517.3 MB)
18/12/04 21:36:12 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:48979 (size: 1571.0 B, free: 517.4 MB)
18/12/04 21:36:12 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/12/04 21:36:12 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:10)
18/12/04 21:36:12 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/12/04 21:36:12 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1949 bytes)
18/12/04 21:36:12 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
18/12/04 21:36:12 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/12/04 21:36:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
(spark,10)
(you,2)
(gwp,1)
(hive,10)
(是,1)
(hadoop,10)
(R,10)
(love,2)
(i,1)
(sparkR,102)
(word,1)
(你,1)
(isstech,1)
(yarn,1)
(我,1)
(sql,1)
(爸爸,1)
(me,1)
(ty,1)
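The (word, count) pairs above are printed by foreach(println). Note that foreach runs on the executors: in local mode the executor shares the driver's JVM, so the output lands in the submit console, but on a real cluster it would end up in the executors' stdout logs instead. To print results on the driver, collect them first (fine here, since the result set is tiny):

counts.collect().foreach(println) // ship all pairs to the driver, then print
counts.take(10).foreach(println)  // or bring back only a sample for large results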
18/12/04 21:36:12 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1165 bytes result sent to driver
18/12/04 21:36:12 INFO DAGScheduler: ResultStage 1 (foreach at WordCount.scala:11) finished in 0.085 s
18/12/04 21:36:12 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 88 ms on localhost (1/1)
18/12/04 21:36:12 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/12/04 21:36:12 INFO DAGScheduler: Job 0 finished: foreach at WordCount.scala:11, took 0.730115 s
18/12/04 21:36:12 INFO SparkContext: Invoking stop() from shutdown hook
18/12/04 21:36:12 INFO SparkUI: Stopped Spark web UI at http://192.168.66.81:4041
18/12/04 21:36:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/04 21:36:12 INFO MemoryStore: MemoryStore cleared
18/12/04 21:36:12 INFO BlockManager: BlockManager stopped
18/12/04 21:36:12 INFO BlockManagerMaster: BlockManagerMaster stopped
18/12/04 21:36:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/04 21:36:12 INFO SparkContext: Successfully stopped SparkContext
18/12/04 21:36:12 INFO ShutdownHookManager: Shutdown hook called
18/12/04 21:36:12 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f6bd0a0-2d36-45a4-b9e4-33f4fbbbc378/httpd-dc31af63-8bdb-4e9f-979d-cf6205cbeacd
18/12/04 21:36:12 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f6bd0a0-2d36-45a4-b9e4-33f4fbbbc378
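Because main() returns without stopping the context, Spark's shutdown hook performs the cleanup ("Invoking stop() from shutdown hook" above) and deletes the temporary directories. That works, but ending the program explicitly is the idiomatic form:

sc.stop() // release the UI, block manager, and temp dirs deterministically instead of relying on the hook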

Reposted from www.cnblogs.com/RHadoop-Hive/p/10065803.html