Spark in Practice | Matrix Multiplication && Average Student Scores by Province

Problem Background

For the problem statements, see the earlier posts:
Matrix Multiplication
Average Student Scores by Province

Prerequisite post:
Spark | A First Example: WordCount

The results are recorded in the log output.

Matrix Multiplication

The code is as follows:

import org.apache.spark.{SparkConf, SparkContext}

object MatricProduct {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Each input line has the form "<tag> <row> <col> <value>",
    // where the tag is "M" for the first matrix and "N" for the second.
    val mats = sc.textFile(args(0))

    val firstMat = mats.filter(line => line.contains("M"))
    val secondMat = mats.filter(line => line.contains("N"))

    // Key M's entries by column index k: (k, (tag, i, M(i,k)))
    val firstItems = firstMat.map(line => {
      val lineSplit = line.split(" ")
      (lineSplit(2), (lineSplit(0), lineSplit(1), lineSplit(3)))
    })
    // Key N's entries by row index k: (k, (tag, j, N(k,j)))
    val secondItems = secondMat.map(line => {
      val lineSplit = line.split(" ")
      (lineSplit(1), (lineSplit(0), lineSplit(2), lineSplit(3)))
    })

    // Join on the shared index k, emitting ("i j", M(i,k) * N(k,j))
    val newItems = firstItems.join(secondItems).values.map(v => {
      (v._1._2 + " " + v._2._2, v._1._3.toDouble * v._2._3.toDouble)
    })

    // Sum the partial products for each output cell (i, j)
    val res = newItems.reduceByKey((x, y) => x + y)
    res.collect().foreach(x => println(x._1 + " " + x._2))

    sc.stop()
  }
}
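The join-and-reduce scheme can be sketched without Spark, on plain Scala collections. The "tag row col value" line format mirrors the RDD code above; the four-line sample is a hypothetical input chosen so that the output matches the `1 1 11.0` cell seen in the log, not the actual file used in the run.

```scala
// Spark-free sketch of the same join-and-reduce multiplication.
// The line format and sample data are assumptions mirroring the RDD code.
object JoinMultiplySketch {
  def multiply(lines: Seq[String]): Map[(String, String), Double] = {
    // M entries keyed by column index k: (k, (row i, M(i,k)))
    val m = lines.filter(_.startsWith("M")).map { l =>
      val s = l.split(" "); (s(2), (s(1), s(3).toDouble))
    }
    // N entries keyed by row index k: (k, (col j, N(k,j)))
    val n = lines.filter(_.startsWith("N")).map { l =>
      val s = l.split(" "); (s(1), (s(2), s(3).toDouble))
    }
    // "Join" on the shared index k, then sum partial products per cell (i, j)
    val partials = for {
      (k1, (i, mv)) <- m
      (k2, (j, nv)) <- n
      if k1 == k2
    } yield ((i, j), mv * nv)
    partials.groupBy(_._1).map { case (cell, vs) => (cell, vs.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // M = [1 2] (1x2), N = [3; 4] (2x1)  =>  M*N = [1*3 + 2*4] = [11]
    val sample = Seq("M 1 1 1", "M 1 2 2", "N 1 1 3", "N 2 1 4")
    multiply(sample).foreach { case ((i, j), v) => println(s"$i $j $v") }
  }
}
```

Each matched pair contributes one partial product M(i,k)·N(k,j); grouping by the output cell and summing is exactly what `reduceByKey(_ + _)` does in the distributed version.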

Deploying the application with spark-submit

spark-submit --name MatricProduct --class edu.test.MatricProduct --executor-memory 512M --total-executor-cores 1 /home/jackherrick/Documents/MatricProduct.jar hdfs:/matric_spark/matric.txt

The log output is as follows:

18/04/22 09:31:30 INFO spark.SparkContext: Running Spark version 2.2.0
18/04/22 09:31:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/22 09:31:31 INFO spark.SparkContext: Submitted application: MatricProduct
18/04/22 09:31:31 INFO spark.SecurityManager: Changing view acls to: jackherrick
18/04/22 09:31:31 INFO spark.SecurityManager: Changing modify acls to: jackherrick
18/04/22 09:31:31 INFO spark.SecurityManager: Changing view acls groups to: 
18/04/22 09:31:31 INFO spark.SecurityManager: Changing modify acls groups to: 
18/04/22 09:31:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jackherrick); groups with view permissions: Set(); users  with modify permissions: Set(jackherrick); groups with modify permissions: Set()
18/04/22 09:31:31 INFO util.Utils: Successfully started service 'sparkDriver' on port 37737.
18/04/22 09:31:31 INFO spark.SparkEnv: Registering MapOutputTracker
18/04/22 09:31:31 INFO spark.SparkEnv: Registering BlockManagerMaster
18/04/22 09:31:31 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/04/22 09:31:31 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/04/22 09:31:31 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-a675e15d-7662-41bb-8923-84bcf3471595
18/04/22 09:31:31 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
18/04/22 09:31:32 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/04/22 09:31:32 INFO util.log: Logging initialized @3997ms
18/04/22 09:31:32 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/04/22 09:31:32 INFO server.Server: Started @4145ms
18/04/22 09:31:32 INFO server.AbstractConnector: Started ServerConnector@1bc01d90{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/04/22 09:31:32 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@287f94b1{/jobs,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2ad3a1bb{/jobs/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@324c64cd{/jobs/job,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5bd73d1a{/jobs/job/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2555fff0{/stages,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@120f38e6{/stages/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@702ed190{/stages/stage,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e29e14{/stages/stage/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a4bef8{/stages/pool,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2449cff7{/stages/pool/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62da83ed{/storage,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37d80fe7{/storage/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e3cee7b{/storage/rdd,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b9267b{/storage/rdd/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29ad44e3{/environment,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5af9926a{/environment/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@fac80{/executors,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@649f2009{/executors/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@69adf72c{/executors/threadDump,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a15b789{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51650883{/static,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c25b8a7{/,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@750fe12e{/api,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e11bc55{/jobs/job/kill,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e0accd{/stages/stage/kill,null,AVAILABLE,@Spark}
18/04/22 09:31:32 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.201.128:4040
18/04/22 09:31:32 INFO spark.SparkContext: Added JAR file:/home/jackherrick/Documents/MatricProduct.jar at spark://192.168.201.128:37737/jars/MatricProduct.jar with timestamp 1524414692842
18/04/22 09:31:33 INFO executor.Executor: Starting executor ID driver on host localhost
18/04/22 09:31:33 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33397.
18/04/22 09:31:33 INFO netty.NettyBlockTransferService: Server created on 192.168.201.128:33397
18/04/22 09:31:33 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/04/22 09:31:33 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.201.128, 33397, None)
18/04/22 09:31:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.201.128:33397 with 366.3 MB RAM, BlockManagerId(driver, 192.168.201.128, 33397, None)
18/04/22 09:31:33 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.201.128, 33397, None)
18/04/22 09:31:33 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.201.128, 33397, None)
18/04/22 09:31:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2234078{/metrics/json,null,AVAILABLE,@Spark}
18/04/22 09:31:35 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 240.0 KB, free 366.1 MB)
18/04/22 09:31:36 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.3 KB, free 366.0 MB)
18/04/22 09:31:36 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.201.128:33397 (size: 23.3 KB, free: 366.3 MB)
18/04/22 09:31:36 INFO spark.SparkContext: Created broadcast 0 from textFile at MatricProduct.scala:17
18/04/22 09:31:37 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/22 09:31:38 INFO spark.SparkContext: Starting job: collect at MatricProduct.scala:36
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Registering RDD 5 (map at MatricProduct.scala:26)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Registering RDD 4 (map at MatricProduct.scala:22)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Registering RDD 10 (map at MatricProduct.scala:31)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Got job 0 (collect at MatricProduct.scala:36) with 2 output partitions
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at MatricProduct.scala:36)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 2)
18/04/22 09:31:38 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at map at MatricProduct.scala:26), which has no missing parents
18/04/22 09:31:38 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.2 KB, free 366.0 MB)
18/04/22 09:31:38 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.0 MB)
18/04/22 09:31:38 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.201.128:33397 (size: 2.4 KB, free: 366.3 MB)
18/04/22 09:31:38 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/04/22 09:31:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at map at MatricProduct.scala:26) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:31:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
18/04/22 09:31:39 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[4] at map at MatricProduct.scala:22), which has no missing parents
18/04/22 09:31:39 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 366.0 MB)
18/04/22 09:31:39 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.0 MB)
18/04/22 09:31:39 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.201.128:33397 (size: 2.4 KB, free: 366.3 MB)
18/04/22 09:31:39 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/04/22 09:31:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[4] at map at MatricProduct.scala:22) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:31:39 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
18/04/22 09:31:39 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 4850 bytes)
18/04/22 09:31:39 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 4850 bytes)
18/04/22 09:31:39 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
18/04/22 09:31:39 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
18/04/22 09:31:39 INFO executor.Executor: Fetching spark://192.168.201.128:37737/jars/MatricProduct.jar with timestamp 1524414692842
18/04/22 09:31:39 INFO client.TransportClientFactory: Successfully created connection to /192.168.201.128:37737 after 80 ms (0 ms spent in bootstraps)
18/04/22 09:31:39 INFO util.Utils: Fetching spark://192.168.201.128:37737/jars/MatricProduct.jar to /tmp/spark-e0eee34b-3247-48aa-93f2-0c723c27bc57/userFiles-b5d58476-496d-46a8-afc2-427f3d4bf09d/fetchFileTemp5283151710875209144.tmp
18/04/22 09:31:39 INFO executor.Executor: Adding file:/tmp/spark-e0eee34b-3247-48aa-93f2-0c723c27bc57/userFiles-b5d58476-496d-46a8-afc2-427f3d4bf09d/MatricProduct.jar to class loader
18/04/22 09:31:39 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/matric_spark/matrix.txt:16+16
18/04/22 09:31:39 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/matric_spark/matrix.txt:0+16
18/04/22 09:31:40 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1028 bytes result sent to driver
18/04/22 09:31:40 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1028 bytes result sent to driver
18/04/22 09:31:40 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 4850 bytes)
18/04/22 09:31:40 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
18/04/22 09:31:40 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 4850 bytes)
18/04/22 09:31:40 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
18/04/22 09:31:40 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1558 ms on localhost (executor driver) (1/2)
18/04/22 09:31:40 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/matric_spark/matrix.txt:0+16
18/04/22 09:31:40 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1579 ms on localhost (executor driver) (2/2)
18/04/22 09:31:40 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/matric_spark/matrix.txt:16+16
18/04/22 09:31:40 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/04/22 09:31:40 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at MatricProduct.scala:26) finished in 1.816 s
18/04/22 09:31:40 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:31:40 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 985 bytes result sent to driver
18/04/22 09:31:40 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 856 bytes result sent to driver
18/04/22 09:31:40 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 412 ms on localhost (executor driver) (1/2)
18/04/22 09:31:41 INFO scheduler.DAGScheduler: running: Set(ShuffleMapStage 1)
18/04/22 09:31:41 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 2, ResultStage 3)
18/04/22 09:31:41 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 414 ms on localhost (executor driver) (2/2)
18/04/22 09:31:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/04/22 09:31:41 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (map at MatricProduct.scala:22) finished in 1.811 s
18/04/22 09:31:41 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:31:41 INFO scheduler.DAGScheduler: running: Set()
18/04/22 09:31:41 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 2, ResultStage 3)
18/04/22 09:31:41 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:31:41 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[10] at map at MatricProduct.scala:31), which has no missing parents
18/04/22 09:31:41 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.1 KB, free 366.0 MB)
18/04/22 09:31:41 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.3 KB, free 366.0 MB)
18/04/22 09:31:41 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.201.128:33397 (size: 2.3 KB, free: 366.3 MB)
18/04/22 09:31:41 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
18/04/22 09:31:41 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[10] at map at MatricProduct.scala:31) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:31:41 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, executor driver, partition 0, PROCESS_LOCAL, 4673 bytes)
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 4673 bytes)
18/04/22 09:31:41 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 4)
18/04/22 09:31:41 INFO executor.Executor: Running task 1.0 in stage 2.0 (TID 5)
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 17 ms
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 17 ms
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 11 ms
18/04/22 09:31:41 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 4). 1286 bytes result sent to driver
18/04/22 09:31:41 INFO executor.Executor: Finished task 1.0 in stage 2.0 (TID 5). 1329 bytes result sent to driver
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 339 ms on localhost (executor driver) (1/2)
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 337 ms on localhost (executor driver) (2/2)
18/04/22 09:31:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
18/04/22 09:31:41 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (map at MatricProduct.scala:31) finished in 0.319 s
18/04/22 09:31:41 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:31:41 INFO scheduler.DAGScheduler: running: Set()
18/04/22 09:31:41 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 3)
18/04/22 09:31:41 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:31:41 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (ShuffledRDD[11] at reduceByKey at MatricProduct.scala:35), which has no missing parents
18/04/22 09:31:41 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.2 KB, free 366.0 MB)
18/04/22 09:31:41 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1970.0 B, free 366.0 MB)
18/04/22 09:31:41 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.201.128:33397 (size: 1970.0 B, free: 366.3 MB)
18/04/22 09:31:41 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
18/04/22 09:31:41 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (ShuffledRDD[11] at reduceByKey at MatricProduct.scala:35) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:31:41 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 6, localhost, executor driver, partition 1, PROCESS_LOCAL, 4621 bytes)
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 7, localhost, executor driver, partition 0, ANY, 4621 bytes)
18/04/22 09:31:41 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 6)
18/04/22 09:31:41 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 7)
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks
18/04/22 09:31:41 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
18/04/22 09:31:41 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 6). 1091 bytes result sent to driver
18/04/22 09:31:41 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 7). 1285 bytes result sent to driver
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 6) in 163 ms on localhost (executor driver) (1/2)
18/04/22 09:31:41 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 7) in 150 ms on localhost (executor driver) (2/2)
18/04/22 09:31:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
18/04/22 09:31:41 INFO scheduler.DAGScheduler: ResultStage 3 (collect at MatricProduct.scala:36) finished in 0.175 s
18/04/22 09:31:41 INFO scheduler.DAGScheduler: Job 0 finished: collect at MatricProduct.scala:36, took 3.524449 s
1 1 11.0
18/04/22 09:31:41 INFO server.AbstractConnector: Stopped Spark@1bc01d90{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/04/22 09:31:41 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.201.128:4040
18/04/22 09:31:41 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/22 09:31:41 INFO memory.MemoryStore: MemoryStore cleared
18/04/22 09:31:41 INFO storage.BlockManager: BlockManager stopped
18/04/22 09:31:41 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/04/22 09:31:41 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/22 09:31:41 INFO spark.SparkContext: Successfully stopped SparkContext
18/04/22 09:31:41 INFO util.ShutdownHookManager: Shutdown hook called
18/04/22 09:31:41 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e0eee34b-3247-48aa-93f2-0c723c27bc57

Average Student Scores by Province

The code is as follows:

import org.apache.spark.{SparkConf, SparkContext}

object score {
  def main(args: Array[String]): Unit = {
    // Two input files are required, so check for at least two arguments
    if (args.length < 2) {
      System.err.println("Usage: <cityFile> <scoreFile>")
      System.exit(1)
    }

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val city = sc.textFile(args(0))   // lines of "studentId,province"
    val score = sc.textFile(args(1))  // lines of "studentId,score"

    // (studentId, province) pairs, skipping blank lines
    val cityItems = city.filter(_.trim().length > 0).map(line => {
      val lineSplit = line.split(",")
      (lineSplit(0), lineSplit(1))
    })
    // (studentId, score) pairs, skipping blank lines
    val scoreItems = score.filter(_.trim().length > 0).map(line => {
      val lineSplit = line.split(",")
      (lineSplit(0), lineSplit(1))
    })

    // Join on studentId, keeping (province, score)
    val newItems = cityItems.join(scoreItems).values.map(v => {
      (v._1, v._2.toInt)
    })

    // Average the scores per province, rounded to two decimal places
    val res = newItems.groupByKey().map(x => {
      var num = 0.0
      var sum = 0
      for (i <- x._2) {
        sum = sum + i
        num = num + 1
      }
      val avg = sum / num
      (x._1, f"$avg%1.2f".toDouble)
    })
    res.collect().foreach(x => println(x._1 + "\t" + x._2))

    sc.stop()
  }
}
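The per-province average can be checked with a Spark-free sketch on plain collections. The comma-separated file layouts ("studentId,province" and "studentId,score") and the sample records below are assumptions that mirror the split-and-join logic in the code; they are not the actual data from the run.

```scala
// Spark-free sketch of the per-province average, under the assumed layouts
// "studentId,province" (city file) and "studentId,score" (score file).
object AvgScoreSketch {
  def averages(city: Seq[String], score: Seq[String]): Map[String, Double] = {
    // studentId -> province, skipping blank lines
    val province = city.filter(_.trim.nonEmpty).map { l =>
      val s = l.split(","); (s(0), s(1))
    }.toMap
    // Join on studentId, keeping (province, score)
    val joined = score.filter(_.trim.nonEmpty).flatMap { l =>
      val s = l.split(",")
      province.get(s(0)).map(p => (p, s(1).toInt))
    }
    // Average per province, rounded to two decimals like f"$avg%1.2f"
    joined.groupBy(_._1).map { case (p, pairs) =>
      val scores = pairs.map(_._2)
      val avg = scores.sum.toDouble / scores.size
      (p, f"$avg%1.2f".toDouble)
    }
  }
}
```

For example, `averages(Seq("1001,Beijing", "1002,Beijing", "1003,Shanghai"), Seq("1001,80", "1002,91", "1003,70"))` averages Beijing's 80 and 91 to 85.5 and leaves Shanghai at 70.0. Note that `groupByKey` materializes every score for a province on one executor; for large data, a (sum, count) pair combined with `reduceByKey` or `aggregateByKey` computes the same averages with far less shuffle memory.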

Deploying the application with spark-submit

spark-submit --name score --class edu.test.score --executor-memory 512M --total-executor-cores 1 /home/jackherrick/Documents/score.jar hdfs:/score/placeTable hdfs:/score/scoreTable 

The log output is as follows:

18/04/22 09:05:18 INFO spark.SparkContext: Running Spark version 2.2.0
18/04/22 09:05:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/22 09:05:18 INFO spark.SparkContext: Submitted application: score
18/04/22 09:05:18 INFO spark.SecurityManager: Changing view acls to: jackherrick
18/04/22 09:05:18 INFO spark.SecurityManager: Changing modify acls to: jackherrick
18/04/22 09:05:18 INFO spark.SecurityManager: Changing view acls groups to: 
18/04/22 09:05:18 INFO spark.SecurityManager: Changing modify acls groups to: 
18/04/22 09:05:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jackherrick); groups with view permissions: Set(); users  with modify permissions: Set(jackherrick); groups with modify permissions: Set()
18/04/22 09:05:19 INFO util.Utils: Successfully started service 'sparkDriver' on port 38199.
18/04/22 09:05:19 INFO spark.SparkEnv: Registering MapOutputTracker
18/04/22 09:05:19 INFO spark.SparkEnv: Registering BlockManagerMaster
18/04/22 09:05:19 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/04/22 09:05:19 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/04/22 09:05:19 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-042cf3e1-3a16-4175-85b3-b9cdfe4d41e1
18/04/22 09:05:19 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
18/04/22 09:05:19 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/04/22 09:05:19 INFO util.log: Logging initialized @3343ms
18/04/22 09:05:19 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/04/22 09:05:20 INFO server.Server: Started @3514ms
18/04/22 09:05:20 INFO server.AbstractConnector: Started ServerConnector@aa6c33d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/04/22 09:05:20 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@287f94b1{/jobs,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2ad3a1bb{/jobs/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@324c64cd{/jobs/job,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5bd73d1a{/jobs/job/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2555fff0{/stages,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@120f38e6{/stages/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@702ed190{/stages/stage,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e29e14{/stages/stage/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a4bef8{/stages/pool,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2449cff7{/stages/pool/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62da83ed{/storage,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37d80fe7{/storage/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e3cee7b{/storage/rdd,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b9267b{/storage/rdd/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29ad44e3{/environment,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5af9926a{/environment/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@fac80{/executors,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@649f2009{/executors/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@69adf72c{/executors/threadDump,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a15b789{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51650883{/static,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c25b8a7{/,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@750fe12e{/api,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e11bc55{/jobs/job/kill,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e0accd{/stages/stage/kill,null,AVAILABLE,@Spark}
18/04/22 09:05:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.201.128:4040
18/04/22 09:05:20 INFO spark.SparkContext: Added JAR file:/home/jackherrick/Documents/score.jar at spark://192.168.201.128:38199/jars/score.jar with timestamp 1524413120432
18/04/22 09:05:20 INFO executor.Executor: Starting executor ID driver on host localhost
18/04/22 09:05:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45565.
18/04/22 09:05:20 INFO netty.NettyBlockTransferService: Server created on 192.168.201.128:45565
18/04/22 09:05:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/04/22 09:05:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.201.128, 45565, None)
18/04/22 09:05:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.201.128:45565 with 366.3 MB RAM, BlockManagerId(driver, 192.168.201.128, 45565, None)
18/04/22 09:05:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.201.128, 45565, None)
18/04/22 09:05:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.201.128, 45565, None)
18/04/22 09:05:21 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66908383{/metrics/json,null,AVAILABLE,@Spark}
18/04/22 09:05:22 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 240.0 KB, free 366.1 MB)
18/04/22 09:05:22 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.3 KB, free 366.0 MB)
18/04/22 09:05:22 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.201.128:45565 (size: 23.3 KB, free: 366.3 MB)
18/04/22 09:05:22 INFO spark.SparkContext: Created broadcast 0 from textFile at score.scala:16
18/04/22 09:05:23 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 240.0 KB, free 365.8 MB)
18/04/22 09:05:23 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.3 KB, free 365.8 MB)
18/04/22 09:05:23 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.201.128:45565 (size: 23.3 KB, free: 366.3 MB)
18/04/22 09:05:23 INFO spark.SparkContext: Created broadcast 1 from textFile at score.scala:17
18/04/22 09:05:24 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/22 09:05:24 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/22 09:05:24 INFO spark.SparkContext: Starting job: collect at score.scala:43
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Registering RDD 5 (map at score.scala:19)
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Registering RDD 7 (map at score.scala:23)
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Registering RDD 12 (map at score.scala:28)
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Got job 0 (collect at score.scala:43) with 2 output partitions
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at score.scala:43)
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
18/04/22 09:05:24 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 2)
18/04/22 09:05:25 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at map at score.scala:19), which has no missing parents
18/04/22 09:05:25 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 365.8 MB)
18/04/22 09:05:25 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 365.8 MB)
18/04/22 09:05:25 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.201.128:45565 (size: 2.4 KB, free: 366.3 MB)
18/04/22 09:05:25 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/04/22 09:05:25 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at map at score.scala:19) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:05:25 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
18/04/22 09:05:25 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[7] at map at score.scala:23), which has no missing parents
18/04/22 09:05:25 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.2 KB, free 365.8 MB)
18/04/22 09:05:25 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.4 KB, free 365.8 MB)
18/04/22 09:05:25 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.201.128:45565 (size: 2.4 KB, free: 366.2 MB)
18/04/22 09:05:25 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
18/04/22 09:05:25 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[7] at map at score.scala:23) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:05:25 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
18/04/22 09:05:25 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 4843 bytes)
18/04/22 09:05:25 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 4843 bytes)
18/04/22 09:05:25 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
18/04/22 09:05:25 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
18/04/22 09:05:25 INFO executor.Executor: Fetching spark://192.168.201.128:38199/jars/score.jar with timestamp 1524413120432
18/04/22 09:05:25 INFO client.TransportClientFactory: Successfully created connection to /192.168.201.128:38199 after 40 ms (0 ms spent in bootstraps)
18/04/22 09:05:25 INFO util.Utils: Fetching spark://192.168.201.128:38199/jars/score.jar to /tmp/spark-74b54e80-1e92-4b4b-94a0-f42848697bfc/userFiles-4b57cedf-2112-4d63-85a5-c9561262a9eb/fetchFileTemp8443601688809021952.tmp
18/04/22 09:05:25 INFO executor.Executor: Adding file:/tmp/spark-74b54e80-1e92-4b4b-94a0-f42848697bfc/userFiles-4b57cedf-2112-4d63-85a5-c9561262a9eb/score.jar to class loader
18/04/22 09:05:25 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/score/placeTable:0+52
18/04/22 09:05:25 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/score/placeTable:52+52
18/04/22 09:05:26 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1028 bytes result sent to driver
18/04/22 09:05:26 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 985 bytes result sent to driver
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 4843 bytes)
18/04/22 09:05:26 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 4843 bytes)
18/04/22 09:05:26 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 814 ms on localhost (executor driver) (1/2)
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 909 ms on localhost (executor driver) (2/2)
18/04/22 09:05:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/04/22 09:05:26 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/score/scoreTable:32+33
18/04/22 09:05:26 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at score.scala:19) finished in 1.052 s
18/04/22 09:05:26 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:05:26 INFO scheduler.DAGScheduler: running: Set(ShuffleMapStage 1)
18/04/22 09:05:26 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 2, ResultStage 3)
18/04/22 09:05:26 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:05:26 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/score/scoreTable:0+32
18/04/22 09:05:26 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1071 bytes result sent to driver
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 383 ms on localhost (executor driver) (1/2)
18/04/22 09:05:26 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1028 bytes result sent to driver
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 427 ms on localhost (executor driver) (2/2)
18/04/22 09:05:26 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (map at score.scala:23) finished in 1.210 s
18/04/22 09:05:26 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:05:26 INFO scheduler.DAGScheduler: running: Set()
18/04/22 09:05:26 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 2, ResultStage 3)
18/04/22 09:05:26 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:05:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/04/22 09:05:26 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[12] at map at score.scala:28), which has no missing parents
18/04/22 09:05:26 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.9 KB, free 365.8 MB)
18/04/22 09:05:26 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.6 KB, free 365.8 MB)
18/04/22 09:05:26 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.201.128:45565 in memory (size: 2.4 KB, free: 366.3 MB)
18/04/22 09:05:26 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.201.128:45565 (size: 2.6 KB, free: 366.2 MB)
18/04/22 09:05:26 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
18/04/22 09:05:26 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[12] at map at score.scala:28) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:05:26 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, executor driver, partition 0, PROCESS_LOCAL, 4673 bytes)
18/04/22 09:05:26 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 4673 bytes)
18/04/22 09:05:26 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 4)
18/04/22 09:05:26 INFO executor.Executor: Running task 1.0 in stage 2.0 (TID 5)
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 14 ms
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:05:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/04/22 09:05:27 INFO executor.Executor: Finished task 1.0 in stage 2.0 (TID 5). 1286 bytes result sent to driver
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 454 ms on localhost (executor driver) (1/2)
18/04/22 09:05:27 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 4). 1286 bytes result sent to driver
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 482 ms on localhost (executor driver) (2/2)
18/04/22 09:05:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
18/04/22 09:05:27 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (map at score.scala:28) finished in 0.488 s
18/04/22 09:05:27 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/04/22 09:05:27 INFO scheduler.DAGScheduler: running: Set()
18/04/22 09:05:27 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 3)
18/04/22 09:05:27 INFO scheduler.DAGScheduler: failed: Set()
18/04/22 09:05:27 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[14] at map at score.scala:33), which has no missing parents
18/04/22 09:05:27 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 5.9 KB, free 365.8 MB)
18/04/22 09:05:27 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.0 KB, free 365.8 MB)
18/04/22 09:05:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.201.128:45565 (size: 3.0 KB, free: 366.2 MB)
18/04/22 09:05:27 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
18/04/22 09:05:27 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[14] at map at score.scala:33) (first 15 tasks are for partitions Vector(0, 1))
18/04/22 09:05:27 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, localhost, executor driver, partition 0, ANY, 4621 bytes)
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, localhost, executor driver, partition 1, ANY, 4621 bytes)
18/04/22 09:05:27 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 6)
18/04/22 09:05:27 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
18/04/22 09:05:27 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 7)
18/04/22 09:05:27 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 3 ms
18/04/22 09:05:27 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
18/04/22 09:05:27 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 14 ms
18/04/22 09:05:27 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 6). 1246 bytes result sent to driver
18/04/22 09:05:27 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 7). 1354 bytes result sent to driver
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 128 ms on localhost (executor driver) (1/2)
18/04/22 09:05:27 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 162 ms on localhost (executor driver) (2/2)
18/04/22 09:05:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
18/04/22 09:05:27 INFO scheduler.DAGScheduler: ResultStage 3 (collect at score.scala:43) finished in 0.168 s
18/04/22 09:05:27 INFO scheduler.DAGScheduler: Job 0 finished: collect at score.scala:43, took 2.546882 s
beijing 1.0
shagnhai    100.0
shanghai    92.5
jiangsu 59.5
18/04/22 09:05:27 INFO server.AbstractConnector: Stopped Spark@aa6c33d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/04/22 09:05:27 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.201.128:4040
18/04/22 09:05:27 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/22 09:05:27 INFO memory.MemoryStore: MemoryStore cleared
18/04/22 09:05:27 INFO storage.BlockManager: BlockManager stopped
18/04/22 09:05:27 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/04/22 09:05:27 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/22 09:05:27 INFO spark.SparkContext: Successfully stopped SparkContext
18/04/22 09:05:27 INFO util.ShutdownHookManager: Shutdown hook called
18/04/22 09:05:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-74b54e80-1e92-4b4b-94a0-f42848697bfc
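The log above refers to a `score.scala` that is not shown in this post. The stage boundaries make its shape fairly clear: two `textFile` reads (lines 16–17), one `map` per input (lines 19 and 23), a join followed by a `map` at line 28 feeding a shuffle, and a final `map` at line 33 before the `collect` at line 43. Below is a minimal sketch of what that job likely looks like, in the same style as the `MatricProduct` example. The HDFS paths come from the input splits in the log (`hdfs://master:9000/score/placeTable` and `scoreTable`); the object name, field order, and record layout ("student province" / "student score") are assumptions, not the author's actual code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of score.scala, inferred from the log.
// Paths and field layout are assumptions.
object Score {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    // placeTable lines: "<student> <province>"; scoreTable lines: "<student> <score>"
    val places = sc.textFile("hdfs://master:9000/score/placeTable")   // score.scala:16
    val scores = sc.textFile("hdfs://master:9000/score/scoreTable")   // score.scala:17

    val placePairs = places.map(line => {                             // score.scala:19
      val s = line.split(" ")
      (s(0), s(1))                 // (student, province)
    })
    val scorePairs = scores.map(line => {                             // score.scala:23
      val s = line.split(" ")
      (s(0), s(1).toDouble)        // (student, score)
    })

    // Join on student name, then re-key by province with a count of 1,
    // so one reduceByKey can accumulate both the sum and the count.
    val byProvince = placePairs.join(scorePairs).values
      .map(v => (v._1, (v._2, 1)))                                    // score.scala:28
    val avg = byProvince
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .map(p => (p._1, p._2._1 / p._2._2))                            // score.scala:33

    avg.collect().foreach(x => println(x._1 + " " + x._2))           // score.scala:43
    sc.stop()
  }
}
```

Under these assumptions the `(sum, count)` pair per province is reduced in a single shuffle (ShuffleMapStage 2 in the log), and the average is computed in ResultStage 3 — consistent with the "province average" lines printed before the shutdown messages above. Note that "shagnhai" in the output simply reflects a misspelling in the input data.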

Reposted from blog.csdn.net/jh_zhai/article/details/80042293