Spark RDD: Calculating Total and Average Scores


1. The Task

Given the score table below, calculate each student's total score and average score.

Name       Chinese  Math  English
Zhang San  78       90    76
Li Si      95       88    98
Wang Wu    78       80    60

2. Approach

Read the score file into an RDD of lines. Define a ListBuffer of (name, score) 2-tuples. Traverse the lines and fill the tuple list with one entry per subject. Create an RDD from the tuple list. Reduce that RDD by key to get rdd1, which holds each student's total score. Finally, map rdd1 to rdd2, which holds each student's total score and average score.

3. Preparation

1. Start the HDFS service

Execute the command: start-dfs.sh
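A quick sanity check, assuming the HDFS daemons run on this node, is to list the running Java processes:

jps    # should include NameNode, DataNode and SecondaryNameNode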

2. Start the Spark service

Execute the command: start-all.sh
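Note that Hadoop also ships a script named start-all.sh, so if both are on the PATH it is safer to invoke the Spark one explicitly (assuming a standalone Spark installation under $SPARK_HOME):

$SPARK_HOME/sbin/start-all.sh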

3. Create a score file locally

Create the scores.txt file in the /home directory.
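A file consistent with the table above, assuming no header row and names written as single tokens (the parsing code later splits each line on a single space and converts the three score fields to Int), could be created like this:

cat > /home/scores.txt << 'EOF'
ZhangSan 78 90 76
LiSi 95 88 98
WangWu 78 80 60
EOF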

4. Upload the score file to HDFS

Create the /scores/input directory on HDFS, and upload the score file to this directory
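The HDFS shell can do both steps, assuming the file was saved as /home/scores.txt:

hdfs dfs -mkdir -p /scores/input
hdfs dfs -put /home/scores.txt /scores/input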

4. Completing the Task

1. Completing the task in the Spark shell

(1) Read the score file and generate an RDD

Execute the command: val lines = sc.textFile("hdfs://master:9000/scores/input/scores.txt")
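Since textFile is lazy, the read is only verified when an action runs; a quick check is to print the lines:

lines.collect.foreach(println)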

(2) Define a 2-tuple score list

Execute the command: import scala.collection.mutable.ListBuffer
Execute the command: val scores = new ListBuffer[(String, Int)]()

(3) Use the RDD to populate the 2-tuple score list

lines.collect.foreach(line => {
  val fields = line.split(" ")
  scores.append((fields(0), fields(1).toInt))
  scores.append((fields(0), fields(2).toInt))
  scores.append((fields(0), fields(3).toInt))
})
scores.foreach(println)

Execute the above code
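Assuming the scores.txt content sketched above, the final scores.foreach(println) should print one (name, score) pair per subject:

(ZhangSan,78)
(ZhangSan,90)
(ZhangSan,76)
(LiSi,95)
(LiSi,88)
(LiSi,98)
(WangWu,78)
(WangWu,80)
(WangWu,60)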

(4) Create an RDD from the 2-tuple score list

Execute the command: val rdd = sc.makeRDD(scores)

(5) Reduce rdd by key to get rdd1 and calculate the total score

Execute the command: val rdd1 = rdd.reduceByKey(_ + _)
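reduceByKey sums all scores that share the same name, so with the sample data rdd1 holds one (name, total) pair per student (ordering may vary):

rdd1.collect.foreach(println)
// (ZhangSan,244)
// (LiSi,281)
// (WangWu,218)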

(6) Map rdd1 to rdd2 to calculate the total score and average score

Execute the command: val rdd2 = rdd1.map(score => (score._1, score._2, (score._2 / 3.0).formatted("%.2f")))

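Each element of rdd2 is a (name, total, average) triple; the average is the total divided by the three subjects, formatted to two decimal places. With the sample data:

rdd2.collect.foreach(println)
// (ZhangSan,244,81.33)
// (LiSi,281,93.67)
// (WangWu,218,72.67)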

2. Completing the task in IntelliJ IDEA

(1) Open the RDD project

SparkRDDDemo

(2) Create an object to calculate the total score and average score

Create the day07 subpackage in the net.army.rdd package, and then create the CalculateSumAvg object in the subpackage

package net.army.rdd.day07

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ListBuffer

/**
 * Author: 梁辰兴
 * Date: 2023/6/6
 * Purpose: Calculate total and average scores
 */
object CalculateSumAvg {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val conf = new SparkConf()
      .setAppName("CalculateSumAvg") // Set the application name
      .setMaster("local[*]") // Set the master (local debugging)
    // Create the Spark context from the configuration object
    val sc = new SparkContext(conf)
    // Read the score file and generate an RDD
    val lines = sc.textFile("hdfs://master:9000/scores/input/scores.txt")
    // Define the 2-tuple score list
    val scores = new ListBuffer[(String, Int)]()
    // Use the RDD to populate the 2-tuple score list
    lines.collect.foreach(line => {
      val fields = line.split(" ")
      scores.append((fields(0), fields(1).toInt))
      scores.append((fields(0), fields(2).toInt))
      scores.append((fields(0), fields(3).toInt))
    })
    // Create an RDD from the 2-tuple score list
    val rdd = sc.makeRDD(scores)
    // Reduce rdd by key to get rdd1 and calculate the total score
    val rdd1 = rdd.reduceByKey(_ + _)
    // Map rdd1 to rdd2 to calculate the total score and average score
    val rdd2 = rdd1.map(score => (score._1, score._2, (score._2 / 3.0).formatted("%.2f")))
    // Print the contents of rdd2 to the console
    rdd2.collect.foreach(println)
    // Save the contents of rdd2 to the specified HDFS location
    rdd2.saveAsTextFile("hdfs://master:9000/scores/output")
    // Stop the Spark context
    sc.stop()
  }
}
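One caveat when re-running the program: saveAsTextFile throws a FileAlreadyExistsException if the output directory already exists, so /scores/output must be deleted first. A minimal sketch using the Hadoop FileSystem API, placed before the saveAsTextFile call (the URI matches the cluster address used above):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Delete any previous output directory; delete() simply returns false if it does not exist
val fs = FileSystem.get(new URI("hdfs://master:9000"), sc.hadoopConfiguration)
fs.delete(new Path("/scores/output"), true)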

(3) Run the program and view the results

Run the CalculateSumAvg program and check the console output.

View HDFS result files
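The saved files can also be checked from the command line; assuming the default part-file naming used by saveAsTextFile:

hdfs dfs -ls /scores/output
hdfs dfs -cat /scores/output/part-*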


Origin blog.csdn.net/m0_62617719/article/details/131064478