7 | Calculate the average value for each key and sort it in descending order

Suppose you have an RDD of sales orders in which each element is a key-value pair: the key is the product name and the value is the sales quantity. You want to group the orders by product name and calculate the total sales quantity for each product. In the end you want both the total sales quantity for each product and a detailed list of sales orders grouped by product name.

  1. Calculate the sum and count corresponding to each key.
  2. Calculate the average value for each key and sort them in descending order.
  3. Print the average value for each key.

Requirements:

  1. Load data from a CSV file. The format of the CSV file is as follows:

    A,1
    B,2
    A,3
    C,4
    B,5
    
  2. Use a map operation to convert each row of data into a key-value pair, producing an RDD of pairs in which the key is the first column of the CSV file and the value is the second column.

  3. Use reduceByKey to aggregate the data and calculate the sum and count for each key. The result has the form (key, (sum, count)).

  4. Print the sum and count corresponding to each key.

  5. Calculate the average for each key as the sum divided by the count.

  6. Sort the averages in descending order and print the sorted results.
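
The arithmetic behind steps 2 to 6 can be checked without a cluster. The sketch below is not the Spark program from this post: it swaps the RDD for plain Java collections (Java 9 or later) and uses Map.merge as a stand-in for the pairwise merge that reduceByKey performs. The class name AverageByKeySketch and its two helper methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; plain Java collections stand in for the Spark RDD.
class AverageByKeySketch {

    // Steps 2-3: parse "key,value" rows and merge them into key -> {sum, count},
    // the same pairwise merge a reduceByKey over (value, 1) pairs would apply.
    static Map<String, long[]> sumAndCount(List<String> rows) {
        Map<String, long[]> acc = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.trim().split(",");
            long value = Long.parseLong(cols[1].trim());
            acc.merge(cols[0].trim(), new long[] {value, 1L},
                    (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
        }
        return acc;
    }

    // Steps 5-6: average = sum / count, then sort by average, largest first.
    static List<Map.Entry<String, Double>> averagesDescending(Map<String, long[]> sums) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : sums.entrySet()) {
            long[] sc = e.getValue();
            out.add(Map.entry(e.getKey(), (double) sc[0] / sc[1]));
        }
        out.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        return out;
    }

    public static void main(String[] args) {
        // The sample rows from the CSV file shown in requirement 1.
        List<String> rows = List.of("A,1", "B,2", "A,3", "C,4", "B,5");
        Map<String, long[]> sums = sumAndCount(rows);
        // Step 4: print the sum and count for each key.
        sums.forEach((k, v) -> System.out.println(k + " -> (" + v[0] + ", " + v[1] + ")"));
        // Step 6: print the averages in descending order.
        averagesDescending(sums).forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

For the sample file this gives (4, 2) for A, (7, 2) for B and (4, 1) for C, so the averages in descending order are C = 4.0, B = 3.5, A = 2.0. Carrying the count alongside the sum matters because averages cannot be merged pairwise, while (sum, count) pairs can.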

package com.bigdata;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.

Origin blog.csdn.net/weixin_44510615/article/details/132632177