Spark: Overriding the Sort Order (Part 2)

1. A case class used for custom sorting needs to extend the Ordered trait. It does not need to extend Serializable explicitly (case classes are serializable by default), and instances can be created without new. This article uses a case class.
2. An ordinary class used for custom sorting needs to extend the Ordered trait and also extend Serializable.
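For comparison with the case class used in this article, here is a minimal sketch of the second approach, an ordinary class. The names Person3 and OrdinaryClassSketch are hypothetical and do not come from the original article, and the sketch sorts a local List rather than an RDD so that it runs without a Spark installation.

```scala
// Hypothetical ordinary (non-case) class; Person3 is an illustrative name.
// Unlike a case class, it must mix in Serializable itself before Spark can ship it between tasks.
class Person3(val name: String, val age: Int, val fv: Int)
    extends Ordered[Person3] with Serializable {

  // fv descending first; on equal fv, age descending.
  override def compare(that: Person3): Int =
    if (this.fv != that.fv) Integer.compare(that.fv, this.fv)
    else Integer.compare(that.age, this.age)

  override def toString: String = s"$name,$age,$fv"
}

object OrdinaryClassSketch {
  // In Scala 2 an ordinary class has no generated apply method, so `new` is needed.
  val people: List[Person3] = List(
    new Person3("mimi1", 22, 85),
    new Person3("mimi2", 22, 86),
    new Person3("mimi3", 23, 86)
  )

  // Ordered[Person3] supplies the implicit Ordering that `sorted` uses.
  val sortedNames: List[String] = people.sorted.map(_.name)

  def main(args: Array[String]): Unit = println(sortedNames)
}
```

With an RDD of (name, age, fv) tuples, the same class could presumably be used as the sort key in the same way as the case class, for example sortBy(x => new Person3(x._1, x._2, x._3)).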

Fields: name (name), age (age), face value (fv)
The input data is Array("mimi1 22 85", "mimi2 22 86", "mimi3 23 86")
Sort by face value in descending order; when face values are equal, sort by age in descending order
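The rule above can be checked on the sample data with plain Scala collections before involving Spark. The following is only a sketch: the object name SortRuleSketch and the value names are assumptions made for illustration, and it uses a standalone Ordering rather than the Ordered-based approach of this article.

```scala
object SortRuleSketch {
  // The sample rows, already parsed into (name, age, fv) tuples.
  val rows: List[(String, Int, Int)] =
    List(("mimi1", 22, 85), ("mimi2", 22, 86), ("mimi3", 23, 86))

  // Order by the key (fv, age) ascending, then reverse the whole ordering:
  // this gives fv descending, with ties broken by age descending.
  val byFvThenAgeDesc: Ordering[(String, Int, Int)] =
    Ordering.by[(String, Int, Int), (Int, Int)](p => (p._3, p._2)).reverse

  val sortedRows: List[(String, Int, Int)] = rows.sorted(byFvThenAgeDesc)

  def main(args: Array[String]): Unit = println(sortedRows)
}
```

mimi2 and mimi3 share the face value 86, so the age decides their relative order and mimi3 (23) comes before mimi2 (22).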

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort_2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName(this.getClass.getName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    val userInfo: RDD[String] =
      sc.parallelize(Array("mimi1 22 85", "mimi2 22 86", "mimi3 23 86"))
    // Split each line of text and return a (name, age, fv) tuple
    val personRDD: RDD[(String, Int, Int)] = userInfo.map(x => {
      val arr = x.split(" ")
      val name = arr(0)
      val age = arr(1).toInt
      val fv = arr(2).toInt
      (name, age, fv)
    })
    // Specify the sort rule: pass the tuple fields into person2 and
    // sort according to person2's compare method
    val sorted: RDD[(String, Int, Int)] = personRDD.sortBy(x => person2(x._1, x._2, x._3))
    println(sorted.collect.toBuffer)
    // Release the local Spark resources once the job is done
    sc.stop()
  }
}
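// Case class used only as the sort key: fv descending, then age descending
case class person2(name: String, age: Int, fv: Int) extends Ordered[person2] {
  override def compare(that: person2): Int = {
    // Integer.compare avoids the overflow that subtraction can cause for extreme values
    if (this.fv != that.fv)
      Integer.compare(that.fv, this.fv)
    else Integer.compare(that.age, this.age)
  }

  override def toString: String = s"$name,$age,$fv"
}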

Output

ArrayBuffer((mimi3,23,86), (mimi2,22,86), (mimi1,22,85))

Origin blog.csdn.net/qq_42706464/article/details/108355037