Spark mini case study 2 (Scala + Spark 2 version)

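Requirement: secondary sort. Order the lines of a text file by their first column, and by their second column when the first-column values are equal.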

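Implementation steps:

  • 1. Define a custom key that implements the Ordered and Serializable interfaces, and put the multi-column comparison logic in the key
  • 2. Map the RDD of text lines to a pair RDD whose key is the custom key and whose value is the original line (map)
  • 3. Sort by the custom key with the sortByKey operator (sortByKey)
  • 4. Map again to drop the custom key and keep only the text line (map)
  • 5. Print the result (foreach)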
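import org.apache.spark.sql.SparkSession

// Custom sort key: order by `first`, then by `second` when `first` is equal.
class SecondSortKey(val first: Int, val second: Int) extends Ordered[SecondSortKey] with Serializable {
  override def compare(that: SecondSortKey): Int = {
    // Integer.compare avoids the Int overflow that subtraction can cause.
    val byFirst = Integer.compare(this.first, that.first)
    if (byFirst != 0) byFirst else Integer.compare(this.second, that.second)
  }
}

object SecondSort {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SecondSort").master("local").getOrCreate()
    val lines = spark.sparkContext.textFile("D:\\sort.txt")
    // Pair each line with its custom key; the line is split only once.
    val pairs = lines.map { line =>
      val fields = line.split(" ")
      (new SecondSortKey(fields(0).toInt, fields(1).toInt), line)
    }
    val sortedPairs = pairs.sortByKey()
    // Drop the key and keep only the original text line.
    val sortedLines = sortedPairs.map(pair => pair._2)
    sortedLines.foreach(s => println(s))
    spark.stop()
  }
}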
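
The ordering logic can be checked without starting Spark. The sketch below runs the same "attach key, sort, drop key" pipeline on a local Scala collection. DemoKey, SecondSortLocalCheck, keyOf and the sample lines are hypothetical names and data made up for illustration, and the input is assumed to be two space-separated integers per line. It compares with Integer.compare, because a subtraction-based compare can overflow when the two values are far apart.

```scala
// Hypothetical Spark-free check; DemoKey and the sample data are illustrative only.
final case class DemoKey(first: Int, second: Int) extends Ordered[DemoKey] {
  // Integer.compare avoids the overflow that `this.first - that.first` can hit.
  override def compare(that: DemoKey): Int = {
    val byFirst = Integer.compare(first, that.first)
    if (byFirst != 0) byFirst else Integer.compare(second, that.second)
  }
}

object SecondSortLocalCheck {
  // Parse "a b" into a key; assumes two space-separated integers per line.
  def keyOf(line: String): DemoKey = {
    val fields = line.split(" ")
    DemoKey(fields(0).toInt, fields(1).toInt)
  }

  val sample: Seq[String] = Seq("3 1", "1 5", "3 0", "1 2")

  // Same pipeline as the Spark job: attach key, sort by key, drop key.
  val sorted: Seq[String] = sample.map(l => (keyOf(l), l)).sortBy(_._1).map(_._2)

  def main(args: Array[String]): Unit = sorted.foreach(println)
}
```

With this sample, the lines starting with 1 come first, ordered by their second column, followed by the lines starting with 3.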

Reposted from www.cnblogs.com/pocahontas/p/11334231.html