SparkSQL case demonstration: case classes (reflection-based schema inference)

Introduction: when the element type of an RDD is a case class, Spark SQL can obtain the attribute names and types through reflection, construct a schema from them, apply it to the RDD's data, and convert the result into a DataFrame.

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * @author liu a fu
 * @date 2021/1/17 0017
 * @version 1.0
 * @DESC   Reflection-based type inference
 */

/**
  * Encapsulates one movie-rating record
  *
  * @param userid    user ID
  * @param moviesid  movie ID
  * @param rating    the user's rating of the movie
  * @param timestamp rating timestamp
  */
case class MoviesRating(userid: Int,
                        moviesid: Int,
                        rating: Double,
                        timestamp: Long
                       )

object _06RatDataToDF {

  def main(args: Array[String]): Unit = {
    // 1. Prepare the environment
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getSimpleName.stripSuffix("$")).setMaster("local[*]")
    val spark: SparkSession = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // Import the implicit conversions needed for RDD -> DataFrame
    import spark.implicits._
    // Read the data source
    val fileRDD: RDD[String] = spark.sparkContext.textFile("data/input/sql/ml-100k/u.data")
    // Filter out null or malformed lines (each line must have exactly 4 tab-separated fields)
    val ratingDF: DataFrame = fileRDD.filter(line => line != null && line.trim.split("\t").length == 4)
      .mapPartitions(iter => {
        iter.map(line => {
          val arr: Array[String] = line.split("\t")
          MoviesRating(arr(0).toInt, arr(1).toInt, arr(2).toDouble, arr(3).toLong)
        })
      }).toDF()

    ratingDF.show()
    ratingDF.printSchema()
  }
}
  • This method requires that the RDD's element type be a case class; the column names in the resulting DataFrame are the attribute names of that case class, as the sketch below illustrates.
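
To see that point in isolation, here is a minimal sketch that builds a DataFrame from a few in-memory MoviesRating values (the sample values are illustrative, in the ml-100k tab-separated format) and prints the inferred schema. It assumes the same SparkSession and import spark.implicits._ as in the listing above:

// Minimal sketch: in-memory sample values, schema inferred from the case class
val sampleDF: DataFrame = Seq(
  MoviesRating(196, 242, 3.0, 881250949L),
  MoviesRating(186, 302, 3.0, 891717742L)
).toDF()
sampleDF.printSchema()
// root
//  |-- userid: integer (nullable = false)
//  |-- moviesid: integer (nullable = false)
//  |-- rating: double (nullable = false)
//  |-- timestamp: long (nullable = false)

Note that the columns are named userid, moviesid, rating, and timestamp, exactly matching the case class attributes, and that the types (including nullability) are inferred by reflection rather than declared by hand.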

Origin blog.csdn.net/m0_49834705/article/details/112801200