(Scala version) Spark Sql RDD/DataFrame/DataSet conversion

Test Data

// Test RDD used throughout the examples below
case class User(name: String, age: Int)
val rdd: RDD[(String, Int)] = spark.sparkContext.makeRDD(List(("jayChou", 41), ("burukeyou", 23)))

1) RDD

1.2 Mutual conversion with DataSet

// 1- RDD[User] ===> DataSet
// An RDD with a concrete element type (User) can be converted to a DataSet directly via reflection
val dsUser: Dataset[User] = rdd.map(t => User(t._1, t._2)).toDS

// 2- DataSet ===> RDD[User]
val rdduser: RDD[User] = dsUser.rdd 
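The mapping step `rdd.map(t => User(t._1, t._2))` can be illustrated without a Spark cluster, since `RDD.map` behaves like `map` on an ordinary Scala collection. A minimal plain-Scala sketch (the `List` here stands in for the RDD):

```scala
// Plain-Scala sketch of the tuple-to-case-class mapping step; no Spark needed.
case class User(name: String, age: Int)

// Stand-in for RDD[(String, Int)]
val pairs: List[(String, Int)] = List(("jayChou", 41), ("burukeyou", 23))

// Same shape as rdd.map(t => User(t._1, t._2))
val users: List[User] = pairs.map(t => User(t._1, t._2))

println(users.head) // User(jayChou,41)
```

On a real RDD the lambda is identical; only the receiver type changes.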

1.3 Mutual conversion with DataFrame

// ================ RDD to DataFrame ===============================
// Method 1: an RDD without an object type is converted by naming the DataFrame columns
// RDD[(String, Int)] ===> DataFrame
val userDF01: DataFrame = rdd.toDF("name", "age")

// Method 2: an RDD with an object type is converted directly
// RDD[User] ====> DataFrame
val userDF02: DataFrame = rdd.map(e => User(e._1, e._2)).toDF

// ================ DataFrame to RDD ===============================
val rddUser: RDD[Row]  = userDF01.rdd
val userRdd02: RDD[Row]  = userDF02.rdd

tip:

  • No matter how the DataFrame was produced, the RDD obtained from it always has element type Row. Since DataFrame = Dataset[Row], converting the underlying Dataset[Row] to an RDD naturally returns RDD[Row].
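This tip follows directly from how Spark declares the type: `org.apache.spark.sql` contains the alias `type DataFrame = Dataset[Row]`. The miniature `Dataset`/`Row` below are stand-ins for illustration only (not Spark's real classes), showing why `.rdd` can only hand back `Row` elements:

```scala
// Miniature stand-ins for Spark's types, for illustration only
final case class Row(values: Any*)
final class Dataset[T](val data: List[T]) {
  // Element type T is preserved, just like Dataset.rdd in Spark
  def rdd: List[T] = data
}

object Demo {
  // Spark declares exactly this alias in the org.apache.spark.sql package object
  type DataFrame = Dataset[Row]

  val df: DataFrame = new Dataset(List(Row("jayChou", 41)))
  // A DataFrame is a Dataset[Row], so the recovered elements can only be Row
  val asRdd: List[Row] = df.rdd
}
```

The alias means a DataFrame carries no per-column static types; that information only comes back when you convert to a typed `Dataset[User]` with `as[User]`.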

2) DataFrame

2.1 Mutual conversion with DataSet

// Test DataFrame
val df: DataFrame = rdd.toDF("name","age")

// 1- DataFrame ===> DataSet
// Requires import spark.implicits._, and the column names must match the case class fields
val userDS: Dataset[User] = df.as[User]

// 2- DataSet[User] ===> DataFrame
val userDF: DataFrame = userDS.toDF

3) Supplement (creating a DataFrame via SparkSession)

  • Pay attention to the element type of the RDD being converted
// 1- RDD[(String, Int)] ===> DataFrame (tuple columns default to _1, _2)
val scDF: DataFrame = spark.createDataFrame(rdd)

// 2- RDD[Row] ===> DataFrame, with an explicit schema
val schema = StructType(Array(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))
val scDF_Schema: DataFrame = spark.createDataFrame(scDF.rdd, schema)
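The snippets above assume an existing `spark` session and `import spark.implicits._`. A self-contained sketch tying them together might look as follows; it assumes the spark-sql dependency is on the classpath (in spark-shell, `spark` already exists and the builder lines can be skipped):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Case class must be defined outside main so Spark can derive an encoder for it
case class User(name: String, age: Int)

object ConversionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conversions")
      .getOrCreate()
    import spark.implicits._ // needed for toDF / toDS / as[User]

    val rdd: RDD[(String, Int)] =
      spark.sparkContext.makeRDD(List(("jayChou", 41), ("burukeyou", 23)))

    val ds: Dataset[User]   = rdd.map(t => User(t._1, t._2)).toDS // RDD ===> DataSet
    val df: DataFrame       = rdd.toDF("name", "age")             // RDD ===> DataFrame
    val rows: RDD[Row]      = df.rdd                              // DataFrame ===> RDD[Row]
    val typed: Dataset[User] = df.as[User]                        // DataFrame ===> DataSet

    df.show()
    spark.stop()
  }
}
```

Run with `spark-submit` or paste the body into spark-shell; this is a sketch of the conversions covered above, not a production job.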

Origin blog.csdn.net/weixin_41347419/article/details/108887842