Test Data
// test RDD data
case class User(name:String, age:Int)
val rdd: RDD[(String, Int)] = spark.sparkContext.makeRDD(List(("jayChou",41),("burukeyou",23)))
1. RDD
1.2 Mutual conversion with DataSet
// 1- RDD[User] ===> DataSet
// An RDD with a concrete element type (User) can be converted directly to a DataSet via reflection
val dsUser: Dataset[User] = rdd.map(t=>User(t._1, t._2)).toDS
// 2- DataSet ===> RDD[User]
val rdduser: RDD[User] = dsUser.rdd
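The `toDS` call above only compiles when the SparkSession's implicit encoders are in scope. A minimal self-contained sketch of the round trip, assuming a local SparkSession (the `master("local[*]")` setup is illustrative; in compiled code the case class should sit at top level so Spark can derive its Encoder):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

object RddDsRoundTrip {
  // top-level case class so the Encoder[User] can be derived
  case class User(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // illustrative local session
      .appName("rdd-ds")
      .getOrCreate()
    import spark.implicits._       // brings toDS / toDF into scope

    val rdd: RDD[(String, Int)] =
      spark.sparkContext.makeRDD(List(("jayChou", 41), ("burukeyou", 23)))

    // RDD[User] ===> Dataset[User] via the case-class encoder
    val dsUser: Dataset[User] = rdd.map(t => User(t._1, t._2)).toDS()

    // Dataset[User] ===> RDD[User]: the element type is preserved
    val backToRdd: RDD[User] = dsUser.rdd

    spark.stop()
  }
}
```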
1.3 Mutual conversion with DataFrame
// ================ RDD to DataFrame ===============================
// Method 1: an RDD without an object type, converted by naming the DataFrame columns
// RDD[(String, Int)] ===> DataFrame
val userDF01: DataFrame = rdd.toDF("name","age")
// Method 2: an RDD with an object type, converted directly
// RDD[User] ====> DataFrame
val userDF02: DataFrame = rdd.map(e => User(e._1,e._2)).toDF
// ================ DataFrame to RDD ===============================
val rddUser: RDD[Row] = userDF01.rdd
val userRdd02: RDD[Row] = userDF02.rdd
tip:
- No matter how the DataFrame was built, the RDD obtained from it always has element type Row. Since DataFrame = Dataset[Row], converting a Dataset[Row] to an RDD naturally yields RDD[Row].
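Since the elements come back as generic `Row` objects rather than `User`, fields have to be read out by position or by column name. A small sketch against `userDF01` from above:

```scala
import org.apache.spark.sql.Row

// Each element of userDF01.rdd is a generic Row
val formatted = userDF01.rdd.map { row: Row =>
  val name = row.getString(0)        // positional access
  val age  = row.getAs[Int]("age")   // access by column name
  s"$name:$age"
}.collect()
```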
2. DataFrame
2.1 Mutual conversion with DataSet
// test DataFrame
val df: DataFrame = rdd.toDF("name","age")
// 1- DataFrame ===> DataSet
val userDS: Dataset[User] = df.as[User]
// 2-DataSet[User] ===> DataFrame
val userDF: DataFrame = userDS.toDF
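`as[User]` only succeeds here because the DataFrame's column names and types (`name: String`, `age: Int`) line up with the fields of the `User` case class; if the names differ, rename the columns first. A hedged sketch (the `username` column is a made-up mismatch for illustration):

```scala
import org.apache.spark.sql.{DataFrame, Dataset}
import spark.implicits._   // Encoder[User] must be in scope

// Hypothetical DataFrame whose column name does not match the case class
val mismatched: DataFrame = rdd.toDF("username", "age")

// Align the column names with User's fields before calling as[User]
val aligned: Dataset[User] = mismatched
  .withColumnRenamed("username", "name")
  .as[User]
```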
3. Supplement (creating a DataFrame via spark.createDataFrame)
- Pay attention to the element type of the RDD being converted
// 1- RDD[(String, Int)] ===> DataFrame
val scDF: DataFrame = spark.createDataFrame(rdd)
// 2- RDD[Row] ===> DataFrame
val schema = StructType(Array(
StructField("name",StringType,nullable = true),
StructField("age",IntegerType,nullable = true)
))
val scDF_Schema: DataFrame = spark.createDataFrame(scDF.rdd, schema)
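Putting the supplement together: an `RDD[Row]` can also be built by hand and paired with the explicit schema, rather than reusing `scDF.rdd`. A sketch assuming the same `spark` session and `rdd` as above (note that `createDataFrame(RDD[Row], schema)` validates row values against the schema only at runtime, not at compile time):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Build the Row RDD manually from the tuple RDD
val rowRdd: RDD[Row] = rdd.map { case (name, age) => Row(name, age) }

val schema = StructType(Array(
  StructField("name", StringType,  nullable = true),
  StructField("age",  IntegerType, nullable = true)
))

val df: DataFrame = spark.createDataFrame(rowRdd, schema)
df.printSchema()
```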