(Scala) Spark SQL: Converting Between RDD, DataFrame, and Dataset

Test data

// Test RDD. A SparkSession named `spark` is assumed; the implicits
// import is required for the toDF / toDS / as[T] conversions used below.
import org.apache.spark.rdd.RDD
import spark.implicits._

case class User(name: String, age: Int)
val rdd: RDD[(String, Int)] = spark.sparkContext.makeRDD(List(("jayChou", 41), ("burukeyou", 23)))

1、RDD

1.2、Converting to and from Dataset

// 1- RDD[User] ===> Dataset
// An RDD whose elements have a concrete type (User) can be converted to a Dataset directly via reflection
val dsUser: Dataset[User] = rdd.map(t => User(t._1, t._2)).toDS

// 2- Dataset ===> RDD[User]
val rdduser: RDD[User] = dsUser.rdd 
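The typed round trip above can be sketched without Spark, using a plain List in place of the RDD (the toDS / .rdd pair is essentially a map into and out of the case class):

```scala
// Spark-free stand-in for the typed round trip: tuples -> User objects -> tuples.
// `User` mirrors the case class from the test data; no SparkSession is needed.
case class User(name: String, age: Int)

object RoundTripSketch {
  def main(args: Array[String]): Unit = {
    val tuples = List(("jayChou", 41), ("burukeyou", 23))

    // RDD[(String, Int)] -> Dataset[User] boils down to this map:
    val users: List[User] = tuples.map { case (n, a) => User(n, a) }

    // Dataset[User] -> RDD[User] hands the typed elements back unchanged:
    println(users.map(u => s"${u.name}:${u.age}").mkString(", "))
    // prints: jayChou:41, burukeyou:23
  }
}
```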

1.3 Converting to and from DataFrame

// ================ RDD to DataFrame ===============================
// Method 1: an RDD without a concrete element type, converted by naming the DataFrame columns
// RDD[(String, Int)] ===> DataFrame
val userDF01: DataFrame = rdd.toDF("name", "age")

// Method 2: an RDD with a concrete element type, converted directly
// RDD[User] ===> DataFrame
val userDF02: DataFrame = rdd.map(e => User(e._1, e._2)).toDF

// ================ DataFrame to RDD ===============================
val rddUser: RDD[Row]  = userDF01.rdd
val userRdd02: RDD[Row]  = userDF02.rdd

tip:

  • No matter how the DataFrame was originally produced, converting it back to an RDD always yields elements of type Row. Since DataFrame = Dataset[Row], the conversion is really Dataset[Row] to RDD, so the returned element type is naturally Row.
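In practice this means Row fields come back untyped and must be read by position (or name) with a cast. A hedged, Spark-free sketch of that difference, with a Seq[Any] standing in for Row:

```scala
// Seq[Any] plays the role of Spark's untyped Row here; no Spark is needed.
object RowAccessSketch {
  def main(args: Array[String]): Unit = {
    val row: Seq[Any] = Seq("jayChou", 41)

    // Untyped access: positional, with explicit casts, much like
    // row.getString(0) / row.getInt(1) on a real Spark Row.
    val name = row(0).asInstanceOf[String]
    val age  = row(1).asInstanceOf[Int]
    println(s"$name -> $age")
    // prints: jayChou -> 41
  }
}
```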

2、DataFrame

2.1、Converting to and from Dataset

// Test DataFrame
val df: DataFrame = rdd.toDF("name", "age")

// 1- DataFrame ===> Dataset
val userDS: Dataset[User] = df.as[User]

// 2- Dataset[User] ===> DataFrame
val userDF: DataFrame = userDS.toDF
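`df.as[User]` succeeds because the DataFrame's column names and types line up with the fields of `User`; Spark's encoder performs that matching. A Spark-free sketch of the idea, decoding a name-to-value record into the case class (`decodeUser` is a hypothetical helper, not a Spark API):

```scala
case class User(name: String, age: Int)

object DecodeSketch {
  // Hypothetical decoder: resolves fields by name and casts them,
  // roughly what as[User] does with column names via its encoder.
  def decodeUser(record: Map[String, Any]): User =
    User(record("name").asInstanceOf[String], record("age").asInstanceOf[Int])

  def main(args: Array[String]): Unit = {
    println(decodeUser(Map("name" -> "burukeyou", "age" -> 23)))
  }
}
```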

3、Supplement (creating a DataFrame via spark.createDataFrame)

  • Pay attention to the element type of the RDD being converted
// 1- RDD[(String, Int)] ===> DataFrame
val scDF: DataFrame = spark.createDataFrame(rdd)

// 2- RDD[Row] ===> DataFrame (requires an explicit schema)
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))
val scDF_Schema: DataFrame = spark.createDataFrame(scDF.rdd, schema)


Reprinted from blog.csdn.net/weixin_41347419/article/details/108887842