Scala: type conversions between Row, Array, Tuple, RDD, and DataFrame

Preface

When writing wrapper functions over RDDs and DataFrames in Spark, type conversions come up constantly. This post records some of the common ones.

Array => Row

import org.apache.spark.sql.{Row, RowFactory}

val arr = Array("aa/2/cc/10", "xx/3/nn/30", "xx/3/nn/20")
// val row = Row.fromSeq(arr) // idiomatic Scala alternative
val row = RowFactory.create(arr: _*) // `: _*` splats the elements into separate fields
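A quick sanity check, assuming the row built above: each array element should have become its own field.

assert(row.length == 3) // three fields, not one field holding the whole array
assert(row.getString(0) == "aa/2/cc/10")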

Row => Array

val a: Array[Any] = row.toSeq.toArray

Sometimes you need a specific element type T, such as String; in that case an intermediate mapping step is needed:

val a: Array[String] = row.toSeq.map(_.toString).toArray
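If the fields really are strings (rather than merely needing their string form), a cast gives the same result without the toString detour; a minimal sketch assuming the row holds only String fields:

// throws ClassCastException if a field is not actually a String
val s: Array[String] = row.toSeq.map(_.asInstanceOf[String]).toArray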

Tuple => Array

val tuple = ((20201022, 5060180989186180L, "[12, 15)"), 288556)
// Array[Any] of the top-level elements: the inner tuple and 288556
val arr2 = tuple.productIterator.toArray

Any object that extends Product (for example, a case class) can be converted to an array the same way, as the sketch below shows.
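For example, every case class extends Product, so productIterator applies directly; a minimal sketch with a hypothetical Point class:

case class Point(x: Int, y: Int, label: String)
val fields: Array[Any] = Point(1, 2, "origin").productIterator.toArray
// fields == Array(1, 2, "origin")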

Array => RDD

// parallelize produces an RDD with one element: the tuple from above
val rdd = sparkSession.sparkContext.parallelize(Array(tuple))
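The snippets here assume a SparkSession named sparkSession is already in scope; for a self-contained run, one could be created like this (the app name and local master are assumptions for testing):

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("type-conversions") // hypothetical app name
  .master("local[*]")          // assumption: run locally for testing
  .getOrCreate()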

RDD => DataFrame

// define the case class
case class Person(name:String, age:Int)

One way is to go through an RDD[Row] together with an explicit schema:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val rdd = sparkSession.sparkContext
  .parallelize(Array(("tom", 1), ("luna", 2)))
  .map(row => Row(row._1, row._2))
// create the schema
val schema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)
))
val df = sparkSession.createDataFrame(rdd, schema)

You can also import sparkSession.implicits._ and convert directly to a DataFrame through the toDF implicit conversion:

import sparkSession.implicits._

val df = sparkSession.sparkContext.parallelize(Array(("tom", 1), ("luna", 2)))
  .map(row => Person(row._1, row._2)).toDF()
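With the same implicits in scope, a local Seq of tuples converts directly as well, and toDF accepts column names, so the case class is optional:

import sparkSession.implicits._

// column names given explicitly instead of taken from a case class
val df2 = Seq(("tom", 1), ("luna", 2)).toDF("name", "age")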

DataFrame => RDD

val rdd1 = df.rdd
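Note that df.rdd yields an RDD[Row]; to recover typed values, read each field by name (a sketch assuming the name/age schema above):

// back to an RDD of (String, Int) tuples
val people = df.rdd.map(r => (r.getAs[String]("name"), r.getAs[Int]("age")))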

Original post: blog.csdn.net/yyoc97/article/details/109273555