Writing SparkSQL code through IDEA (Chapter 3)

Write SparkSQL code through IDEA to create a DataFrame / DataSet in three ways:
Method 1: specify column names to add the Schema
Method 2: specify the Schema through StructType
Method 3: write a case class and use the reflection mechanism to infer the Schema
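
All three examples assume an IDEA project with the Spark SQL module on the classpath. A minimal sbt dependency sketch, assuming a Spark 2.x build (the exact version number is illustrative, not taken from the original post):

// build.sbt: spark-sql pulls in spark-core transitively
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"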
Data tt.txt:

1 zhangsan 20
2 lisi 29
3 wangwu 25
4 zhaoliu 30
5 tianqi 35
6 kobe 40

Method 1: specify column names to add the Schema

// Required imports
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// 1. Create the SparkSession object
val spark: SparkSession = SparkSession.builder().master("local[*]").appName("Demo01").getOrCreate()
// 2. Get the SparkContext from the SparkSession
val sc = spark.sparkContext
// 3. Read the data and convert it to an RDD
val datas: RDD[String] = sc.textFile("file:///F:/Second semester/12/05-Spark/Data/tt.txt")
val lineRDD: RDD[Array[String]] = datas.map(_.split(" "))
// Type conversion: turn each line into a tuple of (Int, String, Int)
val rowRDD = lineRDD.map(a => (a(0).toInt, a(1), a(2).toInt))
// Implicit conversion: adds methods that the original object does not have;
// importing spark.implicits._ gives the RDD the toDF / toDS methods
import spark.implicits._

// 4. Convert to a DataFrame / DataSet
val personDF: DataFrame = rowRDD.toDF("id", "name", "age")
// 5. Query the data (an actual SQL query is sketched after this block)
personDF.show()
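
The original step 5 mentions querying the data through SQL, but the code only calls show(). To run a real SQL query, the DataFrame can first be registered as a temporary view; a minimal sketch (the view name person is illustrative, not from the original post):

// Register the DataFrame as a temporary view, then query it with SQL
personDF.createOrReplaceTempView("person")
spark.sql("SELECT id, name, age FROM person WHERE age > 25").show()

With the sample tt.txt data this returns the four rows whose age exceeds 25.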

Method 2: specify the Schema through StructType

// Required imports
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

def main(args: Array[String]): Unit = {
  // 1. Create the SparkSession
  val spark: SparkSession = SparkSession.builder().master("local[*]").appName("demo01").getOrCreate()
  // 2. Get the SparkContext
  val sc: SparkContext = spark.sparkContext
  // 3. Read the data and transform it
  val ttRDD: RDD[String] = sc.textFile("file:///F:/second semester/34/05-Spark/data/tt.txt")
  val lineDatas: RDD[Array[String]] = ttRDD.map(a => a.split(" "))
  val rowRDD: RDD[Row] = lineDatas.map(x => Row(x(0).toInt, x(1), x(2).toInt))
  // Define the table structure; note the required import org.apache.spark.sql.types._
  val schema: StructType = StructType(List(
    StructField("id", IntegerType, true),
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))
  val dataDF: DataFrame = spark.createDataFrame(rowRDD, schema)
  // 4. View the data
  dataDF.show()
  dataDF.printSchema()
  // The $ column syntax is not available here because spark.implicits._ is not imported
  // (a col()-based alternative is sketched after this block):
  //dataDF.filter($"age" > 25).show
  // 5. Stop the SparkContext and the SparkSession
  sc.stop()
  spark.stop()
}
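
The commented-out filter fails because the $ column interpolator comes from spark.implicits._, which this example does not import. A filter that works without it uses the col function from org.apache.spark.sql.functions; a minimal sketch that would go inside the same main method (the threshold 25 mirrors the commented-out line):

import org.apache.spark.sql.functions.col

// Reference the column with col() instead of the $ interpolator
dataDF.filter(col("age") > 25).show()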

Method 3: write a case class and use the reflection mechanism to infer the Schema

// Required imports
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Define the case class (outside the method, so reflection can infer the schema)
case class Person(id: Int, name: String, age: Int)

def main(args: Array[String]): Unit = {
  // 1. Create the SparkSession
  val spark: SparkSession = SparkSession.builder().master("local[*]").appName("demo01").getOrCreate()
  // 2. Get the SparkContext
  val sc: SparkContext = spark.sparkContext
  // 3. Read the data and transform it
  val ttRDD: RDD[String] = sc.textFile("file:///F:/second semester/34/05-Spark/data/tt.txt")
  val lineDatas: RDD[Array[String]] = ttRDD.map(a => a.split(" "))
  // Map each line of data to an instance of the case class
  val personRDD: RDD[Person] = lineDatas.map(z => Person(z(0).toInt, z(1), z(2).toInt))
  // Import the implicit conversions to get toDF / toDS
  import spark.implicits._
  val dataDF: DataFrame = personRDD.toDF()
  // 4. View the data
  dataDF.show()
  dataDF.printSchema()
  // 5. Stop the SparkContext and the SparkSession
  sc.stop()
  spark.stop()
}
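
The chapter intro mentions creating a DataSet as well as a DataFrame, but the examples above only build DataFrames. With the same case class and spark.implicits._ in scope, the RDD can also be converted into a strongly typed Dataset; a minimal sketch that would sit next to the toDF() call above (it additionally needs import org.apache.spark.sql.Dataset):

// toDS builds a typed Dataset[Person] from the RDD of case-class instances
val personDS: Dataset[Person] = personRDD.toDS()
personDS.show()
personDS.printSchema()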

Origin blog.csdn.net/qq_45765882/article/details/105560853