Spark SQL Run

  •  First experience with Spark SQL

    1. The entry point: SparkSession

● Before Spark 2.0

SQLContext was the entry point for creating DataFrames and executing SQL.

HiveContext operated on Hive table data through HiveQL statements and was compatible with Hive; HiveContext inherits from SQLContext.

 

● Since Spark 2.0

SparkSession encapsulates all the functionality of SQLContext and HiveContext. Through a SparkSession you can also obtain the SparkContext.

SparkSession can execute both Spark SQL and HiveQL.
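As a sketch, creating a SparkSession in a standalone application might look like the following (spark-shell creates one for you automatically as `spark`; the app name and master setting here are illustrative, not from the original):

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession -- the unified entry point since Spark 2.0.
val spark = SparkSession.builder()
  .appName("SparkSQLDemo")        // illustrative app name
  .master("local[*]")             // local mode, for testing
  .enableHiveSupport()            // optional: enables HiveQL / Hive metastore access
  .getOrCreate()

// The underlying SparkContext is still reachable:
val sc = spark.sparkContext
```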

  2.  Creating a DataFrame
    2.1  Reading a text file

1. Create a file locally containing three columns (id, name, age) separated by spaces, then upload it to HDFS.

vim /opt/package/person

1 zhangsan 20

2 lisi 29

3 wangwu 25

4 zhaoliu 30

5 tianqi 35

6 kobe 40

Upload the data file to HDFS (I am using local test mode here):

hadoop fs -put /opt/person.txt    /

2. Run the following commands in spark-shell to read the data and split each line into columns on the space delimiter.

Start spark-shell:

 /export/servers/spark-2.2.0-bin-2.6.0-cdh5.14.0/bin/spark-shell

Create the RDD:

val lineRDD = sc.textFile("hdfs://node01:8020/person.txt").map(_.split(" "))

Or read from the local filesystem:

val lineData2 = sc.textFile("file:///opt/package/person.txt").map(_.split(" "))

// RDD[Array[String]]

3. Define a case class (corresponding to the table's schema)

case class Person(id:Int, name:String, age:Int)

 

4. Associate the RDD with the case class

val personRDD = lineRDD.map(x => Person(x(0).toInt, x(1), x(2).toInt)) // RDD[Person]
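The parse-and-wrap logic used in that map can be checked in plain Scala, without Spark (the `parse` helper below is just for illustration):

```scala
// Same Person schema as defined in step 3
case class Person(id: Int, name: String, age: Int)

// The function applied to each split line by lineRDD.map(...)
def parse(fields: Array[String]): Person =
  Person(fields(0).toInt, fields(1), fields(2).toInt)

val p = parse("1 zhangsan 20".split(" "))
// p == Person(1, "zhangsan", 20)
```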

 

5. Convert the RDD into a DataFrame

val personDF = personRDD.toDF // DataFrame

(toDF works here because spark-shell pre-imports spark.implicits._; in a standalone application you must import it yourself.)

6. Review the data and schema

personDF.show

+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 20|
|  2|    lisi| 29|
|  3|  wangwu| 25|
|  4| zhaoliu| 30|
|  5|  tianqi| 35|
|  6|    kobe| 40|
+---+--------+---+

personDF.printSchema

 

7. Register a temporary view

personDF.createOrReplaceTempView("t_person")

(The view is scoped to the current SparkSession.)

 

8. Execute SQL

spark.sql("select id,name from t_person where id > 3").show
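To show what that query computes, here is the same filter mirrored on a plain Scala collection holding the six sample rows (no Spark required; "lisi" stands in for the garbled name in the sample file):

```scala
case class Person(id: Int, name: String, age: Int)

val people = Seq(
  Person(1, "zhangsan", 20), Person(2, "lisi", 29), Person(3, "wangwu", 25),
  Person(4, "zhaoliu", 30), Person(5, "tianqi", 35), Person(6, "kobe", 40)
)

// Equivalent of: select id, name from t_person where id > 3
val result = people.filter(_.id > 3).map(p => (p.id, p.name))
// result: Seq((4, "zhaoliu"), (5, "tianqi"), (6, "kobe"))
```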

 

9. A DataFrame can also be constructed directly through SparkSession

val dataFrame=spark.read.text("hdfs://node01:8020/person.txt") 

dataFrame.show  // Note: reading a plain text file directly does not give a complete schema; the result has a single string column named value

dataFrame.printSchema

 




Origin blog.csdn.net/bbvjx1314/article/details/105410286