First, the data set
1 张三 26
2 李四 31
3 王五 22
4 赵柳 19
5 James 35
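Each record is `id name age`, separated by spaces. The parsing step used in the code below can be sketched in plain Scala (`parseLine` is a helper name chosen here for illustration):

```scala
// Parse one whitespace-separated record into (id, name, age).
def parseLine(line: String): (Int, String, Int) = {
  val fields = line.split(" ")
  (fields(0).toInt, fields(1), fields(2).toInt)
}

val sample = parseLine("5 James 35")
// sample == (5, "James", 35)
```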
Second, the code
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    // Configuration
    val conf = new SparkConf().setMaster("local[*]").setAppName("sparkSql")
    val sc = new SparkContext(conf)
    val session = SparkSession.builder().config(conf).getOrCreate()
    // Read the file
    val lines = sc.textFile("hdfs://192.168.xx.xx:9000/test/user.txt")
    // Parse each line into a Row(id, name, age)
    val rows = lines.map(line => {
      val fields = line.split(" ")
      val id = fields(0).toInt
      val name = fields(1)
      val age = fields(2).toInt
      Row(id, name, age)
    })
    // Schema
    val sch = StructType(List(
      StructField("id", IntegerType, true),
      StructField("name", StringType, false),
      StructField("age", IntegerType, false)))
    // Assemble the DataFrame and register it as a temp view
    val frame = session.createDataFrame(rows, sch)
    frame.createOrReplaceTempView("table01")
    session.sql("select sum(age) from table01").show()
    sc.stop()
  }
}
Third, the output
+--------+
|sum(age)|
+--------+
| 133|
+--------+
Description:
1. Spark has three local run modes, configured through setMaster():
(1) local: run locally with a single thread;
(2) local[K]: run locally with K threads;
(3) local[*]: run locally with as many threads as there are CPU cores.
2. A Spark program starts from a SparkContext, which must be instantiated with a configured SparkConf:
(1) setMaster(): sets the run mode;
(2) setAppName(): sets the application name.
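For example, the same application can be switched between the three local modes just by changing the master string (a sketch; the app name is a placeholder):

```scala
import org.apache.spark.SparkConf

// Three ways to set the local run mode:
val confSingle = new SparkConf().setMaster("local").setAppName("demo")    // 1 thread
val confFour   = new SparkConf().setMaster("local[4]").setAppName("demo") // 4 threads
val confAll    = new SparkConf().setMaster("local[*]").setAppName("demo") // one thread per core
```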
3. SparkSession
SparkSession is the entry point to Spark SQL, so a SparkSession object must be created.
builder() returns a builder for SparkSession, to which various configurations can be added:
| Method | Description |
| --- | --- |
| getOrCreate | Get an existing SparkSession or create a new one |
| enableHiveSupport | Enable Hive support |
| appName | Set the application name |
| config | Set various configuration options |
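The builder methods listed above can be chained; a sketch (the config key shown is just one example of a Spark SQL setting, and enableHiveSupport would additionally require Hive libraries on the classpath):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sparkSql")                          // application name
  .master("local[*]")                           // run mode
  .config("spark.sql.shuffle.partitions", "4")  // any configuration setting
  .getOrCreate()                                // reuse or create the session
```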
4. Create a DataFrame
The DataFrame is created via createDataFrame(), passing in the RDD of Rows and the schema.
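Besides the RDD-of-Row approach used in the code above, a DataFrame can also be built from a local Seq of case-class instances with toDF, letting Spark infer the schema (a sketch; `User` and the view name are chosen here for illustration):

```scala
import org.apache.spark.sql.SparkSession

case class User(id: Int, name: String, age: Int)

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// Schema (id: Int, name: String, age: Int) is inferred from the case class.
val df = Seq(User(1, "张三", 26), User(5, "James", 35)).toDF()
df.createOrReplaceTempView("users")
spark.sql("select sum(age) from users").show() // sum(age) = 26 + 35 = 61
```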