Spark SQL basics [hands-on]

1. The data set

1 张三 26
2 李四 31
3 王五 22
4 赵柳 19
5 James 35

2. The code

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    // Configure and start Spark
    val conf = new SparkConf().setMaster("local[*]").setAppName("sparkSql")
    val sc = new SparkContext(conf)
    val session = SparkSession.builder().config(conf).getOrCreate()

    // Read the file
    val lines = sc.textFile("hdfs://192.168.xx.xx:9000/test/user.txt")

    // Parse each line into a Row(id, name, age)
    val row = lines.map(line => {
      val strings = line.split(" ")
      val id = strings(0).toInt
      val name = strings(1)
      val age = strings(2).toInt

      Row(id, name, age)
    })

    // Schema (column names, types, nullability)
    val sch = StructType(List(
      StructField("id", IntegerType, true),
      StructField("name", StringType, false),
      StructField("age", IntegerType, false)))

    // Assemble the DataFrame and query it with SQL
    val frame = session.createDataFrame(row, sch)
    frame.createOrReplaceTempView("table01")
    session.sql("select sum(age) from table01").show()

    sc.stop()
  }
}
+--------+
|sum(age)|
+--------+
|     133|
+--------+
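The same aggregation can also be written with the DataFrame API instead of a SQL string. A minimal sketch, using the sample rows above built in memory rather than read from HDFS (object and method names here are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object SparkSqlDslDemo {
  // Returns the total age of the in-memory sample rows
  def totalAge(session: SparkSession): Long = {
    import session.implicits._
    // Same sample rows as above, built in memory instead of read from HDFS
    val frame = Seq((1, "张三", 26), (2, "李四", 31), (3, "王五", 22),
                    (4, "赵柳", 19), (5, "James", 35)).toDF("id", "name", "age")
    // agg(sum("age")) is equivalent to "select sum(age) from table01"
    frame.agg(sum("age")).first().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[*]").appName("sparkSqlDsl").getOrCreate()
    println(totalAge(session)) // 133 for the sample data
    session.stop()
  }
}
```

Both styles compile to the same plan; the DataFrame API simply catches column-name typos earlier than a raw SQL string.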

Description:

1. Spark running locally has three modes, configured through setMaster():

(1) local mode: a single local thread;

(2) local[K] mode: K local threads;

(3) local[*] mode: as many local threads as there are cores.
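As an illustration of the three variants (the app name "demo" is a placeholder), only the master string passed to setMaster() differs:

```scala
import org.apache.spark.SparkConf

object LocalModes {
  // Only the master string differs between the three local variants
  val single  = new SparkConf().setMaster("local").setAppName("demo")    // 1 thread
  val fixed   = new SparkConf().setMaster("local[4]").setAppName("demo") // 4 threads
  val allCpus = new SparkConf().setMaster("local[*]").setAppName("demo") // 1 thread per core
}
```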

2. A Spark program starts from SparkContext, which we instantiate with a configured conf:

(1) setMaster(): set the run mode

(2) setAppName(): set a name for the Spark application

3. SparkSession

SparkSession is the entry point of Spark SQL, so a SparkSession object must be created.

builder() provides a constructor for SparkSession to which various configurations can be added:

Method             Description
getOrCreate        Get an existing SparkSession or create a new one
enableHiveSupport  Enable Hive support
appName            Set the application name
config             Set various configuration options
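Putting the builder methods from the table together in one sketch (the app name and config key here are illustrative, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object BuilderDemo {
  def build(): SparkSession =
    SparkSession.builder()
      .appName("sparkSqlDemo")                     // appName: set the application name
      .master("local[*]")                          // run locally for this sketch
      .config("spark.sql.shuffle.partitions", "4") // config: an example setting
      // .enableHiveSupport()                      // only with a Hive installation available
      .getOrCreate()                               // getOrCreate: reuse or create the session

  def main(args: Array[String]): Unit = {
    val session = build()
    println(session.sparkContext.appName)
    session.stop()
  }
}
```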

4. Create a DataFrame

A DataFrame is created via createDataFrame().
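Besides createDataFrame(rdd, schema) as used above, a DataFrame can also be built from a case class with toDF(), letting Spark infer the schema. A minimal sketch with two of the sample rows (the User class is introduced here for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// A case class lets Spark infer column names and types automatically
case class User(id: Int, name: String, age: Int)

object CreateDfDemo {
  def build(session: SparkSession): DataFrame = {
    import session.implicits._
    // toDF() on a Seq of case-class instances replaces Row + StructType
    Seq(User(1, "张三", 26), User(5, "James", 35)).toDF()
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[*]").appName("createDf").getOrCreate()
    build(session).printSchema()
    session.stop()
  }
}
```

This trades the explicit nullability control of StructType for much shorter code.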


Origin blog.csdn.net/BD_fuhong/article/details/93194291