Spark教程——（4）Spark-shell调用SQLContext（HiveContext）

启动Spark-shell：

[root@node1 ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1554951897984_0111).
SQL context available as sqlContext.

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@272485a6

scala> sqlContext
res1: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@11c95035

上下文已经包含 sc 和 sqlContext：

Spark context available as sc (master = yarn-client, app id = application_1554951897984_0111).
SQL context available as sqlContext.

本地创建people07041119.json

{"name":"zhangsan","job number":"101","age":33,"gender":"male","deptno":1,"sal":18000}
{"name":"lisi","job number":"102","age":30,"gender":"male","deptno":2,"sal":20000}
{"name":"wangwu","job number":"103","age":35,"gender":"female","deptno":3,"sal":50000}
{"name":"zhaoliu","job number":"104","age":31,"gender":"male","deptno":1,"sal":28000}
{"name":"tianqi","job number":"105","age":36,"gender":"female","deptno":3,"sal":90000}

本地创建dept.json

{"name":"development","deptno":1}
{"name":"personnel","deptno":2}
{"name":"testing","deptno":3}

将本地文件上传到HDFS上：

bash-4.2$ hadoop dfs -put /home/**/data/people07041119.json /user/**
bash-4.2$ hadoop dfs -put /home/**/data/dept.json /user/**

结果如下：

执行Scala脚本，加载文件：

scala> val people=sqlContext.jsonFile("/user/**/people07041119.json")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [age: bigint, deptno: bigint, gender: string, job number: string, name: string, sal: bigint]

scala> val dept=sqlContext.jsonFile("/user/**/dept.json")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [deptno: bigint, name: string]

执行Scala脚本，查看文件内容：

scala> people.show
+---+------+------+----------+--------+-----+
|age|deptno|gender|job number|    name|  sal|
+---+------+------+----------+--------+-----+
| 33|     1|  male|       101|zhangsan|18000|
| 30|     2|  male|       102|    lisi|20000|
| 35|     3|female|       103|  wangwu|50000|
| 31|     1|  male|       104| zhaoliu|28000|
| 36|     3|female|       105|  tianqi|90000|
+---+------+------+----------+--------+-----+

显示前三条记录：

scala> people.show(3)
+---+------+------+----------+--------+-----+
|age|deptno|gender|job number|    name|  sal|
+---+------+------+----------+--------+-----+
| 33|     1|  male|       101|zhangsan|18000|
| 30|     2|  male|       102|    lisi|20000|
| 35|     3|female|       103|  wangwu|50000|
+---+------+------+----------+--------+-----+
only showing top 3 rows

查看列信息：

scala>  people.columns
res5: Array[String] = Array(age, deptno, gender, job number, name, sal)

添加过滤条件：

scala>  people.filter("gender='male'").count
res6: Long = 3

参考：

https://blog.csdn.net/xiaolong_4_2/article/details/80886371

Spark教程——（4）Spark-shell调用SQLContext（HiveContext）

猜你喜欢