Using the Spark SQL API
Implementation steps:
1) Open the Scala IDE development environment and create a Scala project
2) Add the Spark dependency jar packages to the project
3) Create the package path and an object class under it
4) Write the code
Sample code:
package cn.tedu.sparksql
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
object Demo01 {
def main(args: Array[String]): Unit = {
val conf=new SparkConf().setMaster("spark://hadoop01:7077").setAppName("sqlDemo01");
val sc=new SparkContext(conf)
val sqlContext=new SQLContext(sc)
val rdd=sc.makeRDD(List((1,"zhang"),(2,"li"),(3,"wang")))
import sqlContext.implicits._
val df=rdd.toDF("id","name")
df.registerTempTable("tabx")
val df2=sqlContext.sql("select * from tabx order by name");
val rdd2=df2.toJavaRDD
// Save the result to the Linux local file system; it could also be written to HDFS
rdd2.saveAsTextFile("file:///home/software/result");
}
}
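To illustrate what the "order by name" query produces for the sample data, here is a minimal plain-Scala sketch (no Spark required) that sorts the same three rows by the name column; the object name SortSketch is made up for this example.

```scala
// Illustrative only: mimics the effect of "select * from tabx order by name"
// on the sample rows, using ordinary Scala collections instead of Spark.
object SortSketch {
  def main(args: Array[String]): Unit = {
    val rows = List((1, "zhang"), (2, "li"), (3, "wang"))
    // Sort by the second tuple element (the name), like ORDER BY name
    val sorted = rows.sortBy(_._2)
    sorted.foreach(println) // (2,li) (3,wang) (1,zhang)
  }
}
```

The saved result files would contain the rows in this order: li, wang, zhang.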
5) Build the jar package and upload it to the Linux virtual machine
6) In Spark's bin directory, execute:
sh spark-submit --class cn.tedu.sparksql.Demo01 ./sqlDemo01.jar
7) Finally, test the result