Big Data: Advanced Spark SQL

JDBC connection

Operate a relational database over JDBC and load its tables into Spark for analysis and processing.

Start spark-shell (loading the MySQL JDBC driver):
spark-shell --master spark://hadoop1:7077 --jars /root/temp/mysql-connector-java-8.0.13.jar --driver-class-path /root/temp/mysql-connector-java-8.0.13.jar

1. Connect to MySQL on Windows using option()

val mysql = spark.read.format("jdbc").option("url","jdbc:mysql://192.168.138.1:3306/data?serverTimezone=GMT%2B8").option("user","destiny").option("password","destiny").option("dbtable","log").load
mysql.show
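The `serverTimezone=GMT%2B8` query parameter in the URL is just the URL-encoded form of `GMT+8` (a bare `+` is not safe in a query string). A minimal plain-Scala sketch, no Spark required, of how such a URL can be assembled:

```scala
import java.net.URLEncoder

// URL-encode the timezone: "+" becomes "%2B"
val tz = URLEncoder.encode("GMT+8", "UTF-8")
// Assemble the JDBC URL used above (host and database are the values from this article)
val url = s"jdbc:mysql://192.168.138.1:3306/data?serverTimezone=$tz"
```

Encoding the value programmatically avoids hand-writing percent escapes into the connection string.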


2. Connect to MySQL on Windows using the Properties class

import java.util.Properties
val property = new Properties()
property.setProperty("user","destiny")
property.setProperty("password","destiny")
val mysql = spark.read.jdbc("jdbc:mysql://192.168.138.1:3306/data?serverTimezone=GMT%2B8","log",property)
mysql.show
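The `java.util.Properties` object here is an ordinary key-value store that the JDBC data source reads its connection options from; the same object works for both `spark.read.jdbc` and `DataFrame.write.jdbc`. A minimal sketch of its behavior:

```scala
import java.util.Properties

// Connection options for the JDBC data source
val property = new Properties()
property.setProperty("user", "destiny")
property.setProperty("password", "destiny")
// getProperty returns null for keys that were never set; an omitted
// "driver" entry means the driver class is inferred from the URL
val driver = property.getProperty("driver")
```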


Use Hive

(1) Put hdfs-site.xml, core-site.xml and hive-site.xml files into Spark's conf folder
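For spark-shell to reach the metastore started in step (3) below, the hive-site.xml copied here usually points at it. A sketch of the relevant property, assuming the metastore runs on hadoop1 on the default port 9083 (adjust to your cluster):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
  </property>
</configuration>
```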

(2) Start Zookeeper and Hadoop

zkServer.sh start
start-all.sh

(3) Start the Hive metastore service on the Master

hive --service metastore

(4) Start the Hive client on the Slave

hive
# Show the current database name (e.g. default) in the CLI prompt
set hive.cli.print.current.db=true;

(5) Start Spark on Slave

./sbin/start-all.sh
spark-shell --master spark://hadoop2:7077

(6) Operate the Hive database from spark-shell (create a student table and load data)

spark.sql("create table default.student(studentID String,studentName String) row format delimited fields terminated by '\t'")
spark.sql("load data local inpath '/root/temp/studentSheet.txt' overwrite into table default.student")


Creating DataFrames in IDEA

1. Create with a schema

package Spark

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object DataFrame {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)

    //Create the SparkSession object
    val spark = SparkSession.builder().appName("DataFrame").master("local").getOrCreate()
    //Create an RDD from the tab-delimited file
    val sparkRDD = spark.sparkContext.textFile("F:\\IdeaProjects\\in\\studentSheet.txt").map(_.split("\t"))
    //Convert to an RDD of Rows
    val rowRDD = sparkRDD.map(x => Row(x(0),x(1)))
    //Define the schema
    val schema = StructType(
      List(StructField("id",StringType),StructField("name",StringType))
    )
    //Create the DataFrame
    val df = spark.createDataFrame(rowRDD,schema)
    //Register a temporary view
    df.createOrReplaceTempView("student")
    //Query the student table
    spark.sql("select * from student").show()
    //Stop the session
    spark.stop()
  }
}

2. Create with a case class

package Spark

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

case class Student(id: String, name: String)

object DataFrame {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    //Create the SparkSession object
    val spark = SparkSession.builder().appName("DataFrame").master("local").getOrCreate()
    //Create an RDD from the tab-delimited file
    val sparkRDD = spark.sparkContext.textFile("F:\\IdeaProjects\\in\\studentSheet.txt").map(x => x.split("\t"))
    //Map each line to a Student
    val studentRDD = sparkRDD.map(x => Student(x(0),x(1)))
    //Create the DataFrame via the implicit toDF conversion
    import spark.implicits._
    val df = studentRDD.toDF
    //Register a temporary view
    df.createOrReplaceTempView("student")
    //Query the student table
    spark.sql("select * from student").show()
    //Stop the session
    spark.stop()
  }
}
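The split-and-map step in both programs can be exercised without Spark. A minimal sketch (the `parse` helper and the `Student` name here are illustrative, not part of the original code):

```scala
case class Student(id: String, name: String)

// Split one tab-delimited line into a Student,
// mirroring sparkRDD.map(x => stu(x(0), x(1))) above
def parse(line: String): Student = {
  val fields = line.split("\t")
  Student(fields(0), fields(1))
}

val s = parse("001\tTom")
```

Testing the parsing logic in isolation like this catches malformed input lines before they surface as task failures inside a Spark job.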


Exporting Spark results to MySQL from IDEA

package Spark

import java.util.Properties

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SparkMySQL {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)

    //Create the SparkSession object
    val spark = SparkSession.builder().appName("SparkMySQL").master("local").getOrCreate()
    //Create an RDD from the tab-delimited file
    val sparkRDD = spark.sparkContext.textFile("F:\\IdeaProjects\\in\\studentSheet.txt").map(_.split("\t"))
    //Convert to an RDD of Rows
    val rowRDD = sparkRDD.map(x => Row(x(0),x(1)))
    //Define the schema
    val schema = StructType(
      List(StructField("id",StringType),StructField("name",StringType))
    )
    //Create the DataFrame
    val df = spark.createDataFrame(rowRDD,schema)
    //Register a temporary view
    df.createOrReplaceTempView("student")
    //Query the student table
    val result = spark.sql("select * from student")
    result.show()

    //Write the Spark result into MySQL
    val property = new Properties()
    property.setProperty("user","root")
    property.setProperty("password","root")
    property.setProperty("driver","com.mysql.cj.jdbc.Driver")
    result.write.mode("overwrite").jdbc("jdbc:mysql://localhost:3306/data?serverTimezone=GMT%2B8","student",property)
    //Stop the session
    spark.stop()
  }
}


Querying Hive with Spark in IDEA and saving the result to MySQL

Scala code
package Spark

import java.util.Properties

import org.apache.spark.sql.SparkSession

object SparkHive {
  def main(args: Array[String]): Unit = {
    //Create the SparkSession object with Hive support enabled
    val spark = SparkSession.builder().appName("SparkHive").enableHiveSupport().getOrCreate()
    //Run the Spark SQL query against the Hive table
    val result = spark.sql("select id,name from default.test")
    //Create the Properties object
    val property = new Properties()
    //Configure the connection properties
    property.setProperty("user","destiny")
    property.setProperty("password","destiny")
    result.write.mode("append").jdbc("jdbc:mysql://192.168.138.1:3306/data?serverTimezone=GMT%2B8","stu",property)
    //Stop the session
    spark.stop()
  }
}

In IDEA, open File -> Project Structure -> Artifacts to define the jar artifact, then run Build -> Build Artifacts -> Rebuild to package it, and upload the jar to the Linux server.

spark-submit command
spark-submit --master spark://hadoop2:7077 --jars /root/temp/mysql-connector-java-8.0.13.jar --driver-class-path /root/temp/mysql-connector-java-8.0.13.jar --class Spark.SparkHive /root/temp/SparkSQL.jar




Origin blog.csdn.net/JavaDestiny/article/details/96895353