Spark: manipulating a MySQL data source directly

If our MySQL server is short on compute but we still need to run all kinds of complex aggregation operations against it, what can we do? The answer is to borrow Spark's computing power: we can load the MySQL data into Spark as a data source.

Read

val mysqlDF = spark
  .read
  .format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("url", "jdbc:mysql://localhost:3306/ttable")
  .option("user", "root")
  .option("password", "root")
  // query only the rows we want, via a derived table
  .option("dbtable", "(select * from ttt where userId > 1 AND userId < 10) as log")
  // or read the whole table:
  //.option("dbtable", "ttable.ttt")
  .option("fetchsize", "100")
  .option("useSSL", "false")
  .load()

Partitioned reading

spark
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "ttt")
  .option("user", user)
  .option("password", password)
  .option("numPartitions", 10)
  .option("partitionColumn", "userId")
  .option("lowerBound", 1)
  .option("upperBound", 10000)
  .load()

The generated queries look roughly like the following: each partition issues its own range query, and together the partitions cover the entire table. Note that lowerBound and upperBound only control how the range is split; they do not filter any rows out.

SELECT * FROM ttt WHERE userId >= 1 and userId < 1000
SELECT * FROM ttt WHERE userId >= 1000 and userId < 2000
SELECT * FROM ttt WHERE userId >= 2000 and userId < 3000
...
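The way Spark derives these per-partition WHERE clauses from lowerBound, upperBound, and numPartitions can be sketched in plain Scala. This is a simplified model of what Spark's internal JDBC partitioning does, not the actual implementation, which handles a few more edge cases:

```scala
object JdbcPartitioner {
  // Build one WHERE clause per partition over [lowerBound, upperBound].
  // The first partition also picks up NULLs; the last one is open-ended,
  // so no row of the table is ever missed.
  def partitionWhereClauses(column: String,
                            lowerBound: Long,
                            upperBound: Long,
                            numPartitions: Int): Seq[String] = {
    require(numPartitions > 1, "need at least two partitions to split")
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lowerBound + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi or $column is null"
      else if (i == numPartitions - 1) s"$column >= $lo"
      else s"$column >= $lo AND $column < $hi"
    }
  }
}
```

With the options from the example above (lowerBound = 1, upperBound = 10000, numPartitions = 10) this yields a stride of 1000, so the first partition scans `userId < 1001 or userId is null` and the last scans `userId >= 9001`.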

Write

mysqlDF.createTempView("log")

spark
  .sql("select * from log")
  .write
  .mode(SaveMode.Overwrite)
  .format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("url", "jdbc:mysql://localhost:3306/ttable")
  .option("dbtable", "a")
  .option("user", "root")
  .option("password", "root")
  // fetchsize only applies to reads; writes are batched via batchsize
  .option("batchsize", "100")
  .option("useSSL", "false")
  .save()
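Under the hood, each partition's writer inserts rows through a JDBC PreparedStatement and flushes them with executeBatch() every `batchsize` rows. The flushing arithmetic alone can be sketched in plain Scala (a hypothetical helper for illustration, not part of Spark's API):

```scala
object JdbcBatchWriter {
  // Count how many executeBatch() round-trips a partition of `numRows`
  // rows triggers when the buffer is flushed every `batchSize` rows,
  // plus one final flush for any leftover partial batch.
  def countBatchFlushes(numRows: Int, batchSize: Int): Int = {
    require(batchSize > 0, "batchsize must be positive")
    var pending = 0
    var flushes = 0
    for (_ <- 0 until numRows) {
      pending += 1
      if (pending == batchSize) { // buffer full: flush to MySQL
        flushes += 1
        pending = 0
      }
    }
    if (pending > 0) flushes += 1 // flush the trailing partial batch
    flushes
  }
}
```

So with `batchsize = 100`, a partition of 250 rows costs 3 round-trips instead of 250 single-row INSERTs, which is the main reason to tune this option upward for bulk loads.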

Origin blog.csdn.net/weixin_34356555/article/details/90840083