Spark integration with external Hive (reprint)

The task is the following (the program is packaged as a jar and run on the cluster):
(1) Write a Spark program that creates a table in Hive and imports data into it
(2) Query the data in Hive
(3) Save the query results to MySQL

The code:

import java.util.Properties

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode, SparkSession}

object SparkSqlTest {
    def main(args: Array[String]): Unit = {
        // Suppress noisy logging
        Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN)
        Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
        Logger.getLogger("org.project-spark").setLevel(Level.WARN)
        // Build the entry point
        val conf: SparkConf = new SparkConf()
        conf.setAppName("SparkSqlTest")
        val spark: SparkSession = SparkSession.builder().config(conf)
            .enableHiveSupport() // enable Hive support
            .getOrCreate()
        // Create the SQLContext object
        val sqlContext: SQLContext = spark.sqlContext
        // Create the SparkContext
        val sc: SparkContext = spark.sparkContext
        // Create the database
        var sql =
            """
              |create database if not exists `test`
            """.stripMargin
        spark.sql(sql)
        // Use the database just created
        sql =
            """
              |use `test`
            """.stripMargin
        spark.sql(sql)
        // Create the Hive table
        sql =
            """
              |create table if not exists `test`.`teacher_basic`(
              |name string,
              |age int,
              |married boolean,
              |children int
              |) row format delimited
              |fields terminated by ','
            """.stripMargin
        spark.sql(sql)
        // Load the data
        sql =
            """
              |load data local inpath 'file:///home/hadoop/teacher_info.txt'
              |into table `test`.`teacher_basic`
            """.stripMargin
        spark.sql(sql)
        // Run the query
        sql =
            """
              |select * from `test`.`teacher_basic`
            """.stripMargin
        val hiveDF = spark.sql(sql)
        // Write the query result to MySQL over JDBC
        val url = "jdbc:mysql://localhost:3306/test"
        val table_name = "teacher_basic"
        val pro = new Properties()
        pro.put("password", "123456")
        pro.put("user", "root")
        hiveDF.write.mode(SaveMode.Append).jdbc(url, table_name, pro)
    }
}

How to package the jar and run it on the cluster: https://blog.51cto.com/14048416/2337760
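The original post does not show the build file, but the program needs the Spark SQL, Spark Hive, and MySQL JDBC dependencies on the classpath. A minimal build.sbt sketch, assuming sbt and a Spark 2.x cluster (the version numbers here are assumptions and must match your cluster):

name := "SparkSqlTest"
version := "1.0-SNAPSHOT"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster at run time
  "org.apache.spark" %% "spark-sql"  % "2.2.0" % "provided",
  // spark-hive is required for enableHiveSupport()
  "org.apache.spark" %% "spark-hive" % "2.2.0" % "provided",
  // JDBC driver used by the write to MySQL
  "mysql" % "mysql-connector-java" % "5.1.47"
)

Since spark-submit is given a single jar below, the MySQL driver either has to be packed into a fat jar (for example with sbt-assembly) or passed separately with --jars.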

Job submission shell:

spark-submit \
--class com.zy.sql.SparkSqlTest \
--master yarn \
--deploy-mode cluster \
--driver-memory 512M \
--executor-memory 512M \
--total-executor-cores 1 \
file:////home/hadoop/SparkSqlTest-1.0-SNAPSHOT.jar
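Note: --total-executor-cores applies to Spark standalone and Mesos; with --master yarn the executor count is controlled by --num-executors (together with --executor-cores), so that flag is most likely ignored here.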

Then I waited expectantly for success, but unfortunately the program aborted partway through.
I looked at the log output:

(screenshot of the error log)

I searched around a lot online; some said the Hive version was too high, but that was not my problem. Then it occurred to me: the program running on the cluster uses Spark to operate on Hive tables, so Spark needs to be integrated with Hive. I looked up how to integrate Spark with Hive; in short, Hive's metastore has to be shared out so that Spark can access it.
The specific steps:
① Add the following to Hive's hive-site.xml:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop01:9083</value> <!-- the node on which the metastore process will be started -->
</property>

② On the node configured in hive-site.xml, start the metastore process:

nohup hive --service metastore 1>/home/hadoop/logs/hive_thriftserver.log 2>&1 &

PS: note that nohup starts the process in the background and redirects all of its output to the log file given in the command, so be sure to verify that the command really succeeded.
Use jps to check whether the corresponding process exists after starting (the Hive metastore typically shows up as a RunJar process). If it is not there, the start failed; most likely the parent directory /home/hadoop/logs does not exist. Create that directory, start again, and check again whether it succeeded!
③ Copy hive-site.xml to $SPARK_HOME/conf (note: this must be done on every node).
④ Test whether it worked: run spark-sql; if it starts normally and you can access the Hive tables, the Spark-Hive integration is successful!
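Once hive-site.xml is visible to Spark, the same check can be done from code. A minimal sketch (not from the original post); it assumes the metastore from step ① is listening on hadoop01:9083 and shows that the URI can also be passed explicitly instead of copying hive-site.xml:

import org.apache.spark.sql.SparkSession

object HiveIntegrationCheck {
    def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
            .appName("HiveIntegrationCheck")
            // Alternative to copying hive-site.xml: point Spark at the metastore directly
            // (assumption: the metastore from step ① is listening on hadoop01:9083)
            .config("hive.metastore.uris", "thrift://hadoop01:9083")
            .enableHiveSupport()
            .getOrCreate()

        // If the integration works, these list the databases and tables known to the
        // Hive metastore, including the `test` database created by SparkSqlTest above.
        spark.sql("show databases").show()
        spark.sql("show tables in test").show()

        spark.stop()
    }
}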


Then I re-ran the original program. No errors this time: the program ran successfully!
I could hardly believe it, so I checked the MySQL table:

(screenshot of the teacher_basic table in MySQL)

which confirmed that the program succeeded!
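The same check can also be scripted from Spark by reading the MySQL table back over JDBC. A minimal sketch (not from the original post), reusing the connection settings from the program above:

import java.util.Properties
import org.apache.spark.sql.SparkSession

object MysqlResultCheck {
    def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("MysqlResultCheck").getOrCreate()

        val pro = new Properties()
        pro.put("user", "root")
        pro.put("password", "123456")

        // Read back the table written by SparkSqlTest and show a few rows.
        val df = spark.read.jdbc("jdbc:mysql://localhost:3306/test", "teacher_basic", pro)
        df.show()
        println(s"rows in teacher_basic: ${df.count()}")

        spark.stop()
    }
}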

 

Original Path: https://blog.51cto.com/14048416/2339270

Origin www.cnblogs.com/zhnagqi-dream/p/11791678.html