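Integrating Spark with Hive and MySQL

I. Integrating Spark with Hive

1. Copy Hive's configuration file into Spark's configuration directory; either a symbolic link or a real copy works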

ln -s /opt/software/hadoop/hive110/conf/hive-site.xml /opt/software/hadoop/spark244/conf/hive-site.xml

2. Copy the MySQL connector JAR into Spark's jars directory

cp /opt/software/hadoop/hive110/lib/mysql-connector-java-5.1.32.jar /opt/software/hadoop/spark244/jars/

3. Start spark-shell

spark-shell --jars /opt/software/hadoop/spark244/jars/mysql-connector-java-5.1.32.jar

4. Create a table in Hive (omitted)

5. Insert data from Spark SQL (omitted); here we simply list the databases as a demonstration

scala> spark.sql("show databases").show()

6. Query the data in Hive; the changes made from Spark are visible there
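
Steps 4 to 6 can be sketched as one round trip driven from Spark. This is a minimal sketch, not the omitted original code: the table name demo_people and its columns are hypothetical, and it assumes a Spark build with Hive support (the spark-hive module on the classpath).

```scala
import org.apache.spark.sql.SparkSession

object HiveRoundTrip {
  // Hypothetical table name, used only for illustration.
  val Table = "demo_people"

  /** Creates the table if needed, appends two rows, and returns the row count. */
  def insertAndCount(spark: SparkSession): Long = {
    // With Hive support enabled, a plain CREATE TABLE defines a Hive table.
    spark.sql(s"CREATE TABLE IF NOT EXISTS $Table (id INT, name STRING)")
    spark.sql(s"INSERT INTO $Table VALUES (1, 'alice'), (2, 'bob')")
    spark.sql(s"SELECT * FROM $Table").show()
    spark.table(Table).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("HiveRoundTrip")
      .enableHiveSupport()
      .getOrCreate()
    println(s"rows in $Table: ${insertAndCount(spark)}")
    spark.stop()
  }
}
```

Inside spark-shell the session already exists as spark, so the three SQL statements can be run directly. Running select * from demo_people; in Hive afterwards should return the same rows.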

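7. Integration in IDEA

Search the Maven repository for Spark-Hive, choose the first result, Spark Project Hive » [2.4.4], and pick the artifact that matches your Scala version (spark-hive_2.11 here)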

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.4</version>
    </dependency>
    <!-- mysql-connector-java -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.31</version>
    </dependency>

8. Copy hive110/conf/hive-site.xml into the project's resources directory

In the first property, prefix the Hive warehouse path with the HDFS address and port, hdfs://192.168.221.140:9000

<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://192.168.221.140:9000/opt/software/hadoop/hive110/warehouse</value>
</property>
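
Spark only picks up the copied file if it ends up on the application classpath. A small check like the one below can confirm that before the session is built. It is a sketch that uses only the JDK class loader; the object and method names are hypothetical.

```scala
object HiveSiteCheck {
  /** Looks a resource up on the classpath; None means the application cannot see it. */
  def findOnClasspath(name: String): Option[java.net.URL] =
    Option(getClass.getClassLoader.getResource(name))

  def main(args: Array[String]): Unit =
    findOnClasspath("hive-site.xml") match {
      case Some(url) => println(s"hive-site.xml found at $url")
      case None      => println("hive-site.xml is not on the classpath; check the resources directory")
    }
}
```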

9. Create the Hive account in MySQL and grant it privileges

Run the following commands in MySQL:

grant all on *.* to 'root'@'%' identified by 'kb10';
grant all on *.* to 'root'@'localhost' identified by 'kb10';
flush privileges;
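
Before running the Spark job, it can help to confirm that the account really accepts remote logins. The sketch below uses only the standard java.sql.DriverManager API with the host and credentials from this article. The object and method names are hypothetical, and it assumes the MySQL connector JAR is on the classpath.

```scala
import java.sql.DriverManager
import scala.util.{Failure, Success, Try}

object MysqlLoginCheck {
  /** Opens and closes one JDBC connection; Right(server version) on success, Left(error message) otherwise. */
  def tryConnect(url: String, user: String, password: String): Either[String, String] =
    Try {
      val conn = DriverManager.getConnection(url, user, password)
      try conn.getMetaData.getDatabaseProductVersion
      finally conn.close()
    } match {
      case Success(version) => Right(version)
      case Failure(e)       => Left(String.valueOf(e.getMessage))
    }

  def main(args: Array[String]): Unit =
    tryConnect("jdbc:mysql://192.168.221.140:3306/", "root", "kb10") match {
      case Right(v)  => println(s"login OK, server version $v")
      case Left(err) => println(s"login failed: $err")
    }
}
```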

10. With the following code in IDEA, the connection succeeds

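import org.apache.spark.sql.SparkSession

object HiveSpark {
  def main(args: Array[String]): Unit = {
    // Local session with Hive support; hive-site.xml is read from the classpath
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()
    spark.sql("show databases").show()
  }
}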

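After completing the steps above, running beeline -u jdbc:hive2://192.168.221.140:10000 back on the virtual machine launches the beeline bundled with Spark, so it fails to start. In that case, go into the hive/bin directory and run Hive's own beeline script from there with bash.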

II. Integrating Spark with MySQL

import org.apache.spark.sql.SparkSession

object ConnectSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()
    // The last path segment is the database name
    val url = "jdbc:mysql://192.168.221.140:3306/exam"
    val tableName = "cron_test" // table name
    // Set the connection user, password, and JDBC driver class
    val prop = new java.util.Properties
    prop.setProperty("user", "root")
    prop.setProperty("password", "kb10")
    prop.setProperty("driver", "com.mysql.jdbc.Driver")
    // Read the table into a DataFrame
    val jdbcDF = spark.read.jdbc(url, tableName, prop)
    jdbcDF.show()
    // Append the DataFrame to table t2 (created if it does not exist)
    jdbcDF.write.mode("append").jdbc(url, "t2", prop)
  }
}
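
The same read and write can also be expressed through the option-based format("jdbc") API, which keeps all connection settings in one map. This is an equivalent sketch rather than the original code: the helper name jdbcOptions is hypothetical, while the option keys (url, dbtable, user, password, driver) are the standard Spark JDBC data source options.

```scala
import org.apache.spark.sql.SparkSession

object ConnectSqlOptions {
  /** Collects the JDBC data source options for one table. */
  def jdbcOptions(url: String, table: String, user: String, password: String): Map[String, String] =
    Map(
      "url"      -> url,
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password,
      "driver"   -> "com.mysql.jdbc.Driver"
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("ConnectSqlOptions")
      .getOrCreate()
    val opts = jdbcOptions("jdbc:mysql://192.168.221.140:3306/exam", "cron_test", "root", "kb10")
    // Read the source table
    val jdbcDF = spark.read.format("jdbc").options(opts).load()
    jdbcDF.show()
    // Append into t2, overriding only the target table name
    jdbcDF.write.format("jdbc").options(opts + ("dbtable" -> "t2")).mode("append").save()
    spark.stop()
  }
}
```

Hive support is not enabled here because a plain JDBC read and write does not need the metastore.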

Reposted from blog.csdn.net/xiaoxaoyu/article/details/112391975