SparkSQL Integration with Hive

1. Hive query process and principle
When executing HQL, Hive first looks up the table's metadata in the MySQL metastore database, then parses the HQL and generates MapReduce jobs based on that metadata.
Because Hive converts SQL into MapReduce jobs, execution is slow.
Integrating Hive with SparkSQL means letting SparkSQL load Hive's metastore and then operate on the data in Hive tables through the SparkSQL execution engine.
The first step is to start Hive's metastore service so that SparkSQL can load the metadata.
2. Start the Hive MetaStore service
Step 1: Edit hive/conf/hive-site.xml and add the following configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    <property>
      <name>hive.metastore.local</name>
      <value>false</value>
    </property>
 </configuration>
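In addition to the two properties above, Spark and any other remote clients typically also need `hive.metastore.uris` in hive-site.xml so they can locate the metastore service (9083 is the default metastore port). The host name `node01` below is a placeholder; substitute your own metastore host:

```xml
<!-- Assumption: the metastore service runs on a host named node01; replace with yours. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://node01:9083</value>
</property>
```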

3. Start the Hive MetaStore service in the background

nohup /export/servers/hive/bin/hive --service metastore >> /var/log.log 2>&1 &
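Note the order of the redirections: `>> /var/log.log 2>&1` sends stderr into the log along with stdout, whereas writing `2>&1` first duplicates stderr to the terminal *before* stdout is redirected, so errors never reach the log. A minimal sketch of the correct order, using a temporary file instead of /var/log.log:

```shell
# A command that writes one line to stdout and one line to stderr.
log=$(mktemp)

# Correct order: stdout is redirected to the log first, then stderr joins it.
{ echo "out"; echo "err" 1>&2; } >> "$log" 2>&1

# Both lines ended up in the log file.
grep -c "err" "$log"   # prints 1
cat "$log"
```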

4. Integrate SparkSQL with the Hive MetaStore
Spark has a built-in metastore that uses the embedded Derby database to store metadata, but this is not suitable for production: in that mode only one SparkSession can use the metastore at a time. For production, the Hive MetaStore is recommended instead.
The key to integrating SparkSQL with Hive's MetaStore is making the metastore reachable through configuration and using HDFS to store the warehouse.

To do this, copy the Hadoop and Hive configuration files into Spark's configuration directory:
hive-site.xml   location of the metastore and warehouse, and other metadata settings
core-site.xml   security-related configuration
hdfs-site.xml   HDFS-related configuration

For local testing in IDEA, place the three configuration files above in the resources directory.

import org.apache.spark.sql.SparkSession

object day_hive01 {
  // 4. SparkSQL integration with the Hive MetaStore
  def main(args: Array[String]): Unit = {
    // Create a SparkSession with Hive support enabled
    val spark: SparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("day_hive01")
      .enableHiveSupport()
      .getOrCreate()
    // Run Hive commands directly
    // List the databases
    spark.sql("show databases").show()
    // Switch to a database
    spark.sql("use bilibili").show()
    // List the tables
    spark.sql("show tables").show()
    // Query data
    spark.sql("select * from month limit 2").show()
    spark.stop()
  }
}
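Beyond reading, the same SparkSession can also write into Hive-managed tables through the metastore. The sketch below assumes the metastore service above is running and a database named `bilibili` exists; the table name `demo_scores` is hypothetical. It needs the Spark runtime and a reachable metastore, so it is illustrative rather than standalone:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object day_hive02 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("day_hive02")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame and save it as a Hive-managed table.
    // "bilibili.demo_scores" is a hypothetical table name.
    val df = Seq(("a", 1), ("b", 2)).toDF("name", "score")
    df.write.mode(SaveMode.Overwrite).saveAsTable("bilibili.demo_scores")

    // Read it back through the metastore to confirm the write.
    spark.sql("select * from bilibili.demo_scores").show()
    spark.stop()
  }
}
```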

Origin blog.csdn.net/qq_45765882/article/details/105562220