Hive: Three deployment methods for metadata

Metadata database: Derby (embedded mode)

1. Schematic diagram of embedded mode (image not included)

2. Derby database:
Derby is a lightweight database written in Java. In embedded mode it runs inside the same JVM as the application, and the application is responsible for starting and stopping it.

  3. Initialize the Derby database
    1) In the Hive root directory, run the schematool command under bin/ to initialize Hive's built-in Derby metadata database:
    [atguigu@hadoop102 hive]$ bin/schematool -dbType derby -initSchema

2) When running the initialization above, you will hit a jar conflict. The jar exists under both Hadoop and Hive; Hadoop's copy should take precedence, so Hive's copy is the one renamed. The symptom looks like this:
SLF4J: Found binding in [jar:file:/opt/module/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

3) To resolve the jar conflict, simply rename log4j-slf4j-impl-2.10.0.jar in Hive's lib directory:
[atguigu@hadoop102 hive]$ mv lib/log4j-slf4j-impl-2.10.0.jar lib/log4j-slf4j-impl-2.10.0.back
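The rename can be generalized into a small function. This is only a sketch, run here against a throwaway directory standing in for $HIVE_HOME/lib rather than a real installation:

```shell
# Neutralize Hive's duplicate SLF4J binding by renaming the jar,
# so Hadoop's binding is the only one left on the classpath.
neutralize_slf4j() {
  local hive_lib="$1"
  for jar in "$hive_lib"/log4j-slf4j-impl-*.jar; do
    [ -e "$jar" ] || continue
    mv "$jar" "${jar%.jar}.back"   # rename, don't delete, so the change is reversible
  done
}

demo=$(mktemp -d)                          # stand-in for /opt/module/hive/lib
touch "$demo/log4j-slf4j-impl-2.10.0.jar"
neutralize_slf4j "$demo"
ls "$demo"                                 # prints: log4j-slf4j-impl-2.10.0.back
```

Renaming instead of deleting means the original binding can be restored with a single mv if it is ever needed again.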

4. Start Hive
1) Run the hive command under bin/ to start Hive and connect to it through the CLI:
[atguigu@hadoop102 hive]$ bin/hive

2) Use Hive:

hive> show databases;                                      // list all databases
OK
default
Time taken: 0.472 seconds, Fetched: 1 row(s)
hive> show tables;                                         // list all tables
OK
Time taken: 0.044 seconds
hive> create table test_derby(id int);            // create table test_derby with a single column, id, of type int
OK
Time taken: 0.474 seconds
hive> insert into test_derby values(1001);                 // insert a row into test_derby
Query ID = atguigu_20211018153727_586935da-100d-4d7e-8a94-063d373cc5dd
Total jobs = 3
……
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
……
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.19 sec   HDFS Read: 12769 HDFS Write: 208 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 190 msec
OK
Time taken: 31.901 seconds
hive> select * from test_derby;                                // query all rows in test_derby
OK
1001
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> exit;
  5. Only one JVM process in embedded mode
    In embedded mode, running the jps -ml command on the command line shows only a single CliDriver process:
    [atguigu@hadoop102 hive]$ jps -ml
    7170 sun.tools.jps.Jps -ml
    6127 org.apache.hadoop.util.RunJar /opt/module/hive/lib/hive-cli-3.1.2.jar org.apache.hadoop.hive.cli.CliDriver

6. Problems with the metadata database that comes with Hive
To demonstrate the problem with using Derby as the metadata database: open another session window and run Hive while monitoring the hive.log file under /tmp/atguigu. The following error will be observed:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /opt/module/hive/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
...
By default Hive uses Derby as its metadata database, deployed in embedded mode. Once a Hive session is open, it holds the metadata database exclusively; the data cannot be shared with other clients, and opening a second window fails with the error above. Because this is so limiting, Hive also supports MySQL as the metadata database, which allows multi-window operation.
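The exclusivity comes from Derby's file lock: in embedded mode Derby writes a db.lck file inside the database directory, and a second JVM that finds it refuses to boot the database. A minimal sketch of checking for that lock; the demo runs against a throwaway directory, not the real metastore_db, and the paths are illustrative:

```shell
# Detect whether a Derby database directory is already owned by another JVM
# by looking for Derby's lock file, db.lck.
check_derby_lock() {
  if [ -f "$1/db.lck" ]; then
    echo "metastore_db appears locked by another Derby instance"
  else
    echo "metastore_db is free"
  fi
}

demo=$(mktemp -d)          # stand-in for /opt/module/hive/metastore_db
check_derby_lock "$demo"   # prints: metastore_db is free
touch "$demo/db.lck"       # simulate a running embedded session
check_derby_lock "$demo"   # prints: metastore_db appears locked by another Derby instance
```

On a real machine the check would point at /opt/module/hive/metastore_db; note that a stale db.lck left behind by a crashed process can also trigger the XSDB6 error.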

Metadata database: MySQL (direct-connection mode)

  1. Schematic diagram of direct-connection mode (image not included)

2. MySQL installation and deployment
1) Check whether MySQL/MariaDB is already installed on the system. If it is, remove it with the command below; if not, skip the removal (but be sure to run the check):
[atguigu@hadoop102 hive]$ rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64    // if this appears, uninstall it with the next command
[atguigu@hadoop102 hive]$ sudo rpm -e --nodeps mariadb-libs    // uninstall mariadb-libs

2) Upload the MySQL installation package to the /opt/software directory
[atguigu@hadoop102 software]$ ll
total 528384
-rw-r--r--. 1 root root 609556480 Mar 21 15:41 mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar

3) Extract the MySQL installation packages into a newly created mysql_jars directory under /opt/software
[atguigu@hadoop102 software]$ mkdir /opt/software/mysql_jars
[atguigu@hadoop102 software]$ tar -xf /opt/software/mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar -C /opt/software/mysql_jars

4) Check the extracted files in the mysql_jars directory:
[atguigu@hadoop102 software]$ cd /opt/software/mysql_jars
[atguigu@hadoop102 mysql_jars]$ ll
total 595272
-rw-r--r--. 1 atguigu atguigu  45109364 Sep 30  2019 mysql-community-client-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu    318768 Sep 30  2019 mysql-community-common-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   7037096 Sep 30  2019 mysql-community-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  49329100 Sep 30  2019 mysql-community-embedded-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  23354908 Sep 30  2019 mysql-community-embedded-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 136837816 Sep 30  2019 mysql-community-embedded-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   4374364 Sep 30  2019 mysql-community-libs-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   1353312 Sep 30  2019 mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 208694824 Sep 30  2019 mysql-community-server-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 133129992 Sep 30  2019 mysql-community-test-5.7.28-1.el7.x86_64.rpm

5) In the /opt/software/mysql_jars directory, install the rpm packages strictly in the following order:
sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
Note: a minimal Linux installation may be missing dependencies these packages need.
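The install order above can be made explicit in a small script. This is only a sketch: echo stands in for `sudo rpm -ivh` so it can run anywhere without the rpm files present:

```shell
# Install order matters: common -> libs -> libs-compat -> client -> server,
# because each package depends on the ones installed before it.
pkgs=(
  mysql-community-common-5.7.28-1.el7.x86_64.rpm
  mysql-community-libs-5.7.28-1.el7.x86_64.rpm
  mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
  mysql-community-client-5.7.28-1.el7.x86_64.rpm
  mysql-community-server-5.7.28-1.el7.x86_64.rpm
)
for p in "${pkgs[@]}"; do
  echo "installing $p"     # on a real host, replace echo with: sudo rpm -ivh "$p"
done
```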

6) If MySQL's data directory already contains files, delete them all. The directory is the value of the datadir parameter in the /etc/my.cnf file:
· View the value of datadir:
[atguigu@hadoop102 etc]$ vim my.cnf
……
[mysqld]
datadir=/var/lib/mysql
· Delete everything under the /var/lib/mysql directory:
[atguigu@hadoop102 hive]$ cd /var/lib/mysql
[root@hadoop102 mysql]$ sudo rm -rf ./*    // double-check which directory you are in before running this

7) Initialize the database (a one-time step after installing MySQL); this creates MySQL's internal databases and tables.
[atguigu@hadoop102 module]$ sudo mysqld --initialize --user=mysql

8) After initialization completes, look up the temporary password generated for the root user; it is the password for the first login to MySQL.
[atguigu@hadoop102 module]$ sudo cat /var/log/mysqld.log
2021-10-18T08:50:32.172049Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-10-18T08:50:32.364322Z 0 [Warning] InnoDB: New log files created, LSN=45790
2021-10-18T08:50:32.397350Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2021-10-18T08:50:32.453522Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 73e2af3c-2ff0-11ec-af41-000c29830057.
2021-10-18T08:50:32.454765Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2021-10-18T08:50:32.978960Z 0 [Warning] CA certificate ca.pem is self signed.
2021-10-18T08:50:33.314317Z 1 [Note] A temporary password is generated for root@localhost: OU+*c.C9FZy;
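Rather than scanning the whole log by eye, the password line can be pulled out with grep. A sketch follows; the log file and password below are fabricated stand-ins, and on a real host you would grep /var/log/mysqld.log directly:

```shell
# Extract the temporary root password from mysqld.log.
# The log line and password here are made-up samples for the demo.
log=$(mktemp)
echo '2021-10-18T08:50:33.314317Z 1 [Note] A temporary password is generated for root@localhost: Abc123!xyz' > "$log"
pw=$(grep 'temporary password' "$log" | awk '{print $NF}')   # password is the last field
echo "temporary root password: $pw"
```

On a real host: `sudo grep 'temporary password' /var/log/mysqld.log`.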

9) Start the MySQL service
[atguigu@hadoop102 module]$ sudo systemctl start mysqld

10) Log in to the MySQL database
[atguigu@hadoop102 module]$ mysql -uroot -p
Enter password:    // enter the temporary password generated above

11) The password of the root user must be modified first, otherwise an error will be reported when performing other operations
mysql> set password = password("new password");

12) Modify the root user in the user table under the mysql library to allow any IP to connect to
mysql> update mysql.user set host='%' where user='root';

13) Refresh to make the modification effective
mysql> flush privileges;

3. Configure MySQL as Hive's metadata database

  1. Copy the driver
    Hive stores its metadata in MySQL and connects to it over JDBC, so copy MySQL's JDBC driver into Hive's lib directory for Hive to load:
    [atguigu@hadoop102 software]$ cp mysql-connector-java-5.1.37.jar /opt/module/hive/lib
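Before starting Hive it is worth a quick sanity check that the connector actually landed where Hive will look. Sketched here against a throwaway directory standing in for /opt/module/hive/lib:

```shell
# Verify a MySQL JDBC connector jar is present in Hive's lib directory.
# $lib is a temp stand-in here; point it at /opt/module/hive/lib for real.
lib=$(mktemp -d)
: > "$lib/mysql-connector-java-5.1.37.jar"   # stand-in for the copied jar
if ls "$lib"/mysql-connector-java-*.jar >/dev/null 2>&1; then
  msg="JDBC driver present"
else
  msg="JDBC driver missing"
fi
echo "$msg"
```

If the jar is missing, schematool and Hive fail later with a ClassNotFoundException for the driver class, which is harder to diagnose than this check.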

2) Configure the Metastore to use MySQL:
Create a new hive-site.xml file in the /opt/module/hive/conf directory (settings in this file override the defaults):
[atguigu@hadoop102 hive]$ vim conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>
    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Hive metadata schema verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- metadata store authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>

4. Initialize Hive's metadata database
In MySQL, create a database named metastore for Hive to store its metadata in, then let Hive's schema-initialization tool create the tables.
1) Log in to MySQL
[atguigu@hadoop102 module]$ mysql -uroot -p    // enter your root password

2) Create a new Hive metadata database
mysql> create database metastore;
mysql> quit;

3) Initialize Hive metadata database
[atguigu@hadoop102 hive]$ bin/schematool -initSchema -dbType mysql -verbose

5. Start Hive
1) Start Hive
[atguigu@hadoop102 hive]$ bin/hive

2) Use Hive
hive> show databases;                  // list all databases
hive> show tables;                     // list all tables; is the test_derby table created earlier still there? Why not?
hive> create table test_mysql(id int); // create table test_mysql with a single column, id, of type int
hive> insert into test_mysql values(1002);   // insert a row into test_mysql
hive> select * from test_mysql;        // query the test_mysql table

3) Open another window to test whether concurrent client access is supported
[atguigu@hadoop102 hive]$ bin/hive
hive> show tables;
hive> select * from test_mysql;

6. Problems with direct-connection mode:
In a production environment the network can be very complex; the host running MySQL may be network-isolated and unreachable directly. In addition, this mode exposes MySQL's root account and password, creating a risk of credential leakage and therefore a data-security risk.
Thinking: when deploying Hive on hadoop103, the metadata database still uses the MySQL instance on hadoop102. How can this be implemented?
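One answer to the question above, sketched under the assumption that hadoop103 can reach hadoop102:3306 and that MySQL's root account accepts remote connections (the host='%' change made earlier): hadoop103's hive-site.xml simply points its JDBC URL at hadoop102, and hadoop103 also needs its own copy of the MySQL JDBC driver in lib/.

```xml
<!-- hive-site.xml on hadoop103 (sketch): connect directly to hadoop102's MySQL -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
</property>
```

The remaining JDBC properties (driver, username, password) are the same as on hadoop102.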

Metadata service: MetaStore Server

1. Schematic diagram of the metadata service model (image not included)

2. Metadata service mode:
Start the MetaStore service on the server side; clients access the metadata database through the MetaStore service using the Thrift protocol.
The metadata-service access mode is the one best suited to production deployment. Compared with the embedded mode it is more flexible (cross-network, cross-language, cross-platform).

3. Use MySQL as the metadata database and deploy the metadata service
1) First configure Hive's metadata database to be MySQL (as in the previous section), then edit hive-site.xml:
[atguigu@hadoop102 hive]$ vim conf/hive-site.xml

2) Add the following configuration information to the hive-site.xml file

    <!-- address of the metastore service to connect to -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>

Note: once this parameter is configured, the metastore service must be started before starting Hive; otherwise Hive cannot connect to the metadata service.

3) Start the metadata service
[atguigu@hadoop102 hive]$ bin/hive --service metastore
2021-10-18 18:22:24: Starting Hive Metastore Server
Note: after startup this window is blocked and cannot be used for anything else; open a new shell window for further operations.
4) Start Hive, check the tables and the data in them, and confirm they are the ones stored in the MySQL metastore.
5) Start Hive in another window and test whether multiple clients can connect and operate at the same time.
Thinking: How to deploy hive using metadata service mode on hadoop103?
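A sketch of one answer, assuming the MetaStore service keeps running on hadoop102: hadoop103's Hive only needs hive.metastore.uris pointing at hadoop102. It needs no JDBC settings and no MySQL driver jar, because all database access goes through the service.

```xml
<!-- hive-site.xml on hadoop103 (sketch): go through hadoop102's metastore service -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
</property>
```

This is precisely the advantage of the service mode: clients never see the MySQL address or credentials.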


Origin blog.csdn.net/weixin_45427648/article/details/131819877