Hadoop Deployment: Hive (with MySQL)

Hive installation resources

1) Hive official website
http://hive.apache.org/
2) Documentation
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3) Downloads
http://archive.apache.org/dist/hive/
4) GitHub repository
https://github.com/apache/hive

Install MySQL

1. First check whether MySQL has been installed

rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64    # if this package shows up, uninstall it with the command below
sudo rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64    # uninstall mariadb
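The check-then-remove flow above can be sketched as a self-contained toy; the package list is simulated here so the logic runs anywhere (on a real host it would come from `rpm -qa | grep mariadb`):

```shell
# Simulated result of `rpm -qa | grep mariadb`; empty means nothing to remove.
installed="mariadb-libs-5.5.56-2.el7.x86_64"
if [ -n "$installed" ]; then
  # On a real host this branch would run: sudo rpm -e --nodeps $installed
  action="remove $installed"
else
  action="nothing to remove"
fi
echo "$action"
```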

2. Copy the MySQL installation package to the /opt/software directory

3. Extract the installation package

tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar


4. Install (run these in the directory where the bundle was extracted)

The packages depend on one another, so install them in exactly this order:

sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm

If Linux was installed with a minimal install, the following dependency error may appear when installing mysql-community-server-5.7.28-1.el7.x86_64.rpm:

warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
    libaio.so.1()(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
    libaio.so.1(LIBAIO_0.1)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
    libaio.so.1(LIBAIO_0.4)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64

Solution:
install the missing dependency through yum, then reinstall mysql-community-server-5.7.28-1.el7.x86_64

sudo yum install -y libaio

5. Delete everything in the directory that datadir points to in /etc/my.cnf

First check the value of datadir:

[mysqld]
datadir=/var/lib/mysql

Delete all the contents in the /var/lib/mysql directory:

cd /var/lib/mysql
sudo rm -rf ./*    # careful: note which directory you are in when running this

6. Initialize the database

Execute the command to initialize the database

 sudo mysqld --initialize --user=mysql

7. View the temporary password generated for the root user

sudo cat /var/log/mysqld.log 
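Rather than reading the whole log, the temporary password can be pulled out directly. This is a hypothetical sketch using a sample MySQL 5.7 log line so it is self-contained; on a real host the equivalent would be `sudo grep 'temporary password' /var/log/mysqld.log | awk '{print $NF}'`:

```shell
# Sample "temporary password" line as MySQL 5.7 writes it to mysqld.log
# (the timestamp and password value here are made up for illustration).
line='2020-12-25T08:00:00.000000Z 1 [Note] A temporary password is generated for root@localhost: Abc123!xyz'
# The password is the last whitespace-separated field of the line.
temp_pw=$(echo "$line" | awk '{print $NF}')
echo "$temp_pw"
```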


8. Start the MySQL service

sudo systemctl start mysqld

9. Log in to the MySQL database

[atguigu@hadoop102 opt]$ mysql -uroot -p
Enter password:   # enter the temporary password generated above

10. Change the root user's password first; until you do, any other operation will report an error

mysql> set password = password("123456");

11. Modify the root user in the user table of the mysql database so that connections are allowed from any IP

mysql> update mysql.user set host='%' where user='root';
mysql> flush privileges;

MySQL troubleshooting

If something goes wrong with the MySQL installation and you need to reinstall, clear out all of the installed content first.
Removal script:

#!/bin/bash
service mysql stop 2>/dev/null
service mysqld stop 2>/dev/null
rpm -qa | grep -i mysql | xargs -n1 rpm -e --nodeps 2>/dev/null
rpm -qa | grep -i mariadb | xargs -n1 rpm -e --nodeps 2>/dev/null
rm -rf /var/lib/mysql
rm -rf /usr/lib64/mysql
rm -rf /etc/my.cnf
rm -rf /usr/my.cnf

Hive installation

1. Upload the installation package

Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory of Linux

2. Extract

Extract apache-hive-3.1.2-bin.tar.gz into the /opt/module/ directory

 tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/

3. Rename

Rename the extracted apache-hive-3.1.2-bin directory to hive

[atguigu@hadoop102 software]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive

4. Modify /etc/profile.d/my_env.sh, add environment variables

 sudo vim /etc/profile.d/my_env.sh

5. Add content

Add the following to the file:

#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
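After sourcing the profile (`source /etc/profile.d/my_env.sh`), it is worth confirming that Hive's bin directory actually landed on PATH. A minimal self-contained sketch of that check (the exports are repeated inline here so it runs on its own):

```shell
# Same exports as in my_env.sh, repeated so this sketch is self-contained.
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
# Wrap PATH in colons so the match works at either end of the list.
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) status="ok" ;;
  *)                    status="missing" ;;
esac
echo "hive bin on PATH: $status"
```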

6. Resolve the logging jar conflict

mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak

Configuring Hive metadata storage in MySQL

1. JDBC driver

Copy the MySQL JDBC driver into Hive's lib directory

cp /opt/software/mysql-connector-java-5.1.37.jar $HIVE_HOME/lib

2. Configure the Metastore to use MySQL

Create a new hive-site.xml file in the $HIVE_HOME/conf directory

vim $HIVE_HOME/conf/hive-site.xml

Add the following content

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <!-- Hive's default working directory on HDFS (where tables are created) -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <!-- Port that HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- Host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Address of the Metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>

    <!-- Metastore event/notification API authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>

    <!-- Metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <!-- HiveServer2 active/passive HA; enabling it can speed up HiveServer2 startup -->
    <property>
        <name>hive.server2.active.passive.ha.enable</name>
        <value>true</value>
    </property>
</configuration>

Startup configuration

1. Log in to MySQL

mysql -uroot -p123456

2. Create a new Hive metadata database

mysql> create database metastore;
mysql> quit;

3. Initialize the Hive metadata database

schematool -initSchema -dbType mysql -verbose

4. Start metastore and hiveserver2

When using hive, you need to start the metastore and hiveserver2 processes

However, running these commands in the foreground ties up the terminal, so you would have to open a new shell window for anything else. This method is not recommended:

hive --service metastore 
hive --service hiveserver2

5. Write a Hive service startup script ($HIVE_HOME/bin/hiveservices.sh)

Starting in the foreground forces you to keep multiple shell windows open; start the services in the background instead.

nohup: placed at the start of a command; short for "no hang up", it keeps the process running even after the terminal is closed

2>&1: redirects standard error to standard output

&: placed at the end of a command; runs it in the background

These are usually combined as nohup [command] > file 2>&1 &, which runs the command, sends its output to file, and keeps the started process running in the background.
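The combined pattern can be demonstrated with a harmless toy command; here `sleep 1` stands in for a long-running hive service, and `wait` is used only so the demo can inspect the log (a real background service would simply keep running):

```shell
# Demonstrate: nohup [command] > file 2>&1 &
logfile=$(mktemp)
nohup sh -c 'echo started; sleep 1' > "$logfile" 2>&1 &
wait $!                                # demo only; you would not wait for a real service
hits=$(grep -c 'started' "$logfile")   # the command's stdout went into the log file
echo "hits=$hits"
rm -f "$logfile"
```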

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
	mkdir -p $HIVE_LOG_DIR
fi

# Check whether a process is running normally; $1 is the process name, $2 is its port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service already started"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service already started"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running"
    ;;
*)
    echo "Invalid Args!"
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
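The heart of check_process is its final test: the service counts as running only when the PID owning the port (ppid) is non-empty and appears among the PIDs matched by name (pid). A self-contained toy illustration of that logic, with the ps/netstat results simulated:

```shell
# Simulated results: pid from `ps | grep <name>`, ppid from `netstat | grep <port>`.
pid="1234 5678"
ppid="5678"
# Same test as in check_process: quoted =~ does a literal substring match in bash,
# and [ "$ppid" ] guards against an empty port-owner PID.
if [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ]; then
  status="running"
else
  status="stopped"
fi
echo "$status"
```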

6. Add execution permissions

chmod +x $HIVE_HOME/bin/hiveservices.sh

7. Start

Start the Hive background services with the script (Hadoop must be started first):

hiveservices.sh start

Beeline client access

# start the beeline client
beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
# exit the client
!quit

Hive client access

# start the hive client
hive
# exit the client
quit;

Configure Hive to print the current database and column headers

Add the following two configurations to hive-site.xml:

<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
</property>


Origin blog.csdn.net/qq_38705144/article/details/111731445