Hive download and documentation addresses
1) Hive official website address
http://hive.apache.org/
2) Documentation address
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3) Download address
http://archive.apache.org/dist/hive/
4) GitHub address
https://github.com/apache/hive
Install MySQL
1. Check whether MariaDB, which conflicts with MySQL, is already installed
rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
If a package is listed, uninstall it with the following command
sudo rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
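The check above can be made independent of the exact MariaDB version. The following is a minimal sketch, not part of the original procedure: find_mariadb_packages is a hypothetical helper name, and it simply filters a package list read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print only the MariaDB packages from a package list
# read on standard input (one package name per line).
find_mariadb_packages() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -i '^mariadb' || true
}

# Intended use on the real machine (needs rpm and sudo, so it is left commented out):
#   rpm -qa | find_mariadb_packages | xargs -r sudo rpm -e --nodeps
```

If the function prints nothing, no MariaDB package is installed and the uninstall step can be skipped.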
2. Copy the MySQL installation package to the /opt/software directory
3. Extract the installation package
tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
4. Install the RPM packages (run the commands in the directory where the bundle was extracted)
The packages depend on one another, so install them in exactly the following order
sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
If Linux was installed with the minimal profile, installing mysql-community-server-5.7.28-1.el7.x86_64.rpm may fail with the following error
warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
libaio.so.1()(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.1)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.4)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
Solution:
Install the missing dependency through yum, and then reinstall mysql-community-server-5.7.28-1.el7.x86_64
sudo yum install -y libaio
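Because the install order in step 4 matters, it can be convenient to generate the file names in that order instead of typing them one by one. This is only a sketch under the assumption that all five RPM files share the same version suffix; mysql_rpm_order is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the MySQL RPM file names in the order used in
# step 4 (common, libs, libs-compat, client, server) for one version suffix.
mysql_rpm_order() {
    version=$1    # for example 5.7.28-1.el7.x86_64
    for part in common libs libs-compat client server; do
        echo "mysql-community-${part}-${version}.rpm"
    done
}

# Intended use on the real machine (left commented out because it needs rpm and sudo):
#   for f in $(mysql_rpm_order 5.7.28-1.el7.x86_64); do
#       sudo rpm -ivh "$f" || break    # stop at the first failure
#   done
```

Stopping at the first failure avoids installing later packages on top of a missing dependency.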
5. Delete everything in the directory that datadir points to in /etc/my.cnf
First check the value of datadir in /etc/my.cnf:
[mysqld]
datadir=/var/lib/mysql
If that directory has any content, delete all of it:
cd /var/lib/mysql
sudo rm -rf ./* # Make sure you are in /var/lib/mysql when you run this command
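Since the delete command depends on the current directory, reading datadir from the configuration file first reduces the chance of deleting the wrong place. A minimal sketch, assuming the file contains a plain datadir=... line; read_datadir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the datadir value from a my.cnf-style file.
# The file path defaults to /etc/my.cnf; only the first match is printed.
read_datadir() {
    cnf=${1:-/etc/my.cnf}
    sed -n 's/^[[:space:]]*datadir[[:space:]]*=[[:space:]]*//p' "$cnf" | head -n 1
}

# Intended use on the real machine (left commented out because it deletes data):
#   dir=$(read_datadir)
#   [ -n "$dir" ] && sudo find "$dir" -mindepth 1 -delete
```

Using find with an explicit directory removes the contents without relying on a preceding cd.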
6. Initialize the database
Execute the command to initialize the database
sudo mysqld --initialize --user=mysql
7. View the temporary password generated for the root user (it is in the log line containing "A temporary password is generated for root@localhost")
sudo cat /var/log/mysqld.log
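The log can be long, so pulling out just the password is often easier than reading the whole file. The sketch below assumes the MySQL 5.7 log line ends with the password as its last whitespace-separated field; extract_temp_password is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent temporary root password found
# in a mysqld log file (path defaults to /var/log/mysqld.log).
extract_temp_password() {
    log=${1:-/var/log/mysqld.log}
    # The password is assumed to be the last field of the matching line
    grep 'temporary password' "$log" | tail -n 1 | awk '{print $NF}'
}

# Intended use on the real machine:
#   sudo sh -c 'grep "temporary password" /var/log/mysqld.log'
```

Taking the last matching line matters when the database has been initialized more than once, since each initialization logs a new password.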
8. Start the MySQL service
sudo systemctl start mysqld
9. Log in to the MySQL database
[atguigu@hadoop102 opt]$ mysql -uroot -p
Enter password: (enter the temporary password)
10. Change the root password before doing anything else; until it is changed, MySQL rejects other statements with an error
mysql> set password = password("123456");
11. Update the root user in the user table of the mysql database so that connections are allowed from any IP
mysql> update mysql.user set host='%' where user='root';
mysql> flush privileges;
MySQL installation problems
If the MySQL installation goes wrong and you need to reinstall, first remove everything the previous installation left behind.
Cleanup script (run as root)
#!/bin/bash
service mysql stop 2>/dev/null
service mysqld stop 2>/dev/null
rpm -qa | grep -i mysql | xargs -n1 rpm -e --nodeps 2>/dev/null
rpm -qa | grep -i mariadb | xargs -n1 rpm -e --nodeps 2>/dev/null
rm -rf /var/lib/mysql
rm -rf /usr/lib64/mysql
rm -rf /etc/my.cnf
rm -rf /usr/my.cnf
Hive installation
1. Upload the installation package
Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory of Linux
2. Extract
Extract apache-hive-3.1.2-bin.tar.gz to the /opt/module/ directory
tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
3. Rename
Rename the extracted apache-hive-3.1.2-bin directory to hive
[atguigu@hadoop102 software]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
4. Modify /etc/profile.d/my_env.sh, add environment variables
sudo vim /etc/profile.d/my_env.sh
5. Add content
Add the following lines to the file
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
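After the file is saved, the new variables only apply to shells that load it again (for example by running source /etc/profile.d/my_env.sh or logging in again). A small check like the following can confirm that the Hive bin directory ended up on PATH. This is a sketch; path_contains is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory is an entry of PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Falls back to /opt/module/hive when HIVE_HOME is not set in this shell
if path_contains "${HIVE_HOME:-/opt/module/hive}/bin"; then
    echo "Hive bin directory is on PATH"
else
    echo "Hive bin directory is NOT on PATH"
fi
```

Wrapping PATH in colons makes the match exact, so /opt/module/hive/bin does not accidentally match a longer entry such as /opt/module/hive/bin2.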
6. Resolve the logging JAR conflict (Hive's log4j-slf4j-impl JAR clashes with the SLF4J binding that ships with Hadoop)
mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
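The rename above hard-codes version 2.10.0. A version-independent variant might look like the sketch below, which renames any matching JAR in a given lib directory; disable_slf4j_binding is a hypothetical helper name, and the file name pattern is taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: rename every log4j-slf4j-impl-*.jar in a lib directory
# to the same name with a .bak extension, so it is no longer loaded as a JAR.
disable_slf4j_binding() {
    lib_dir=$1
    for jar in "$lib_dir"/log4j-slf4j-impl-*.jar; do
        [ -e "$jar" ] || continue    # the glob matched nothing
        mv "$jar" "${jar%.jar}.bak"
        echo "renamed $jar"
    done
}

# Intended use on the real machine:
#   disable_slf4j_binding "$HIVE_HOME/lib"
```

Renaming instead of deleting keeps the file available in case the change needs to be undone.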
Configure Hive metadata storage in MySQL
1. Driver configuration
Copy the MySQL JDBC driver to the lib directory of Hive
cp /opt/software/mysql-connector-java-5.1.37.jar $HIVE_HOME/lib
2. Configure the Metastore to use MySQL
Create a new hive-site.xml file in the $HIVE_HOME/conf directory
vim $HIVE_HOME/conf/hive-site.xml
Add the following content
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JDBC connection URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
</property>
<!-- JDBC connection driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- JDBC connection username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- JDBC connection password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<!-- Hive's default working directory on HDFS, where tables are created -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- Port used for HiveServer2 connections -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<!-- Host used for HiveServer2 connections -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop102</value>
</property>
<!-- Address of the metastore service to connect to -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop102:9083</value>
</property>
<!-- Metastore authorization -->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<!-- Verification of the Hive metastore schema version -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<!-- HiveServer2 high-availability parameter; enabling it can speed up HiveServer2 startup -->
<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
</configuration>
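Once hive-site.xml is written, it can be useful to read a value back, for example to confirm the metastore URI before starting the services. The following is a rough sketch that assumes the one-tag-per-line layout shown above (it is not a general XML parser); get_hive_property is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name> in a
# hive-site.xml file laid out with one tag per line, as in the example above.
get_hive_property() {
    name=$1
    file=$2
    awk -v want="<name>$name</name>" '
        index($0, want) { found = 1; next }      # remember that the name was seen
        found && /<value>/ {
            sub(/.*<value>/, "")                 # strip everything up to the value
            sub(/<\/value>.*/, "")               # strip the closing tag
            print
            exit
        }
    ' "$file"
}

# Intended use on the real machine:
#   get_hive_property hive.metastore.uris "$HIVE_HOME/conf/hive-site.xml"
```

For files that are formatted differently, a real XML tool would be a safer choice than this line-based approach.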
Startup configuration
1. Log in to MySQL
mysql -uroot -p123456
2. Create a new Hive metadata database
mysql> create database metastore;
mysql> quit;
3. Initialize the Hive metadata database
schematool -initSchema -dbType mysql -verbose
4. Start metastore and hiveserver2
When using Hive, both the metastore and hiveserver2 processes must be running
Started directly with the commands below, each process runs in the foreground and occupies its shell window, so you have to open a new window for any other work. This method is not recommended
hive --service metastore
hive --service hiveserver2
5. Write a Hive service startup script
Starting the services in the foreground requires several shell windows. They can be started in the background instead, using the following building blocks
nohup: placed at the beginning of a command, it makes the command ignore the hangup signal, so the process keeps running after the terminal is closed
2>&1: redirects standard error to standard output
&: placed at the end of a command, it runs the command in the background
They are generally combined as: nohup [xxx command] > file 2>&1 &, which runs the xxx command, writes its output to file, and keeps the process started by the command running in the background.
Create the script $HIVE_HOME/bin/hiveservices.sh with the following content
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
mkdir -p $HIVE_LOG_DIR
fi
# Check whether a process is running normally; argument 1 is the process name, argument 2 is the process port
function check_process()
{
pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
echo $pid
[[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
function hive_start()
{
metapid=$(check_process HiveMetastore 9083)
cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
[ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
server2pid=$(check_process HiveServer2 10000)
cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
[ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}
function hive_stop()
{
metapid=$(check_process HiveMetastore 9083)
[ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
server2pid=$(check_process HiveServer2 10000)
[ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}
case $1 in
"start")
hive_start
;;
"stop")
hive_stop
;;
"restart")
hive_stop
sleep 2
hive_start
;;
"status")
check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running normally"
check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running normally"
;;
*)
echo Invalid Args!
echo 'Usage: '$(basename $0)' start|stop|restart|status'
;;
esac
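The nohup, 2>&1 and & combination that the script relies on can be tried in isolation before running the real services. The sketch below uses a short stand-in command instead of Hive; the log file is a temporary file created only for this demonstration.

```shell
#!/bin/sh
# Temporary log file used only for this demonstration
log=$(mktemp)

# A stand-in for a service: it writes one line to stdout and one to stderr.
# nohup makes it ignore hangups, ">" sends stdout to the log file,
# "2>&1" sends stderr to the same place, and "&" runs it in the background.
nohup sh -c 'echo "to stdout"; echo "to stderr" >&2' >"$log" 2>&1 &

# Waiting is only needed for the demo; a real service is left running
wait $!

# Both lines end up in the same file
cat "$log"
```

Note that 2>&1 must come after the file redirection; written the other way round, standard error would still go to the terminal.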
6. Add execution permissions
chmod +x $HIVE_HOME/bin/hiveservices.sh
7. Start
Start the Hive background services (Hadoop must be started first)
hiveservices.sh start
Beeline client access
# Start the beeline client
beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
# Exit the client
!quit
Hive client access
# Start the hive client
hive
# Exit the client
quit;
Configure Hive to print the current database and column headers
Add the following two configurations to hive-site.xml:
<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>Whether to print the names of the columns in query output.</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>