Hive is a distributed data warehouse built on top of Hadoop. It maps structured data files in HDFS to tables and provides an SQL-like query language (HQL). Hive is designed so that analysts who are proficient in SQL but relatively weak in Java programming can query massive data sets stored in HDFS.
Essentially: HQL is converted into MapReduce programs.
1) The data Hive processes is stored in HDFS
2) The underlying engine Hive uses to analyze the data is MapReduce
3) The resulting programs run on YARN
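The conversion can be illustrated with a simple HQL aggregation (the table and column names below are made up for illustration): Hive compiles the GROUP BY into a MapReduce job whose map phase emits (dept, salary) pairs and whose reduce phase computes the average per department.

```sql
-- Illustrative only: 'employee' is a hypothetical table.
-- Map phase: emit (dept, salary) pairs; shuffle: group rows by dept;
-- Reduce phase: compute AVG(salary) for each dept.
SELECT dept, AVG(salary) AS avg_salary
FROM employee
GROUP BY dept;
```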
Download:
http://archive.apache.org/dist/hive/
Hive cluster installation (based on MySQL)
1) Install the JDK (omitted)
2) Install Hadoop (omitted)
3) Install the MySQL database:
yum install wget
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
[root@hadoopNode1 soft]# rpm -ivh mysql-community-release-el7-5.noarch.rpm
[root@hadoopNode1 soft]# yum install mysql-server
[root@hadoopNode1 soft]# systemctl start mysqld
[root@hadoopNode1 soft]# systemctl enable mysqld   # enable the service at boot
Initialize the root user:
mysql -u root
mysql> use mysql;
mysql> update user set password=password('123456') where user='root';  -- set the new password to 123456
mysql> select host,user from user;
mysql> GRANT ALL PRIVILEGES ON *.* TO root@"%" IDENTIFIED BY "123456";
mysql> flush privileges;
mysql> exit;
4) Create a hive user in MySQL and grant privileges:
mysql> create user 'hive' identified by '123456';
mysql> CREATE DATABASE hive;
mysql> use hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'master' IDENTIFIED BY '123456';  -- replace 'master' with the hostname of the node running MySQL in your cluster
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost' IDENTIFIED BY '123456';
mysql> flush privileges;
mysql> exit;
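Before moving on, it may help to confirm (as the MySQL root user) that the hive user and its grants were actually created; this is a sanity check added here, not part of the original steps:

```sql
-- Run as the MySQL root user.
SELECT host, user FROM mysql.user WHERE user = 'hive';
SHOW GRANTS FOR 'hive'@'%';
```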
systemctl restart mysqld
5) Verify that MySQL is configured correctly
Log in to MySQL from a Windows client to confirm that remote access works.
6) Extract the Hive tar package:
[ambow@hadoopNode1 hive-2.3.2]$ tar -zxvf apache-hive-2.3.2-bin.tar.gz -C ~/app/
7) Configure environment variables
Add HIVE_HOME and append its bin directory to PATH, e.g. in ~/.bash_profile:
HIVE_HOME=/home/ambow/app/hive-2.3.2
HBASE_HOME=/home/ambow/app/hbase-1.3.2
JAVA_HOME=/home/ambow/app/jdk1.8.0_121
HADOOP_HOME=/home/ambow/app/hadoop-2.7.3
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
export PATH
export JAVA_HOME
export HADOOP_HOME
export HBASE_HOME
export HIVE_HOME
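A minimal sketch of the two lines that matter for Hive (the path follows this guide's layout; adjust it to your own install directory), plus a quick check that the variable is set:

```shell
# Append to ~/.bash_profile, then reload with: source ~/.bash_profile
export HIVE_HOME=/home/ambow/app/hive-2.3.2
export PATH=$PATH:$HIVE_HOME/bin

# Quick check: print the variable as a new shell would see it
echo "$HIVE_HOME"
```

After sourcing the profile, `hive` should resolve from any directory via the updated PATH.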
8) Modify the configuration files:
(1) In the {HIVE_HOME}/conf directory, copy hive-env.sh.template:
$> cp hive-env.sh.template hive-env.sh
Edit hive-env.sh and add: export HADOOP_HOME=/home/ambow/app/hadoop
(2) In the {HIVE_HOME}/conf directory, copy hive-default.xml.template to hive-site.xml:
[ambow@hadoopNode1 conf]$ cp hive-default.xml.template hive-site.xml
[ambow@hadoopNode1 conf]$ vi hive-site.xml
For Hive 2.3.2, set the following properties:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS path used to store the execution plans and intermediate results of the different map/reduce stages; created automatically</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>MySQL user name used to connect to the database that stores the Hive metadata</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoopNode1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>Default HDFS path where Hive stores its data files; must be a writable HDFS path; created automatically</description>
</property>
<property>
<name>datanucleus.readOnlyDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
<description>Creates the necessary schema on startup if one doesn't exist. Set this to false after the schema has been created once.
</description>
</property>
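With hive.metastore.schema.verification disabled and the datanucleus.* auto-create options enabled, Hive creates the metastore schema implicitly on first use. For Hive 2.x, an alternative is to initialize the schema explicitly once with the bundled schematool after configuring hive-site.xml (this requires the Hive install from step 6 and $HIVE_HOME/bin on PATH):

```
schematool -dbType mysql -initSchema
```

If the schema is initialized this way, the auto-create properties above are not needed and schema verification can be left enabled.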
9) Copy the MySQL JDBC driver jar into {HIVE_HOME}/lib:
[ambow@hadoopNode1 lib]$ cp mysql-connector-java-5.1.34.jar $HIVE_HOME/lib
10) Start the Hadoop cluster
zkServer.sh start             # on each ZooKeeper node
hadoop-daemon.sh start zkfc   # on both NameNodes
start-all.sh
11) Start the Hive client to verify the installation:
$> hive
hive> show databases;
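As a fuller smoke test, one can create a database and a table from the Hive prompt (all names below are arbitrary examples):

```sql
-- Illustrative names only.
CREATE DATABASE IF NOT EXISTS demo;
USE demo;
CREATE TABLE t_user (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
SHOW TABLES;
```

If everything is wired up correctly, the table's backing directory should appear in HDFS under the warehouse path configured by hive.metastore.warehouse.dir (e.g. /user/hive/warehouse/demo.db/t_user).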