Environment: VMware 11, Ubuntu 15.10, Hadoop 2.7.1

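Hive is built on Hadoop, which is designed for static batch processing. Hadoop typically has high latency and incurs considerable overhead when jobs are submitted and scheduled. Hive is therefore not suited to low-latency, fast queries over large datasets, such as online transaction processing (OLTP). Hive queries strictly follow the Hadoop MapReduce job execution model: Hive translates the user's HiveQL statements into MapReduce jobs through an interpreter and submits them to the Hadoop cluster, Hadoop monitors the job execution, and the results are then returned to the user.

Hive is best suited to batch jobs over large datasets, such as web log analysis.

By default Hive stores its metadata in an embedded Derby database, but embedded Derby allows only one session to access the database at a time, so it needs to be replaced by another relational database. This article demonstrates installing MySQL and PostgreSQL.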

1.1 Installing MySQL

Install: sudo apt-get install mysql-server mysql-client libmysqlclient-dev. The installer prompts for a root password during installation; remember to set one.

Check that the installation succeeded: sudo netstat -tap | grep mysql. If it did, the output shows a mysqld process in the LISTEN state.

Alternatively, check with sudo service mysql status.

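Log in to MySQL with mysql -u root -p, then grant root access from remote hosts:

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
or
GRANT ALL PRIVILEGES ON *.* TO 'ROOT'@'%' IDENTIFIED BY 'ROOTPASSWD' WITH GRANT OPTION;
Here % means any host.
Create the ==hive== user and the ==hive== database

use mysql
insert into user(host, user, password) values('%', 'hive', password('hive')); -- a host value of % lets this user access MySQL remotely

Rows inserted directly into mysql.user take effect only after FLUSH PRIVILEGES or a server restart, such as the restart in the next step.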
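The direct insert above adds the user but does not create the database. A minimal sketch of the more conventional route is shown here, assuming the same names as the article (database hive, user hive, password hive). The function name hive_metastore_sql is hypothetical, and using CREATE USER and GRANT instead of writing to mysql.user is a substitution, not the author's method.

```shell
# Print the SQL that creates the metastore database and a user that may
# connect from any host. Names and password follow the article; the use of
# CREATE USER / GRANT instead of a direct insert is an assumption.
hive_metastore_sql() {
  cat <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL
}

# Show the statements. To apply them, pipe them into the client instead:
#   hive_metastore_sql | mysql -u root -p
hive_metastore_sql
```

Printing the statements first makes it easy to review them before they reach the server.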


Edit the mysqld.cnf configuration file with sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf: comment out bind-address = 127.0.0.1, or change it to bind-address = 0.0.0.0. Then restart MySQL with sudo service mysql restart, after which MySQL accepts remote connections.
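The same edit can be scripted instead of made in an editor. The sketch below is one possible way to do it with sed; the function name comment_out_bind_address is hypothetical, and the demonstration runs on a scratch file rather than the live configuration.

```shell
# Comment out every active bind-address line in a MySQL config file.
# The file path is an argument so the function can be tried on a copy first.
comment_out_bind_address() {
  conf=$1
  sed -E 's/^([[:space:]]*)(bind-address[[:space:]]*=.*)$/\1# \2/' "$conf" > "$conf.tmp" &&
    mv "$conf.tmp" "$conf"
}

# Demonstration on a scratch file with the same kind of content.
scratch=$(mktemp)
printf '[mysqld]\nbind-address = 127.0.0.1\nport = 3306\n' > "$scratch"
comment_out_bind_address "$scratch"
cat "$scratch"
```

On the real file, keep a backup copy and run the function with root privileges, then restart MySQL as described above.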

After the restart, the host shown for the mysql process has changed.
Create a system user so that you can log in to the database from the shell

sudo adduser hive

Completely removing MySQL

sudo apt-get autoremove --purge mysql-server
sudo apt-get remove mysql-common # this step is important
dpkg -l | grep ^rc | awk '{print $2}' | sudo xargs dpkg -P # clean up leftover data

1.2 Installing PostgreSQL

Install: sudo apt-get install postgresql-9.4
Log in as the default user with sudo -u postgres psql postgres, then create the adminpack extension: CREATE EXTENSION adminpack;

Use \password postgres to set the password of the default user ==postgres==, then use \q to exit the psql shell.

Create the database user ==hive== (password: hive) and the ==hive== database

sudo -u postgres createuser -d -P -A hive
sudo -u postgres createdb -O hive hive # the first hive (the -O argument) is the owner's user name

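Create a system user so that you can log in to the database from the shell

sudo adduser hive

Configure pg_hba.conf and postgresql.conf

sudo vi /etc/postgresql/9.4/main/pg_hba.conf
sudo vi /etc/postgresql/9.4/main/postgresql.conf

In pg_hba.conf, 192.168.8.0 is the local address in this setup; the other entries must also be changed to md5 authentication.

PostgreSQL accepts only local connections by default, so set listen_addresses to '*' in postgresql.conf.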
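The two configuration changes can be sketched as shell functions. The function names set_listen_all and allow_md5_subnet are hypothetical, and the /24 mask on 192.168.8.0 is an assumption, since the article gives only the address. The demonstration uses scratch files, not the files under /etc/postgresql.

```shell
# Set listen_addresses = '*' in a postgresql.conf-style file, whether the
# existing line is commented out or not.
set_listen_all() {
  conf=$1
  sed -E "s/^#?[[:space:]]*listen_addresses[[:space:]]*=.*/listen_addresses = '*'/" "$conf" > "$conf.tmp" &&
    mv "$conf.tmp" "$conf"
}

# Append an md5 rule for a client network to a pg_hba.conf-style file.
allow_md5_subnet() {
  hba=$1
  cidr=$2
  printf 'host    all    all    %s    md5\n' "$cidr" >> "$hba"
}

# Demonstration on scratch files.
conf_copy=$(mktemp)
hba_copy=$(mktemp)
printf "#listen_addresses = 'localhost'\n" > "$conf_copy"
set_listen_all "$conf_copy"
allow_md5_subnet "$hba_copy" 192.168.8.0/24   # assumed mask
cat "$conf_copy" "$hba_copy"
```

Both changes take effect only after PostgreSQL is restarted.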

Restart the service so that the configuration takes effect

sudo systemctl restart postgresql
or
sudo service postgresql restart

References:
http://www.unixmen.com/install-postgresql-9-4-and-phppgadmin-on-ubuntu-15-10/
https://help.ubuntu.com/lts/serverguide/mysql.html

2 Installing Hive

Environment variable settings:
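The original settings were shown only as an image. A minimal sketch of typical settings follows, assuming Hive is unpacked at /usr/local/hive (the path used in hive-env.sh later in this article); adjust it if your install location differs.

```shell
# Assumed install location, matching the /usr/local/hive paths in hive-env.sh.
export HIVE_HOME=/usr/local/hive
# Make the hive, schematool and hiveserver2 launchers available on the PATH.
export PATH="$PATH:$HIVE_HOME/bin"
```

Adding these lines to ~/.bashrc (or another shell startup file) makes them apply to new shells.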

2.1 Downloading the MySQL/PostgreSQL JDBC driver

http://dev.mysql.com/downloads/connector/j/
https://jdbc.postgresql.org/download.html
Place the driver jar in Hive's ==lib== directory. MySQL is used in the rest of this article.

2.2 Configuring Hive

Go to Hive's ==conf== directory and copy the following files:

cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh

Configure hive-env.sh

    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/usr/local/hadoop

    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/usr/local/hive/conf

    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    export HIVE_AUX_JARS_PATH=/usr/local/hive/lib

Configure hive-site.xml
Hive loads two files, hive-default.xml and hive-site.xml. When the two define the same parameter differently, the user's hive-site.xml takes precedence, so hive-site.xml only needs to keep the following parameters:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://node1:3306/hive?characterEncoding=UTF-8</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive</value>
        </property>
    </configuration>
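The configuration above targets MySQL. For the PostgreSQL database installed in section 1.2, only the connection URL and driver class need to differ. The sketch below writes such a file; the function name write_hive_site_pg is hypothetical, the host node1 is reused from the MySQL example, and 5432 is PostgreSQL's default port, assumed here to be unchanged.

```shell
# Write a minimal hive-site.xml for a PostgreSQL-backed metastore to the given path.
write_hive_site_pg() {
  out=$1
  cat > "$out" <<'XML'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://node1:5432/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
</configuration>
XML
}

# Write to a scratch file for review before copying it into Hive's conf directory.
site_copy=$(mktemp)
write_hive_site_pg "$site_copy"
cat "$site_copy"
```

The PostgreSQL JDBC driver jar from section 2.1 must be in Hive's lib directory for this configuration to work.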

2.3 Creating the required directories

hdfs dfs -mkdir /user/
hdfs dfs -mkdir /user/hive/
hdfs dfs -mkdir /user/hive/warehouse
hdfs dfs -mkdir /tmp/
hdfs dfs -mkdir /tmp/hive
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /tmp/hive
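The same directories can be created with fewer commands, since hdfs dfs -mkdir accepts -p to create missing parent directories. The sketch below wraps this in a hypothetical function, make_hive_dirs, and adds a dry-run switch through an HDFS_CMD variable, which is an illustration device and not part of Hadoop.

```shell
# Command used to reach HDFS. Override with HDFS_CMD="echo hdfs" for a dry run.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Create the warehouse and scratch directories and open their permissions.
make_hive_dirs() {
  for dir in /user/hive/warehouse /tmp/hive; do
    $HDFS_CMD dfs -mkdir -p "$dir" || return 1
    $HDFS_CMD dfs -chmod 777 "$dir" || return 1
  done
}

# Dry run in a subshell: prints the commands without touching HDFS.
( HDFS_CMD="echo hdfs"; make_hive_dirs )
```

Calling make_hive_dirs without the override runs the commands against the cluster.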

Remember to change the directory permissions; otherwise access through JDBC may fail with an error like this:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse/loginfo":phoenix:supergroup:drwxr-xr-x

Here ==loginfo== is the Hive table created through JDBC

2.4 Starting Hive

hive start

This throws the following error:
Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType …) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

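Because Hive defaults to the Derby database, the metastore schema must first be initialized for MySQL:

    schematool -dbType mysql -initSchema

    # If another database type was initialized before, delete the /tmp/{usr_name}/metastore_db folder first; otherwise initialization fails
    rm -R metastore_db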
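The -dbType value has to match the database behind javax.jdo.option.ConnectionURL; for the PostgreSQL setup from section 1.2 it would be postgres instead of mysql. The sketch below derives the value from a JDBC URL. The function name db_type_for_url is hypothetical, and only the three database types mentioned in this article are mapped.

```shell
# Map a metastore JDBC URL to the value schematool expects for -dbType.
db_type_for_url() {
  case $1 in
    jdbc:mysql:*)      echo mysql ;;
    jdbc:postgresql:*) echo postgres ;;
    jdbc:derby:*)      echo derby ;;
    *) echo "unsupported JDBC URL: $1" >&2; return 1 ;;
  esac
}

# Print the matching schematool command for the URL used in hive-site.xml.
url='jdbc:mysql://node1:3306/hive?characterEncoding=UTF-8'
echo "schematool -dbType $(db_type_for_url "$url") -initSchema"
```

The example only prints the command, so it can be reviewed before being run.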

Afterwards, a set of metadata tables has been created in the hive database in MySQL.

JDBC interface

Configure Hadoop's core-site.xml to allow anonymous access to Hadoop

    <!-- Replace phoenix with the user name you need -->
    <property>
        <name>hadoop.proxyuser.phoenix.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.phoenix.groups</name>
        <value>*</value>
    </property>

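Otherwise you will get the following error:
org.apache.hadoop.ipc.RemoteException: User: phoenix is not allowed to impersonate anonymous

After changing the setting, restart Hadoop, then run hiveserver2 from ==hive/bin==

The jars needed in the Eclipse project can be found in the ==hive/lib== directory; the two ==slf4j== jars can be found in the Hadoop distribution.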

Test class:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // The class name was lost from the original listing; HiveJdbcTest is a placeholder.
    public class HiveJdbcTest {
        private static String driverName = "org.apache.hive.jdbc.HiveDriver";

        public static void main(String[] args) throws SQLException {
            // Load the HiveServer2 JDBC driver.
            try {
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }

            // Empty user name and password: the connection is made as "anonymous".
            try (Connection con = DriverManager.getConnection("jdbc:hive2://master:10000/default", "", "");
                 Statement stmt = con.createStatement()) {
                // Recreate the test table.
                String tableName = "loginfo";
                stmt.execute("drop table if exists " + tableName);
                stmt.execute("create table " + tableName + " (key int, value string)");
                System.out.println("Create table success!");

                // Confirm that the table is now listed.
                String sql = "show tables '" + tableName + "'";
                System.out.println("Running: " + sql);
                try (ResultSet res = stmt.executeQuery(sql)) {
                    if (res.next()) {
                        System.out.println(res.getString(1));
                    }
                }
            }
        }
    }

References:
https://www.shiyanlou.com/courses/document/766
http://blog.csdn.net/nengyu/article/details/51620760
http://www.cnblogs.com/linjiqin/archive/2013/03/04/2943025.html

Original article: 大专栏, "Hive 2.0.1 Installation and Deployment"
