Installing Hive-0.12.0-cdh5.0.1 (metastore embedded, local, and remote modes)

Overview:

      Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and lets you run simple MapReduce statistics quickly through SQL-like statements.

Components:

(1) User interfaces: mainly cli, beeline, and hiveserver2 clients (Thrift clients); these accept user tasks.

(2) Metadata store: table schemas and metadata are kept in a relational database; clients obtain the metadata through the metastore service.

(3) Interpreter, compiler, optimizer, executor: translate HQL into jobs.

(4) Hadoop: data is stored in HDFS, and query operations are translated into MapReduce jobs.

Installation:

       Hive installations differ mainly in where the metadata is stored. The modes below correspond to the different metadata storage options:

 

I. Metastore embedded mode



 

Notes:

     This mode stores metadata in an embedded Derby database, which serves a single process only: multiple clients cannot connect at once (note: multiple cli sessions cannot be started, but with hiveserver2 several beeline sessions can be opened concurrently). It is unsuitable for concurrent multi-user access.

     By default Derby persists its metadata in a metastore_db directory under whatever directory the hive command is invoked from; changing this is recommended.

Prerequisites:

     Hadoop installed (omitted here). This walkthrough uses Hadoop in pseudo-distributed mode; make sure the following processes are running:

  • namenode 
  • datanode
  • resourcemanager
  • nodemanager
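The daemon checklist above can be verified automatically. The helper below is a small illustrative sketch (not part of the original walkthrough; the function name is made up) that scans `jps` output for the four required processes:

```shell
# Illustrative helper: verify that the four required Hadoop daemons
# appear in `jps` output before starting Hive.
check_daemons() {
  # $1: the output of `jps`
  for d in NameNode DataNode ResourceManager NodeManager; do
    case "$1" in
      *"$d"*) ;;                        # daemon found, keep checking
      *) echo "missing: $d"; return 1 ;;
    esac
  done
  echo "all required daemons running"
}

# On the pseudo-distributed node itself one would run:
# check_daemons "$(jps)"
```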

Installation:

1) Unpack: tar -xvf hive-0.12.0-cdh5.0.1.tar.gz

2) Configure environment variables: vi ~/.bashrc

#Hive
export HIVE_HOME=/home/zero/hive/hive-0.12.0-cdh5.0.1
export PATH=$PATH:$HIVE_HOME/bin

 source ~/.bashrc

3) Edit the configuration files:

(1) Edit hive-env.sh to point at the Hadoop installation

cp hive-env.sh.template hive-env.sh

vi hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1

 (2) Edit hive-site.xml to set the warehouse directory, metadata location, and so on:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
     <property>
          <!-- Hive warehouse location; data is stored under the filesystem given by Hadoop's fs.default.name, so this is effectively hdfs://mycluster/user/hive/warehouse -->
          <name>hive.metastore.warehouse.dir</name>
          <value>/user/hive/warehouse</value>
     </property>
     <property>
          <!-- This property supersedes hive.metastore.local: an empty value means embedded or local mode, any other value means remote mode -->
          <name>hive.metastore.uris</name>
          <value></value>
     </property>
     <property>
          <!-- JDBC URL; with Derby this pins down the metastore location, which otherwise defaults to a path relative to the directory the hive command is run from -->
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:derby:;databaseName=/home/zero/hive/hive-0.12.0-cdh5.0.1/metastore_db;create=true</value>
     </property>
     <!-- ================================================================== -->
     <!-- ZooKeeper-based table lock manager used by HiveServer2 -->
     <property>
          <name>hive.support.concurrency</name>
          <description>Enable Hive's Table Lock Manager Service</description>
          <value>true</value>
     </property>
     <!-- ZooKeeper quorum hosts (default port) -->
     <property>
          <name>hive.zookeeper.quorum</name>
          <value>CentOS-StandAlone</value>
     </property>
     <!-- ZooKeeper client port -->
     <property>
          <name>hive.zookeeper.client.port</name>
          <value>2181</value>
     </property>
</configuration>

 (3) Edit hive-log4j.properties to set the log output path

cp hive-log4j.properties.template hive-log4j.properties

hive.root.logger=INFO,DRFA
hive.log.dir=/home/zero/hive/hive-0.12.0-cdh5.0.1/logs

  4) Start the cli service

------------------------------------------------------------------------------------------------------

 [zero@CentOS-StandAlone conf]$ hive

......

hive> create table test (id int);

OK

Time taken: 0.419 seconds

hive> show tables;

OK

test

Time taken: 0.064 seconds, Fetched: 1 row(s)

------------------------------------------------------------------------------------------------------

Note: the cli can serve only one session at a time; starting a second session fails with a metastore connection error:

------------------------------------------------------------------------------------------------------

[zero@CentOS-StandAlone ~]$ hive

......

hive> show tables;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

hive> 

------------------------------------------------------------------------------------------------------

 5) Use beeline

(1) Start hiveserver2 in the background: nohup hive --service hiveserver2 &

------------------------------------------------------------------------------------------------------

[zero@CentOS-StandAlone ~]$ nohup hive --service hiveserver2 &

[1] 28026

[zero@CentOS-StandAlone ~]$ nohup: ignoring input and appending output to `nohup.out'

^C

[zero@CentOS-StandAlone ~]$ jps -lm

28096 sun.tools.jps.Jps -lm

28026 org.apache.hadoop.util.RunJar /home/zero/hive/hive-0.12.0-cdh5.0.1/lib/hive-service-0.12.0-cdh5.0.1.jar org.apache.hive.service.server.HiveServer2

10594 org.apache.zookeeper.server.quorum.QuorumPeerMain /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg

4835 org.apache.hadoop.hdfs.server.namenode.NameNode

5239 org.apache.hadoop.yarn.server.nodemanager.NodeManager

5010 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

4889 org.apache.hadoop.hdfs.server.datanode.DataNode

------------------------------------------------------------------------------------------------------

Note: beeline depends on the Thrift service provided by hiveserver2, so hiveserver2 must be running.

       hiveserver2 listens on port 10000 by default.
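The same `jps -lm` check shown above can be scripted. This is an illustrative sketch (the function name is made up for this example) that looks for the HiveServer2 main class in the listing:

```shell
# Illustrative sketch: confirm the HiveServer2 JVM is up by looking for
# its main class in `jps -lm` output (as seen in the listing above).
hs2_running() {
  # $1: output of `jps -lm`
  case "$1" in
    *org.apache.hive.service.server.HiveServer2*) echo "hiveserver2 up"; return 0 ;;
    *) echo "hiveserver2 down"; return 1 ;;
  esac
}

# Typical usage:
# hs2_running "$(jps -lm)"
```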

(2) Start beeline:

------------------------------------------------------------------------------------------------------

[zero@CentOS-StandAlone conf]$ beeline

Beeline version 0.12.0-cdh5.0.1 by Apache Hive

beeline> !connect jdbc:hive2://localhost:10000

scan complete in 6ms

Connecting to jdbc:hive2://localhost:10000

Enter username for jdbc:hive2://localhost:10000: zero   (Note: enter the Hadoop user here, otherwise the session has no HDFS permissions; there is no password, so just press Enter.)

Enter password for jdbc:hive2://localhost:10000: 

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/zero/hive/hive-0.12.0-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Connected to: Apache Hive (version 0.12.0-cdh5.0.1)

Driver: Hive JDBC (version 0.12.0-cdh5.0.1)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://localhost:10000> show tables;

+-----------+

| tab_name  |

+-----------+

| test      |

+-----------+

1 row selected (2.679 seconds)

------------------------------------------------------------------------------------------------------

Note: beeline connects to hiveserver2, and hiveserver2 accesses the metastore through its API. It appears that multiple beeline sessions can run side by side without interfering, since there is only a single hiveserver2 service.

II. Metastore local mode

 

Notes:

       The biggest difference from embedded mode is that the database changes from being embedded in the Hive service to a standalone deployment. The Hive service accesses the metadata over JDBC, so multiple services can access it at the same time.

Installation:

1) Install mysql (omitted)

2) Install the mysql connector: download mysql-connector-java-5.1.31.jar and place it in $HIVE_HOME/lib
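Before moving on, it may help to confirm the connector jar actually landed in `$HIVE_HOME/lib`. A minimal sketch, assuming the paths from the embedded-mode install; the helper name is hypothetical:

```shell
# Illustrative check (not in the original article) that the MySQL
# connector jar is in Hive's lib directory. Falls back to the install
# path used throughout this walkthrough if HIVE_HOME is not set.
HIVE_LIB="${HIVE_HOME:-/home/zero/hive/hive-0.12.0-cdh5.0.1}/lib"

jar_present() {
  # $1: lib directory, $2: jar name pattern
  ls "$1" 2>/dev/null | grep -q "$2"
}

if jar_present "$HIVE_LIB" 'mysql-connector-java-.*\.jar'; then
  echo "connector found in $HIVE_LIB"
else
  echo "connector missing: copy mysql-connector-java-5.1.31.jar into $HIVE_LIB"
fi
```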

3) Create the database and user

(1) Create the database:

------------------------------------------------------------------------------------------------------

[zero@CentOS-StandAlone ~]$ mysql -u root -p -h 127.0.0.1

Enter password:

......

mysql> CREATE DATABASE metastore;

Query OK, 1 row affected (0.11 sec)

 

mysql> USE metastore;

Database changed

 

mysql> SOURCE /home/zero/hive/hive-0.12.0-cdh5.0.1/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql;

------------------------------------------------------------------------------------------------------

 (2) Create the hive user and grant privileges

------------------------------------------------------------------------------------------------------

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY '1234_qwer';

Query OK, 0 rows affected (0.05 sec)

 

mysql> CREATE USER 'hive'@'127.0.0.1' IDENTIFIED BY '1234_qwer';

Query OK, 0 rows affected (0.04 sec)

 

mysql> CREATE USER 'hive'@'CentOS-StandAlone' IDENTIFIED BY '1234_qwer';

Query OK, 0 rows affected (0.01 sec)

 

mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'%';

Query OK, 0 rows affected (0.00 sec)

 

mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'%';

Query OK, 0 rows affected (0.03 sec)

 

mysql> FLUSH PRIVILEGES;

Query OK, 0 rows affected (0.02 sec)

------------------------------------------------------------------------------------------------------

4) Unpack: same as above

5) Configure environment variables (same as above)

6) Edit the configuration files:

(1) Edit hive-env.sh (same as above)

(2) Edit hive-site.xml

<?xml version="1.0"?>                                                                                                                                                  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <!-- Hive warehouse location; data is stored under the filesystem given by Hadoop's fs.default.name, so this is effectively hdfs://mycluster/user/hive/warehouse -->
                <name>hive.metastore.warehouse.dir</name>                                                                                       
                <value>/user/hive/warehouse</value>
        </property>
        <property>
                <!-- This property supersedes hive.metastore.local: an empty value means embedded or local mode, any other value means remote mode -->
                <name>hive.metastore.uris</name>
                <value></value>
        </property>
        <!-- ================================================================== -->
        <!-- MySQL connection -->
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://CentOS-StandAlone:3306/metastore</value>
                <description>the URL of the MySQL database</description>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>hive</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>1234_qwer</value>
        </property>
        <property>
                <name>datanucleus.autoCreateSchema</name>
                <value>false</value>
        </property>
        <property>
                <name>datanucleus.fixedDatastore</name>
                <value>true</value>
        </property>
        <property>
                <name>datanucleus.autoStartMechanism</name>
                <value>SchemaTable</value>
        </property>
        <!-- ================================================================== -->
        <!-- ZooKeeper-based table lock manager used by HiveServer2 -->
        <property>
                <name>hive.support.concurrency</name>
                <description>Enable Hive's Table Lock Manager Service</description>
                <value>true</value>
        </property>
        <!-- ZooKeeper quorum hosts (default port) -->
        <property>
                <name>hive.zookeeper.quorum</name>
                <value>CentOS-StandAlone</value>
        </property>
        <!-- ZooKeeper client port -->
        <property>
                <name>hive.zookeeper.client.port</name>
                <value>2181</value>
        </property>
</configuration>

 7) Startup and usage are the same as above

III. Metastore remote mode



 Notes:

       In remote mode, the metastore service that was originally embedded in the Hive service runs as a separate process, and the Hive service accesses it over Thrift. This mode makes it possible to control the connections to the database centrally.

 Deployment plan:

(1) Metadata server: runs the metastore service and mysql

(2) hiveserver server: runs the hiveserver2 service, which accesses the metastore over Thrift

(3) Client server: runs the hive client scripts; access hiveserver2 via cli, beeline, or Thrift directly

 Installation:

    The process is the same as local mode; just add the metastore access address below to the local-mode hive-site.xml, and deploy the pieces on their separate servers.

 

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://CentOS-StandAlone:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
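To confirm the standalone metastore is reachable from the hiveserver node, the thrift URI above can be split into host and port and then probed. The helper functions below are illustrative (the `nc` probe is left commented out since it needs the live service):

```shell
# Illustrative helpers: split the hive.metastore.uris value into host
# and port so the metastore endpoint can be probed.
uri_host() { echo "$1" | sed 's|^thrift://||; s|:[0-9]*$||'; }
uri_port() { echo "$1" | sed 's|^.*:||'; }

uri="thrift://CentOS-StandAlone:9083"
echo "$(uri_host "$uri") $(uri_port "$uri")"
# On a real deployment one could then probe the port, e.g.:
# nc -z "$(uri_host "$uri")" "$(uri_port "$uri")"
```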
Startup:

 (1) Metadata server:

service mysql start

nohup hive --service metastore &

(2) hiveserver server:

nohup hive --service hiveserver2 &

(3) Client server:

Script: test.sql

 

!connect jdbc:hive2://CentOS-StandAlone:10000
zero

show tables;
 where
line 1: the connection
line 2: the username
line 3: the password (left blank here)
line 4: the command
Run:
------------------------------------------------------------------------------------------------------
[zero@CentOS-StandAlone test]$ beeline -f test.sql
Beeline version 0.12.0-cdh5.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
scan complete in 5ms
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: zero
Enter password for jdbc:hive2://localhost:10000:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zero/hive/hive-0.12.0-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name  |
+-----------+
| test      |
| test2     |
+-----------+
2 rows selected (0.583 seconds)
0: jdbc:hive2://localhost:10000> Closing: org.apache.hive.jdbc.HiveConnection
------------------------------------------------------------------------------------------------------
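As an alternative to the four-line script file, beeline also accepts the connection URL, username, and query directly on the command line (-u, -n, -e). The sketch below only assembles such a command, since executing it requires a live hiveserver2; the helper name is made up for this example:

```shell
# Sketch: assemble an equivalent non-interactive beeline invocation
# instead of using the four-line script file.
build_beeline_cmd() {
  # $1: JDBC URL, $2: username, $3: HiveQL to run
  printf "beeline -u %s -n %s -e '%s'\n" "$1" "$2" "$3"
}

build_beeline_cmd "jdbc:hive2://CentOS-StandAlone:10000" zero "show tables;"
```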

Reposted from shihlei.iteye.com/blog/2089174