The Apache Hive data warehouse software makes it easy to read, write, and manage large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and a JDBC driver are provided to connect users to Hive.
Hive is a data warehouse tool built on Hadoop: it maps structured data files onto database tables, offers simple SQL-style querying, and translates those SQL statements into MapReduce jobs for execution.
Its main advantage is the low learning curve: simple MapReduce-style statistics can be expressed quickly with SQL-like statements, with no need to develop a dedicated MapReduce application, which makes it a good fit for data-warehouse analytics.
Because the Hive metastore is accessed over JDBC, this guide uses MySQL as the backing database.
I. Prerequisites
1. Java 1.7
2. Hadoop HDFS, with a /tmp directory created and made writable:
hadoop fs -mkdir -p /tmp
hadoop fs -chmod 777 /tmp
3. A MySQL server installed
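The HDFS preparation above can be sketched as a small script. This is a dry-run sketch, under the assumptions that hadoop is already on your PATH and that the warehouse directory matches the /user/hive/warehouse path configured later in this guide:

```shell
# Sketch of the HDFS directories Hive expects (the warehouse path is an
# assumption matching the hive-site.xml configured later in this guide).
# DRY_RUN=1 (the default here) only prints the commands so you can review
# them before touching the cluster; set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run hadoop fs -mkdir -p /tmp
run hadoop fs -chmod 777 /tmp
run hadoop fs -mkdir -p /user/hive/warehouse
run hadoop fs -chmod g+w /user/hive/warehouse
```

The dry-run guard is purely a convenience for reviewing the commands; the four hadoop fs invocations are the actual preparation steps.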
II. Install MySQL
1. Download and install
This walkthrough is on macOS, so brew is used; on Linux, install with yum install (or your distribution's package manager) instead.
brew install mysql
/usr/local/Cellar/mysql/5.7.19/bin/mysqld --initialize-insecure --user=wangxinnian --basedir=/usr/local/Cellar/mysql/5.
==> Caveats
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation
2. Configure environment variables (note that by default, MySQL is configured to only allow connections from localhost)
export MYSQL_HOME=/usr/local/Cellar/mysql/5.7.19/
export PATH=$MYSQL_HOME/bin:$PATH
Remember to run source ~/.bash_profile (or source /etc/profile) afterwards.
3. Run mysql -uroot; if you can connect, the installation succeeded.
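Hive's metastore will also need a database and an account inside MySQL. The hive database name and the hadoop/hadoop credentials below are assumptions chosen to match the hive-site.xml shown later in this guide; adjust them to taste. This sketch only writes the SQL to a file so you can review it before piping it into mysql:

```shell
# Write the metastore bootstrap SQL to a file.  The database name and the
# hadoop/hadoop credentials are assumptions matching the hive-site.xml
# configured later in this guide.
cat > hive-metastore-setup.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;
CREATE USER IF NOT EXISTS 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';
GRANT ALL PRIVILEGES ON hive.* TO 'hadoop'@'localhost';
FLUSH PRIVILEGES;
SQL
# Review the file, then apply it with:  mysql -uroot < hive-metastore-setup.sql
```

(CREATE USER IF NOT EXISTS requires MySQL 5.7.6 or later, which the 5.7.19 installed above satisfies.)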
III. Install Hive
1. Install with brew (on Linux platforms, use yum install instead)
brew install hive
==> Installing dependencies for hive: hadoop
==> Installing hive dependency: hadoop
==> Using the sandbox
==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-
==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-
######################################################################## 100.0%
==> Caveats
In Hadoop's config file:
/usr/local/opt/hadoop/libexec/etc/hadoop/hadoop-env.sh,
/usr/local/opt/hadoop/libexec/etc/hadoop/mapred-env.sh and
/usr/local/opt/hadoop/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
==> Summary
🍺 /usr/local/Cellar/hadoop/2.8.0: 25,169 files, 2.1GB, built in 2 minutes 40 seconds
==> Installing hive
==> Downloading https://www.apache.org/dyn/closer.cgi?path=hive/hive-2.1.1/apach
==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.1/apach
######################################################################## 100.0%
==> Caveats
Hadoop must be in your path for hive executable to work.
If you want to use HCatalog with Pig, set $HCAT_HOME in your profile:
export HCAT_HOME=/usr/local/opt/hive/libexec/hcatalog
==> Summary
🍺 /usr/local/Cellar/hive/2.1.1: 962 files, 148.6MB, built in 3 minutes 18 seconds
2. Configure environment variables
export HIVE_HOME=/usr/local/opt/hive/libexec
export PATH=$HIVE_HOME/bin:$PATH
Remember to source your profile afterwards.
3. Configuration
Go into the conf directory and edit hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&amp;useSSL=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
</configuration>
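One thing hive-site.xml alone does not cover: Hive ships without the MySQL JDBC driver, yet the configuration names com.mysql.jdbc.Driver, so a mysql-connector-java jar must be placed on Hive's classpath (its lib directory). A small check, with the install path and jar-name pattern as assumptions to adapt to your layout:

```shell
# Hive does not bundle the MySQL JDBC driver; the connector jar must sit in
# Hive's lib directory.  The default path and the jar-name pattern below are
# assumptions -- adjust to your install layout and connector version.
HIVE_LIB="${HIVE_HOME:-/usr/local/opt/hive/libexec}/lib"
if ls "$HIVE_LIB"/mysql-connector-java-*.jar >/dev/null 2>&1; then
  echo "MySQL JDBC driver found in $HIVE_LIB"
else
  echo "MySQL JDBC driver missing: copy a mysql-connector-java jar into $HIVE_LIB"
fi
```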
4. Run
Initialize the metastore schema first:
schematool -dbType mysql -initSchema
Then run hive; a hive> prompt means it is working.
IV. Testing
1. Prepare data
Create a student.txt file in your home directory, with tab-separated fields:
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu
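Because the table below is declared with fields terminated by '\t', the file must use real tab characters between the id and the name; spaces will make the id column load as NULL. A sketch that generates the file with printf, which guarantees literal tabs:

```shell
# Generate a tab-separated student.txt ('\t' in printf emits a real tab,
# which the table's "fields terminated by '\t'" clause requires).
printf '1001\tzhangyi\n1002\twangxinna\n1003\tzhangsan\n1005\twangwu\n' > student.txt

# Sanity check: every line should have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' student.txt && echo "student.txt OK"
```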
2. Run some Hive commands
hive> show databases;
OK
default
Time taken: 0.969 seconds, Fetched: 1 row(s)
hive> create database db_hive_test;
OK
Time taken: 0.146 seconds
hive> use db_hive_test;
OK
Time taken: 0.014 seconds
hive> create table student(id int, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.135 seconds
hive> load data local inpath '/Users/wangxinnian/student.txt' into table db_hive_test.student;
Loading data to table db_hive_test.student
OK
Time taken: 0.345 seconds
hive> select * from student;
OK
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu
Time taken: 0.86 seconds, Fetched: 4 row(s)
hive> desc formatted student;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: db_hive_test
Owner: wangxinnian
CreateTime: Mon Aug 21 12:15:54 CST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mymac:9000/user/hive/warehouse/db_hive_test.db/student
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 59
transient_lastDdlTime 1503289152
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.053 seconds, Fetched: 31 row(s)
hive>
Everything created above can be inspected through HDFS. Open the web UI at http://mymac:50070, choose Utilities -> Browse the file system, then drill down through user -> hive -> warehouse to find db_hive_test.db; inside it is the student table directory, and inside that the loaded file.
mymac:~ wangxinnian$ hadoop fs -lsr /user
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - wangxinnian supergroup 0 2017-07-25 11:54 /user/hive
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 11:54 /user/hive/log
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:40 /user/hive/warehouse
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:41 /user/hive/warehouse/db_hive_test.db
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student
-rwxrwxr-x 1 wangxinnian supergroup 54 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student/student.txt
drwxr-xr-x - wangxinnian supergroup 0 2017-07-23 16:25 /user/wangxinnian
mymac:~ wangxinnian$ hadoop fs -cat /user/hive/warehouse/db_hive_test.db/student/student.txt
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu