Big Data Ecosystem Basics: Hive (1): Introduction, Installation, and Configuration

       The Apache Hive data warehouse software makes it easy to read, write, and manage large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and a JDBC driver are provided to connect users to Hive.

    Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides simple SQL-like querying, and translates those SQL statements into MapReduce jobs for execution.

     Its main advantage is a low learning curve: simple MapReduce statistics can be produced quickly with SQL-like statements, with no need to develop dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.


      Because the metastore is accessed over JDBC, MySQL is used as the backing database here.

I. Prerequisites

     1. Java 1.7

     2. Hadoop with HDFS running; create a /tmp directory in HDFS and open up its permissions:

          hadoop fs -mkdir /tmp
          hadoop fs -chmod 777 /tmp

     3. A MySQL server

II. Installing MySQL

        1. Download and install

        This walkthrough is on macOS, so brew is used directly; on Linux, install with yum install (or your distribution's package manager) instead.

           

brew install mysql

/usr/local/Cellar/mysql/5.7.19/bin/mysqld --initialize-insecure --user=wangxinnian --basedir=/usr/local/Cellar/mysql/5.

==> Caveats

We've installed your MySQL database without a root password. To secure it run:

    mysql_secure_installation

MySQL is configured to only allow connections from localhost by default



2. Configuration

 export MYSQL_HOME=/usr/local/Cellar/mysql/5.7.19/
 export PATH=$MYSQL_HOME/bin:$PATH

 Remember to run source ~/.bash_profile (or source /etc/profile) afterwards so the new PATH takes effect.

         3. Start MySQL:

               mysql.server start

         4. Run mysql -uroot; if it connects, the installation succeeded.
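The hive-site.xml used later connects to the metastore as a hadoop user with password hadoop against a database named hive; none of these exist on a fresh MySQL install, so create them first. A minimal sketch, where the database name, user, and password are taken from that config and can be changed as long as both places agree:

```shell
# Create the metastore database and the account that hive-site.xml refers to.
# 'hive' must match the database in the JDBC URL, and 'hadoop'/'hadoop'
# must match ConnectionUserName/ConnectionPassword.
mysql -uroot <<'SQL'
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;
CREATE USER IF NOT EXISTS 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';
GRANT ALL PRIVILEGES ON hive.* TO 'hadoop'@'localhost';
FLUSH PRIVILEGES;
SQL
```

This is a configuration step against the freshly installed server, so run it before initializing the Hive schema.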


III. Installing Hive

       1. Install with brew; on other Linux platforms, use yum install.

          

brew install hive

==> Installing dependencies for hive: hadoop

==> Installing hive dependency: hadoop

==> Using the sandbox

==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-

==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-

######################################################################## 100.0%

==> Caveats

In Hadoop's config file:

  /usr/local/opt/hadoop/libexec/etc/hadoop/hadoop-env.sh,

  /usr/local/opt/hadoop/libexec/etc/hadoop/mapred-env.sh and

  /usr/local/opt/hadoop/libexec/etc/hadoop/yarn-env.sh

$JAVA_HOME has been set to be the output of:

  /usr/libexec/java_home

==> Summary

🍺  /usr/local/Cellar/hadoop/2.8.0: 25,169 files, 2.1GB, built in 2 minutes 40 seconds

==> Installing hive

==> Downloading https://www.apache.org/dyn/closer.cgi?path=hive/hive-2.1.1/apach

==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.1/apach

######################################################################## 100.0%

==> Caveats

Hadoop must be in your path for hive executable to work.


If you want to use HCatalog with Pig, set $HCAT_HOME in your profile:

  export HCAT_HOME=/usr/local/opt/hive/libexec/hcatalog

==> Summary

🍺  /usr/local/Cellar/hive/2.1.1: 962 files, 148.6MB, built in 3 minutes 18 seconds

             2. Set up the runtime environment

   # Hive environment

  export HIVE_HOME=/usr/local/opt/hive/libexec

  export PATH=$HIVE_HOME/bin:$PATH
  Remember to source the profile again afterwards.

3. Configuration

      Go into the conf directory and edit hive-site.xml:

   

   <?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
      <name>hive.metastore.local</name>
      <value>true</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&amp;useSSL=true</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hadoop</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hadoop</value>
   </property>


    <property>
       <name>hive.exec.scratchdir</name>
       <value>/tmp</value>
       <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
     </property>
    <property>
       <name>hive.metastore.warehouse.dir</name>
       <value>/user/hive/warehouse</value>
       <description>location of default database for the warehouse</description>
    </property>
    <property>
       <name>hive.querylog.location</name>
       <value>/user/hive/log</value>
       <description>Location of Hive run time structured log file</description>
    </property>


   <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/tmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>


</configuration>
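One thing brew does not install for you: Hive ships without the MySQL JDBC driver that ConnectionDriverName above refers to, so the connector jar must be placed on Hive's classpath by hand. A sketch, where the jar filename is an assumption (use whichever mysql-connector-java version you downloaded):

```shell
# Put the MySQL JDBC connector where Hive can load com.mysql.jdbc.Driver.
# The version in the filename below is an example, not a requirement.
cp mysql-connector-java-5.1.44-bin.jar /usr/local/opt/hive/libexec/lib/
```

Without this jar, the schema initialization in the next step fails with a "driver class not found" error.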

4. Run

      Initialize the metastore schema: schematool -dbType mysql -initSchema

      Then run hive; a hive> prompt means everything is working.

IV. Testing

      1. Prepare the data

      Create a student.txt file in your home directory. The fields are separated by a tab character (rendered as spaces below):

          1001 zhangyi
          1002 wangxinna
          1003 zhangsan
          1005 wangwu
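Since the table definition below uses '\t' as the field terminator, the separators in student.txt must be real tab characters, which are easily lost when copy-pasting. A quick way to generate the file and check the delimiters, with the path under $HOME as an assumption matching the load command used later:

```shell
# Write the four sample rows with explicit tab separators, then verify
# that every line splits into exactly two tab-delimited fields.
printf '%s\t%s\n' 1001 zhangyi 1002 wangxinna 1003 zhangsan 1005 wangwu > "$HOME/student.txt"
awk -F'\t' 'NF != 2 { exit 1 }' "$HOME/student.txt" && echo "student.txt looks good"
```

If the awk check fails, the load into Hive will still succeed, but queries will return NULL columns because the rows did not split on tabs.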

     2. Try it out

 hive> show databases;

OK

default

Time taken: 0.969 seconds, Fetched: 1 row(s)

hive> create database db_hive_test;

OK

Time taken: 0.146 seconds

hive> use db_hive_test;

OK

Time taken: 0.014 seconds

hive> create table student(id int, name string) row format delimited fields terminated by '\t';

OK

Time taken: 0.135 seconds

hive> load data local inpath '/Users/wangxinnian/student.txt' into table db_hive_test.student;

Loading data to table db_hive_test.student

OK

Time taken: 0.345 seconds

hive> select * from student;

OK

1001 zhangyi

1002 wangxinna

1003 zhangsan

1005 wangwu

Time taken: 0.86 seconds, Fetched: 5 row(s)

hive> desc formatted student;

OK

# col_name             data_type          comment

id                   int

name                 string

# Detailed Table Information

Database:           db_hive_test

Owner:               wangxinnian

CreateTime:         Mon Aug 21 12:15:54 CST 2017

LastAccessTime:     UNKNOWN

Retention:           0

Location:           hdfs://mymac:9000/user/hive/warehouse/db_hive_test.db/student

Table Type:         MANAGED_TABLE

Table Parameters:

numFiles            1

numRows             0

rawDataSize         0

totalSize           59

transient_lastDdlTime 1503289152



# Storage Information

SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

InputFormat:         org.apache.hadoop.mapred.TextInputFormat

OutputFormat:       org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Compressed:         No

Num Buckets:         -1

Bucket Columns:     []

Sort Columns:       []

Storage Desc Params:

field.delim         \t

serialization.format \t

Time taken: 0.053 seconds, Fetched: 31 row(s)

hive>
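Because Hive simply maps files under the warehouse directory onto tables, the select above can be mimicked on the raw tab-delimited file with ordinary command-line tools. A small illustration that recreates the sample data locally (under /tmp, an arbitrary choice) and runs the equivalent of select name from student where id > 1002:

```shell
# Recreate the tab-delimited sample file, then filter it the way
# "select name from student where id > 1002" would.
printf '%s\t%s\n' 1001 zhangyi 1002 wangxinna 1003 zhangsan 1005 wangwu > /tmp/student.txt
awk -F'\t' '$1 > 1002 { print $2 }' /tmp/student.txt
# prints:
# zhangsan
# wangwu
```

This is only an illustration of the file-to-table mapping; Hive itself runs the equivalent filter as a MapReduce job over the copy stored in HDFS.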

         All of the above can be verified in HDFS. In the NameNode web UI at http://mymac:50070, choose Utilities -> Browse the file system, then navigate into user, then hive, then warehouse; you will see db_hive_test.db, inside it the student table directory, and inside that the loaded file.

         

You can also inspect it with the fs command:

mymac:~ wangxinnian$ hadoop fs -lsr /user

lsr: DEPRECATED: Please use 'ls -R' instead.

drwxr-xr-x   - wangxinnian supergroup          0 2017-07-25 11:54 /user/hive

drwxrwxr-x   - wangxinnian supergroup          0 2017-07-25 11:54 /user/hive/log

drwxrwxr-x   - wangxinnian supergroup          0 2017-07-25 12:40 /user/hive/warehouse

drwxrwxr-x   - wangxinnian supergroup          0 2017-07-25 12:41 /user/hive/warehouse/db_hive_test.db

drwxrwxr-x   - wangxinnian supergroup          0 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student

-rwxrwxr-x   1 wangxinnian supergroup         54 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student/student.txt

drwxr-xr-x   - wangxinnian supergroup          0 2017-07-23 16:25 /user/wangxinnian

mymac:~ wangxinnian$ hadoop fs -cat /user/hive/warehouse/db_hive_test.db/student/student.txt

1001 zhangyi

1002 wangxinna

1003 zhangsan

1005 wangwu






Reprinted from blog.csdn.net/caridle/article/details/77447921