The Apache Hive data warehouse software makes it easy to read, write, and manage large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and a JDBC driver are provided to connect users to Hive.
Hive is a data warehouse tool built on Hadoop: it maps structured data files onto database tables, offers simple SQL-style querying, and translates those SQL statements into MapReduce jobs for execution.
Its main advantage is the low learning curve: simple MapReduce-style statistics can be expressed quickly with SQL-like statements, with no need to develop a dedicated MapReduce application, which makes it a good fit for data-warehouse analytics.
Because the Hive metastore is accessed over JDBC, this guide uses MySQL as the backing database.
I. Prerequisites
1. Java 1.7
2. Hadoop HDFS, with a /tmp directory created and made writable:
hadoop fs -mkdir -p /tmp
hadoop fs -chmod 777 /tmp
3. A MySQL server installed
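The HDFS preparation above can be sketched as a small script. This is a dry-run sketch, under the assumptions that hadoop is already on your PATH and that the warehouse directory matches the /user/hive/warehouse path configured later in this guide:

```shell
# Sketch of the HDFS directories Hive expects (the warehouse path is an
# assumption matching the hive-site.xml configured later in this guide).
# DRY_RUN=1 (the default here) only prints the commands so you can review
# them before touching the cluster; set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run hadoop fs -mkdir -p /tmp
run hadoop fs -chmod 777 /tmp
run hadoop fs -mkdir -p /user/hive/warehouse
run hadoop fs -chmod g+w /user/hive/warehouse
```

The dry-run guard is purely a convenience for reviewing the commands; the four hadoop fs invocations are the actual preparation steps.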
II. Install MySQL
1. Download and install
This walkthrough is on macOS, so brew is used; on Linux, install with yum install (or your distribution's package manager) instead.
brew install mysql
/usr/local/Cellar/mysql/5.7.19/bin/mysqld --initialize-insecure --user=wangxinnian --basedir=/usr/local/Cellar/mysql/5.
==> Caveats
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation
2. Configure environment variables (note that by default, MySQL is configured to only allow connections from localhost)
export MYSQL_HOME=/usr/local/Cellar/mysql/5.7.19/
export PATH=$MYSQL_HOME/bin:$PATH
Remember to run source ~/.bash_profile (or source /etc/profile) afterwards.
3. Run mysql -uroot; if you can connect, the installation succeeded.
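Hive's metastore will also need a database and an account inside MySQL. The hive database name and the hadoop/hadoop credentials below are assumptions chosen to match the hive-site.xml shown later in this guide; adjust them to taste. This sketch only writes the SQL to a file so you can review it before piping it into mysql:

```shell
# Write the metastore bootstrap SQL to a file.  The database name and the
# hadoop/hadoop credentials are assumptions matching the hive-site.xml
# configured later in this guide.
cat > hive-metastore-setup.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;
CREATE USER IF NOT EXISTS 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';
GRANT ALL PRIVILEGES ON hive.* TO 'hadoop'@'localhost';
FLUSH PRIVILEGES;
SQL
# Review the file, then apply it with:  mysql -uroot < hive-metastore-setup.sql
```

(CREATE USER IF NOT EXISTS requires MySQL 5.7.6 or later, which the 5.7.19 installed above satisfies.)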
III. Install Hive
1. Install with brew (on Linux platforms, use yum install instead)
brew install hive
==> Installing dependencies for hive: hadoop
==> Installing hive dependency: hadoop
==> Using the sandbox
==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-
==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-
######################################################################## 100.0%
==> Caveats
In Hadoop's config file:
/usr/local/opt/hadoop/libexec/etc/hadoop/hadoop-env.sh,
/usr/local/opt/hadoop/libexec/etc/hadoop/mapred-env.sh and
/usr/local/opt/hadoop/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
==> Summary
🍺 /usr/local/Cellar/hadoop/2.8.0: 25,169 files, 2.1GB, built in 2 minutes 40 seconds
==> Installing hive
==> Downloading https://www.apache.org/dyn/closer.cgi?path=hive/hive-2.1.1/apach
==> Best Mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.1/apach
######################################################################## 100.0%
==> Caveats
Hadoop must be in your path for hive executable to work.
If you want to use HCatalog with Pig, set $HCAT_HOME in your profile:
export HCAT_HOME=/usr/local/opt/hive/libexec/hcatalog
==> Summary
🍺 /usr/local/Cellar/hive/2.1.1: 962 files, 148.6MB, built in 3 minutes 18 seconds
2. Configure environment variables
export HIVE_HOME=/usr/local/opt/hive/libexec
export PATH=$HIVE_HOME/bin:$PATH
Remember to source your profile afterwards.
3. Configuration
Go into the conf directory and edit hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&amp;useSSL=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
</configuration>
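One thing hive-site.xml alone does not cover: Hive ships without the MySQL JDBC driver, yet the configuration names com.mysql.jdbc.Driver, so a mysql-connector-java jar must be placed on Hive's classpath (its lib directory). A small check, with the install path and jar-name pattern as assumptions to adapt to your layout:

```shell
# Hive does not bundle the MySQL JDBC driver; the connector jar must sit in
# Hive's lib directory.  The default path and the jar-name pattern below are
# assumptions -- adjust to your install layout and connector version.
HIVE_LIB="${HIVE_HOME:-/usr/local/opt/hive/libexec}/lib"
if ls "$HIVE_LIB"/mysql-connector-java-*.jar >/dev/null 2>&1; then
  echo "MySQL JDBC driver found in $HIVE_LIB"
else
  echo "MySQL JDBC driver missing: copy a mysql-connector-java jar into $HIVE_LIB"
fi
```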
4. Run
Initialize the metastore schema first:
schematool -dbType mysql -initSchema
Then run hive; a hive> prompt means it is working.
IV. Testing
1. Prepare data
Create a student.txt file in your home directory, with tab-separated fields:
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu
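Because the table below is declared with fields terminated by '\t', the file must use real tab characters between the id and the name; spaces will make the id column load as NULL. A sketch that generates the file with printf, which guarantees literal tabs:

```shell
# Generate a tab-separated student.txt ('\t' in printf emits a real tab,
# which the table's "fields terminated by '\t'" clause requires).
printf '1001\tzhangyi\n1002\twangxinna\n1003\tzhangsan\n1005\twangwu\n' > student.txt

# Sanity check: every line should have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' student.txt && echo "student.txt OK"
```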
2. Run some Hive commands
hive> show databases;
OK
default
Time taken: 0.969 seconds, Fetched: 1 row(s)
hive> create database db_hive_test;
OK
Time taken: 0.146 seconds
hive> use db_hive_test;
OK
Time taken: 0.014 seconds
hive> create table student(id int, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.135 seconds
hive> load data local inpath '/Users/wangxinnian/student.txt' into table db_hive_test.student;
Loading data to table db_hive_test.student
OK
Time taken: 0.345 seconds
hive> select * from student;
OK
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu
Time taken: 0.86 seconds, Fetched: 4 row(s)
hive> desc formatted student;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: db_hive_test
Owner: wangxinnian
CreateTime: Mon Aug 21 12:15:54 CST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mymac:9000/user/hive/warehouse/db_hive_test.db/student
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 59
transient_lastDdlTime 1503289152
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.053 seconds, Fetched: 31 row(s)
hive>
Everything created above can be inspected through HDFS. Open the web UI at http://mymac:50070, choose Utilities -> Browse the file system, then drill down through user -> hive -> warehouse to find db_hive_test.db; inside it is the student table directory, and inside that the loaded file.
mymac:~ wangxinnian$ hadoop fs -lsr /user
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - wangxinnian supergroup 0 2017-07-25 11:54 /user/hive
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 11:54 /user/hive/log
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:40 /user/hive/warehouse
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:41 /user/hive/warehouse/db_hive_test.db
drwxrwxr-x - wangxinnian supergroup 0 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student
-rwxrwxr-x 1 wangxinnian supergroup 54 2017-07-25 12:59 /user/hive/warehouse/db_hive_test.db/student/student.txt
drwxr-xr-x - wangxinnian supergroup 0 2017-07-23 16:25 /user/wangxinnian
mymac:~ wangxinnian$ hadoop fs -cat /user/hive/warehouse/db_hive_test.db/student/student.txt
1001 zhangyi
1002 wangxinna
1003 zhangsan
1005 wangwu