HIVE Quick Start: [Getting Started]

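1/ What is HIVE?
HIVE is a tool that translates SQL into MapReduce (MR) programs.
HIVE lets users map files on HDFS to table structures; users can then write SQL to query and analyze these tables (that is, the files on HDFS).
HIVE stores metadata such as user-defined databases and table structures in its metastore database, which can be a local Derby database or a remote MySQL database.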
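To make "mapping a file to a table" concrete, here is a minimal Python sketch of schema-on-read. It is not HIVE's actual reader (HIVE delegates this to a SerDe). It assumes a text file whose fields are separated by '\001' (Ctrl-A, HIVE's default field delimiter) and a hypothetical schema with the same columns as the t_test1 table used later. Like HIVE's default text handling, it yields NULL (None) for a field that is missing or cannot be converted to the column type.

```python
from typing import Callable, Dict, List, Optional, Tuple

FIELD_DELIM = "\001"  # Ctrl-A, HIVE's default field delimiter for text files

# Hypothetical schema with the columns of t_test1: (column name, converter).
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("name", str),
    ("age", int),
    ("create_time", int),  # bigint maps to a Python int
]


def parse_line(line: str, schema=SCHEMA, delim: str = FIELD_DELIM) -> Dict[str, Optional[object]]:
    """Map one line of a data file onto the table schema at read time."""
    raw = line.rstrip("\n").split(delim)
    row: Dict[str, Optional[object]] = {}
    for i, (col, convert) in enumerate(schema):
        if i >= len(raw):
            row[col] = None  # missing trailing field -> NULL
            continue
        try:
            row[col] = convert(raw[i])
        except ValueError:
            row[col] = None  # value not convertible to the column type -> NULL
    return row
```

The file itself never changes; only the schema applied when reading it decides how each line is interpreted.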

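2/ What is HIVE used for?
It frees big-data analysts from writing large numbers of MR programs to analyze data; writing SQL scripts is enough.
HIVE can be used to build a data warehouse in a big-data stack.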
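For a sense of what HIVE saves you from writing, here is a toy MapReduce job in plain Python that computes the equivalent of select country, count(*) from t group by country. It only mimics the map, shuffle and reduce phases in memory on hypothetical '\001'-delimited input lines; the plan HIVE actually generates and runs on a Hadoop cluster is more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str], key_index: int, delim: str = "\001") -> Iterator[Tuple[str, int]]:
    # Mapper: emit (group key, 1) for every input record.
    for line in lines:
        fields = line.rstrip("\n").split(delim)
        yield fields[key_index], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: collect all values emitted for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: sum the ones per key, which is exactly count(*).
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(lines: Iterable[str], key_index: int) -> Dict[str, int]:
    """Equivalent of: select <col>, count(*) from t group by <col>"""
    return reduce_phase(shuffle(map_phase(lines, key_index)))
```

In HIVE the same result is a one-line query; the three phases above are what it spares you from writing by hand.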

3/ How is HIVE used?
Method 1: interactive queries:
** Run bin/hive, then type queries at the prompt: hive> select * from t_test;

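** Start HIVE as a service with bin/hiveserver2; then, from any machine, connect to the service with the beeline client (for example bin/beeline -u jdbc:hive2://<host>:10000, where 10000 is HiveServer2's default port) and query interactively.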

Method 2: run HIVE as a one-shot command:
** bin/hive -e "sql1;sql2;sql3;sql4"
** Write the SQL statements to a file in advance, e.g. q.hql, then execute it with: bin/hive -f q.hql


Method 3: put the method 2 commands into a shell script (xxx.sh) and run the script.
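Methods 2 and 3 both come down to building a hive command line and running it. The Python sketch below is an alternative to a plain .sh script; the helper names are hypothetical, and it assumes it runs from the HIVE install directory so that bin/hive resolves. It builds the hive -e and hive -f command lines, renders the hive -e call as a small shell script (method 3), and can execute a command through subprocess.

```python
import shlex
import subprocess
from typing import List, Sequence

HIVE_BIN = "bin/hive"  # assumption: current directory is the HIVE install directory


def hive_e_cmd(statements: Sequence[str]) -> List[str]:
    # Method 2a: join the statements with ';' and pass them to `hive -e`.
    sql = ";".join(s.strip().rstrip(";") for s in statements)
    return [HIVE_BIN, "-e", sql]


def hive_f_cmd(script_path: str) -> List[str]:
    # Method 2b: execute a prepared .hql file with `hive -f`.
    return [HIVE_BIN, "-f", script_path]


def render_sh(statements: Sequence[str]) -> str:
    # Method 3: the same `hive -e` call written out as a shell script (xxx.sh).
    return "#!/bin/bash\n" + shlex.join(hive_e_cmd(statements)) + "\n"


def run(cmd: List[str]) -> str:
    # Run the command and return its stdout; raises CalledProcessError on a non-zero exit.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, run(hive_f_cmd("q.hql")) is the programmatic form of bin/hive -f q.hql. Passing an argument list rather than one string avoids shell quoting problems with the SQL text.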


4/ HIVE DDL syntax
Create a database: create database db1; ---> HIVE creates a directory db1.db under /user/hive/warehouse/
Create an internal (managed) table:
use db1;
create table t_test1(id int,name string,age int,create_time bigint)
row format delimited
fields terminated by '\001';

After the table is created, HIVE creates a table directory in the warehouse: /user/hive/warehouse/db1.db/t_test1
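The directory naming described above fits in a few lines. This Python sketch assumes the default warehouse root /user/hive/warehouse (configurable through hive.metastore.warehouse.dir). It also reflects two HIVE conventions: tables in the built-in default database sit directly under the warehouse root, and database and table names are stored in lower case. The function names are hypothetical.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # default value of hive.metastore.warehouse.dir


def database_dir(db: str, warehouse: str = WAREHOUSE) -> str:
    # The built-in `default` database uses the warehouse root itself;
    # any other database gets a "<name>.db" subdirectory.
    if db.lower() == "default":
        return warehouse
    return posixpath.join(warehouse, f"{db.lower()}.db")


def table_dir(db: str, table: str, warehouse: str = WAREHOUSE) -> str:
    # An internal (managed) table lives in a subdirectory of its database directory.
    return posixpath.join(database_dir(db, warehouse), table.lower())
```

For instance, table_dir("db1", "t_test1") yields the directory named in the text above.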

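Create an external table (its data stays in the directory given by location; dropping an external table removes only its metadata, whereas dropping an internal table also deletes the table directory and its data):
create external table t_test2(id int,name string,age int,create_time bigint)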
row format delimited
fields terminated by '\001'
location '/external/t_test';
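The practical difference between internal and external tables shows up on drop table: for an internal table HIVE deletes the table directory along with the metadata, while for an external table only the metadata goes and the files under location are left alone. The sketch below models this with a hypothetical in-memory dict standing in for the metastore and local temporary directories standing in for HDFS; it is an illustration, not HIVE code.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TableMeta:
    location: str   # directory holding the table's data files
    external: bool  # True for tables created with `create external table`


# Hypothetical stand-in for the metastore: table name -> metadata.
metastore: Dict[str, TableMeta] = {}


def create_table(name: str, location: str, external: bool = False) -> None:
    os.makedirs(location, exist_ok=True)
    metastore[name] = TableMeta(location, external)


def drop_table(name: str) -> None:
    meta = metastore.pop(name)        # the metadata is removed in both cases
    if not meta.external:
        shutil.rmtree(meta.location)  # internal table: the data goes too


def demo() -> Tuple[bool, bool]:
    root = tempfile.mkdtemp()
    internal_dir = os.path.join(root, "warehouse", "db1.db", "t_test1")
    external_dir = os.path.join(root, "external", "t_test")
    create_table("t_test1", internal_dir)
    create_table("t_ext", external_dir, external=True)  # hypothetical table name
    drop_table("t_test1")
    drop_table("t_ext")
    # Which directories survive the two drops?
    return os.path.exists(internal_dir), os.path.exists(external_dir)
```

This is why external tables suit data that other jobs also read or write: dropping the table cannot destroy the shared files.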

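Loading data:

In essence, loading data just means placing data files into the table directory.
This can be done with a HIVE command:
hive> load data [local] inpath '/data/path' [overwrite] into table t_test1;
With local, the file is copied from the local file system of the machine running the client; without local, the path refers to HDFS and the file is moved into the table directory. overwrite replaces the table's existing data instead of appending to it.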
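Since loading is essentially a file operation, its semantics can be modeled on a local file system. This is a toy model, not HIVE's implementation: local directories stand in for HDFS, the function name load_data is hypothetical, and details such as renaming files whose names collide are omitted.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


def load_data(src: str, table_dir: str, local: bool = False, overwrite: bool = False) -> str:
    """Toy model of: load data [local] inpath 'src' [overwrite] into table ..."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # overwrite: delete the table's existing files first
        for name in os.listdir(table_dir):
            path = os.path.join(table_dir, name)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    dest = os.path.join(table_dir, os.path.basename(src))
    if local:
        shutil.copy(src, dest)  # local: copy from the client machine's file system
    else:
        shutil.move(src, dest)  # no local: the file (already on HDFS) is moved
    return dest


def demo() -> Tuple[bool, bool, List[str]]:
    root = tempfile.mkdtemp()
    table = os.path.join(root, "warehouse", "t_test1")
    local_file = os.path.join(root, "a.txt")
    hdfs_file = os.path.join(root, "b.txt")
    for path in (local_file, hdfs_file):
        with open(path, "w") as f:
            f.write("1\n")
    load_data(local_file, table, local=True)      # a.txt is copied in
    load_data(hdfs_file, table, overwrite=True)   # a.txt is cleared, b.txt moved in
    return os.path.exists(local_file), os.path.exists(hdfs_file), sorted(os.listdir(table))
```

After demo() the local source file still exists, the "HDFS" source has been moved away, and the table directory holds only b.txt.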

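** Create a partitioned table:
Partitioning stores the data in separate subdirectories, so that a query filtering on the partition columns reads only the matching directories instead of the whole table. (This example reuses the name t_test1; drop the earlier t_test1 first, or create this table in another database.)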
create table t_test1(id int,name string,age int,create_time bigint)
partitioned by (day string,country string)
row format delimited
fields terminated by '\001';

Load data into specific partitions:
hive> load data [local] inpath '/data/path1' [overwrite] into table t_test1 partition(day='2017-06-04',country='China');
hive> load data [local] inpath '/data/path2' [overwrite] into table t_test1 partition(day='2017-06-05',country='China');
hive> load data [local] inpath '/data/path3' [overwrite] into table t_test1 partition(day='2017-06-04',country='England');
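Partition column values are not stored inside the data files; they are encoded in the directory names, with one col=value level per partition column in the declared order. A minimal sketch of that mapping (hypothetical helper; HIVE additionally escapes special characters in partition values, which is omitted here):

```python
import posixpath
from typing import Dict


def partition_dir(table_dir: str, spec: Dict[str, str]) -> str:
    # One "col=value" directory level per partition column, in declared order.
    levels = [f"{col}={value}" for col, value in spec.items()]
    return posixpath.join(table_dir, *levels)
```

For example, partition_dir("/user/hive/warehouse/db1.db/t_test1", {"day": "2017-06-04", "country": "England"}) gives the directory that the third load statement writes to.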

After loading, the resulting directory structure is:
/user/hive/warehouse/db1.db/t_test1/day=2017-06-04/country=China/...
/user/hive/warehouse/db1.db/t_test1/day=2017-06-04/country=England/...
/user/hive/warehouse/db1.db/t_test1/day=2017-06-05/country=China/...
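This layout is what makes partitions pay off: a filter such as where day='2017-06-04' lets HIVE read only the matching directories (partition pruning). The Python sketch below applies equality filters to the three partition directories listed above; it is a simplified illustration of the idea, not HIVE's query planner.

```python
from typing import Dict, List

# Partition directories from the listing above, relative to the table directory.
PARTITIONS = [
    "day=2017-06-04/country=China",
    "day=2017-06-04/country=England",
    "day=2017-06-05/country=China",
]


def parse_partition(path: str) -> Dict[str, str]:
    # "day=2017-06-04/country=China" -> {"day": "2017-06-04", "country": "China"}
    return dict(level.split("=", 1) for level in path.split("/"))


def prune(partitions: List[str], **equals: str) -> List[str]:
    """Keep only the partitions whose columns match every equality filter."""
    selected = []
    for path in partitions:
        spec = parse_partition(path)
        if all(spec.get(col) == val for col, val in equals.items()):
            selected.append(path)
    return selected
```

For example, prune(PARTITIONS, country="China") keeps the two China directories, so only their files would be scanned.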
