Summary of Basic Hive Operations

Before using Hive:
(1) Start HDFS
Linux command: # start-dfs.sh
(2) Start YARN
Linux command: # start-yarn.sh
(3) Enter the Hive CLI
Linux command: # hive


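Create a table
create table t_order(id int, name string, rongliang string, price double)
-- each line of the data file is one record
ROW FORMAT DELIMITED
-- fields are separated by a tab
FIELDS TERMINATED BY '\t'
-- TEXTFILE (plain text) is the default when STORED AS is omitted. SEQUENCEFILE stores binary files instead,
-- but LOAD DATA does not convert formats, so a text file cannot be loaded as-is into a SEQUENCEFILE table
STORED AS TEXTFILE;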
Load data into the table
load data local inpath '/usr/local/hadoop/hiveData/xxx.txt' into table t_order;
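To try the load statement above, a small tab-separated file matching the four columns of t_order is needed. The following is a minimal sketch. The temporary file location and the sample rows are made up for illustration and do not come from the original data.

```shell
# Hypothetical sample data for t_order(id, name, rongliang, price).
data_file="$(mktemp)"
printf '1\tphone\t64G\t2999.0\n'    > "$data_file"
printf '2\ttablet\t128G\t3999.5\n' >> "$data_file"

# Sanity check: every line must have exactly 4 tab-separated fields.
if awk -F'\t' 'NF != 4 { bad = 1 } END { exit bad + 0 }' "$data_file"; then
  echo "all rows have 4 fields"
fi

# Statement to run inside the Hive CLI once the file is in place.
echo "load data local inpath '$data_file' into table t_order;"
```

Because load data only moves or copies the file into the table's directory, a row with the wrong number of fields is not rejected at load time. It shows up later as NULL or shifted columns in query results, which is why the field count is checked first.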
Query all rows in the table
select * from t_order;
Create an external table
create external table stubak (id int, name string)
row format delimited
fields terminated by '\t'
-- location is the HDFS directory that holds the external table's data
location '/stubak';
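Managed tables vs. external tables
A table created without the external keyword is a managed (internal) table; a table created with external is an external table.
Differences:
(1) Hive manages the data of a managed table itself; the data of an external table is just files in HDFS that Hive reads but does not own.
(2) Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the location of external table data is specified by the user.
(3) Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files in HDFS are left in place.
(4) Changes made to a managed table through Hive are recorded in the metastore directly. When the partition directories of an external table are added or changed directly in HDFS, the metastore has to be repaired before Hive sees them (MSCK REPAIR TABLE table_name;).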
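One way to check which kind of table you are dealing with is DESCRIBE FORMATTED, whose output includes a "Table Type" row containing MANAGED_TABLE or EXTERNAL_TABLE. The sketch below filters that row from the shell. The helper name table_kind is made up for illustration, and stubak is the external table from the previous section.

```shell
# Hypothetical helper: read `describe formatted <table>` output on stdin
# and print MANAGED_TABLE or EXTERNAL_TABLE.
table_kind() {
  grep 'Table Type' | grep -oE 'MANAGED_TABLE|EXTERNAL_TABLE'
}

# Only call Hive when it is installed on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -S -e 'describe formatted stubak;' | table_kind
fi
```

Checking the table type before running drop table is a cheap safeguard, since for a managed table the drop also removes the data files.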
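Create a partitioned table
Difference between an ordinary table and a partitioned table: use a partitioned table when large amounts of data keep being added, so that queries can read only the partitions they need.
create table book (id bigint, name string)
partitioned by (pubdate string)
row format delimited
fields terminated by '\t';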
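Data is loaded into a partitioned table one partition at a time, and each partition becomes a subdirectory named column=value under the table's directory. A minimal sketch follows. The input file path, the partition value, and the helper name partition_path are assumptions for illustration, and the directory shown assumes the default database and the default warehouse location.

```shell
# Hypothetical helper: build the HDFS directory Hive uses for one partition.
partition_path() {
  # $1 = warehouse dir, $2 = table, $3 = partition column, $4 = partition value
  printf '%s/%s/%s=%s\n' "$1" "$2" "$3" "$4"
}

partition_path /user/hive/warehouse book pubdate 2010-08-22

# Write the HQL to a script file; run it later with: hive -f "$hql_file"
hql_file="$(mktemp)"
cat > "$hql_file" <<'EOF'
-- the local file path below is a placeholder
load data local inpath '/home/hadoop/book.txt' into table book partition (pubdate='2010-08-22');
show partitions book;
select * from book where pubdate='2010-08-22';
EOF
cat "$hql_file"
```

The partition column pubdate is not stored in the data file. Its value comes from the partition clause, and a where condition on it lets Hive skip the other partition directories.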
Create a table from a select statement, copying both the structure and the data (often used for intermediate tables)
CREATE TABLE tab_ip_ctas
AS
SELECT id new_id, name new_name, ip new_ip,country new_country
FROM tab_ip_ext
SORT BY new_id;
Bulk-insert data into another table with a select statement
insert overwrite table tab_ip_like
select * from tab_ip;
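Write query results to a specified path
-- the path is treated as a local directory, and Hive writes the result files inside it
insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select *
from tab_ip_part where part_flag='part1';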
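Table with array columns
-- the element types were lost in the original listing, array<int> and array<string> are assumed here
create table tab_array(a array<int>,b array<string>)
row format delimited
-- fields are separated by '\t'
fields terminated by '\t'
-- array elements are separated by commas
collection items terminated by ',';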
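To make the two delimiters concrete, the sketch below writes one sample row and splits it the way the table definition describes: first on the tab between columns, then on the commas inside each array. The sample values are made up, and the query in the closing comment assumes that column a holds numbers and column b holds strings.

```shell
data_file="$(mktemp)"
# One row: column a = [1,2,3], column b = [x,y]
printf '1,2,3\tx,y\n' > "$data_file"

# Split on the tab first, then on the commas inside each column.
summary="$(awk -F'\t' '{
  n = split($1, a, ",")
  m = split($2, b, ",")
  print "a has " n " items, first is " a[1]
  print "b has " m " items, first is " b[1]
}' "$data_file")"
echo "$summary"

# After loading the file into tab_array, the same elements can be read in Hive with:
#   select a[0], size(b) from tab_array;
# (Hive array indexes start at 0, awk indexes start at 1.)
```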
Table with a map column
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
collection items terminated by ','
-- each map key is separated from its value by ':'
map keys terminated by ':';
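A row for this table uses three delimiters: a tab between the name and the map, commas between map entries, and a colon between each key and its value. The sketch below builds one such row and splits it in the same order. The name and the keys are made up for illustration.

```shell
data_file="$(mktemp)"
# One row: name = alice, info = {age: 18, city: beijing}
printf 'alice\tage:18,city:beijing\n' > "$data_file"

# Split on tab, then on ',' between entries, then on ':' between key and value.
pairs="$(awk -F'\t' '{
  n = split($2, items, ",")
  for (i = 1; i <= n; i++) {
    split(items[i], kv, ":")
    print $1 " " kv[1] "=" kv[2]
  }
}' "$data_file")"
echo "$pairs"

# After loading the file into tab_map, a value is looked up by key in Hive with:
#   select name, info['age'] from tab_map;
```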
Run HQL statements in batch from the shell (similar in role to a stored procedure in SQL)
hive -S -e 'select country,count(*) from tab_ext group by country' > /home/hadoop/hivetemp/e.txt
select * from tab_ext sort by id desc limit 5;
select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);
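For more than one statement, the -e form gets unwieldy, and hive -f can run a script file instead. The following is a minimal sketch using the queries from this section. The script and output locations are temporary files chosen for illustration, and the count query is written with a group by clause, which Hive requires when a plain column is selected next to count(*).

```shell
hql_file="$(mktemp)"
out_file="$(mktemp)"

# Collect the statements in one script file.
cat > "$hql_file" <<'EOF'
select country, count(*) from tab_ext group by country;
select * from tab_ext sort by id desc limit 5;
select a.ip, b.book from tab_ext a join tab_ip_book b on (a.name = b.name);
EOF

# -S: silent mode, -f: run the statements in a file.
if command -v hive >/dev/null 2>&1; then
  hive -S -f "$hql_file" > "$out_file"
else
  echo "hive not found, script written to $hql_file"
fi
```

Keeping the statements in a file also makes the batch easy to schedule, for example from cron, without quoting issues in the shell.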
