Hive Tutorial
In the big data ecosystem, Hive plays the role of a data warehouse: it wraps a SQL-like layer on top, and at runtime queries are translated into MapReduce jobs for execution.
Basic Hive operations are much like MySQL's, but there are differences; one of them is that when Hive stores structured data, the field delimiters must be specified explicitly.
Hive data types
One notable difference between Hive and MySQL is Hive's collection types (array, map and struct).
Example: create a Hive table for data in the following JSON shape:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],      // list: Array
    "children": {                         // key-value pairs: Map
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                          // Struct
        "street": "hui long guan",
        "city": "beijing"
    }
}
The local data file stores these records as follows (one record per line):
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
The Hive SQL is as follows:
create table test(
    name string,
    friends array<string>,
    children map<string,int>,
    address struct<street:string,city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field explanations:
row format delimited fields terminated by ',' -- field (column) delimiter
collection items terminated by '_' -- delimiter between elements of a MAP, STRUCT or ARRAY
map keys terminated by ':' -- delimiter between key and value inside a MAP
lines terminated by '\n'; -- row delimiter
Load the local file into the test table:
load data local inpath '/usr/local/data/text.txt' into table test;
Accessing collection fields (array indexes are 0-based, so friends[1] is the second element):
select friends[1], children['xiao song'], address.city from test where name = "songsong";
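Beyond direct indexing, an array column can also be flattened into rows with Hive's lateral view explode. A minimal sketch against the test table above (the alias friends_tbl is just an illustrative name):

```sql
-- Flatten the friends array: one output row per (name, friend) pair
select name, friend
from test
lateral view explode(friends) friends_tbl as friend;
```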
Modify a database's properties (dbproperties holds descriptive key-value metadata about the database):
alter database db_hive set dbproperties('createtime'='20170830');
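The updated properties do not appear in a plain desc database; use the extended form to verify the change on the same db_hive database:

```sql
-- dbproperties are only shown in the extended output
desc database extended db_hive;
```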
Table creation syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]  -- table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  -- partitioned table
[CLUSTERED BY (col_name, col_name, ...)  -- bucketed table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
  -- ROW FORMAT DELIMITED [FIELDS TERMINATED BY char]  field delimiter
  -- [COLLECTION ITEMS TERMINATED BY char]
  -- [MAP KEYS TERMINATED BY char]
  -- [LINES TERMINATED BY char]
[STORED AS file_format]  -- storage file format
[LOCATION hdfs_path]  -- location of the table's data on HDFS
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]  -- create the table from a query result
The EXTERNAL keyword creates an external table; at creation time a LOCATION can point at the directory holding the actual data. When a managed (internal) table is dropped, its metadata and data are deleted together, while dropping an external table deletes only the metadata and leaves the data untouched.
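As a sketch, an external table over existing HDFS data might look like this (the student table and the /student path are assumptions for illustration):

```sql
create external table if not exists student(
    id int,
    name string
)
row format delimited fields terminated by '\t'
location '/student';  -- existing HDFS directory; dropping the table keeps this data
```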
Create a partitioned table:
create table dept_partition(
deptno int,
dname string,
loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';
Load data into the partitioned table (default is the database name):
load data local inpath '/usr/local/datas/dept.txt' into table default.dept_partition partition(month='201709');
Add a partition:
alter table dept_partition add partition(month='201706');
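Several partitions can be added in one statement; note the asymmetry in Hive's syntax: added partitions are separated by spaces, while dropped partitions (as below) are separated by commas. A sketch:

```sql
-- add two partitions at once: space-separated, no commas
alter table dept_partition add partition(month='201705') partition(month='201704');
```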
Drop multiple partitions:
alter table dept_partition drop partition (month='201705'), partition (month='201706');
Show how many partitions a partitioned table has:
show partitions dept_partition;
Create a table with two partition levels:
create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';
Load data into a second-level partition:
load data local inpath '/usr/local/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');
Query data in a given partition:
select * from dept_partition2 where month='201709' and day='13';
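A single query can also combine several partitions, for example with union all (assuming the day='12' partition has also been loaded):

```sql
select * from dept_partition2 where month='201709' and day='13'
union all
select * from dept_partition2 where month='201709' and day='12';
```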
Three ways to put data files directly into a partition directory on HDFS and have the partitioned table pick them up:
- Method 1: upload the data, then repair the table (note: the files must land under the table's directory in Hive's default warehouse location, otherwise the repair cannot find them)
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
dfs -put /usr/local/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
Query the data (nothing is returned, because the metastore does not yet know about this partition):
select * from dept_partition2 where month='201709' and day='12';
Run the repair command:
msck repair table dept_partition2;
Querying again now returns the data.
- Method 2: upload the data, then add the partition
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Add the partition:
alter table dept_partition2 add partition(month='201709',day='11');
The data can now be queried.
- Method 3: create the directory, then load the data into the partition
Create the directory:
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
Load the data:
load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');
The data can now be queried.
Convert an internal (managed) table into an external table:
alter table student2 set tblproperties('EXTERNAL'='TRUE');
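The conversion also works in the other direction; note that 'EXTERNAL'='TRUE' and 'FALSE' are fixed, case-sensitive strings in Hive. A sketch converting back:

```sql
-- convert the external table back to a managed (internal) table
alter table student2 set tblproperties('EXTERNAL'='FALSE');
```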