Hive Tutorial
In the big data ecosystem, Hive plays the role of a data warehouse: it wraps a SQL-like layer on top, and at runtime queries are translated into MapReduce jobs for execution.
Basic Hive operations are much like MySQL's, but there are differences; one of them is that when Hive stores structured data, the field delimiters must be specified explicitly.
Hive data types
One notable difference between Hive and MySQL is Hive's collection types (array, map and struct).
Example: create a Hive table for data in the following JSON shape:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],      // list: Array
    "children": {                         // key-value pairs: Map
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                          // Struct
        "street": "hui long guan",
        "city": "beijing"
    }
}
The local data file stores these records as follows (one record per line):
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
The Hive SQL is as follows:
create table test(
    name string,
    friends array<string>,
    children map<string,int>,
    address struct<street:string,city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field explanations:
row format delimited fields terminated by ',' -- field (column) delimiter
collection items terminated by '_' -- delimiter between elements of a MAP, STRUCT or ARRAY
map keys terminated by ':' -- delimiter between key and value inside a MAP
lines terminated by '\n'; -- row delimiter
Load the local file into the test table:
load data local inpath '/usr/local/data/text.txt' into table test;
Accessing collection fields (array indexes are 0-based, so friends[1] is the second element):
select friends[1], children['xiao song'], address.city from test where name = "songsong";
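Beyond direct indexing, an array column can also be flattened into rows with Hive's lateral view explode. A minimal sketch against the test table above (the alias friends_tbl is just an illustrative name):

```sql
-- Flatten the friends array: one output row per (name, friend) pair
select name, friend
from test
lateral view explode(friends) friends_tbl as friend;
```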
Modify a database's properties (dbproperties holds descriptive key-value metadata about the database):
alter database db_hive set dbproperties('createtime'='20170830');
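The updated properties do not appear in a plain desc database; use the extended form to verify the change on the same db_hive database:

```sql
-- dbproperties are only shown in the extended output
desc database extended db_hive;
```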
Table creation syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]  -- table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  -- partitioned table
[CLUSTERED BY (col_name, col_name, ...)  -- bucketed table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
  -- ROW FORMAT DELIMITED [FIELDS TERMINATED BY char]  field delimiter
  -- [COLLECTION ITEMS TERMINATED BY char]
  -- [MAP KEYS TERMINATED BY char]
  -- [LINES TERMINATED BY char]
[STORED AS file_format]  -- storage file format
[LOCATION hdfs_path]  -- location of the table's data on HDFS
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]  -- create the table from a query result
The EXTERNAL keyword creates an external table; at creation time a LOCATION can point at the directory holding the actual data. When a managed (internal) table is dropped, its metadata and data are deleted together, while dropping an external table deletes only the metadata and leaves the data untouched.
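As a sketch, an external table over existing HDFS data might look like this (the student table and the /student path are assumptions for illustration):

```sql
create external table if not exists student(
    id int,
    name string
)
row format delimited fields terminated by '\t'
location '/student';  -- existing HDFS directory; dropping the table keeps this data
```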
Create a partitioned table:
create table dept_partition(
deptno int,
dname string,
loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';
Load data into the partitioned table (default is the database name):
load data local inpath '/usr/local/datas/dept.txt' into table default.dept_partition partition(month='201709');
Add a partition:
alter table dept_partition add partition(month='201706');
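Several partitions can be added in one statement; note the asymmetry in Hive's syntax: added partitions are separated by spaces, while dropped partitions (as below) are separated by commas. A sketch:

```sql
-- add two partitions at once: space-separated, no commas
alter table dept_partition add partition(month='201705') partition(month='201704');
```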
Drop multiple partitions:
alter table dept_partition drop partition (month='201705'), partition (month='201706');
Show how many partitions a partitioned table has:
show partitions dept_partition;
Create a table with two partition levels:
create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';
Load data into a second-level partition:
load data local inpath '/usr/local/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');
Query data in a given partition:
select * from dept_partition2 where month='201709' and day='13';
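A single query can also combine several partitions, for example with union all (assuming the day='12' partition has also been loaded):

```sql
select * from dept_partition2 where month='201709' and day='13'
union all
select * from dept_partition2 where month='201709' and day='12';
```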
Three ways to put data files directly into a partition directory on HDFS and have the partitioned table pick them up:
- Method 1: upload the data, then repair the table (note: the files must land under the table's directory in Hive's default warehouse location, otherwise the repair cannot find them)
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
dfs -put /usr/local/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
Query the data (nothing is returned, because the metastore does not yet know about this partition):
select * from dept_partition2 where month='201709' and day='12';
Run the repair command:
msck repair table dept_partition2;
Querying again now returns the data.
- Method 2: upload the data, then add the partition
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Add the partition:
alter table dept_partition2 add partition(month='201709',day='11');
The data can now be queried.
- Method 3: create the directory, then load the data into the partition
Create the directory:
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
Load the data:
load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');
The data can now be queried.
Convert an internal (managed) table into an external table:
alter table student2 set tblproperties('EXTERNAL'='TRUE');
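The conversion also works in the other direction; note that 'EXTERNAL'='TRUE' and 'FALSE' are fixed, case-sensitive strings in Hive. A sketch converting back:

```sql
-- convert the external table back to a managed (internal) table
alter table student2 set tblproperties('EXTERNAL'='FALSE');
```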