Reference: the official Hive website.
Summary
Hive data types
Primitive types
tinyint smallint int bigint boolean float double string
Complex types
array type, map type, struct type
Create/Drop/Alter/Use Database
Create a database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
Example
create database if not exists mybase
comment "this is mybase"
location "hdfs:/user/";
If LOCATION is omitted, the database directory is created under the path set by the hive.metastore.warehouse.dir parameter in hive-site.xml. Its default value is /user/hive/warehouse.
CREATE DATABASE <DB_NAME> WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');
The properties can be viewed with:
DESC DATABASE EXTENDED <DB_NAME>;
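As a minimal sketch of the two statements above used together, the following creates a database with custom properties and then inspects them. The database name mybase_props and the property keys and values are hypothetical, chosen only for illustration.

```sql
-- Hypothetical database name and property keys, for illustration only.
CREATE DATABASE IF NOT EXISTS mybase_props
COMMENT 'database with custom properties'
WITH DBPROPERTIES ('creator' = 'alice', 'purpose' = 'demo');

-- EXTENDED adds the DBPROPERTIES to the description output.
DESC DATABASE EXTENDED mybase_props;
```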
Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Example
drop database if exists mybase cascade;
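By default, Hive refuses to drop a database that still contains tables. To drop such a database, either drop all of its tables first or add the CASCADE keyword, which makes Hive drop the tables in the database automatically. RESTRICT is the default behavior: the drop is rejected if any table exists.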
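The difference between RESTRICT and CASCADE can be sketched as follows. The names demo_db and t1 are hypothetical, and the second-to-last statement is expected to be rejected by Hive as long as the table exists.

```sql
-- Hypothetical database and table, for illustration only.
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE demo_db.t1 (id INT);

-- RESTRICT is the default, so this drop is expected to fail while t1 exists.
DROP DATABASE demo_db RESTRICT;

-- CASCADE drops the contained tables first, then the database itself.
DROP DATABASE IF EXISTS demo_db CASCADE;
```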
Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14 (HIVE-6601).
The ALTER DATABASE ... SET LOCATION statement does not move the contents of the database's current directory to the newly specified location. It does not change the locations associated with any tables/partitions under the specified database. It only changes the default parent-directory where new tables will be added for this database. This behaviour is analogous to how changing a table-directory does not move existing partitions to a different location.
No other metadata about a database can be changed.
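A short sketch of the three ALTER DATABASE forms, applied to the mybase database created earlier. The property key, the user name alice, and the HDFS URI are assumptions made for illustration; substitute values that exist in your cluster.

```sql
-- Property key and value are hypothetical.
ALTER DATABASE mybase SET DBPROPERTIES ('edited-by' = 'alice');

-- Hive 0.13.0 and later; 'alice' is a hypothetical user.
ALTER DATABASE mybase SET OWNER USER alice;

-- Hive 2.2.1, 2.4.0 and later; the URI is hypothetical.
-- Only the default parent directory for NEW tables changes; existing tables stay where they are.
ALTER DATABASE mybase SET LOCATION 'hdfs://namenode:8020/warehouse/mybase_new';
```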
Use Database
USE database_name;
USE DEFAULT;
List databases
show databases;
Describe a database
desc database databasename;
desc database extended databasename; -- also shows the properties set with WITH DBPROPERTIES
----------------------------------------------------------------------------------------------------------------------------------
Create/Drop/Truncate Table
Create a table
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
array_type
  : ARRAY < data_type >
map_type
  : MAP < primitive_type, data_type >
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
  : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
  : SEQUENCEFILE
  | TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE -- (Note: Available in Hive 0.6.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
Example
Data
1,xiaoming,book-tv-code,beijing:sanyuanqiao-shanghai:pudong
2,zhangxianyu,game-code,beijing:tiananmen-shanghai:hupu
3,xiaowang,daiwa-tv,shenyang:hepng-huoxing:xxx
HQL
Method 1: define the columns explicitly
create table psn1 (
id int ,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
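Once rows in the sample format have been loaded into psn1, the complex columns can be read with index and key access. The query below is only a sketch; the column aliases are made up for illustration, and the results depend on the data actually loaded.

```sql
SELECT
  name,
  likes[0]           AS first_like,       -- array elements are indexed from 0
  size(likes)        AS like_count,       -- number of elements in the array
  address['beijing'] AS beijing_address   -- NULL when the map has no such key
FROM psn1;
```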
Method 2: CREATE TABLE ... LIKE
create table psn2 like psn1;
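Even if psn1 contains data, this statement copies only the table definition of psn1 to psn2. No data is copied.
Method 3: CREATE TABLE ... AS SELECT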
create table psn3
as
select id,name,address from psn2;
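This form is often used to create intermediate tables. The rows returned by the query are written into the new table as well.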
---------------------------------------------------------------------------------------------------------------------------------
Create an external table
An external table normally specifies where its data files are stored with LOCATION.
create external table psn11 (
id int ,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/xxx';
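Dropping an external table removes only its metadata; the data files are left in place.
Dropping a managed (internal) table deletes the data files as well.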
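The drop behavior can be sketched with two throwaway tables. The table names and the /tmp/ext_demo path are hypothetical and only serve the illustration.

```sql
-- Hypothetical tables and path, for illustration only.
CREATE TABLE managed_demo (id INT);
CREATE EXTERNAL TABLE ext_demo (id INT) LOCATION '/tmp/ext_demo';

-- The "Table Type" row shows EXTERNAL_TABLE for ext_demo.
DESCRIBE FORMATTED ext_demo;

DROP TABLE managed_demo;  -- removes the metadata and the table's data directory
DROP TABLE ext_demo;      -- removes the metadata only; files under /tmp/ext_demo remain
```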
---------------------------------------------------------------------------------------------------------------------------------
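Inserting data: row-by-row INSERT is rarely used, because each single-row insert launches its own MapReduce job, which is slow and cumbersome.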
There are multiple ways to modify data in Hive:
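LOAD
Hive does not transform the data when loading it into a table. A load is currently a pure copy/move operation that moves the data files into the location corresponding to the Hive table.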
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
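A sketch of the two common forms of LOAD DATA against the psn1 table. Both file paths are hypothetical. With LOCAL the file is copied from the local filesystem; without it the file is moved from its current HDFS path.

```sql
-- Hypothetical local path: the file is copied into the table's directory and appended.
LOAD DATA LOCAL INPATH '/home/data/psn.txt' INTO TABLE psn1;

-- Hypothetical HDFS path: the file is moved, and OVERWRITE replaces the table's existing data.
LOAD DATA INPATH '/tmp/psn.txt' OVERWRITE INTO TABLE psn1;
```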
----------------------------------------------------------------------------------------------------------------------------------
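Partitions
A partition column must not also appear in the table's regular column list, and LOAD DATA into a partitioned table must specify the partition column values.
Adding a partition only changes the metadata.
create table computer (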
id int,
name string comment 'this is name',
intr array<string>,
detail map<string,string>
)
comment 'computer'
partitioned by (price float)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
Test
The target table is partitioned, so the partition value must be supplied when loading.
load data local inpath '/home/data' into table computer partition (price=1111);
Add Partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
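As a sketch, partitions can be added to the computer table from the example above one statement at a time. The partition values and the LOCATION path are hypothetical.

```sql
-- Registers a partition in the metastore under the table's default directory.
ALTER TABLE computer ADD IF NOT EXISTS PARTITION (price = 2222);

-- Registers a partition that points at an explicit, hypothetical directory.
ALTER TABLE computer ADD IF NOT EXISTS PARTITION (price = 3333)
  LOCATION '/user/hive/extra/price_3333';

-- Lists the partitions now known to the metastore.
SHOW PARTITIONS computer;
```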
Drop Partitions
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
  [IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available in Hive 2.0.0 and later)
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
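A minimal sketch of dropping partitions, using a hypothetical table logs_demo partitioned by a string column dt so the example is self-contained.

```sql
-- Hypothetical table and partition values, for illustration only.
CREATE TABLE IF NOT EXISTS logs_demo (msg STRING)
PARTITIONED BY (dt STRING);

ALTER TABLE logs_demo ADD IF NOT EXISTS PARTITION (dt = '2018-01-01');
ALTER TABLE logs_demo ADD IF NOT EXISTS PARTITION (dt = '2018-01-02');

-- Drops one partition; IF EXISTS avoids an error when it is already gone.
ALTER TABLE logs_demo DROP IF EXISTS PARTITION (dt = '2018-01-01');

-- PURGE (Hive 1.2.0 and later) bypasses the trash for a managed table.
ALTER TABLE logs_demo DROP IF EXISTS PARTITION (dt = '2018-01-02') PURGE;
```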
A common insert pattern
from psn1
insert into table psn7
select count(name);
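The FROM-first form is useful because several INSERT clauses can share one read of the source table. The sketch below assumes two hypothetical target tables, psn_count and psn_names, created only for this illustration.

```sql
-- Hypothetical target tables, for illustration only.
CREATE TABLE IF NOT EXISTS psn_count (cnt BIGINT);
CREATE TABLE IF NOT EXISTS psn_names (id INT, name STRING);

-- psn1 is read once and feeds both inserts.
FROM psn1
INSERT OVERWRITE TABLE psn_count
  SELECT count(name)
INSERT INTO TABLE psn_names
  SELECT id, name;
```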