1. Basic data types
| Hive data type | Java data type | Description | Example |
|---|---|---|---|
| TINYINT | byte | 1-byte signed integer | 20 |
| SMALLINT | short | 2-byte signed integer | 20 |
| INT | int | 4-byte signed integer | 20 |
| BIGINT | long | 8-byte signed integer | 20 |
| BOOLEAN | boolean | Boolean, true or false | TRUE, FALSE |
| FLOAT | float | Single-precision floating-point number | 3.14159 |
| DOUBLE | double | Double-precision floating-point number | 3.14159 |
| STRING | String | Character sequence. A character set can be specified. Literals can use single or double quotes. | 'now is the time', "for all good men" |
| TIMESTAMP | | Time type | |
| BINARY | | Byte array | |
Note: Hive's STRING type corresponds to the VARCHAR type of a relational database. It is a variable-length string, but you cannot declare the maximum number of characters it may hold. In theory it can store up to 2 GB of characters.
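Each Hive integer type in the table maps onto a Java primitive, so its value range follows from that primitive's byte width. The small Java sketch below makes this concrete. The class `HiveIntRanges` and its `fits` helper are hypothetical illustrations, not part of any Hive API. The helper checks whether a value fits in TINYINT, SMALLINT, INT or BIGINT by using the range constants of the corresponding Java type.

```java
public class HiveIntRanges {
    // Hypothetical helper (not a Hive API): true if `value` fits in the given
    // Hive integer type, using the range of the Java primitive it maps to.
    static boolean fits(String hiveType, long value) {
        switch (hiveType) {
            case "TINYINT":  return value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE;
            case "SMALLINT": return value >= Short.MIN_VALUE && value <= Short.MAX_VALUE;
            case "INT":      return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
            case "BIGINT":   return true; // every Java long is a valid BIGINT value
            default: throw new IllegalArgumentException("not a Hive integer type: " + hiveType);
        }
    }

    public static void main(String[] args) {
        // Byte widths of the mapped Java primitives: 1, 2, 4 and 8 bytes.
        System.out.println("TINYINT  byte:  " + Byte.BYTES + " byte(s), " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("SMALLINT short: " + Short.BYTES + " byte(s), " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("INT      int:   " + Integer.BYTES + " byte(s), " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("BIGINT   long:  " + Long.BYTES + " byte(s), " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        System.out.println(fits("TINYINT", 20));   // true
        System.out.println(fits("TINYINT", 200));  // false: the byte range ends at 127
    }
}
```

The example value 20 from the table fits every integer type. A value such as 200 already needs at least SMALLINT.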
2. Collection data types
| Data type | Description | Syntax example |
|---|---|---|
| STRUCT | Similar to a struct in C: element contents are accessed with "dot" notation. For example, if a column's data type is STRUCT{first STRING, last STRING}, its first element can be referenced as field.first. | struct() |
| MAP | A MAP is a collection of key-value tuples, and its data is accessed with array notation. For example, if a column's data type is MAP and its key->value pairs are 'first'->'John' and 'last'->'Doe', the last-name element can be retrieved with field['last']. | map() |
| ARRAY | An array is a collection of variables of the same type, grouped under one name. These variables are called the elements of the array. Each element is assigned an index, starting from zero. For example, for the array value ['John', 'Doe'], the second element can be referenced as arrayName[1]. | array() |
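Note: Hive has three complex data types: ARRAY, MAP and STRUCT. ARRAY and MAP are similar to Java's Array and Map. STRUCT is similar to a C struct: it encapsulates a collection of named fields. Complex data types can be nested to any depth.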
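The Java analogy in the note can be sketched directly. The sketch uses the John/Doe values from the table above and is only an analogy, not how Hive stores these types. The class names `ComplexTypeAnalogy` and `Name` are hypothetical. So are the Hive column names in the comments (`arr`, `m`, `s`, `nested`).

```java
import java.util.List;
import java.util.Map;

public class ComplexTypeAnalogy {
    // STRUCT{first STRING, last STRING}: a fixed set of named fields, like a C struct.
    // `Name` is a hypothetical class used only for this illustration.
    static final class Name {
        final String first;
        final String last;
        Name(String first, String last) { this.first = first; this.last = last; }
    }

    // ARRAY value ['John', 'Doe']: one element type, indices start at 0.
    static final List<String> ARRAY = List.of("John", "Doe");
    // MAP value 'first' -> 'John', 'last' -> 'Doe': elements are looked up by key.
    static final Map<String, String> MAP = Map.of("first", "John", "last", "Doe");
    // STRUCT value with first = 'John', last = 'Doe': fields use dot notation.
    static final Name STRUCT = new Name("John", "Doe");
    // Nesting: a Hive ARRAY<MAP<STRING,STRING>> behaves like List<Map<String, String>>.
    static final List<Map<String, String>> NESTED = List.of(MAP, Map.of("first", "Jane"));

    public static void main(String[] args) {
        System.out.println(ARRAY.get(1));                // Doe   (Hive: arr[1])
        System.out.println(MAP.get("last"));             // Doe   (Hive: m['last'])
        System.out.println(STRUCT.first);                // John  (Hive: s.first)
        System.out.println(NESTED.get(1).get("first"));  // Jane  (Hive: nested[1]['first'])
    }
}
```

`List.of` and `Map.of` require Java 9 or later.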
Example:
1. Create a local test file test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Note: the separator between elements inside a MAP, STRUCT or ARRAY can be the same character for all three. Here it is "_".
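The lines of test.txt can be produced mechanically from the column values, and doing so shows how each separator is used. In the sketch below, the class `TestFileLine` and its `toLine` method are hypothetical helpers written for this illustration. The method joins columns with ",", collection items (array elements, map entries, struct fields) with "_", and each map key to its value with ":".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TestFileLine {
    // Hypothetical helper: builds one line of test.txt from column values.
    static String toLine(String name, List<String> friends, Map<String, Integer> children,
                         String street, String city) {
        String friendsCol = String.join("_", friends);                 // ARRAY items
        String childrenCol = children.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())              // key:value
                .collect(Collectors.joining("_"));                      // MAP entries
        String addressCol = street + "_" + city;                        // STRUCT fields
        return String.join(",", name, friendsCol, childrenCol, addressCol); // columns
    }

    public static void main(String[] args) {
        Map<String, Integer> children = new LinkedHashMap<>(); // keeps insertion order
        children.put("xiao song", 18);
        children.put("xiaoxiao song", 19);
        System.out.println(toLine("songsong", List.of("bingbing", "lili"), children,
                "hui long guan", "beijing"));
        // songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
    }
}
```

A `LinkedHashMap` is used so the map entries come out in the same order as in the file. A plain `HashMap` would give no ordering guarantee.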
2. Create the test table test in Hive
create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
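Explanation of the clauses: all delimiters are user-defined and must not be reused for different purposes. Each kind of delimiter can be specified only once. For example, once the map key delimiter is set to a colon ':', every map key and value in the data must be separated by a colon.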
row format delimited
fields terminated by ','             -- comma is the column delimiter
collection items terminated by '_'   -- delimiter between items of a MAP, STRUCT or ARRAY (all of them are containers)
map keys terminated by ':'           -- delimiter between a MAP key and its value
lines terminated by '\n';            -- row delimiter
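The sketch below approximates how the four delimiters split a line of test.txt into the columns of table test. It is a simplified illustration under stated assumptions, not Hive's actual deserializer: it ignores NULL values, escaping and deeper nesting levels. The class `DelimitedRowParser`, its nested `Row` holder and the `parse` method are all hypothetical names. The last statement shows why delimiters must not be reused. If ',' also separated the friends, the line would split into five pieces instead of the table's four columns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimitedRowParser {
    static final String FIELD = ",";   // fields terminated by ','
    static final String ITEM = "_";    // collection items terminated by '_'
    static final String KEY = ":";     // map keys terminated by ':'

    // Hypothetical holder for one row of table `test`.
    static final class Row {
        final String name;                    // name string
        final List<String> friends;           // friends array<string>
        final Map<String, Integer> children;  // children map<string, int>
        final String street;                  // address.street
        final String city;                    // address.city

        Row(String name, List<String> friends, Map<String, Integer> children,
            String street, String city) {
            this.name = name; this.friends = friends; this.children = children;
            this.street = street; this.city = city;
        }
    }

    // Simplified split of one line: columns first, then the items inside each container.
    static Row parse(String line) {
        String[] cols = line.split(Pattern.quote(FIELD), -1);
        List<String> friends = Arrays.asList(cols[1].split(Pattern.quote(ITEM), -1));
        Map<String, Integer> children = new LinkedHashMap<>();
        for (String entry : cols[2].split(Pattern.quote(ITEM), -1)) {
            String[] kv = entry.split(Pattern.quote(KEY), 2); // key before ':', value after
            children.put(kv[0], Integer.parseInt(kv[1]));
        }
        String[] address = cols[3].split(Pattern.quote(ITEM), -1); // struct fields in declared order
        return new Row(cols[0], friends, children, address[0], address[1]);
    }

    public static void main(String[] args) {
        Row r = parse("songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing");
        System.out.println(r.friends + " " + r.children + " " + r.city);
        // [bingbing, lili] {xiao song=18, xiaoxiao song=19} beijing

        // Reusing ',' for the friends items breaks the column boundaries: 5 pieces, not 4.
        System.out.println("songsong,bingbing,lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"
                .split(",", -1).length); // 5
    }
}
```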
3. Load the text data into the test table
hive (default)> load data local inpath '/opt/module/datas/test.txt' into table test;
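4. Access the data in the three collection columns. The query below accesses an ARRAY, a MAP and a STRUCT in turn (each type uses a different access syntax):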
hive (default)> select friends[1],children['xiao song'],address.city from test where name="songsong";
OK
_c0     _c1     city
lili    18      beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)
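To make the three access paths in the query concrete, here is a plain-Java sketch that evaluates the same query over the two lines of test.txt. The class `SongsongQuery` and its `query` method are hypothetical. The sketch only imitates the result and is not how Hive executes the query:

- `friends[1]` takes the second "_"-separated item (indices are zero-based).
- `children['xiao song']` looks up a key in the "key:value" entries.
- `address.city` takes the second struct field.

```java
import java.util.ArrayList;
import java.util.List;

public class SongsongQuery {
    // The two lines of test.txt.
    static final List<String> LINES = List.of(
        "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing",
        "yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing");

    // Imitates: select friends[1], children['xiao song'], address.city from test where name = <name>
    // Returns one String[] per matching row; the map value is null here if the key is absent.
    static List<String[]> query(String name) {
        List<String[]> result = new ArrayList<>();
        for (String line : LINES) {
            String[] cols = line.split(",", -1);
            if (!cols[0].equals(name)) continue;            // where name = ...
            String friend = cols[1].split("_", -1)[1];      // friends[1], zero-based
            String child = null;
            for (String kv : cols[2].split("_", -1)) {      // children['xiao song']
                String[] parts = kv.split(":", 2);
                if (parts[0].equals("xiao song")) child = parts[1];
            }
            String city = cols[3].split("_", -1)[1];        // address.city, the 2nd struct field
            result.add(new String[] {friend, child, city});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] row : query("songsong")) {
            System.out.println(String.join("\t", row));     // lili	18	beijing
        }
    }
}
```

The single printed row matches the `lili 18 beijing` result that Hive returned above.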