1, Basic data types
| Hive data type | Java data type | Length | Example |
|---|---|---|---|
| TINYINT | byte | 1-byte signed integer | 20 |
| SMALLINT | short | 2-byte signed integer | 20 |
| INT | int | 4-byte signed integer | 20 |
| BIGINT | long | 8-byte signed integer | 20 |
| BOOLEAN | boolean | Boolean, true or false | TRUE FALSE |
| FLOAT | float | Single-precision floating-point number | 3.14159 |
| DOUBLE | double | Double-precision floating-point number | 3.14159 |
| STRING | String | Character sequence. A character set can be specified. Single or double quotes may be used. | 'now is the time' "for all good men" |
| TIMESTAMP | | Time type | |
| BINARY | | Byte array | |
Hive's STRING type is equivalent to the VARCHAR type in a database: it is a variable-length string, but you cannot declare the maximum number of characters it can hold. In theory, it can store up to 2 GB of characters.
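As a quick illustration of the types listed above, here is a minimal sketch of a table that declares one column of each basic type. The table name basic_types_demo and all column names are made up for this example and do not appear elsewhere in this article.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE IF NOT EXISTS basic_types_demo (
    tiny_col   TINYINT,    -- 1-byte signed integer
    small_col  SMALLINT,   -- 2-byte signed integer
    int_col    INT,        -- 4-byte signed integer
    big_col    BIGINT,     -- 8-byte signed integer
    bool_col   BOOLEAN,    -- TRUE or FALSE
    float_col  FLOAT,      -- single-precision floating point
    double_col DOUBLE,     -- double-precision floating point
    str_col    STRING,     -- variable length, no maximum declared
    ts_col     TIMESTAMP,  -- time type
    bin_col    BINARY      -- byte array
);

-- Show the declared column types.
DESCRIBE basic_types_demo;
```

Note that str_col takes no length argument, which matches the point about STRING above.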
2, Collection data types
| Data type | Description | Syntax example |
|---|---|---|
| STRUCT | Similar to a struct in C; element contents are accessed with "dot" notation. For example, if a column's data type is STRUCT{first STRING, last STRING}, the first element can be referenced as field.first. | struct() |
| MAP | A MAP is a collection of key-value pairs; the data is accessed with array notation. For example, if a column's data type is MAP and its key->value pairs are 'first'->'John' and 'last'->'Doe', the last name can be obtained with field['last']. | map() |
| ARRAY | An array is a collection of variables that share the same type and name. These variables are called the elements of the array, and each element has an index, starting from zero. For example, for the array value ['John', 'Doe'], the second element can be referenced as arrayname[1]. | array() |
Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it encapsulates a collection of named fields. Complex data types can be nested to any depth.
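The three access notations can be tried without creating a table. The following is a minimal sketch that builds one value of each complex type with the built-in array(), map(), and named_struct() functions and then reads an element from each. The aliases names, person_map, person, and t are made up for this example, and the query assumes a Hive version that accepts a SELECT without a FROM clause (on older versions, select from any one-row table instead).

```sql
SELECT
    t.names[1]           AS second_name,  -- ARRAY: zero-based index, so this is 'Doe'
    t.person_map['last'] AS last_name,    -- MAP: lookup by key, so this is 'Doe'
    t.person.first       AS first_name    -- STRUCT: dot notation, so this is 'John'
FROM (
    SELECT
        array('John', 'Doe')                          AS names,
        map('first', 'John', 'last', 'Doe')           AS person_map,
        named_struct('first', 'John', 'last', 'Doe')  AS person
) t;
```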
A practical example follows.
1) Suppose a table contains the following row. We use JSON to represent its data structure; the way each part is accessed in Hive is noted in the comments.
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],    // list: ARRAY
    "children": {                       // key-value pairs: MAP
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                        // structure: STRUCT
        "street": "hui long guan",
        "city": "beijing"
    }
}
2) Based on the data structure above, we create a corresponding table in Hive and import the data.
Create a local test file test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Note: the elements inside a MAP, a STRUCT, and an ARRAY can all be separated by the same character, here "_".
3) Create the test table test in Hive
create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
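Explanation of the clauses:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter for the elements of MAP, STRUCT, and ARRAY (data separator)
map keys terminated by ':' -- delimiter between key and value in a MAP
lines terminated by '\n'; -- line delimiter
4) Load the text data into the test table (alternatively, put the file directly into the table's directory)
hive > load data local inpath '/opt/module/datas/test.txt' into table test;
5) Access the data in the three collection columns. The following query shows the access syntax for ARRAY, MAP, and STRUCT respectively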
hive > select friends[1],children['xiao song'],address.city from test
where name="songsong";
OK
_c0 _c1 city
lili 18 beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)
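Beyond element access, Hive ships built-in functions that operate on whole collections. The queries below are a small sketch against the same test table, using size(), map_keys(), map_values(), array_contains(), and explode() with LATERAL VIEW. The column aliases and the view alias friend_view are made up for this example.

```sql
-- Number of elements in the ARRAY and MAP columns of each row.
SELECT name, size(friends) AS friend_count, size(children) AS child_count
FROM test;

-- Keys and values of the MAP column, each returned as an array.
SELECT name, map_keys(children) AS child_names, map_values(children) AS child_ages
FROM test;

-- Keep only the rows whose friends array contains a given value.
SELECT name
FROM test
WHERE array_contains(friends, 'lili');

-- Produce one output row per element of the friends array.
SELECT name, friend
FROM test
LATERAL VIEW explode(friends) friend_view AS friend;
```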
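3, Type conversion
Hive's primitive data types can be converted implicitly, similar to type conversion in Java. For example, if an expression expects INT, a TINYINT is automatically converted to INT. Hive does not convert in the reverse direction, however: if an expression expects TINYINT, an INT is not automatically converted to TINYINT and an error is returned, unless CAST is used.
1. The implicit type conversion rules are as follows
(1) Any integer type can be implicitly converted to a wider type; for example, TINYINT can be converted to INT, and INT can be converted to BIGINT.
(2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
(4) BOOLEAN cannot be converted to any other type.
2. CAST can be used to convert data types explicitly
For example, CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.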
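The conversions described in this section can be checked with a single query. This is a minimal sketch: the column aliases are made up for this example, and the query assumes a Hive version that accepts a SELECT without a FROM clause (otherwise append a FROM on any one-row table).

```sql
SELECT
    CAST('1' AS INT)     AS ok_cast,       -- string '1' becomes the integer 1
    CAST('X' AS INT)     AS failed_cast,   -- cast fails, so the result is NULL
    CAST(20 AS TINYINT)  AS narrowed,      -- narrowing INT to TINYINT needs an explicit CAST
    '1' + 2              AS str_plus_int;  -- STRING operand is implicitly converted to DOUBLE
```

The last column relies on rule (2): the STRING operand is widened to DOUBLE before the addition, so the result should be a DOUBLE rather than an INT.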