Basic and complex data types of Hive

 

1. Basic data types

| Hive data type | Java data type | Length | Example |
| --- | --- | --- | --- |
| TINYINT | byte | 1-byte signed integer | 20 |
| SMALLINT | short | 2-byte signed integer | 20 |
| INT | int | 4-byte signed integer | 20 |
| BIGINT | long | 8-byte signed integer | 20 |
| BOOLEAN | boolean | Boolean value, true or false | TRUE, FALSE |
| FLOAT | float | Single-precision floating-point number | 3.14159 |
| DOUBLE | double | Double-precision floating-point number | 3.14159 |
| STRING | String | Character sequence. A character set can be specified. Either single or double quotes may be used. | 'now is the time', "for all good men" |
| TIMESTAMP | | Time type | |
| BINARY | | Byte array | |

    Note: Hive's STRING type corresponds to the VARCHAR type in a relational database: it is a variable-length string. Unlike VARCHAR, however, you cannot declare the maximum number of characters it may hold; in theory it can store up to 2 GB of characters.
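
As a small illustration of the primitive types listed above, the following sketch declares one column per type. The table name primitive_demo and all column names are hypothetical and exist only for this example; note that STRING is declared without a length, as described in the note.

```sql
-- Hypothetical table: one column for each primitive type in the table above.
create table if not exists primitive_demo (
  tiny_col    tinyint,    -- 1-byte signed integer
  small_col   smallint,   -- 2-byte signed integer
  int_col     int,        -- 4-byte signed integer
  big_col     bigint,     -- 8-byte signed integer
  flag_col    boolean,    -- true or false
  float_col   float,      -- single-precision floating point
  double_col  double,     -- double-precision floating point
  text_col    string,     -- variable-length, no declared maximum
  ts_col      timestamp,  -- time type
  raw_col     binary      -- byte array
);

-- Show the column names and types that Hive recorded.
describe primitive_demo;
```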

 

2. Collection data types

 

| Data type | Description | Syntax example |
| --- | --- | --- |
| STRUCT | Similar to a struct in C; its fields are accessed with dot notation. For example, if the data type of a column is STRUCT{first STRING, last STRING}, the first field can be referenced as column_name.first. | struct() |
| MAP | A set of key-value pairs whose values are accessed by key using array notation. For example, if the data type of a column is MAP and it holds the key -> value pairs 'first' -> 'John' and 'last' -> 'Doe', the last name can be obtained with column_name['last']. | map() |
| ARRAY | An ordered collection of elements that all have the same type. Each element has an index, and indexes start from zero. For example, for the array value ['John', 'Doe'], the second element can be referenced as column_name[1]. | array() |

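    Note: Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it encapsulates a set of named fields. Complex data types can be nested to any depth.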
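
The three access notations can be tried without creating a table by building the values with Hive's array(), map(), and named_struct() constructor functions. This is only a sketch: the aliases names, person, and full_name are made up for the example, and it assumes a Hive version that allows a SELECT without a FROM clause (0.13 or later).

```sql
-- Build one value of each collection type, then read a single element from each.
select
  t.names[1]         as second_element,  -- ARRAY: zero-based index, so [1] is 'Doe'
  t.person['last']   as last_name,       -- MAP: lookup by key gives 'Doe'
  t.full_name.first  as first_name       -- STRUCT: dot notation gives 'John'
from (
  select
    array('John', 'Doe')                             as names,
    map('first', 'John', 'last', 'Doe')              as person,
    named_struct('first', 'John', 'last', 'Doe')     as full_name
) t;
```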

Example:

    1. Create a local test file, test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

Note: The elements inside MAP, STRUCT, and ARRAY values can all be separated by the same character; "_" is used here.

    2. Create the test table test in Hive

create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

 

 

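Clause explanation: all delimiters are user-defined and must not duplicate one another. The delimiter for a given kind of data can be specified only once: for example, once the colon ':' is specified for map keys, every map in the data must use a colon between key and value.

row format delimited

fields terminated by ','     -- the comma is the column delimiter

collection items terminated by '_'        --    delimiter between the elements of a MAP, STRUCT, or ARRAY (all three are containers)

map keys terminated by ':'                          --   delimiter between key and value in a MAP

lines terminated by '\n';                              --  row delimiter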
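
To make the roles of the delimiters concrete, the comments below trace how the first line of test.txt would be split under this table definition. The breakdown is worked out by hand from the delimiters above rather than copied from Hive output; describe is included only as a way to confirm the column types.

```sql
-- songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
--
-- Split on ',' (fields):
--   name     <- songsong
--   friends  <- bingbing_lili
--   children <- xiao song:18_xiaoxiao song:19
--   address  <- hui long guan_beijing
--
-- Split each collection on '_' (collection items):
--   friends  <- bingbing | lili
--   children <- xiao song:18 | xiaoxiao song:19
--   address  <- street = hui long guan | city = beijing
--
-- Split each map entry on ':' (map keys):
--   children <- 'xiao song' -> 18, 'xiaoxiao song' -> 19

-- Confirm the column types that Hive recorded for the table.
describe test;
```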

 

     3. Load the text data into the test table

hive (default)> load data local inpath '/opt/module/datas/test.txt' into table test;
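
A quick way to check that the load worked, assuming the table and file from the previous steps, is to read the rows back:

```sql
-- Display every row; collection columns are shown in their parsed form.
select * from test;

-- Count the loaded rows; the sample test.txt above contains two lines.
select count(*) from test;
```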

 

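    4. Access the data in the three collection columns. The query below shows the access syntax for ARRAY, MAP, and STRUCT in turn (each data type is queried differently).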

hive (default)> select friends[1],children['xiao song'],address.city from test
where name="songsong";
OK
_c0     _c1     city
lili    18      beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)
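
Beyond reading a single element, Hive's built-in functions can work on a whole collection. The queries below are a sketch against the same test table; the column aliases and the lateral view aliases f and c are made up for the example.

```sql
-- Whole-collection functions on the ARRAY and MAP columns.
select name,
       size(friends)                    as friend_count,  -- number of array elements
       array_contains(friends, 'lili')  as knows_lili,    -- membership test
       map_keys(children)               as child_names,   -- array of the map's keys
       map_values(children)             as child_ages,    -- array of the map's values
       address.street                   as street
from test;

-- Produce one output row per array element.
select name, friend
from test
lateral view explode(friends) f as friend;

-- Exploding a map yields a key column and a value column.
select name, child_name, child_age
from test
lateral view explode(children) c as child_name, child_age;
```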

  

 

 

 

 

 


Origin www.cnblogs.com/Mark-blog/p/11872853.html