A preliminary understanding of the Hive

Because the demand for data to be operated within the hive work, so look at the hive knowledge database, to prevent yourself delete library on foot

hive data types

Integer:

TINYINT micro-byte integer 1

Small Integer SMALLINT 2 bytes

INT 4 bytes Integer

BIGINT 8 bytes

Boolean:

BOOLEAN TRUE/FALSE

Float:

Single-precision FLOAT

DOUBLE double

String type:

No set length STRING

Complex data type:

ARRAY ordered set type field, the field must be the same

MAP an unordered set of key-value pairs

STUCTS a named set of fields

1 CREATE TABLE complex(
2 
3          col1 ARRAY< INT>,
4 
5          col2 MAP< STRING,INT>,
6 
7          col3 STRUCT< a:STRING,b:INT,c:DOUBLE>
8 
9 )

hive storage type

There are two main stores data hive in columns and rows of memory storage

 

hive common storage format

1.textfile

The default format for the row storage format

2.ORCFile 

hive / spark support this storage format, which stores data in the embodiment is the use of block rows, each block is stored in columns, wherein each block is stored an index. It characterized by very high data compression ratio

3.Parquet

Parquet also a storage line, while having good compression properties; also can reduce the number of table scans and deserialization time.

sql how to define the storage format

CREATE TABLE IF NOT EXIST testname(
    ..........
)STORED AS PARQUET

 

Reference https://www.jianshu.com/p/694f044d1c34

Guess you like

Origin www.cnblogs.com/newliu/p/11727789.html