This chapter explains Hive's data types and field types. For the official documentation, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
1. Hive data types
Data types here refer to the types of a table's columns, declared when the table is created, such as int, string, and decimal.
create table test_user
(
id int comment 'primary key',
name string comment 'name',
score struct<math:int,computer:int>
)
comment 'test user table'
row format delimited fields terminated by ','
collection items terminated by '_'
lines terminated by '\n';
1.1. Numeric type
Hive data type | Java data type | Length | Range | Example |
---|---|---|---|---|
TINYINT | byte | 1-byte signed integer | -128 to 127 | 10 |
SMALLINT | short | 2-byte signed integer | -32,768 to 32,767 | 10 |
INT | int | 4-byte signed integer | -2,147,483,648 to 2,147,483,647 | 10 |
BIGINT | long | 8-byte signed integer | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 20 |
FLOAT | float | 4-byte single-precision floating point | | 3.1415 |
DOUBLE | double | 8-byte double-precision floating point | | 3.1419 |
DECIMAL | BigDecimal | 17 bytes | up to 38 digits, for storing exact decimals | 10.20 |
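As a quick sketch, the numeric types above are declared like any other column type. The table and column names here are purely illustrative, not part of the later examples:

```sql
-- Hypothetical table showing the numeric types in a CREATE TABLE statement
create table test_numeric
(
tiny_col   tinyint,
small_col  smallint,
int_col    int,
big_col    bigint,
float_col  float,
double_col double,
dec_col    decimal(10, 2)  -- 10 total digits, 2 of them after the decimal point
);
```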
1.2. Character type
Hive's character types are similar to those of relational databases such as MySQL. In practice, STRING is used most often.
Data type | Description |
---|---|
STRING | String literals are usually enclosed in single quotes ('') or double quotes (""). Hive uses C-style escaping in strings. |
VARCHAR | Variable-length string; the maximum length is 65535 |
CHAR | Fixed-length string; the maximum length is 255 |
Hive's STRING type is comparable to MySQL's varchar type: it is a variable-length string, but it does not declare a maximum number of characters; in theory it can store up to 2 GB of characters.
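A sketch of how the three character types differ when declared (table and column names are illustrative):

```sql
create table test_string
(
s string,        -- variable length, no declared limit (up to ~2 GB in theory)
v varchar(100),  -- variable length, at most 100 characters
c char(10)       -- fixed length, padded to 10 characters
);
```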
1.3. Date and time type
TIMESTAMP offers high precision (nanoseconds, i.e. a precision of 9), which is sufficient for most time fields. For date and time arithmetic, use INTERVAL.
Hive data type | Description |
---|---|
TIMESTAMP | Traditional UNIX timestamp with optional nanosecond precision (precision 9) |
DATE | Stores year, month, and day in YYYY-MM-DD format |
INTERVAL | INTERVAL '1' DAY adds 1 day; INTERVAL '1-2' YEAR TO MONTH adds 1 year and 2 months |
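The INTERVAL syntax from the table can be combined with date and timestamp values. A sketch, assuming a Hive version (1.2 or later) in which INTERVAL expressions are supported:

```sql
select current_date + interval '1' day;                   -- one day after today
select current_timestamp + interval '1-2' year to month;  -- 1 year and 2 months from now
```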
1.4. Other types
The Boolean type represents true or false.
Data type | Description |
---|---|
BOOLEAN | true/false |
BINARY | byte array |
1.5. Collection data type
Columns in Hive support struct, map, and array collection data types.
Data type | Description | Syntax example |
---|---|---|
STRUCT | Similar to a struct in C; element content is accessed with "dot" notation. For example, if a column's data type is STRUCT{first STRING, last STRING}, the first element is referenced as column.first. | struct('tom',15) struct<name:string,age:int> |
MAP | A collection of key-value pairs whose elements are accessed by key. For example, if a column's data type is MAP and it holds the pairs 'first'->'John' and 'last'->'Doe', the last element is retrieved as column['last']. | map<string,int> |
ARRAY | A collection of elements of the same data type, accessed by subscript. For example, if an ARRAY variable fruits holds ['apple', 'orange', 'mango'], then fruits[1] returns 'orange', because ARRAY subscripts start at 0. | array('John', 'Doe') |
ARRAY and MAP are similar to Java's Array and Map, while STRUCT is similar to a C struct in that it encapsulates a set of named fields. Complex data types can be nested to any depth.
1.5.1. Struct example
(1) Suppose there are two pieces of data as follows. For ease of understanding, its data structure is represented in JSON format:
[
{
"stuid": 1,
"stuname": "alan",
"score":{
"math":98,
"computer":89
}
},
{
"stuid": 2,
"stuname": "john",
"score":{
"math":95,
"computer":97
}
}
]
(2) Create a local test file struct.txt in the directory /root/data and save the following data.
1,alan,98_89
2,john,95_97
(3) Create the test table test_struct on Hive
create table test_struct
(
stuid int,
stuname string,
score struct<math:int,computer:int>
)
row format delimited fields terminated by ','
collection items terminated by '_'
lines terminated by '\n';
Field explanation:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter for elements of MAP, STRUCT, and ARRAY
lines terminated by '\n'; -- row delimiter
(4) Next, import the text data in struct.txt into the test table test_struct
load data local inpath '/root/data/struct.txt' into table test_struct;
(5) Access data in table test_struct
select * from test_struct;
(6) Access data in the structure
select stuname,score.math,score.computer from test_struct;
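Assuming the load in step (4) succeeded, the two rows above should produce output along these lines (column headers shown for readability; generated header names may differ):

```
stuname  math  computer
alan     98    89
john     95    97
```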
1.5.2. Array example
(1) Suppose there are two pieces of data as follows. For ease of understanding, its data structure is represented in JSON format:
[
{
"stuid": 1,
"stuname": "alan",
"hobbys":["music","sports"]
},
{
"stuid": 2,
"stuname": "john",
"hobbys":["music","travel"]
}
]
(2) Create a local test file array.txt in the directory /root/data and save the following data.
1,alan,music_sports
2,john,music_travel
(3) Create the test table test_array on Hive
create table test_array
(
stuid int,
stuname string,
hobbys array<string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
lines terminated by '\n';
(4) Next, import the text data in array.txt into the test table test_array
load data local inpath '/root/data/array.txt' into table test_array;
(5) Access data in table test_array
select * from test_array;
(6) Access data in the array
set hive.cli.print.header=true;
select stuname,hobbys[0] from test_array;
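Given the two rows loaded above, hobbys[0] is the first element of each array, so the result should look like this (the auto-generated header for the expression may appear as _c1):

```
stuname  _c1
alan     music
john     music
```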
1.5.3. Map example
(1) Suppose there are two pieces of data as follows. For ease of understanding, its data structure is represented in JSON format:
[
{
"stuid": 1,
"stuname": "alan",
"score":{
"math":98,
"computer":89
}
},
{
"stuid": 2,
"stuname": "john",
"score":{
"math":95,
"computer":97
}
}
]
(2) Create a local test file map.txt in the directory /root/data and save the following data.
1,alan,math:98_computer:89
2,john,math:95_computer:97
(3) Create the test table test_map on Hive
create table test_map
(
stuid int,
stuname string,
score map<string,int>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field explanation:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter for elements of MAP, STRUCT, and ARRAY
map keys terminated by ':' -- delimiter between each MAP key and its value
lines terminated by '\n'; -- row delimiter
(4) Next, import the text data in map.txt to the test table test_map
load data local inpath '/root/data/map.txt' into table test_map;
(5) Access data in table test_map
set hive.cli.print.header=true;
select * from test_map;
(6) Access data in map
select stuname,score['math'] as math,score['computer'] as computer from test_map;
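Assuming the load in step (4) succeeded, the aliases math and computer become the output headers and the result should be:

```
stuname  math  computer
alan     98    89
john     95    97
```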
2. Data type conversion
Hive's atomic data types can be implicitly converted, similar to Java's type conversions. The rule is to convert from a type with a smaller range to one with a larger range, or from lower precision to higher precision, so that no data or precision is lost. For example, if an expression uses the BIGINT type, an INT operand is automatically converted to BIGINT. Hive does not perform the reverse conversion: if an expression expects INT, a BIGINT operand is not automatically narrowed to INT, and an error is returned unless a CAST is used.
2.1. Implicit conversion
(1) Any integer type can be implicitly converted to a wider type, such as TINYINT can be converted to INT, and INT can be converted to BIGINT.
(2) All integer types, FLOAT and STRING types can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
(4) The BOOLEAN type cannot be converted to any other type.
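Rules (1) and (2) can be observed directly in simple expressions. A sketch (the exact display of results may vary by Hive version):

```sql
select 1 + 2.0;  -- rule (1)/(2): the INT is promoted, the result is DOUBLE
select 1 + '2';  -- rule (2): the STRING is implicitly converted to DOUBLE
```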
2.2. Explicit conversion
You can use the CAST operator to perform explicit type conversion. For example, CAST('1' AS INT) converts the string '1' into the integer 1; if the conversion fails, as with CAST('X' AS INT), the expression returns NULL.
select '2'+3,cast('2' as int)+1;
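Under the conversion rules above, the first expression implicitly promotes both operands to DOUBLE, while the second casts the string to INT before adding, so the query should return:

```
5.0    3
```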
3. Use of field types
3.1. DECIMAL(precision, scale)
The DECIMAL type in Hive is based on Java's BigDecimal, which represents immutable arbitrary-precision decimal numbers. All regular numeric operations (e.g. +, -, *, /) and related UDFs (e.g. floor, ceil, round) support the decimal type, and you can cast between decimal and the other numeric types. The persisted format of decimal supports both scientific and non-scientific notation, so a data set may contain 4.004E+3 (scientific notation), 4004 (non-scientific notation), or a mix of both.
Since Hive 0.13, users can specify scale and precision when creating tables with the DECIMAL(precision, scale) syntax. If the scale is not specified, it defaults to 0 (no fractional digits). If the precision is not specified, it defaults to 10.
CREATE TABLE foo (
a DECIMAL, -- Defaults to decimal(10,0)
b DECIMAL(9, 7)
)
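A sketch of how precision and scale constrain values, using illustrative casts rather than the foo table itself (exact display of trailing zeros varies by Hive version):

```sql
select cast(1.23 as decimal(9, 7));        -- fits: 2 fractional digits <= scale 7
select cast(1.234567891 as decimal(9, 7)); -- rounded to 7 fractional digits
select cast(123.45 as decimal(4, 2));      -- integer part exceeds precision - scale = 2 digits; typically returns NULL
```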
DECIMAL(precision, scale) description:
precision: the total number of digits, i.e. the integer part plus scale (so the integer part can be at most precision - scale digits long)
scale: the number of digits after the decimal point (if the fractional part is shorter than scale, it is zero-padded to scale digits; if it is longer than scale, it is rounded to scale digits)
Reference article: https://blog.csdn.net/W_chuanqi/article/details/131101265