Hive database series: Hive data types / Hive field types / Hive type conversion

This chapter explains Hive's data types and field types. For the official documentation, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

1. Hive data types

The data type mainly refers to the type of fields in the table when creating the table, such as int, string, decimal, etc.

create table test_user
(
    id   int  comment 'primary key',
    name string comment 'name',
    score   struct<math:int,computer:int>
)
comment 'test user table'
row format delimited fields terminated by ','
collection items terminated by '_'
lines terminated by '\n';

1.1. Numeric type

Hive data type | Java data type | Length                                 | Range                                                   | Example
---------------|----------------|----------------------------------------|---------------------------------------------------------|--------
TINYINT        | byte           | 1-byte signed integer                  | -128 to 127                                             | 10
SMALLINT       | short          | 2-byte signed integer                  | -32,768 to 32,767                                       | 10
INT            | int            | 4-byte signed integer                  | -2,147,483,648 to 2,147,483,647                         | 10
BIGINT         | long           | 8-byte signed integer                  | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 20
FLOAT          | float          | Single-precision floating-point number | -                                                       | 3.1415
DOUBLE         | double         | Double-precision floating-point number | -                                                       | 3.1419
DECIMAL        | BigDecimal     | 17 bytes                               | Up to 38 digits of precision; stores exact decimals     | 10.20
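As a small illustration of the numeric types above, the sketch below uses Hive's literal suffixes to pick a specific numeric type. The column aliases are made up for this example, and the statement assumes a Hive version that allows SELECT without a FROM clause (0.13 or later).

```sql
-- Integer literals are INT by default; a suffix selects another numeric type.
SELECT 100Y   AS tiny_val,   -- TINYINT
       100S   AS small_val,  -- SMALLINT
       100    AS int_val,    -- INT
       100L   AS big_val,    -- BIGINT
       3.14BD AS dec_val;    -- DECIMAL
```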

1.2. Character type

Hive's character types are also similar to those of the relational database MySQL. In practice, STRING is used most often.

Data type | Description
----------|------------
STRING    | Usually written with single quotes ('') or double quotes (""). Hive uses C-style escaping in STRING values.
VARCHAR   | Variable-length string, maximum length 65535
CHAR      | Fixed-length string, maximum length 255

Hive's STRING type is comparable to the varchar type of the MySQL database. It is a variable-length string, but it has no declared maximum number of characters. In theory, it can store up to 2 GB of characters.
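To make the difference between the three character types concrete, here is a minimal sketch. The table demo_chars and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table mixing the three character types.
CREATE TABLE demo_chars
(
    code CHAR(3),      -- fixed length
    name VARCHAR(20),  -- variable length, at most 20 characters
    note STRING        -- variable length, no declared limit
);

-- A value longer than the declared VARCHAR length is truncated on conversion.
SELECT CAST('hello world' AS VARCHAR(5));
```

Because STRING needs no length declaration, it avoids this kind of silent truncation, which is one reason it is the usual default choice.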

1.3. Date and time type

TIMESTAMP supports up to nanosecond precision (9 fractional digits), which is enough for typical time fields. For date and time arithmetic, you can use INTERVAL.

Hive data type | Description
---------------|------------
TIMESTAMP      | Traditional UNIX timestamp with optional nanosecond precision (up to 9 fractional digits)
DATE           | Stores year, month, and day in YYYY-MM-DD format
INTERVAL       | INTERVAL '1' DAY adds 1 day; INTERVAL '1-2' YEAR TO MONTH adds 1 year and 2 months
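The two INTERVAL forms from the table can be tried with a query like the following. This is only a sketch: the date and timestamp literals are arbitrary, and it assumes a Hive version in which interval types are available.

```sql
-- Add one day to a date, and one year and two months to a timestamp.
SELECT CAST('2023-12-01' AS DATE) + INTERVAL '1' DAY,
       CAST('2023-12-01 10:30:00' AS TIMESTAMP) + INTERVAL '1-2' YEAR TO MONTH;
```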

1.4. Other types

The Boolean type represents true or false.

Data type | Description
----------|------------
BOOLEAN   | true/false
BINARY    | byte array

1.5. Collection data type

Columns in Hive support struct, map, and array collection data types.

STRUCT
    Description: Similar to a struct in C. Fields are accessed with "dot" notation. For example, if a column named name has the type STRUCT<first:STRING, last:STRING>, the first field can be referenced as name.first.
    Syntax example: struct('tom', 15), with a column type such as struct<name:string,age:int>

MAP
    Description: A collection of key-value pairs whose elements are accessed by key. For example, if a column named name has the type MAP<STRING, STRING> and holds the pairs 'first'->'John' and 'last'->'Doe', then name['last'] returns the value stored under 'last'.
    Syntax example: map<string,int>

ARRAY
    Description: A collection of elements of the same data type, accessed by subscript. For example, if an ARRAY variable fruits holds ['apple', 'orange', 'mango'], then fruits[1] returns orange, because ARRAY subscripts start from 0.
    Syntax example: array('John', 'Doe')

ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it encapsulates a set of named fields. Complex data types can be nested to any depth.
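The three access styles described above can be tried without creating a table. The sketch below builds one value of each kind with Hive's built-in constructor functions (named_struct, map, array); the aliases s, m, a and t are arbitrary names chosen for the example.

```sql
SELECT s.name,     -- STRUCT field via dot notation
       m['last'],  -- MAP value via key
       a[1]        -- ARRAY element via zero-based subscript
FROM (
    SELECT named_struct('name', 'tom', 'age', 15) AS s,
           map('first', 'John', 'last', 'Doe')    AS m,
           array('apple', 'orange', 'mango')      AS a
) t;
```

With these inputs the query should return tom, Doe and orange, matching the descriptions given for each type.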

1.5.1. Struct example

(1) Suppose there are two records as follows. For ease of understanding, the data structure is shown in JSON format:

[
{
	"stuid": 1,
	"stuname": "alan",
	"score": {
		"math": 98,
		"computer": 89
	}
},
{
	"stuid": 2,
	"stuname": "john",
	"score": {
		"math": 95,
		"computer": 97
	}
}
]

(2) Create a local test file struct.txt in the directory /root/data and save the following data.

1,alan,98_89
2,john,95_97

(3) Create the test table test_struct on Hive

create table test_struct
(
    stuid   int,
    stuname string,
    score   struct<math:int,computer:int>
)
    row format delimited fields terminated by ','
        collection items terminated by '_'
        lines terminated by '\n';

Field explanation:

row format delimited fields terminated by ',' -- column (field) delimiter
collection items terminated by '_' -- delimiter between the elements of MAP, STRUCT, and ARRAY values
lines terminated by '\n'; -- row delimiter

(4) Next, import the text data in struct.txt into the test table test_struct

load data local inpath '/root/data/struct.txt' into table test_struct;

(5) Access data in table test_struct

select * from test_struct;

(6) Access data in the structure

select stuname,score.math,score.computer from test_struct;
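Struct fields can be used anywhere a regular column can, not only in the select list. As a sketch against the same test_struct table (the alias total and the threshold 96 are arbitrary choices for this example):

```sql
-- Filter on one struct field and compute a value from two of them.
SELECT stuname,
       score.math + score.computer AS total
FROM test_struct
WHERE score.math > 96;
```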


1.5.2. Array example

(1) Suppose there are two records as follows. For ease of understanding, the data structure is shown in JSON format:

[
{
	"stuid": 1,
	"stuname": "alan",
	"hobbys": ["music", "sports"]
},
{
	"stuid": 2,
	"stuname": "john",
	"hobbys": ["music", "travel"]
}
]

(2) Create a local test file array.txt in the directory /root/data and save the following data.

1,alan,music_sports
2,john,music_travel

(3) Create the test table test_array on Hive

create table test_array
(
    stuid   int,
    stuname string,
    hobbys  array<string>
)
    row format delimited fields terminated by ','
        collection items terminated by '_'
        lines terminated by '\n';

(4) Next, import the text data in array.txt into the test table test_array

load data local inpath '/root/data/array.txt' into table test_array;

(5) Access data in table test_array

select * from test_array;

(6) Access data in the array

set hive.cli.print.header=true;
select stuname,hobbys[0] from test_array;
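Beyond subscript access, Hive has built-in functions for working with whole arrays. The following sketch runs against the same test_array table; the aliases hobby_count, likes_travel, h and hobby are made up for the example.

```sql
-- Array length and membership test.
SELECT stuname,
       size(hobbys)                     AS hobby_count,
       array_contains(hobbys, 'travel') AS likes_travel
FROM test_array;

-- Turn each array element into its own row.
SELECT stuname, hobby
FROM test_array
LATERAL VIEW explode(hobbys) h AS hobby;
```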


1.5.3. Map example

(1) Suppose there are two records as follows. For ease of understanding, the data structure is shown in JSON format:

[
{
	"stuid": 1,
	"stuname": "alan",
	"score": {
		"math": 98,
		"computer": 89
	}
},
{
	"stuid": 2,
	"stuname": "john",
	"score": {
		"math": 95,
		"computer": 97
	}
}
]

(2) Create a local test file map.txt in the directory /root/data and save the following data.

1,alan,math:98_computer:89
2,john,math:95_computer:97


(3) Create the test table test_map on Hive

create table test_map
(
    stuid   int,
    stuname string,
    score   map<string,int>
)
    row format delimited fields terminated by ','
        collection items terminated by '_'
        map keys terminated by ':'
        lines terminated by '\n';

Field explanation:

row format delimited fields terminated by ',' -- column (field) delimiter
collection items terminated by '_' -- delimiter between the elements of MAP, STRUCT, and ARRAY values
map keys terminated by ':' -- delimiter between a key and its value inside a MAP
lines terminated by '\n'; -- row delimiter

(4) Next, import the text data in map.txt to the test table test_map

load data local inpath '/root/data/map.txt' into table test_map;

(5) Access data in table test_map

set hive.cli.print.header=true;
select * from test_map;

(6) Access data in map

select stuname,score['math'] as math,score['computer'] as computer from test_map;
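When the keys are not known in advance, the map can be inspected with Hive's built-in map functions instead of fixed keys. This is a sketch against the same test_map table; the aliases subjects, points, s and subject are made up for the example.

```sql
-- List all keys and all values of the map.
SELECT stuname,
       map_keys(score)   AS subjects,
       map_values(score) AS points
FROM test_map;

-- Turn each key-value pair into its own row.
SELECT stuname, subject, points
FROM test_map
LATERAL VIEW explode(score) s AS subject, points;
```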


2. Data type conversion

Hive's atomic data types can be implicitly converted, similar to type conversion in Java. Conversion goes from a type with a smaller range to a type with a larger range, or from a lower-precision type to a higher-precision type, so that no data or precision is lost. For example, if an expression expects BIGINT, an INT value is automatically converted to BIGINT. Hive does not perform the reverse conversion: if an expression expects INT, a BIGINT value is not automatically converted to INT, and an error is returned unless a CAST operation is used.

2.1. Implicit conversion

(1) Any integer type can be implicitly converted to a wider type. For example, TINYINT can be converted to INT, and INT can be converted to BIGINT.

(2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.

(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.

(4) The BOOLEAN type cannot be implicitly converted to any other type.
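The widening rules above can be observed in simple arithmetic, where operands of different types are brought to a common type before the operation. The literals below are arbitrary, and the comments restate the rules from this section rather than output captured from a particular Hive version.

```sql
SELECT 1Y + 100,                    -- TINYINT operand widened to INT
       100 + 100L,                  -- INT operand widened to BIGINT
       100 + CAST(1.5 AS DOUBLE);   -- INT operand widened to DOUBLE
```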

2.2. Explicit conversion

You can use the CAST operation to convert data types explicitly. For example, CAST('1' AS INT) converts the string '1' into the integer 1. If the conversion fails, as in CAST('X' AS INT), the expression returns NULL.

select '2'+3,cast('2' as int)+1;
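Because a failed CAST returns NULL instead of raising an error, it can help to make the fallback explicit. The sketch below shows a successful cast, a failed one, and one way to substitute a default using COALESCE; the aliases and the default value -1 are arbitrary choices for the example.

```sql
SELECT CAST('1' AS INT)                AS ok_int,       -- string '1' becomes the integer 1
       CAST('X' AS INT)                AS bad_int,      -- not a number, so the result is NULL
       COALESCE(CAST('X' AS INT), -1)  AS with_default, -- NULL replaced by -1
       CAST(100L AS INT)               AS narrowed;     -- BIGINT narrowed to INT explicitly
```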


3. Use of field types

3.1. DECIMAL(precision,scale)

The DECIMAL type in Hive is based on Java's BigDecimal, which represents immutable arbitrary-precision decimal numbers in Java. All regular numeric operations (e.g. +, -, *, /) and related UDFs (e.g. Floor, Ceil, Round) handle decimal types. You can cast to and from DECIMAL just as you would with other numeric types. The persistence format of the decimal type supports both scientific and non-scientific notation, so DECIMAL can be used whether the data set contains 4.004E+3 (scientific notation), 4004 (non-scientific notation), or a combination of both.

Starting with Hive 0.13, users can specify scale and precision when creating a table with a DECIMAL column, using the DECIMAL(precision,scale) syntax. If the scale is not specified, it defaults to 0 (no fractional digits). If the precision is not specified, it defaults to 10.

CREATE TABLE foo (
  a DECIMAL, -- Defaults to decimal(10,0)
  b DECIMAL(9, 7)
);

DECIMAL(precision, scale) description:
precision: the total number of digits, that is, the integer digits plus scale (so the integer part can have at most precision - scale digits).
scale: the number of digits after the decimal point. If a value has fewer fractional digits than scale, it is padded to scale digits; if it has more, it is rounded to scale digits.
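The precision and scale rules can be explored with a few casts. This is a sketch with arbitrary literals; the comments describe which rule each value exercises rather than output captured from a particular Hive version.

```sql
SELECT CAST(3.14159 AS DECIMAL(5, 2)),  -- more than 2 fractional digits: rounded to scale 2
       CAST(1.5 AS DECIMAL(5, 2)),      -- fewer fractional digits than scale
       CAST(12345.6 AS DECIMAL(5, 2));  -- 5 integer digits, but precision - scale allows only 3
```

The third value does not fit the declared type, so it is worth checking how your Hive version reports it before relying on such a column definition.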









Reference article: https://blog.csdn.net/W_chuanqi/article/details/131101265


Origin blog.csdn.net/weixin_49114503/article/details/134737578