Introduction to 13 common data types in Huawei GaussDB

Here I summarize 13 common data types in Huawei GaussDB. In addition to the date and time, string, and number types that everyone is familiar with, there are also binary types, text search types, HLL types, and JSON types.

a. Time and date type

--Create a test_date table with columns of 8 different date and time types

--Add date and time data to the table
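The two steps above might look like the following minimal sketch. It assumes the PostgreSQL-compatible syntax used by GaussDB/openGauss; the column names and sample values are illustrative and are not the author's original statements.

```sql
-- Hypothetical test_date table: one column per date/time type
CREATE TABLE test_date (
    c_date        date,
    c_time        time without time zone,
    c_timetz      time with time zone,
    c_timestamp   timestamp without time zone,
    c_timestamptz timestamp with time zone,
    c_smalldt     smalldatetime,
    c_interval    interval day to second(4),
    c_reltime     reltime
);

-- One sample row, using the same instant where it applies
INSERT INTO test_date VALUES (
    '2022-10-27 10:15:42',
    '10:15:42',
    '10:15:42 +08',
    '2022-10-27 10:15:42',
    '2022-10-27 10:15:42 +08',
    '2022-10-27 10:15:42',
    interval '3' day,
    '60'
);

SELECT * FROM test_date;
```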


-- The date and time types are:

date: year, month, day, hour, minute, and second, without time zone

time without time zone: hour, minute, and second, without time zone

time with time zone: hour, minute, and second, with time zone

timestamp without time zone: year, month, day, hour, minute, and second, without time zone

timestamp with time zone: year, month, day, hour, minute, and second, with time zone

smalldatetime: date and time rounded to the minute (30 seconds or more rounds up to the next minute, 29 seconds or less is dropped; in both cases the seconds are set to 0)

interval day to second(4): a time interval; the number in parentheses after second is the precision of the seconds, in the range 0~6. This type is reportedly provided for compatibility with Oracle, and the precision has no actual effect

reltime: a relative time interval, displayed on the basis that one month has 30 days. For example, '60' is displayed as 2 mons, and '-100' as -3 mons -10 days
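The rounding of smalldatetime and the 30-day month of reltime can be checked with a few casts. This is a sketch with made-up literals; the results named in the comments are the ones the descriptions above predict, not output captured from a running instance.

```sql
-- 42 seconds (>= 30): expected to round up to 10:16:00
SELECT '2022-10-27 10:15:42'::smalldatetime;

-- 29 seconds (<= 29): expected to become 10:15:00
SELECT '2022-10-27 10:15:29'::smalldatetime;

-- 60 days at 30 days per month: described above as 2 mons
SELECT '60'::reltime;

-- -100 days: described above as -3 mons -10 days
SELECT '-100'::reltime;
```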

b. String type

varchar: variable-length character string, maximum length 10485760

char: fixed-length character string, maximum length 10485760

text: variable-length character string, maximum length 1073733621

c. Number type

tinyint: tiny integer, 0~255

smallint: small integer, -32768~32767

int: integer, -2147483648~2147483647 (the most commonly used type)

bigint: large integer, -9223372036854775808~9223372036854775807

float: floating-point type. A value with a fractional part keeps a total of 15 digits across its integer and fractional parts. Beyond 15 digits, the remaining digits are not displayed and the value is rounded at the cut-off point.

decimal and number: the two types are identical. Both are numeric types with user-defined precision: the precision (total number of digits) ranges from 1 to 1000 and the scale (digits after the decimal point) from 0 to 1000. If no precision is specified, a value can have up to 131072 digits before the decimal point and up to 16383 digits after it.

money: currency amount, displayed with a currency symbol in front. Range: -92233720368547758.08~92233720368547758.07
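A small sketch that exercises the ranges listed above. The table and column names are hypothetical, and the last statement is expected to be rejected because 256 lies outside the tinyint range of 0~255.

```sql
-- Hypothetical table with one column per number type
CREATE TABLE test_number (
    c_tiny  tinyint,
    c_small smallint,
    c_int   int,
    c_big   bigint,
    c_float float,
    c_dec   decimal(10,2),
    c_money money
);

-- Upper bounds of the integer types, plus sample float/decimal/money values
INSERT INTO test_number VALUES (
    255, 32767, 2147483647, 9223372036854775807,
    1234567.1234567891, 12345678.91, '1234.56'
);

SELECT * FROM test_number;

-- Expected to fail: value out of range for tinyint
INSERT INTO test_number (c_tiny) VALUES (256);
```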

Values of the money type can only be converted to decimal and number; converting them to any other type reports an error.
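As a sketch of that conversion rule (the literal is made up, and the second statement is expected to fail as described above):

```sql
-- money to numeric: allowed
SELECT '1234.56'::money::numeric;

-- money to int: expected to raise an error
SELECT '1234.56'::money::int;
```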

d. Binary data type

blob: binary large object (not supported in column storage)

raw: variable-length hexadecimal data (not supported in column storage)

bytea: variable-length binary string

All three types support a maximum storage size of 1 GB minus 8203 bytes.

--Prepare a table with three different types and add data
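A possible version of that table, assuming a row-store table and the empty_blob() and HEXTORAW() functions available in GaussDB/openGauss. The names and the DEADBEEF sample value are illustrative.

```sql
-- Hypothetical table with one column per binary type
CREATE TABLE test_binary (
    id      int,
    c_blob  blob,
    c_raw   raw,
    c_bytea bytea
);

-- empty_blob() initializes the blob column; raw takes hexadecimal input;
-- bytea takes an escaped hex string
INSERT INTO test_binary
VALUES (1, empty_blob(), HEXTORAW('DEADBEEF'), E'\\xDEADBEEF');

SELECT * FROM test_binary;
```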


Pay attention to how data is added to binary large objects such as blobs:

1. First use empty_blob() to insert an empty value into the blob column

2. Then use for update to lock the row

3. Finally, locate the row by its id column or similar, update the blob column, and write the binary data
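The three steps can be sketched as follows. The table is hypothetical, and the sketch assumes that a hexadecimal string literal can be assigned directly to a blob column; if your version requires a different way of writing LOB content, only the UPDATE in step 3 changes.

```sql
-- Hypothetical row-store table with a blob column
CREATE TABLE test_blob (id int, c_blob blob);

START TRANSACTION;

-- Step 1: insert an empty value into the blob column
INSERT INTO test_blob (id, c_blob) VALUES (1, empty_blob());

-- Step 2: lock the row
SELECT c_blob FROM test_blob WHERE id = 1 FOR UPDATE;

-- Step 3: locate the row by id and write the binary data
-- (assumption: hex literal input is accepted for blob)
UPDATE test_blob SET c_blob = 'DEADBEEF' WHERE id = 1;

COMMIT;
```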


e. Sequence type

smallserial: two-byte serial integer, 1~32767

serial: four-byte serial integer, 1~2147483647

bigserial: eight-byte serial integer, 1~9223372036854775807

The serial types are not data types in the true sense; they are a convenience for setting up a unique identifier column in a table. Declaring one creates an integer column whose default value is read from a sequence generator.

A NOT NULL constraint is applied to the column so that NULL cannot be inserted. You can also attach a UNIQUE or PRIMARY KEY constraint to avoid accidentally inserting duplicate values, but this is not mandatory.

Currently, a SERIAL column can only be specified when the table is created. It is not possible to add a SERIAL column to an existing table, nor to convert the type of an existing column to SERIAL.

--Create a table with a serial column

--Add data to the table; the serial column increments automatically, much like the auto_increment attribute in MySQL or a sequence in Oracle
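A minimal sketch of both steps, with hypothetical table and column names. The id column is omitted from the INSERT statements so that its value comes from the sequence.

```sql
-- Hypothetical table: id is filled from an automatically created sequence
CREATE TABLE test_serial (
    id   serial PRIMARY KEY,
    name varchar(20)
);

INSERT INTO test_serial (name) VALUES ('first');
INSERT INTO test_serial (name) VALUES ('second');

-- The ids are assigned in increasing order
SELECT id, name FROM test_serial ORDER BY id;
```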

f. Boolean type

The boolean type holds the values true and false, which the database displays as t and f.

Valid text values for "true" are: TRUE, 't', 'true', 'y', 'yes', '1';

Valid text values for "false" are: FALSE, 'f', 'false', 'n', 'no', '0';

Using TRUE and FALSE is the more standard usage and is the form most portable across SQL databases.
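The accepted spellings can be tried out with a sketch like this one (hypothetical table name; every row is expected to display as t or f):

```sql
CREATE TABLE test_bool (id int, flag boolean);

-- Several of the accepted spellings of true and false
INSERT INTO test_bool VALUES
    (1, TRUE), (2, 'yes'), (3, '1'),
    (4, FALSE), (5, 'n'), (6, '0');

SELECT id, flag FROM test_bool ORDER BY id;

-- A boolean column can be used directly as a condition
SELECT id FROM test_bool WHERE flag;
```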


g. Network address type

These types are dedicated to storing IPv4, IPv6, and MAC addresses.

cidr: stores an IPv4 or IPv6 network

inet: stores an IPv4 or IPv6 host or network, optionally together with its subnet mask

macaddr: stores a MAC address

Storing addresses in these types rather than in plain text means that input is checked for errors when it is stored, and dedicated operators and functions are available for the network types.

--Create a table with network address columns

--Add three different kinds of network address data to the table
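One way to write these two steps, using made-up addresses and hypothetical names. The final INSERT illustrates the input checking mentioned above: a cidr value must not have bits set to the right of its netmask, so it is expected to be rejected.

```sql
CREATE TABLE test_net (c_cidr cidr, c_inet inet, c_mac macaddr);

INSERT INTO test_net
VALUES ('192.168.1.0/24', '192.168.1.10/24', '08:00:2b:01:02:03');

-- host() and masklen() are examples of functions for network types;
-- <<= tests whether the address is contained in (or equal to) the network
SELECT host(c_inet), masklen(c_inet), c_inet <<= c_cidr AS in_network
FROM test_net;

-- Expected to fail: host bits are set in a cidr value
INSERT INTO test_net (c_cidr) VALUES ('192.168.1.10/24');
```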

h. Bit string type

Bit strings are used to store bitmasks. A bitmask is a string of binary 0s and 1s that encodes the actual content. For example, with 4 different permissions, each of which is either 1 or 0, the permissions a user holds can be written as 0011 or 1110.

bit(n): fixed-length bit string. The data must match the length n exactly; storing shorter or longer data reports an error. bit without a length is equivalent to bit(1).

bit varying(n): variable-length bit string with a maximum length of n; data longer than n is rejected. bit varying without a length means there is no length limit.

--Create a table with bit string columns and add data
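A sketch of such a table, with hypothetical names. The second INSERT is expected to fail because a 3-bit value does not match bit(4).

```sql
CREATE TABLE test_bit (perm bit(4), flags bit varying(8));

-- perm must be exactly 4 bits; flags may be up to 8 bits
INSERT INTO test_bit VALUES (B'0011', B'101');

-- Bitwise AND isolates a single permission bit
SELECT perm, perm & B'0010' AS masked FROM test_bit;

-- Expected to fail: length 3 does not match bit(4)
INSERT INTO test_bit VALUES (B'101', B'101');
```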


i. Text search type

tsvector: represents a retrieval unit (a document). Its value is a sorted list of distinct lexemes: the words of a sentence are split into separate entries, duplicate entries are removed automatically, and the rest are stored in a fixed order. The to_tsvector function is usually used to parse and normalize a document string.

tsquery: represents a search condition. It stores the terms to search for, combined with the Boolean operators & (AND), | (OR), and ! (NOT), with parentheses to enforce the grouping of operators. The to_tsquery and plainto_tsquery functions normalize the words before converting them to tsquery.

Casting a string to tsvector splits it on spaces; the resulting entries are sorted by length and then alphabetically.
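For instance, the following cast (the sentence is the one used in the PostgreSQL documentation, not necessarily the author's) returns each distinct word once, even though "a" and "fat" occur several times in the input:

```sql
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
```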


If an entry needs to contain spaces or punctuation, it can be marked with quotes:
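A sketch based on the PostgreSQL documentation. Dollar quoting ($$...$$) is used for the outer string so that the single quotes inside it do not need to be doubled.

```sql
-- The quoted run of spaces becomes one entry
SELECT $$the lexeme '    ' contains spaces$$::tsvector;

-- A quote inside a quoted entry is written twice
SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
```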


Entries can also carry integer positions. A position usually indicates where the word occurs in the document, and position information can be used for ranking. Position values range from 1 to 16383; larger values are set to 16383.

Entries that have positions can further be labeled with a weight of A, B, C, or D. D is the default and is therefore not shown in the output. Weights are typically used to reflect document structure, for example by marking title words differently from body words.
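Positions and weights are written after a colon, as in this sketch (examples adapted from the PostgreSQL documentation):

```sql
-- Each word with its position in the document
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;

-- Positions with weights; the default weight D is not shown in the output
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
```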


In addition to casting with ::tsvector, you can convert with the to_tsvector() function, but there is a difference between the two.

to_tsvector() normalizes the words and also records the position of each word in the string, whereas a plain cast keeps the words exactly as written.
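The difference shows up when both are run on the same string. This sketch assumes the 'english' text search configuration is available; with it, stop words such as "the" are dropped and the remaining words are reduced to their stems.

```sql
-- Plain cast: words are kept exactly as written, no positions
SELECT 'The Fat Rats'::tsvector;

-- to_tsvector: words are normalized and their positions recorded
SELECT to_tsvector('english', 'The Fat Rats');
```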


A tsquery combines its search terms with &, |, and !, using parentheses for grouping.

One or more weight letters can also be attached to a term, which restricts it to matching tsvector entries that carry one of those weights.
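A few illustrative tsquery literals (adapted from the PostgreSQL documentation). Without parentheses, ! binds most tightly, then &, then |.

```sql
SELECT 'fat & rat'::tsquery;
SELECT 'fat & (rat | cat)'::tsquery;
SELECT 'fat & rat & ! cat'::tsquery;

-- fat must match an entry with weight A or B
SELECT 'fat:ab & cat'::tsquery;
```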


A tsquery can be matched against tsvector data to make logical judgments:

The first query checks whether the data contains Huawei, Apple, or oppo

The second query checks whether the data contains both Huawei and oppo, or Apple
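The two checks could be written as below. The document string is made up for illustration. Since & binds more tightly than |, the second condition reads as (Huawei & oppo) | Apple, so for this sample string the first query matches and the second does not.

```sql
-- Huawei OR Apple OR oppo: matches, because Huawei is present
SELECT 'Huawei Xiaomi vivo'::tsvector @@ 'Huawei | Apple | oppo'::tsquery;

-- (Huawei AND oppo) OR Apple: no match, oppo and Apple are both absent
SELECT 'Huawei Xiaomi vivo'::tsvector @@ 'Huawei & oppo | Apple'::tsquery;
```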


To test whether a keyword exists in a string, match a tsvector against a tsquery with the @@ operator:

string::tsvector @@ 'keyword to search for'::tsquery

The expression returns a Boolean value: t if the keyword was found, f otherwise.
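Two concrete instances of that expression, as a sketch. The first uses plain casts, so the keyword has to appear exactly as written; the second assumes the 'english' configuration and normalizes both sides, so the singular "rat" also finds "rats".

```sql
-- Exact word match with plain casts
SELECT 'fat cats ate fat rats'::tsvector @@ 'rats'::tsquery;

-- Normalized match: both sides are reduced to stems first
SELECT to_tsvector('english', 'fat cats ate fat rats')
       @@ to_tsquery('english', 'rat');
```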

j. UUID type

Stores a universally unique identifier.

A UUID is a 128-bit identifier generated by an algorithm chosen so that it is very unlikely that anyone else using the same algorithm will generate the same identifier. It can therefore guarantee the uniqueness of data better than a sequence, which is unique only within a single database. A UUID is written as a sequence of lowercase hexadecimal digits in hyphen-separated groups: one group of 8 digits, three groups of 4 digits, and one group of 12 digits, for 32 digits in total.

--First create a table with a uuid column

--Add uuid data in different formats to the table
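A sketch of both steps. The table name is hypothetical, and the sample value and input formats are the ones listed in the PostgreSQL documentation; all four rows are expected to be displayed in the standard lowercase, hyphenated form.

```sql
CREATE TABLE test_uuid (id uuid);

-- Standard form
INSERT INTO test_uuid VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11');
-- Uppercase digits
INSERT INTO test_uuid VALUES ('A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11');
-- Surrounded by braces, hyphens after every group of four digits omitted in places
INSERT INTO test_uuid VALUES ('{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}');
-- No hyphens at all
INSERT INTO test_uuid VALUES ('a0eebc999c0b4ef8bb6d6bb9bd380a11');

SELECT * FROM test_uuid;
```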


k. JSON data type

Compared with storing JSON in a text column, the json type has the advantage that each stored value is checked for validity.

If the data being written is valid JSON, the insert succeeds.

Otherwise, an error reports that the JSON input is invalid. For example, if the value xiaohong is written without the quotation marks that mark it as a string, the error points to xiaohong as the invalid token.
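A sketch of the valid and invalid cases, with a hypothetical table and made-up field values:

```sql
CREATE TABLE test_json (id int, info json);

-- Valid JSON: the insert succeeds
INSERT INTO test_json VALUES (1, '{"name": "xiaohong", "age": 18}');

-- Expected to fail: xiaohong is not quoted, so the value is not valid JSON
INSERT INTO test_json VALUES (2, '{"name": xiaohong, "age": 18}');
```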


l. HLL data type

The hll type stores a HyperLogLog (HLL) structure with a fixed size of 1280 bytes, from which the number of distinct values can be computed directly, with a maximum error rate of about 2.3%. HyperLogLog is a fixed-size, set-like structure for counting distinct values with adjustable precision.

Its function is to count the number of unique elements in a collection.

-- Let's see how to use the hll type first

After creating the table, insert an empty hll set into the hll column; hll_empty() creates an empty set.

Then add values to the set with update statements. Here three values are added, two of which are the same.

Finally, check the cardinality of the hll data. The result is 2, indicating that the set holds two distinct values.
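The whole sequence might look like this. It is a sketch with hypothetical table and column names, using the hll_empty(), hll_add(), hll_hash_integer(), and hll_cardinality() functions; the values 1, 2, 1 are chosen so that only two of them are distinct.

```sql
CREATE TABLE test_hll (id int, vals hll);

-- Start with an empty set
INSERT INTO test_hll VALUES (1, hll_empty());

-- Add three hashed values; 1 is added twice
UPDATE test_hll SET vals = hll_add(vals, hll_hash_integer(1)) WHERE id = 1;
UPDATE test_hll SET vals = hll_add(vals, hll_hash_integer(2)) WHERE id = 1;
UPDATE test_hll SET vals = hll_add(vals, hll_hash_integer(1)) WHERE id = 1;

-- Estimated number of distinct values in the set
SELECT hll_cardinality(vals) FROM test_hll WHERE id = 1;
```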


The following functions hash an integer, a text value, or a value of any type into the hash value that is added to an hll:

hll_hash_integer()

hll_hash_text()

hll_hash_any()

Consider a practical example: counting the number of visitors to a website.

--First create a visit fact table that records which user visited the website at what time

--Add data to the table in batches

--Then create a daily user visit table with a column of type hll

--Now group the fact data by date and store the result in the hll table

--Calculate the number of distinct users who visited the website each day

--Query how many distinct users visited the website each month

With hll, these distinct-value (DV) counts can be computed very quickly.
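The steps of the visitor example can be sketched as follows. The table names, column names, dates, and id ranges are all made up, and the overlapping ranges stand in for users who return on later days. hll_add_agg() builds one hll per group, and hll_union_agg() merges the daily sets so that a user who visits on several days is counted once per month.

```sql
-- Fact table: one row per visit
CREATE TABLE facts (visit_date date, user_id integer);

-- Batch-load sample visits
INSERT INTO facts SELECT '2019-02-20'::date, generate_series(1, 100);
INSERT INTO facts SELECT '2019-02-21'::date, generate_series(50, 200);
INSERT INTO facts SELECT '2019-03-01'::date, generate_series(150, 300);

-- Daily table: one hll of visitors per day
CREATE TABLE daily_uniques (visit_date date UNIQUE, users hll);

-- Aggregate the hashed user ids of each day into one hll
INSERT INTO daily_uniques (visit_date, users)
SELECT visit_date, hll_add_agg(hll_hash_integer(user_id))
FROM facts
GROUP BY 1;

-- Distinct users per day
SELECT visit_date, hll_cardinality(users)
FROM daily_uniques
ORDER BY visit_date;

-- Distinct users per month: union the daily sets, then count
SELECT EXTRACT(MONTH FROM visit_date) AS month,
       hll_cardinality(hll_union_agg(users)) AS uniques
FROM daily_uniques
GROUP BY 1
ORDER BY 1;
```

The monthly query reads only the small daily_uniques table, which is why it stays fast as the fact table grows.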

m. Enumerated type

  1. Create an enumeration type

  2. Create a table, using the newly created enumeration type gender

  3. Add data to the table; if the added data is not one of the enumeration values, an error will be reported
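A sketch of the three steps. The type name gender comes from the text above; the enumeration values, table name, and sample rows are illustrative.

```sql
-- 1. Create the enumeration type
CREATE TYPE gender AS ENUM ('male', 'female');

-- 2. Use it as a column type
CREATE TABLE test_enum (name varchar(20), sex gender);

-- 3. A listed value is accepted
INSERT INTO test_enum VALUES ('xiaoming', 'male');

-- Expected to fail: 'unknown' is not one of the enumeration values
INSERT INTO test_enum VALUES ('xiaohong', 'unknown');
```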

That concludes the introduction to the commonly used data types. In the next issue, we will talk about the frequently used functions and operations for the various data types.


Origin blog.csdn.net/adamconan/article/details/127553151