This article summarizes 13 commonly used data types in Huawei's GaussDB database. Besides the familiar date/time, string, and number types, there are also less common ones such as the binary, text search, HLL, and JSON types.
a. Time and date type
--Create a test_date table that contains 8 different date and time types
--Add different time data to the table
--The table covers the following date and time types:
date: year, month, day, hour, minute, and second, without time zone
time without time zone: hour, minute, and second, without time zone
time with time zone: hour, minute, and second, with time zone information
timestamp without time zone: year, month, day, hour, minute, and second, without time zone
timestamp with time zone: year, month, day, hour, minute, and second, with time zone information
smalldatetime: date and time with the seconds rounded away (at 30 seconds or more, the minute is incremented by 1 and the seconds are cleared; at 29 seconds or less, the seconds are simply cleared)
interval day to second(4): time interval; the number after second is the precision of the seconds field, in the range 0~6. This type reportedly exists for compatibility with Oracle, but it does not implement any specific functionality
reltime: relative time interval, displayed on the basis of 30 days per month; for example, '60' is displayed as 2 mons and '-100' as -3 mons -10 days
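The test_date table described above might be sketched as follows. This is a minimal sketch that assumes an openGauss-compatible SQL dialect; the column names and sample values are illustrative and are not taken from the original post.

```sql
-- Hypothetical column names: one column per date/time type listed above
CREATE TABLE test_date (
    c_date      date,
    c_time      time without time zone,
    c_timetz    time with time zone,
    c_ts        timestamp without time zone,
    c_tstz      timestamp with time zone,
    c_sdt       smalldatetime,
    c_interval  interval day to second(4),
    c_reltime   reltime
);

INSERT INTO test_date VALUES (
    '2023-05-01',
    '10:20:30',
    '10:20:30+08',
    '2023-05-01 10:20:30',
    '2023-05-01 10:20:30+08',
    '2023-05-01 10:20:45',   -- per the list above, stored rounded up to 10:21:00
    INTERVAL '3' DAY,
    '60'                     -- per the list above, displayed as 2 mons
);

SELECT * FROM test_date;
```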
b. String type
varchar: variable-length character string, maximum length 10485760
char: fixed-length character string, maximum length 10485760
text: variable-length character string, maximum length 1073733621
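As a small illustration of the three string types, the sketch below creates one column of each and then tries to store a value longer than the declared length. The table and column names are hypothetical.

```sql
-- Hypothetical table with one column per string type
CREATE TABLE test_string (
    c_varchar  varchar(5),
    c_char     char(5),
    c_text     text
);

INSERT INTO test_string VALUES ('abc', 'abc', 'abc');

-- Expected to fail: the value is longer than the declared varchar(5)
INSERT INTO test_string (c_varchar) VALUES ('abcdefgh');
```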
c. Number type
tinyint: tiny integer, 0~255
smallint: small integer, -32768~32767
int: integer, -2147483648~2147483647 (the most commonly used type)
bigint: large integer, -9223372036854775808~9223372036854775807
float: floating-point type. For a value with a fractional part, the integer and fractional parts together can hold 15 digits. Beyond 15 digits, the excess digits are not displayed and the truncated part is rounded.
decimal and number: these two types are identical. Both are numeric types with user-defined precision. The precision (total number of digits) ranges from 1 to 1000 and the scale (digits after the decimal point) from 0 to 1000. If no precision is specified, up to 131072 digits are allowed before the decimal point and up to 16383 digits after it.
money: currency amount; displayed with a currency symbol in front. The range of the money type is -92233720368547758.08~92233720368547758.07
Values of the money type can only be converted to decimal and number; converting to any other type reports an error.
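The rounding and conversion rules above can be tried with a few standalone queries. This is a sketch only; the literal values are made up, and the statements marked as expected to fail follow from the ranges and conversion rule stated above.

```sql
-- decimal(8,2): 8 digits in total, 2 of them after the decimal point
SELECT 123456.789::decimal(8,2);

-- money is displayed with a currency symbol in front
SELECT '52093.89'::money;

-- money can be converted to numeric/decimal
SELECT '52093.89'::money::numeric;

-- Expected to fail per the note above: money cannot be converted to integer
SELECT '52093.89'::money::int;

-- Expected to fail: 256 is outside the tinyint range 0~255
SELECT 256::tinyint;
```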
d. Binary data type
blob: binary large object (not supported in column storage)
raw: variable-length hexadecimal data (not supported in column storage)
bytea: variable-length binary string
All three types support a maximum storage size of 1 GB minus 8203 bytes.
--Prepare a table with the three different types and add data
Note how data is added to binary large objects such as blob:
1. First use empty_blob() to insert an empty value into the blob field
2. Then use FOR UPDATE to lock this row of data
3. Locate the row by its id column (or similar), update the field, and write the binary data
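The three steps above can be sketched as follows. The table name, column names, and hexadecimal values are hypothetical, and the final UPDATE assumes the blob column accepts a hexadecimal string literal, which may differ between GaussDB versions.

```sql
-- Hypothetical table with one column per binary type
CREATE TABLE test_binary (
    id       integer,
    c_blob   blob,
    c_raw    raw,
    c_bytea  bytea
);

-- Step 1: insert an empty blob alongside the raw and bytea values
INSERT INTO test_binary
VALUES (1, empty_blob(), HEXTORAW('DEADBEEF'), E'\\xDEADBEEF');

START TRANSACTION;

-- Step 2: lock the row that is about to be updated
SELECT * FROM test_binary WHERE id = 1 FOR UPDATE;

-- Step 3: locate the row by id and write the binary data
-- (assumption: a hexadecimal string literal is accepted for blob)
UPDATE test_binary SET c_blob = 'DEADBEEF' WHERE id = 1;

COMMIT;
```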
e. Sequence type
smallserial: two-byte auto-incrementing integer, 1~32767
serial: four-byte auto-incrementing integer, 1~2147483647
bigserial: eight-byte auto-incrementing integer, 1~9223372036854775807
The sequence types are not data types in the true sense; they are only a convenient way to set up a unique identifier column in a table. Declaring one creates an integer field whose default value is read from a sequence generator.
A NOT NULL constraint is applied to this field to ensure that NULL cannot be inserted. A UNIQUE or PRIMARY KEY constraint can also be attached to avoid accidentally inserting duplicate values, but this is not mandatory.
Currently, a SERIAL column can only be specified when a table is created. It is not possible to add a SERIAL column to an existing table, nor to convert the type of an existing column to SERIAL.
--Create a table with a serial column
--Add data to the table; the serial column keeps auto-incrementing, much like the auto_increment attribute in MySQL or a sequence in Oracle
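A minimal sketch of a table with a serial column, followed by a roughly equivalent explicit form built from an integer column and a sequence. All table, column, and sequence names are hypothetical.

```sql
-- The id column is filled automatically from an implicit sequence
CREATE TABLE test_serial (
    id    serial,
    name  varchar(20)
);

INSERT INTO test_serial (name) VALUES ('xiaoming');
INSERT INTO test_serial (name) VALUES ('xiaohong');

SELECT * FROM test_serial ORDER BY id;

-- Roughly what serial expands to: an integer column whose default
-- value is read from a sequence, plus a NOT NULL constraint
CREATE SEQUENCE test_manual_id_seq;
CREATE TABLE test_manual (
    id    integer NOT NULL DEFAULT nextval('test_manual_id_seq'),
    name  varchar(20)
);
ALTER SEQUENCE test_manual_id_seq OWNED BY test_manual.id;
```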
f. Boolean type
boolean: holds true or false. In the database, t and f are used to display true and false.
Valid text values for the "true" state are: TRUE, 't', 'true', 'y', 'yes', '1';
Valid text values for the "false" state are: FALSE, 'f', 'false', 'n', 'no', '0';
Using TRUE and FALSE is the more standard usage, and it is the general way of writing SQL statements across databases.
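The accepted text values can be checked by casting them to boolean. The column aliases below are illustrative.

```sql
-- Different spellings of true and false all map to t / f
SELECT 'yes'::boolean AS from_yes,     -- t
       '1'::boolean   AS from_one,     -- t
       'n'::boolean   AS from_n,       -- f
       '0'::boolean   AS from_zero,    -- f
       TRUE           AS keyword_true, -- t
       FALSE          AS keyword_false; -- f
```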
g. Network address type
These types are dedicated to storing IPv4, IPv6, and MAC address information.
cidr: stores an IPv4 or IPv6 network
inet: stores IPv4 or IPv6 hosts and networks, and can store the subnet mask together with the address
macaddr: stores MAC address data
Storing addresses in the network address types lets the database check for input errors when the data is stored, and there are dedicated operators and functions that work directly on these types.
--Create a table with network type
-- Add three different network address data to the table
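A sketch of the network address table described above, with one input check and two of the dedicated operators and functions. The table name, column names, and addresses are made up for illustration.

```sql
-- Hypothetical table with one column per network address type
CREATE TABLE test_net (
    c_cidr  cidr,
    c_inet  inet,
    c_mac   macaddr
);

INSERT INTO test_net
VALUES ('192.168.100.128/25', '192.168.1.5/24', '08:00:2b:01:02:03');

-- Expected to fail: 256 is not a valid octet, so the input check rejects it
SELECT '192.168.1.256'::inet;

-- "Is contained within" operator
SELECT inet '192.168.1.5' << inet '192.168.1.0/24';   -- t

-- Extract the host address and the mask length
SELECT host('192.168.1.5/24'::inet), masklen('192.168.1.5/24'::inet);
```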
h. Bit string type
Bit strings are used to store bitmasks. A bitmask represents the actual content as a string of binary 0s and 1s. For example, with 4 different permissions, each either 1 or 0, the permissions a user holds can be written as 0011 or 1110.
bit(n): fixed-length bit string type. Data of the bit type must match the length n exactly; storing shorter or longer data reports an error. bit with no length is equivalent to bit(1)
bit varying(n): variable-length bit string type with a maximum length of n. Data longer than n is rejected, and bit varying without a length means there is no length limit
--Create a table of bit string type and add data
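A minimal sketch of a bit string table, plus a bitwise AND that tests a single permission in a 4-bit mask such as the one in the example above. The table and column names are hypothetical.

```sql
CREATE TABLE test_bit (
    a  bit(3),
    b  bit varying(5)
);

-- Both values fit: a is exactly 3 bits, b is at most 5 bits
INSERT INTO test_bit VALUES (B'101', B'00');

-- Expected to fail: B'10' has 2 bits and does not match bit(3)
INSERT INTO test_bit VALUES (B'10', B'101');

-- Test whether the lowest permission bit is set in the mask 0011
SELECT B'0011' & B'0001';   -- 0001
```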
i. Text search type
tsvector: represents a retrieval unit. A tsvector value is a sorted list of distinct lexemes, produced by breaking the words of a sentence into separate entries. During this processing, tsvector automatically removes duplicate entries and stores the remaining ones in a fixed order. The to_tsvector function is usually used to parse and normalize a document string;
tsquery: represents a retrieval condition. It stores the lexemes to search for and combines them with the Boolean operators & (AND), | (OR) and ! (NOT); parentheses can be used to enforce the grouping of operators. The to_tsquery and plainto_tsquery functions normalize words before converting them to the tsquery type.
Casting a string to tsvector splits it on spaces; duplicates are removed and the resulting lexemes are sorted by length and alphabetical order:
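For instance, the following cast (the sentence is illustrative) splits the string on spaces. The repeated words 'a' and 'fat' should each appear only once in the result.

```sql
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
```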
If a lexeme needs to contain spaces or punctuation, it can be enclosed in quotes:
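A sketch of a quoted lexeme, using dollar quoting for the outer string so that the inner single quotes do not need to be escaped. The sample sentence is illustrative.

```sql
-- The run of spaces inside the inner quotes is kept as a single lexeme
SELECT $$the lexeme '    ' contains spaces$$::tsvector;
```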
Lexemes can also carry integer position constants. A position usually indicates where the word occurs in the document, and position information can be used for ranking. Position values range from 1 to 16383, and larger values are capped at 16383.
Lexemes that have a position can additionally be labeled with a weight, which can be A, B, C or D. The default is D, so it does not appear in the output. Weights can be used to reflect the document structure:
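Positions and weights are written after a colon in the tsvector literal. The values below are illustrative.

```sql
-- Each lexeme carries the position where it occurs in the document
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;

-- Positions labeled with weights; the default weight D is not shown in the output
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
-- 'a':1A 'cat':5 'fat':2B,4C
```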
Besides casting with ::tsvector, the to_tsvector() function can also be used for the conversion, but the two behave differently.
to_tsvector() normalizes the words and also records the position of each lexeme in the document:
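The difference can be seen by running both conversions on the same string. This sketch assumes the 'english' text search configuration is available.

```sql
-- Plain cast: words are kept exactly as written, with no positions
SELECT 'The Fat Rats'::tsvector;
-- 'Fat' 'Rats' 'The'

-- to_tsvector: stop words are dropped, words are normalized, positions are recorded
SELECT to_tsvector('english', 'The Fat Rats');
-- 'fat':2 'rat':3
```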
The tsquery type represents a search condition. It stores the lexemes used for the search and combines them with the Boolean operators & (AND), | (OR) and ! (NOT). Parentheses can be used to enforce the grouping of operators:
Lexemes in a tsquery can also be labeled with one or more weight letters:
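A few tsquery literals illustrating the operators, grouping, and weight labels. The search terms are made up.

```sql
SELECT 'fat & rat'::tsquery;
SELECT 'fat & (rat | cat)'::tsquery;
SELECT 'fat & rat & ! cat'::tsquery;

-- Weight letters restrict the match to lexemes carrying one of those weights
SELECT 'fat:ab & cat'::tsquery;
-- 'fat':AB & 'cat'
```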
A tsquery value can be matched against tsvector data to make logical judgments:
The first check: whether the data contains Huawei, Apple, or oppo
The second check: whether the data contains both Huawei and oppo, or Apple
Text search matching tests whether the search terms on the right exist in the string on the left:
string::tsvector @@ 'keyword to search for'::tsquery
The expression returns a Boolean value: t if the keyword was found, f otherwise.
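The two checks described above might look like the following. The sample strings are illustrative, and lowercase is used throughout because a plain ::tsvector cast does not normalize case.

```sql
-- First check: does the data contain huawei, apple, or oppo?
SELECT 'huawei vivo xiaomi'::tsvector @@ 'huawei | apple | oppo'::tsquery;   -- t

-- Second check: does the data contain both huawei and oppo, or apple?
SELECT 'huawei vivo xiaomi'::tsvector @@ '(huawei & oppo) | apple'::tsquery; -- f
SELECT 'huawei oppo xiaomi'::tsvector @@ '(huawei & oppo) | apple'::tsquery; -- t
```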
j. UUID type
Stores a universally unique identifier.
A UUID is a 128-bit identifier generated by an algorithm chosen so that it is extremely unlikely for the same identifier to be generated elsewhere by the same algorithm, which makes it a better guarantee of uniqueness than a sequence. A UUID is written as a sequence of lowercase hexadecimal digits: one group of 8 digits, three groups of 4 digits, and one group of 12 digits, for a total of 32 digits.
--First create a table with uuid type
--Add uuid data in different formats to the table
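A sketch of a uuid table that accepts several input formats of the same identifier. The table name and the UUID value are illustrative; the accepted formats are assumed to follow the PostgreSQL conventions.

```sql
CREATE TABLE test_uuid (id uuid);

INSERT INTO test_uuid VALUES
    ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'),     -- standard form
    ('A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'),     -- uppercase
    ('{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}'),   -- wrapped in braces
    ('a0eebc999c0b4ef8bb6d6bb9bd380a11');         -- without hyphens

-- Every row is displayed in the standard lowercase, hyphenated form
SELECT * FROM test_uuid;
```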
k. JSON data type
Compared with storing JSON in a text column, the json type has the advantage that the stored data is validated: each value is checked against the JSON format when it is written.
If the written data conforms to the JSON format, the insert succeeds:
Otherwise, an error about the JSON format is reported. For example, if a string value such as xiaohong is written without quotation marks, the error points at xiaohong as invalid input.
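Both cases can be sketched with a one-column table. The table name and the JSON content are illustrative.

```sql
CREATE TABLE test_json (data json);

-- Valid JSON: the insert succeeds
INSERT INTO test_json VALUES ('{"name": "xiaoming", "age": 18}');

-- Expected to fail: xiaohong is not quoted, so the value is not valid JSON
INSERT INTO test_json VALUES ('{"name": xiaohong, "age": 18}');
```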
l. HLL data type
The hll type stores a HyperLogLog (HLL) structure with a fixed size of 1280 bytes. The number of distinct values can be computed directly from it, with a maximum error rate of about 2.3%. HyperLogLog is a fixed-size, set-like structure for counting distinct values with adjustable precision.
Its purpose is to count the number of unique elements in a collection.
--First, let's see how to use the hll type
After creating the table, add an empty hll set to the hll column; hll_empty() is the function that creates an empty set:
Then insert data into the set with UPDATE. Here three values are added:
Finally, check the cardinality of the hll data. The result is 2, which means the set contains two unique values:
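The three steps above can be sketched as follows. The table and column names are hypothetical, and the third value deliberately repeats the second so that only two distinct values end up in the set.

```sql
CREATE TABLE test_hll (id integer, c_set hll);

-- Start with an empty set
INSERT INTO test_hll VALUES (1, hll_empty());

-- Add three hashed values, one of which is a duplicate
UPDATE test_hll SET c_set = hll_add(c_set, hll_hash_integer(12345)) WHERE id = 1;
UPDATE test_hll SET c_set = hll_add(c_set, hll_hash_text('hello world')) WHERE id = 1;
UPDATE test_hll SET c_set = hll_add(c_set, hll_hash_text('hello world')) WHERE id = 1;

-- Estimated number of distinct values in the set
SELECT hll_cardinality(c_set) FROM test_hll WHERE id = 1;
```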
The following functions hash an integer, a text value, or a value of any type:
hll_hash_integer()
hll_hash_text()
hll_hash_any()
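Each hash function can be called directly to see the hash value it produces. The input values are illustrative, and the exact output depends on the database version, so none is shown here.

```sql
SELECT hll_hash_integer(12345);
SELECT hll_hash_text('hello world');

-- hll_hash_any accepts a value of any type, here a macaddr
SELECT hll_hash_any('08:00:2b:01:02:03'::macaddr);
```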
A practical example is counting the number of visitors to a website:
--First create a visit fact table that records that a user visited the website at a certain time
--Add data to the table in batches
--Then create a daily user access table whose column is of the hll type
--Group the data by date and store it in the hll table
--Calculate the number of distinct users who visit the website every day
--Query how many distinct users visited the website each month
With hll, these distinct value (DV) counts can be calculated very quickly.
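The visitor-counting example above might be sketched as follows. Table names, column names, dates, and user id ranges are all hypothetical; the aggregate functions hll_add_agg and hll_union_agg are assumed to be available as in openGauss.

```sql
-- Fact table: one row per (visit date, user)
CREATE TABLE facts (visit_date date, user_id integer);

-- Batch-insert sample visits; the user id ranges overlap between days
INSERT INTO facts SELECT DATE '2019-02-20', generate_series(1, 100);
INSERT INTO facts SELECT DATE '2019-02-21', generate_series(1, 200);
INSERT INTO facts SELECT DATE '2019-03-01', generate_series(150, 300);

-- Daily table: one hll set of users per day
CREATE TABLE daily_uniques (visit_date date, users hll);

-- Aggregate the hashed user ids of each day into one hll value
INSERT INTO daily_uniques (visit_date, users)
SELECT visit_date, hll_add_agg(hll_hash_integer(user_id))
FROM facts
GROUP BY visit_date;

-- Distinct users per day
SELECT visit_date, hll_cardinality(users)
FROM daily_uniques
ORDER BY visit_date;

-- Distinct users per month: union the daily sets, then take the cardinality
SELECT EXTRACT(MONTH FROM visit_date) AS month,
       hll_cardinality(hll_union_agg(users))
FROM daily_uniques
GROUP BY 1
ORDER BY 1;
```

Because the daily hll values are already aggregated, the monthly query only has to union a handful of fixed-size structures instead of rescanning the fact table.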
m. Enumerated type
--Create an enumeration type
--Create a table that uses the newly created enumeration type gender
--Add data to the table; if the added value is not in the enumeration type, an error is reported
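A minimal sketch of the gender enumeration described above. The enumeration labels, table name, and sample rows are illustrative.

```sql
CREATE TYPE gender AS ENUM ('male', 'female');

CREATE TABLE test_enum (
    name  varchar(20),
    sex   gender
);

-- Accepted: 'male' is one of the declared labels
INSERT INTO test_enum VALUES ('xiaoming', 'male');

-- Expected to fail: 'unknown' is not a label of the gender type
INSERT INTO test_enum VALUES ('xiaohong', 'unknown');
```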
The above is the introduction of commonly used data types. In the next issue, we will talk about the frequently used function operations of various data types.