Doris partition table usage

1. Dynamic partitioning

Dynamic partitioning is a feature introduced in Doris version 0.12. It provides lifecycle management (TTL) for table partitions and reduces the user's maintenance burden.

Currently, it supports dynamically adding partitions and dynamically deleting partitions.

Dynamic partitioning only supports Range partitioning.

Note: This feature is disabled while the table is being synchronized by CCR (Cross-Cluster Replication). If the table was replicated by CCR, that is, its PROPERTIES contain is_being_synced = true, SHOW CREATE TABLE still shows dynamic partitioning as enabled, but it does not actually take effect. Once is_being_synced is set to false, the feature takes effect again. However, the is_being_synced property is intended only for CCR peripheral modules and must not be set manually during CCR synchronization.

2. Principle

In some scenarios, users partition a table by day and run routine jobs every day. Without dynamic partitioning, users must manage the partitions manually; otherwise, data imports may fail because the required partition has not been created, which adds maintenance cost.

With dynamic partitioning, users can define partitioning rules when creating a table. The FE starts a background thread that creates or deletes partitions according to these rules. Users can also change the rules of an existing table at runtime.

Usage
Dynamic partitioning rules can be specified when creating a table or modified at runtime. Currently, dynamic partitioning rules are only supported on partitioned tables with a single partition column.

Specify the rules when creating the table:

CREATE TABLE tbl1
(...)
PROPERTIES
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

Modify the rules at runtime:

ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

3. Dynamic partitioning rule parameters

The rule parameters of dynamic partitioning are all prefixed with dynamic_partition.:

dynamic_partition.enable

Whether to enable dynamic partitioning. Can be TRUE or FALSE; defaults to TRUE if not specified. If FALSE, Doris ignores the table's dynamic partitioning rules.

dynamic_partition.time_unit

The time unit of dynamic partition scheduling: HOUR, DAY, WEEK, MONTH, or YEAR, meaning that partitions are created or deleted by hour, day, week, month, or year, respectively.

When HOUR is specified, the suffix format of the dynamically created partition name is yyyyMMddHH, for example, 2020032501. With hourly partitioning, the partition column cannot be of type DATE.

When DAY is specified, the suffix format of the dynamically created partition name is yyyyMMdd, for example, 20200325.

When WEEK is specified, the suffix format of the dynamically created partition name is yyyy_ww, i.e., the year and the week of the year that the date falls in. For example, the partition name suffix created on 2020-03-25 is 2020_13, indicating the 13th week of 2020.

When MONTH is specified, the suffix format of the dynamically created partition name is yyyyMM, for example, 202003.

When YEAR is specified, the suffix format of the dynamically created partition name is yyyy, for example, 2020.
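The suffix formats above can be expressed as a short Python sketch. It is illustrative only: partition_suffix and week_of_year are hypothetical helper names, and the WEEK rule (Monday-based weeks, with week 1 being the week that contains January 1) is an assumption reconstructed from the examples in this article (2020-03-25 gives 2020_13), not taken from the Doris source.

```python
from datetime import date, datetime


def week_of_year(d: date) -> int:
    # Assumed rule (reconstructed from this article's examples, not Doris source):
    # weeks start on Monday and week 1 is the week that contains January 1.
    jan1 = date(d.year, 1, 1)
    days_since_jan1 = d.toordinal() - jan1.toordinal()
    return (days_since_jan1 + jan1.weekday()) // 7 + 1


def partition_suffix(t: datetime, time_unit: str) -> str:
    """Return the dynamic partition name suffix for time t and a time_unit."""
    unit = time_unit.upper()
    if unit == "HOUR":
        return t.strftime("%Y%m%d%H")
    if unit == "DAY":
        return t.strftime("%Y%m%d")
    if unit == "WEEK":
        return f"{t.year}_{week_of_year(t.date()):02d}"
    if unit == "MONTH":
        return t.strftime("%Y%m")
    if unit == "YEAR":
        return t.strftime("%Y")
    raise ValueError(f"unsupported time_unit: {time_unit}")


# Partition names for 2020-03-25 01:00, assuming dynamic_partition.prefix = "p".
t = datetime(2020, 3, 25, 1)
for unit in ["HOUR", "DAY", "WEEK", "MONTH", "YEAR"]:
    print(unit, "p" + partition_suffix(t, unit))
```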

dynamic_partition.time_zone

The time zone used by dynamic partitioning. If not specified, it defaults to the time zone of the current machine, such as Asia/Shanghai. For the list of supported time zones, see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.

dynamic_partition.start

The start offset of dynamic partitioning, a negative number. Relative to the current day (week/month, depending on time_unit), partitions whose range lies before this offset are deleted. If not specified, it defaults to -2147483648, which means historical partitions are never deleted.

dynamic_partition.end

The end offset of dynamic partitioning, a positive number. Relative to the current day (week/month, depending on time_unit), partitions up to this offset are created in advance.

dynamic_partition.prefix

Dynamically created partition name prefix.

dynamic_partition.buckets

The number of buckets corresponding to the dynamically created partition.

dynamic_partition.replication_num

The number of replicas for dynamically created partitions. If not specified, it defaults to the number of replicas specified when the table was created.

dynamic_partition.start_day_of_week

When time_unit is WEEK, this parameter specifies the first day of each week. Valid values are 1 to 7, where 1 means Monday and 7 means Sunday. The default is 1, meaning that each week starts on Monday.

dynamic_partition.start_day_of_month

When time_unit is MONTH, this parameter specifies the first day of each month. Valid values are 1 to 28, where 1 means the 1st and 28 means the 28th of each month. The default is 1, meaning that each month starts on the 1st. The 29th, 30th, and 31st are not currently supported as start days, to avoid ambiguity caused by leap years or leap months.

dynamic_partition.create_history_partition

Defaults to false. When set to true, Doris automatically creates all historical partitions; the specific rules are described below. The FE parameter max_dynamic_partition_num limits the total number of partitions to avoid creating too many at once: if the number of partitions expected to be created exceeds max_dynamic_partition_num, the operation is rejected.

This parameter does not take effect when the start attribute is not specified.

dynamic_partition.history_partition_num

When create_history_partition is true, this parameter specifies the number of historical partitions to create. The default value is -1, meaning it is not set.

dynamic_partition.hot_partition_num

Specifies how many of the latest partitions are hot partitions. For hot partitions, the system automatically sets storage_medium to SSD and sets storage_cooldown_time.

Note: If there is no SSD disk path under the storage path, configuring this parameter will cause the dynamic partition creation to fail.

That is, hot_partition_num covers the previous n partitions, counting the current one, plus all future partitions.

For example, assume that today is 2021-05-20, partitioning is by day, and the dynamic partition attributes are set to: hot_partition_num=2, end=3, start=-3. The system automatically creates the following partitions and sets their storage_medium and storage_cooldown_time parameters:

p20210517:["2021-05-17", "2021-05-18") storage_medium=HDD storage_cooldown_time=9999-12-31 23:59:59
p20210518:["2021-05-18", "2021-05-19") storage_medium=HDD storage_cooldown_time=9999-12-31 23:59:59
p20210519:["2021-05-19", "2021-05-20") storage_medium=SSD storage_cooldown_time=2021-05-21 00:00:00
p20210520:["2021-05-20", "2021-05-21") storage_medium=SSD storage_cooldown_time=2021-05-22 00:00:00
p20210521:["2021-05-21", "2021-05-22") storage_medium=SSD storage_cooldown_time=2021-05-23 00:00:00
p20210522:["2021-05-22", "2021-05-23") storage_medium=SSD storage_cooldown_time=2021-05-24 00:00:00
p20210523:["2021-05-23", "2021-05-24") storage_medium=SSD storage_cooldown_time=2021-05-25 00:00:00
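To make the hot partition example easier to check, here is a Python sketch that reproduces the listing above. The hot/cold rule (a partition at offset i relative to today is hot when i > -hot_partition_num) and the cooldown rule (lower bound plus hot_partition_num days) are assumptions inferred from this example, not from Doris internals; daily_partitions_with_medium is a hypothetical name.

```python
from datetime import date, datetime, time, timedelta

# Cooldown time used for partitions that stay on HDD.
FAR_FUTURE = datetime(9999, 12, 31, 23, 59, 59)


def daily_partitions_with_medium(today, start, end, hot_partition_num):
    """Return (name, lower, upper, storage_medium, cooldown) for DAY partitions.

    Assumed rules, inferred from the example above:
    - offset i (0 = today) is hot when i > -hot_partition_num
    - a hot partition cools down hot_partition_num days after its lower bound
    """
    result = []
    for offset in range(start, end + 1):
        lower = today + timedelta(days=offset)
        upper = lower + timedelta(days=1)
        if offset > -hot_partition_num:
            medium = "SSD"
            cooldown = datetime.combine(lower + timedelta(days=hot_partition_num), time())
        else:
            medium = "HDD"
            cooldown = FAR_FUTURE
        result.append(("p" + lower.strftime("%Y%m%d"), lower, upper, medium, cooldown))
    return result


for name, lower, upper, medium, cooldown in daily_partitions_with_medium(
        date(2021, 5, 20), start=-3, end=3, hot_partition_num=2):
    print(f'{name}:["{lower}", "{upper}") storage_medium={medium} '
          f"storage_cooldown_time={cooldown}")
```

Run as-is, it prints the same seven lines as the listing above.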

dynamic_partition.reserved_history_periods

The time ranges of historical partitions to preserve. When dynamic_partition.time_unit is DAY, WEEK, MONTH or YEAR, use the format [yyyy-MM-dd,yyyy-MM-dd],[...,...]. When it is HOUR, use the format [yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]. If not set, it defaults to "NULL".

For example, assume that today is 2021-09-06, partitioning is by day, and the dynamic partition attributes are set to:

time_unit="DAY/WEEK/MONTH/YEAR", end=3, start=-3, reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]".

The system will automatically retain the partitions within these two periods:

["2020-06-01","2020-06-20"],
["2020-10-31","2020-11-15"]

or

time_unit="HOUR", end=3, start=-3, reserved_history_periods="[2020-06-01 00:00:00,2020-06-01 03:00:00]".

The system will automatically retain the partitions within this period:

["2020-06-01 00:00:00","2020-06-01 03:00:00"]

Each [...,...] in reserved_history_periods is a pair of settings: both values must be set, and the first time cannot be later than the second.
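A minimal Python sketch of how a reserved_history_periods value can be parsed and checked against the rules above (each pair must be complete, and its first time must not be later than the second). The function names are hypothetical, and treating a time as reserved when it falls inside a closed period is an illustrative assumption; Doris's exact boundary handling is not described here.

```python
import re
from datetime import datetime

# One "[begin,end]" pair; the whole value is a comma-separated list of pairs.
_PAIR = r"\[([^,\]]+),([^,\]]+)\]"
_VALUE = re.compile(_PAIR + "(?:," + _PAIR + ")*")


def parse_reserved_periods(value, time_unit):
    """Parse reserved_history_periods into a list of (begin, end) datetimes."""
    if value.strip().upper() == "NULL":
        return []  # the documented default: nothing reserved
    if not _VALUE.fullmatch(value.strip()):
        raise ValueError(f"malformed reserved_history_periods: {value!r}")
    fmt = "%Y-%m-%d %H:%M:%S" if time_unit.upper() == "HOUR" else "%Y-%m-%d"
    periods = []
    for begin_s, end_s in re.findall(_PAIR, value):
        begin = datetime.strptime(begin_s.strip(), fmt)
        end = datetime.strptime(end_s.strip(), fmt)
        if begin > end:
            raise ValueError(f"first time is later than second: [{begin_s},{end_s}]")
        periods.append((begin, end))
    return periods


def in_reserved_period(t, periods):
    # Assumption: t is reserved if it lies inside any closed [begin, end] period.
    return any(begin <= t <= end for begin, end in periods)


day_periods = parse_reserved_periods(
    "[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]", "DAY")
hour_periods = parse_reserved_periods(
    "[2020-06-01 00:00:00,2020-06-01 03:00:00]", "HOUR")
print(day_periods)
print(in_reserved_period(datetime(2020, 6, 10), day_periods))
```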

dynamic_partition.storage_medium

Available since version 1.2.3. Specifies the default storage medium for dynamically created partitions. Defaults to HDD; SSD is also supported.

Note that when set to SSD, the hot_partition_num attribute no longer takes effect: all partitions default to SSD and their cooldown time is 9999-12-31 23:59:59.

Rules for creating historical partitions
When create_history_partition is true, that is, when the function of creating historical partitions is enabled, Doris will determine the number of historical partitions to create based on dynamic_partition.start and dynamic_partition.history_partition_num.

Let expect_create_partition_num be the number of partitions expected to be created. Depending on the settings, it is computed as follows:

create_history_partition = true
  dynamic_partition.history_partition_num is not set (i.e., -1): expect_create_partition_num = end - start;
  dynamic_partition.history_partition_num is set: expect_create_partition_num = end - max(start, -history_partition_num);
create_history_partition = false: no historical partitions are created, and expect_create_partition_num = end - 0;
When expect_create_partition_num is greater than max_dynamic_partition_num (default 500), the operation is rejected to prevent creating too many partitions.

For example:

1. Assume that today is 2021-05-20, partitioning is by day, and the dynamic partition attributes are set to: create_history_partition=true, end=3, start=-3, history_partition_num=1. The system automatically creates the following partitions:

p20210519
p20210520
p20210521
p20210522
p20210523

2. With history_partition_num=5 and the other attributes the same as in 1, the system automatically creates the following partitions:

p20210517
p20210518
p20210519
p20210520
p20210521
p20210522
p20210523

3. With history_partition_num=-1 (the number of historical partitions is not set) and the other attributes the same as in 1, the system automatically creates the following partitions:

p20210517
p20210518
p20210519
p20210520
p20210521
p20210522
p20210523
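The formula and the three examples can be cross-checked with the following Python sketch. It assumes DAY partitions, the prefix p, and an explicitly set start; the function names are hypothetical. Note that the listed partitions cover offsets from the effective start through end inclusive, so each list has one more entry than the formula value (for example, 5 partitions for expect_create_partition_num = 4 in example 1).

```python
from datetime import date, timedelta

# FE configuration max_dynamic_partition_num (default 500 per the rules above).
MAX_DYNAMIC_PARTITION_NUM = 500


def effective_start_offset(start, create_history_partition, history_partition_num=-1):
    """Offset of the oldest partition to create (0 = the current partition)."""
    if not create_history_partition:
        return 0
    if history_partition_num == -1:
        return start
    return max(start, -history_partition_num)


def expect_create_partition_num(start, end, create_history_partition, history_partition_num=-1):
    # end - start, end - max(start, -history_partition_num), or end - 0.
    return end - effective_start_offset(start, create_history_partition, history_partition_num)


def created_day_partitions(today, start, end, create_history_partition, history_partition_num=-1):
    """Names of the DAY partitions created for the given settings."""
    n = expect_create_partition_num(start, end, create_history_partition, history_partition_num)
    if n > MAX_DYNAMIC_PARTITION_NUM:
        raise ValueError(f"expect_create_partition_num {n} exceeds {MAX_DYNAMIC_PARTITION_NUM}")
    first = effective_start_offset(start, create_history_partition, history_partition_num)
    # Offsets run from the effective start through end, both inclusive.
    return ["p" + (today + timedelta(days=i)).strftime("%Y%m%d") for i in range(first, end + 1)]


today = date(2021, 5, 20)
print(created_day_partitions(today, -3, 3, True, 1))   # example 1
print(created_day_partitions(today, -3, 3, True, 5))   # example 2
print(created_day_partitions(today, -3, 3, True, -1))  # example 3
```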

Notes
While dynamic partitioning is in use, if some partitions between dynamic_partition.start and dynamic_partition.end are lost for unexpected reasons, the lost partitions between the current time and dynamic_partition.end will be recreated, but the lost partitions between dynamic_partition.start and the current time will not.

Example
1. The type of partition column k1 of table tbl1 is DATE, and a dynamic partition rule is created. Partition by day, only keep the partitions for the last 7 days, and create partitions for the next 3 days in advance.

CREATE TABLE tbl1
(
    k1 DATE,
    ...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);

Assume that the current date is 2020-05-29. According to the above rules, tbl1 will generate the following partitions:

p20200529: ["2020-05-29", "2020-05-30")
p20200530: ["2020-05-30", "2020-05-31")
p20200531: ["2020-05-31", "2020-06-01")
p20200601: ["2020-06-01", "2020-06-02")

On the next day, 2020-05-30, a new partition p20200602: ["2020-06-02", "2020-06-03") is created.

On 2020-06-06, because dynamic_partition.start is set to -7, partitions older than 7 days are deleted, that is, partition p20200529 is deleted.
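The daily example can be simulated with the Python sketch below. It assumes create_history_partition is false (the default), so creation starts at today's partition, and it treats a partition as expired once its upper bound is at or before today + start days; the helper names are hypothetical.

```python
from datetime import date, timedelta


def day_partition(d):
    """(name, lower, upper) of the DAY partition containing date d, with prefix "p"."""
    return "p" + d.strftime("%Y%m%d"), d, d + timedelta(days=1)


def partitions_to_create(today, end):
    # create_history_partition=false: offsets 0..end inclusive.
    return [day_partition(today + timedelta(days=i)) for i in range(end + 1)]


def partitions_to_drop(existing, today, start):
    # A partition is dropped once its whole range lies before today + start days.
    boundary = today + timedelta(days=start)
    return [name for name, lower, upper in existing if upper <= boundary]


for name, lower, upper in partitions_to_create(date(2020, 5, 29), end=3):
    print(f'{name}: ["{lower}", "{upper}")')
```

For 2020-05-29 this prints the four partitions listed above; calling partitions_to_drop on 2020-06-06 with start=-7 selects only p20200529.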

2. The type of partition column k1 of table tbl1 is DATETIME, and a dynamic partitioning rule is created. Partition by week, keep only the partitions for the last 2 weeks, and create partitions for the next 2 weeks in advance.

CREATE TABLE tbl1
(
    k1 DATETIME,
    ...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "WEEK",
    "dynamic_partition.start" = "-2",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8"
);

Assume that the current date is 2020-05-29, which is the 22nd week of 2020. The default start of each week is Monday. Based on the above rules, tbl1 will generate the following partitions:

p2020_22: ["2020-05-25 00:00:00", "2020-06-01 00:00:00")
p2020_23: ["2020-06-01 00:00:00", "2020-06-08 00:00:00")
p2020_24: ["2020-06-08 00:00:00", "2020-06-15 00:00:00")

The start date of each partition is the Monday of its week. Because the partition column k1 is of type DATETIME, the partition boundaries include hour, minute and second parts, all set to 0.

On 2020-06-15, which is the 25th week, the partition 2 weeks ago will be deleted, that is, p2020_22 will be deleted.

In the above example, assume that the user specifies the starting day of the week as "dynamic_partition.start_day_of_week" = "3", that is, every Wednesday is the starting day. The partitions are as follows:

p2020_22: ["2020-05-27 00:00:00", "2020-06-03 00:00:00")
p2020_23: ["2020-06-03 00:00:00", "2020-06-10 00:00:00")
p2020_24: ["2020-06-10 00:00:00", "2020-06-17 00:00:00")

That is, the partition range is from Wednesday of the current week to Tuesday of the next week.

Note: 2019-12-31 and 2020-01-01 fall in the same week. If a partition starts on 2019-12-31, its name is p2019_53; if it starts on 2020-01-01, its name is p2020_01.

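The weekly ranges and names above can be reproduced with the following Python sketch. The week-numbering rule (Monday-based weeks, week 1 containing January 1, name taken from the partition's start date) is an assumption reconstructed from the examples in this article rather than from the Doris source; week_partitions is a hypothetical helper, and the printed 00:00:00 mimics the DATETIME boundaries.

```python
from datetime import date, timedelta


def week_of_year(d):
    # Assumed rule (reconstructed from this article's examples, not Doris source):
    # weeks start on Monday and week 1 is the week that contains January 1.
    jan1 = date(d.year, 1, 1)
    return (d.toordinal() - jan1.toordinal() + jan1.weekday()) // 7 + 1


def week_partitions(today, start_day_of_week, end):
    """WEEK partitions from the current week through `end` weeks ahead.

    start_day_of_week uses the Doris convention: 1 = Monday ... 7 = Sunday.
    """
    days_back = (today.weekday() - (start_day_of_week - 1)) % 7
    current_start = today - timedelta(days=days_back)
    result = []
    for i in range(end + 1):
        lower = current_start + timedelta(weeks=i)
        upper = lower + timedelta(weeks=1)
        # The name is derived from the partition's start date.
        result.append((f"p{lower.year}_{week_of_year(lower):02d}", lower, upper))
    return result


for start_day in (1, 3):  # Monday, then Wednesday
    for name, lower, upper in week_partitions(date(2020, 5, 29), start_day, end=2):
        print(f'{name}: ["{lower} 00:00:00", "{upper} 00:00:00")')
```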
3. The type of partition column k1 of table tbl1 is DATE, and a dynamic partitioning rule is created. Partition by month, historical partitions are not deleted, and partitions for the next 2 months are created in advance. At the same time, set the 3rd of each month as the starting date.

CREATE TABLE tbl1
(
    k1 DATE,
    ...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "MONTH",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8",
    "dynamic_partition.start_day_of_month" = "3"
);

Assume that the current date is 2020-05-29. Based on the above rules, tbl1 will generate the following partitions:

p202005: ["2020-05-03", "2020-06-03")
p202006: ["2020-06-03", "2020-07-03")
p202007: ["2020-07-03", "2020-08-03")

Because dynamic_partition.start is not set, historical partitions will not be deleted.

Assuming that today is 2020-05-20, and the 28th of each month is set as the starting day, the partition range is:

p202004: ["2020-04-28", "2020-05-28")
p202005: ["2020-05-28", "2020-06-28")
p202006: ["2020-06-28", "2020-07-28")
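A Python sketch of the monthly rule, reproducing both examples above. It assumes the partition containing today starts in the current month when today's day is on or after start_day_of_month and in the previous month otherwise, and that the name comes from the partition's start month; month_partitions and add_months are hypothetical helpers.

```python
from datetime import date


def add_months(year, month, k):
    """Shift (year, month) by k months."""
    total = year * 12 + (month - 1) + k
    return total // 12, total % 12 + 1


def month_partitions(today, start_day_of_month, end):
    """MONTH partitions from the current one through `end` months ahead."""
    if not 1 <= start_day_of_month <= 28:
        raise ValueError("start_day_of_month must be between 1 and 28")
    # The current partition started this month if today is on/after the start day,
    # otherwise it started in the previous month.
    shift = 0 if today.day >= start_day_of_month else -1
    y, m = add_months(today.year, today.month, shift)
    result = []
    for i in range(end + 1):
        ly, lm = add_months(y, m, i)
        uy, um = add_months(y, m, i + 1)
        lower = date(ly, lm, start_day_of_month)
        upper = date(uy, um, start_day_of_month)
        result.append((f"p{ly}{lm:02d}", lower, upper))
    return result


for name, lower, upper in month_partitions(date(2020, 5, 20), 28, end=2):
    print(f'{name}: ["{lower}", "{upper}")')
```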

Modify dynamic partition attributes
You can modify the attributes of dynamic partitions through the following command:

ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    ...
);

Modification of some properties may cause conflicts. Assume that the previous partition granularity was DAY and the following partitions have been created:

p20200519: ["2020-05-19", "2020-05-20")
p20200520: ["2020-05-20", "2020-05-21")
p20200521: ["2020-05-21", "2020-05-22")

If the partition granularity is then changed to MONTH, the system tries to create a partition with the range ["2020-05-01", "2020-06-01"). This range conflicts with the existing partitions, so the partition cannot be created, while the partition with the range ["2020-06-01", "2020-07-01") can be created normally. Therefore, the partitions for 2020-05-22 through 2020-05-31 must be created manually.
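The conflict comes down to an overlap test on half-open ranges. A minimal Python sketch, using the three daily partitions above as the existing partitions (can_create is a hypothetical helper, not a Doris API):

```python
from datetime import date


def overlaps(a, b):
    """True if half-open ranges a = [a0, a1) and b = [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


# The daily partitions created before the switch to MONTH.
existing = [
    (date(2020, 5, 19), date(2020, 5, 20)),
    (date(2020, 5, 20), date(2020, 5, 21)),
    (date(2020, 5, 21), date(2020, 5, 22)),
]


def can_create(new_range):
    return not any(overlaps(new_range, r) for r in existing)


print(can_create((date(2020, 5, 1), date(2020, 6, 1))))  # False: overlaps the daily partitions
print(can_create((date(2020, 6, 1), date(2020, 7, 1))))  # True
```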

View the scheduling status of dynamic partition tables
You can further check the scheduling status of all dynamic partition tables in the current database through the following command:

mysql> SHOW DYNAMIC PARTITION TABLES;
+-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
| TableName | Enable | TimeUnit | Start       | End  | Prefix | Buckets | StartOf   | LastUpdateTime | LastSchedulerTime   | State  | LastCreatePartitionMsg | LastDropPartitionMsg | ReservedHistoryPeriods  |
+-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
| d3        | true   | WEEK     | -3          | 3    | p      | 1       | MONDAY    | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | [2021-12-01,2021-12-31] |
| d5        | true   | DAY      | -7          | 3    | p      | 32      | N/A       | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | NULL                    |
| d4        | true   | WEEK     | -3          | 3    | p      | 1       | WEDNESDAY | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | NULL                    | 
| d6        | true   | MONTH    | -2147483648 | 2    | p      | 8       | 3rd       | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | NULL                    |
| d2        | true   | DAY      | -3          | 3    | p      | 32      | N/A       | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | NULL                    |
| d7        | true   | MONTH    | -2147483648 | 5    | p      | 8       | 24th      | N/A            | 2020-05-25 14:29:24 | NORMAL | N/A                    | N/A                  | NULL                    |
+-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
7 rows in set (0.02 sec)

LastUpdateTime: The last time the dynamic partition attributes were modified
LastSchedulerTime: The last time dynamic partition scheduling was executed
State: The status of the last dynamic partition scheduling run
LastCreatePartitionMsg: The error message from the last scheduled partition creation
LastDropPartitionMsg: The error message from the last scheduled partition deletion

4. Advanced operations

FE configuration items
dynamic_partition_enable

Whether to enable dynamic partitioning in Doris. The default is false (disabled). This parameter only affects partition operations on dynamic partition tables, not ordinary tables. It can be changed in fe.conf, taking effect after the FE restarts, or changed at runtime with either of the following commands:

MySQL protocol:

ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true");

HTTP protocol:

curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true

To turn off dynamic partitioning globally, set this parameter to false.

dynamic_partition_check_interval_seconds

The interval, in seconds, at which the dynamic partition thread runs. The default is 600 (10 minutes), i.e., scheduling runs every 10 minutes. It can be changed in fe.conf, taking effect after the FE restarts, or changed at runtime with either of the following commands:

MySQL protocol:

ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200");

HTTP protocol:

curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000

Conversion between dynamic partition table and manual partition table
For a table, dynamic and manual partitioning can be freely converted into each other, but they cannot coexist: a table is in exactly one of the two states.

Convert manual partitioning to dynamic partitioning
If a table was created without dynamic partitioning, you can convert it to dynamic partitioning at runtime by setting the dynamic partitioning properties with ALTER TABLE. Run HELP ALTER TABLE for examples.

After turning on the dynamic partition function, Doris will no longer allow users to manually manage partitions and will automatically manage partitions based on dynamic partition attributes.

Note: If dynamic_partition.start is set, historical partitions whose partition range is before the dynamic partition start offset will be deleted.

Convert dynamic partition to manual partition
You can disable dynamic partitioning and convert the table to manual partitioning by executing ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false").

After turning off the dynamic partitioning function, Doris will no longer automatically manage partitions, and users need to manually create or delete partitions through ALTER TABLE.

5. Frequently Asked Questions

When creating a dynamic partition table, the error Could not create table with dynamic partition when fe config dynamic_partition_enable is false is reported

This is because the global switch for dynamic partitioning, the FE configuration dynamic_partition_enable, is false, so dynamic partition tables cannot be created.

In this case, modify the FE configuration file by adding the line dynamic_partition_enable=true and restart the FE, or execute ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true"); to turn on the switch at runtime.

About replica settings for dynamic partitions

Dynamic partitions are created automatically by the system's scheduling logic. When a partition is created automatically, its properties (such as the number of replicas) are taken only from the properties prefixed with dynamic_partition, not from the table's default properties. For example:

CREATE TABLE tbl1 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)()
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32",
"dynamic_partition.replication_num" = "1",
"dynamic_partition.start" = "-3",
"replication_num" = "3"
);

In this example, no initial partition is created (the partition definition in the PARTITION BY clause is empty), and the following are set: DISTRIBUTED BY HASH(k1) BUCKETS 3, "replication_num" = "3", "dynamic_partition.replication_num" = "1", and "dynamic_partition.buckets" = "32".

The first two are the table's default parameters, and the last two are parameters specific to dynamic partitions.

When the system creates a partition automatically, it uses 32 buckets and 1 replica (the dynamic partition-specific parameters) instead of 3 buckets and 3 replicas.

When the user manually adds a partition with ALTER TABLE tbl1 ADD PARTITION, it uses 3 buckets and 3 replicas (the table's default parameters).

That is, dynamic partitioning uses its own, independent set of parameters. The table's default parameters are used only when the corresponding dynamic partition-specific parameters are not set, as in the following example:

CREATE TABLE tbl2 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)()
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.start" = "-3",
"dynamic_partition.buckets" = "32",
"replication_num" = "3"
);

In this example, if dynamic_partition.replication_num is not specified separately, dynamic partitioning will use the default parameters of the table, that is, "replication_num" = "3".

Consider also the following example:

CREATE TABLE tbl3 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)(
    PARTITION p1 VALUES LESS THAN ("2019-10-10")
)
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.start" = "-3",
"dynamic_partition.buckets" = "32",
"dynamic_partition.replication_num" = "1",
"replication_num" = "3"
);

In this example, there is a manually created partition p1. This partition will use the table's default settings of 3 buckets and 3 replicas. Subsequent dynamic partitions automatically created by the system will still use the special parameters for dynamic partitions, that is, the number of buckets is 32 and the number of replicas is 1.
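The precedence described above can be summarized in a small Python sketch. resolve_partition_settings is a hypothetical function, not part of Doris; it models only buckets and replicas, and it assumes replication_num is always present in the table properties, as in the three examples.

```python
def resolve_partition_settings(table_props, table_buckets, dynamic):
    """Return (buckets, replication_num) for a newly created partition.

    dynamic=True models a partition created by the dynamic partition scheduler;
    dynamic=False models ALTER TABLE ... ADD PARTITION without explicit settings.
    """
    table_replicas = int(table_props["replication_num"])
    if not dynamic:
        return table_buckets, table_replicas
    # Dynamic partition-specific values win; table defaults are the fallback.
    buckets = int(table_props.get("dynamic_partition.buckets", table_buckets))
    replicas = int(table_props.get("dynamic_partition.replication_num", table_replicas))
    return buckets, replicas


tbl1 = {"dynamic_partition.buckets": "32",
        "dynamic_partition.replication_num": "1",
        "replication_num": "3"}
tbl2 = {"dynamic_partition.buckets": "32", "replication_num": "3"}

print(resolve_partition_settings(tbl1, 3, dynamic=True))   # tbl1, scheduler-created
print(resolve_partition_settings(tbl1, 3, dynamic=False))  # tbl1, manual ADD PARTITION
print(resolve_partition_settings(tbl2, 3, dynamic=True))   # tbl2, scheduler-created
```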

6. More help

For more detailed syntax and best practices for dynamic partitioning, refer to the SHOW DYNAMIC PARTITION command manual. You can also run HELP ALTER TABLE in the MySQL client for more help.

7. Temporary partition

Doris supports temporary partitions since version 0.12.

A temporary partition belongs to a partitioned table; temporary partitions can be created only for partitioned tables.

8. Temporary partition rules

The partition column of the temporary partition is the same as the formal partition and cannot be modified.
The partition ranges between all temporary partitions of a table cannot overlap, but the range of temporary partitions and the range of formal partitions can overlap.
The name of a temporary partition cannot be the same as that of any formal partition or any other temporary partition.
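The naming and overlap rules above can be expressed as a validation function. This Python sketch is hypothetical (Doris performs these checks internally in the FE); it uses integer half-open ranges for brevity, does not model the partition-column rule, and deliberately allows overlap with formal partitions, as the second rule permits.

```python
def validate_temp_partition(new_name, new_range, formal, temporary):
    """Check a new Range temporary partition against the rules above.

    formal and temporary map partition name -> (lower, upper) half-open ranges.
    Returns a list of rule violations (empty if the partition may be added).
    """
    errors = []
    if new_name in formal or new_name in temporary:
        errors.append(f"name {new_name} already used")
    for name, (lower, upper) in temporary.items():
        if new_range[0] < upper and lower < new_range[1]:
            errors.append(f"range overlaps temporary partition {name}")
    # Overlap with formal partitions is allowed, so they are not range-checked.
    return errors


formal = {"p1": (10, 20), "p2": (20, 30)}
temporary = {"tp1": (10, 30)}
print(validate_temp_partition("tp2", (30, 40), formal, temporary))
print(validate_temp_partition("tp2", (15, 25), formal, temporary))
```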

9. Operations supported by temporary partition

Temporary partition supports add, delete, and replace operations.

10. Add temporary partition

Temporary partitions can be added to a table with the ALTER TABLE ADD TEMPORARY PARTITION statement:

ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01");

ALTER TABLE tbl2 ADD TEMPORARY PARTITION tp1 VALUES [("2020-01-01"), ("2020-02-01"));

ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01")
("replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;

ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai");

ALTER TABLE tbl4 ADD TEMPORARY PARTITION tp1 VALUES IN ((1, "Beijing"), (1, "Shanghai"));

ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai")
("replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;

Run HELP ALTER TABLE; for more help and examples.

Notes on adding temporary partitions:

Adding a temporary partition works like adding a formal partition. The range of a temporary partition is independent of the formal partitions.
A temporary partition can specify some properties independently, such as the number of buckets, the number of replicas, and the storage medium.

Delete temporary partition

You can delete a table's temporary partition with the ALTER TABLE DROP TEMPORARY PARTITION statement:

ALTER TABLE tbl1 DROP TEMPORARY PARTITION tp1;

Run HELP ALTER TABLE; for more help and examples.

Notes on deleting temporary partitions:

Deleting the temporary partition does not affect the data of the formal partition.

Replace partition

You can replace a table's formal partitions with temporary partitions through the ALTER TABLE REPLACE PARTITION statement.

ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2, tp3);

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
PROPERTIES (
    "strict_range" = "false",
    "use_temp_partition_name" = "true"
);

Run HELP ALTER TABLE; for more help and examples.

The replacement operation has two special optional parameters:

1. strict_range

Defaults to true.

For Range partitions, when this parameter is true, the union of the ranges of all formal partitions being replaced must be exactly the same as the union of the ranges of the temporary partitions replacing them. When set to false, it is only required that the ranges of the new formal partitions do not overlap with other formal partitions after replacement.

For List partitions, this parameter is always true: the enumeration values of all formal partitions being replaced must be exactly the same as the enumeration values of the temporary partitions replacing them.

Here are some examples:

Example 1

Range of partitions p1, p2, p3 to be replaced (=> union):

[10, 20), [20, 30), [40, 50) => [10, 30), [40, 50)

Ranges of the replacement partitions tp1, tp2, tp3 (=> union):

[10, 30), [40, 45), [45, 50) => [10, 30), [40, 50)

The range unions are identical, so tp1, tp2 and tp3 can be used to replace p1, p2 and p3.

Example 2

Range (=> union) of partition p1 to be replaced:

[10, 50) => [10, 50)

Ranges of the replacement partitions tp1, tp2 (=> union):

[10, 30), [40, 50) => [10, 30), [40, 50)

The range unions are not identical. If strict_range is true, tp1 and tp2 cannot be used to replace p1. If it is false, the replacement can be performed as long as the replacement ranges [10, 30) and [40, 50) do not overlap with other formal partitions.

Example 3

Enumeration values of the partitions p1, p2 to be replaced (=> union):

(1, 2, 3), (4, 5, 6) => (1, 2, 3, 4, 5, 6)

Enumeration values of the replacement partitions tp1, tp2, tp3 (=> union):

(1, 2, 3), (4), (5, 6) => (1, 2, 3, 4, 5, 6)

The unions of enumeration values are identical, so tp1, tp2 and tp3 can be used to replace p1 and p2.

Example 4

Enumeration values of the partitions p1, p2, p3 to be replaced (=> union):

(("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2", "shanghai")), (("3", "beijing"), ("3", "shanghai")) => (("1", "beijing"), ("1", "shanghai"), ("2", "beijing"), ("2", "shanghai"), ("3", "beijing"), ("3", "shanghai"))

Enumeration values of the replacement partitions tp1, tp2 (=> union):

(("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2", "shanghai"), ("3", "beijing"), ("3", "shanghai")) => (("1", "beijing"), ("1", "shanghai"), ("2", "beijing"), ("2", "shanghai"), ("3", "beijing"), ("3", "shanghai"))

The unions of enumeration values are identical, so tp1 and tp2 can be used to replace p1, p2 and p3.
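The comparisons in these examples amount to merging each side's ranges and comparing the results, and, for List partitions, comparing sets of values. A hypothetical Python sketch of the strict checks, using integer bounds as in the examples (it does not model the non-strict overlap check):

```python
def union_of_ranges(ranges):
    """Merge half-open [lower, upper) ranges into sorted, disjoint ranges."""
    merged = []
    for lower, upper in sorted(ranges):
        if merged and lower <= merged[-1][1]:
            # Touching or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


def strict_range_ok(old_ranges, new_ranges):
    """Range partitions with strict_range=true: the unions must be identical."""
    return union_of_ranges(old_ranges) == union_of_ranges(new_ranges)


def list_values_ok(old_partitions, new_partitions):
    """List partitions: the unions of enumeration values must be identical."""
    return set().union(*old_partitions) == set().union(*new_partitions)


print(strict_range_ok([(10, 20), (20, 30), (40, 50)], [(10, 30), (40, 45), (45, 50)]))  # Example 1
print(strict_range_ok([(10, 50)], [(10, 30), (40, 50)]))                                # Example 2
print(list_values_ok([{1, 2, 3}, {4, 5, 6}], [{1, 2, 3}, {4}, {5, 6}]))                 # Example 3
```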

2. use_temp_partition_name

Defaults to false. When this parameter is false and the number of partitions being replaced equals the number of replacement partitions, the formal partitions keep their names after replacement. When it is true, the formal partitions take the names of the replacement partitions after replacement. Here are some examples:

Example 1

ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);

use_temp_partition_name defaults to false. After replacement, the partition name is still p1, but the related data and attributes are replaced with tp1.

If use_temp_partition_name is set to true, the partition is named tp1 after replacement, and the p1 partition no longer exists.

Example 2

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1);

use_temp_partition_name defaults to false, but because the number of partitions being replaced differs from the number of replacement partitions, this parameter has no effect. After replacement, the partition is named tp1, and p1 and p2 no longer exist.
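The naming rule can be summarized in a few lines of Python. names_after_replace is a hypothetical helper that mirrors the two examples; it models only the resulting formal partition names, not the data swap.

```python
def names_after_replace(old_names, temp_names, use_temp_partition_name=False):
    """Formal partition names after REPLACE PARTITION ... WITH TEMPORARY PARTITION."""
    if not use_temp_partition_name and len(old_names) == len(temp_names):
        # Names are kept; data and properties come from the temporary partitions.
        return list(old_names)
    # Otherwise the replacement partitions keep their own names.
    return list(temp_names)


print(names_after_replace(["p1"], ["tp1"]))               # Example 1, default
print(names_after_replace(["p1"], ["tp1"], True))         # Example 1, set to true
print(names_after_replace(["p1", "p2"], ["tp1"]))         # Example 2
```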

Notes on replacement:

After the partition replacement is successful, the replaced partition will be deleted and cannot be recovered.

Import and query of temporary partitions

Users can import data into temporary partitions or specify temporary partitions for query.

1. Import temporary partition

The syntax for specifying temporary partitions differs slightly between import methods. The examples below show INSERT INTO, Stream Load, Broker Load, and Routine Load in turn:

INSERT INTO tbl TEMPORARY PARTITION(tp1, tp2, ...) SELECT ....

curl --location-trusted -u root: -H "label:123" -H "temporary_partitions: tp1, tp2, ..." -T testData http://host:port/api/testDb/testTbl/_stream_load    


LOAD LABEL example_db.label1
(
DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
INTO TABLE `my_table`
TEMPORARY PARTITION (tp1, tp2, ...)
...
)
WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");


CREATE ROUTINE LOAD example_db.test1 ON example_tbl
COLUMNS(k1, k2, k3, v1, v2, v3 = k1 * 100),
TEMPORARY PARTITIONS(tp1, tp2, ...),
WHERE k1 > 100
PROPERTIES
(...)
FROM KAFKA
(...);




2. Query the temporary partition

SELECT ... FROM
tbl1 TEMPORARY PARTITION(tp1, tp2, ...)
JOIN
tbl2 TEMPORARY PARTITION(tp1, tp2, ...)
ON ...
WHERE ...;

Relationship with other operations

DROP

After directly deleting the database or table using the Drop operation, the database or table can be restored through the Recover command (within a limited time), but the temporary partition will not be restored.
After a formal partition is deleted with the Alter command, it can be restored with the Recover command (within a limited time). Operations on formal partitions do not affect temporary partitions.
After the temporary partition is deleted using the Alter command, the temporary partition cannot be restored through the Recover command.

TRUNCATE

When a table is emptied with the Truncate command, its temporary partitions are deleted and cannot be recovered.
When you use the Truncate command to clear the formal partition, the temporary partition will not be affected.
The Truncate command cannot be used to clear the temporary partition.

ALTER

When the table has temporary partitions, you cannot use the Alter command to perform Schema Change, Rollup and other modification operations on the table.
Temporary partitions cannot be added to the table while the table is undergoing modification operations.

Best Practices

1. Atomic overwrite operation

In some cases, users want to rewrite the data of a partition. If they delete the partition first and then import the data, the data cannot be queried for a period of time. Instead, users can create a corresponding temporary partition, import the new data into it, and then atomically replace the original partition with the replacement operation. For atomic overwrites of non-partitioned tables, see the Replace Table documentation.

2. Modify the number of buckets

In some cases, users use an inappropriate number of buckets when creating partitions. Then the user can first create a temporary partition corresponding to the partition range and specify the new number of buckets. Then use the INSERT INTO command to import the data of the formal partition into the temporary partition, and use the replacement operation to atomically replace the original partition to achieve the purpose.

3. Merge or split partitions

In some cases, users want to modify the range of partitions, such as merging two partitions, or dividing a large partition into multiple small partitions. Then the user can first create a temporary partition corresponding to the merged or split range, and then import the data of the formal partition into the temporary partition through the INSERT INTO command, and atomically replace the original partition through the replacement operation to achieve the purpose.


Origin blog.csdn.net/qq_44696532/article/details/134153881