Doris

Modify a table

Modify the table name

-- 1. Rename the table table1 to table2
ALTER TABLE table1 RENAME table2;

-- Example
ALTER TABLE aggregate_test RENAME aggregate_test1;

-- 2. Rename the rollup index old_rollup_name in table base_table_name to new_rollup_name
ALTER TABLE base_table_name RENAME ROLLUP old_rollup_name new_rollup_name;

ALTER TABLE ex_user RENAME ROLLUP rollup_u_cost new_rollup_u_cost;

desc ex_user all;

-- 3. Rename the partition old_partition_name in table example_table to new_partition_name
ALTER TABLE example_table RENAME PARTITION old_partition_name new_partition_name;

-- Example:
ALTER TABLE expamle_range_tbl RENAME PARTITION p201701 newp201701; 

show partitions from expamle_range_tbl \G;

Table structure changes

Users can modify the schema of an existing table through the Schema Change operation. Currently Doris supports the following modifications:
• Add and delete columns
• Modify column types
• Adjust column order
• Add and modify Bloom Filter index
• Add and delete bitmap index

Principle introduction

The basic process of a Schema Change is to generate Index data with the new schema from the original Index data. Two kinds of data conversion are required:
one is the conversion of the existing historical data;
the other is the conversion of data imported while the Schema Change is executing.

Creating a job
The creation of Schema Change is an asynchronous process. After the job is submitted successfully, the user needs to view the job progress through the SHOW ALTER TABLE COLUMN command.

-- Syntax:
ALTER TABLE [database.]table alter_clause;

The alter_clause of schema change supports the following modification methods:
1. Add a column to the specified position of the specified index

ALTER TABLE db.table_name
-- If adding a key column, the KEY keyword must follow the column type
-- If adding a value column on the Aggregate model, the column's aggregation type must be specified;
-- on the Duplicate and Unique models it is not needed
ADD COLUMN column_name column_type [KEY | agg_type] [DEFAULT "default_value"]
[AFTER column_name|FIRST]  -- determines the column position; if omitted, the column is appended last
[TO rollup_index_name]   -- when adding a column to a rollup index, the column must not already exist in the base table
[PROPERTIES ("key"="value", ...)]

-- Add a value column in the Duplicate model
ALTER TABLE test.expamle_range_tbl ADD COLUMN abc varchar AFTER age;

-- Add a key column in the Duplicate model
ALTER TABLE test.expamle_range_tbl ADD COLUMN abckey varchar key AFTER user_id;

-- Add a value column in the Aggregate model
ALTER TABLE test.ex_user ADD COLUMN abckey int sum AFTER cost;

Notice:

  • If you add a value column to the aggregation model, you need to specify agg_type
  • If you add a key column to a non-aggregate model (such as DUPLICATE KEY), you need to specify the KEY keyword
  • You cannot add columns that already exist in the base index to the rollup index (you can recreate a rollup index if necessary)

Example:

-- Original schema:

+-----------+-------+------+------+------+---------+-------+
| IndexName | Field | Type | Null | Key  | Default | Extra |
+-----------+-------+------+------+------+---------+-------+
| tbl1      | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k3    | INT  | No   | true | N/A     |       |
|           |       |      |      |      |         |       |
| rollup2   | k2    | INT  | No   | true | N/A     |       |
|           |       |      |      |      |         |       |
| rollup1   | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
+-----------+-------+------+------+------+---------+-------+

-- k4 and k5 do not exist in the original schema, so they can be added to the rollup indexes; adding them to a rollup also adds them to the base table
ALTER TABLE tbl1
ADD COLUMN k4 INT default "1" to rollup1,
ADD COLUMN k4 INT default "1" to rollup2,
ADD COLUMN k5 INT default "1" to rollup2;

-- After the change completes, the schema becomes the following (k4 and k5 are also added to the base table):
+-----------+-------+------+------+------+---------+-------+
| IndexName | Field | Type | Null | Key  | Default | Extra |
+-----------+-------+------+------+------+---------+-------+
| tbl1      | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k3    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
|           | k5    | INT  | No   | true | 1       |       |
|           |       |      |      |      |         |       |
| rollup2   | k2    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
|           | k5    | INT  | No   | true | 1       |       |
|           |       |      |      |      |         |       |
| rollup1   | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
+-----------+-------+------+------+------+---------+-------+

-- The following statement is invalid:
-- k3 already exists in the base table; adding it again would create a duplicate column
ALTER TABLE tbl1
ADD COLUMN k3 INT default "1" to rollup1;

2. Add multiple columns to the specified index

ALTER TABLE db.table_name
ADD COLUMN (column_name1 column_type [KEY | agg_type] DEFAULT "default_value", ...)
[TO rollup_index_name]
[PROPERTIES ("key"="value", ...)]

-- When adding multiple columns, key columns are appended after the existing key columns and value columns after the existing value columns
ALTER TABLE test.expamle_range_tbl ADD COLUMN (abc int,bcd int);

mysql> ALTER TABLE test.expamle_range_tbl ADD COLUMN (a int key ,b int);
Query OK, 0 rows affected (0.01 sec)

mysql> desc expamle_range_tbl all;


3. Delete a column from the specified index

ALTER TABLE db.table_name
DROP COLUMN column_name
[FROM rollup_index_name]

-- Drop a value column from a Duplicate-model table
ALTER TABLE test.expamle_range_tbl DROP COLUMN abc;

-- Drop a key column from a Duplicate-model table
ALTER TABLE test.expamle_range_tbl DROP COLUMN abckey;

-- Drop a value column from an Aggregate-model table
ALTER TABLE test.ex_user DROP COLUMN abckey;

-- Note:
-- Partition columns cannot be dropped
-- When a column is dropped from the base index, it is also dropped from any rollup index that contains it

4. Modify the column type and column position of the specified index

ALTER TABLE db.table_name
MODIFY COLUMN column_name column_type [KEY | agg_type] [NULL | NOT NULL] [DEFAULT "default_value"]
[AFTER column_name|FIRST]
[FROM rollup_index_name]
[PROPERTIES ("key"="value", ...)]

-- Note:
-- On the Aggregate model, modifying a value column requires specifying agg_type
-- On non-aggregate models, modifying a key column requires the KEY keyword
-- Partition columns and bucketing columns cannot be modified in any way

5. Reorder the columns of the specified index

ALTER TABLE db.table_name
ORDER BY (column_name1, column_name2, ...)
[FROM rollup_index_name]
[PROPERTIES ("key"="value", ...)]

-- Note:
-- All columns of the index must be listed
-- Value columns come after key columns

Example:

-- 1. Add key column new_col after col1 of example_rollup_index (non-aggregate model)
ALTER TABLE example_db.my_table
ADD COLUMN new_col INT KEY DEFAULT "0" AFTER col1
TO example_rollup_index;

-- 2. Add value column new_col after col1 of example_rollup_index (non-aggregate model)
ALTER TABLE example_db.my_table  
ADD COLUMN new_col INT DEFAULT "0" AFTER col1 
TO example_rollup_index;

-- 3. Add key column new_col after col1 of example_rollup_index (aggregate model)
ALTER TABLE example_db.my_table   
ADD COLUMN new_col INT DEFAULT "0" AFTER col1    
TO example_rollup_index;

-- 4. Add value column new_col with SUM aggregation after col1 of example_rollup_index (aggregate model)
ALTER TABLE example_db.my_table   
ADD COLUMN new_col INT SUM DEFAULT "0" AFTER col1    
TO example_rollup_index;

-- 5. Add multiple columns to example_rollup_index (aggregate model)
ALTER TABLE example_db.my_table
ADD COLUMN (col1 INT DEFAULT "1", col2 FLOAT SUM DEFAULT "2.3")
TO example_rollup_index;

-- 6. Drop a column from example_rollup_index
ALTER TABLE example_db.my_table
DROP COLUMN col2
FROM example_rollup_index;

-- 7. Change the type of base index key column col1 to BIGINT and move it after col2
ALTER TABLE example_db.my_table 
MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;

-- Note: the complete column definition must be declared whether a key or a value column is being modified

-- 8. Increase the maximum length of base index column val1 (originally val1 VARCHAR(32) REPLACE DEFAULT "abc")
ALTER TABLE example_db.my_table 
MODIFY COLUMN val1 VARCHAR(64) REPLACE DEFAULT "abc";

-- 9. Reorder the columns of example_rollup_index (original order: k1,k2,k3,v1,v2)
ALTER TABLE example_db.my_table
ORDER BY (k3,k1,k2,v2,v1)
FROM example_rollup_index;

-- 10. Perform both operations in one statement
ALTER TABLE example_db.my_table
ADD COLUMN v2 INT MAX DEFAULT "0" AFTER k2 TO example_rollup_index,
ORDER BY (k3,k1,k2,v2,v1) FROM example_rollup_index;

View job

SHOW ALTER TABLE COLUMN shows currently running and completed Schema Change jobs. When a Schema Change job involves multiple Indexes, the command displays one line per Index.

SHOW ALTER TABLE COLUMN\G;
*************************** 1. row ***************************
        JobId: 20021
    TableName: tbl1
   CreateTime: 2019-08-05 23:03:13
   FinishTime: 2019-08-05 23:03:42
    IndexName: tbl1
      IndexId: 20022
OriginIndexId: 20017
SchemaVersion: 2:792557838
TransactionId: 10023
        State: FINISHED
          Msg: 
     Progress: NULL
      Timeout: 86400
1 row in set (0.00 sec)

-- JobId: unique ID of each Schema Change job.
-- TableName: name of the base table being changed.
-- CreateTime: job creation time.
-- FinishTime: job finish time, or "N/A" if not finished.
-- IndexName: name of an Index involved in this change.
-- IndexId: unique ID of the new Index.
-- OriginIndexId: unique ID of the old Index.
-- SchemaVersion: shown as M:N, where M is the version of this Schema Change and N is the corresponding hash. The version increments with every Schema Change.
-- TransactionId: watershed transaction ID for converting historical data.
-- State: the job's current stage.
-- 	PENDING: waiting in queue to be scheduled.
-- 	WAITING_TXN: waiting for import jobs before the watershed transaction ID to finish.
-- 	RUNNING: converting historical data.
-- 	FINISHED: job succeeded.
-- 	CANCELLED: job failed.
-- Msg: the failure message if the job failed.
-- Progress: job progress, shown only in RUNNING state as M/N, where N is the total number of replicas involved in the Schema Change and M is the number whose historical data has been converted.
-- Timeout: job timeout in seconds.

Cancel job

When the job status is not FINISHED or CANCELLED, the Schema Change job can be canceled by the following command:

CANCEL ALTER TABLE COLUMN FROM tbl_name;

Notes
• There can only be one Schema Change job running for a table at a time.
• Schema Change operations do not block import and query operations.
• Partition and bucket columns cannot be modified.
• If the schema contains a value column with REPLACE aggregation, key columns cannot be deleted: if a key column were deleted, Doris could not determine the value of the REPLACE column. (All non-key columns of a Unique-model table are REPLACE-aggregated.)
• When adding a value column whose aggregation type is SUM or REPLACE, its default value is meaningless for historical data: because the historical data has lost its detail rows, the default value cannot reflect a true aggregated value.
• When modifying a column's type, every attribute other than the type must be restated exactly as in the original column definition. For example, to change column k1 INT SUM NULL DEFAULT "1" to BIGINT, execute:
• ALTER TABLE tbl1 MODIFY COLUMN k1 BIGINT SUM NULL DEFAULT "1";
• Note that apart from the new column type, the aggregation type, nullable attribute, and default value must all be restated according to the original definition.
• Modifying column names, aggregation types, nullable properties, default values, and column comments is not supported.

Adding and dropping partitions

-- 1. Add a partition with the default bucketing. Existing partition: [MIN, 2013-01-01); add partition [2013-01-01, 2014-01-01)
ALTER TABLE example_db.my_table ADD PARTITION p1 VALUES LESS THAN ("2014-01-01");

-- 2. Add a partition with a new bucket count
ALTER TABLE example_db.my_table ADD PARTITION p1 VALUES LESS THAN ("2015-01-01") 
DISTRIBUTED BY HASH(k1) BUCKETS 20; 

-- 3. Add a partition with a new replica count
ALTER TABLE example_db.my_table ADD PARTITION p1 VALUES LESS THAN ("2015-01-01") 
("replication_num"="1"); 

-- 4. Modify the replica count of a partition
ALTER TABLE example_db.my_table MODIFY PARTITION p1 SET("replication_num"="1"); 
-- 5. Batch-modify the specified partitions
ALTER TABLE example_db.my_table MODIFY PARTITION (p1, p2, p4) SET("in_memory"="true"); 

-- 6. Batch-modify all partitions
ALTER TABLE example_db.my_table MODIFY PARTITION (*) SET("storage_medium"="HDD"); 

-- 7. Drop a partition
ALTER TABLE example_db.my_table DROP PARTITION p1; 
-- 8. Add a partition with explicit lower and upper bounds
ALTER TABLE example_db.my_table ADD PARTITION p1 VALUES [("2014-01-01"), ("2014-02-01")); 

Adding and dropping rollups

-- 1. Create index example_rollup_index based on the base index (k1,k2,k3,v1,v2); column-oriented storage
ALTER TABLE example_db.my_table ADD ROLLUP example_rollup_index(k1, k3, v1, v2);

-- 2. Create index example_rollup_index2 based on example_rollup_index (k1,k3,v1,v2)
ALTER TABLE example_db.my_table ADD ROLLUP example_rollup_index2 (k1, v1) 
FROM example_rollup_index;

-- 3. Create index example_rollup_index3 based on the base index (k1,k2,k3,v1), with a custom rollup timeout of one hour
ALTER TABLE example_db.my_table ADD ROLLUP example_rollup_index3(k1, k3, v1) 
PROPERTIES("timeout" = "3600"); 

-- 4. Drop index example_rollup_index2
ALTER TABLE example_db.my_table DROP ROLLUP example_rollup_index2; 

Dynamic Partitions and Temporary Partitions

dynamic partition

principle

In some scenarios, users partition a table by day and run routine tasks every day. Without dynamic partitions, the user has to manage the partitions manually, and an import may fail because the target partition has not been created yet, incurring extra maintenance cost. With the dynamic partition feature, the user sets dynamic partition rules when creating the table, and an FE background thread creates or deletes partitions according to those rules. The rules can also be changed at runtime.

How to use

Dynamic partition rules can be specified when the table is created or modified at runtime. Currently, dynamic partition rules are only supported on partitioned tables with a single partition column.

-- Specified at table creation
CREATE TABLE tbl1
(...)
PROPERTIES
(
-- Dynamic partition rules
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

-- Modified at runtime
ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

Dynamic Partition Rule Parameters

  1. dynamic_partition.enable: whether to enable the dynamic partition feature. Defaults to true.
  2. dynamic_partition.time_unit: the scheduling unit of dynamic partitioning. Can be HOUR, DAY, WEEK, or MONTH, meaning partitions are created or deleted by hour, day, week, or month respectively.
  3. dynamic_partition.time_zone: the time zone of the dynamic partition; defaults to the system time zone of the current machine.
  4. dynamic_partition.start: the starting offset of the dynamic partition, a negative number. Relative to the current day (week/month), partitions whose range lies before this offset are deleted. If unset, it defaults to -2147483648, i.e. history partitions are never deleted.
  5. dynamic_partition.end: the end offset of the dynamic partition, a positive number. According to time_unit and relative to the current day (week/month), partitions of the corresponding range are created in advance.
  6. dynamic_partition.prefix: the name prefix of dynamically created partitions.
  7. dynamic_partition.buckets: the number of buckets of dynamically created partitions.
  8. dynamic_partition.replication_num: the replica count of dynamically created partitions; if unset, defaults to the replica count specified when the table was created.
  9. dynamic_partition.start_day_of_week: when time_unit is WEEK, specifies the starting day of each week. Values are 1 to 7, where 1 means Monday and 7 means Sunday. Defaults to 1, i.e. each week starts on Monday.
  10. dynamic_partition.start_day_of_month: when time_unit is MONTH, specifies the starting day of each month. Values are 1 to 28, where 1 means the 1st of each month and 28 the 28th. Defaults to 1, i.e. each month starts on the 1st. The values 29, 30, and 31 are not supported, to avoid ambiguity caused by leap years and short months.
  11. dynamic_partition.create_history_partition: when true, historical partitions may be created. Defaults to false.
  12. dynamic_partition.history_partition_num: when create_history_partition is true, specifies how many historical partitions to create. Defaults to -1, i.e. not set.
  13. dynamic_partition.hot_partition_num: specifies that the latest N partitions are hot partitions. For hot partitions, the system automatically sets storage_medium to SSD and sets a storage_cooldown_time; the partitions of the previous N periods and all future partitions are treated as hot.
  14. dynamic_partition.reserved_history_periods: the time ranges of historical partitions that must be retained.
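The parameters above can be combined at table creation; a minimal sketch (the table and column names are illustrative, not from the original text) that keeps the last 7 days and pre-creates the next 3:

```sql
CREATE TABLE tbl_dyn
(
    `date_col` DATE,
    `val` INT
)
DUPLICATE KEY(`date_col`)
PARTITION BY RANGE(`date_col`) ()
DISTRIBUTED BY HASH(`val`) BUCKETS 10
PROPERTIES
(
    -- drop partitions older than 7 days, create 3 days ahead
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "10"
);
```

With time_unit DAY, the created partitions are named with the prefix followed by the date, e.g. p20200519.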

Modify dynamic partition properties

ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    ...
);


ALTER TABLE partition_test SET
(
    "dynamic_partition.time_unit" = "week",
    "dynamic_partition.start" = "-1",
    "dynamic_partition.end" = "1"
);

Modifying certain properties may cause conflicts. Suppose the partition granularity was DAY and the following partitions have been created:

p20200519: ["2020-05-19", "2020-05-20")
p20200520: ["2020-05-20", "2020-05-21")
p20200521: ["2020-05-21", "2020-05-22")

If the partition granularity is then changed to MONTH, the system tries to create a partition with the range ["2020-05-01", "2020-06-01"), which conflicts with the existing partitions, so it cannot be created. A partition with the range ["2020-06-01", "2020-07-01") can still be created normally. The range from 2020-05-22 to 2020-05-30 therefore has to be filled in manually.
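In the scenario above, the uncovered range could be filled with an ordinary ADD PARTITION, using the bounded-range syntax shown earlier (the table and partition names here are illustrative):

```sql
-- cover the gap between the last DAY partition and the first MONTH partition
ALTER TABLE tbl1 ADD PARTITION p20200522
VALUES [("2020-05-22"), ("2020-06-01"));
```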

View dynamic partition table scheduling

-- The following command shows the scheduling status of all dynamic partition tables in the current database:
SHOW DYNAMIC PARTITION TABLES;

-- LastUpdateTime: when the dynamic partition properties were last modified
-- LastSchedulerTime: when dynamic partition scheduling last ran
-- State: the state of the last scheduling run
-- LastCreatePartitionMsg: the error message of the last dynamic partition-creation run
-- LastDropPartitionMsg: the error message of the last dynamic partition-deletion run

temporary partition

rule

• A temporary partition has the same partition columns as the formal partitions, and they cannot be modified.
• The partition ranges of a table's temporary partitions must not overlap each other, but a temporary partition's range may overlap a formal partition's range.
• A temporary partition's name must differ from the names of all formal partitions and other temporary partitions.

operate

Temporary partitions support add, delete, and replace operations.

add temporary partition

Temporary partitions can be added to a table with the ALTER TABLE ADD TEMPORARY PARTITION statement:

ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01");


ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp2 VALUES LESS THAN("2020-02-02")
("in_memory" = "true", "replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;


ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai");

ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai")
("in_memory" = "true", "replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;


-- Notes on adding:
-- Adding a temporary partition is similar to adding a formal partition, but a temporary partition's range is independent of the formal partitions.
-- A temporary partition can specify its own properties, including bucket count, replica count, in-memory flag, and storage medium.

delete temporary partition

-- A table's temporary partition can be dropped with the ALTER TABLE DROP TEMPORARY PARTITION statement:
ALTER TABLE tbl1 DROP TEMPORARY PARTITION tp1;
-- Dropping a temporary partition does not affect the data in the formal partitions.

replace partition

Formal partitions of a table can be replaced with temporary partitions by the ALTER TABLE REPLACE PARTITION statement.

-- After a formal partition is replaced with a temporary partition, the formal partition's data is deleted, and the process is irreversible
-- use with care
ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);

ALTER TABLE partition_test REPLACE PARTITION (p20230104) WITH TEMPORARY PARTITION (tp1);

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
PROPERTIES (
    "strict_range" = "false",
    "use_temp_partition_name" = "true"
);

-- strict_range: default true.
-- 	For Range partitions: when true, the union of the ranges of all replaced formal partitions must exactly equal the union of the ranges of the replacing temporary partitions; when false, it is only required that the ranges of the new formal partitions do not overlap after replacement.
-- 	For List partitions this parameter is always true: the enumerated values of the replaced formal partitions must exactly equal those of the replacing temporary partitions.
-- use_temp_partition_name: default false. When false and the number of replaced partitions equals the number of replacing partitions, the formal partition names stay unchanged after replacement; when true, the formal partitions take the names of the temporary partitions.


ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
-- With use_temp_partition_name left at its default of false, the partition is still named p1 after replacement, but its data and properties are those of tp1. If use_temp_partition_name is set to true, the partition is named tp1 after replacement and p1 no longer exists.

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1);
-- use_temp_partition_name defaults to false, but because the number of partitions being replaced differs from the number of replacing partitions, the parameter has no effect. After replacement, the partition is named tp1, and p1 and p2 no longer exist.


-- Notes on replacement:
-- After a successful replacement, the replaced partitions are deleted and cannot be recovered.

Data import and query

Import temporary partition

The syntax for importing into a temporary partition varies slightly with the import method. A brief illustration with examples:

-- Import query results with INSERT
INSERT INTO tbl TEMPORARY PARTITION(tp1, tp2, ...) SELECT ....

-- Query the data
SELECT ... FROM
tbl1 TEMPORARY PARTITION(tp1, tp2, ...)
JOIN
tbl2 TEMPORARY PARTITION(tp1, tp2, ...)
ON ...
WHERE ...;

Join optimization principles in Doris

Shuffle Join (Partitioned Join)

As in the shuffle phase of MapReduce, the data on each node is shuffled so that rows with the same join-key value are distributed to the same downstream node; this join method is called Shuffle Join.

-- Order table
CREATE TABLE  test.order_info_shuffle
(
 `order_id` varchar(20) COMMENT "order id",
 `user_id` varchar(20) COMMENT "user id",
 `goods_id` VARCHAR(20) COMMENT "goods id",
 `goods_num` Int COMMENT "goods quantity",
 `price` double COMMENT "goods price"
)
duplicate KEY(`order_id`)
DISTRIBUTED BY HASH(`order_id`) BUCKETS 5;

-- Load data:
insert into test.order_info_shuffle values
('o001','u001','g001',1,9.9 ),
('o001','u001','g002',2,19.9),
('o001','u001','g003',2,39.9),
('o002','u002','g001',3,9.9 ),
('o002','u002','g002',1,19.9),
('o003','u002','g003',1,39.9),
('o003','u002','g002',2,19.9),
('o003','u002','g004',3,99.9),
('o003','u002','g005',1,99.9),
('o004','u003','g001',2,9.9 ),
('o004','u003','g002',1,19.9),
('o004','u003','g003',4,39.9),
('o004','u003','g004',1,99.9),
('o004','u003','g005',4,89.9);


-- Goods table
CREATE TABLE  test.goods_shuffle
(
 `goods_id` VARCHAR(20) COMMENT "goods id",
 `goods_name`  VARCHAR(20) COMMENT "goods name",
 `category_id` VARCHAR(20) COMMENT "goods category id"
)
duplicate KEY(`goods_id`)
DISTRIBUTED BY HASH(`goods_id`) BUCKETS 5;

-- Load data:
insert into test.goods_shuffle values
('g001','iphon13','c001'),
('g002','ipad','c002'),
('g003','xiaomi12','c001'),
('g004','huaweip40','c001'),
('g005','headset','c003');


-- Example SQL
EXPLAIN 
select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_shuffle as oi
-- The join method may be left unspecified; Doris chooses one based on the actual data
JOIN goods_shuffle as gs
on oi.goods_id = gs.goods_id;

EXPLAIN select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_shuffle as oi
-- A hint can explicitly specify the desired join type
JOIN [broadcast] goods_shuffle as gs
on oi.goods_id = gs.goods_id;

Applicable scenarios: usable regardless of data volume, whether joining a large table to a large table or a large table to a small table.
Advantages: universally applicable.
Disadvantages: the shuffle incurs relatively high memory and network overhead, so efficiency is not high.

Broadcast Join

When a large table joins a small table, broadcasting the small table to every node holding the large table's data (where it is stored in memory as a hash table) is called Broadcast Join; it is similar to a map-side join in MapReduce.

-- Explicitly use Broadcast Join:
EXPLAIN 
select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_broadcast as oi
JOIN [broadcast] goods_broadcast as gs
on oi.goods_id = gs.goods_id;

Applicable scenario: joining a left table with a relatively large amount of data to a right table with a relatively small amount of data.
Advantages: avoids the shuffle and improves computing efficiency.
Disadvantages: restricted; the right table's data volume must be relatively small.

Bucket Shuffle Join

Bucket Shuffle Join exploits the bucketing defined at table creation: when the join condition matches the left table's bucketing columns, the right table is shuffled according to the left table's bucketing rules, so that the right table's rows land on exactly the BE nodes holding the matching left-table rows.

-- Defaults to true since version 0.14, so newer versions need not set this parameter
show variables like '%bucket_shuffle_join%'; 
set enable_bucket_shuffle_join = true;
-- Inspect the join type with EXPLAIN
EXPLAIN 
select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_bucket as oi
-- Bucket Shuffle Join cannot currently be specified explicitly with a hint the way Shuffle Join can;
-- the execution engine chooses automatically,
-- in the order: Colocate Join -> Bucket Shuffle Join -> Broadcast Join -> Shuffle Join.
JOIN goods_bucket as gs
where oi.goods_id = gs.goods_id;




EXPLAIN select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_bucket as oi
-- Bucket Shuffle Join cannot currently be specified explicitly with a hint the way Shuffle Join can;
-- the execution engine chooses automatically,
-- in the order: Colocate Join -> Bucket Shuffle Join -> Broadcast Join -> Shuffle Join.
JOIN goods_bucket1 as gs
where oi.goods_id = gs.goods_id;

-- Notes:
-- Bucket Shuffle Join only takes effect when the join condition is an equality.
-- The type of the left table's bucketing column must match the type of the right table's equi-join column, otherwise the corresponding plan cannot be made.
-- Bucket Shuffle Join only applies to Doris-native OLAP tables; it cannot be planned when an external table (ODBC, MySQL, ES, etc.) is the left table.
-- Bucket Shuffle Join only takes effect when the left table is a single partition, so use WHERE conditions that enable partition pruning whenever possible.

Colocation Join

Colocation Join ("location-cooperative grouping join") means that the two sets of data to be joined already reside on the same BE node, so the join can be computed locally without any shuffle.

Glossary
• Colocation Group (CG): tables in the same CG have the same Colocation Group Schema and the same data shard distribution (the three conditions below are satisfied).
• Colocation Group Schema (CGS): describes the tables in a CG and the general schema information relevant to colocation, including the bucketing column types, the number of buckets, and the partition replica count.

usage restrictions

  1. The bucketing columns of the tables must match exactly in type and number, and the bucket counts must be the same, so that the data shards of the tables can be distributed and controlled in one-to-one correspondence.
  2. The replica count of all partitions of all tables in the same CG must be the same; otherwise a tablet replica might have no corresponding replica of the other table's shard on the same BE.
  3. The tables in a CG need not have the same number, ranges, or types of partition columns.

Use Cases

-- Create two tables whose bucketing columns are both of type INT and whose bucket counts are both 5; replica counts are the default
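The two tables are not defined in the text; a minimal sketch, assuming column names matching the query below and an illustrative group name. Both tables hash on the same INT column with the same bucket count and share a "colocate_with" group, as the colocation constraints require:

```sql
CREATE TABLE test.order_info_colocation
(
 `order_id` varchar(20),
 `user_id` varchar(20),
 `goods_id` int,
 `goods_num` int,
 `price` double
)
duplicate KEY(`order_id`)
DISTRIBUTED BY HASH(`goods_id`) BUCKETS 5
PROPERTIES ("colocate_with" = "order_goods_group");

CREATE TABLE test.goods_colocation
(
 `goods_id` int,
 `goods_name` varchar(20),
 `category_id` varchar(20)
)
duplicate KEY(`goods_id`)
DISTRIBUTED BY HASH(`goods_id`) BUCKETS 5
PROPERTIES ("colocate_with" = "order_goods_group");
```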

-- Write a query and inspect the execution plan
EXPLAIN 
select 
oi.order_id,
oi.user_id,
oi.goods_id,
gs.goods_name,
gs.category_id,
oi.goods_num,
oi.price
from order_info_colocation as oi
-- Colocation Join cannot currently be specified explicitly with a hint the way Shuffle Join can;
-- the execution engine chooses automatically,
-- in the order: Colocate Join -> Bucket Shuffle Join -> Broadcast Join -> Shuffle Join.
JOIN goods_colocation as gs
where oi.goods_id = gs.goods_id;


-- View groups
SHOW PROC '/colocation_group';

-- When the last table in a group is completely deleted (i.e. removed from the recycle bin; a table dropped with DROP TABLE normally stays in the recycle bin for one day before being deleted), the group is also deleted automatically.
-- Modify a table's Colocate Group property
ALTER TABLE tbl SET ("colocate_with" = "group2");
-- If the table already belonged to a group, it is removed from the old group before the new group is created; if it had no group, the new group is simply created

-- Remove a table's colocation property
ALTER TABLE tbl SET ("colocate_with" = ""); 
-- When adding a partition (ADD PARTITION) to, or changing the replica count of, a table with the colocation property, Doris checks whether the change would violate the Colocation Group Schema and rejects it if so

Runtime Filter

While a query containing a join runs, the Runtime Filter mechanism generates filters at the HashJoinNode from the right table's join-key values and pushes them down to the left table's ScanNode, reducing the amount of data entering the join and thereby improving efficiency.

use

-- Specify the Runtime Filter type
set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";

set runtime_filter_type="MIN_MAX";


Parameter explanation:

  • runtime_filter_type: includes Bloom Filter, MinMax Filter, IN predicate, and IN Or Bloom Filter
    • Bloom Filter: build a Bloom filter over the join-key values of the right table, then use it to test whether left-table rows can possibly match
    • MinMax Filter: compute the maximum and minimum join-key values of the right table, then filter out left-table rows outside that range
    • IN predicate: build an IN predicate from the right table's join-key values, then use it to filter out non-matching rows from the left table
  • runtime_filter_wait_time_ms: how long each ScanNode of the left table waits for each Runtime Filter; default 1000 ms
  • runtime_filters_max_num: the maximum number of Bloom Filters among the Runtime Filters that can be applied per query; default 10
  • runtime_bloom_filter_min_size: the minimum size of a Runtime Filter's Bloom Filter; default 1M
  • runtime_bloom_filter_max_size: the maximum size of a Runtime Filter's Bloom Filter; default 16M
  • runtime_bloom_filter_size: the default size of a Runtime Filter's Bloom Filter; default 2M
  • runtime_filter_max_in_num: if the right table of a join has more rows than this value, no IN predicate is generated; default 102400
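These are session variables and can be tuned per session; a brief sketch (the values are illustrative, the variable names are those listed above):

```sql
-- use only the Bloom Filter and IN predicate
SET runtime_filter_type = "BLOOM_FILTER,IN";
-- let ScanNodes wait up to 2 s for filters to arrive
SET runtime_filter_wait_time_ms = 2000;
-- skip the IN predicate when the right table exceeds 50000 rows
SET runtime_filter_max_in_num = 50000;
```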

example

-- Create the tables
CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2  
PROPERTIES("replication_num" = "1"); 

INSERT INTO test VALUES (1), (2), (3), (4); 

CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2  
PROPERTIES("replication_num" = "1"); 

INSERT INTO test2 VALUES (3), (4), (5); 

-- View the execution plan
set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";

EXPLAIN SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2;


Origin blog.csdn.net/qq_61162288/article/details/130997838