ROLLUP means "rolling up" in multidimensional analysis: data is aggregated further, to a specified coarser granularity.
Data from the earlier Aggregate model example:
User ID | Data insertion date | City | Age | Gender | Time of last visit | Total consumption | Maximum dwell time | Minimum dwell time |
10000 | 2017/10/2 | Beijing | 10 | 0 | 2017/10/02 08:00:00 | 65 | 15 | 2 |
10000 | 2017/10/2 | Beijing | 20 | 0 | 2017/10/02 08:00:00 | 65 | 15 | 2 |
10000 | 2017/10/2 | Beijing | 30 | 0 | 2017/10/02 08:00:00 | 65 | 15 | 2 |
10000 | 2017/10/1 | Shanghai | 20 | 0 | 2017/10/01 08:00:00 | 100 | 122 | 2 |
10000 | 2017/10/2 | Shanghai | 20 | 0 | 2017/10/02 08:00:00 | 30 | 30 | 2 |
10000 | 2017/10/3 | Shanghai | 10 | 0 | 2017/10/03 08:00:00 | 55 | 33 | 2 |
10000 | 2017/10/4 | Shanghai | 20 | 0 | 2017/10/04 08:00:00 | 65 | 15 | 2 |
10001 | 2017/10/1 | Shanghai | 30 | 1 | 2017/10/01 17:05:45 | 20 | 22 | 22 |
10001 | 2017/10/2 | Shanghai | 10 | 1 | 2017/10/01 17:05:45 | 10 | 123 | 22 |
10001 | 2017/10/2 | Tianjin | 10 | 1 | 2017/10/01 17:05:45 | 18 | 2 | 22 |
10001 | 2017/10/1 | Shanghai | 10 | 1 | 2017/10/01 17:05:45 | 10 | 123 | 22 |
10001 | 2017/10/1 | Tianjin | 10 | 1 | 2017/10/01 17:05:45 | 18 | 2 | 22 |
10001 | 2017/10/1 | Tianjin | 20 | 1 | 2017/10/01 17:05:45 | 28 | 45 | 22 |
10002 | 2017/10/1 | Tianjin | 30 | 1 | 2017/10/01 17:05:45 | 35 | 11 | 22 |
10002 | 2017/10/2 | Tianjin | 10 | 1 | 2017/10/01 08:00:00 | 20 | 23 | 2 |
10002 | 2017/10/2 | Beijing | 20 | 1 | 2017/10/03 17:05:45 | 35 | 11 | 22 |
10002 | 2017/10/1 | Tianjin | 10 | 1 | 2017/10/01 08:00:00 | 20 | 23 | 2 |
10002 | 2017/10/3 | Beijing | 20 | 1 | 2017/10/03 17:05:45 | 35 | 11 | 22 |
10002 | 2017/10/3 | Beijing | 30 | 1 | 2017/10/03 08:00:00 | 20 | 23 | 2 |
1. Find the total daily consumption of each user in each city
select
user_id,date,city,
sum(sum_cost) as sum_cost
from t
group by user_id,date,city;
-- user_id date city sum_cost
10000 2017/10/2 Beijing 195
10000 2017/10/1 Shanghai 100
10000 2017/10/2 Shanghai 30
10000 2017/10/3 Shanghai 55
10000 2017/10/4 Shanghai 65
10001 2017/10/1 Shanghai 30
10001 2017/10/2 Shanghai 10
10001 2017/10/2 Tianjin 18
10001 2017/10/1 Tianjin 46
10002 2017/10/1 Tianjin 55
10002 2017/10/3 Beijing 55
10002 2017/10/2 Tianjin 20
10002 2017/10/2 Beijing 35
2. Find the total consumption of each user in each city
select
user_id,city,
sum(sum_cost) as sum_cost
from t
group by user_id,city;
-- user_id city sum_cost
10000 Beijing 195
10000 Shanghai 250
10001 Shanghai 40
10001 Tianjin 64
10002 Tianjin 75
10002 Beijing 90
3. Find the total consumption of each user
select
user_id,
sum(sum_cost) as sum_cost
from t
group by user_id;
-- user_id sum_cost
10000 445
10001 104
10002 165
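Because SUM can be computed from partial sums, a coarser result can be derived from a finer one instead of from the raw rows, which is the idea behind rolling up. A minimal sketch, assuming the same table t and column sum_cost used in the queries above (the alias daily is only illustrative):

```sql
-- Inner query: per-user, per-city, per-day sums (same grouping as query 1).
-- Outer query: roll those partial sums up to one row per user (same grouping as query 3).
select
user_id,
sum(sum_cost) as sum_cost
from (
    select user_id, city, date, sum(sum_cost) as sum_cost
    from t
    group by user_id, city, date
) daily
group by user_id;
```

This returns the same totals as grouping the raw rows by user_id directly.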
1 Basic concepts
The table created by the CREATE TABLE statement is called the Base table.
On top of the Base table, we can create any number of ROLLUP tables. ROLLUP data is generated from the Base table and is physically stored independently.
Benefits of ROLLUP tables:
- A ROLLUP shares the table name of the Base table. Doris selects the appropriate data source (the Base table or one of its ROLLUPs) according to the query and uses it to compute the result.
- When data in the Base table is added, deleted, or modified, the ROLLUP tables are updated automatically and stay in sync.
2 ROLLUP in Aggregate model
First, inspect the previously created table ex_user together with its ROLLUPs:
mysql> desc ex_user all;
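For readers who do not have ex_user at hand, the following is a rough sketch of how such an Aggregate model table might be defined. The column names user_id, date, city, cost and last_visit_date come from the statements in this section; the remaining names, all column types, the default values, the bucket count and the replication setting are assumptions and should be adapted to the actual table.

```sql
-- Sketch only: types, defaults, bucket count and replication_num are assumed.
CREATE TABLE IF NOT EXISTS ex_user
(
    `user_id` BIGINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data insertion date",
    `city` VARCHAR(20) COMMENT "city",
    `age` SMALLINT COMMENT "age",
    `sex` TINYINT COMMENT "gender",
    -- Value columns: each one carries its aggregation type.
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "time of last visit",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "maximum dwell time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "minimum dwell time"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

Rows with identical key columns are merged on load, and each value column is combined according to its aggregation type (SUM, MAX, MIN, REPLACE).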
Example 1: View the total consumption of each user
Add or drop ROLLUP tables:
-- Syntax: alter table <aggregate_table_name> add rollup <rollup_name> (user_id,city,date,cost);
alter table ex_user add rollup rollup_ucd_cost(user_id,city,date,cost);
alter table ex_user add rollup rollup_u_cost(user_id,cost);
alter table ex_user add rollup rollup_cd_cost(city,date,cost);
alter table ex_user drop rollup rollup_u_cost;
alter table ex_user drop rollup rollup_cd_cost;
-- If the ROLLUP contains a value column with the REPLACE aggregation type, all key columns must be specified
-- alter table ex_user add rollup rollup_cd_visit(city,date,last_visit_date);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = Rollup should contains
-- all keys if there is a REPLACE value
-- After adding a ROLLUP, run SHOW to check whether the underlying rollup job has finished
SHOW ALTER TABLE ROLLUP;
Doris will automatically hit the ROLLUP table, so only a very small amount of data needs to be scanned to complete the aggregation query.
explain SELECT user_id, sum(cost) FROM ex_user GROUP BY user_id;
3 ROLLUP in Duplicate model
In the Duplicate model, ROLLUP is used to adjust the prefix index, that is, to add another prefix index with a different column order.
Because the column order is fixed when the table is created, a table has only one prefix index. Queries whose conditions use columns that cannot hit this prefix index may not be efficient enough. We can therefore create a ROLLUP with a different column order.
The Base table structure is as follows:
ColumnName | Type |
user_id | BIGINT |
age | INT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
We can create a ROLLUP table based on this:
ColumnName | Type |
age | INT |
user_id | BIGINT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
As you can see, the columns of ROLLUP and Base tables are exactly the same, except that the order of user_id and age is reversed. So when we make the following query:
SELECT * FROM tbl_name WHERE age = 20 AND message LIKE "%error%";
The ROLLUP table will be preferred, because its prefix index (which starts with age) matches the query conditions better than the prefix index of the Base table.
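The reordered ROLLUP described above could be created roughly as follows. This is a sketch: the table name dup_tbl and the rollup name rollup_age_first are hypothetical, and the column list simply repeats the Base table columns with age moved to the front.

```sql
-- Hypothetical names: dup_tbl (Base table) and rollup_age_first (ROLLUP).
-- Same columns as the Base table, but age comes first so the prefix index starts with age.
alter table dup_tbl add rollup rollup_age_first(age, user_id, message, max_dwell_time, min_dwell_time);

-- The rollup is built asynchronously; check whether the job has finished.
SHOW ALTER TABLE ROLLUP;

-- Inspect the plan to see whether the ROLLUP is chosen for a filter on age.
explain SELECT * FROM dup_tbl WHERE age = 20 AND message LIKE "%error%";
```

Whether the ROLLUP is actually chosen is decided by Doris, so the plan is the place to confirm it.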
Notes on using ROLLUP
- ROLLUP is attached to the Base table. Users can create or drop a ROLLUP on the Base table, but they cannot explicitly specify a ROLLUP in a query. Whether a ROLLUP is hit is decided automatically by Doris.
- ROLLUP data is physically stored independently. The more ROLLUPs are created, the more disk space they take up. They also slow down data import, but they do not reduce query efficiency (they can only improve it).
- ROLLUP data is updated in full synchronization with the Base table. Users do not need to take care of this.
- In the Aggregate model, the aggregation type of each column in a ROLLUP is exactly the same as in the Base table. It does not need to be specified when creating a ROLLUP, and it cannot be modified.
- You can obtain the query execution plan with the EXPLAIN your_sql; command and check in the plan whether a ROLLUP is hit.
- You can display the Base table and all created ROLLUPs with the DESC tbl_name ALL; statement.