Doris: ROLLUP tables explained, with worked examples

ROLLUP means "rolling up" in multidimensional analysis: aggregating data further, to a specified coarser granularity.

Sample data in the Aggregate model table used earlier:

user_id  date       city      age  gender  last_visit_date      cost  max_dwell_time  min_dwell_time
10000    2017/10/2  Beijing   10   0       2017/10/02 08:00:00  65    15              2
10000    2017/10/2  Beijing   20   0       2017/10/02 08:00:00  65    15              2
10000    2017/10/2  Beijing   30   0       2017/10/02 08:00:00  65    15              2
10000    2017/10/1  Shanghai  20   0       2017/10/01 08:00:00  100   122             2
10000    2017/10/2  Shanghai  20   0       2017/10/02 08:00:00  30    30              2
10000    2017/10/3  Shanghai  10   0       2017/10/03 08:00:00  55    33              2
10000    2017/10/4  Shanghai  20   0       2017/10/04 08:00:00  65    15              2
10001    2017/10/1  Shanghai  30   1       2017/10/01 17:05:45  20    22              22
10001    2017/10/2  Shanghai  10   1       2017/10/01 17:05:45  10    123             22
10001    2017/10/2  Tianjin   10   1       2017/10/01 17:05:45  18    2               22
10001    2017/10/1  Shanghai  10   1       2017/10/01 17:05:45  10    123             22
10001    2017/10/1  Tianjin   10   1       2017/10/01 17:05:45  18    2               22
10001    2017/10/1  Tianjin   20   1       2017/10/01 17:05:45  28    45              22
10002    2017/10/1  Tianjin   30   1       2017/10/01 17:05:45  35    11              22
10002    2017/10/2  Tianjin   10   1       2017/10/01 08:00:00  20    23              2
10002    2017/10/2  Beijing   20   1       2017/10/03 17:05:45  35    11              22
10002    2017/10/1  Tianjin   10   1       2017/10/01 08:00:00  20    23              2
10002    2017/10/3  Beijing   20   1       2017/10/03 17:05:45  35    11              22
10002    2017/10/3  Beijing   30   1       2017/10/03 08:00:00  20    23              2

 1. Find the total daily consumption of each user in each city

select
user_id,date,city,
sum(cost) as sum_cost
from t
group by user_id,date,city;

-- user_id      date             city            sum_cost
   10000        2017/10/2        Beijing         195
   10000        2017/10/1        Shanghai        100
   10000        2017/10/2        Shanghai        30
   10000        2017/10/3        Shanghai        55
   10000        2017/10/4        Shanghai        65
   10001        2017/10/1        Shanghai        30
   10001        2017/10/2        Shanghai        10
   10001        2017/10/2        Tianjin         18
   10001        2017/10/1        Tianjin         46
   10002        2017/10/1        Tianjin         55
   10002        2017/10/3        Beijing         55
   10002        2017/10/2        Tianjin         20
   10002        2017/10/2        Beijing         35

 2. Find the total consumption of each user in each city

select
user_id,city,
sum(cost) as sum_cost
from t
group by user_id,city;

-- user_id      city            sum_cost
   10000        Beijing         195
   10000        Shanghai        250
   10001        Shanghai        40
   10001        Tianjin         64
   10002        Tianjin         75
   10002        Beijing         90

 3. Find the total consumption of each user

select
user_id,
sum(cost) as sum_cost
from t
group by user_id;

-- user_id      sum_cost
   10000        445
   10001        104
   10002        165
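The three queries above differ only in how coarsely they group the same data. As a rough illustration (this is SQLite from Python, not Doris), the sketch below loads the per-day sums from query 1 into an in-memory table and derives the two coarser results from it, which is the kind of pre-aggregation a ROLLUP stores. The table name `daily` and the helper function names are assumptions made for this example.

```python
import sqlite3

# Per-user, per-day, per-city sums: the result of query 1 above.
DAILY = [
    (10000, "2017/10/2", "Beijing", 195),
    (10000, "2017/10/1", "Shanghai", 100),
    (10000, "2017/10/2", "Shanghai", 30),
    (10000, "2017/10/3", "Shanghai", 55),
    (10000, "2017/10/4", "Shanghai", 65),
    (10001, "2017/10/1", "Shanghai", 30),
    (10001, "2017/10/2", "Shanghai", 10),
    (10001, "2017/10/2", "Tianjin", 18),
    (10001, "2017/10/1", "Tianjin", 46),
    (10002, "2017/10/1", "Tianjin", 55),
    (10002, "2017/10/3", "Beijing", 55),
    (10002, "2017/10/2", "Tianjin", 20),
    (10002, "2017/10/2", "Beijing", 35),
]


def build_connection():
    """Create an in-memory SQLite database holding the per-day sums."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily (user_id INTEGER, date TEXT, city TEXT, sum_cost INTEGER)"
    )
    conn.executemany("INSERT INTO daily VALUES (?, ?, ?, ?)", DAILY)
    return conn


def cost_by_user_city(conn):
    """Roll the per-day sums up to (user_id, city), as in query 2."""
    rows = conn.execute(
        "SELECT user_id, city, SUM(sum_cost) FROM daily GROUP BY user_id, city"
    )
    return {(user_id, city): total for user_id, city, total in rows}


def cost_by_user(conn):
    """Roll the per-day sums up to user_id only, as in query 3."""
    rows = conn.execute("SELECT user_id, SUM(sum_cost) FROM daily GROUP BY user_id")
    return {user_id: total for user_id, total in rows}


if __name__ == "__main__":
    conn = build_connection()
    print(cost_by_user_city(conn))
    print(cost_by_user(conn))
```

Because SUM can be re-aggregated, the coarser results never need the raw rows again; they can be computed from the 13 pre-summed rows instead of the 19 detail rows.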

 

 1 Basic concepts

The table created by a CREATE TABLE statement is called the Base table.

On top of the Base table, we can create any number of ROLLUP tables. ROLLUP data is generated from the Base table and is physically stored independently.

Benefits of ROLLUP tables:

  1. A ROLLUP shares the Base table's name, so queries do not change. Doris selects the appropriate data source (the Base table or one of its ROLLUPs) based on the query and computes the result from it.

  2. When data in the Base table is added, deleted, or modified, the ROLLUP tables are updated and synchronized automatically.
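A toy model of the second point, assuming a single SUM-aggregated cost value: every load is applied to the base data and to each rollup in the same step, so a rollup never lags behind. The class `ToyTable` and its methods are hypothetical and only mimic the behaviour described above; they say nothing about how Doris implements it internally.

```python
from collections import defaultdict


class ToyTable:
    """Toy aggregate table: base data plus named rollups, all holding SUM(cost)."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        # name -> (columns kept by that view, {key tuple: summed cost})
        self.views = {"base": (self.key_columns, defaultdict(int))}

    def add_rollup(self, name, columns):
        """Create a rollup and fill it from what the base table already holds."""
        columns = tuple(columns)
        data = defaultdict(int)
        base_columns, base_data = self.views["base"]
        for key, cost in base_data.items():
            row = dict(zip(base_columns, key))
            data[tuple(row[c] for c in columns)] += cost
        self.views[name] = (columns, data)

    def load(self, row, cost):
        """Apply one load to the base table and to every rollup together."""
        for columns, data in self.views.values():
            data[tuple(row[c] for c in columns)] += cost

    def total(self, name, **key):
        """Read the summed cost for one key from the named view."""
        columns, data = self.views[name]
        return data[tuple(key[c] for c in columns)]


if __name__ == "__main__":
    table = ToyTable(["user_id", "date", "city"])
    table.load({"user_id": 10000, "date": "2017/10/1", "city": "Shanghai"}, 100)
    table.add_rollup("rollup_u_cost", ["user_id"])
    table.load({"user_id": 10000, "date": "2017/10/2", "city": "Shanghai"}, 30)
    print(table.total("rollup_u_cost", user_id=10000))
```

The rollup created after the first load still sees that load, and the second load reaches it without any extra step by the caller.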

 2 ROLLUP in the Aggregate model

Inspect the table created earlier:

mysql> desc ex_user all;

Example 1: View the total consumption of a user

Add or drop a ROLLUP table

-- Syntax: alter table <aggregate_table_name> add rollup <rollup_name> (user_id,city,date,cost);

alter table ex_user add rollup rollup_ucd_cost(user_id,city,date,cost);
alter table ex_user add rollup rollup_u_cost(user_id,cost);
alter table ex_user add rollup rollup_cd_cost(city,date,cost);

alter table ex_user drop rollup rollup_u_cost;
alter table ex_user drop rollup rollup_cd_cost;

-- If a value column uses the REPLACE aggregation type, the rollup must include all key columns
-- alter table ex_user add rollup rollup_cd_visit(city,date,last_visit_date);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = Rollup should contains 
-- all keys if there is a REPLACE value

-- After adding a rollup, run SHOW to check whether the underlying rollup job has finished
SHOW ALTER TABLE ROLLUP;
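The REPLACE restriction in the comments above can be written as a small check. This is a sketch of the rule as the error message states it, not Doris's actual validation code. The function name `check_rollup_columns` is made up, and the schema is an assumption: the key columns and aggregation types are inferred from the sample data, and `gender` is a guessed column name.

```python
# Assumed schema for ex_user (inferred, not taken from a real DESC output).
KEY_COLUMNS = ["user_id", "date", "city", "age", "gender"]
VALUE_AGGREGATIONS = {
    "last_visit_date": "REPLACE",
    "cost": "SUM",
    "max_dwell_time": "MAX",
    "min_dwell_time": "MIN",
}


def check_rollup_columns(rollup_columns):
    """Return the key columns the rollup is missing under the REPLACE rule.

    An empty list means the column list passes this particular check.
    """
    has_replace = any(
        VALUE_AGGREGATIONS.get(column) == "REPLACE" for column in rollup_columns
    )
    if not has_replace:
        return []
    return [key for key in KEY_COLUMNS if key not in rollup_columns]


if __name__ == "__main__":
    # No REPLACE column involved: accepted.
    print(check_rollup_columns(["city", "date", "cost"]))
    # last_visit_date is REPLACE, so every key column would have to be listed.
    print(check_rollup_columns(["city", "date", "last_visit_date"]))
```

One way to read the rule: a REPLACE column keeps only one value per key, so dropping key columns would leave no well-defined value to keep for the merged rows.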

Doris automatically hits a matching ROLLUP table, so the aggregation query only needs to scan a small amount of data. EXPLAIN shows which table the query reads:

explain SELECT user_id, sum(cost) FROM ex_user GROUP BY user_id;
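To make "hitting" a rollup concrete, here is a deliberately simplified model of the choice: among the base table and the rollups that contain every column the query needs, pick the one with the fewest columns, on the assumption that it also holds the fewest rows. The real planner's rules are more involved, so treat `pick_rollup` and this heuristic as illustrative assumptions and rely on EXPLAIN for what Doris actually chooses. The candidate list assumes all three rollups from the statements above exist (before the drop statements run), and the base column names are inferred from the sample data.

```python
# Column lists per candidate; the base table's column names are assumed.
CANDIDATES = {
    "ex_user": [
        "user_id", "date", "city", "age", "gender",
        "last_visit_date", "cost", "max_dwell_time", "min_dwell_time",
    ],
    "rollup_ucd_cost": ["user_id", "city", "date", "cost"],
    "rollup_u_cost": ["user_id", "cost"],
    "rollup_cd_cost": ["city", "date", "cost"],
}


def pick_rollup(query_columns, candidates=CANDIDATES):
    """Pick the narrowest candidate that contains every queried column."""
    needed = set(query_columns)
    covering = [name for name, columns in candidates.items() if needed <= set(columns)]
    return min(covering, key=lambda name: len(candidates[name]))


if __name__ == "__main__":
    # SELECT user_id, sum(cost) ... GROUP BY user_id
    print(pick_rollup(["user_id", "cost"]))
    # A query that touches age can only be answered by the base table.
    print(pick_rollup(["age", "cost"]))
```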

 

 3 ROLLUP in the Duplicate model

ROLLUP adjusts the prefix index (it adds another prefix index)

Because the column order is fixed when the table is created, a table has only one prefix index. Queries that filter on other columns cannot hit that prefix index and may not be efficient enough. We can therefore create a ROLLUP with a different column order.

The Base table structure is as follows:

ColumnName Type
user_id BIGINT
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME

Based on it, we can create a ROLLUP table:

ColumnName Type
age INT
user_id BIGINT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME

As you can see, the ROLLUP and the Base table have exactly the same columns; only the order of user_id and age is swapped. So when we run the following query:

SELECT * FROM table where age=20 and message LIKE "%error%";

the ROLLUP table is preferred, because its prefix index matches the query conditions better.
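A simplified way to see why: count how many leading columns of each column order are constrained by an equality condition in the query. The sketch below does only that and ignores the other details of how Doris builds and uses its prefix index; `prefix_match_length` is a name invented for this example.

```python
BASE_ORDER = ["user_id", "age", "message", "max_dwell_time", "min_dwell_time"]
ROLLUP_ORDER = ["age", "user_id", "message", "max_dwell_time", "min_dwell_time"]


def prefix_match_length(column_order, equality_columns):
    """Count the leading columns that the query pins with an equality condition."""
    matched = 0
    for column in column_order:
        if column not in equality_columns:
            break  # the prefix stops at the first unconstrained column
        matched += 1
    return matched


if __name__ == "__main__":
    # WHERE age=20 AND message LIKE "%error%": only age is an equality condition.
    conditions = {"age"}
    print("base:", prefix_match_length(BASE_ORDER, conditions))
    print("rollup:", prefix_match_length(ROLLUP_ORDER, conditions))
```

With the Base order the prefix breaks immediately at user_id, while the ROLLUP order matches on age, so under this simplified measure the ROLLUP is the better fit.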

 

 4 ROLLUP usage notes

  1. A ROLLUP is attached to the Base table. Users can create or drop ROLLUPs on the Base table, but they cannot explicitly name a ROLLUP in a query. Whether a ROLLUP is hit is decided entirely by Doris.

  2. ROLLUP data is physically stored independently, so the more ROLLUPs you create, the more disk space they use. They also slow down data import, but they do not reduce query efficiency (they can only improve it).

  3. ROLLUP data is updated in full synchronization with the Base table. Users do not need to manage this.

  4. In the Aggregate model, the aggregation type of each column in a ROLLUP is exactly the same as in the Base table. It is not specified when creating the ROLLUP and cannot be modified.

  5. Use the EXPLAIN your_sql; command to obtain the query execution plan, and check in the plan whether a ROLLUP is hit.

  6. Use the DESC tbl_name ALL; statement to display the Base table and all of its ROLLUPs.

 

Origin blog.csdn.net/m0_53400772/article/details/130916112