In-depth MaxCompute, Episode 12: PIVOT/UNPIVOT

Introduction: MaxCompute introduces a new syntax, PIVOT/UNPIVOT. The PIVOT keyword converts one or more rows with specified values into columns based on aggregation; the UNPIVOT keyword converts one or more columns into rows. Together they cover row-to-column and column-to-row conversion more concisely, simplify query statements, and improve the productivity of big data developers.

MaxCompute (formerly ODPS) is an industry-leading distributed big data processing platform independently developed by Alibaba Cloud. It is widely used within Alibaba Group and supports the core businesses of multiple business units (BUs). Besides continuously optimizing performance, MaxCompute is committed to improving the user experience and expressiveness of its SQL language, and with them the productivity of MaxCompute developers.

Built on the new-generation SQL engine of MaxCompute 2.0, MaxCompute significantly improves the ease of use of SQL compilation and the expressiveness of the language. This series of in-depth MaxCompute articles covers:

Episode 1 - Make good use of MaxCompute compiler errors and warnings
Episode 2 - New basic data types and built-in functions
Episode 3 - Complex types
Episode 4 - CTE, VALUES, SEMIJOIN
Episode 5 - SELECT TRANSFORM
Episode 6 - User Defined Type
Episode 7 - Grouping Set, Cube and Rollup
Episode 8 - Dynamic Type Functions
Episode 9 - Script Mode and Parameter View
Episode 10 - IF ELSE branch statement
Episode 11 - QUALIFY

This article introduces the new PIVOT/UNPIVOT syntax supported by MaxCompute. The PIVOT keyword converts one or more rows with specified values into columns based on aggregation; the UNPIVOT keyword converts one or more columns into rows. Common scenarios are as follows:

  • Scenario 1
    : A business table needs to turn the values of a column into new columns and aggregate the existing data for each value, that is, a row-to-column conversion. Before PIVOT was supported, this required combining GROUP BY, aggregate functions, and FILTER clauses.
  • Scenario 2
    : A business table needs one new column that holds the original column names and another new column that holds the values of those columns, that is, a column-to-row conversion. Before UNPIVOT was supported, this required combining CROSS JOIN with CASE WHEN expressions.

PIVOT/UNPIVOT function

PIVOT

PIVOT overview

The PIVOT syntax rotates the table: it turns the specified row values into multiple columns and aggregates the remaining column values to produce the result. The PIVOT syntax is part of the FROM clause.

SELECT ... 
FROM ... 
PIVOT ( 
    <aggregate_function> [AS <alias>] [, <aggregate_function> [AS <alias>]] ... 
    FOR (<column> [, <column>] ...) 
    IN ( 
        (<value> [, <value>] ...) AS <new_column> 
        [, (<value> [, <value>] ...) AS <new_column>] 
        ... 
       ) 
    ) 
[...]
  • <aggregate_function>
    is the aggregate function to compute when rows are converted to columns. It cannot be wrapped in another function. Its arguments can be expressions built from scalar functions and columns, but they cannot contain other aggregate functions or window functions, and they can reference only columns of the upstream table.
  • <alias>
    is the alias of the column produced by the aggregate function.
  • <column>
    is the name of the column whose row values are converted to columns. It must be a column name, not an expression.
  • <value>
    is a value of that column to be converted into a new column. It can be an expression, but aggregate functions and window functions are not allowed, and the number of elements in each tuple must equal the number of <column> entries in the FOR clause.
  • <new_column>
    is the alias of the new column after rows are converted to columns. If no alias is specified, MaxCompute tries to derive one; if that fails, it generates an alias automatically.

For more detailed syntax instructions, please refer to the documentation.

The PIVOT syntax is equivalent to a combination of GROUP BY, aggregate functions, and FILTER clauses. Take the following example:

SELECT ...
FROM ...
PIVOT (
 agg1 AS a, agg2 AS b, ...
 FOR (axis1, ..., axisN)
 IN (
     (v11, ..., v1N) AS label1,
     (v21, ..., v2N) AS label2, 
     ...)
 )

The above syntax is equivalent to:

SELECT 
 k1, ... kN, 
 agg1 FILTER (WHERE axis1 = v11 AND ... AND axisN = v1N) AS label1_a, 
 agg2 FILTER (WHERE axis1 = v11 AND ... AND axisN = v1N) AS label1_b, 
 ..., 
 agg1 FILTER (WHERE axis1 = v21 AND ... AND axisN = v2N) AS label2_a,
 agg2 FILTER (WHERE axis1 = v21 AND ... AND axisN = v2N) AS label2_b, 
 ...
 FROM xxxxxx
 GROUP BY k1, ... kN

Here the table in FROM is the upstream input of PIVOT, and k1, ..., kN is the set of all columns that appear neither in agg1, agg2, ... nor in axis1, ..., axisN.
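
To make the equivalence above concrete, here is a minimal Python sketch that emulates the expansion on in-memory rows. It only illustrates the semantics and is not how MaxCompute executes PIVOT; the `pivot` helper, its parameters, and the sample rows are hypothetical names introduced for this example.

```python
from collections import defaultdict


def pivot(rows, aggs, axes, labels):
    """Emulate PIVOT on a list of dict rows (illustrative only).

    aggs:   {alias: (source_column, aggregate_function)}
    axes:   tuple of column names listed after FOR
    labels: {label: tuple_of_values} listed after IN
    """
    # Columns used by the aggregates or the FOR clause are consumed;
    # every other column becomes an implicit GROUP BY key (k1 ... kN).
    used = set(axes) | {col for col, _ in aggs.values()}
    groups = defaultdict(list)
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k not in used)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        out = dict(key)
        for label, values in labels.items():
            # Same role as FILTER (WHERE axis1 = v1 AND ... AND axisN = vN).
            matched = [r for r in members if tuple(r[a] for a in axes) == values]
            for alias, (col, fn) in aggs.items():
                picked = [r[col] for r in matched if r[col] is not None]
                # An aggregate over no rows yields NULL, modelled here as None.
                out[f"{label}_{alias}"] = fn(picked) if picked else None
        result.append(out)
    return result


# Hypothetical sample rows, not the table used later in this article.
rows = [
    {"item": "pen", "shop": "s1", "qty": 10},
    {"item": "pen", "shop": "s2", "qty": 11},
    {"item": "pen", "shop": "s1", "qty": 5},
    {"item": "ruler", "shop": "s1", "qty": 20},
]
result = pivot(rows, {"total": ("qty", sum)}, ("shop",), {"s1": ("s1",), "s2": ("s2",)})
for row in result:
    print(row)
```

The `item` column is never named in the call, yet it ends up as the grouping key, which mirrors how PIVOT groups by every column it does not consume.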

PIVOT example

  • Data preparation. The following table records the sales of each item in several chain stores by year.
create table shops_table as select * from (select * from values
('pen', 10, 500, 'shop1', 2020),
('pen', 11, 500, 'shop2', 2020),
('pen', 9, 300, 'shop3', 2020),
('pen', 12, 400,'shop4', 2020),
('pen', 15, 200, 'shop1', 2021),
('pen', 16, 300, 'shop2', 2021),
('pen', 16, 400, 'shop3', 2021),
('pen', 15, 300, 'shop4', 2021),
('ruler', 20, 700, 'shop1', 2020),
('ruler', 19, 900, 'shop2', 2020),
('ruler', 22, 800, 'shop3', 2020),
('ruler', 19, 700, 'shop4', 2020),
('ruler', 25, 300, 'shop1', 2021),
('ruler', 20, 500, 'shop2', 2021),
('ruler', 23, 500, 'shop3', 2021),
('ruler', 26, 600, 'shop4', 2021)
shops(item_name, count, sales, shop_name, year));
select * from shops_table;
-- The result is as follows:
+-----------+------------+------------+-----------+------------+
| item_name | count      | sales      | shop_name | year       |
+-----------+------------+------------+-----------+------------+
| pen       | 10         | 500        | shop1     | 2020       |
| pen       | 11         | 500        | shop2     | 2020       |
| pen       | 9          | 300        | shop3     | 2020       |
| pen       | 12         | 400        | shop4     | 2020       |
| pen       | 15         | 200        | shop1     | 2021       |
| pen       | 16         | 300        | shop2     | 2021       |
| pen       | 16         | 400        | shop3     | 2021       |
| pen       | 15         | 300        | shop4     | 2021       |
| ruler     | 20         | 700        | shop1     | 2020       |
| ruler     | 19         | 900        | shop2     | 2020       |
| ruler     | 22         | 800        | shop3     | 2020       |
| ruler     | 19         | 700        | shop4     | 2020       |
| ruler     | 25         | 300        | shop1     | 2021       |
| ruler     | 20         | 500        | shop2     | 2021       |
| ruler     | 23         | 500        | shop3     | 2021       |
| ruler     | 26         | 600        | shop4     | 2021       |
+-----------+------------+------------+-----------+------------+
  • Count the quantity of each item sold by each store in each year.
    • Before PIVOT was supported, the implementation was as follows:
SELECT  item_name
        ,year
        ,SUM(CASE shop_name WHEN 'shop1' THEN count END) AS shop1
        ,SUM(CASE shop_name WHEN 'shop2' THEN count END) AS shop2
        ,SUM(CASE shop_name WHEN 'shop3' THEN count END) AS shop3
        ,SUM(CASE shop_name WHEN 'shop4' THEN count END) AS shop4
FROM    shops_table
GROUP BY item_name
         ,year
;
-- The result is as follows:
+-----------+------------+------------+------------+------------+------------+
| item_name | year       | shop1      | shop2      | shop3      | shop4      |
+-----------+------------+------------+------------+------------+------------+
| pen       | 2020       | 10         | 11         | 9          | 12         |
| pen       | 2021       | 15         | 16         | 16         | 15         |
| ruler     | 2020       | 20         | 19         | 22         | 19         |
| ruler     | 2021       | 25         | 20         | 23         | 26         |
+-----------+------------+------------+------------+------------+------------+
    • With the PIVOT syntax, the implementation is as follows:
select * from (select item_name, year,count,shop_name from shops_table)
pivot (sum(count) for shop_name in ('shop1', 'shop2', 'shop3', 'shop4'));
-- The result is as follows:
+------------+------------+------------+------------+------------+------------+
| item_name  | year       | 'shop1'    | 'shop2'    | 'shop3'    | 'shop4'    | 
+------------+------------+------------+------------+------------+------------+
| pen        | 2020       | 10         | 11         | 9          | 12         | 
| pen        | 2021       | 15         | 16         | 16         | 15         | 
| ruler      | 2020       | 20         | 19         | 22         | 19         | 
| ruler      | 2021       | 25         | 20         | 23         | 26         | 
+------------+------------+------------+------------+------------+------------+

You can also alias the aggregate function and the new columns; the two aliases are joined with an underscore to form the output column names:

select * from (select item_name, count, shop_name, year from shops_table)
pivot (sum(count) as sum_count for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- The result is as follows:
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name  | year       | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count | 
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen        | 2020       | 10                    | 11                    | 9                     | 12                    | 
| pen        | 2021       | 15                    | 16                    | 16                    | 15                    | 
| ruler      | 2020       | 20                    | 19                    | 22                    | 19                    | 
| ruler      | 2021       | 25                    | 20                    | 23                    | 26                    | 
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
  • Calculate the total sales quantity and the maximum sales of each item per store per year. With PIVOT this is:
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- The result is as follows:
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name | year       | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count | shop_name_1_max_sales | shop_name_2_max_sales | shop_name_3_max_sales | shop_name_4_max_sales |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen       | 2020       | 10                    | 11                    | 9                     | 12                    | 500                   | 500                   | 300                   | 400                   |
| pen       | 2021       | 15                    | 16                    | 16                    | 15                    | 200                   | 300                   | 400                   | 300                   |
| ruler     | 2020       | 20                    | 19                    | 22                    | 19                    | 700                   | 900                   | 800                   | 700                   |
| ruler     | 2021       | 25                    | 20                    | 23                    | 26                    | 300                   | 500                   | 500                   | 600                   |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
  • Calculate the total sales quantity and the maximum sales of each item only for shop1 in 2020 and 2021. With PIVOT this is:
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for (shop_name, year) in (('shop1', 2021) as shop1_2021, ('shop1', 2020) as shop1_2020));
-- The result is as follows:
+-----------+----------------------+----------------------+----------------------+----------------------+
| item_name | shop1_2021_sum_count | shop1_2020_sum_count | shop1_2021_max_sales | shop1_2020_max_sales |
+-----------+----------------------+----------------------+----------------------+----------------------+
| pen       | 15                   | 10                   | 200                  | 500                  |
| ruler     | 25                   | 20                   | 300                  | 700                  |
+-----------+----------------------+----------------------+----------------------+----------------------+
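
The multi-column FOR clause in the last example can be cross-checked with the pre-PIVOT style of query. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module rather than MaxCompute, loads only the shop1 and shop2 rows of `shops_table`, and writes each `(shop_name, year)` tuple as an AND condition inside CASE WHEN.

```python
import sqlite3

# SQLite is used here only as a convenient local stand-in for MaxCompute.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shops_table "
    '(item_name TEXT, "count" INTEGER, sales INTEGER, shop_name TEXT, year INTEGER)'
)
# Subset of the article's sample data (shop1 and shop2 rows only).
conn.executemany(
    "INSERT INTO shops_table VALUES (?, ?, ?, ?, ?)",
    [
        ("pen", 10, 500, "shop1", 2020),
        ("pen", 11, 500, "shop2", 2020),
        ("pen", 15, 200, "shop1", 2021),
        ("pen", 16, 300, "shop2", 2021),
        ("ruler", 20, 700, "shop1", 2020),
        ("ruler", 19, 900, "shop2", 2020),
        ("ruler", 25, 300, "shop1", 2021),
        ("ruler", 20, 500, "shop2", 2021),
    ],
)

# Each (shop_name, year) tuple of the IN list becomes one AND condition.
# Only item_name is left over, so it is the sole GROUP BY key.
query = """
SELECT item_name,
       SUM(CASE WHEN shop_name = 'shop1' AND year = 2021 THEN "count" END) AS shop1_2021_sum_count,
       SUM(CASE WHEN shop_name = 'shop1' AND year = 2020 THEN "count" END) AS shop1_2020_sum_count,
       MAX(CASE WHEN shop_name = 'shop1' AND year = 2021 THEN sales END) AS shop1_2021_max_sales,
       MAX(CASE WHEN shop_name = 'shop1' AND year = 2020 THEN sales END) AS shop1_2020_max_sales
FROM shops_table
GROUP BY item_name
ORDER BY item_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two printed rows should carry the same values as the PIVOT result table above, with shop2 contributing nothing because no IN tuple selects it.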

UNPIVOT

UNPIVOT overview

The UNPIVOT syntax rotates the table by converting columns into rows. The UNPIVOT syntax is part of the FROM clause.

SELECT ...
FROM ...
UNPIVOT [EXCLUDE NULLS] (
    <new_column_of_value> [, <new_column_of_value>] ...
    FOR (<new_column_of_name> [, <new_column_of_name>] ...)
    IN (
        (<column> [, <column>] ...) [AS (<column_value> [, <column_value>] ...)]
        [, (<column> [, <column>] ...) [AS (<column_value> [, <column_value>] ...)]]
        ...
       )
    )
[...]
  • [EXCLUDE NULLS]
    If this option is specified, rows whose new value columns are all null are filtered out.
  • <new_column_of_value>
    is a new column that stores the values of the original columns after the conversion. It must be a column name, not an expression, and names cannot be repeated. The number of these columns must equal the number of columns in each tuple of the IN list.
  • <new_column_of_name>
    is a new column that stores the original column names (or their aliases) after the conversion. It must be a column name, not an expression, and names cannot be repeated. The number of these columns must equal the number of elements in each alias tuple. If no alias is specified, MaxCompute automatically generates a set of string tuples.
  • <column>
    is an original column used for the column-to-row conversion.
  • <column_value>
    is the alias of the original column used for the conversion. It replaces the original column name in the result. Internal column names are not allowed.

For more detailed syntax instructions, please refer to the documentation.

The UNPIVOT syntax is equivalent to a combination of CROSS JOIN and CASE WHEN expressions. Take the following example:

SELECT ...
FROM ...
UNPIVOT [exclude nulls] (
 (measure1, ..., measureM)
 FOR (axis1, ..., axisN)
 IN ((c11, ..., c1M) AS (value11, ..., value1N),
     (c21, ..., c2M) AS (value21, ..., value2N), ...))
[...]

The above syntax is equivalent to:

SELECT  * FROM
(
 SELECT
 k1, ... kN,
 CASE 
 WHEN axis1 = value11 AND ... AND axisN = value1N THEN c11
 WHEN axis1 = value21 AND ... AND axisN = value2N THEN c21
 ...
 ELSE null END AS measure1,
 ..., 
 CASE 
 WHEN axis1 = value11 AND ... AND axisN = value1N THEN c1M
 WHEN axis1 = value21 AND ... AND axisN = value2N THEN c2M
 ...
 ELSE null END AS measureM, 
 axis1, ..., axisN
 FROM xxxx 
 CROSS JOIN (VALUES (value11, ..., value1N), (value21, ..., value2N), ... AS generated_table_name(axis1, ..., axisN))
)
[WHERE measure1 is not null OR ... OR measureM is not null]
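
The same expansion can be sketched in plain Python to show what the cross join and the optional null filter do to each row. This is an illustration of the semantics only, not MaxCompute's implementation; the `unpivot` helper, its parameters, and the sample rows are hypothetical names introduced for this example.

```python
def unpivot(rows, value_cols, name_cols, mapping, exclude_nulls=False):
    """Emulate UNPIVOT on a list of dict rows (illustrative only).

    value_cols: new columns that receive the original column values
    name_cols:  new columns listed after FOR that receive the names/aliases
    mapping:    list of (source_columns, name_values) pairs from the IN list
    """
    sources = {col for cols, _ in mapping for col in cols}
    result = []
    for row in rows:
        # Columns not consumed by the IN list are carried over unchanged.
        kept = {k: v for k, v in row.items() if k not in sources}
        # One output row per IN tuple: the effect of the cross join.
        for cols, names in mapping:
            values = [row[c] for c in cols]
            # EXCLUDE NULLS drops a row only when every value column is null.
            if exclude_nulls and all(v is None for v in values):
                continue
            out = dict(kept)
            out.update(zip(value_cols, values))
            out.update(zip(name_cols, names))
            result.append(out)
    return result


# Hypothetical sample rows, not the table used later in this article.
rows = [
    {"item": "pen", "s1": 100, "s2": None},
    {"item": "ruler", "s1": 300, "s2": 400},
]
mapping = [(("s1",), ("s1",)), (("s2",), ("s2",))]

with_nulls = unpivot(rows, ("qty",), ("shop",), mapping)
without_nulls = unpivot(rows, ("qty",), ("shop",), mapping, exclude_nulls=True)
for row in without_nulls:
    print(row)
```

Without `exclude_nulls` the pen row for `s2` is kept with a null quantity; with it, that row disappears, which matches the optional WHERE clause in the equivalent query.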

UNPIVOT example

  • Data preparation. The following table records the sales of each item in several chain stores by year:
create table shops as select * from (select * from values
('pen', 2020, 100, 200, 300, 400),
('pen', 2021, 100, 200, 200, 100),
('ruler', 2020, 300, 400, 300, 200),
('ruler', 2021, 400, 300, 100, 100)
shops(item_name, year, shop1, shop2, shop3, shop4));
SELECT * from shops;
-- Execution result:
+-----------+------------+------------+------------+------------+------------+
| item_name | year       | shop1      | shop2      | shop3      | shop4      |
+-----------+------------+------------+------------+------------+------------+
| pen       | 2020       | 100        | 200        | 300        | 400        |
| pen       | 2021       | 100        | 200        | 200        | 100        |
| ruler     | 2020       | 300        | 400        | 300        | 200        |
| ruler     | 2021       | 400        | 300        | 100        | 100        |
+-----------+------------+------------+------------+------------+------------+
  • Rotate the table so that each store's sales quantity becomes its own row, stored in a new column named count.
    • Implementation without UNPIVOT:
select * from(
select item_name,year, 'shop1' as shop_name, shop1 as count from shops
UNION ALL 
select item_name,year, 'shop2' as shop_name, shop2 as count from shops
UNION ALL 
select item_name,year, 'shop3' as shop_name, shop3 as count from shops
UNION ALL  
select item_name,year, 'shop4' as shop_name, shop4 as count from shops
);
-- Execution result:
+------------+------------+------------+------------+
| item_name  | year       | shop_name  | count      | 
+------------+------------+------------+------------+
| pen        | 2020       | shop1      | 100        | 
| pen        | 2021       | shop1      | 100        | 
| ruler      | 2020       | shop1      | 300        | 
| ruler      | 2021       | shop1      | 400        | 
| pen        | 2020       | shop2      | 200        | 
| pen        | 2021       | shop2      | 200        | 
| ruler      | 2020       | shop2      | 400        | 
| ruler      | 2021       | shop2      | 300        | 
| pen        | 2020       | shop3      | 300        | 
| pen        | 2021       | shop3      | 200        | 
| ruler      | 2020       | shop3      | 300        | 
| ruler      | 2021       | shop3      | 100        | 
| pen        | 2020       | shop4      | 400        | 
| pen        | 2021       | shop4      | 100        | 
| ruler      | 2020       | shop4      | 200        | 
| ruler      | 2021       | shop4      | 100        | 
+------------+------------+------------+------------+
    • Implementation with UNPIVOT:
select * from shops
unpivot (count for shop_name in (shop1, shop2, shop3, shop4));
-- Execution result:
+------------+------------+------------+------------+
| item_name  | year       | count      | shop_name  | 
+------------+------------+------------+------------+
| pen        | 2020       | 100        | shop1      | 
| pen        | 2020       | 200        | shop2      | 
| pen        | 2020       | 300        | shop3      | 
| pen        | 2020       | 400        | shop4      | 
| pen        | 2021       | 100        | shop1      | 
| pen        | 2021       | 200        | shop2      | 
| pen        | 2021       | 200        | shop3      | 
| pen        | 2021       | 100        | shop4      | 
| ruler      | 2020       | 300        | shop1      | 
| ruler      | 2020       | 400        | shop2      | 
| ruler      | 2020       | 300        | shop3      | 
| ruler      | 2020       | 200        | shop4      | 
| ruler      | 2021       | 400        | shop1      | 
| ruler      | 2021       | 300        | shop2      | 
| ruler      | 2021       | 100        | shop3      | 
| ruler      | 2021       | 100        | shop4      | 
+------------+------------+------------+------------+
  • If shop1 and shop2 are East District stores and shop3 and shop4 are West District stores, a new column is needed to tell the two districts apart, and the count1 and count2 columns store the sales quantities of the two stores in each district.
select * from shops
unpivot ((count1, count2) for shop_name in ((shop1, shop2) as 'east_shop', (shop3, shop4) as 'west_shop'));
-- Execution result:
+------------+------------+------------+------------+------------+
| item_name  | year       | count1     | count2     | shop_name  | 
+------------+------------+------------+------------+------------+
| pen        | 2020       | 100        | 200        | east_shop  | 
| pen        | 2020       | 300        | 400        | west_shop  | 
| pen        | 2021       | 100        | 200        | east_shop  | 
| pen        | 2021       | 200        | 100        | west_shop  | 
| ruler      | 2020       | 300        | 400        | east_shop  | 
| ruler      | 2020       | 300        | 200        | west_shop  | 
| ruler      | 2021       | 400        | 300        | east_shop  | 
| ruler      | 2021       | 100        | 100        | west_shop  | 
+------------+------------+------------+------------+------------+
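
For comparison, the grouped conversion above can also be written in the pre-UNPIVOT style, with one SELECT per district combined by UNION ALL. The sketch below assumes Python's built-in `sqlite3` module as a local stand-in for MaxCompute and reuses the `shops` sample data; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# SQLite is used here only as a convenient local stand-in for MaxCompute.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shops "
    "(item_name TEXT, year INTEGER, shop1 INTEGER, shop2 INTEGER, shop3 INTEGER, shop4 INTEGER)"
)
conn.executemany(
    "INSERT INTO shops VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("pen", 2020, 100, 200, 300, 400),
        ("pen", 2021, 100, 200, 200, 100),
        ("ruler", 2020, 300, 400, 300, 200),
        ("ruler", 2021, 400, 300, 100, 100),
    ],
)

# One SELECT per IN tuple: (shop1, shop2) -> 'east_shop', (shop3, shop4) -> 'west_shop'.
query = """
SELECT item_name, year, shop1 AS count1, shop2 AS count2, 'east_shop' AS shop_name FROM shops
UNION ALL
SELECT item_name, year, shop3, shop4, 'west_shop' FROM shops
ORDER BY item_name, year, shop_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each source row yields two output rows, one per district, which is the same shape as the UNPIVOT result table above.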

The alias can also be a tuple of several values, in which case a matching number of new column names must be listed after FOR:

select * from shops
unpivot ((count1, count2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west')));
-- Execution result:
+------------+------------+------------+------------+------------+------------+
| item_name  | year       | count1     | count2     | shop_name  | location   | 
+------------+------------+------------+------------+------------+------------+
| pen        | 2020       | 100        | 200        | east_shop  | east       | 
| pen        | 2020       | 300        | 400        | west_shop  | west       | 
| pen        | 2021       | 100        | 200        | east_shop  | east       | 
| pen        | 2021       | 200        | 100        | west_shop  | west       | 
| ruler      | 2020       | 300        | 400        | east_shop  | east       | 
| ruler      | 2020       | 300        | 200        | west_shop  | west       | 
| ruler      | 2021       | 400        | 300        | east_shop  | east       | 
| ruler      | 2021       | 100        | 100        | west_shop  | west       | 
+------------+------------+------------+------------+------------+------------+

Summary

The PIVOT/UNPIVOT syntax handles row-to-column and column-to-row conversion more concisely than the GROUP BY, UNION ALL, or CROSS JOIN rewrites shown above, simplifying query statements and improving the productivity of big data developers.

Origin blog.csdn.net/weixin_48534929/article/details/132607674