Data warehouse performance tuning: row_number() over(p)-rn=1 performance bottleneck discovery and rewriting routine

This article is shared from the Huawei Cloud Community post "GaussDB (DWS) Performance Tuning: Row_number() over(p)-rn=1 Performance Bottleneck Discovery and Rewriting Routine", by Zawami.

1. Applicable scenario

This rewrite applies when a subquery computes row_number() over(partition by ... order by ...) as rn, and the outer query uses rn only to keep the top-ranked row of each partition (rn = 1), that is, the row with the maximum or minimum value of the order by column within each group.

2. Performance analysis

SQL execution in GaussDB is usually pipelined: rows stream through the plan one at a time and the operators at every layer run concurrently, which shortens the execution time.

However, some operators must receive the entire result set of the operator below them before they can produce any output; a window function is one of them.
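The difference between a pipelined and a blocking operator can be illustrated with a toy model built from Python generators. This is only a sketch of the idea, not GaussDB's executor: the operator names (`scan`, `streaming_filter`, `blocking_rank`) are hypothetical, and the sort merely stands in for the ordering work a window function needs.

```python
from typing import Iterable, Iterator, List


def scan(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Leaf operator: emits rows one at a time and records each read."""
    for r in rows:
        log.append(f"scan {r}")
        yield r


def streaming_filter(child: Iterator[int], log: List[str]) -> Iterator[int]:
    """Pipelined operator: forwards a qualifying row as soon as it arrives."""
    for r in child:
        if r % 2 == 1:
            log.append(f"emit {r}")
            yield r


def blocking_rank(child: Iterator[int], log: List[str]) -> Iterator[int]:
    """Blocking operator: must drain its child before emitting anything,
    because ranking needs the whole input sorted first."""
    buffered = sorted(child, reverse=True)  # consumes the entire input
    for r in buffered:
        log.append(f"emit {r}")
        yield r


# Ask each plan for its first output row only.
pipelined_log: List[str] = []
first_pipelined = next(streaming_filter(scan([1, 2, 3], pipelined_log), pipelined_log))

blocking_log: List[str] = []
first_blocking = next(blocking_rank(scan([1, 2, 3], blocking_log), blocking_log))

print(pipelined_log)  # one scan was enough to produce the first row
print(blocking_log)   # all three scans happened before the first row
```

In the pipelined plan the parent receives its first row after a single scan; in the blocking plan every input row is read before anything is emitted, which is the waiting that the rewrite in this article tries to avoid.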

Looking at the execution plan, we can see that after the rn column is computed, it is joined with other columns of the same query layer. Because of the window function, operator No. 51 must finish before the join can start, which creates a performance bottleneck.

[Figure cke_143.png: execution plan before rewriting]

By rewriting the window function, the join between the aggregated subtotals and the detail rows can be pipelined.

SQL fragment before rewriting:

SELECT
    PROD_EN_NAME,
    PROD_LIFE_CYCLE_STATUS
FROM
    (
        SELECT
            PROD_EN_NAME,
            LIFE_CYCLE AS PROD_LIFE_CYCLE_STATUS,
            DEL_FLAG,
            ROW_NUMBER() OVER (PARTITION BY PROD_EN_NAME ORDER BY RUN_DATE DESC) RN
        FROM
            DMISC.DM_DIM_INV_PROD_ATTRI_SNAP_D
        WHERE
            DATA_TYPE = 1
            AND DEL_FLAG = 'N'
            AND RUN_DATE <= CAST('2023-06-11' || ' 00:00:00' AS TIMESTAMP)
    )
WHERE
    RN = 1

SQL fragment after rewriting:

WITH T AS (
    SELECT
        PROD_EN_NAME,
        MAX(LIFE_CYCLE) AS PROD_LIFE_CYCLE_STATUS,
        RUN_DATE
    FROM
        DMISC.DM_DIM_INV_PROD_ATTRI_SNAP_D
    WHERE
        DATA_TYPE = 1
        AND DEL_FLAG = 'N'
        AND RUN_DATE <= CAST('2023-06-11' || ' 00:00:00' AS TIMESTAMP)
    GROUP BY
        PROD_EN_NAME,
        RUN_DATE
)
SELECT
    PROD_EN_NAME,
    PROD_LIFE_CYCLE_STATUS
FROM T
WHERE
    (PROD_EN_NAME, RUN_DATE) IN (SELECT PROD_EN_NAME, MAX(RUN_DATE) FROM T GROUP BY PROD_EN_NAME)

Rewrite analysis: the data is first deduplicated on the partition column and the ordering column of the original row_number() over(), here PROD_EN_NAME and RUN_DATE. Because the original SQL does not specify an ordering for LIFE_CYCLE among rows that share the same RUN_DATE, the rewrite may aggregate it with either MAX or MIN. The deduplicated data is then filtered down to the row with the latest RUN_DATE for each PROD_EN_NAME.

The full execution plans before and after this modification are given in the attachment.

This rewrite removes the situation where upper-level operators have to wait for the window function. We also found that some business scenarios are not sensitive to the columns that take no part in the partitioning or ordering, such as LIFE_CYCLE in the example above, because the result is aggregated again further up. In those scenarios there is no hard requirement to deduplicate in this subquery, so the GROUP BY in this layer can be removed as well. Note that without it, all rows that tie on (PROD_EN_NAME, RUN_DATE) are returned.
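To check that the two forms return the same rows, the comparison can be run on a small data set. The sketch below uses Python's built-in sqlite3 module instead of GaussDB, so it shows only the logical equivalence, not the performance effect. The table name `snap`, the sample rows and the life-cycle values are all hypothetical, timestamps are stored as ISO strings, and the bundled SQLite is assumed to be 3.25 or newer (needed for window functions). The sample data has no ties on (PROD_EN_NAME, RUN_DATE); with ties the original query would pick an arbitrary row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snap (prod_en_name TEXT, life_cycle TEXT, run_date TEXT, "
    "data_type INTEGER, del_flag TEXT)"
)
conn.executemany(
    "INSERT INTO snap VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "GA", "2023-06-01 00:00:00", 1, "N"),
        ("A", "EOM", "2023-06-10 00:00:00", 1, "N"),   # latest valid row for A
        ("A", "EOS", "2023-06-12 00:00:00", 1, "N"),   # after the cutoff date
        ("B", "GA", "2023-05-20 00:00:00", 1, "N"),    # latest valid row for B
        ("B", "EOM", "2023-06-05 00:00:00", 1, "Y"),   # flagged as deleted
        ("C", "GA", "2023-06-11 00:00:00", 2, "N"),    # different data type
    ],
)

FILTER = "data_type = 1 AND del_flag = 'N' AND run_date <= '2023-06-11 00:00:00'"

# Original form: window function plus rn = 1.
WINDOW_SQL = f"""
SELECT prod_en_name, prod_life_cycle_status
FROM (
    SELECT prod_en_name,
           life_cycle AS prod_life_cycle_status,
           ROW_NUMBER() OVER (PARTITION BY prod_en_name ORDER BY run_date DESC) AS rn
    FROM snap
    WHERE {FILTER}
) AS s
WHERE rn = 1
ORDER BY prod_en_name
"""

# Rewritten form: deduplicate, then keep the row with the latest run_date.
REWRITTEN_SQL = f"""
WITH t AS (
    SELECT prod_en_name, MAX(life_cycle) AS prod_life_cycle_status, run_date
    FROM snap
    WHERE {FILTER}
    GROUP BY prod_en_name, run_date
)
SELECT prod_en_name, prod_life_cycle_status
FROM t
WHERE (prod_en_name, run_date) IN
      (SELECT prod_en_name, MAX(run_date) FROM t GROUP BY prod_en_name)
ORDER BY prod_en_name
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
rewritten_rows = conn.execute(REWRITTEN_SQL).fetchall()
print(window_rows)
print(rewritten_rows)
```

The ORDER BY on the outer queries is there only to make the two result lists directly comparable.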

WITH T AS (
    SELECT
        PROD_EN_NAME,
        LIFE_CYCLE AS PROD_LIFE_CYCLE_STATUS,
        RUN_DATE
    FROM
        DMISC.DM_DIM_INV_PROD_ATTRI_SNAP_D
    WHERE
        DATA_TYPE = 1
        AND DEL_FLAG = 'N'
        AND RUN_DATE <= CAST('2023-06-11' || ' 00:00:00' AS TIMESTAMP)
)
SELECT
    PROD_EN_NAME,
    PROD_LIFE_CYCLE_STATUS
FROM T
WHERE
    (PROD_EN_NAME, RUN_DATE) IN (SELECT PROD_EN_NAME, MAX(RUN_DATE) FROM T GROUP BY PROD_EN_NAME)
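The version without the GROUP BY is only safe when something further up aggregates again, because rows that tie on (PROD_EN_NAME, RUN_DATE) all pass the filter. A minimal sketch of that behaviour, again using Python's sqlite3 rather than GaussDB, with a hypothetical table `snap` and made-up rows, and with the DATA_TYPE, DEL_FLAG and date filters left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snap (prod_en_name TEXT, life_cycle TEXT, run_date TEXT)")
conn.executemany(
    "INSERT INTO snap VALUES (?, ?, ?)",
    [
        ("A", "GA", "2023-06-10"),
        ("A", "EOM", "2023-06-10"),  # ties with the previous row on run_date
        ("A", "GA", "2023-06-01"),
        ("B", "GA", "2023-06-05"),
    ],
)

# Rewritten query with the GROUP BY removed from the CTE.
NO_DEDUP_SQL = """
WITH t AS (
    SELECT prod_en_name, life_cycle AS prod_life_cycle_status, run_date
    FROM snap
)
SELECT prod_en_name, prod_life_cycle_status
FROM t
WHERE (prod_en_name, run_date) IN
      (SELECT prod_en_name, MAX(run_date) FROM t GROUP BY prod_en_name)
"""

# Without deduplication, product A comes back twice.
raw_rows = conn.execute(
    NO_DEDUP_SQL + " ORDER BY prod_en_name, prod_life_cycle_status"
).fetchall()

# An outer aggregation (hypothetical here) collapses the tie again.
aggregated_rows = conn.execute(
    f"""
    SELECT prod_en_name, MAX(prod_life_cycle_status)
    FROM ({NO_DEDUP_SQL}) AS latest
    GROUP BY prod_en_name
    ORDER BY prod_en_name
    """
).fetchall()

print(raw_rows)
print(aggregated_rows)
```

If the outer layers of the real query do not aggregate, keeping the GROUP BY in the CTE is the safer choice.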

The rewritten execution plan is as follows:

[Figure cke_144.png: execution plan after rewriting]

It can be seen that although operator No. 51 in the execution plan is only about 200 ms faster, the reduced blocking shortens the execution time of operators 1 through 7, and the overall execution time is about 480 ms shorter than before.

Origin my.oschina.net/u/4526289/blog/10307126