Two case studies of non-equi join optimization in a data warehouse

This article is shared from the Huawei Cloud Community post "GaussDB(DWS) Performance Tuning: Non-Equi Join Optimization", author: A grapevine in front of the door.

Scenario 1

Usage scenario: this case applies when both of the following conditions hold:

  1. The join conditions are combined with OR
  2. Each join condition also filters on the same column

Original statement

SELECT
    t2.PARTNER_CHANNEL_CODE AS CHANNEL_ID
    ,t1.COUNTRY_CODE
    ,t1.BRAND
    ,t2.CHANNEL_ID AS CHANNEL_ID2
FROM
    t1
LEFT JOIN
    t2
ON
    ( t2.CHANNEL_ID = t1.CHANNEL_ID AND t1.TYPE = 'DR' )
    OR ( t2.PARTNER_CHANNEL_CODE = t1.CHANNEL_ID AND t1.TYPE = 'ALL' )
GROUP BY
    t2.PARTNER_CHANNEL_CODE
    ,t1.COUNTRY_CODE
    ,t1.BRAND
    ,t2.CHANNEL_ID

Performance analysis

The query plan shows that t1 and t2 are joined with a NEST LOOP. The whole query takes 45 seconds, and the NEST LOOP accounts for 96% of that time, so the goal is to avoid the NEST LOOP through a SQL rewrite or a hint. The join between t1 and t2 has two conditions connected by OR. Because of the OR, the join is not a plain equi-join, so HASH JOIN cannot be used. A closer look at the SQL shows that both join conditions filter on t1.TYPE:

    ( t2.CHANNEL_ID = t1.CHANNEL_ID AND t1.TYPE = 'DR' )
    OR ( t2.PARTNER_CHANNEL_CODE = t1.CHANNEL_ID AND t1.TYPE = 'ALL' )

Depending on the value of t1.TYPE, a row of t1 falls into exactly one of three cases:

  1. Rows with t1.TYPE='DR' can only match t2 through the first join condition;
  2. Rows with t1.TYPE='ALL' can only match t2 through the second join condition;
  3. Rows with t1.TYPE NOT IN ('ALL','DR') match no row of t2, and their t2 columns are filled with NULL. (NOT IN is never true for a NULL value, so splitting on this predicate assumes t1.TYPE is not NULL.)

Because the three cases are mutually exclusive, the join can be rewritten as a UNION ALL of three queries, one per case, each with a simple join condition. UNION ALL is used rather than UNION, because UNION would also remove duplicate rows, which the original join does not do.
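To make the reasoning concrete, here is a minimal Python sketch, not the database's actual executor. It contrasts evaluating the OR condition on every pair of rows (what a nested loop does) with routing each t1 row by TYPE and probing a single hash table on a single key (what each branch of the split can do). The toy rows and the function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Toy rows, purely illustrative:
# t1 = (CHANNEL_ID, TYPE), t2 = (CHANNEL_ID, PARTNER_CHANNEL_CODE)
t1 = [("c1", "DR"), ("p2", "ALL"), ("c9", "DR"), ("c1", "OTHER")]
t2 = [("c1", "p1"), ("c2", "p2")]


def nested_loop_left_join(t1, t2):
    """Evaluate the OR condition on every (t1, t2) pair."""
    out, comparisons = [], 0
    for ch, typ in t1:
        matched = False
        for t2_ch, t2_partner in t2:
            comparisons += 1
            if (t2_ch == ch and typ == "DR") or (t2_partner == ch and typ == "ALL"):
                out.append((ch, typ, t2_ch, t2_partner))
                matched = True
        if not matched:
            out.append((ch, typ, None, None))  # LEFT JOIN keeps the t1 row
    return out, comparisons


def split_hash_left_join(t1, t2):
    """Route each t1 row by TYPE, then probe one hash table with one key."""
    by_channel, by_partner = defaultdict(list), defaultdict(list)
    for row in t2:
        by_channel[row[0]].append(row)
        by_partner[row[1]].append(row)
    out, probes = [], 0
    for ch, typ in t1:
        if typ == "DR":
            matches, probes = by_channel.get(ch, []), probes + 1
        elif typ == "ALL":
            matches, probes = by_partner.get(ch, []), probes + 1
        else:
            matches = []  # third case: never joins to t2
        if matches:
            out.extend((ch, typ, m[0], m[1]) for m in matches)
        else:
            out.append((ch, typ, None, None))
    return out, probes


nl_rows, nl_comparisons = nested_loop_left_join(t1, t2)
hash_rows, hash_probes = split_hash_left_join(t1, t2)
print(sorted(nl_rows, key=repr) == sorted(hash_rows, key=repr))
print(nl_comparisons, hash_probes)
```

Both functions return the same rows, but the nested loop performs len(t1) * len(t2) comparisons, while the split version performs at most one hash probe per t1 row.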

Optimization and rewrite

After the rewrite, the SQL looks as follows:

SELECT
    CHANNEL_ID
    ,COUNTRY_CODE
    ,BRAND
    ,CHANNEL_ID2
FROM
(
    -- Case 1: TYPE = 'DR' joins on t2.CHANNEL_ID
    SELECT
        t2.PARTNER_CHANNEL_CODE AS CHANNEL_ID
        ,t1.COUNTRY_CODE
        ,t1.BRAND
        ,t2.CHANNEL_ID AS CHANNEL_ID2
    FROM
        t1
    LEFT JOIN
        t2
    ON
        t2.CHANNEL_ID = t1.CHANNEL_ID
    WHERE
        t1.TYPE = 'DR'
    UNION ALL
    -- Case 2: TYPE = 'ALL' joins on t2.PARTNER_CHANNEL_CODE
    SELECT
        t2.PARTNER_CHANNEL_CODE AS CHANNEL_ID
        ,t1.COUNTRY_CODE
        ,t1.BRAND
        ,t2.CHANNEL_ID AS CHANNEL_ID2
    FROM
        t1
    LEFT JOIN
        t2
    ON
        t2.PARTNER_CHANNEL_CODE = t1.CHANNEL_ID
    WHERE
        t1.TYPE = 'ALL'
    UNION ALL
    -- Case 3: any other TYPE never matches t2; the t2 columns are NULL
    SELECT
        t2.PARTNER_CHANNEL_CODE AS CHANNEL_ID
        ,t1.COUNTRY_CODE
        ,t1.BRAND
        ,t2.CHANNEL_ID AS CHANNEL_ID2
    FROM
        t1
    LEFT JOIN
        t2
    ON
        FALSE
    WHERE
        t1.TYPE NOT IN ('ALL','DR')
) AS u
GROUP BY
    CHANNEL_ID
    ,COUNTRY_CODE
    ,BRAND
    ,CHANNEL_ID2

After the rewrite, the SQL is a UNION ALL of three subqueries. The execution time drops from 45 seconds to under 1 second, a speedup of more than 45x.
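One way to gain confidence in a rewrite like this is to run both forms on a small data set and compare the results. The sketch below does that with Python's built-in sqlite3 module, which stands in for the warehouse. The table contents are made-up toy data, `1 = 0` replaces `ON FALSE` for compatibility with older SQLite builds, and no row has a NULL TYPE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CHANNEL_ID TEXT, COUNTRY_CODE TEXT, BRAND TEXT, TYPE TEXT);
CREATE TABLE t2 (CHANNEL_ID TEXT, PARTNER_CHANNEL_CODE TEXT);
-- Toy data (illustrative only): one row per case, plus an unmatched 'DR' row.
INSERT INTO t1 VALUES
  ('c1', 'CN', 'b1', 'DR'),
  ('p2', 'DE', 'b2', 'ALL'),
  ('c9', 'FR', 'b3', 'DR'),
  ('c1', 'US', 'b4', 'OTHER');
INSERT INTO t2 VALUES ('c1', 'p1'), ('c2', 'p2');
""")

COLUMNS = """t2.PARTNER_CHANNEL_CODE AS CHANNEL_ID, t1.COUNTRY_CODE,
       t1.BRAND, t2.CHANNEL_ID AS CHANNEL_ID2"""

OR_JOIN = f"""
SELECT {COLUMNS}
FROM t1 LEFT JOIN t2
  ON (t2.CHANNEL_ID = t1.CHANNEL_ID AND t1.TYPE = 'DR')
  OR (t2.PARTNER_CHANNEL_CODE = t1.CHANNEL_ID AND t1.TYPE = 'ALL')
GROUP BY t2.PARTNER_CHANNEL_CODE, t1.COUNTRY_CODE, t1.BRAND, t2.CHANNEL_ID
"""

UNION_ALL = f"""
SELECT CHANNEL_ID, COUNTRY_CODE, BRAND, CHANNEL_ID2
FROM (
  SELECT {COLUMNS}
  FROM t1 LEFT JOIN t2 ON t2.CHANNEL_ID = t1.CHANNEL_ID
  WHERE t1.TYPE = 'DR'
  UNION ALL
  SELECT {COLUMNS}
  FROM t1 LEFT JOIN t2 ON t2.PARTNER_CHANNEL_CODE = t1.CHANNEL_ID
  WHERE t1.TYPE = 'ALL'
  UNION ALL
  SELECT {COLUMNS}
  FROM t1 LEFT JOIN t2 ON 1 = 0
  WHERE t1.TYPE NOT IN ('ALL', 'DR')
) AS u
GROUP BY CHANNEL_ID, COUNTRY_CODE, BRAND, CHANNEL_ID2
"""


def rows(sql):
    # key=repr lets rows that contain NULL (None) be sorted alongside strings
    return sorted(conn.execute(sql).fetchall(), key=repr)


original_rows = rows(OR_JOIN)
rewritten_rows = rows(UNION_ALL)
print(original_rows == rewritten_rows)
```

A matching result on toy data does not prove equivalence in general, but it catches slips such as a missing LEFT JOIN or a wrong column in the outer GROUP BY.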

Scenario 2

Usage scenario: this case applies when both of the following conditions hold:

  1. A large table A is joined to a small table B with a non-equi condition
  2. The column of B used in the equality comparisons is B's primary key

[Original statement]

SELECT
    T.CREATE_INVOICE_USER,
    T.PERIOD_ID,
    T.AP_INVOICE_ID,
    T.AP_INVOICE_NUM,
    T.AP_BATCH_NAME,
    EMP1.EMPLOYEE_NO,
    EMP1.EMPLOYEE_NAME
FROM DWACTDI.DWR_AP_GLOBAL_INVOICE_DETAIL_F_I T
LEFT JOIN DWRDIM_DW1.DWR_DIM_EMPLOYEE_D EMP1
    ON (EMP1.SCD_ACTIVE_IND = 1
        AND (T.CREATE_INVOICE_USER = EMP1.EMPLOYEE_NO
             OR SUBSTR(T.CREATE_INVOICE_USER, 2) = EMP1.EMPLOYEE_NO))

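[Performance analysis]

The original statement timed out (it ran for more than 1 hour). The execution plan shows a NestLoop join against the large table:

[Figure: execution plan of the original statement (cke_114.png)]

Further analysis shows that dwrdim_dw1.dwr_dim_employee_d is a dimension table and that the join column employee_no is its primary key. Each of the two OR branches, taken alone, is therefore an equi-join that matches at most one employee row. The OR can thus be split into two separate LEFT JOINs on the dimension table, each of which can be executed as a hash join, with the two results merged by nvl. If a row of T matches through both branches, the original statement returns two rows while the split form keeps only the exact match, so the rewrite assumes that such double matches do not occur or that the exact match is the one wanted.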

[Optimization and rewrite]

SELECT
    T.CREATE_INVOICE_USER,
    T.PERIOD_ID,
    T.AP_INVOICE_ID,
    T.AP_INVOICE_NUM,
    T.AP_BATCH_NAME,
    -- Prefer the exact match; fall back to the SUBSTR match
    nvl(EMP1_0.EMPLOYEE_NO, EMP1_1.EMPLOYEE_NO) AS EMPLOYEE_NO,
    nvl(EMP1_0.EMPLOYEE_NAME, EMP1_1.EMPLOYEE_NAME) AS EMPLOYEE_NAME
FROM DWACTDI.DWR_AP_GLOBAL_INVOICE_DETAIL_F_I T
LEFT JOIN DWRDIM_DW1.DWR_DIM_EMPLOYEE_D EMP1_0
    ON (EMP1_0.SCD_ACTIVE_IND = 1
        AND T.CREATE_INVOICE_USER = EMP1_0.EMPLOYEE_NO)
LEFT JOIN DWRDIM_DW1.DWR_DIM_EMPLOYEE_D EMP1_1
    ON (EMP1_1.SCD_ACTIVE_IND = 1
        AND SUBSTR(T.CREATE_INVOICE_USER, 2) = EMP1_1.EMPLOYEE_NO)

The execution information after the rewrite is as follows:

[Figure: execution information after the rewrite (cke_115.png)]
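The split into two LEFT JOINs can be checked on toy data in the same spirit. The following sketch uses Python's built-in sqlite3 module as a stand-in for the warehouse. The table names `invoice` and `employee`, their contents, and the reduced column list are assumptions for illustration, and COALESCE takes the place of nvl, which SQLite does not provide. It also shows the one situation where the two forms differ: a row that matches through both branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (CREATE_INVOICE_USER TEXT, AP_INVOICE_ID INTEGER);
CREATE TABLE employee (
  EMPLOYEE_NO TEXT PRIMARY KEY, EMPLOYEE_NAME TEXT, SCD_ACTIVE_IND INTEGER);
-- Toy data (illustrative only); employee '1003' is inactive.
INSERT INTO employee VALUES ('1001', 'Ann', 1), ('1002', 'Bob', 1), ('1003', 'Cid', 0);
INSERT INTO invoice VALUES ('1001', 1), ('X1002', 2), ('1003', 3), ('Z9999', 4);
""")

OR_JOIN = """
SELECT T.CREATE_INVOICE_USER, T.AP_INVOICE_ID, EMP1.EMPLOYEE_NO, EMP1.EMPLOYEE_NAME
FROM invoice T
LEFT JOIN employee EMP1
  ON (EMP1.SCD_ACTIVE_IND = 1
      AND (T.CREATE_INVOICE_USER = EMP1.EMPLOYEE_NO
           OR SUBSTR(T.CREATE_INVOICE_USER, 2) = EMP1.EMPLOYEE_NO))
"""

TWO_JOINS = """
SELECT T.CREATE_INVOICE_USER, T.AP_INVOICE_ID,
       COALESCE(EMP1_0.EMPLOYEE_NO, EMP1_1.EMPLOYEE_NO),
       COALESCE(EMP1_0.EMPLOYEE_NAME, EMP1_1.EMPLOYEE_NAME)
FROM invoice T
LEFT JOIN employee EMP1_0
  ON (EMP1_0.SCD_ACTIVE_IND = 1
      AND T.CREATE_INVOICE_USER = EMP1_0.EMPLOYEE_NO)
LEFT JOIN employee EMP1_1
  ON (EMP1_1.SCD_ACTIVE_IND = 1
      AND SUBSTR(T.CREATE_INVOICE_USER, 2) = EMP1_1.EMPLOYEE_NO)
"""


def rows(sql):
    # key=repr lets rows that contain NULL (None) be sorted alongside strings
    return sorted(conn.execute(sql).fetchall(), key=repr)


# No invoice row matches through both branches yet, so the results agree.
same_without_double_match = rows(OR_JOIN) == rows(TWO_JOINS)

# Add a user that matches both ways: '11001' exactly, and '1001' after SUBSTR.
conn.execute("INSERT INTO employee VALUES ('11001', 'Dee', 1)")
conn.execute("INSERT INTO invoice VALUES ('11001', 5)")
or_count = len(rows(OR_JOIN))       # invoice 5 appears twice
split_count = len(rows(TWO_JOINS))  # invoice 5 appears once, with the exact match
print(same_without_double_match, or_count, split_count)
```

If double matches are possible in the real data, it is worth checking whether keeping only the exact match is the intended behavior before adopting the rewrite.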


Origin my.oschina.net/u/4526289/blog/10120471