This article is shared from the Huawei Cloud Community post "GaussDB(DWS) Performance Tuning: A Case Study of Performance Bottleneck Problems Caused by an Excessive Number of Filtered Rows During Table Scans" by O Paoguolai~.
1. [Problem Description]
During execution, the SQL statement scans a large table of about 1.2 billion rows. More than 99% of those rows are filtered out, leaving only 617. The performance bottleneck is the scan of this table.
2. [Original statement]
set search_path = 'bi_dashboard';
WITH F_SRV_DB_DIM_PRD_D AS (
    SELECT EXTERNAL_NAME
    FROM (
        SELECT MKT_NAME EXTERNAL_NAME
        FROM BI_DASHBOARD.DM_MSS_ITEM_PRODUCT_D PRD
        WHERE PRD.COMPANY_BRAND = ANY (array[string_to_array('HUAWEI', ',')])
          AND PRD.MKT_NAME = ANY (array[string_to_array('Enjoy 60,Enjoy 50,Enjoy 60X,Enjoy 60 Pro,Enjoy 50 Pro,Enjoy 50z,nova 10z,Enjoy 20e,Enjoy 20 Pro,Enjoy 10e,Enjoy 10 Plus,Enjoy 20 SE,Enjoy 10,nova 11i,Enjoy 20 Plus,Enjoy 9 Plus,Enjoy 20 5G,nova Y90,Enjoy 10S,nova Y70,Enjoy Z,Enjoy 9S,nova 8 SE Active Edition,Maimang 9 5G,Y9s,Maimang 9 5G', ',')])
    )
    WHERE EXTERNAL_NAME <> 'SNULL'
    GROUP BY EXTERNAL_NAME
),
V_PERIOD AS (
    SELECT PERIOD_ID AS PERIOD_ID_M,
           LEAST(TO_CHAR(PERIOD_END_DATE, 'YYYYMMDD'), '20230630') AS PERIOD_ID,
           PERIOD_ID AS DATES
    FROM BI_DASHBOARD.RPT_TML_ACCOUNT_PERIOD_D
    WHERE PERIOD_TYPE = 'M'
      AND PERIOD_ID BETWEEN 202207 AND 202306
),
V_DATA_BASE AS (
    SELECT A.PERIOD_ID,
           IFNULL(A.CHANNEL_NAME, 'SNULL') AS DISTRIBUTOR_CHANNEL_NAME,
           SUM(A.SO_QTY_MTD) AS SO_QTY,
           SUM(DECODE(A.PERIOD_ID, 20230630, A.SO_QTY_MTD)) AS SO_QTY_ORDER
    FROM DM_MSS_CN_PC_REP_RP_ST_D_F A
    INNER JOIN F_SRV_DB_DIM_PRD_D PRD ON A.EXTERNAL_NAME = PRD.EXTERNAL_NAME
    WHERE 1 = 1
      AND A.CHANNEL_ID IN ('100013388802')
      AND A.ORG_KEY IN (10000651)
      AND A.SALES_FLAG IN ('1', '0')
      AND A.PERIOD_ID IN (20220731, 20221031, 20220930, 20220831, 20221130, 20221231,
                          20230131, 20230228, 20230430, 20230331, 20230531, 20230630)
      AND (A.SO_QTY_MTD <> 0) -- filter out all rows whose SO_QTY is 0
    GROUP BY A.PERIOD_ID, IFNULL(A.CHANNEL_NAME, 'SNULL')
),
V_DATA AS (
    SELECT PERIOD_ID,
           NVL(DISTRIBUTOR_CHANNEL_NAME, 'Total') AS DISTRIBUTOR_CHANNEL_NAME,
           SUM(SO_QTY) AS SO_QTY,
           SUM(SO_QTY_ORDER) AS SO_QTY_ORDER
    FROM V_DATA_BASE A
    GROUP BY GROUPING SETS ((PERIOD_ID), (PERIOD_ID, DISTRIBUTOR_CHANNEL_NAME))
)
SELECT STRING_AGG(P.DATES, ',' ORDER BY P.PERIOD_ID_M) AS PERIOD_LIST,
       B.DISTRIBUTOR_CHANNEL_NAME,
       STRING_AGG(NVL(TO_CHAR(ROUND(A.SO_QTY)), '0'), ',' ORDER BY P.PERIOD_ID_M) AS SO_QTY
FROM V_PERIOD P
FULL JOIN (SELECT DISTINCT DISTRIBUTOR_CHANNEL_NAME FROM V_DATA) B ON 1 = 1
LEFT JOIN V_DATA A ON A.PERIOD_ID = P.PERIOD_ID
                  AND A.DISTRIBUTOR_CHANNEL_NAME = B.DISTRIBUTOR_CHANNEL_NAME
GROUP BY B.DISTRIBUTOR_CHANNEL_NAME
ORDER BY DECODE(B.DISTRIBUTOR_CHANNEL_NAME, 'Total', 0, 'SOURCE IS NULL', 2, '源为空', 3, 'SNULL', 4, 1),
         SUM(A.SO_QTY_ORDER) DESC NULLS LAST
LIMIT 50 OFFSET 0;
3. [Performance Analysis]
As the performance execution plan in the figure above shows (the complete plan is in Appendix 1), the statement is slow when scanning table A (bi_dashboard.dm_mss_cn_pc_rep_rp_st_d_f_test). The scan filters on sales_flag, so_qty_mtd, channel_id, org_key and period_id. The table's original local clustering key (PCK) contains only period_id and none of the other filter columns, so adjusting the PCK can reduce the execution time of the scan on table A.
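To illustrate why the choice of clustering key matters, here is a minimal Python simulation under a simplified model: a column-store table is split into fixed-size blocks, and each block records the min and max of a column. A filter can skip any block whose [min, max] range cannot contain the target value. The block size of 4 rows, the sample org_key values, and the helper names `build_minmax` and `blocks_to_scan` are hypothetical and do not reflect DWS internals; clustering the rows on the filtered column is approximated here by a plain sort.

```python
from typing import List, Tuple

BLOCK_ROWS = 4  # hypothetical block size; real column-store units hold far more rows


def build_minmax(values: List[int], block_rows: int = BLOCK_ROWS) -> List[Tuple[int, int]]:
    """Return the (min, max) pair for each consecutive block of `block_rows` values."""
    return [
        (min(values[i:i + block_rows]), max(values[i:i + block_rows]))
        for i in range(0, len(values), block_rows)
    ]


def blocks_to_scan(minmax: List[Tuple[int, int]], target: int) -> List[int]:
    """Return indices of blocks whose [min, max] range may contain `target`."""
    return [i for i, (lo, hi) in enumerate(minmax) if lo <= target <= hi]


# Hypothetical org_key column: 16 rows, 4 distinct keys, arrival order is random-ish.
unclustered = [3, 1, 4, 2, 2, 4, 1, 3, 4, 3, 2, 1, 1, 2, 3, 4]
# Sorting on the filter column approximates what a cluster key on it would achieve.
clustered = sorted(unclustered)

for name, col in [("unclustered", unclustered), ("clustered", clustered)]:
    scanned = blocks_to_scan(build_minmax(col), target=2)
    print(f"{name}: scan {len(scanned)} of {len(col) // BLOCK_ROWS} blocks")
```

Running it prints `unclustered: scan 4 of 4 blocks` and `clustered: scan 1 of 4 blocks`: when the data is not clustered on the filtered column, every block's range overlaps the target and nothing can be skipped, which mirrors a PCK that covers period_id but not channel_id or org_key.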
Supplement: local clustering key (Partial Cluster Key, PCK)
Partial Cluster Key (PCK) is an indexing technique for column-store tables that uses min/max sparse indexes to speed up base-table scans. A PCK can span multiple columns, but using more than two columns is generally not recommended. PCK is suited to accelerating point queries on large column-store tables.
In addition, the WHERE clause in the view statement contains an IN list with many values (12). By default, DWS allows at most 5 values after IN; if the list has more than 5 values, the filter is not pushed down. In that case, split the 12 values into several IN lists of at most 5 values each and combine them with OR, wrapping the whole group in parentheses so that it still combines correctly with the surrounding AND conditions:
AND (A.PERIOD_ID IN (20220731, 20221031, 20220930, 20220831, 20221130)
     OR A.PERIOD_ID IN (20221231, 20230131, 20230228, 20230430, 20230331)
     OR A.PERIOD_ID IN (20230531, 20230630))
After these changes, the execution time of the SQL statement drops to 487 ms; the complete performance plan is in Appendix 2.
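Because the IN-list rewrite is mechanical, it can be generated rather than typed by hand. The following is a minimal Python sketch; the function name `split_in_list` and its default chunk size of 5 are assumptions based on the limit described above, not a DWS utility, and it only handles numeric literals (string values would need quoting). It splits a long IN list into OR-connected IN lists and wraps the result in parentheses so it can be placed after AND.

```python
from typing import Sequence


def split_in_list(column: str, values: Sequence[int], max_per_list: int = 5) -> str:
    """Rewrite `column IN (v1, ..., vN)` as OR-connected IN lists of at most
    `max_per_list` numeric values each, wrapped in parentheses for use after AND."""
    if not values:
        raise ValueError("values must not be empty")
    if max_per_list < 1:
        raise ValueError("max_per_list must be at least 1")
    chunks = [values[i:i + max_per_list] for i in range(0, len(values), max_per_list)]
    parts = [f"{column} IN ({', '.join(str(v) for v in chunk)})" for chunk in chunks]
    return "(" + " OR ".join(parts) + ")"


periods = [20220731, 20221031, 20220930, 20220831, 20221130, 20221231,
           20230131, 20230228, 20230430, 20230331, 20230531, 20230630]
print(split_in_list("A.PERIOD_ID", periods))
```

For the 12 period values in this case, it produces the same 5 + 5 + 2 grouping as the hand-written rewrite above.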
- Appendix 1 (attachment): performance plan before optimization, performance.txt (449.47 KB)
- Appendix 2 (attachment): performance plan after optimization, performance.txt (466.64 KB)