GaussDB single SQL slow performance analysis

Problem Description

A single SQL statement executes slowly, and the latency of customer jobs may not meet customer expectations.

Problem Symptoms

  • View alarms and find slow SQL alarms.
  • Analyze the WDR report for abnormal SQL.
  • SQL that has run for a long time without finishing.
  • Users reported slow SQL.

Alarms

  • Service-side alarms related to interface latency and success rate.
  • Database kernel alarms related to P80/P95 latency.

Single SQL slow performance analysis


Step 1: Determine the target SQL

  • Actively discover:
    1. View alarms and find slow SQL alarms.
    2. Regularly check the WDR report and find abnormal SQL, such as Top SQL that consumes a lot of CPU.
    3. A long-transaction alarm shows SQL that has not finished for a long time.
  • Passive tuning: users or the business report slow SQL.

Step 2: Collect statistical information and rule out impacts in advance

  1. Obtain the complete SQL statement and, for the tables it references, the table structure, index definitions, and table and index sizes.
  2. Obtain the database parameter configuration, including work_mem, maintenance_work_mem, shared_buffers, etc. For example, a statement containing sort or hash operations may execute inefficiently because work_mem is too small.
  3. Obtain the pgstat information of the objects referenced by the SQL. pgstat can be used to analyze vacuum and analyze status, as well as the state of tables and indexes. This information is available through views such as pg_stat_all_tables, pg_stat_all_indexes, pg_statio_all_tables, and pg_statio_all_indexes. For specific view analysis, refer to "Slow single SQL performance - view analysis".
  4. For slow SQL that may write a large volume of logs, confirm whether flow control (recovery_time_target) is enabled on the environment. To guarantee the RTO during a sudden surge of xlog, flow control may throttle xlog synchronization to the standby, slowing statement execution. For specific troubleshooting, refer to "Slow single SQL performance - view analysis".
  5. Collect the system resource metrics for the time period of the slow SQL and check whether any resource is abnormal.
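The collection steps above can be sketched as a few queries. The table name 'my_table' is a placeholder for a table referenced by the slow SQL, and column availability should be verified against your GaussDB version:

```sql
-- Parameter configuration relevant to sort/hash memory and caching
SHOW work_mem;
SHOW maintenance_work_mem;
SHOW shared_buffers;

-- Table and index sizes ('my_table' is a placeholder)
SELECT pg_size_pretty(pg_relation_size('my_table'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('my_table')) AS table_plus_indexes;

-- pgstat state: dead tuples and last vacuum/analyze times
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze
  FROM pg_stat_all_tables
 WHERE relname = 'my_table';
```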

Step 3: Analyze SQL performance bottlenecks

  • If the target SQL has not finished for a long time:

    1. First determine where the SQL is slow. Analyze the SQL's top wait event through pg_thread_wait_status or the ASP information. For the specific analysis method, refer to "Slow single SQL performance - view analysis"; for descriptions of wait events, refer to "Abnormal Wait Events".
    2. If the SQL shows many lock-wait events, find the lock-waiting relationship through the block_sessionid field in the ASP and determine the cause of the lock wait.
    3. If the execution time exceeds the slow SQL threshold log_min_duration_statement, view the plan through the full SQL views. For specific analysis methods, see "Slow single SQL performance - view analysis".
    4. Based on the slow SQL found, ask the business whether the complete business SQL can be obtained, and try to reproduce the slowness.
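For the long-running case, the wait-event check in step 1 above might look like the following. The query_id filter value is a placeholder (take the id of the hanging statement from the active-session views), and column names should be verified on your version:

```sql
-- Wait status/event of the threads executing a given statement
SELECT node_name, thread_name, wait_status, wait_event
  FROM pg_thread_wait_status
 WHERE query_id = 12345;   -- placeholder query id
```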
  • Single SQL is consistently slow.

    • For the slow SQL statement obtained, first consider getting the plan through explain, which can quickly locate the performance bottleneck of the statement; analyze the specific cause together with the information collected in step 2.
    • In addition, analyze the SQL KPI information through summary_statement and statement_history. First determine the most time-consuming stage via the SQL time model, then determine the cause using row activity, statement-level wait events, and similar information. For specific analysis, refer to "Slow single SQL performance - view analysis".
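A hedged sketch of the KPI analysis just described; the column names follow the GaussDB dbe_perf view documentation and should be confirmed on your version:

```sql
-- Rank statements by total elapsed time, then inspect the time model
SELECT unique_sql_id, n_calls, total_elapse_time, data_io_time
  FROM dbe_perf.summary_statement
 ORDER BY total_elapse_time DESC
 LIMIT 10;
```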
  • Single SQL is occasionally slow.

    • If the execution time of the SQL exceeds the slow SQL threshold log_min_duration_statement, check the execution plan, time model, row activity, wait events, and lock information of the slow SQL through statement_history, and analyze from the plan side; refer to "Slow single SQL performance - view analysis".
    • If lock or other notable events appear among the top wait events of the slow SQL, check the lock-waiting relationships between sessions through the ASP information.
    • If the execution time does not exceed the log_min_duration_statement threshold, there are two options. The first is to enable full SQL with set track_stmt_stat_level = 'L1,L1'; note that enabling full SQL records all executed statements (it can be enabled at the session level) and consumes a lot of disk space, so be sure to disable it after use. The second is to call the dynamic interface to track the specific slow statement. In general, analyze the slow SQL views first: check the wait event information corresponding to the slow SQL through gs_asp, and check the slow SQL details through statement_history.
select * from dynamic_func_control('GLOBAL', 'STMT', 'TRACK', '{"3182919165", "L1"}');   -- capture full SQL for this statement
select * from dynamic_func_control('GLOBAL', 'STMT', 'UNTRACK', '{"3182919165"}');      -- stop capturing
select * from dynamic_func_control('GLOBAL', 'STMT', 'LIST', '{}');                     -- list tracked statements
select * from dynamic_func_control('LOCAL', 'STMT', 'CLEAN', '{}');                     -- clean tracking records

Slow single SQL performance - view analysis

Flow control leads to slow SQL

Commonly seen in batch data loading, stress testing, or batch commit scenarios.

  1. Search the database logs for sleep-related keywords during the time period in which the slow SQL occurred.
  2. Execute the following SQL statement. If the current_sleep_time field has a value, flow control is occurring.
SELECT * FROM dbe_perf.global_recovery_status;

Concurrency lock conflicts lead to slow SQL

  1. If the slow SQL threshold log_min_duration_statement is reached, check the top wait event in statement_history.
SELECT statement_detail_decode(details, 'plaintext', true) FROM DBE_PERF.get_global_slow_sql_by_timestamp('start time','end time') WHERE unique_sql_id = xxx;
  2. If the slow SQL does not reach the threshold log_min_duration_statement, directly check dbe_perf.local_active_session/gs_asp for the corresponding time period and look at the wait events there. If currently running SQL is slow, refer to step 5.
  3. If the top wait event is acquire lock, analyze the lock wait through the ASP information, find the corresponding block_sessionid, and query the session blocking the statement with the following SQL.
SELECT * FROM gs_asp WHERE sample_time > 'start_time' and sample_time < 'end_time' and query like 'xxx';
  4. Find the corresponding session information according to the block_sessionid.
SELECT * FROM gs_asp WHERE sample_time > 'start_time' and sample_time < 'end_time' and query like 'xxx' and sessionid = $block sessionid;
  5. If currently running SQL is slow, find the corresponding block_sessionid and query the session blocking the statement with the following SQL.
SELECT a.*, b.wait_status, b.wait_event FROM pgxc_stat_activity AS a LEFT JOIN pgxc_thread_wait_status AS b ON a.pid = b.tid AND a.sessionid = b.sessionid AND a.coorname = b.node_name AND b.sessionid = $block_sessionid;
  6. Analyze the session information found. If the session's wait event is wait cmd, the statement on that session has finished executing and it is waiting for the client to send the next message.

Table bloat (a large number of dead tuples) leads to slow SQL

  1. If the slow SQL threshold log_min_duration_statement is reached, check statement_history/DBE_PERF.get_global_slow_sql_by_timestamp. If data_io_time is high, or n_blocks_fetched - n_blocks_hit is large, the SQL is loading a large number of pages, which increases its latency.
SELECT * FROM DBE_PERF.get_global_slow_sql_by_timestamp('start time','end time');
  2. If the SQL does not reach the slow SQL threshold, there are two options. The first is to enable full SQL with set track_stmt_stat_level = 'L1,L1'; note that enabling full SQL records all executed statements (it can be enabled at the session level) and consumes a lot of disk space, so be sure to disable it after use. The second is to call the dynamic interface to track the specific slow statement.
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'TRACK', '{"3182919165", "L1"}');   -- capture full SQL for this statement
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'UNTRACK', '{"3182919165"}');      -- stop capturing
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'LIST', '{}');                     -- list tracked statements
SELECT * FROM dynamic_func_control('LOCAL', 'STMT', 'CLEAN', '{}');                     -- clean tracking records
  3. View the detailed IO behavior through explain(analyze, buffers). If the statement uses an index but heap fetches are high while few rows are returned, the scan is performing many visibility checks.
  4. Check the pgstat information of the table referenced by the slow SQL. If n_dead_tup shows a large number of dead tuples, or last_vacuum shows the table has not been vacuumed for a long time, vacuum the table.
SELECT * FROM pg_stat_all_tables where relname = 'xxx';
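If the pgstat check confirms bloat, vacuuming the table (and refreshing its statistics) is the usual remedy; 'my_table' is a placeholder:

```sql
-- Reclaim dead tuples and refresh planner statistics in one pass
VACUUM ANALYZE my_table;
```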

Bad business statements and bad plans lead to slow SQL

  1. Collect SQL-related table structure, index, table and index size and other information.
  2. If the slow SQL threshold log_min_duration_statement is reached, check statement_history/DBE_PERF.get_global_slow_sql_by_timestamp to obtain the SQL plan.
SELECT * FROM DBE_PERF.get_global_slow_sql_by_timestamp('start time','end time');
  3. If the SQL does not reach the slow SQL threshold, there are two options. The first is to enable full SQL with set track_stmt_stat_level = 'L1,L1'; note that enabling full SQL records all executed statements (it can be enabled at the session level) and consumes a lot of disk space, so be sure to disable it after use. The second is to call the dynamic interface to track the specific slow statement and obtain its plan information.
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'TRACK', '{"3182919165", "L1"}');   -- capture full SQL for this statement
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'UNTRACK', '{"3182919165"}');      -- stop capturing
SELECT * FROM dynamic_func_control('GLOBAL', 'STMT', 'LIST', '{}');                     -- list tracked statements
SELECT * FROM dynamic_func_control('LOCAL', 'STMT', 'CLEAN', '{}');                     -- clean tracking records
  4. Alternatively, view the statement's plan through explain.
    Based on the SQL plan and the corresponding table and index information, confirm whether the SQL statement can be optimized or whether an index is missing.
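As an illustration only (the 'orders' table and 'customer_id' column are hypothetical), a missing index typically shows up in the plan as a sequential scan with large buffer reads:

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
-- If the plan shows a Seq Scan with high buffer reads for a selective filter,
-- an index on the filter column is a candidate fix:
CREATE INDEX idx_orders_customer ON orders (customer_id);
```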

Origin blog.csdn.net/GaussDB/article/details/131241388