GaussDB technical interpretation series: performance tuning

Recently, at the 14th China Database Technology Conference (DTCC2023), in the session "GaussDB's 'Five Highs and Two Easy' Core Technologies Give the World a Better Choice", Huawei database technology expert Li Shifu explained in detail the technologies and application practices of GaussDB performance tuning.

This article shares GaussDB performance tuning practice in three parts: an overview of performance tuning, the key technologies of performance tuning, and the application practice of performance tuning.

 

Introduction to GaussDB performance tuning 

A database is system software that plays a key linking role in the computer system. Applications interact with the database through its northbound interface, and the database interacts with the operating system and hardware through its southbound interface. Database performance is therefore affected by many factors. Hardware specifications, operating system configuration, database system design, and the way applications and clients connect can all have a large impact on final business performance. In essence, database performance is the result of the hardware and software of the whole computer system working together.

Database performance optimization is complex and challenging for two reasons.

The first is that performance issues are subjective. If a query takes 1 second, it is hard to say whether that is good or bad, or whether it deviates from the business goals. When describing a performance problem, the goal must therefore be clear, the description specific, and the result measurable. For example, when a problem occurs, report the hardware configuration, parameter settings, deployment mode, business scenario, current result, and expected target.

The second is that performance issues are complex. It is rarely obvious at a glance which module is causing the problem. You need to set a clear direction for the analysis, observe how the job's data flows between modules, and analyze the problem from a global perspective. If the problem is not confined to a single point, find the main contradiction behind it and solve the biggest problem first. Then check whether the result meets business requirements.

Several industry terms come up frequently when dealing with database performance issues:

  • IOPS usually refers to the number of read and write I/O operations per second on the database system's data disks. It reflects the disk's read/write I/O capability;

  • Throughput usually refers to the database's transactions per second (TPS) or queries per second (QPS). It reflects the overall load;

  • Response time is the time from when a query is issued to when its results are returned. It is used to identify slow SQL;

  • Saturation is the number of tasks waiting in the work queue under high concurrency. It reflects the current backlog and how busy the system is.
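To make these definitions concrete, the following minimal Python sketch derives TPS, QPS, IOPS, and mean response time from two snapshots of cumulative counters taken some seconds apart. The `CounterSample` fields are hypothetical stand-ins, not GaussDB view columns. Saturation is not derived here because it is a point-in-time queue length that is sampled directly rather than computed from deltas.

```python
from dataclasses import dataclass

@dataclass
class CounterSample:
    """Hypothetical snapshot of cumulative counters taken at time `ts` (seconds)."""
    ts: float
    commits: int          # committed transactions so far
    queries: int          # executed queries so far
    disk_reads: int       # read I/O operations so far
    disk_writes: int      # write I/O operations so far
    total_resp_ms: float  # sum of query response times so far, in ms

def rates(before: CounterSample, after: CounterSample) -> dict:
    """Turn two cumulative snapshots into per-second rates and mean latency."""
    dt = after.ts - before.ts
    if dt <= 0:
        raise ValueError("samples must be taken at increasing times")
    dq = after.queries - before.queries
    return {
        "tps": (after.commits - before.commits) / dt,
        "qps": dq / dt,
        "iops": ((after.disk_reads - before.disk_reads)
                 + (after.disk_writes - before.disk_writes)) / dt,
        # Mean response time of the queries finished in the window (0 if none).
        "avg_resp_ms": (after.total_resp_ms - before.total_resp_ms) / dq if dq else 0.0,
    }

if __name__ == "__main__":
    t0 = CounterSample(0.0, 1_000, 5_000, 200, 300, 50_000.0)
    t1 = CounterSample(10.0, 3_000, 15_000, 700, 1_300, 150_000.0)
    print(rates(t0, t1))  # {'tps': 200.0, 'qps': 1000.0, 'iops': 150.0, 'avg_resp_ms': 10.0}
```

Sampling cumulative counters and dividing the deltas by the interval is a common way to turn monotonically increasing statistics into rates without resetting anything between samples.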

Logical architecture of GaussDB database

Before analyzing the performance issues of GaussDB, let’s briefly introduce its logical architecture, which mainly includes the following components:

  • CM (cluster management) component: responsible for starting and stopping at the cluster, node, and instance levels, as well as cluster status queries, primary selection, and primary/standby switchover;

  • OM (operation and maintenance management) component: responsible for routine O&M operations such as installation, upgrade, and node replacement;

  • GTM (global transaction manager): generates and maintains globally unique information such as global transaction IDs, snapshots, and sequences to guarantee global transaction consistency;

  • CN (coordinator node): receives application requests and returns results to the client. The CN parses and optimizes the SQL and then sends the plan to the DNs for execution;

  • DN (data node): stores business data, executes queries, and returns the execution results;

  • ETCD consistency component: stores the cluster topology and status, primary/standby status information, and the global transaction IDs and sequences that depend on ETCD. It also supports CMS's own arbitration and CMS's arbitration of database components. In the future, ETCD will be replaced by DCC (Distributed Configuration Center), a self-developed component, for storing cluster configuration information.

Of these, the core components that affect business queries are GTM, CN, and DN.


Query processing process of GaussDB database

With the architecture in mind, let's briefly walk through query processing. We break it down by module and look at each module's function and performance concerns.

  • Query parsing translates SQL text into a parse tree. The main performance factor here is the efficiency of lexical, syntactic, and semantic analysis. The relevant techniques are parse-free execution of template queries and bound-variable execution in PBE (Parse/Bind/Execute) mode.

  • Query optimization performs logical and physical optimization and finally produces a physical plan. The key to performance here is efficient plan generation, which involves plan caching, rewrite rules, and the accuracy of cardinality and cost estimation.

  • Query execution runs the plan and returns the results. The key performance points include SMP parallel execution, distributed execution, compilation-based operator execution, and expression evaluation.

  • When a query reads data, performance involves several storage engine modules, including efficient data storage, concurrency control, and transaction capabilities. Together, the costs of all these stages determine the query's end-to-end performance.
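The point that end-to-end cost is the sum of the stage costs can be illustrated with a small timing harness. This sketch rests on obvious assumptions. The stage names are hypothetical labels for the modules above, and `time.sleep` stands in for real parsing, optimization, execution, and storage work.

```python
import time
from contextlib import contextmanager

class StageTimer:
    """Accumulates wall-clock time per query-processing stage."""

    def __init__(self):
        self.elapsed = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            spent = time.perf_counter() - start
            self.elapsed[name] = self.elapsed.get(name, 0.0) + spent

    def total(self):
        return sum(self.elapsed.values())

    def breakdown(self):
        """(stage, seconds, share of total), most expensive stage first."""
        total = self.total() or 1.0
        rows = [(name, t, t / total) for name, t in self.elapsed.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)

def run_query(timer):
    # Hypothetical stand-ins: each sleep simulates the cost of one module.
    with timer.stage("parse"):
        time.sleep(0.001)
    with timer.stage("optimize"):
        time.sleep(0.002)
    with timer.stage("execute"):
        time.sleep(0.030)
    with timer.stage("storage_read"):
        time.sleep(0.005)

if __name__ == "__main__":
    timer = StageTimer()
    run_query(timer)
    for name, secs, share in timer.breakdown():
        print(f"{name:<13} {secs * 1000:7.2f} ms  {share:6.1%}")
```

A breakdown like this shows which module to tune. Speeding up a stage that accounts for 3% of the time cannot move end-to-end latency much.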

GaussDB performance tuning key technologies

The following sections introduce several key technologies that GaussDB uses for performance optimization.

1. Plan caching

Plan caching is generally used for OLTP workloads. These queries touch little data, so once indexes have sped up data access, parsing, rewriting, and optimization account for a large share of the total time. For statements that follow a template, the template's plan can be cached. When statements from the same template run with different parameters, the cached plan is reused directly, which greatly improves concurrent throughput.

Should the cache belong to a session or to the whole system? This question leads to two concepts: the local (session-level) plan cache and the global plan cache (GPC). If executed plans are cached only per session, each session may accumulate many plans and consume a lot of memory. GPC solves the memory problem well, but its maintenance and management costs are high.

Another problem is that a cached plan serves a whole class of SQL statements, so it may no longer be optimal once the parameters change. GaussDB implements adaptive plan selection, which automatically picks the best cached plan for different parameter values.
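The ideas in this paragraph can be sketched in Python: a shared cache keyed by statement template, plus a plan choice that depends on the parameters. This is not GaussDB's implementation. `normalize`, the two-bucket selectivity rule, and the `planner`/`selectivity` callbacks are all hypothetical. A bounded LRU is used simply as one way to cap the memory a global cache can consume.

```python
import re
from collections import OrderedDict

def normalize(sql):
    """Replace literals with $n placeholders; returns (template, params)."""
    params = []

    def repl(match):
        params.append(match.group(0))
        return f"${len(params)}"

    template = re.sub(r"'[^']*'|\b\d+(?:\.\d+)?\b", repl, sql)
    return template, params

class AdaptivePlanCache:
    """Shared plan cache keyed by (template, selectivity bucket)."""

    def __init__(self, planner, selectivity, capacity=128):
        self.planner = planner          # hypothetical: (template, params) -> plan
        self.selectivity = selectivity  # hypothetical: (template, params) -> 0..1
        self.capacity = capacity
        self.plans = OrderedDict()      # LRU order bounds memory use
        self.hits = 0
        self.misses = 0

    @staticmethod
    def bucket(sel):
        # Selective predicates tend to favour an index scan and unselective ones
        # a sequential scan, so each bucket gets its own cached plan.
        return "low" if sel < 0.05 else "high"

    def get_plan(self, sql):
        template, params = normalize(sql)
        key = (template, self.bucket(self.selectivity(template, params)))
        if key in self.plans:
            self.hits += 1
            self.plans.move_to_end(key)
        else:
            self.misses += 1
            self.plans[key] = self.planner(template, params)
            if len(self.plans) > self.capacity:
                self.plans.popitem(last=False)  # evict least recently used
        return self.plans[key], params

def toy_selectivity(template, params):
    # Pretend the predicate is "id < $1" over ids 0..999.
    return min(float(params[0]), 1000.0) / 1000.0

def toy_planner(template, params):
    return "IndexScan" if toy_selectivity(template, params) < 0.05 else "SeqScan"

if __name__ == "__main__":
    cache = AdaptivePlanCache(toy_planner, toy_selectivity)
    for v in (10, 20, 30, 900, 950):
        plan, _ = cache.get_plan(f"SELECT * FROM t1 WHERE id < {v}")
        print(v, plan)
    print("hits", cache.hits, "misses", cache.misses)  # hits 3 misses 2
```

Five executions of one template produce only two plans here, one per bucket. This is the trade-off between reusing a single plan and re-planning every time.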

2. Intelligent cardinality estimation

Intelligent cardinality estimation mainly answers two questions. First, when should multi-column statistics be created? Second, what type of statistical model should be built? Common solutions rely on MCV (most common values) lists and histograms. These are accurate for single columns. For multiple columns, however, they support only a single table, and their errors are too large for them to be usable.

GaussDB designs a lightweight, in-database Bayesian network model to estimate cardinality in multi-column scenarios. It is built on lightweight DB4AI operators, so training and inference are completed inside the database with almost no impact on the kernel. When statistics are analyzed and collected automatically, a Bayesian network model is created automatically. When the optimizer estimates multi-column cardinality, it calls the trained model to obtain an accurate estimate.
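The gap between the independence assumption and a Bayesian network shows up clearly on a tiny correlated table. The sketch below is a deliberately minimal two-column network A -> B, where P(A, B) = P(A) * P(B | A) is stored as frequency tables. With only two nodes this factorization is exact. A real model over many columns would instead have to learn which dependencies to keep. The column names and data are invented for illustration and say nothing about GaussDB's DB4AI operators.

```python
from collections import Counter

def independence_estimate(rows, col_a, a, col_b, b):
    """Classic estimate N * P(A=a) * P(B=b), assuming the columns are independent."""
    n = len(rows)
    p_a = sum(r[col_a] == a for r in rows) / n
    p_b = sum(r[col_b] == b for r in rows) / n
    return n * p_a * p_b

class TwoNodeBayesNet:
    """Bayesian network A -> B: P(A, B) = P(A) * P(B | A), from frequency tables."""

    def __init__(self, rows, col_a, col_b):
        self.n = len(rows)
        self.count_a = Counter(r[col_a] for r in rows)
        self.count_ab = Counter((r[col_a], r[col_b]) for r in rows)

    def estimate(self, a, b):
        if self.count_a[a] == 0:
            return 0.0
        p_a = self.count_a[a] / self.n
        p_b_given_a = self.count_ab[(a, b)] / self.count_a[a]
        return self.n * p_a * p_b_given_a

# Invented, strongly correlated sample: the city determines the province.
ROWS = ([{"city": "Shenzhen", "province": "Guangdong"}] * 600
        + [{"city": "Guangzhou", "province": "Guangdong"}] * 300
        + [{"city": "Hangzhou", "province": "Zhejiang"}] * 100)

if __name__ == "__main__":
    bn = TwoNodeBayesNet(ROWS, "city", "province")
    for city, prov in [("Shenzhen", "Guangdong"), ("Hangzhou", "Guangdong")]:
        print(city, prov,
              round(independence_estimate(ROWS, "city", city, "province", prov)),
              round(bn.estimate(city, prov)))
```

On this data the independence estimate is 540 rows for the first predicate, whose true count is 600. For the second predicate it estimates 90 rows, against a true count of 0. The two-node network returns the exact counts in both cases.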

3. Distributed query execution

GaussDB uses several techniques to speed up distributed query execution. Its distributed execution framework lets multiple nodes in the cluster process a query in parallel, which improves overall cluster performance. For complex statements, compute-heavy operators such as aggregation (AGG) are pushed down to the DNs. When pushed-down operators run, data locality is taken into account so that computation happens locally wherever possible. This reduces network transfer and improves overall query performance. Within a single node, SMP parallelism exploits multi-core CPUs and, combined with memory management and control, improves single-node performance.
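Aggregate pushdown can be modelled in a few lines. Rows are hash-distributed across DNs by a distribution key. Each DN computes a partial SUM and COUNT over its local rows, and the CN merges the small partial results into the final AVG. This is an illustrative sketch, not GaussDB's executor, and all function names are hypothetical. The thread pool only models the DNs working concurrently; Python threads will not actually speed up this CPU-bound loop.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def distribute(rows, dist_key, num_dns):
    """Hash-distribute rows across DNs by the distribution key."""
    dns = [[] for _ in range(num_dns)]
    for row in rows:
        dns[hash(row[dist_key]) % num_dns].append(row)
    return dns

def dn_partial_agg(rows, group_col, val_col):
    """Runs on each DN: local (sum, count) per group; only this is shipped."""
    partial = defaultdict(lambda: [0, 0])
    for row in rows:
        acc = partial[row[group_col]]
        acc[0] += row[val_col]
        acc[1] += 1
    return dict(partial)

def cn_final_agg(partials):
    """Runs on the CN: merge partial (sum, count) pairs and compute AVG per group."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            merged[group][0] += s
            merged[group][1] += c
    return {group: (s, c, s / c) for group, (s, c) in merged.items()}

def avg_by_group(rows, dist_key, group_col, val_col, num_dns=3):
    dns = distribute(rows, dist_key, num_dns)
    with ThreadPoolExecutor(max_workers=num_dns) as pool:  # DNs work concurrently
        partials = list(pool.map(lambda part: dn_partial_agg(part, group_col, val_col), dns))
    return cn_final_agg(partials)

if __name__ == "__main__":
    rows = [{"id": i, "g": i % 2, "v": i} for i in range(10)]
    print(avg_by_group(rows, "id", "g", "v"))  # {0: (20, 5, 4.0), 1: (25, 5, 5.0)}
```

AVG is not directly mergeable, so each DN ships (sum, count) rather than its local average. The same split works for any aggregate that can be decomposed into mergeable partial states.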

4. GTM Lite

GTM is responsible for the consistency of global transactions. A traditional GTM has to maintain information about every active transaction, and a job that needs a snapshot obtains the current list of active transactions from GTM. When there are very many active transactions, this puts heavy pressure on the network under high concurrency.

GaussDB implements GTM Lite, which replaces the active transaction list with a CSN (commit sequence number). A snapshot is just a CSN, and committing a transaction only requires atomically incrementing the CSN. Visibility is decided by CSN. If the commit CSN of the transaction that wrote a tuple is smaller than the CSN in the snapshot, the tuple is visible; otherwise it is invisible.
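The CSN-based visibility rule can be sketched as a single counter, with some simplifications. A lock stands in for the atomic fetch-and-add, and `None` marks a tuple whose writer has not committed. Real systems also have to handle transactions that are in the middle of committing, which this sketch omits. `CsnManager` and `is_visible` are hypothetical names.

```python
import threading

class CsnManager:
    """Toy GTM-Lite-style CSN source; `_next_csn` is the CSN the next commit gets."""

    def __init__(self):
        self._next_csn = 1
        self._lock = threading.Lock()  # stands in for an atomic fetch-and-add

    def snapshot(self):
        # A snapshot is just the current CSN, not a list of active transactions.
        with self._lock:
            return self._next_csn

    def commit(self):
        # Committing atomically takes the current CSN and advances the counter.
        with self._lock:
            csn = self._next_csn
            self._next_csn += 1
            return csn

def is_visible(tuple_commit_csn, snapshot_csn):
    """Visible only if the writer committed with a CSN smaller than the snapshot's."""
    return tuple_commit_csn is not None and tuple_commit_csn < snapshot_csn

if __name__ == "__main__":
    gtm = CsnManager()
    c1 = gtm.commit()      # 1: committed before the snapshot
    snap = gtm.snapshot()  # 2
    c2 = gtm.commit()      # 2: committed after the snapshot
    print(is_visible(c1, snap), is_visible(c2, snap), is_visible(None, snap))
    # True False False
```

The snapshot is a single integer no matter how many transactions are active. That is why it is cheap to obtain from GTM under high concurrency.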

5. Log parallel pipeline

The logging system is critical to a database because it guarantees data persistence. Log records have ordering dependencies, so traditional databases generally flush logs serially. GaussDB instead uses a parallel write mechanism for its log writer (log flushing) threads, making full use of multi-channel I/O.

The core idea of the GaussDB log parallel pipeline is to split one large lock into multiple small locks that can be held in parallel. Worker threads write their transaction logs into a shared transaction log buffer, and a global LSN advanced by atomic addition guarantees ordering. Before each transaction ends, it ensures that its log, up to the corresponding LSN, has been flushed to disk.
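The split between a tiny LSN-reservation step and a parallel copy step can be sketched with Python threads, under several assumptions. `ParallelWalBuffer` is a hypothetical name, and a short critical section stands in for the atomic add on the global LSN. The `disk` bytearray simulates sequential writes, and a committing worker simply waits until the contiguous flushed prefix covers its own end LSN. For brevity the flush runs inline under a second lock, so the sketch models the split-lock insert path, not multiple log writer threads flushing in parallel.

```python
import threading

class ParallelWalBuffer:
    """Toy WAL buffer with parallel inserts and an ordered (simulated) flush."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.disk = bytearray()            # simulated log file
        self.flushed_lsn = 0               # log below this LSN is "durable"
        self._next_lsn = 0
        self._lsn_lock = threading.Lock()  # guards only the global LSN
        self._done_lock = threading.Lock()
        self._flushed = threading.Condition(self._done_lock)
        self._done = {}                    # start LSN -> end LSN of copied records

    def insert(self, record: bytes) -> int:
        with self._lsn_lock:               # stand-in for an atomic add
            start = self._next_lsn
            end = start + len(record)
            if end > len(self.buf):
                raise BufferError("WAL buffer full")
            self._next_lsn = end
        self.buf[start:end] = record       # copied without any global lock
        with self._done_lock:
            self._done[start] = end
            # Flush every contiguous finished range, strictly in LSN order.
            while self.flushed_lsn in self._done:
                new_end = self._done.pop(self.flushed_lsn)
                self.disk += self.buf[self.flushed_lsn:new_end]
                self.flushed_lsn = new_end
            self._flushed.notify_all()
        return end

    def wait_flushed(self, lsn: int) -> None:
        """A committing transaction blocks until its log up to `lsn` is durable."""
        with self._flushed:
            self._flushed.wait_for(lambda: self.flushed_lsn >= lsn)

def worker(wal, tid, n):
    for j in range(n):
        end = wal.insert(f"T{tid}-{j};".encode())
        wal.wait_flushed(end)              # "commit" only after the log is flushed

if __name__ == "__main__":
    wal = ParallelWalBuffer(1 << 16)
    threads = [threading.Thread(target=worker, args=(wal, t, 50)) for t in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(wal.flushed_lsn, len(wal.disk))
```

Records from different workers interleave in the log, but the flushed bytes are always a gap-free prefix in LSN order. A commit therefore never reports success while an earlier LSN is still missing.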

6. NUMA Aware

GaussDB uses NUMA-aware techniques to address cross-NUMA memory access latency on multi-core ARM servers. The main measures are:

  • Key global data structures, including CLOG, the WAL insert locks, and the proc array, are partitioned by NUMA node to reduce data access latency;

  • Kernel worker threads are bound to NUMA nodes to avoid cross-NUMA scheduling;

  • The LSE (Large System Extensions) atomic instructions are used to improve computing performance.
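Partitioning a global structure by NUMA node can be illustrated in miniature. Each worker is placed on a node and touches only that node's partition and lock, and readers combine the partitions. This sketches only the data-structure idea, and all names are assumed. Python cannot control memory placement, and the thread-to-node binding described above would in practice go through OS affinity interfaces that are not shown here.

```python
import threading

class NumaPartitionedCounter:
    """A global counter split into one partition (with its own lock) per NUMA node."""

    def __init__(self, num_nodes):
        self.parts = [0] * num_nodes
        self.locks = [threading.Lock() for _ in range(num_nodes)]

    def add(self, node, delta=1):
        with self.locks[node]:         # contention stays within one node
            self.parts[node] += delta

    def total(self):
        # Readers pay the cross-node cost instead of every writer.
        return sum(self.parts)

def node_of_worker(worker_id, num_nodes):
    """Hypothetical static placement: workers assigned round-robin to nodes."""
    return worker_id % num_nodes

def run(num_nodes=4, num_workers=8, ops=1000):
    counter = NumaPartitionedCounter(num_nodes)

    def work(wid):
        node = node_of_worker(wid, num_nodes)
        for _ in range(ops):
            counter.add(node)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    c = run()
    print(c.parts, c.total())  # [2000, 2000, 2000, 2000] 8000
```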

GaussDB performance tuning application practice

Finally, let's look at the application practice of performance tuning, starting with the overall approach. When you encounter a performance problem, first determine the scope of tuning. Check whether some system resource has become a bottleneck, or whether there is blocked or slow SQL. For a system resource problem, diagnose and optimize through system tuning; for a SQL-level problem, diagnose and optimize through SQL tuning. Then check whether the result meets the business requirements. If one round of optimization falls short, iterate until the goal is reached.
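The iterative loop described above (measure, classify the bottleneck, apply the matching kind of tuning, check again) can be written down as a small driver. Every callable passed in is a hypothetical hook. The metric is assumed to be one where lower is better, such as latency.

```python
def tuning_loop(measure, diagnose, system_tune, sql_tune, target, max_rounds=5):
    """Returns (goal_met, history_of_measurements)."""
    history = []
    for round_no in range(max_rounds + 1):
        metric = measure()
        history.append(metric)
        if metric <= target:               # lower is better, e.g. latency in ms
            return True, history
        if round_no == max_rounds:         # out of iterations
            break
        kind = diagnose()                  # "resource" or "sql"
        if kind == "resource":
            system_tune()                  # OS, hardware, or parameter tuning
        elif kind == "sql":
            sql_tune()                     # plans, indexes, statement rewrites
        else:
            raise ValueError(f"unknown bottleneck kind: {kind!r}")
    return False, history

if __name__ == "__main__":
    state = {"latency_ms": 100}            # invented starting point
    ok, hist = tuning_loop(
        measure=lambda: state["latency_ms"],
        diagnose=lambda: "resource" if state["latency_ms"] > 70 else "sql",
        system_tune=lambda: state.update(latency_ms=state["latency_ms"] - 40),
        sql_tune=lambda: state.update(latency_ms=state["latency_ms"] - 30),
        target=20,
    )
    print(ok, hist)  # True [100, 60, 30, 0]
```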


To locate an overall performance problem in the database system, first determine whether it lies at the database level or elsewhere. A problem elsewhere might be other processes on the node degrading database performance, or misconfigured operating system parameters. If the problem is at the database level, comprehensively analyze the system resources the database uses, the database kernel resources, the primary/standby status, and so on, to determine the root cause.


To analyze the performance of a single SQL statement, query the statement_history or statement view for the time spent in each stage of execution. This reveals where the statement's bottleneck is.

  • statement_history records detailed information about SQL statements whose execution time exceeds the threshold (log_min_duration_statement, default 3 s), including plan generation time, execution time, and lock wait time;

  • statement records execution information normalized by unique_sql_id, including execution count, total execution time, amount of data accessed, and memory usage.

Next, use the wait event views to find blocked sessions and the objects they are waiting on. Finally, examine the execution plan for a finer-grained view of operator execution. For example, check whether an index is used, whether it is the right index, and whether the distributed Stream operator involves broadcasting.
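The normalized view groups executions by unique_sql_id, and statement_history keeps only executions above the threshold. Below is a rough Python analogue of both. It assumes a simplified regex-based fingerprint in place of GaussDB's real normalization, and a list of `(sql, elapsed_ms)` pairs in place of the views. `fingerprint`, `summarize`, and `SLOW_THRESHOLD_MS` are hypothetical names.

```python
import re
from collections import defaultdict

SLOW_THRESHOLD_MS = 3000  # mirrors the 3 s threshold described above

def fingerprint(sql):
    """Simplified normalization: collapse literals and whitespace so statements
    that differ only in constants share one key (a stand-in for unique_sql_id)."""
    s = re.sub(r"'[^']*'", "?", sql)
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)
    return re.sub(r"\s+", " ", s).strip().lower()

def summarize(executions):
    """executions: iterable of (sql, elapsed_ms).
    Returns (per-fingerprint stats, list of individual slow executions)."""
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    slow = []
    for sql, ms in executions:
        entry = stats[fingerprint(sql)]
        entry["calls"] += 1
        entry["total_ms"] += ms
        if ms > SLOW_THRESHOLD_MS:
            slow.append((sql, ms))
    return dict(stats), slow

if __name__ == "__main__":
    stats, slow = summarize([
        ("SELECT * FROM t WHERE id = 1", 5.0),
        ("select * from t where id = 42", 7.0),
        ("UPDATE t SET v = 'a' WHERE id = 3", 4200.0),
    ])
    for key, entry in sorted(stats.items(), key=lambda kv: -kv[1]["total_ms"]):
        print(entry["calls"], entry["total_ms"], key)
    print(slow)
```

Sorting the aggregated entries by total time is usually the first step. A fast statement that runs millions of times can cost more in total than one slow outlier.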


What do GaussDB's full SQL and slow SQL records contain? GaussDB collects multi-dimensional monitoring metrics at different granularities, and each granularity affects system performance to a different degree. Collection is divided into three levels:

  • L0 has essentially no impact on business execution. It collects instance information, statement information, tuple counts, cache statistics, execution time, and so on;

  • L1 has a slight but acceptable impact on the system and is the recommended default when a service goes live. It mainly includes execution plan information and lock statistics;

  • L2 has a significant impact on the system and is usually enabled only for problem diagnosis. It includes fine-grained lock information and wait times.

Two cases follow. In the first, an application upgrade caused sharp performance fluctuations. We first examined the instance's overall execution time to see which stage accounted for the largest share. We then used the normalized view to see the per-stage cost of individual SQL statements, and found that the network send stage was responsible. The application side was replaced, and the problem was resolved once the application was updated.


In the second case, the cluster's overall performance degraded. We first examined thread wait states on the CNs and found that some nodes had extremely long wait times. Analysis of the wait events on the abnormal nodes showed high wait time for wal sync. Comparison with normal nodes confirmed a wal sync problem on those nodes. Further analysis found a large LSN gap between the primary and standby logs, so the standby was constantly replaying logs, which affected overall system performance.
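The comparison step in this case can be sketched as follows: find nodes whose wait time for one event stands far above their peers, then check which events dominate on those nodes. The median-times-factor rule, the input layout, and the sample numbers are illustrative assumptions, not data from the case.

```python
from statistics import median

def abnormal_nodes(wait_ms_by_node, event, factor=3.0):
    """wait_ms_by_node: {node: {event: total_wait_ms}}. Flag nodes whose wait
    time for `event` exceeds `factor` times the median across all nodes."""
    values = {node: events.get(event, 0.0) for node, events in wait_ms_by_node.items()}
    baseline = median(values.values())
    return sorted(node for node, v in values.items() if v > factor * baseline)

def top_wait_events(events, k=3):
    """The k events with the largest total wait time on one node."""
    return sorted(events.items(), key=lambda kv: kv[1], reverse=True)[:k]

if __name__ == "__main__":
    waits = {  # invented numbers
        "dn1": {"wal sync": 120.0, "lock wait": 40.0},
        "dn2": {"wal sync": 110.0, "lock wait": 35.0},
        "dn3": {"wal sync": 9000.0, "lock wait": 50.0},
        "dn4": {"wal sync": 130.0},
    }
    for node in abnormal_nodes(waits, "wal sync"):
        print(node, top_wait_events(waits[node]))
```

Comparing against peer nodes rather than a fixed limit helps when normal wait levels vary with workload. Only the node that diverges from the rest is flagged.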


Finally, GaussDB provides two self-tuning tools, index recommendation and distribution key recommendation, which give users efficient tuning recommendations. For example, index recommendation automatically suggests suitable index combinations based on the business workload and identifies redundant and invalid indexes. With each recommended index, it also lists the SQL statements the index would improve and those it would degrade, to help users decide whether to adopt it.
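Two of the heuristics mentioned can be sketched in Python: finding frequently filtered columns in a workload, and spotting indexes made redundant by another index. This is a simplified stand-in, not the GaussDB advisor, and all names and thresholds are hypothetical. It uses weighted column frequency instead of cost-based evaluation, so it cannot produce the per-SQL positive and negative impact that the tool reports.

```python
from collections import Counter

def recommend_columns(workload, min_share=0.2):
    """workload: list of (columns_in_predicates, weight). Return single-column
    index candidates whose weighted share of the workload is >= min_share."""
    total = sum(weight for _, weight in workload)
    usage = Counter()
    for cols, weight in workload:
        for col in set(cols):
            usage[col] += weight
    return sorted(col for col, w in usage.items() if w / total >= min_share)

def redundant_indexes(indexes):
    """indexes: {name: (table, [columns])}. An index is redundant if its columns
    are a leading prefix of another index on the same table; of two identical
    indexes, the one with the larger name is reported."""
    redundant = []
    for name, (table, cols) in indexes.items():
        for other, (other_table, other_cols) in indexes.items():
            if other == name or other_table != table:
                continue
            is_prefix = len(cols) < len(other_cols) and other_cols[:len(cols)] == cols
            is_duplicate = cols == other_cols and name > other
            if is_prefix or is_duplicate:
                redundant.append((name, other))
                break
    return sorted(redundant)

if __name__ == "__main__":
    workload = [(["customer_id"], 50), (["customer_id", "order_date"], 30),
                (["status"], 5), (["order_date"], 15)]
    print(recommend_columns(workload))  # ['customer_id', 'order_date']
    print(redundant_indexes({
        "idx_a": ("orders", ["customer_id"]),
        "idx_ab": ("orders", ["customer_id", "order_date"]),
    }))  # [('idx_a', 'idx_ab')]
```

Prefix redundancy is a common rule of thumb. Even so, an index that enforces a unique constraint should not be dropped just because it is a prefix of another index.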

That's all for this article. Thank you.


Source: blog.csdn.net/GaussDB/article/details/133157851