GaussDB Technical Interpretation Series: SQL Audit, a SQL audit tool for application development

This article is shared from the Huawei Cloud Community post "GaussDB Technical Interpretation Series: SQL Audit, SQL Audit Tool for Application Development" by Huawei Cloud Database and Application Migration Expert.

Foreword

Let's start with a SQL statement (take a traditional stand-alone database as an example).

Perhaps this is a SQL statement hidden somewhere in our business code. To an ordinary developer it looks neatly written and logically clear, with nothing wrong, so it could be pushed straight to the code repository and go live. An experienced developer or database administrator, however, may spot many points to optimize:

  • Is there an index on the id field of both tables?

  • The LIKE pattern does not follow the leftmost-prefix matching principle; can it be rewritten?

  • The create_time condition in the WHERE clause on test_1 does not keep the column alone on one side of the comparison, so it cannot use an index; it can be rewritten.

  • UNION deduplicates the result set, which is inefficient; can it be replaced with UNION ALL?

  • The id field of test_2 is wrapped in a function and may not use the index; this can be optimized.

  • Does test_2 have a composite index on id and name? Could a hint that specifies a particular index improve query performance?
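Two of the points above can be illustrated with a small, hedged sketch. It uses Python's built-in sqlite3 module as a stand-in engine (not GaussDB), and the schema, data, and index name are assumptions made up for the demonstration; only the table name test_1 and the column create_time come from the text. It shows that arithmetic on the column side of a comparison keeps the planner from searching the index, and that UNION removes duplicates while UNION ALL keeps them.

```python
import sqlite3

# Stand-in engine: SQLite, not GaussDB. Schema, data and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_1 (id INTEGER, name TEXT, create_time INTEGER);
    CREATE INDEX idx_test_1_create_time ON test_1 (create_time);
    INSERT INTO test_1 VALUES (1, 'a', 90), (2, 'b', 100), (3, 'b', 110);
""")

def plan(sql):
    """Return the first line of SQLite's query plan for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

# Arithmetic on the column side hides the column from the index.
plan_original = plan("SELECT name FROM test_1 WHERE create_time + 1 > 100")
# Moving the arithmetic to the constant side leaves the column alone.
plan_rewritten = plan("SELECT name FROM test_1 WHERE create_time > 100 - 1")

# UNION removes duplicate rows; UNION ALL keeps them and skips that work.
union_rows = conn.execute(
    "SELECT name FROM test_1 WHERE id = 2 "
    "UNION SELECT name FROM test_1 WHERE id = 3"
).fetchall()
union_all_rows = conn.execute(
    "SELECT name FROM test_1 WHERE id = 2 "
    "UNION ALL SELECT name FROM test_1 WHERE id = 3"
).fetchall()

print(plan_original)
print(plan_rewritten)
print(len(union_rows), len(union_all_rows))
```

Plan wording differs between engines and versions, so the printed plans should be read as an illustration only. Replacing UNION with UNION ALL is safe only when duplicates cannot occur or are acceptable to the caller.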

After this analysis the SQL seems ready to take on a new look and fly on the database, but is that the end? Not really. It may be tuned to the limit on a standalone database, but what if the database is distributed? New performance problems may appear: we need to consider whether the id in the WHERE condition is a distribution key, whether the concat function prevents operator pushdown, and so on.

This is the situation we actually face. Developers' technical skills are uneven and DBAs' database knowledge has its limits, so bad SQL is everywhere. Moreover, as the database keeps changing and evolving, some good SQL may gradually turn into bad SQL that needs optimizing, so we have to keep tracking it down.

Introduction to the SQL Audit Tool

Huawei has many business departments. They make heavy use of traditional standalone databases, MySQL, PostgreSQL, and other databases, and have long been plagued by bad SQL. With the large-scale adoption of GaussDB in internal business systems, keeping existing SQL running with high quality on GaussDB has also become a challenge, so we developed the SQL Audit tool. Drawing on the SQL development specifications accumulated by business departments across the company and on years of GaussDB best practices, we compiled hundreds of SQL audit rules. They allow in-depth analysis and review of common problems that affect SQL quality, including table structure and index design, SQL performance optimization, distribution keys, and operator pushdown. We have also developed plug-ins that integrate directly into the development pipeline and automatically obtain SQL statements from the code repository for one-click auditing.

The core process of SQL audit can be divided into the following three stages:

SQL acquisition: the channels from which the SQL to be audited can be obtained. Acquisition capability determines how comprehensively the code under development can be audited.

SQL syntax analysis: generating and analyzing the syntax tree for each SQL statement.

SQL rule audit: breaking down each part of the SQL statement, matching it against the relevant audit rules one by one, identifying optimization points and risks, and finally producing an audit report.

SQL acquisition

Customers access the database through SQL over many channels, such as client tools, command lines, SQL scripts, and application code.

Application code can in turn use JDBC, ODBC, low-level API calls, and other methods. SQL statements can be spliced directly in the code, written in configuration files (for example, MyBatis), or generated by ORM frameworks (for example, Hibernate), so obtaining all of a customer's SQL is very difficult.

SQL Audit supports most current SQL usage scenarios and keeps expanding its acquisition scope, with the goal of auditing all SQL that customers use. The following figure shows the SQL acquisition scope supported by the current SQL Audit tool.

  • Manual input

Manual input gives customers a simple, easy-to-use platform. They can enter their own SQL statements into the SQL Audit tool at any time and adjust the statements directly according to the audit results. SQL can also be uploaded for batch review.

  • Source code

Source code is the main source of bad SQL, but because of the variety of programming languages (C/C++/Java/Go/Python/Shell...), it is hard to obtain all of that SQL completely. We divide the SQL in code into three categories:

1) SQL spliced in source code

The SQL statement is built by string splicing, and the splicing may involve many variables. In this case the complete SQL cannot be recovered, so extracting SQL from static files has serious limitations. The SQL Audit tool supports syntax parsing of Java code to extract the SQL inside; code in other languages is not currently supported.

2) ORM frameworks without SQL

With ORM frameworks such as Hibernate and SQLAlchemy, SQL statements cannot be obtained from the code. The SQL Audit tool provides Java-based binary rewriting technology that dynamically monitors the JDBC API to capture SQL statements while the JVM is running.

3) ORM frameworks with configured SQL

Many business systems build their database access on the MyBatis framework, which defines SQL statements through annotations or configuration files. The SQL Audit tool performs in-depth analysis of MyBatis annotations and configuration files, and its SQL extraction success rate is over 99%.
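As a rough illustration of extracting SQL from a MyBatis configuration file, the sketch below reads the statement elements of a mapper XML document with Python's standard library. It is not the SQL Audit tool's implementation (which the article does not show); the mapper content, the namespace, and the function name extract_statements are assumptions for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapper file content; namespace, ids and SQL are made up.
MAPPER_XML = """
<mapper namespace="com.example.UserMapper">
  <select id="findById" resultType="User">
    SELECT id, name FROM test_1 WHERE id = #{id}
  </select>
  <update id="rename">
    UPDATE test_1 SET name = #{name} WHERE id = #{id}
  </update>
</mapper>
"""

STATEMENT_TAGS = ("select", "insert", "update", "delete")

def extract_statements(mapper_xml):
    """Return {statement id: SQL text} with #{...} placeholders turned into '?'."""
    root = ET.fromstring(mapper_xml)
    statements = {}
    for element in root:
        if element.tag not in STATEMENT_TAGS:
            continue
        # itertext() also collects text nested in dynamic tags such as <if>,
        # which simply flattens dynamic SQL; a real extractor would need to
        # expand those branches instead.
        text = " ".join("".join(element.itertext()).split())
        statements[element.get("id")] = re.sub(r"#\{[^}]*\}", "?", text)
    return statements

statements = extract_statements(MAPPER_XML)
for statement_id, sql in statements.items():
    print(statement_id, "->", sql)
```

Replacing the placeholders with `?` yields statements that a SQL parser can accept. Dynamic SQL and annotation-based mappers need considerably more work than this sketch covers.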

  • Database objects

The design of table structures, indexes, and constraints, and the writing of PL/SQL such as stored procedures and functions, play a decisive role in database performance. The SQL Audit tool can connect to the database, obtain all object definitions, and give audit suggestions covering design standardization (for example, naming conventions and length/case restrictions), rationality (for example, whether an index is reasonable), and performance.

  • Database logs

To capture the SQL that actually runs in the database more comprehensively, starting from the database's own logs is also a feasible approach. Analyzing the redo log, enabling the database audit log, and querying the SQL cache can all effectively yield the SQL statements being executed. The SQL Audit tool supports obtaining SQL statements through database logs as well.

  • Traffic capture

To address the fact that not all SQL can be obtained from source code, we developed a SQL audit capability based on traffic capture, which greatly improves the completeness of SQL acquisition. The IP address and port are the unified entry point to the database, so traffic through them contains essentially all SQL statements generated by customer business and by operations and maintenance. Network protocol packets are obtained by bypass monitoring of the database server port. After the database network protocol is parsed and repeated SQL is filtered out, the valid SQL statements are passed to the SQL Audit tool for auditing.
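The article does not say how repeated SQL is filtered. One common approach, sketched here as an assumption rather than as the tool's method, is to reduce each captured statement to a fingerprint by replacing literals with placeholders and normalizing whitespace and case, then keep one statement per fingerprint. The function names and sample statements are hypothetical.

```python
import re

def fingerprint(sql):
    """Normalize a statement so that copies differing only in literals match."""
    text = re.sub(r"'(?:[^']|'')*'", "?", sql)       # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals
    text = " ".join(text.split())                    # collapse whitespace
    return text.rstrip(";").lower()

def deduplicate(captured):
    """Keep the first statement seen for each fingerprint, in capture order."""
    unique = {}
    for sql in captured:
        unique.setdefault(fingerprint(sql), sql)
    return list(unique.values())

# Hypothetical statements as they might appear after protocol parsing.
captured = [
    "SELECT name FROM test_1 WHERE id = 1",
    "select name  from test_1 where id = 42;",
    "SELECT name FROM test_1 WHERE name = 'a''b'",
    "SELECT name FROM test_1 WHERE name = 'c'",
]
unique_sql = deduplicate(captured)
for sql in unique_sql:
    print(sql)
```

Lower-casing the whole statement is a simplification: it would also merge quoted identifiers that differ only in case, which a production filter would need to treat separately.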

SQL parsing

SQL parsing turns SQL statements into syntax trees according to grammar rules. The usual process is lexical analysis followed by syntax analysis, which produces the syntax tree. Most SQL analysis tools are implemented by traversing the syntax tree directly. The SQL Audit tool does not work on the syntax tree directly; it adds a step that converts the syntax tree into Java description classes. All subsequent audit rules are written against these description classes, which greatly improves the efficiency of developing audit rules and reduces their difficulty.
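The idea of a description class can be sketched in a few lines. The example below is a toy written in Python rather than Java: it handles only the form SELECT columns FROM table [WHERE ...], and the class name SelectDescription and its fields are assumptions, not the tool's actual classes. Lexical analysis produces tokens, a minimal syntax step locates the clauses, and the result is a plain object that rules can query without walking a tree.

```python
import re
from dataclasses import dataclass, field

# String literals first, then words, then multi-character and single operators.
TOKEN_PATTERN = re.compile(r"'(?:[^']|'')*'|\w+|<=|>=|<>|!=|[^\s\w]")

@dataclass
class SelectDescription:
    """Hypothetical description class: a flat view of one SELECT statement."""
    columns: list = field(default_factory=list)
    table: str = ""
    where_tokens: list = field(default_factory=list)

    @property
    def selects_all_columns(self):
        return "*" in self.columns

def tokenize(sql):
    """Lexical analysis: split the statement into tokens."""
    return TOKEN_PATTERN.findall(sql)

def describe(sql):
    """Syntax analysis for the toy grammar SELECT cols FROM table [WHERE ...]."""
    tokens = tokenize(sql)
    upper = [token.upper() for token in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("only simple SELECT statements are supported")
    from_at = upper.index("FROM")
    where_at = upper.index("WHERE") if "WHERE" in upper else len(tokens)
    columns = [token for token in tokens[1:from_at] if token != ","]
    return SelectDescription(columns, tokens[from_at + 1], tokens[where_at + 1:])

description = describe("SELECT id, name FROM test_2 WHERE name LIKE '%abc'")
print(description)
```

A rule author can then write `description.selects_all_columns` or inspect `description.where_tokens` instead of navigating parser-specific node types, which is the benefit the paragraph above describes.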

SQL Audit

  • Rich audit rules

The core of auditing is the audit rules, and the core of the audit rules is an understanding of the database combined with practical experience of customers' business development. Combining GaussDB best practices with the actual usage scenarios of internal and external customers, we have compiled hundreds of audit rules. The product currently supports 78 of them, covering common specification and performance problems in SQL development, and more rules will be added over time.

SQL Audit also provides a template configuration function, and customers can flexibly select the rules to be audited according to their own business scenarios.

  • In-depth review

The SQL Audit review process is shown in the following figure:

 

When a SQL statement is input into SQL Audit, it is first parsed, and then the metadata (column information, index information, etc.) of the objects it involves is obtained. Because a statement's performance may also depend on its execution plan, the execution plan is then obtained from the database. All of this information is combined and matched against each relevant rule one by one, and finally all rule violations are output.
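The review flow can be sketched as a small rule engine. Everything in the block is illustrative: the rule names, messages, and the StatementContext class are assumptions and are not taken from the product's 78 rules, and the plan operator name "Seq Scan" is assumed to follow PostgreSQL-style plan output. The template argument stands in for the template configuration function mentioned earlier, selecting which rules run.

```python
from dataclasses import dataclass, field

@dataclass
class StatementContext:
    """Hypothetical bundle of what the review combines for one statement."""
    sql: str
    columns: list
    where_tokens: list
    indexed_columns: set = field(default_factory=set)    # metadata
    plan_operators: list = field(default_factory=list)   # execution plan summary

def rule_select_star(ctx):
    if "*" in ctx.columns:
        return "avoid SELECT *; list the required columns"

def rule_leading_wildcard(ctx):
    for previous, token in zip(ctx.where_tokens, ctx.where_tokens[1:]):
        if previous.upper() == "LIKE" and token.startswith("'%"):
            return "LIKE pattern starts with a wildcard and cannot use an index"

def rule_full_scan_on_indexed_column(ctx):
    filtered = ctx.indexed_columns.intersection(ctx.where_tokens)
    if filtered and "Seq Scan" in ctx.plan_operators:
        return "plan scans the whole table although %s is indexed" % sorted(filtered)

RULES = {
    "select_star": rule_select_star,
    "leading_wildcard": rule_leading_wildcard,
    "full_scan_on_indexed_column": rule_full_scan_on_indexed_column,
}

def audit(ctx, template):
    """Run the rules enabled in the template and collect violations."""
    report = []
    for name in template:
        message = RULES[name](ctx)
        if message:
            report.append((name, message))
    return report

ctx = StatementContext(
    sql="SELECT * FROM test_2 WHERE name LIKE '%abc'",
    columns=["*"],
    where_tokens=["name", "LIKE", "'%abc'"],
    indexed_columns={"name"},
    plan_operators=["Seq Scan"],
)
report = audit(ctx, template=list(RULES))
for name, message in report:
    print(name + ": " + message)
```

Keeping each rule as a small function over the combined context means a rule can use syntax, metadata, or plan information as needed, and a narrower template such as `["select_star"]` runs only the selected checks.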

Practical case

Part of the business code of a system in Huawei Cloud is developed on the Java MyBatis framework. While the database was being replaced with GaussDB, a large amount of SQL was modified for compatibility. To ensure that the modified SQL runs with high quality on GaussDB, the team used the SQL Audit tool to audit the entire code repository and deployed the SQL Audit plug-in in the pipeline to keep guarding incremental code. SQL Audit found a large number of non-standard and low-performance SQL statements, heading off the risk of bad SQL reaching the production environment. Developers optimized the code according to the audit report, and the business has continued to run stably since switching to GaussDB.

Taking one of the tasks as an example, the task involved a total of 1,881 SQL statements, and the audit found more than 300 with problems.

Statistical report of audit results

Details of problematic SQL found by the audit

Summary

While building its core competitiveness, GaussDB hopes to provide customers with a full-process, full-link, autopilot-like database experience for both development and operations. The automatic SQL audit tool we released this year helps customers write good SQL and reject bad SQL during development.

In the future, we will add PL/SQL auditing, covering stored procedures, functions, triggers, packages, and so on. We also plan to combine the tool with large AI models, which already handle the SQL language well: SQL Audit will interface with Huawei's Pangu large model and use its capabilities to enhance auditing, optimization, and rewriting.


Origin my.oschina.net/u/4526289/blog/10092484