Explore the features of Apache ShardingSphere SQL Parse Format

Chen Chuxin, SphereEx middleware R&D engineer, Apache ShardingSphere Committer, currently focuses on the research and development of Apache ShardingSphere kernel modules.

Anyone who works with databases regularly has come across extremely complex SQL. Take the following statement as an example: can you immediately tell what it means?

select a.order_id,a.status,sum(b.money) as money from t_order a inner join (select c.order_id as order_id, c.number * d.price as money from t_order_detail c inner join t_order_price d on c.s_id = d.s_id) b on a.order_id = b.order_id where b.money > 100 group by a.order_id

Is it easier to understand after formatting?

SELECT a . order_id , a . status , SUM(b . money) AS money
FROM t_order a INNER JOIN 
(
        SELECT c . order_id AS order_id, c . number * d . price AS money
        FROM t_order_detail c INNER JOIN t_order_price d ON c . s_id = d . s_id
) b ON a . order_id = b . order_id
WHERE 
        b . money > 100
GROUP BY a . order_id;

The first step in analyzing complex SQL is usually to format it; the semantics can then be analyzed further from the formatted text. SQL formatting is also a must-have feature of many database-related tools. To meet this need, Apache ShardingSphere has launched its own SQL formatting tool, SQL Parse Format, built on its own database dialect parsing engine.

SQL Parse Format is one of the functions of the Apache ShardingSphere parsing engine and will be the basis for the SQL auditing function planned for future releases. This article introduces the SQL Parse Format function in simple terms: its underlying principles, its usage, and how to participate in its development.

Parser Engine

As one of the functions of the Apache ShardingSphere parsing engine, SQL Parse Format is a unique and relatively independent function in the parsing engine. To understand the SQL Parse Format function, you need to first understand the parsing engine of Apache ShardingSphere.

The original purpose of the Apache ShardingSphere parsing engine was to extract key information from SQL, such as the fields used for database and table sharding, the columns to be encrypted and rewritten, and so on. As Apache ShardingSphere has developed, the parsing engine has gone through three generations.

The first-generation parsing engine used Druid as the SQL parser. It was used in versions before 1.4.x and had excellent performance. The second-generation parsing engine was fully self-developed. Because its purpose was different, it adopted a semi-understanding approach to SQL: it extracted only the context information that data sharding cares about, without generating a parse tree or traversing it a second time, so performance and compatibility were further improved. The third-generation parsing engine uses ANTLR as the parser generator to generate a parse tree, and then traverses the parse tree a second time to extract context information. With ANTLR as the parser generator, SQL compatibility is greatly improved, and many functions of Apache ShardingSphere can be developed quickly on this foundation. Version 5.0.x also made many performance optimizations to the third-generation parsing engine, including changing the traversal method from Listener to Visitor and adding parse result caching for precompiled SQL statements.

The SQL Parse Format function was made possible by the third-generation parsing engine. Next, let's focus on the SQL Parse Format function itself.

SQL Parse Format

SQL Parse Format is a tool for formatting SQL statements. In the future it will also be used by the SQL auditing function, making it convenient for users to view historical SQL, display formatted SQL in reports, or analyze and process SQL further.

For example, SQL Parse Format turns the first statement below into the form shown after it. Line breaks and capitalized keywords make each part of the SQL stand out more clearly.

select age as b, name as n from table1 join table2 where id = 1 and name = 'lu';
-- After formatting
SELECT age AS b, name AS n
FROM table1 JOIN table2
WHERE 
        id = 1
        and name = 'lu';

After understanding the basic functions of SQL Parse Format, let's explore the principles behind SQL Parse Format.

Interpretation of the principle of SQL Parse Format

Take the following SQL as an example, let's explore how it is formatted in Apache ShardingSphere.

select order_id from t_order where status = 'OK'

Apache ShardingSphere uses ANTLR4 as the parser generator, so we first define the select grammar in a .g4 file following the ANTLR4 conventions (taking MySQL as an example).

simpleSelect
    : SELECT ALL? targetList? intoClause? fromClause? whereClause? groupClause? havingClause? windowClause?
    | SELECT distinctClause targetList intoClause? fromClause? whereClause? groupClause? havingClause? windowClause?
    | valuesClause
    | TABLE relationExpr
    ;

We can easily view the syntax tree generated by SQL through IDEA's ANTLR4 plugin (https://plugins.jetbrains.com/plugin/7358-antlr-v4).

ANTLR4 compiles the grammar file we defined. It first performs lexical analysis on the SQL, splitting it into indivisible parts called tokens, and classifies these tokens into keywords, expressions, literals and operators according to the dictionary values provided by the different databases. For example, in the image above we get the keywords SELECT, FROM, WHERE and =, and the variables order_id, t_order, status and OK. ANTLR4 then converts the output of the parser into the syntax tree shown above.
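To make the lexical analysis step more concrete, here is a minimal sketch of a hand-written lexer for the example statement. It is only an illustration: the class name ToySqlLexer, its three-word keyword dictionary and its regular expression are assumptions made for this example, and the real lexer in ShardingSphere is generated by ANTLR4 from the .g4 files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy lexer, not the ANTLR4-generated lexer used by ShardingSphere.
final class ToySqlLexer {

    // A tiny, hypothetical keyword dictionary; real dialects define far more keywords.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("SELECT", "FROM", "WHERE"));

    // One alternative per token shape: quoted string literal, word, or comparison operator.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|[A-Za-z_][A-Za-z0-9_]*|[=<>]");

    // Splits the SQL into tokens and labels each one as "KIND:text".
    static List<String> tokenize(final String sql) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sql);
        while (matcher.find()) {
            String text = matcher.group();
            result.add(classify(text) + ":" + text);
        }
        return result;
    }

    private static String classify(final String text) {
        if (text.startsWith("'")) {
            return "LITERAL";
        }
        if (KEYWORDS.contains(text.toUpperCase(Locale.ROOT))) {
            return "KEYWORD";
        }
        if (Character.isLetter(text.charAt(0)) || '_' == text.charAt(0)) {
            return "IDENTIFIER";
        }
        return "OPERATOR";
    }

    public static void main(final String[] args) {
        tokenize("select order_id from t_order where status = 'OK'").forEach(System.out::println);
    }
}
```

Running main prints one labeled token per line, for example KEYWORD:select and IDENTIFIER:order_id. A generated lexer does the same kind of classification, but driven by the token rules of each database dialect.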

Combined with the source code in Apache ShardingSphere, the above process is reproduced as follows.

String sql = "select order_id from t_order where status = 'OK'";
CacheOption cacheOption = new CacheOption(128, 1024L, 4);
SQLParserEngine parserEngine = new SQLParserEngine("MySQL", cacheOption, false);
ParseContext parseContext = parserEngine.parse(sql, false);

The SQLParserEngine of Apache ShardingSphere encapsulates and abstracts ANTLR4 parsing. It loads the parser of each database dialect through SPI, and users can add further database dialects through the SPI extension point. In addition, a cache has been added internally to improve performance. Let's focus on the relevant code for parsing.
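Before looking at that code, here is a brief aside on the cache mentioned above. The sketch below shows the general idea of caching parse results by SQL text with a small LRU map from the JDK. The class ParseCacheSketch and its parseCount counter are hypothetical names for this example; the article does not show how ShardingSphere implements its cache, so this should not be read as its actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a bounded LRU cache in front of an arbitrary "parser" function.
final class ParseCacheSketch<V> {

    private final Map<String, V> cache;

    private final Function<String, V> parser;

    private int parseCount;

    ParseCacheSketch(final int maximumSize, final Function<String, V> parser) {
        this.parser = parser;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {

            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, V> eldest) {
                return size() > maximumSize;
            }
        };
    }

    // Returns the cached result when present, otherwise parses and stores the result.
    synchronized V parse(final String sql) {
        V cached = cache.get(sql);
        if (null != cached) {
            return cached;
        }
        parseCount++;
        V parsed = parser.apply(sql);
        cache.put(sql, parsed);
        return parsed;
    }

    // Number of times the underlying parser was actually invoked.
    synchronized int getParseCount() {
        return parseCount;
    }

    public static void main(final String[] args) {
        ParseCacheSketch<String> engine = new ParseCacheSketch<>(2, sql -> "tree(" + sql + ")");
        engine.parse("select 1");
        engine.parse("select 1");
        System.out.println(engine.getParseCount());
    }
}
```

Parsing the same statement twice invokes the underlying parser only once, which is why a cache pays off for SQL that is executed repeatedly.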

public ParseContext parse(final String sql) {
    ParseASTNode result = twoPhaseParse(sql);
    if (result.getRootNode() instanceof ErrorNode) {
        throw new SQLParsingException("Unsupported SQL of `%s`", sql);
    }
    return new ParseContext(result.getRootNode(), result.getHiddenTokens());
}

private ParseASTNode twoPhaseParse(final String sql) {
    DatabaseTypedSQLParserFacade sqlParserFacade = DatabaseTypedSQLParserFacadeRegistry.getFacade(databaseType);
    SQLParser sqlParser = SQLParserFactory.newInstance(sql, sqlParserFacade.getLexerClass(), sqlParserFacade.getParserClass(), sqlCommentParseEnabled);
    try {
        ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.SLL);
        return (ParseASTNode) sqlParser.parse();
    } catch (final ParseCancellationException ex) {
        ((Parser) sqlParser).reset();
        ((Parser) sqlParser).getInterpreter().setPredictionMode(PredictionMode.LL);
        try {
            return (ParseASTNode) sqlParser.parse();
        } catch (final ParseCancellationException e) {
            throw new SQLParsingException("You have an error in your SQL syntax");
        }
    }
}

twoPhaseParse is the core of parsing. First, the parser class matching the database type is loaded, and an ANTLR4 parser instance is created through reflection. Parsing then uses the two prediction modes provided by ANTLR4: fast parsing (SLL) is tried first, and if it fails, regular parsing (LL) is performed. Most SQL can be parsed in fast mode, which improves parsing performance. After parsing, we get the parse tree.
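The control flow of the two-phase approach can be shown without any ANTLR4 dependency. The following is a minimal sketch of the "fast path first, full path on failure" pattern only. The names TwoPhaseParseSketch, FastPathFailedException, fastParse and fullParse are hypothetical, and the rule that the fast path rejects input containing a parenthesis is an arbitrary stand-in: real SLL and LL modes differ in how the parser predicts alternatives, not in such a simple check.

```java
import java.util.function.Function;

// Illustrative only: the fallback pattern behind two-phase parsing, with stand-in strategies.
final class TwoPhaseParseSketch {

    // Hypothetical stand-in for the exception a fast strategy throws when it gives up.
    static final class FastPathFailedException extends RuntimeException {

        FastPathFailedException(final String message) {
            super(message);
        }
    }

    // Tries the fast strategy first and falls back to the full strategy only on failure.
    static <T> T twoPhase(final String sql, final Function<String, T> fastParser, final Function<String, T> fullParser) {
        try {
            return fastParser.apply(sql);
        } catch (final FastPathFailedException ex) {
            return fullParser.apply(sql);
        }
    }

    // Arbitrary stand-in: the fast path refuses any input that contains a parenthesis.
    static String fastParse(final String sql) {
        if (sql.contains("(")) {
            throw new FastPathFailedException("fast path cannot decide: " + sql);
        }
        return "fast:" + sql;
    }

    // The full path accepts everything in this toy example.
    static String fullParse(final String sql) {
        return "full:" + sql;
    }

    public static void main(final String[] args) {
        System.out.println(twoPhase("select 1", TwoPhaseParseSketch::fastParse, TwoPhaseParseSketch::fullParse));
        System.out.println(twoPhase("select (1)", TwoPhaseParseSketch::fastParse, TwoPhaseParseSketch::fullParse));
    }
}
```

The cost of the second attempt is paid only by the statements that the fast path cannot handle, so the common case stays cheap.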

So how does Apache ShardingSphere get the formatted SQL from the parse tree? It does so with a Visitor. ANTLR4 provides two ways to access the syntax tree, Listener and Visitor, and ShardingSphere uses the Visitor. The code below shows how to get the formatted SQL from the syntax tree.

SQLVisitorEngine visitorEngine = new SQLVisitorEngine("MySQL", "FORMAT", new Properties());
String result = visitorEngine.visit(parseContext);

The SQLVisitorEngine of Apache ShardingSphere is likewise an abstraction and encapsulation of the visitors of the various dialects. Its core method is as follows.

public <T> T visit(final ParseContext parseContext) {
    ParseTreeVisitor<T> visitor = SQLVisitorFactory.newInstance(databaseType, visitorType, SQLVisitorRule.valueOf(parseContext.getParseTree().getClass()), props);
    T result = parseContext.getParseTree().accept(visitor);
    appendSQLComments(parseContext, result);
    return result;
}

The visit method above first determines the visitor to use according to the database type and the visitor type; the visitor is also instantiated through reflection internally. Currently, visitorType supports two modes: FORMAT, which formats SQL, and STATEMENT, the mode commonly used by Apache ShardingSphere, which converts SQL into Statement information, extracts the relevant context information, and serves subsequent functions such as database and table sharding. In fact, this is the only difference between the SQL Parse Format function and the ordinary parsing engine function. Next, using the SQL above as an example, let's see how the Visitor formats SQL through specific code.

 

MySQLFormatSQLVisitor is responsible for visiting this SQL. By debugging the code we can clearly see the execution path of the visit, as shown in the following figure. The Visitor traverses each part of the syntax tree, and ANTLR4 generates default methods for visiting each node according to the defined grammar rules. Apache ShardingSphere overrides the key methods to implement SQL formatting.

The following code snippet helps explain how the Visitor implements formatting. When the Visitor reaches the select, it formats it first and then visits the projections; the formatting of the projections themselves is handled by the visitProjections method. A line break is emitted before visiting the FROM clause. The Visitor instance maintains a StringBuilder that stores the formatted result. Because the parser and visitor used for each SQL are newly instantiated objects, there are no thread safety concerns. When the traversal finishes, Apache ShardingSphere outputs the content of the StringBuilder, and we get the formatted SQL.

public String visitQuerySpecification(final QuerySpecificationContext ctx) {
    formatPrint("SELECT ");
    int selectSpecCount = ctx.selectSpecification().size();
    for (int i = 0; i < selectSpecCount; i++) {
        visit(ctx.selectSpecification(i));
        formatPrint(" ");
    }
    visit(ctx.projections());
    if (null != ctx.fromClause()) {
        formatPrintln();
        visit(ctx.fromClause());
    }
    if (null != ctx.whereClause()) {
        formatPrintln();
        visit(ctx.whereClause());
    }
    if (null != ctx.groupByClause()) {
        formatPrintln();
        visit(ctx.groupByClause());
    }
    if (null != ctx.havingClause()) {
        formatPrintln();
        visit(ctx.havingClause());
    }
    if (null != ctx.windowClause()) {
        formatPrintln();
        visit(ctx.windowClause());
    }
    return result.toString();
}
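The same idea can be reduced to a self-contained toy: a visitor that walks a tiny hand-built tree and appends formatted text to a StringBuilder. Everything in this sketch (the ToyFormatVisitorSketch class, its Select and Where nodes, and the eight-space indent) is hypothetical and greatly simplified; it is not the MySQLFormatSQLVisitor, whose tree is generated by ANTLR4.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy tree and visitor, not ShardingSphere's MySQLFormatSQLVisitor.
final class ToyFormatVisitorSketch {

    private static final String INDENT = "        ";

    interface Visitor {

        void visitSelect(Select node);

        void visitWhere(Where node);
    }

    static final class Where {

        final List<String> conditions;

        Where(final List<String> conditions) {
            this.conditions = conditions;
        }

        void accept(final Visitor visitor) {
            visitor.visitWhere(this);
        }
    }

    static final class Select {

        final List<String> projections;

        final String table;

        // May be null when the statement has no WHERE clause.
        final Where where;

        Select(final List<String> projections, final String table, final Where where) {
            this.projections = projections;
            this.table = table;
            this.where = where;
        }

        void accept(final Visitor visitor) {
            visitor.visitSelect(this);
        }
    }

    // Collects the formatted text in a StringBuilder while it walks the tree.
    static final class FormatVisitor implements Visitor {

        private final StringBuilder result = new StringBuilder();

        @Override
        public void visitSelect(final Select node) {
            result.append("SELECT ").append(String.join(", ", node.projections));
            result.append('\n').append("FROM ").append(node.table);
            if (null != node.where) {
                result.append('\n');
                node.where.accept(this);
            }
            result.append(';');
        }

        @Override
        public void visitWhere(final Where node) {
            result.append("WHERE");
            for (int i = 0; i < node.conditions.size(); i++) {
                result.append('\n').append(INDENT).append(0 == i ? "" : "AND ").append(node.conditions.get(i));
            }
        }

        String getResult() {
            return result.toString();
        }
    }

    // A new visitor per call, so no state is shared between formatting runs.
    static String format(final Select select) {
        FormatVisitor visitor = new FormatVisitor();
        select.accept(visitor);
        return visitor.getResult();
    }

    public static void main(final String[] args) {
        Select select = new Select(Arrays.asList("order_id"), "t_order", new Where(Arrays.asList("status = 'OK'")));
        System.out.println(format(select));
    }
}
```

Creating a new FormatVisitor for every call mirrors the point made above: since each visitor owns its own StringBuilder, no synchronization is needed.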

With the above analysis and code, readers should now have a rough understanding of how SQL Parse Format works.

SQL Parse Format User Guide

Once the principle is understood, using SQL Parse Format is very simple.

For Java applications, you only need to add the dependencies and call the API.

  • import dependencies
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-sql-parser-engine</artifactId>
    <version>${project.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-sql-parser-mysql</artifactId>
    <version>${project.version}</version>
</dependency>
  • call the API
public static void main(String[] args) {
    String sql = "select order_id from t_order where status = 'OK'";
    CacheOption cacheOption = new CacheOption(128, 1024L, 4);
    SQLParserEngine parserEngine = new SQLParserEngine("MySQL", cacheOption, false);
    ParseContext parseContext = parserEngine.parse(sql, false);
    SQLVisitorEngine visitorEngine = new SQLVisitorEngine("MySQL", "FORMAT", new Properties());
    String result = visitorEngine.visit(parseContext);
    System.out.println(result);
}
  • Properties supports the following parameters

If you use ShardingSphere-Proxy, you can also use the SQL Parse Format function through DistSQL syntax.

mysql> FORMAT select order_id from t_user where status = 'OK';
+-----------------------------------------------------+
| formatted_result                                    |
+-----------------------------------------------------+
| SELECT order_id
FROM t_user
WHERE
        status = 'OK'; |
+-----------------------------------------------------+

For the STATEMENT mode of the parsing engine mentioned above, you can also easily view the result of converting SQL into a SQLStatement.

mysql> parse SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18;
+----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| parsed_statement     | parsed_statement_detail                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MySQLSelectStatement | {"projections":{"startIndex":7,"stopIndex":14,"distinctRow":false,"projections":[{"column":{"startIndex":7,"stopIndex":8,"identifier":{"value":"id","quoteCharacter":"NONE"}}},{"column":{"startIndex":11,"stopIndex":14,"identifier":{"value":"name","quoteCharacter":"NONE"}}}]},"from":{"tableName":{"startIndex":21,"stopIndex":26,"identifier":{"value":"t_user","quoteCharacter":"NONE"}}},"where":{"startIndex":28,"stopIndex":63,"expr":{"startIndex":34,"stopIndex":63,"left":{"startIndex":34,"stopIndex":50,"left":{"startIndex":34,"stopIndex":39,"identifier":{"value":"status","quoteCharacter":"NONE"}},"right":{"startIndex":43,"stopIndex":50,"literals":"ACTIVE"},"operator":"\u003d","text":"status \u003d \u0027ACTIVE\u0027"},"right":{"startIndex":56,"stopIndex":63,"left":{"startIndex":56,"stopIndex":58,"identifier":{"value":"age","quoteCharacter":"NONE"}},"right":{"startIndex":62,"stopIndex":63,"literals":18},"operator":"\u003e","text":"age \u003e 18"},"operator":"AND","text":"status \u003d \u0027ACTIVE\u0027 AND age \u003e 18"}},"unionSegments":[],"parameterCount":0,"commentSegments":[]} |
+----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

For more DistSQL functions, please check the DistSQL documentation: https://shardingsphere.apache.org/document/current/cn/concepts/distsql/

Epilogue

At present, Apache ShardingSphere implements formatting only for the MySQL dialect; other dialects have not been implemented yet. Once you understand the principle and usage, participating in the development of the SQL Parse Format function should be straightforward. If you are interested in contributing open source code, you are welcome to join the community. Contributing code is also a good way to gain a deeper understanding of the related functions of Apache ShardingSphere.

GitHub issue:

https://github.com/apache/shardingsphere/issues

Contribution Guidelines:

https://shardingsphere.apache.org/community/cn/contribute/

Chinese Community:

https://community.sphere-ex.com/

Origin my.oschina.net/u/5137513/blog/5466136