The basic idea of parsing SQL in Java

I write SQL every day and have always been curious about how SQL gets parsed, so I put together a small demo.

Take the following SQL statement as an example:

String sql = "select name,score,sex from users where name = 'jack' and isdelete <> 1 ; ";

There may be one or more spaces between words, so the first thing needed is a class that splits the statement into words (tokens). I call it Tokenizer.

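import java.util.Iterator;
import java.util.NoSuchElementException;

public class Tokenizer implements Iterator<String> {

    String[] tokens;
    int index = 0;

    public Tokenizer(String sql) throws BadSqlGrammarException {
        sql = sql.trim();
        if (!sql.endsWith(";")) {
            throw new BadSqlGrammarException("SQL statement is not terminated correctly!");
        }
        // Put a space before ';' and collapse redundant whitespace
        sql = sql.replace(";", " ;").replaceAll("\\s+", " ");
        // Split into tokens
        tokens = sql.split(" ");
    }

    @Override
    public boolean hasNext() {
        return index < tokens.length;
    }

    @Override
    public String next() {
        // Honor the Iterator contract when no tokens are left
        if (!hasNext()) {
            throw new NoSuchElementException("No more tokens");
        }
        return tokens[index++];
    }

}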

This is a Java class called Tokenizer that implements the Iterator interface. It splits the input SQL statement into tokens.

The constructor Tokenizer(String sql) accepts an SQL statement as a parameter and processes it. First, it strips whitespace from both ends of the string and checks whether the SQL statement ends with a semicolon. If it does not end with a semicolon, a BadSqlGrammarException is thrown.
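BadSqlGrammarException is used throughout the demo, but the post never shows its definition. Since every method declares it with `throws` and constructs it from a single message string, a reasonable assumption is a plain checked exception like the sketch below. (It is presumably not Spring's `org.springframework.jdbc.BadSqlGrammarException`, whose constructor takes a task description, the SQL, and a `SQLException`.)

```java
// Assumed definition: the original post does not show this class.
// A checked exception matches the `throws BadSqlGrammarException` clauses
// and the one-argument constructor calls used in the demo.
public class BadSqlGrammarException extends Exception {

    public BadSqlGrammarException(String message) {
        super(message);
    }
}
```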

Next, it uses String.replace (a literal, non-regex replacement) to put a space in front of every semicolon so that the semicolon becomes a token of its own, and then String.replaceAll with the regular expression \\s+ to collapse every run of whitespace into a single space. Finally, it splits the statement on spaces into the tokens array.
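To see what these three steps produce, the sketch below repeats them in a standalone static method (the class name TokenizeDemo is hypothetical) and runs them on the example statement from the start of the post.

```java
import java.util.Arrays;

// Repeats the Tokenizer constructor's normalization steps on a plain string.
public class TokenizeDemo {

    static String[] tokenize(String sql) {
        sql = sql.trim();
        // Literal replace: "1 ;" and "1;" both end with a standalone ';'
        sql = sql.replace(";", " ;");
        // Regex replace: any run of whitespace becomes one space
        sql = sql.replaceAll("\\s+", " ");
        return sql.split(" ");
    }

    public static void main(String[] args) {
        String sql = "select  name,score,sex from    users where name = 'jack' and isdelete <> 1   ;  ";
        System.out.println(Arrays.toString(tokenize(sql)));
    }
}
```

This prints `[select, name,score,sex, from, users, where, name, =, 'jack', and, isdelete, <>, 1, ;]`, 13 tokens in total. Because splitting happens only on whitespace, `name,score,sex` stays one token, and a condition written without spaces, such as `x='1'`, would also come out as a single token.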

The hasNext() method determines whether there is another token by comparing index with the length of the tokens array.

The next() method is used to return the next token. It returns the element at index in the tokens array, and increments index by 1 to point to the next token.

The class implements Iterator because the later parsing stages need to walk through this String array one token at a time.
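The `java.util.Iterator` contract also says that `next()` should throw `NoSuchElementException` once the elements are exhausted. As a minimal sketch, here is an array-backed token iterator (the name ArrayTokenIterator is hypothetical) that follows the contract, together with the usual `hasNext()`/`next()` loop a parser would use.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal array-backed Iterator<String> following the Iterator contract.
public class ArrayTokenIterator implements Iterator<String> {

    private final String[] tokens;
    private int index = 0;

    public ArrayTokenIterator(String[] tokens) {
        this.tokens = tokens;
    }

    @Override
    public boolean hasNext() {
        return index < tokens.length;
    }

    @Override
    public String next() {
        // Signal exhaustion the way the Iterator interface specifies
        if (!hasNext()) {
            throw new NoSuchElementException("no more tokens");
        }
        return tokens[index++];
    }

    public static void main(String[] args) {
        Iterator<String> it = new ArrayTokenIterator(new String[] {"select", "name", "from", "users", ";"});
        // Consume every token in order
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```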

Next comes the Parser class, which holds a reference to a Tokenizer.

import java.util.HashMap;
import java.util.Map;

public class Parser {

    Tokenizer tokenizer;
    DBCmd cmd;

    Map<String, DBCmd> cmdMap = new HashMap<>();

    public Parser(String sql) throws BadSqlGrammarException {
        // Use a lookup table instead of a long chain of if/else
        this.cmdMap.put("select", new SelectCmd());
        this.tokenizer = new Tokenizer(sql);

        // The first SQL keyword determines which command this is
        this.cmd = this.cmdMap.get(tokenizer.next());

        if (cmd == null)
            throw new BadSqlGrammarException("Unrecognized SQL command!");

    }


    public void query() throws BadSqlGrammarException {
        cmd.query(tokenizer);
    }

}

This is a Java class called Parser that parses SQL statements and executes corresponding database commands.

This class contains the following member variables and methods:

  1. tokenizer: a Tokenizer object that splits the input SQL statement into tokens.

  2. cmd: A DBCmd object representing the database command to be executed.

  3. cmdMap: A HashMap for mapping SQL keywords to corresponding DBCmd objects.

The constructor Parser(String sql) is used to initialize the Parser object. It accepts a SQL statement as a parameter and determines the database command to execute based on the first SQL keyword. Specific steps are as follows:

  1. Put an entry into cmdMap (a HashMap created when the field is initialized) that maps the "select" keyword to a SelectCmd object.

  2. Create a Tokenizer object and pass the input SQL statement to it as a parameter.

  3. Use the next() method of Tokenizer to get the first keyword of the SQL statement, and get the corresponding DBCmd object through cmdMap.

  4. If cmd is null, it means an unrecognized SQL command and a BadSqlGrammarException is thrown.

The query() method is used to perform database query operations. It calls the query() method of the cmd object, passing it the Tokenizer object as a parameter.

In short, the Parser class uses a lookup table that maps each SQL keyword to its command object instead of a chain of if/else branches, and hands the remaining tokens to that command for parsing.
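The lookup-table idea can be shown on its own. In the sketch below, the Command interface and the "delete" entry are hypothetical stand-ins for DBCmd and its subclasses; supporting another statement type means one more `put()` call rather than another branch. The original Parser looks keywords up exactly as written, so "SELECT" would be rejected there; lower-casing the key, as done here, is an added assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of keyword dispatch through a map; names are hypothetical.
public class CommandTable {

    interface Command {
        String describe();
    }

    private final Map<String, Command> commands = new HashMap<>();

    public CommandTable() {
        // One map entry per keyword replaces one if/else branch per keyword
        commands.put("select", () -> "SELECT command");
        commands.put("delete", () -> "DELETE command");
    }

    // Returns the command for the keyword, or null if it is unknown.
    // Lower-casing the key (not in the original Parser) makes "SELECT" match too.
    public Command lookup(String keyword) {
        return commands.get(keyword.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        System.out.println(table.lookup("SELECT").describe()); // SELECT command
        System.out.println(table.lookup("update"));            // null
    }
}
```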

DBCmd is an abstract class: each kind of command gets its own class that extends it, while DBCmd itself only provides a few shared helper methods.

public abstract class DBCmd {

    public abstract void query(Tokenizer tokenizer) throws BadSqlGrammarException;

    protected String splitUntilEnd(Tokenizer tokenizer) throws BadSqlGrammarException {
        return splitUntil(tokenizer, ";");
    }


    protected String splitUntil(Tokenizer tokenizer, String until) throws BadSqlGrammarException {
        StringBuffer sb = new StringBuffer();

        boolean find = false;
        while (tokenizer.hasNext()) {
            String next = tokenizer.next();
            if (!next.equals(until)) {
                sb.append(next).append(" ");
                continue;
            } else {
                find = true;
                break;
            }
        }
        if (!find)
            throw new BadSqlGrammarException("Incorrect syntax");
        return sb.toString();
    }
}

This code is a Java class called DBCmd. It is an abstract class, which means it cannot be instantiated directly, but can only be inherited by other classes.

This class has an abstract method query, which accepts a Tokenizer object as a parameter, and may throw a BadSqlGrammarException. This method needs to be implemented in subclasses.

There are also two protected methods in the class: splitUntilEnd and splitUntil. splitUntilEnd takes a Tokenizer object, splitUntil takes a Tokenizer object and a delimiter string, and both may throw a BadSqlGrammarException.

The splitUntilEnd method simply calls splitUntil with the delimiter ";". It reads tokens from the Tokenizer one by one until it reaches ";", and returns the tokens read before it, concatenated into a single string.

The splitUntil method loops, reading tokens from the Tokenizer. Each token that is not equal to the given delimiter until is appended to the StringBuffer sb, followed by a space. When a token equal to until is read, the loop ends and the concatenated string is returned. The delimiter itself is consumed but not included in the result.

If the tokens run out without the delimiter until being found, a BadSqlGrammarException is thrown, indicating that the syntax is incorrect.
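The behavior of splitUntil, including where it leaves the iterator, is easiest to see in isolation. The sketch below is a standalone version that works on any `Iterator<String>`; to keep it self-contained, `IllegalArgumentException` stands in for the article's BadSqlGrammarException, and the class name SplitUntilDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Standalone equivalent of DBCmd.splitUntil over any Iterator<String>.
public class SplitUntilDemo {

    static String splitUntil(Iterator<String> tokens, String until) {
        StringBuilder sb = new StringBuilder();
        while (tokens.hasNext()) {
            String next = tokens.next();
            if (next.equals(until)) {
                // Delimiter consumed; the iterator now points just past it
                return sb.toString();
            }
            // Each collected token is followed by one space
            sb.append(next).append(" ");
        }
        throw new IllegalArgumentException("Incorrect syntax: missing '" + until + "'");
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("name,score,sex", "from", "users", ";").iterator();
        System.out.println("[" + splitUntil(it, "from") + "]"); // [name,score,sex ]
        System.out.println(it.next());                          // users
    }
}
```

Note the trailing space in the returned string, and that the token right after the delimiter ("users" here) is what the next `next()` call sees. SelectCmd relies on exactly this to read the table name.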

Overall, this code defines an abstract class DBCmd, which contains some methods for processing SQL statements. Subclasses can inherit this class and implement the query method to perform specific SQL query operations.

Only the select syntax is tested here, so only one command class, SelectCmd, is written.

public class SelectCmd extends DBCmd {

    @Override
    public void query(Tokenizer tokenizer) throws BadSqlGrammarException {
        String querys = splitUntil(tokenizer, "from");
        String tableName = tokenizer.next();
        String condition = null;
        if (tokenizer.hasNext() && tokenizer.next().equals("where")) {
            condition = splitUntilEnd(tokenizer);
        }
        System.out.println("Query field: " + querys);
        System.out.println("Query table: " + tableName);
        System.out.println("Query condition: " + condition);

    }


}
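Because SelectCmd prints its results, it is hard to check automatically. The sketch below is a hypothetical variant (the class SelectParts and its `parse` method are not from the original) that follows the same steps (read fields up to "from", take the next token as the table, then an optional where clause up to ";") but returns the three parts. It also trims the trailing space that splitUntil leaves behind, and uses `IllegalArgumentException` in place of BadSqlGrammarException to stay self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical, testable variant of SelectCmd.query that returns its results.
public class SelectParts {

    final String fields;
    final String table;
    final String condition; // null when there is no where clause

    SelectParts(String fields, String table, String condition) {
        this.fields = fields;
        this.table = table;
        this.condition = condition;
    }

    // Same loop as DBCmd.splitUntil, but trims the trailing space
    static String splitUntil(Iterator<String> tokens, String until) {
        StringBuilder sb = new StringBuilder();
        while (tokens.hasNext()) {
            String next = tokens.next();
            if (next.equals(until)) {
                return sb.toString().trim();
            }
            sb.append(next).append(" ");
        }
        throw new IllegalArgumentException("Incorrect syntax: missing '" + until + "'");
    }

    // Expects an iterator positioned just after the "select" keyword
    static SelectParts parse(Iterator<String> tokens) {
        String fields = splitUntil(tokens, "from");
        String table = tokens.next();
        String condition = null;
        if (tokens.hasNext() && tokens.next().equals("where")) {
            condition = splitUntil(tokens, ";");
        }
        return new SelectParts(fields, table, condition);
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList(
                "name,score,sex", "from", "users", "where", "name", "=", "'jack'", ";").iterator();
        SelectParts p = parse(it);
        System.out.println(p.fields + " | " + p.table + " | " + p.condition);
    }
}
```

Running `main` prints `name,score,sex | users | name = 'jack'`. Without a where clause (`a from t ;`), condition stays null, just as in SelectCmd.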

Finally, let's test it:

public static void main(String[] args) throws BadSqlGrammarException {
    String sql = "select  name,score,sex from    users where name = 'jack' and isdelete <> 1   ;  ";
    Parser parser = new Parser(sql);
    parser.query();
}

Output:

Query field: name,score,sex
Query table: users
Query condition: name = 'jack' and isdelete <> 1


Source: blog.csdn.net/weixin_39570751/article/details/131568510