PostgreSQL kernel source code analysis: syntax analysis (gram.y)

 


Table of contents

foreword

overview

Process introduction

call process

Grammatical analysis in detail

The main process of raw_parser

Instantiation of Lexical and Syntax Analyzers

Use with the lexer

The process of scanner_init

Use with parsers

Process of parser_init

base_yyparse execution

scanner_finish cleanup

end


foreword

This article analyzes and interprets the PostgreSQL 15 source code; the demonstrations were run on CentOS 8.


overview

PostgreSQL performs lexical and syntax analysis with flex and bison and produces a syntax tree.

Both tools are used in reentrant mode (a reentrant flex scanner and a bison pure parser). The input of flex is a string supplied by the caller rather than standard input, so the scanning state structure and the extra data of each scanner have to be initialized before use;

Likewise, bison has to be given the corresponding scanner state structure, and the structure holding its extra data has to be initialized.

In this way, the SQL string sent by the client is fed to the lexical analyzer, the tokens it outputs are consumed by the syntax analyzer, and the syntax analyzer stores the parsed content in the syntax tree.

Another advantage is that the lexical and syntax analyzers can be instantiated: no scanning or parsing state lives in global variables, so the SQL input of several clients can be analyzed by separate scanner/parser instances, each producing its own syntax tree. This fits a database that serves many clients concurrently.

Process introduction

Code location:

src/backend/parser/parser.c

The raw_parser function initializes the lexical and syntax analyzers, takes the SQL string as input, and outputs the syntax tree.

call process

exec_simple_query

-> pg_parse_query

-> raw_parser

Defining either of the following two macros at build time enables the debugging code in pg_parse_query.

COPY_PARSE_PLAN_TREES

WRITE_READ_PARSE_PLAN_TREES

pg_parse_query is only a thin wrapper around raw_parser and returns the resulting list of raw parse trees;

exec_simple_query then continues with each of them: parse analysis and rewriting produce a query tree, planning produces a plan tree, and finally the plan is executed.

Grammatical analysis in detail

  • The main process of raw_parser

(1) Initialize the lexical analyzer; the SQL string received from the client is its input;

(2) Initialize the parser;

(3) Run the syntax analysis by calling base_yyparse; its argument is the lexical analyzer context yyscanner, from which it pulls tokens;

(4) Clean up the scanner with scanner_finish and return the syntax tree.
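The order of these steps can be sketched in plain C. This is only an illustration under stated assumptions: MiniExtra and all mini_* functions are hypothetical stand-ins, malloc/free replace PostgreSQL's memory management, and the "parse tree" is reduced to a count of non-empty statements separated by semicolons.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner/parser extra data. */
typedef struct MiniExtra
{
    char   *scanbuf;        /* private copy of the SQL text */
    int     nstatements;    /* stand-in for the list of parse trees */
} MiniExtra;

/* Give the scanner its own copy of the input string. */
void
mini_scanner_init(MiniExtra *extra, const char *str)
{
    size_t  len = strlen(str);

    extra->scanbuf = malloc(len + 1);
    if (extra->scanbuf == NULL)
        abort();
    memcpy(extra->scanbuf, str, len + 1);
}

/* Reset the parser's result slot. */
void
mini_parser_init(MiniExtra *extra)
{
    extra->nstatements = 0;
}

/* "Parse" by counting non-empty statements separated by ';'. */
int
mini_yyparse(MiniExtra *extra)
{
    int         has_content = 0;
    const char *p;

    for (p = extra->scanbuf;; p++)
    {
        if (*p == ';' || *p == '\0')
        {
            if (has_content)
                extra->nstatements++;
            has_content = 0;
            if (*p == '\0')
                break;
        }
        else if (!isspace((unsigned char) *p))
            has_content = 1;
    }
    return 0;               /* 0 means success, as with a bison parser */
}

/* Release what the scanner allocated. */
void
mini_scanner_finish(MiniExtra *extra)
{
    free(extra->scanbuf);
    extra->scanbuf = NULL;
}

/* Same order as raw_parser; returns the statement count, or -1 on error. */
int
mini_raw_parser(const char *str)
{
    MiniExtra   extra;
    int         rc;

    assert(str != NULL);
    mini_scanner_init(&extra, str);
    mini_parser_init(&extra);
    rc = mini_yyparse(&extra);
    mini_scanner_finish(&extra);
    return (rc != 0) ? -1 : extra.nstatements;
}
```

Calling mini_raw_parser("SELECT 1; SELECT 2;") runs all the steps on a private MiniExtra, so no state outlives the call.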

  • Instantiation of Lexical and Syntax Analyzers

My blog post "Introduction to flex/bison in postgresql lexical/grammar (scanner/parser)" introduces the mechanism and basic usage of flex/bison.

PostgreSQL is a database that is accessed concurrently, so several SQL strings may have to be parsed at the same time. A single shared lexical/syntax analyzer would become a performance bottleneck.

How is this solved?

As the main process of raw_parser above shows, every call initializes its own lexical and syntax analyzers dynamically, and the lexical analyzer no longer reads from standard input but from the string it is given.

The corresponding parser takes the lexical analyzer that was just initialized as its input.

In this way, concurrent parses run independently of one another.
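The effect of keeping all scanning state in an instance instead of in globals can be shown with a small sketch. WordScanner and its two functions are hypothetical names for this illustration; a flex reentrant scanner keeps the equivalent state behind its opaque yyscan_t handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-instance scanner state (no global variables). */
typedef struct WordScanner
{
    const char *input;      /* string being scanned */
    size_t      pos;        /* current read position */
} WordScanner;

/* Bind a scanner instance to an input string. */
void
word_scanner_init(WordScanner *s, const char *str)
{
    assert(s != NULL && str != NULL);
    s->input = str;
    s->pos = 0;
}

/*
 * Store the start of the next whitespace-delimited word in *word and
 * return its length; return 0 at end of input.
 */
size_t
word_scanner_next(WordScanner *s, const char **word)
{
    size_t  start;

    while (s->input[s->pos] != '\0' &&
           isspace((unsigned char) s->input[s->pos]))
        s->pos++;
    start = s->pos;
    while (s->input[s->pos] != '\0' &&
           !isspace((unsigned char) s->input[s->pos]))
        s->pos++;
    *word = s->input + start;
    return s->pos - start;
}
```

Two WordScanner values initialized with different SQL strings can be advanced in any interleaved order, because each call touches only the instance passed to it.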

  • Use with the lexer

Initialization of the lexical analyzer:

core_yyscan_t
scanner_init(const char *str,
             core_yy_extra_type *yyext,
             const ScanKeywordList *keywordlist,
             const uint16 *keyword_tokens)

It is defined in

src/backend/parser/scan.l

from which flex generates, at build time,

src/backend/parser/scan.c

  • The process of scanner_init

(1) Call yylex_init to initialize the lexical analyzer context (the scanner);

(2) Bind the user data to the yyextra of the lexical analyzer. The user data holds whatever the lexical analyzer needs to record alongside the tokens it returns;

(3) Initialize the user data structure and copy the SQL string into a buffer owned by the user data;

(4) Call yy_scan_buffer so that the lexical analyzer reads its input from that SQL buffer in the user data;

(5) Finally, initialize the buffer in which literal values are accumulated during scanning;

(6) Return the scanner initialized above.
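Steps (3) and (4) depend on a detail of flex: yy_scan_buffer scans the buffer in place and requires its last two bytes to be NUL (YY_END_OF_BUFFER_CHAR). A minimal sketch of building such a buffer follows; make_scan_buffer is a hypothetical helper, and malloc stands in for PostgreSQL's palloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy str into a new buffer of strlen(str) + 2 bytes whose last two
 * bytes are NUL, the layout yy_scan_buffer() expects. The total size is
 * stored in *buflen. Returns NULL if allocation fails; the caller frees
 * the result.
 */
char *
make_scan_buffer(const char *str, size_t *buflen)
{
    size_t  slen;
    char   *buf;

    assert(str != NULL && buflen != NULL);
    slen = strlen(str);
    buf = malloc(slen + 2);
    if (buf == NULL)
        return NULL;
    memcpy(buf, str, slen);
    buf[slen] = '\0';       /* first end-of-buffer marker */
    buf[slen + 1] = '\0';   /* second end-of-buffer marker */
    *buflen = slen + 2;
    return buf;
}
```

The private copy matters because flex may modify the buffer while scanning, so the caller's original SQL string is left untouched.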

  • Use with parsers

Initialization of the parser:

void
parser_init(base_yy_extra_type *yyext)

Execution of parsing:

int base_yyparse(core_yyscan_t yyscanner);

Cleanup of dynamically allocated space:

void
scanner_finish(core_yyscan_t yyscanner)

parser_init and the grammar behind base_yyparse are defined in

src/backend/parser/gram.y

from which bison generates

src/backend/parser/gram.c

scanner_finish belongs to the scanner and is defined in scan.l, next to scanner_init.

  • Process of parser_init

parser_init only resets the syntax tree pointer in the user data to empty (NIL); nothing else in the user data needs to be set up here. Why?

Because the grammar itself builds the tree. In gram.y, each grammar symbol is bound to the type of its semantic value, the rule actions create the corresponding nodes, and the finished syntax tree is stored in the user data. The parser therefore only has to run.

  • base_yyparse execution

Running base_yyparse drives lexical analysis and syntax analysis together: whenever the parser needs a token it calls the lexical analyzer, and it matches the tokens returned against the grammar rules.

When a rule matches, its action is executed and the data is stored in a syntax tree node.
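This pull model can be illustrated with a toy grammar, INT ('+' INT)*. The sketch is a hand-written loop, not the table-driven LALR(1) parser that bison generates, and SumScanner, sum_lex, sum_parse and the token codes are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_EOF = 0, TOK_PLUS = '+', TOK_INT = 258, TOK_BAD = 259 };

typedef struct SumScanner
{
    const char *input;      /* string being scanned */
    size_t      pos;        /* current read position */
    long        result;     /* set on success, like a parse tree slot */
} SumScanner;

/* Lexer: return the next token code; an integer's value goes to *lval. */
int
sum_lex(long *lval, SumScanner *s)
{
    while (isspace((unsigned char) s->input[s->pos]))
        s->pos++;
    if (s->input[s->pos] == '\0')
        return TOK_EOF;
    if (s->input[s->pos] == '+')
    {
        s->pos++;
        return TOK_PLUS;
    }
    if (isdigit((unsigned char) s->input[s->pos]))
    {
        long    v = 0;

        while (isdigit((unsigned char) s->input[s->pos]))
            v = v * 10 + (s->input[s->pos++] - '0');
        *lval = v;
        return TOK_INT;
    }
    s->pos++;
    return TOK_BAD;
}

/* Parser: pulls tokens on demand; returns 0 on success, 1 on error. */
int
sum_parse(SumScanner *s)
{
    long    lval = 0;
    long    total;
    int     tok;

    assert(s != NULL && s->input != NULL);
    if (sum_lex(&lval, s) != TOK_INT)
        return 1;
    total = lval;
    while ((tok = sum_lex(&lval, s)) == TOK_PLUS)
    {
        if (sum_lex(&lval, s) != TOK_INT)
            return 1;
        total += lval;      /* "rule action": combine semantic values */
    }
    if (tok != TOK_EOF)
        return 1;
    s->result = total;      /* the top-level rule stores the final result */
    return 0;
}
```

The lexer never runs ahead on its own: each token is produced only when sum_parse asks for it, which mirrors how a bison parser calls its yylex function.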

For the specific rules, see my blog post "Introduction to flex/bison in postgresql lexical/grammar (scanner/parser)" in this column.

  • scanner_finish cleanup

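scanner_finish releases only the SQL scan buffer and the literal buffer, and only when they are large. It does not destroy the lexical analyzer itself (yylex_destroy is not called); the scanner's small control storage is reclaimed together with the memory context used for parsing.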
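A rough sketch of such a cleanup policy in plain C is given here. ScanBuffers, release_big_buffers and the 8192-byte cutoff are assumptions of this illustration, and malloc/free stand in for PostgreSQL's palloc/pfree.

```c
#include <assert.h>
#include <stdlib.h>

#define BIG_BUFFER_BYTES 8192   /* cutoff assumed for this sketch */

/* Hypothetical holder for the two buffers owned by a scanner. */
typedef struct ScanBuffers
{
    char   *scanbuf;        /* copy of the SQL text */
    size_t  scanbuflen;
    char   *literalbuf;     /* accumulates literal values */
    size_t  literalalloc;
} ScanBuffers;

/*
 * Free only the buffers that reach the cutoff and return how many were
 * freed. Smaller buffers are left for the owner to reclaim in bulk.
 */
int
release_big_buffers(ScanBuffers *b)
{
    int     freed = 0;

    assert(b != NULL);
    if (b->scanbuf != NULL && b->scanbuflen >= BIG_BUFFER_BYTES)
    {
        free(b->scanbuf);
        b->scanbuf = NULL;
        freed++;
    }
    if (b->literalbuf != NULL && b->literalalloc >= BIG_BUFFER_BYTES)
    {
        free(b->literalbuf);
        b->literalbuf = NULL;
        freed++;
    }
    return freed;
}
```

In PostgreSQL the buffers that are skipped disappear when the surrounding memory context is reset; with plain malloc, as in this sketch, the caller still has to free them eventually.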


end

Author email: [email protected]
If there are any mistakes or omissions, please point them out so that we can learn from each other.

Note: Do not reprint without consent!


Origin blog.csdn.net/senllang/article/details/130910592