- Column content: PostgreSQL kernel source code analysis
- Personal homepage: My homepage
- Motto: As heaven moves with vigor, the gentleman strives ceaselessly for self-improvement.
Table of contents
Grammatical analysis in detail
The main process of raw_parser
Instantiation of Lexical and Syntax Analyzers
Foreword
This article analyzes and explains the PostgreSQL 15 source code; the demonstrations were carried out on CentOS 8.
Overview
PostgreSQL performs lexical and syntax analysis with flex/bison and produces a syntax tree.
The scanner and parser are built in reentrant (pure) mode: flex reads from a caller-supplied string instead of standard input, and each scanner instance has its own scanning state and extra data, both of which must be initialized before use.
Likewise, bison must be given the matching scanner state and the extra data structure it will work with.
In this way, the SQL string sent by the client is fed to the lexical analyzer, the tokens produced by the lexical analyzer are fed to the syntax analyzer, and the syntax analyzer stores what it parses in the syntax tree.
Another advantage is that the lexer/parser pair can be instantiated: every parse gets its own scanner and parser state, so SQL from multiple clients can be analyzed independently, each parse producing its own syntax tree. This is what a database serving many concurrent clients needs.
Process introduction
Code location:
src/backend/parser/parser.c
The raw_parser function initializes the lexical and syntax analyzers, takes the SQL string as input, and returns the syntax tree.
Call process
exec_simple_query
-> pg_parse_query
-> raw_parser
Defining either of the following two macros at compile time enables the debugging code in pg_parse_query:
COPY_PARSE_PLAN_TREES
WRITE_READ_PARSE_PLAN_TREES
pg_parse_query is only a thin wrapper around raw_parser and returns the list of raw parse trees;
exec_simple_query then carries on: it analyzes and rewrites each raw parse tree into a query tree, generates a plan tree, and finally executes it;
Grammatical analysis in detail
-
The main process of raw_parser
(1) Initialize the lexical analyzer; the SQL string received from the client is its input;
(2) Initialize the parser;
(3) Start syntax analysis by calling base_yyparse, whose input is the lexical analyzer context yyscanner;
(4) Return the syntax tree;
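The four steps above can be sketched in plain C. This is only an illustration under stated assumptions: every name in it (ToyScanner, ToyParser, toy_raw_parse and so on) is a hypothetical stand-in rather than PostgreSQL code, the "grammar" is nothing more than sums of integers, and the parser is written by hand instead of being generated by bison. Only the order of the calls is meant to mirror raw_parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not PostgreSQL APIs. */

/* Per-parse scanner state: plays the role of the yyscanner context. */
typedef struct ToyScanner
{
    const char *pos;            /* next unread character of the input */
} ToyScanner;

/* Per-parse parser state: plays the role of the parser's extra data. */
typedef struct ToyParser
{
    long        result;         /* stands in for the syntax tree */
} ToyParser;

enum { TOK_EOF, TOK_NUM, TOK_PLUS, TOK_BAD };

/* Step 1: point the scanner at the caller's string (no stdin involved). */
static void
toy_scanner_init(ToyScanner *s, const char *str)
{
    assert(str != NULL);
    s->pos = str;
}

/* Step 2: reset the parser's output slot. */
static void
toy_parser_init(ToyParser *p)
{
    p->result = 0;
}

/* Return the next token; for TOK_NUM the value is stored in *val. */
static int
toy_lex(ToyScanner *s, long *val)
{
    while (*s->pos == ' ')
        s->pos++;
    if (*s->pos == '\0')
        return TOK_EOF;
    if (*s->pos == '+')
    {
        s->pos++;
        return TOK_PLUS;
    }
    if (isdigit((unsigned char) *s->pos))
    {
        *val = 0;
        while (isdigit((unsigned char) *s->pos))
            *val = *val * 10 + (*s->pos++ - '0');
        return TOK_NUM;
    }
    return TOK_BAD;
}

/* Step 3: parse "NUM (+ NUM)*"; returns 0 on success, 1 on error. */
static int
toy_parse(ToyScanner *s, ToyParser *p)
{
    long        val;
    int         tok = toy_lex(s, &val);

    if (tok != TOK_NUM)
        return 1;
    p->result = val;
    while ((tok = toy_lex(s, &val)) == TOK_PLUS)
    {
        if (toy_lex(s, &val) != TOK_NUM)
            return 1;
        p->result += val;
    }
    return tok == TOK_EOF ? 0 : 1;
}

/* Steps 1 to 4 in order; returns 0 and fills *out on success. */
int
toy_raw_parse(const char *str, long *out)
{
    ToyScanner  s;
    ToyParser   p;

    toy_scanner_init(&s, str);      /* (1) scanner over the input string */
    toy_parser_init(&p);            /* (2) parser state */
    if (toy_parse(&s, &p) != 0)     /* (3) run the parse */
        return 1;
    *out = p.result;                /* (4) hand back the result */
    return 0;
}
```

Calling toy_raw_parse("1 + 2 + 39", &out) returns 0 and leaves the sum in out, while malformed input such as "1 +" returns 1. Both state structures live on the caller's stack, so nothing is shared between calls.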
-
Instantiation of Lexical and Syntax Analyzers
My blog post "Introduction to flex/bison in postgresql lexical/grammar (scanner/parser)" introduced the mechanism and basic usage of flex/bison.
PostgreSQL serves many clients concurrently, so many SQL strings may need to be parsed at the same time. A single shared lexer/parser is not an option, because it would become a performance bottleneck.
How is this handled?
As shown in the main process of raw_parser above, every call dynamically initializes its own lexical and syntax analyzer state, and the lexical analyzer no longer reads from standard input but from the string it is given.
The parser, in turn, takes its input from the lexical analyzer that was just initialized.
As a result, multiple parses can run concurrently and independently of one another.
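The idea of independent analyzer instances can be illustrated with a tiny hand-written scanner whose entire state lives in a caller-owned structure. The names WordScanner, word_scanner_init and word_scanner_next are hypothetical and only serve this sketch; flex's reentrant mode achieves the same effect by passing a yyscan_t handle to every scanner function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-instance state; nothing is stored in globals. */
typedef struct WordScanner
{
    const char *pos;            /* cursor into this instance's input */
} WordScanner;

/* Attach the scanner to a caller-supplied string. */
void
word_scanner_init(WordScanner *s, const char *str)
{
    assert(str != NULL);
    s->pos = str;
}

/*
 * Copy the next space-separated word into buf (at most buflen - 1 bytes,
 * longer words are truncated). Returns 1 if a word was produced and 0 at
 * end of input.
 */
int
word_scanner_next(WordScanner *s, char *buf, size_t buflen)
{
    size_t      n = 0;

    assert(buflen > 0);
    while (*s->pos == ' ')
        s->pos++;
    if (*s->pos == '\0')
        return 0;
    while (*s->pos != '\0' && *s->pos != ' ')
    {
        if (n + 1 < buflen)
            buf[n++] = *s->pos;
        s->pos++;
    }
    buf[n] = '\0';
    return 1;
}
```

Two scanners initialized over "SELECT 1" and "DELETE FROM t" can be advanced in any interleaved order, and each one keeps returning the words of its own string, because neither touches the other's cursor.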
-
Using the lexical analyzer
Initialization of the lexical analyzer
core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
const uint16 *keyword_tokens)
It is defined in
src/backend/parser/scan.l
from which flex generates, at build time,
src/backend/parser/scan.c
-
The process of scanner_init
(1) Call yylex_init to initialize the lexical analyzer context (the scanner);
(2) Bind the user data to the yyextra of the lexical analyzer; the user data holds whatever has to be recorded alongside the tokens that the lexical analyzer returns;
(3) Initialize the user data structure and copy the SQL string into it;
(4) Call yy_scan_buffer so that the lexical analyzer reads from the SQL character buffer held in the user data;
(5) Finally, initialize the buffer used to accumulate literal values during scanning;
(6) Return the scanner initialized above;
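Steps (3) to (5) can be sketched without flex. The structure ToyExtra and the functions toy_extra_init and toy_extra_free are hypothetical, malloc stands in for PostgreSQL's palloc, and the initial scratch buffer size of 1024 is an arbitrary choice for this sketch. The one detail taken from flex itself is that yy_scan_buffer scans a buffer in place and expects its last two bytes to be NUL, which is why the copy is two bytes longer than the SQL text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's user data. */
typedef struct ToyExtra
{
    char       *scanbuf;        /* private copy of the SQL string */
    size_t      scanbuflen;     /* length of the SQL text, no terminators */
    char       *literalbuf;     /* scratch space for accumulating literals */
    size_t      literalalloc;   /* allocated size of literalbuf */
    size_t      literallen;     /* bytes currently used in literalbuf */
} ToyExtra;

/* Returns 0 on success, -1 if memory could not be allocated. */
int
toy_extra_init(ToyExtra *ext, const char *str)
{
    size_t      slen;

    assert(ext != NULL && str != NULL);
    slen = strlen(str);

    /* Copy the input and append the two NUL bytes yy_scan_buffer expects. */
    ext->scanbuf = malloc(slen + 2);
    if (ext->scanbuf == NULL)
        return -1;
    memcpy(ext->scanbuf, str, slen);
    ext->scanbuf[slen] = '\0';
    ext->scanbuf[slen + 1] = '\0';
    ext->scanbuflen = slen;

    /* Start with an empty scratch buffer; the size is an assumption. */
    ext->literalalloc = 1024;
    ext->literalbuf = malloc(ext->literalalloc);
    if (ext->literalbuf == NULL)
    {
        free(ext->scanbuf);
        ext->scanbuf = NULL;
        return -1;
    }
    ext->literallen = 0;
    return 0;
}

/* Release both buffers and clear the pointers. */
void
toy_extra_free(ToyExtra *ext)
{
    free(ext->scanbuf);
    free(ext->literalbuf);
    ext->scanbuf = NULL;
    ext->literalbuf = NULL;
}
```

In a real flex scanner the next call would be yy_scan_buffer(ext->scanbuf, slen + 2, scanner), which makes the scanner read from this private copy instead of from a file.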
-
Using the parser
Initialization of the parser
void
parser_init(base_yy_extra_type *yyext)
Execution of parsing
int base_yyparse(core_yyscan_t yyscanner);
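Cleanup of dynamically allocated space
void
scanner_finish(core_yyscan_t yyscanner)
parser_init is defined in, and base_yyparse is generated from,
src/backend/parser/gram.y,
which bison compiles into
src/backend/parser/gram.c
scanner_finish belongs to the lexical analyzer and is defined in src/backend/parser/scan.l.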
-
Process of parser_init
Here the syntax tree is simply initialized to NULL, and no further user-data setup is performed. Why?
Because the syntax tree is itself stored in the user data, which scanner_init has already attached to the scanner. When the grammar is defined in gram.y, each grammar symbol is bound to the type of its corresponding structure, and the rule actions store the parsed data in the syntax tree, so parser_init has nothing else to prepare and the parser only needs to run.
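One way to picture how the syntax tree can live in the same user data that the scanner already holds is the "first member" layout sketched here. This is a simplified model under assumptions: ToyCoreExtra, ToyBaseExtra, ToyScanner and the functions are hypothetical names, and the parse tree is reduced to a string. As far as I understand, PostgreSQL's base_yy_extra_type embeds the core scanner data as its first field in a similar way, but the source should be checked for the exact layout.

```c
#include <assert.h>
#include <stddef.h>

/* Data the scanner itself needs (hypothetical, reduced to one field). */
typedef struct ToyCoreExtra
{
    const char *scanbuf;
} ToyCoreExtra;

/* Parser-level data wraps the scanner data and adds the result slot. */
typedef struct ToyBaseExtra
{
    ToyCoreExtra core;          /* must stay the first member */
    const char *parsetree;      /* placeholder for the syntax tree */
} ToyBaseExtra;

/* "Scanner" handle that only remembers the core extra pointer. */
typedef struct ToyScanner
{
    ToyCoreExtra *extra;
} ToyScanner;

/* Done once while the scanner is set up. */
void
toy_scanner_bind(ToyScanner *s, ToyCoreExtra *core)
{
    s->extra = core;
}

/* Parser initialization only has to clear the result slot. */
void
toy_parser_init(ToyBaseExtra *ext)
{
    ext->parsetree = NULL;
}

/*
 * Recover the enclosing structure from the pointer the scanner holds.
 * This is valid C because a pointer to a structure and a pointer to its
 * first member can be converted to each other.
 */
ToyBaseExtra *
toy_get_base_extra(ToyScanner *s)
{
    return (ToyBaseExtra *) s->extra;
}

/* A grammar action would deliver its result like this. */
void
toy_action_store(ToyScanner *s, const char *tree)
{
    toy_get_base_extra(s)->parsetree = tree;
}
```

After toy_scanner_bind(&s, &ext.core) and toy_parser_init(&ext), any code that only receives the scanner handle can still reach ext.parsetree, which is why the parser needs no separate binding step of its own.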
-
base_yyparse execution
Running base_yyparse drives lexical analysis and syntax analysis together: each time the lexical analyzer returns a token, the parser tries to match it against the grammar rules.
When a rule matches, its action is executed and the data is stored in a syntax tree node.
For the specific rules, see my blog post in this column, "Introduction to flex/bison in postgresql lexical/grammar (scanner/parser)".
-
scanner_finish cleanup
Only the SQL buffer and the literal buffer are released here; the lexical analyzer (scanner) itself is not destroyed.
End
If there are any mistakes or omissions, please point them out so that we can learn from each other.
Note: Do not reprint without consent!