[MySQL] A detailed explanation of how a SELECT statement is executed


For a development engineer, I think it is essential to understand how MySQL executes a query statement.

First we need to understand what the MySQL architecture looks like; then we can walk through the execution process of a query statement.

MySQL architecture

First look at a structure diagram, as follows:

Detailed explanation of the modules

  1. Connectors: support interaction between SQL and various languages, such as PHP, Python, and Java (JDBC);

  2. Management Services & Utilities: system management and control tools, including backup and recovery, MySQL replication, clustering, etc.;

  3. Connection Pool: manages resources that need to be cached, such as user credentials, permissions, and threads;

  4. SQL Interface: receives the user's SQL commands and returns the requested query results;

  5. Parser: parses the SQL statement;

  6. Optimizer: the query optimizer;

  7. Caches & Buffers: the query cache and, in addition to the row record cache, the table cache, key cache, permission cache, etc.;

  8. Pluggable Storage Engines: the storage engines, which provide an API the service layer uses to work with the underlying files.

Architecture layering

MySQL can be divided into three layers: the connection layer that interfaces with the client, the service layer that actually performs operations, and the storage engine layer that deals with the hardware.

Connection layer

To talk to the MySQL server (port 3306 in this example), a client must first establish a connection with it. Managing all of these connections and verifying each client's identity and privileges are the responsibilities of the connection layer.

Service layer

The connection layer hands the SQL statement to the service layer, which runs it through a series of steps:

It checks the query cache, calls the interface that corresponds to the statement, and performs lexical and syntax analysis on it (identifying keywords, identifying aliases, detecting syntax errors, and so on).

Then comes the optimizer. MySQL optimizes the SQL statement according to certain rules and finally hands it to the executor for execution.

Storage engine

The storage engine is where the data is actually stored, and MySQL supports several different storage engines. Below the storage engine is memory or disk.

SQL execution flow

Taking a query statement as an example, let's see what the MySQL workflow looks like.

select name from user where id=1 and age>20; 

First, look at the flow diagram; the process described below follows it:

[Figure: SQL execution flow]

Connection

For a program or tool to operate on a database, the first step is to establish a connection with it.

There are two kinds of database connections:

  • Short connection: closed immediately after the operation completes.
  • Long connection: kept open, which reduces the server-side cost of creating and releasing connections; subsequent requests from the program can reuse it.

Establishing a connection is relatively expensive: the client sends a request, the server verifies the account and password, and then checks the privileges the account has. For this reason, prefer long connections where possible.

However, keeping connections open consumes memory, so the MySQL server disconnects connections that have been inactive for a long time. You can check the timeout with this SQL statement:

show global variables like 'wait_timeout';

This time is controlled by wait_timeout; the default is 28800 seconds (8 hours).
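As a rough illustration of the trade-off described above, the following Python sketch models a server that authenticates every new connection and drops connections idle for longer than wait_timeout. The class and method names (ToyServer, connect, query, reap_idle) are hypothetical and are not MySQL internals; the current time is passed in explicitly so the behaviour is deterministic.

```python
WAIT_TIMEOUT = 28800  # seconds, mirrors MySQL's default wait_timeout


class ToyServer:
    """Hypothetical model of a server that closes idle connections."""

    def __init__(self, wait_timeout=WAIT_TIMEOUT):
        self.wait_timeout = wait_timeout
        self.last_active = {}   # connection id -> time of last activity
        self.handshakes = 0     # how many times authentication ran
        self._next_id = 0

    def connect(self, now):
        # Every new connection pays for authentication and privilege checks.
        self.handshakes += 1
        self._next_id += 1
        self.last_active[self._next_id] = now
        return self._next_id

    def query(self, conn_id, now):
        # A query on an open connection refreshes its idle timer.
        if conn_id not in self.last_active:
            raise ConnectionError("connection was closed by the server")
        self.last_active[conn_id] = now

    def close(self, conn_id):
        self.last_active.pop(conn_id, None)

    def reap_idle(self, now):
        # Close every connection idle for longer than wait_timeout.
        for conn_id, last in list(self.last_active.items()):
            if now - last > self.wait_timeout:
                del self.last_active[conn_id]


server = ToyServer()

# Short connections: one handshake per query.
for t in range(3):
    c = server.connect(now=t)
    server.query(c, now=t)
    server.close(c)
short_handshakes = server.handshakes

# Long connection: one handshake, reused for every query.
server.handshakes = 0
c = server.connect(now=10)
for t in (11, 12, 13):
    server.query(c, now=t)
long_handshakes = server.handshakes

# After more than wait_timeout seconds of silence the server drops it.
server.reap_idle(now=13 + WAIT_TIMEOUT + 1)
still_open = c in server.last_active
```

In this sketch the three short connections cost three handshakes while the long connection costs one, and the long connection is gone once it has been idle past the timeout, so a client that holds long connections should be prepared to reconnect.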

Query cache

MySQL ships with an internal cache module. Yet after executing the same query twice, we find that the cache did not take effect. Why? Because the MySQL query cache is disabled by default.

show variables like 'query_cache%';

Being disabled by default suggests that it is not recommended. Why does MySQL not recommend using its own cache?

The main reason is that the scenarios where MySQL's built-in cache helps are limited:

First, it requires the SQL statements to be exactly identical: an extra space or a difference in upper and lower case makes them different statements as far as the cache is concerned.

Second, when any piece of data in a table changes, all cached results that involve this table are invalidated, so it is not suitable for applications with a large volume of updates.

Therefore, it is more appropriate to hand caching over to the ORM framework (for example, MyBatis enables its first-level cache by default) or to an independent cache service such as Redis.

In MySQL 8.0, the query cache has been removed.
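The two limitations above can be illustrated with a small Python model. This is only a sketch of the behaviour described in this section, not MySQL's implementation: the class ToyQueryCache, its methods, and the sample row ("Tom",) are all made up for illustration.

```python
class ToyQueryCache:
    """Hypothetical model: results are keyed by the exact SQL text."""

    def __init__(self):
        self.results = {}   # exact SQL string -> cached result
        self.tables = {}    # exact SQL string -> table the query read

    def get(self, sql):
        # No normalisation: spacing and letter case must match exactly.
        return self.results.get(sql)

    def put(self, sql, table, result):
        self.results[sql] = result
        self.tables[sql] = table

    def invalidate_table(self, table):
        # Any write to a table drops every cached result that read it.
        for sql in [s for s, t in self.tables.items() if t == table]:
            del self.results[sql]
            del self.tables[sql]


cache = ToyQueryCache()
cache.put("select name from user where id=1", "user", [("Tom",)])

hit = cache.get("select name from user where id=1")          # same text
miss_case = cache.get("SELECT name FROM user WHERE id=1")    # different case
miss_space = cache.get("select name from user where id = 1") # extra spaces

cache.invalidate_table("user")   # e.g. after any UPDATE on user
after_write = cache.get("select name from user where id=1")
```

Only the byte-for-byte identical statement hits the cache, and a single write to the table empties every entry that depends on it, which is why the hit rate is poor on frequently updated tables.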

Parsing and preprocessing

How does MySQL recognize a SQL statement? If you send it an arbitrary string such as hello, the server reports error 1064:

[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'hello' at line 1

This is the work of MySQL's parser and preprocessor module.

The main task of this step is to perform lexical, syntax, and semantic analysis on the statement according to the SQL grammar.

Lexical analysis

Lexical analysis breaks a complete SQL statement into individual words (tokens).

For example, take a simple SQL statement:

select name from user where id = 1 and age > 20;

From select it recognizes that this is a query statement; from user it recognizes which table you want to query; and from where it recognizes that conditions follow, which determine the rows that need to be found.
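To make the idea of breaking a statement into words concrete, here is a minimal tokenizer sketch in Python. It is not MySQL's lexer: the keyword set, the token categories, and the function name tokenize are assumptions chosen to cover just the example statement.

```python
import re

# Only the keywords needed for the example statement (an assumption).
KEYWORDS = {"select", "from", "where", "and"}

TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<operator>>=|<=|<>|!=|=|>|<)
  | (?P<punct>[,;*()])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(sql):
    """Split a SQL string into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "space":
            continue
        text = m.group()
        if kind == "word":
            # A word is either a reserved keyword or a user-chosen name.
            kind = "keyword" if text.lower() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("select name from user where id = 1 and age >20;")
```

For the example statement, the first token is the keyword select, user comes out as an identifier, and the conditions after where are split into identifiers, operators, and numbers. The parser then works on this token list rather than on raw text.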

Syntax analysis

Syntax analysis checks the syntax of the SQL, such as whether single quotation marks are closed, and then builds a data structure from the statement according to the syntax rules defined by MySQL. We call this data structure a parse tree (select_lex).

This is like grammar in English, where "I" takes "am" and "you" takes "are": a sentence that breaks the rule is not accepted. If syntax analysis finds that your SQL statement does not conform to the rules, you receive the error You have an error in your SQL syntax.

Preprocessor

If you write SQL that is lexically and syntactically correct, but the table name or a field does not exist, where is the error reported: in the parser, or in the execution layer of the database? For example:

select * from hello;

The error is still reported during the parsing stage, because parsing includes a preprocessor. It examines the parse tree produced by the parser and resolves semantics that the parser cannot. For example, it checks that table and column names exist, checks names and aliases, and ensures there is no ambiguity. Preprocessing produces a new parse tree.
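The existence check described above can be sketched as a lookup against a catalog of known tables and columns. Everything here is hypothetical: the SCHEMA dictionary, the function check_names, and the error wording, which is only modelled loosely on what a database might report.

```python
# Hypothetical catalog: table name -> set of column names.
SCHEMA = {"user": {"id", "name", "age"}}


def check_names(table, columns, schema=SCHEMA):
    """Raise LookupError if the table or any referenced column is unknown."""
    if table not in schema:
        raise LookupError(f"table '{table}' does not exist")
    for col in columns:
        # '*' means all columns, so there is nothing to look up.
        if col != "*" and col not in schema[table]:
            raise LookupError(f"unknown column '{col}' in table '{table}'")
    return True


# select name from user where id=1 and age>20: every name resolves.
ok = check_names("user", ["name", "id", "age"])

# select * from hello: syntactically fine, but the table is unknown.
try:
    check_names("hello", ["*"])
    error = None
except LookupError as exc:
    error = str(exc)
```

The point of the sketch is that the statement select * from hello passes lexical and syntax analysis and only fails once names are resolved against the catalog, which happens before any execution plan is built.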

Query optimizer

Is there only one way to execute a SQL statement? And is the SQL that the database finally executes the same as the SQL we sent?

The answer to both is no. A SQL statement can be executed in many ways that all return the same result; they are equivalent. But with so many ways to execute it, where do these candidates come from, which one is chosen in the end, and by what criteria?

This is the job of the MySQL query optimizer module (Optimizer). Its purpose is to generate different execution plans (Execution Plan) from the parse tree and then select the best one. MySQL uses a cost-based optimizer: whichever execution plan has the lowest cost is the one it uses.

You can use this command to view the cost of the query:

show status like 'Last_query_cost';

What types of optimization can MySQL's optimizer handle?

Give two simple examples:

1. When a query joins multiple tables, which table's data is used as the base (driving) table.

2. When there are multiple indexes available, which index to choose.

In fact, the optimizer module is essential to every database, and each one uses complex algorithms to make queries as efficient as possible. However, the optimizer is not omnipotent. It cannot automatically optimize every SQL statement no matter how badly it is written, nor can it select the optimal execution plan every time. You still need to take care when writing SQL statements.
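The cost-based choice described in this section boils down to estimating a cost for each candidate plan and taking the minimum. The Python sketch below shows only that selection step. The Plan class, the plan names, and all the numbers are invented for illustration and do not reflect MySQL's real cost model.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    rows_examined: int     # estimated rows the plan has to read (made up)
    cost_per_row: float    # hypothetical cost of reading one row

    @property
    def cost(self):
        return self.rows_examined * self.cost_per_row


def choose_plan(plans):
    # Cost-based choice: the plan with the lowest estimated cost wins.
    return min(plans, key=lambda p: p.cost)


candidates = [
    Plan("full table scan", rows_examined=100_000, cost_per_row=1.0),
    Plan("lookup via index on id", rows_examined=1, cost_per_row=1.2),
    Plan("range scan via index on age", rows_examined=40_000, cost_per_row=1.2),
]
best = choose_plan(candidates)
```

With these invented estimates the index lookup on id wins. The sketch also hints at why the optimizer can pick a poor plan: the choice is only as good as the estimates fed into it.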

Execution plan

The optimizer eventually turns the parse tree into an execution plan (execution_plans), which is a data structure. This execution plan is not necessarily the optimal one, because MySQL may not consider every possible execution plan.

How do we inspect MySQL's execution plan? For example, when a query joins multiple tables, which table is queried first? Which indexes might be used when executing the query, and which are actually used?

MySQL provides a tool for viewing execution plans: add EXPLAIN in front of the SQL statement to see the execution plan information.

EXPLAIN select name from user where id=1;

Storage engine

Before introducing the storage engine, let’s ask two questions:

1. From a logical point of view, where is our data placed, or in what structure?

2. Where is the execution plan executed? Who will execute it?

Basic introduction to storage engines

In a relational database, data is stored in tables, which we can think of as Excel spreadsheets. A table not only stores data; the data also needs an organized storage structure. That structure is determined by the storage engine, so the storage engine can also be called the table type.

MySQL supports multiple storage engines, and they are interchangeable, which is why they are called pluggable storage engines. Why support so many storage engines? Isn't one enough?

In MySQL, the storage engine is specified per table rather than per database, so different tables in the same database can use different engines. The storage engine can also be changed after the table is created.

How to choose a storage engine?

  • If you have high requirements for data consistency and need transaction support, you can choose InnoDB.

  • If there are many queries and few updates, and query performance requirements are relatively high, you can choose MyISAM.

  • If you need a temporary table for queries, you can choose Memory.

  • If no existing storage engine meets your needs and you have the technical ability, you can develop your own storage engine in C/C++ by following the internals manual on the official website. (https://dev.mysql.com/doc/internals/en/custom-engine.html)

Execution engine

Who uses the execution plan to operate the storage engine? The execution engine (executor), which calls the corresponding API provided by the storage engine to complete the operation.

Why can we change a table's storage engine without changing how we operate on the table? Because storage engines with different capabilities all implement the same API.

Finally, the result is returned to the client, even if it is empty.
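The idea that the executor only depends on a shared API can be sketched with an abstract interface and two interchangeable implementations. This is a toy model, not MySQL's handler interface: StorageEngine, ListEngine, KeyedEngine, execute_select, and the sample rows are all hypothetical.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical common interface that every engine implements."""

    @abstractmethod
    def insert(self, row): ...

    @abstractmethod
    def scan(self): ...


class ListEngine(StorageEngine):
    """Keeps rows in a list, in insertion order."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)

    def scan(self):
        return iter(self._rows)


class KeyedEngine(StorageEngine):
    """Keeps rows in a dict keyed by id and scans them in id order."""

    def __init__(self):
        self._rows = {}

    def insert(self, row):
        self._rows[row["id"]] = row

    def scan(self):
        return (self._rows[k] for k in sorted(self._rows))


def execute_select(engine, column, predicate):
    # The executor talks only to the shared interface, never to a
    # concrete engine, so swapping engines needs no change here.
    return [row[column] for row in engine.scan() if predicate(row)]


rows = [
    {"id": 2, "name": "Ann", "age": 18},
    {"id": 1, "name": "Tom", "age": 25},
]
results = []
for engine in (ListEngine(), KeyedEngine()):
    for row in rows:
        engine.insert(row)
    results.append(
        execute_select(engine, "name", lambda r: r["id"] == 1 and r["age"] > 20)
    )
```

Both engines store the rows differently, yet the same execute_select call returns the same answer from each, which is the property that lets a table's engine be changed without touching the statements that use it.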

Example

Let's take the SQL statement above again and walk through the entire execution process.

select name from user where id = 1 and age > 20;

  1. The connector checks whether the current user has permission to run the query. If so, processing continues; if not, the request is rejected with an Access denied for user error;

  2. Next comes the query cache. If the result is already cached, it is returned to the client directly and nothing further runs; if not, the statement goes on to the parser and preprocessor. (MySQL 8.0 removed the query cache entirely.)

  3. The parser and preprocessor check whether the statement is lexically and syntactically correct. If there is no problem, the statement proceeds to the query optimizer;

  4. The query optimizer works out which way of executing the statement has the lowest cost. The statement above has two candidate plans:

    • First find the rows in the user table whose id is 1, then keep those whose age is greater than 20.
    • First find all rows in the user table whose age is greater than 20, then keep the one whose id is 1.

  5. Once the optimizer has chosen a plan, the execution engine executes it and returns the result to the client.
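The two candidate plans in step 4 can be written out as two filter orders over the same rows. The Python sketch below uses made-up sample data and hypothetical function names; it is meant only to show that both orders return the same result while doing different amounts of intermediate work.

```python
# Made-up sample rows standing in for the user table.
USERS = [
    {"id": 1, "name": "Tom", "age": 25},
    {"id": 2, "name": "Ann", "age": 31},
    {"id": 3, "name": "Bob", "age": 17},
]


def id_then_age(rows):
    # Plan A: filter on id first, then on age.
    step1 = [r for r in rows if r["id"] == 1]
    names = [r["name"] for r in step1 if r["age"] > 20]
    return names, len(step1)


def age_then_id(rows):
    # Plan B: filter on age first, then on id.
    step1 = [r for r in rows if r["age"] > 20]
    names = [r["name"] for r in step1 if r["id"] == 1]
    return names, len(step1)


names_a, intermediate_a = id_then_age(USERS)
names_b, intermediate_b = age_then_id(USERS)
```

Both plans return the same names, but on this sample the first plan carries one row into its second step and the second plan carries two. On a large table that difference in intermediate rows is the kind of thing the optimizer's cost estimate tries to capture.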

Origin blog.csdn.net/jiang_wang01/article/details/131269483