The specific process of MySQL executing a SELECT statement

CSDN glitched on me yesterday: I wiped out the whole article with ctrl+z and couldn't restore it. I was so frustrated I didn't want to write it again, but this part is really important, so let's write it out.

[Figure: flow chart of how MySQL executes a SELECT statement]

MySQL's architecture has two layers: the Server layer and the storage engine layer.

  • The Server layer is responsible for establishing connections and for analyzing and executing SQL. Most of MySQL's core functional modules are implemented here, mainly the connector, query cache, parser, preprocessor, optimizer and executor. In addition, all built-in functions (such as date, time, math and encryption functions) and all cross-storage-engine features (such as stored procedures, triggers and views) are implemented in the Server layer.
  • The storage engine layer is responsible for storing and retrieving data. It supports multiple storage engines such as InnoDB, MyISAM and Memory, and the different storage engines share one Server layer. Today the most commonly used storage engine is InnoDB, which has been MySQL's default storage engine since MySQL 5.5. The index data structures we often talk about are implemented in the storage engine layer, and different storage engines support different index types. For example, InnoDB uses B+ tree indexes by default: both the primary key index and the secondary indexes created on a table are B+ tree indexes.

Step 1: Connector 

This is the process of connecting the user to MySQL. Because the connection runs over TCP, it begins with the TCP three-way handshake (and, when it is closed, ends with the four-way wave).

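# -h specifies the IP address of the MySQL server; you can omit it when connecting to a local MySQL server;
# -u specifies the user name; the administrator account is root;
# -p prompts for the password; if you don't put the password on the command line (recommended, for security), you will be asked for it interactively
mysql -h$ip -u$user -p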

If the MySQL service is running normally, once the TCP connection is established the connector verifies your user name and password. If either one is incorrect, you receive an "Access denied for user" error and the client program exits.

If the user name and password are correct, the connector looks up the user's privileges and saves them. Every subsequent operation on this connection is checked against the privileges read when the connection was established.

Therefore, once a user has established a connection, even if the administrator changes that user's privileges midway, the existing connection is not affected; only new connections use the new privilege settings. (Strictly speaking, per the MySQL manual this holds for global privileges and passwords; table- and column-level privilege changes take effect with the client's next statement, and database-level changes at the next USE db_name.)
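To make the "snapshot at login" behavior concrete, here is a minimal Python sketch. It is a toy model, not MySQL code: the GRANTS dictionary and the Connection class are hypothetical stand-ins for the grant tables and the connector, and they model the simplified rule above (privileges copied once at login), which in real MySQL holds for global privileges.

```python
# Hypothetical grant table: user -> privileges (stand-in for MySQL's grant tables).
GRANTS = {"alice": {"SELECT", "INSERT"}}


class Connection:
    def __init__(self, user):
        self.user = user
        # Privileges are copied once, right after authentication succeeds.
        self.privileges = frozenset(GRANTS[user])

    def can(self, privilege):
        # Every later check reads the snapshot, not the live grant table.
        return privilege in self.privileges


old_conn = Connection("alice")
GRANTS["alice"].discard("INSERT")   # administrator revokes INSERT midway
new_conn = Connection("alice")
print(old_conn.can("INSERT"), new_conn.can("INSERT"))  # True False
```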

You can check how many users are connected to MySQL with:

SHOW PROCESSLIST;

For example, as shown in the figure above, there are two connections with user name root. The Command column of the connection with id 6 shows Sleep, which means this user has not executed any command since connecting to the MySQL service: it is an idle connection, and it has been idle for 736 seconds (Time column).

Will the idle connection be occupied all the time?

Of course not. MySQL limits how long a connection may stay idle with the wait_timeout parameter, whose default value is 8 hours (28800 seconds). If a connection stays idle longer than that, the connector disconnects it automatically.

mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

We can also disconnect an idle connection manually with KILL CONNECTION followed by the connection id (for example, KILL CONNECTION 6;).
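The idle-timeout rule can be sketched as a conceptual Python model. The reap_idle function and the dictionary of (command, last_active) pairs are hypothetical and only mimic the Command and Time columns of SHOW PROCESSLIST; mysqld does not implement the timeout with a loop like this.

```python
WAIT_TIMEOUT = 28800  # seconds; the default value of wait_timeout (8 hours)


def reap_idle(connections, now, wait_timeout=WAIT_TIMEOUT):
    """Return the ids of Sleep connections idle longer than wait_timeout.

    `connections` maps connection id -> (command, last_active_timestamp),
    a hypothetical stand-in for rows of SHOW PROCESSLIST.
    """
    return sorted(
        conn_id
        for conn_id, (command, last_active) in connections.items()
        if command == "Sleep" and now - last_active > wait_timeout
    )


conns = {5: ("Query", 0), 6: ("Sleep", 100), 7: ("Sleep", 29000)}
print(reap_idle(conns, now=30000))  # [6]
```

Connection 5 is busy, and connection 7 has only been idle for 1000 seconds, so only connection 6 would be closed.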

The difference between long connections and short connections

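// Short connection
connect to the MySQL server (TCP three-way handshake)
execute sql
disconnect from the MySQL server (TCP four-way wave)

// Long connection
connect to the MySQL server (TCP three-way handshake)
execute sql
execute sql
execute sql
....
disconnect from the MySQL server (TCP four-way wave)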

So a short connection is closed after executing just one SQL statement, whereas a long connection executes many SQL statements before it is closed.

For the same task, long connections avoid repeatedly connecting to and disconnecting from MySQL, so they cost less than short connections.
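A quick back-of-the-envelope sketch of the savings, in Python. The function name is hypothetical, and it counts only the packets of the TCP three-way handshake and four-way wave, ignoring MySQL's own authentication exchange and the queries themselves.

```python
HANDSHAKE_PACKETS = 3   # TCP three-way handshake
TEARDOWN_PACKETS = 4    # TCP four-way wave


def connection_overhead_packets(num_queries, long_connection):
    """Packets spent opening/closing TCP connections for num_queries queries.

    A short connection opens and closes once per query; a long connection
    opens once and is reused for every query.
    """
    connections = 1 if long_connection else num_queries
    return connections * (HANDSHAKE_PACKETS + TEARDOWN_PACKETS)


print(connection_overhead_packets(100, long_connection=False))  # 700
print(connection_overhead_packets(100, long_connection=True))   # 7
```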

But do long connections have no downside? They do.

During query execution MySQL uses memory to manage connection objects, and these resources are released only when the connection is closed. If there are many long connections, the MySQL service may use too much memory and be forcibly killed by the system, which makes MySQL restart abnormally.

How do we solve this problem? By disconnecting, of course. There are two ways:

1. Disconnect long connections periodically.

2. Have the client actively reset the connection. MySQL 5.7 added the mysql_reset_connection() function. Note that this is a C API function rather than an SQL command. After the client performs a large operation, it can call mysql_reset_connection in its code to reset the connection and release the memory. This does not require reconnecting or re-authenticating, but it restores the connection to the state it had just after being created.
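A toy Python model of what resetting buys you, assuming a hypothetical Session class whose buffers list stands in for per-connection memory. reset() is only analogous in spirit to mysql_reset_connection(): it clears session state while the login and the connection survive.

```python
class Session:
    """Toy model of a server-side connection object (not MySQL's real structure)."""

    def __init__(self, user):
        self.user = user              # authenticated identity, survives a reset
        self.authenticated = True
        self.buffers = []             # per-connection memory that grows with use
        self.user_variables = {}

    def run_big_query(self, size):
        # Memory stays attached to the connection until it is reset or closed.
        self.buffers.append(bytearray(size))

    def memory_in_use(self):
        return sum(len(b) for b in self.buffers)

    def reset(self):
        # Drop session state and memory, but keep the login (no re-authentication).
        self.buffers.clear()
        self.user_variables.clear()


s = Session("root")
s.run_big_query(1_000_000)
print(s.memory_in_use())                    # 1000000
s.reset()
print(s.memory_in_use(), s.authenticated)   # 0 True
```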

Every time I study the underlying principles of something, I feel like the low-level machinery is carrying the whole load for me.

Step 2: Query the cache 

After the connection is established, the client can send SQL statements to MySQL. MySQL first looks at the first keyword of the statement to see what kind of statement it is. If it is a SELECT statement, MySQL first looks in the query cache (Query Cache) to see whether this query has been executed before. The query cache is stored in memory as key-value pairs: the key is the SQL query text, and the value is the query's result.

If the query statement hits the query cache, the value will be returned directly to the client. If the query statement does not hit the query cache, it will continue to execute. After execution, the query result will be stored in the query cache.

Looked at this way, that sounds awesome. Isn't the query cache like a DP memo table that lets huge numbers of repeated queries run fast?

Not really: every time a table is updated, all the query cache entries for that table are cleared. In other words, a cached result can be reused only as long as the table has not been updated since the query ran, so for frequently updated tables the cache is rarely hit.

So what is the point of MySQL caching it? I could just keep the last query result myself and reuse it.

Exactly, and MySQL thinks so too: MySQL 8.0 removed the query cache entirely.
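Even though it is gone in 8.0, the mechanism is easy to sketch. The following Python toy assumes each query reads exactly one table and uses the exact SQL text as the key; QueryCache and execute_select are hypothetical names, not MySQL internals.

```python
class QueryCache:
    """Toy query cache: key = exact SQL text, value = (table, result)."""

    def __init__(self):
        self.entries = {}

    def get(self, sql):
        entry = self.entries.get(sql)
        return None if entry is None else entry[1]

    def put(self, sql, table, result):
        self.entries[sql] = (table, result)

    def invalidate_table(self, table):
        # Any write to `table` drops every cached result that read from it.
        self.entries = {k: v for k, v in self.entries.items() if v[0] != table}


def execute_select(cache, sql, table, run_query):
    """Return (result, 'hit' or 'miss'); run_query stands in for real execution."""
    cached = cache.get(sql)
    if cached is not None:
        return cached, "hit"
    result = run_query()
    cache.put(sql, table, result)
    return result, "miss"


cache = QueryCache()
sql = "SELECT * FROM product WHERE id = 1"
fetch = lambda: [(1, "iphone")]
statuses = [execute_select(cache, sql, "product", fetch)[1]]          # miss
statuses.append(execute_select(cache, sql, "product", fetch)[1])      # hit
cache.invalidate_table("product")                                     # e.g. after an UPDATE
statuses.append(execute_select(cache, sql, "product", fetch)[1])      # miss again
print(statuses)  # ['miss', 'hit', 'miss']
```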

Step 3: Parse SQL

The first thing is lexical analysis: MySQL splits the SQL string you entered into tokens and identifies the keywords, and from these a SQL syntax tree is built, so that later modules can easily obtain the SQL type, table names, field names, WHERE conditions and so on. (It is like MySQL doing sentence analysis on a long, difficult sentence.)

The second thing is syntax analysis: based on the result of lexical analysis, the parser checks against MySQL's grammar rules whether the statement has any syntax errors.

Note the emphasis on syntax: a field or table that does not exist is not a syntax error, so it is not caught at this stage.

This is a bit like the JVM, where metadata verification and bytecode verification are separate checks.
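To illustrate the split between lexical and syntax analysis, here is a minimal Python sketch for a tiny subset of SELECT (a column list, FROM, and an optional WHERE whose tokens are kept as-is). The tokenizer, the grammar and the dictionary-shaped syntax tree are simplifying assumptions; MySQL's real parser is generated from a full grammar. Note that a nonexistent table still parses.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE"}
SYMBOLS = {"*", ",", "=", ">", "<"}
TOKEN_RE = re.compile(r"\s*(\*|,|=|>|<|'[^']*'|\w+)")


class SQLSyntaxError(Exception):
    pass


def tokenize(sql):
    """Lexical analysis: split the SQL text into (kind, text) tokens."""
    sql = sql.strip().rstrip(";").rstrip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SQLSyntaxError(f"unexpected character at position {pos}")
        text = m.group(1)
        if text.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif text in SYMBOLS:
            kind = "SYMBOL"
        elif text.startswith("'"):
            kind = "STRING"
        else:
            kind = "WORD"  # identifier or number
        tokens.append((kind, text))
        pos = m.end()
    return tokens


def parse_select(sql):
    """Syntax analysis for: SELECT cols FROM table [WHERE ...]."""
    tokens = tokenize(sql)
    i = 0

    def peek():
        return tokens[i][1] if i < len(tokens) else None

    def expect(kind, text=None):
        nonlocal i
        if i >= len(tokens):
            raise SQLSyntaxError("unexpected end of statement")
        tok_kind, tok_text = tokens[i]
        if tok_kind != kind or (text is not None and tok_text.upper() != text):
            raise SQLSyntaxError(f"syntax error near {tok_text!r}")
        i += 1
        return tok_text

    expect("KEYWORD", "SELECT")
    columns = []
    while True:
        columns.append(expect("SYMBOL") if peek() == "*" else expect("WORD"))
        if peek() != ",":
            break
        expect("SYMBOL")  # consume the comma
    expect("KEYWORD", "FROM")
    table = expect("WORD")
    where = []
    if i < len(tokens):
        expect("KEYWORD", "WHERE")
        where = [text for _, text in tokens[i:]]
        if not where:
            raise SQLSyntaxError("empty WHERE clause")
    return {"type": "SELECT", "columns": columns, "table": table, "where": where}


# Parses fine: whether the table exists is checked later, not by the parser.
print(parse_select("SELECT * FROM no_such_table"))
# {'type': 'SELECT', 'columns': ['*'], 'table': 'no_such_table', 'where': []}
```

By contrast, parse_select("SELEC * FROM product") raises SQLSyntaxError, because "SELEC" is not a keyword.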

Step 4: Execute SQL

Executing the query goes through three stages, corresponding to blocks in the figure:

  • The prepare stage, that is, the preprocessing stage;
  • The optimize stage is the optimization stage;
  • The execute phase, that is, the execution phase;

preprocessing stage 

This is handled by the preprocessor. For a SELECT statement, preprocessing does two things:

1. Check whether the tables and fields referenced in the statement exist.

2. Expand the * in SELECT * into all columns of the table.
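The two preprocessing jobs can be sketched in Python. The CATALOG dictionary is a hypothetical data dictionary, the input tree uses an assumed dictionary shape, and the error messages are modeled on MySQL's wording rather than copied from its source.

```python
# Hypothetical data dictionary: table name -> ordered column list.
CATALOG = {"product": ["id", "name", "price"]}


class PrepareError(Exception):
    pass


def preprocess(tree):
    """Check that the table and columns exist, then expand '*'."""
    table = tree["table"]
    if table not in CATALOG:
        raise PrepareError(f"Table '{table}' doesn't exist")
    columns = []
    for col in tree["columns"]:
        if col == "*":
            columns.extend(CATALOG[table])   # replace * with every column
        elif col in CATALOG[table]:
            columns.append(col)
        else:
            raise PrepareError(f"Unknown column '{col}' in 'field list'")
    return {**tree, "columns": columns}


tree = {"type": "SELECT", "columns": ["*"], "table": "product", "where": []}
print(preprocess(tree)["columns"])  # ['id', 'name', 'price']
```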

optimization stage

This stage is handled by the optimizer.

Executing a statement involves many choices; for example, when a table has several indexes, the optimizer decides, based on estimated query cost, which one to use. The job of this step is to settle on a concrete execution plan.

Put EXPLAIN in front of a query to output its execution plan. The key column of the plan shows which index will be used during execution. If key is NULL, no index is used, which usually means a full table scan, the least efficient way to query.

So which plan will the optimizer choose? Consider this query:

SELECT id FROM product WHERE id > 1  AND name LIKE 'i%';

This query could be answered with either the primary key index or the ordinary index on name, but the cost differs, so the optimizer has to decide which index to use.

Here the secondary index on name is a covering index: its entries contain everything the query needs, so there is no need to go back to the primary key index to fetch data. (Querying the primary key index's B+ tree costs more than querying the secondary index's B+ tree, so, based on query cost, the optimizer chooses the cheaper ordinary index.)

The process: find the entries matching name LIKE 'i%' in the secondary index on name. Since each secondary index entry also stores the primary key id, entries with id > 1 can be kept and returned directly, without visiting the primary key index.
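A toy cost comparison in Python shows the idea behind choosing the covering secondary index. The cost numbers and the IndexPlan fields are made up for illustration; MySQL's real cost model is far more detailed.

```python
from dataclasses import dataclass


@dataclass
class IndexPlan:
    name: str
    columns: set          # columns available in this index's entries
    rows_examined: int    # estimated entries read (made-up numbers)
    cost_per_row: float   # secondary index entries are smaller, so cheaper to read


def plan_cost(plan, needed_columns, lookup_cost_per_row=1.0):
    cost = plan.rows_examined * plan.cost_per_row
    if not needed_columns <= plan.columns:
        # Not covering: every entry needs an extra lookup in the primary key index.
        cost += plan.rows_examined * lookup_cost_per_row
    return cost


def choose_index(plans, needed_columns):
    return min(plans, key=lambda p: plan_cost(p, needed_columns))


plans = [
    IndexPlan("PRIMARY", {"id", "name", "price"}, rows_examined=1000, cost_per_row=1.0),
    IndexPlan("idx_name", {"name", "id"}, rows_examined=1000, cost_per_row=0.3),
]
# SELECT id FROM product WHERE id > 1 AND name LIKE 'i%' needs only id and name.
print(choose_index(plans, {"id", "name"}).name)           # idx_name
# If the query also needed price, idx_name would no longer cover it.
print(choose_index(plans, {"id", "name", "price"}).name)  # PRIMARY
```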

execution phase 

This stage is handled by the executor.

Once the optimizer has produced the execution plan, it is time to actually execute the SQL.

During execution, the executor interacts with the storage engine one record at a time.

Three examples show how the executor and the storage engine interact:

  • primary key index query
  • full table scan
  • index pushdown

primary key index query

As the name suggests, this kind of query goes through the primary key index.

Consider this query:

select * from product where id = 1;

Here id is the primary key and the filter is an equality condition, so at most one record can match, and the optimizer chooses the access type const.

For its first read, the executor calls the function that the read_first_record function pointer points to.

Because the optimizer chose access type const, this function pointer points to InnoDB's index lookup interface. The executor passes the condition id = 1 to the storage engine, which uses the primary key index (a B+ tree) to locate the first record that satisfies it.

Check 1: if the record does not exist, the storage engine reports to the executor that the record was not found, and the query ends. If the record exists, it is returned to the executor.

If it exists, check 2 follows:

The executor checks whether the record satisfies the query conditions. If it does, the record is sent to the client; otherwise it is skipped.

The executor's query process is a while loop, so it reads again. This time, since it is not the first read, it calls the function pointed to by the read_record function pointer (rather than read_first_record). Because the access type chosen by the optimizer is const (the same parameter as above), read_record points to a function that always returns -1, so when it is called the executor exits the loop and the query ends.
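The const case can be modeled in Python. The dict-based "index" is a stand-in for the B+ tree, END plays the role of the -1 return value, and the function names only mirror read_first_record and read_record from the text; they are not InnoDB's real interfaces.

```python
END = -1  # sentinel meaning "no (more) records", like the -1 in the text

# Toy primary key index: id -> row (a dict stands in for the B+ tree lookup).
PRODUCT_PK = {1: {"id": 1, "name": "iphone"}, 2: {"id": 2, "name": "ipad"}}


def make_const_access(key):
    """Return (read_first_record, read_record) for access type const."""
    def read_first_record():
        # Primary key lookup; END if the key does not exist (record not found).
        return PRODUCT_PK.get(key, END)

    def read_record():
        # const matches at most one row, so the second read always ends the loop.
        return END

    return read_first_record, read_record


def execute(read_first_record, read_record, condition):
    """Executor loop: first read, then keep calling read_record until END."""
    sent_to_client = []
    row = read_first_record()
    while row != END:
        if condition(row):              # check 2: does the row meet the WHERE condition?
            sent_to_client.append(row)
        row = read_record()
    return sent_to_client


first, nxt = make_const_access(1)
print(execute(first, nxt, lambda r: r["id"] == 1))  # [{'id': 1, 'name': 'iphone'}]
```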

full table scan

select * from product where name = 'iphone';

The first step is the same: the executor calls the function pointed to by the read_first_record function pointer. This time, however, the optimizer chose access type ALL (full table scan), so the pointer points to InnoDB's full scan interface, and the storage engine reads the first record in the table.

Step 2: the executor checks whether the record satisfies the WHERE condition (name = 'iphone'). If not, it is skipped; if so, the record is sent to the client.

The executor's query process is a while loop, so it reads again, this time calling the function pointed to by read_record. Because the access type chosen by the optimizer is ALL, read_record still points to InnoDB's full scan interface, so the storage engine reads the record following the previous one and returns it to the executor (Server layer). The executor checks the condition again: if the record does not match, it is skipped; otherwise it is sent to the client. This repeats until the storage engine has read every record in the table and reports to the executor (Server layer) that there are no more records. On receiving that report, the executor exits the loop and the query ends.
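The same executor loop with a full-scan access method, again as a hypothetical Python model: the list stands in for the table, END plays the role of "no more records", and the closures mirror read_first_record and read_record rather than real InnoDB functions.

```python
END = -1  # "no more records"

PRODUCT_TABLE = [
    {"id": 1, "name": "iphone"},
    {"id": 2, "name": "ipad"},
    {"id": 3, "name": "iphone"},
]


def make_full_scan_access(table):
    """Return (read_first_record, read_record) for access type ALL."""
    pos = 0

    def read_first_record():
        nonlocal pos
        pos = 0                      # start from the first record in the table
        return read_record()

    def read_record():
        nonlocal pos
        if pos >= len(table):
            return END               # engine has read every record
        row = table[pos]
        pos += 1                     # next call returns the following record
        return row

    return read_first_record, read_record


def execute(read_first_record, read_record, condition):
    sent_to_client = []
    row = read_first_record()
    while row != END:
        if condition(row):           # the executor (Server layer) checks WHERE
            sent_to_client.append(row)
        row = read_record()
    return sent_to_client


first, nxt = make_full_scan_access(PRODUCT_TABLE)
rows = execute(first, nxt, lambda r: r["name"] == "iphone")
print([r["id"] for r in rows])  # [1, 3]
```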

index pushdown

Index condition pushdown is a query optimization strategy introduced in MySQL 5.6.

select * from t_user  where age > 20 and reward = 100000

Suppose we create a joint (composite) index on (age, reward).

When a joint index meets a range condition (>, <), matching stops there: the age column can use the joint index, but the reward column cannot. (Just remember this for now; I may write about index invalidation in a later post.)

Without index pushdown:

Step 1: the Server layer calls the storage engine interface to locate the first secondary index record that satisfies age > 20.

Step 2: the storage engine quickly locates this record in the B+ tree of the secondary index, obtains its primary key value, goes back to the table (the primary key index) to fetch the complete record, and returns it to the Server layer.

Step 3: the Server layer checks whether the record's reward equals 100000. If it does, the record is sent to the client; if not, it is skipped.

Step 4: the Server layer asks for the next record. There is no need to search the tree again, because the records in the index are connected by a linked list. The storage engine gets the next secondary index record and its primary key value, goes back to the table to fetch the full record, and returns it to the Server layer to be checked.

Steps 3 and 4 repeat until all records with age > 20 have been read.

So without index pushdown, every secondary index record found requires a trip back to the table; the full record is then returned to the Server layer, which checks whether its reward equals 100000.
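A minimal Python model of the no-pushdown path, counting trips back to the table. INDEX (a sorted list of (age, reward, id) tuples standing in for the joint index's leaf level), TABLE and query_without_icp are hypothetical; bisect stands in for the B+ tree search.

```python
from bisect import bisect_right

# Leaf level of a hypothetical joint index on (age, reward): sorted (age, reward, id).
INDEX = [(18, 100000, 1), (22, 100000, 2), (25, 50000, 3), (30, 100000, 4), (35, 0, 5)]
AGES = [age for age, _, _ in INDEX]
# Clustered (primary key) index: id -> full row.
TABLE = {pk: {"id": pk, "age": age, "reward": reward} for age, reward, pk in INDEX}


def query_without_icp(min_age, target_reward):
    """Models: SELECT * FROM t_user WHERE age > min_age AND reward = target_reward."""
    sent, table_lookups = [], 0
    pos = bisect_right(AGES, min_age)         # step 1: first entry with age > min_age
    for _age, _reward, pk in INDEX[pos:]:     # step 4: follow the leaf-level list
        row = TABLE[pk]                       # step 2: back to the table for EVERY entry
        table_lookups += 1
        if row["reward"] == target_reward:    # step 3: Server layer checks reward
            sent.append(pk)
    return sent, table_lookups


print(query_without_icp(20, 100000))  # ([2, 4], 4)
```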

What about when there is an index pushdown?

The server layer first calls the interface of the storage engine to locate the first secondary index record that meets the query conditions, that is, locates the first record with age > 20;

After locating the secondary index record, the storage engine does not go back to the table right away. It first checks the condition on the column contained in the index (whether reward equals 100000). If the condition is false, it skips this secondary index record directly; if it is true, it goes back to the table and returns the complete record to the Server layer.

The Server layer then checks whether the other query conditions hold (this query has no others). If they do, the record is sent to the client; otherwise it is skipped, and the Server layer asks the storage engine for the next record. This continues until the storage engine has read all index records with age > 20.

Why can the storage engine check the reward condition? Although reward cannot be used to search the joint index, its value is stored in every (age, reward) index entry, so it can be checked without reading the full row.

In effect, checking whether reward equals 100000 is outsourced to the storage engine.

Index pushdown reduces the number of trips back to the table for secondary index records and so improves query efficiency, because it moves part of the Server layer's filtering work down to the storage engine layer.
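A minimal Python model of the pushdown path: the reward value already stored in each index entry is checked before going back to the table. INDEX, TABLE and query_with_icp are hypothetical names, and bisect stands in for the B+ tree search. With this sample data, four index entries have age > 20, but only two trips back to the table are made.

```python
from bisect import bisect_right

# Leaf level of a hypothetical joint index on (age, reward): sorted (age, reward, id).
INDEX = [(18, 100000, 1), (22, 100000, 2), (25, 50000, 3), (30, 100000, 4), (35, 0, 5)]
AGES = [age for age, _, _ in INDEX]
# Clustered (primary key) index: id -> full row.
TABLE = {pk: {"id": pk, "age": age, "reward": reward} for age, reward, pk in INDEX}


def query_with_icp(min_age, target_reward):
    """Models: SELECT * FROM t_user WHERE age > min_age AND reward = target_reward."""
    sent, table_lookups = [], 0
    pos = bisect_right(AGES, min_age)          # locate first entry with age > min_age
    for _age, idx_reward, pk in INDEX[pos:]:
        if idx_reward != target_reward:        # engine checks reward from the index entry
            continue                           # skip: no trip back to the table
        row = TABLE[pk]                        # back to the table only for matches
        table_lookups += 1
        sent.append(row["id"])                 # Server layer would check other conditions here
    return sent, table_lookups


print(query_with_icp(20, 100000))  # ([2, 4], 2)
```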


Origin: blog.csdn.net/chara9885/article/details/131521230