The execution flow of a SQL query

When we use a database, what we usually see is a single whole. For example, suppose you have the simplest possible table, with only one ID field, and you execute the following query:


mysql> select * from T where ID=10;

All we see is that we enter a statement and get a result back; we don't see how MySQL actually executes that statement.

So today I want to take MySQL apart with you and see what "parts" it contains. I hope this teardown will give you a deeper understanding of MySQL.

That way, when we run into anomalies or problems in MySQL, we can get straight to the root cause and locate and solve the problem more quickly. Below is a schematic diagram of MySQL's basic architecture, from which you can clearly see how a SQL statement is executed across MySQL's functional modules.

Generally speaking, MySQL can be divided into two parts: the Server layer and the storage engine layer.

 

The Server layer includes the connector, query cache, analyzer, optimizer, and executor. It covers most of MySQL's core services and all built-in functions (such as date, time, math, and encryption functions), and every feature that works across storage engines, such as stored procedures, triggers, and views, is implemented in this layer. The storage engine layer is responsible for storing and retrieving data.

Its architecture is pluggable and supports multiple storage engines, such as InnoDB, MyISAM, and Memory. The most commonly used is InnoDB, which has been the default storage engine since MySQL 5.5.5. In other words, if you run create table without specifying an engine, InnoDB is used. You can still choose another engine explicitly; for example, adding engine=memory to the create table statement creates the table with the Memory engine.

Different storage engines access table data in different ways and support different features; we will discuss how to choose an engine in a later article. As the figure shows, all storage engines share one Server layer, the part from the connector to the executor. For now, just get a rough impression of each component's name. Next, I will use the SQL statement from the beginning to walk you through the whole execution process and look at the role of each component in turn.
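To make the plug-in idea concrete, here is a minimal Python sketch, not MySQL's source. It has a hypothetical StorageEngine interface, two toy engines standing in for InnoDB and Memory, and one shared Server-layer function that works with either. The class and function names are assumptions for illustration only.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine interface: the Server layer only calls these methods."""

    @abstractmethod
    def insert(self, row: dict) -> None: ...

    @abstractmethod
    def scan(self):
        """Yield every stored row."""


class InnoDBLike(StorageEngine):
    # Stand-in for a disk-backed engine; here it is just a list.
    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)

    def scan(self):
        yield from self._rows


class MemoryLike(StorageEngine):
    # Stand-in for the Memory engine; rows keyed by insertion order.
    def __init__(self):
        self._rows = {}

    def insert(self, row):
        self._rows[len(self._rows)] = row

    def scan(self):
        yield from self._rows.values()


ENGINES = {"innodb": InnoDBLike, "memory": MemoryLike}


def create_table(engine: str = "innodb") -> StorageEngine:
    # Like CREATE TABLE without ENGINE=...: fall back to InnoDB.
    return ENGINES[engine.lower()]()


def server_select(table: StorageEngine, predicate) -> list:
    # Shared "Server layer" logic: identical whichever engine holds the rows.
    return [row for row in table.scan() if predicate(row)]


for name in ("innodb", "memory"):
    t = create_table(name)
    for i in range(5, 15):
        t.insert({"ID": i})
    print(name, server_select(t, lambda r: r["ID"] == 10))
```

The point of the design is that server_select never needs to know which engine it is talking to. Adding an engine means implementing the interface, not changing the Server layer.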

 

Connector

The first step is to connect to the database, and the connector is what greets you. The connector is responsible for establishing a connection with the client, obtaining your permissions, and maintaining and managing the connection. The connection command is usually written like this:


mysql -h$ip -P$port -u$user -p

After entering the command, you need to type the password at the interactive prompt. The password can also be written on the command line directly after -p, but that may leak it. If you are connecting to a production server, it is strongly recommended that you do not do this.

The mysql in the connection command is a client tool used to establish a connection with the server. After the classic TCP handshake completes, the connector starts authenticating you with the user name and password you entered.

  • If the user name or password is incorrect, you receive an "Access denied for user" error, and the client program exits.
  • If the user name and password pass authentication, the connector looks up your permissions in the privilege tables. From then on, every permission check on this connection relies on the permissions read at this moment.

This means that once a user has established a connection, even if you use an administrator account to change that user's permissions, the existing connection is not affected. Only connections created after the change use the new permission settings.

After the connection is established, if you issue no further commands, the connection sits idle, and you can see it with the show processlist command. The figure in the text shows the result of show processlist: a row whose Command column reads "Sleep" represents an idle connection in the system.
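The permission snapshot described above can be modeled in a few lines. This is a toy Python sketch under simplifying assumptions: a hypothetical in-memory PRIVILEGE_TABLE, plaintext passwords, and privileges as plain strings. It shows only that a connection reads its privileges once, at connect time.

```python
# Hypothetical privilege table: user -> password and granted privileges.
PRIVILEGE_TABLE = {"b": {"password": "secret", "privs": {"SELECT"}}}


class AccessDenied(Exception):
    pass


class Connection:
    def __init__(self, user: str, password: str):
        entry = PRIVILEGE_TABLE.get(user)
        if entry is None or entry["password"] != password:
            raise AccessDenied(f"Access denied for user '{user}'")
        # Read the privileges once, at connect time, and keep a private copy.
        self.privs = set(entry["privs"])

    def can(self, priv: str) -> bool:
        # Every later check consults the snapshot, not the live table.
        return priv in self.privs


old_conn = Connection("b", "secret")
PRIVILEGE_TABLE["b"]["privs"].add("UPDATE")  # an administrator grants UPDATE
new_conn = Connection("b", "secret")

print(old_conn.can("UPDATE"))  # False: the existing connection is unaffected
print(new_conn.can("UPDATE"))  # True: only the new connection sees the grant
```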

 

If the client stays inactive for too long, the connector disconnects it automatically. This timeout is controlled by the parameter wait_timeout, whose default value is 8 hours.

If the client sends a request after the connection has been dropped, it receives the error "Lost connection to MySQL server during query". To continue, you must reconnect and then resend the request.

In a database, a long connection means that once a connection is established, the client keeps using that same connection for all of its subsequent requests.
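Here is a minimal sketch of the idle-timeout behavior. It assumes a hypothetical ClientConnection class and a clock value passed in by hand, so the example runs instantly. Only the 8-hour default and the error text come from the description above.

```python
WAIT_TIMEOUT = 8 * 3600  # seconds: the 8-hour default described above


class LostConnection(Exception):
    pass


class ClientConnection:
    """Toy connection; `now` is passed in so the example needs no real clock."""

    def __init__(self, now: float):
        self.last_active = now
        self.open = True

    def reap_if_idle(self, now: float, wait_timeout: float = WAIT_TIMEOUT) -> None:
        # Server side: drop the connection once it has been idle too long.
        if self.open and now - self.last_active > wait_timeout:
            self.open = False

    def query(self, sql: str, now: float) -> str:
        if not self.open:
            raise LostConnection("Lost connection to MySQL server during query")
        self.last_active = now  # any request resets the idle timer
        return f"ok: {sql}"


conn = ClientConnection(now=0)
conn.query("select 1", now=60)
conn.reap_if_idle(now=60 + WAIT_TIMEOUT + 1)  # idle for just over 8 hours
try:
    conn.query("select 1", now=60 + WAIT_TIMEOUT + 2)
except LostConnection as err:
    print(err)  # the client has to reconnect and resend the request
    conn = ClientConnection(now=60 + WAIT_TIMEOUT + 2)
print(conn.query("select 1", now=60 + WAIT_TIMEOUT + 3))
```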

A short connection is closed after a few queries, and a new one is established for the next query. Establishing a connection is usually fairly expensive, so I suggest keeping it to a minimum, that is, using long connections wherever possible. However, once you switch everything to long connections, you may find that the memory used by MySQL sometimes grows very quickly. This is because the memory MySQL uses temporarily during execution is managed in the connection object, and these resources are released only when the connection is closed.

So if long connections pile up, memory usage can grow so large that the system forcibly kills MySQL (OOM); what you observe is that MySQL restarts abnormally. How can this be solved? Consider the following two options:

  • Disconnect long connections regularly. After a connection has been used for a while, or after the program detects that it has just run a large, memory-hungry query, close the connection and reconnect before the next query.
  • If you are using MySQL 5.7 or later, you can reinitialize the connection's resources by calling mysql_reset_connection after each relatively large operation. This does not require reconnecting or re-authenticating, but it restores the connection to the state it was in when it was just created.
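The memory problem and the two remedies can be sketched as a toy model: a hypothetical PooledConnection whose scratch memory only grows until the connection is closed or reset. reset() here plays the role of mysql_reset_connection, and the 64 MB "large query" threshold is an arbitrary assumption.

```python
class PooledConnection:
    """Toy model: per-connection scratch memory lives until close or reset."""

    def __init__(self, user: str):
        self.user = user          # authenticated once, at connect time
        self.scratch_bytes = 0    # temporary memory held by this connection
        self.auth_count = 1

    def run(self, query_cost_bytes: int) -> None:
        # Execution buffers are attached to the connection object and are
        # not handed back when the query finishes.
        self.scratch_bytes += query_cost_bytes

    def reset(self) -> None:
        # Analogue of mysql_reset_connection: free per-connection state
        # without re-authenticating (user and auth_count stay unchanged).
        self.scratch_bytes = 0


LARGE_QUERY = 64 * 1024 * 1024  # hypothetical threshold for a "big" query

conn = PooledConnection("app")
for cost in [1_000, 200 * 1024 * 1024, 5_000]:
    conn.run(cost)
    if cost >= LARGE_QUERY:
        conn.reset()  # reclaim memory right after the large operation

print(conn.scratch_bytes, conn.auth_count)  # 5000 1
```

Without the reset() call, scratch_bytes would keep the 200 MB from the large query for as long as the connection lives, which is the accumulation that leads to OOM.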

Query cache

After the connection is established, you can execute the select statement. Execution then moves on to the second step: the query cache.

When MySQL receives a query, it first checks the query cache to see whether this statement has been executed before. Previously executed statements and their results may be cached directly in memory as key-value pairs: the key is the query statement, and the value is the query result.

If your query finds its key in the cache, the value is returned to the client directly. If the statement is not in the query cache, execution continues with the later stages, and once it completes, the result is stored in the query cache.

As you can see, if the query hits the cache, MySQL can skip the complex operations that follow and return the result directly, which is very efficient.
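The lookup-then-store flow above fits in a small Python class. This is a sketch, not MySQL's implementation. QueryCache and slow_path are hypothetical names, and slow_path stands in for the analyzer, optimizer, and executor.

```python
from typing import Callable


class QueryCache:
    """Toy query cache: key = exact SQL text, value = result rows."""

    def __init__(self):
        self._store: dict[str, list] = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql: str, run_query: Callable[[str], list]) -> list:
        if sql in self._store:        # hit: skip all the later stages
            self.hits += 1
            return self._store[sql]
        self.misses += 1
        result = run_query(sql)       # miss: go through the full execution path
        self._store[sql] = result     # then remember the result
        return result


calls = []


def slow_path(sql: str) -> list:
    calls.append(sql)                 # stands in for analyzer/optimizer/executor
    return [{"ID": 10}]


cache = QueryCache()
cache.execute("select * from T where ID=10", slow_path)
cache.execute("select * from T where ID=10", slow_path)
print(cache.hits, cache.misses, len(calls))  # 1 1 1
```

Because the key is the raw statement text, a statement that differs only in letter case or spacing is a different key here. MySQL's query cache likewise required statements to match byte for byte.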

 

But in most cases I would advise you not to use the query cache. Why? Because it often does more harm than good.

Query cache invalidation is very frequent: any update to a table clears all cached queries on that table.

So you may well go to the trouble of storing results, only for an update to wipe them all out before they are ever used. For a database under heavy update load, the query cache hit rate will be very low. The exception is a static table that is updated only rarely, such as a system configuration table; queries on such a table are well suited to the query cache.
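To see why frequent updates ruin the hit rate, here is a toy Python cache in which any write to a table drops every cached query that reads it. The class, the hit_rate helper, and the read/write mix are assumptions made up for illustration.

```python
class InvalidatingCache:
    """Toy cache where any write to a table drops every entry that used it."""

    def __init__(self):
        self._entries: dict[str, tuple[frozenset, list]] = {}
        self.hits = 0
        self.lookups = 0

    def get(self, sql: str):
        self.lookups += 1
        entry = self._entries.get(sql)
        if entry is not None:
            self.hits += 1
            return entry[1]
        return None

    def put(self, sql: str, tables: set, rows: list) -> None:
        self._entries[sql] = (frozenset(tables), rows)

    def on_update(self, table: str) -> None:
        # A single write to `table` empties all of its cached queries.
        self._entries = {k: v for k, v in self._entries.items() if table not in v[0]}


def hit_rate(reads_per_write: int, rounds: int = 100) -> float:
    cache = InvalidatingCache()
    sql = "select * from T where ID=10"
    for _ in range(rounds):
        for _ in range(reads_per_write):
            if cache.get(sql) is None:
                cache.put(sql, {"T"}, [{"ID": 10}])
        cache.on_update("T")
    return cache.hits / cache.lookups


print(hit_rate(1))    # 0.0: every result is wiped before it is reused
print(hit_rate(100))  # 0.99: a rarely updated table caches well
```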

Fortunately, MySQL also supports using the cache on demand. If you set the parameter query_cache_type to DEMAND, ordinary SQL statements do not use the query cache. For statements where you do want the cache, you can request it explicitly with SQL_CACHE, as in the following statement:

 


mysql> select SQL_CACHE * from T where ID=10;
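The effect of query_cache_type can be summarized as a small decision function. This is a hypothetical Python sketch that only looks at the SQL_CACHE and SQL_NO_CACHE hints. It ignores the other rules MySQL uses to decide whether a query is cacheable at all.

```python
import re

# Hypothetical model of the query_cache_type setting; not MySQL's code.
_SQL_CACHE = re.compile(r"^\s*select\s+sql_cache\b", re.IGNORECASE)
_SQL_NO_CACHE = re.compile(r"^\s*select\s+sql_no_cache\b", re.IGNORECASE)


def uses_query_cache(sql: str, query_cache_type: str) -> bool:
    mode = query_cache_type.upper()
    if mode == "OFF":
        return False
    if mode == "DEMAND":
        # Only statements that explicitly ask with SQL_CACHE are cached.
        return bool(_SQL_CACHE.match(sql))
    # ON: cache every SELECT unless it opts out with SQL_NO_CACHE.
    return sql.lstrip().lower().startswith("select") and not _SQL_NO_CACHE.match(sql)


print(uses_query_cache("select * from T where ID=10", "DEMAND"))            # False
print(uses_query_cache("select SQL_CACHE * from T where ID=10", "DEMAND"))  # True
```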

 

Note that MySQL 8.0 removed the query cache entirely, so this feature does not exist at all from 8.0 onward.

 

Analyzer

If the query cache is not hit, the statement is actually executed.

First of all, MySQL needs to know what you want to do, so it has to parse the SQL statement. The analyzer first performs "lexical analysis". What you entered is a SQL statement made up of multiple strings and spaces, and MySQL needs to identify what each string is and what it represents. From the keyword "select" you entered, MySQL recognizes that this is a query statement.

It also recognizes the string "T" as "table name T" and the string "ID" as "column ID".
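Lexical analysis can be illustrated with a toy tokenizer. The sketch below is not MySQL's lexer: the keyword list is a small assumed subset. Labeling a name as a table or column by looking at the preceding keyword is also a simplification; in a real parser that distinction comes from grammar context.

```python
import re

KEYWORDS = {"select", "from", "where", "and", "join", "using"}

# Order matters: numbers, then words, then whitespace, then single characters.
TOKEN_RE = re.compile(
    r"(?P<number>\d+)|(?P<word>[A-Za-z_]\w*)|(?P<space>\s+)|(?P<symbol>.)"
)


def tokenize(sql: str) -> list[tuple[str, str]]:
    """Toy lexer: label each piece of the statement (not MySQL's lexer)."""
    tokens = []
    last_keyword = None
    for m in TOKEN_RE.finditer(sql):
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            if text.lower() in KEYWORDS:
                kind = "keyword"
                last_keyword = text.lower()
            else:
                # Simplification: a name after FROM is a table,
                # any other bare name is treated as a column.
                kind = "table" if last_keyword == "from" else "column"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("select * from T where ID=10")
statement_type = tokens[0][1].lower()  # "select": this is a query statement
print(tokens)
# [('keyword', 'select'), ('symbol', '*'), ('keyword', 'from'), ('table', 'T'),
#  ('keyword', 'where'), ('column', 'ID'), ('symbol', '='), ('number', '10')]
```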

Once these are identified, "syntax analysis" follows. Based on the result of lexical analysis, the parser applies the grammar rules to determine whether the statement you entered is valid MySQL syntax. If it is not, you receive a "You have an error in your SQL syntax" error. For example, in the following statement, select is missing its initial "s":


mysql> elect * from t where ID=1;

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1

A syntax error message generally points to the first place where the error occurs, so focus on the text immediately following "use near".
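The "use near" behavior follows from how a parser works: it consumes tokens until one does not fit the grammar, then reports the text from that point on. Below is a toy recursive-descent parser for a tiny, assumed subset of SELECT. Its error message imitates only the "near" part of MySQL's message.

```python
import re

TOKEN_RE = re.compile(r"\w+|\S")


class SQLSyntaxError(Exception):
    pass


def parse_select(sql: str) -> dict:
    """Toy parser for: select (* | column) from table [where column = number]."""
    sql = sql.strip().rstrip(";")
    tokens = [(m.group(), m.start()) for m in TOKEN_RE.finditer(sql)]
    pos = 0

    def fail():
        # Quote the statement from the first token that broke the grammar,
        # which is the part MySQL shows after "use near".
        start = tokens[pos][1] if pos < len(tokens) else len(sql)
        raise SQLSyntaxError(f"syntax error near '{sql[start:]}'")

    def take(check):
        nonlocal pos
        if pos >= len(tokens) or not check(tokens[pos][0]):
            fail()
        pos += 1
        return tokens[pos - 1][0]

    def keyword(word):
        return lambda tok: tok.lower() == word

    take(keyword("select"))
    columns = take(lambda tok: tok == "*" or tok.isidentifier())
    take(keyword("from"))
    table = take(str.isidentifier)
    where = None
    if pos < len(tokens):
        take(keyword("where"))
        column = take(str.isidentifier)
        take(lambda tok: tok == "=")
        where = (column, int(take(str.isdigit)))
    if pos < len(tokens):
        fail()  # leftover tokens after a complete statement
    return {"columns": columns, "table": table, "where": where}


print(parse_select("select * from T where ID=10"))
try:
    parse_select("elect * from t where ID=1;")
except SQLSyntaxError as e:
    print(e)  # syntax error near 'elect * from t where ID=1'
```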

 

Optimizer

After the analyzer, MySQL knows what you are going to do.

Before execution starts, the statement must also go through the optimizer. When a table has multiple indexes, the optimizer decides which index to use; when a statement joins multiple tables, it decides the order in which the tables are joined. For example, the following statement joins two tables:

 


mysql> select * from t1 join t2 using(ID)  where t1.c=10 and t2.d=20;

 

There are two ways to execute it:

  • First fetch the ID values of the records with c=10 from table t1, use those ID values to find the matching rows in table t2, and then check whether d in t2 equals 20.
  • Or first fetch the ID values of the records with d=20 from table t2, use those ID values to find the matching rows in table t1, and then check whether c in t1 equals 10.

The two approaches produce the same logical result, but their efficiency differs, and the optimizer's job is to decide which one to use. Once the optimizer stage finishes, the statement's execution plan is fixed, and execution enters the executor stage.
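The choice between the two join orders can be sketched as a toy cost comparison. The data, the estimated_cost formula, and the helper names are all hypothetical. The formula charges two units per matching driving row and assumes indexes on c, d, and ID. A real optimizer estimates from index statistics rather than counting rows.

```python
def nested_loop_join(driver, d_col, d_val, inner, i_col, i_val):
    """Join on ID, starting from `driver`; returns the matching IDs."""
    inner_by_id = {r["ID"]: r for r in inner}  # stand-in for an index on ID
    return sorted(
        r["ID"] for r in driver
        if r[d_col] == d_val
        and r["ID"] in inner_by_id
        and inner_by_id[r["ID"]][i_col] == i_val
    )


def estimated_cost(driver, col, val):
    # Toy statistic: each matching driving row costs one read plus one
    # ID lookup in the other table.
    return 2 * sum(1 for r in driver if r[col] == val)


t1 = [{"ID": i, "c": 10 if i < 500 else 0} for i in range(1000)]
t2 = [{"ID": i, "d": 20 if i in (3, 7) else 0} for i in range(1000)]

# Each plan: (estimated cost, how to run it). Costs are compared before running.
plans = {
    "t1 first": (estimated_cost(t1, "c", 10),
                 lambda: nested_loop_join(t1, "c", 10, t2, "d", 20)),
    "t2 first": (estimated_cost(t2, "d", 20),
                 lambda: nested_loop_join(t2, "d", 20, t1, "c", 10)),
}
best = min(plans, key=lambda name: plans[name][0])
print(best, plans[best][0], plans[best][1]())  # t2 first 4 [3, 7]
```

Both plans return [3, 7]; only the amount of work differs (an estimated 1000 units versus 4), which is exactly what the optimizer weighs.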

If you still have questions, such as how the optimizer chooses an index or whether it can choose the wrong one, don't worry; I will cover the optimizer separately in a later article.

Executor

MySQL learns what you want to do from the analyzer and how to do it from the optimizer, so it now enters the executor stage and starts executing the statement.

Before execution starts, MySQL first checks whether you have permission to run the query on table T. If you don't, it returns a permission error, as shown below. (In the actual implementation, if the query cache is hit, the permission check is performed when the cached result is returned. A query also calls precheck to verify permissions before the optimizer stage.)


mysql> select * from T where ID=10;

ERROR 1142 (42000): SELECT command denied to user 'b'@'localhost' for table 'T'
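Here is a minimal sketch of the table-level check the executor performs before reading any data. GRANTS, check_table_privilege, and execute_select are hypothetical. Only the shape of the error message is taken from the example above.

```python
# Hypothetical table-level grants: (user, host) -> {table: {privileges}}.
GRANTS = {("a", "localhost"): {"T": {"SELECT"}}, ("b", "localhost"): {}}


class CommandDenied(Exception):
    pass


def check_table_privilege(user: str, host: str, command: str, table: str) -> None:
    granted = GRANTS.get((user, host), {}).get(table, set())
    if command not in granted:
        # Mirrors the shape of MySQL's ERROR 1142 message.
        raise CommandDenied(
            f"ERROR 1142 (42000): {command} command denied to user "
            f"'{user}'@'{host}' for table '{table}'"
        )


def execute_select(user: str, host: str, table: str, run):
    check_table_privilege(user, host, "SELECT", table)  # before touching data
    return run()


try:
    execute_select("b", "localhost", "T", lambda: [])
except CommandDenied as e:
    print(e)
```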

If you do have permission, the table is opened and execution continues. When opening the table, the executor uses the interface provided by the engine that the table was defined with.

 

For example, the ID field of table T in our example has no index, so the executor proceeds as follows:

  • Call the InnoDB engine interface to fetch the first row of the table and check whether its ID is 10; if not, skip it, and if so, add the row to the result set.
  • Call the engine interface to fetch the "next row" and repeat the same check, until the last row of the table has been read.
  • The executor returns the set of all rows that met the condition during the scan to the client as the result set. At this point, the statement has finished executing.
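The three steps above map onto a simple loop. In this Python sketch, ToyEngine and its read_first_row/read_next_row methods are hypothetical stand-ins for the engine interface. The point is that the engine only hands back rows, while the executor applies the ID=10 condition.

```python
class ToyEngine:
    """Hypothetical row-at-a-time engine interface (not InnoDB's real API)."""

    def __init__(self, rows):
        self._rows = rows
        self._cursor = 0

    def read_first_row(self):
        self._cursor = 0
        return self.read_next_row()

    def read_next_row(self):
        if self._cursor >= len(self._rows):
            return None  # end of table
        row = self._rows[self._cursor]
        self._cursor += 1
        return row


def execute_full_scan(engine, id_value):
    result = []
    row = engine.read_first_row()      # step 1: fetch the first row
    while row is not None:
        if row["ID"] == id_value:      # the executor applies the WHERE check
            result.append(row)
        row = engine.read_next_row()   # step 2: keep fetching the "next row"
    return result                      # step 3: result set goes to the client


engine = ToyEngine([{"ID": i} for i in (3, 10, 7, 10)])
print(execute_full_scan(engine, 10))  # [{'ID': 10}, {'ID': 10}]
```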

For a table with an index, the logic is similar: the first call is to the "fetch the first row that meets the condition" interface, and then the executor repeatedly calls the "fetch the next row that meets the condition" interface. These interfaces are all defined by the engine.

In the database's slow query log you will see a rows_examined field, which indicates how many rows were scanned while the statement executed. This value is incremented each time the executor calls the engine to fetch a row. In some scenarios, a single executor call makes the engine scan multiple rows internally, so the number of rows the engine scans is not exactly the same as rows_examined.
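The index case and the rows_examined counter can be sketched the same way. IndexedToyEngine, index_first, and index_next are hypothetical names. Skipping rows flagged "deleted" inside the engine is just one assumed example of the engine scanning more rows than it returns.

```python
import bisect


class IndexedToyEngine:
    """Toy engine with a sorted index on ID (hypothetical interface)."""

    def __init__(self, rows):
        # Keep rows sorted by ID so the "index" can binary-search them.
        self._rows = sorted(rows, key=lambda r: r["ID"])
        self._keys = [r["ID"] for r in self._rows]
        self._pos = 0
        self._key = None
        self.engine_rows_scanned = 0

    def _next_visible(self):
        # Rows flagged "deleted" are read and skipped inside the engine,
        # so the executor never sees them.
        while self._pos < len(self._rows) and self._keys[self._pos] == self._key:
            row = self._rows[self._pos]
            self._pos += 1
            self.engine_rows_scanned += 1
            if not row.get("deleted"):
                return row
        return None

    def index_first(self, key):
        self._key = key
        self._pos = bisect.bisect_left(self._keys, key)
        return self._next_visible()

    def index_next(self):
        return self._next_visible()


def execute_index_lookup(engine, key):
    rows_examined = 0
    result = []
    row = engine.index_first(key)   # "first row that meets the condition"
    while row is not None:
        rows_examined += 1          # one count per row handed to the executor
        result.append(row)
        row = engine.index_next()   # "next row that meets the condition"
    return result, rows_examined


engine = IndexedToyEngine([
    {"ID": 5}, {"ID": 10}, {"ID": 10, "deleted": True}, {"ID": 10}, {"ID": 12},
])
rows, examined = execute_index_lookup(engine, 10)
print(len(rows), examined, engine.engine_rows_scanned)  # 2 2 3
```

Here rows_examined is 2 while the engine actually touched 3 rows, which is the kind of mismatch described above.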

Origin blog.csdn.net/m0_46405589/article/details/115185058