An in-depth look at how MySQL executes a SQL statement

Preface

  • We deal with databases every day and may write dozens of SQL statements, but do you know how the system actually interacts with the database? How does MySQL store data, and how does it manage transactions? Or, beyond writing the occasional "select * from dual", does it all feel like a blank?
  • This article will take you into the world of MySQL, so that you thoroughly understand how a system interacts with MySQL and what MySQL does when it receives a SQL statement.

SQL execution process

One, MySQL driver
  • When a system communicates with the MySQL database, requests are not sent and received out of thin air; even when you do nothing yourself, something is working on your behalf. Anyone who has used MySQL will know, more or less, the concept of the MySQL driver. It is the driver that establishes the connection to the database at the bottom layer; only after the connection is established can any further interaction happen. As shown below:

[Figure: the application connects to MySQL through the MySQL driver]

  • In other words, before the system interacts with MySQL, the MySQL driver establishes the connection for us; after that, we only need to send SQL statements to perform CRUD operations.
  • If one SQL request establishes one connection, then multiple requests establish multiple connections. Here is the problem: our system is certainly not used by only one person, which means multiple requests will compete for connections at the same time.
  • A web system is typically deployed in a Tomcat container, and Tomcat processes requests concurrently, so multiple requests will establish multiple connections and then close them after use. What is wrong with that? As shown below:

[Figure: multiple Tomcat threads each creating and closing their own connection to MySQL]

  • A Java system connects to MySQL through the MySQL driver over TCP/IP, so if every request creates a new connection and then destroys it, the result is unnecessary waste and degraded performance. Frequently creating and destroying connections under concurrent, multi-threaded requests is clearly unreasonable and will significantly reduce system performance. But what if we maintain a fixed set of connections instead, so that nothing needs to be repeatedly created and destroyed? Knowledgeable readers will smile knowingly: yes, this is the database connection pool.
  • A database connection pool maintains a certain number of connections so that the system can obtain one conveniently: take a connection from the pool when you need it and put it back when you are done. You do not need to care about creating or destroying connections, nor about how the pool maintains them.

[Figure: threads borrowing connections from a database connection pool and returning them after use]

  • Common database connection pools include Druid, C3P0, and DBCP. The implementation principles of connection pools will not be discussed in depth here. Using a connection pool greatly reduces the overhead of creating and destroying connections. This is the well-known "pooling" idea, which you will also see in thread pools and HTTP connection pools.
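The pooling idea above can be sketched in a few lines. This is a toy illustration only, not how Druid, C3P0, or DBCP are implemented; the `Connection` class is a hypothetical stand-in for a real driver connection:

```python
import queue

class Connection:
    """Hypothetical stand-in for a real database connection object."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    """Keep a fixed number of connections; requests borrow and return
    them instead of creating and destroying one connection each."""
    def __init__(self, size):
        self._pool = queue.Queue(maxsize=size)
        for i in range(size):
            self._pool.put(Connection(i))  # created once, up front

    def acquire(self, timeout=None):
        # Blocks if every connection is currently in use by another thread.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Hand the connection back to the pool; it is never closed here.
        self._pool.put(conn)

pool = ConnectionPool(size=3)
conn = pool.acquire()
# ... send SQL over conn ...
pool.release(conn)
```

Because `queue.Queue` is thread-safe, many Tomcat-style worker threads could share this pool without extra locking.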
Two, the database connection pool
  • So far, we know that when our system accesses the MySQL database, connections are not created per request but obtained from the database connection pool, which solves the performance loss caused by repeatedly creating and destroying connections. But there is a small question here: the business system is concurrent, so does MySQL accept all those requests with only one thread?
  • In fact, MySQL's architecture provides such a pool as well, also a database connection pool. Both sides manage their connections through a connection pool, so that threads do not have to compete for connections and, more importantly, connections do not need to be repeatedly created and destroyed.

[Figure: the application-side connection pool communicating with MySQL's own connection pool]

  • So far, the connection between the system and the MySQL database has been explained. So how are these connections handled inside MySQL, and by whom?
Three, network connections are handled by threads
  • Anyone with a basic understanding of computer fundamentals knows that network connections are handled by threads. A network connection is, simply put, a request, and each request is processed by a corresponding thread. In other words, SQL statement requests are handled by individual threads inside MySQL.

[Figure: each client connection handled by a dedicated thread inside MySQL]

  • How do these threads handle the requests? What exactly do they do?
Four, SQL interface

After obtaining a request, the thread handling it in MySQL extracts the SQL statement and hands it to the SQL interface for processing.

Five, query parser

For example, now there is a SQL like this:

	SELECT stuName,age,sex FROM students WHERE id=1
  • This SQL was written by us programmers; how does the machine know what you mean? This is where the parser comes in. It parses the SQL statement passed in by the SQL interface and translates it into a form MySQL can recognize. How exactly it parses does not need to be studied here; it is nothing more than a set of grammar rules of its own.

[Figure: the parser translating the SQL text into a form MySQL can process]

  • After the process in the figure above, the SQL has been parsed into something MySQL recognizes. Is the next step simply execution? In theory yes, but MySQL's power goes further: it will also choose the optimal query path for us (what is the optimal query path? It means MySQL will execute the query in what it considers the most efficient way).
  • How exactly does it do that? This is where MySQL's query optimizer comes in.
Six, MySQL query optimizer
  • We do not need to care about the optimizer's concrete implementation. What we need to know is that MySQL will optimize the SQL statement in the way it considers best and generate an execution plan. For example, if you have created multiple indexes, MySQL will choose among them on the principle of least cost. The cost here mainly has two components, IO cost and CPU cost:
    • IO cost: the cost of loading data from disk into memory. By default, reading one data page has an IO cost of 1. MySQL reads data in units of pages; when some piece of data is needed, it does not read only that piece but also its neighbors into memory (the well-known principle of locality). So MySQL reads a whole page at a time, at a cost of 1 per page, and the total IO cost is mainly determined by the number of pages read;
    • CPU cost: after the data is read into memory, there is the cost of checking whether rows match the conditions, sorting, and other CPU operations. This is obviously related to the number of rows; by default, examining one record costs 0.2;
  • The MySQL optimizer picks, among the candidate indexes, the one with the smallest "IO cost + CPU cost" for execution;
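The cost rule above can be made concrete with a toy calculation. The per-unit costs match the defaults described in the text, but the page and row counts for each candidate plan are made-up statistics, not values MySQL would report:

```python
# Toy cost model: each page read costs 1 (IO), each row examined costs 0.2 (CPU).
IO_COST_PER_PAGE = 1.0
CPU_COST_PER_ROW = 0.2

def plan_cost(pages_read, rows_examined):
    return pages_read * IO_COST_PER_PAGE + rows_examined * CPU_COST_PER_ROW

# Hypothetical statistics for three candidate access paths.
candidate_plans = {
    "full table scan":        plan_cost(pages_read=100, rows_examined=10000),
    "secondary index on age": plan_cost(pages_read=12,  rows_examined=300),
    "primary key lookup":     plan_cost(pages_read=3,   rows_examined=1),
}

# The optimizer picks the plan with the lowest "IO + CPU" cost.
best = min(candidate_plans, key=candidate_plans.get)
print(best)  # → primary key lookup
```

With these numbers the full scan costs 2100, the secondary index 72, and the primary key lookup 3.2, so the lookup wins.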

[Figure: the query optimizer choosing the lowest-cost execution plan]

  • After the optimizer has selected the optimal index and completed the other steps, it calls the storage engine interface and begins executing the SQL statement that MySQL has parsed and optimized.
Seven, storage engine

The query optimizer calls the storage engine's interface to execute the SQL, which means the actual execution of the SQL happens inside the storage engine, where data is stored in memory or on disk (the storage engine is a very important component and will be described in detail below).

Eight, the executor

The executor is a very important component: the work of all the preceding components must ultimately be carried out by the executor, which calls the storage engine interface. Following the execution plan, the executor invokes the storage engine's interface to complete the execution of the SQL, as follows:

[Figure: the executor calling the storage engine interface according to the execution plan]

Storage engine

Let's illustrate with an update statement. The SQL is as follows:

	UPDATE students SET stuName = '小强' WHERE id = 1

When the system sends such a statement to MySQL, MySQL goes through the process described above and finally calls the storage engine through the executor. When this SQL is executed, the data it touches is either in memory or on disk. Operating directly on disk would mean random IO reads and writes, whose speed is definitely unacceptable, so whenever SQL is executed, the data it needs is loaded into memory first. This memory is a very important component of InnoDB: the Buffer Pool.

One, Buffer Pool
  • The Buffer Pool is a very important in-memory structure of the InnoDB storage engine. As its name implies, it acts as a cache, somewhat like Redis, because as we all know MySQL's data is ultimately stored on disk. Without a Buffer Pool, every database request would have to search the disk, incurring IO operations, which is clearly unacceptable. With the Buffer Pool, the first query stores its result in the Buffer Pool, so that subsequent requests query the buffer pool first; only if the data is not there do we go to disk, and what is loaded is then placed into the Buffer Pool, as shown below:

[Figure: a query checking the Buffer Pool first and loading from disk on a miss]

  • According to the figure above, the execution steps of this SQL statement are roughly as follows:
    • The InnoDB storage engine looks in the buffer pool for the row with id=1;
    • Finding that it is not there, it loads the row from disk and stores it in the buffer pool;
    • An exclusive lock is placed on this record;
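The lookup order described above can be sketched as follows. This is a toy illustration only: real InnoDB caches 16KB pages (not single rows) and evicts them with an LRU list, and the `DISK` dictionary is a pretend stand-in for disk storage:

```python
# Pretend disk storage holding the original row for id=1.
DISK = {1: {"id": 1, "stuName": "旺财", "age": 18}}

class BufferPool:
    """Check memory first; on a miss, load from 'disk' and cache it."""
    def __init__(self):
        self.pages = {}

    def read(self, row_id):
        if row_id in self.pages:           # hit: served from memory
            return self.pages[row_id], "buffer pool"
        row = DISK[row_id]                 # miss: load from disk...
        self.pages[row_id] = row           # ...and keep it in the pool
        return row, "disk"

bp = BufferPool()
_, first = bp.read(1)    # first read has to go to disk
_, second = bp.read(1)   # second read is a buffer-pool hit
```

The first access pays the IO cost; every later access to the same row is answered from memory.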
Two, undo log file: records the data before modification
  • Undo, as the name implies, means making it as if nothing ever happened. The undo log records what things looked like before anything happened (the original values);
  • As we just said, when a statement is about to be updated, the data has already been loaded into the Buffer Pool. In fact, there is one more operation at this point: when the data is loaded into the Buffer Pool, a log entry is inserted into the undo log file, recording the original value of the record with id=1. What is the purpose of this?
  • The defining feature of the InnoDB storage engine is its support for transactions. If this update fails, that is, if the transaction commit fails, then all operations in the transaction must be rolled back to the state before execution; a failed transaction must not affect the original data, as shown below:
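The rollback idea can be sketched like this. It is an illustration of the principle only, not InnoDB's actual undo record format:

```python
# Before updating a row in memory, save its old value to an "undo log";
# if the transaction fails, replay the undo entries to restore the data.
buffer_pool = {1: {"id": 1, "stuName": "旺财"}}
undo_log = []

def update(row_id, field, new_value):
    # Record the OLD value first, then apply the change in memory.
    undo_log.append((row_id, field, buffer_pool[row_id][field]))
    buffer_pool[row_id][field] = new_value

def rollback():
    # Undo the changes in reverse order of application.
    while undo_log:
        row_id, field, old_value = undo_log.pop()
        buffer_pool[row_id][field] = old_value

update(1, "stuName", "小强")
assert buffer_pool[1]["stuName"] == "小强"  # dirty data in the Buffer Pool
rollback()                                  # the transaction failed
assert buffer_pool[1]["stuName"] == "旺财"  # back to the original value
```

The key ordering rule is that the old value is logged before the in-memory change is made, so a rollback is always possible.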

[Figure: the original value being written to the undo log so a failed transaction can roll back]

  • One extra note here: MySQL is itself a system, just like the business systems we usually develop in Java. It was built in its own implementation language and designs its features according to its own needs; whatever it can do today was either defined by its designers or evolved from real-world scenarios. So take it easy and treat MySQL as just another system to understand and become familiar with;
  • At this point, the data touched by our SQL statement has been loaded into the Buffer Pool, and the update begins. The update operation is actually executed in the Buffer Pool. Now a question arises: by the conventions we usually develop with, when the data in a cache is inconsistent with the data in the database, we consider the cached data to be dirty data. So has the data in the buffer pool become dirty data? Exactly: the record in the Buffer Pool is now Xiaoqiang while the record in the database is still Wangcai. How does MySQL handle this situation? Read on;
Three, redo log file: records the data after modification
  • Apart from loading the data from disk and saving the pre-operation record to the undo log file, the other operations are completed in memory, and the characteristic of data in memory is that it is lost on power failure. If the server hosting MySQL goes down at this point, all the data in the Buffer Pool will be lost. This is where the redo log file shows its magic;
  • The redo log records the value after the data is modified, and it is recorded regardless of whether the transaction has been committed. For example, the operation here is update students set stuName='小强' where id=1; this operation is recorded in the redo log buffer. What is the redo log buffer? It is simple: to improve efficiency, MySQL performs these log writes in memory first and persists them to disk at certain points in time.

[Figure: the updated value being recorded in the redo log buffer]

  • By now we should be familiar with how the MySQL executor calls the storage engine to load the data touched by a SQL statement into the buffer pool, and which logs are written along the way. The process is as follows:
    • Prepare to execute an update SQL statement;
    • MySQL (InnoDB) first looks for the data in the buffer pool (Buffer Pool); if it is not found there, it goes to disk, and once found, loads the data into the buffer pool;
    • While the data is loaded into the Buffer Pool, its original value is saved to the undo log file;
    • InnoDB performs the update operation in the Buffer Pool;
    • The updated data is recorded in the redo log buffer;
  • The steps above describe normal operation, but program design and optimization is not done only for the normal case; the boundary and extreme cases must be designed for as well;
  • If the server goes down at this moment, the data in the cache is still lost. Annoying, isn't it? Since the data keeps getting lost, can't we just save it directly to disk instead of keeping it in memory? Obviously not, because as introduced above, the whole point of operating in memory is efficiency.
  • At this point, if MySQL really does go down, it doesn't matter: MySQL will consider the transaction to have failed, so the data remains as it was before the update, and nothing is affected.
  • OK, the statement has been updated; now the updated value needs to be committed, that is, the transaction needs to be committed. As long as the transaction commits successfully, the change is permanently saved to the database. There are still some related operations before the commit completes;
  • Persisting the data in the redo log buffer means writing it to the redo log file on disk. Generally, the strategy is to flush the redo log buffer to disk immediately on commit (the specific strategies are described in detail below), as shown in the figure:

[Figure: the redo log buffer being flushed to the redo log file on disk]

  • If the database server goes down after the redo log buffer has been flushed to disk, what happens to the updated data? The data is in memory at that point, so isn't it lost? No, this time the data is not lost, because the contents of the redo log buffer have been written to disk and persisted. Even if the database goes down, the next time MySQL restarts it will restore the contents of the redo log file into the Buffer Pool (my understanding is that this resembles Redis's persistence mechanism: when Redis starts, it checks the RDB or AOF file, or both, and restores the data into memory from the persisted files);
  • So far, what has happened from the executor calling the storage engine interface?
    • Prepare to execute an update SQL statement;
    • MySQL (InnoDB) first looks for the data in the buffer pool (Buffer Pool); if it is not found there, it goes to disk, and once found, loads the data into the buffer pool;
    • While the data is loaded into the Buffer Pool, its original value is saved to the undo log file;
    • InnoDB performs the update operation in the Buffer Pool;
    • The updated data is recorded in the redo log buffer;
    • When MySQL commits the transaction, the data in the redo log buffer is written to the redo log file on disk; the flushing strategy can be set through the innodb_flush_log_at_trx_commit parameter:
      A value of 0 means do not flush to disk at commit;
      a value of 1 means flush to disk immediately at commit;
      a value of 2 means write only to the OS cache at commit;
    • When MySQL restarts, the redo log is used to restore the data into the buffer pool;
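The crash-safety property described in the steps above can be sketched as follows. This is a toy model only: real redo records describe physical page changes, not key/value pairs, and the lists here merely stand in for the log file and the Buffer Pool:

```python
# Write-ahead idea: append the change to a (pretend) durable log before
# commit, so that replaying the log after a crash rebuilds committed state.
redo_log_disk = []   # stands in for the redo log file on disk
buffer_pool = {}

def commit(row_id, new_row):
    # Flushed to "disk" at commit, like innodb_flush_log_at_trx_commit=1.
    redo_log_disk.append((row_id, new_row))
    buffer_pool[row_id] = new_row

def crash():
    buffer_pool.clear()          # everything in memory is lost

def recover():
    for row_id, row in redo_log_disk:   # replay the log on restart
        buffer_pool[row_id] = row

commit(1, {"id": 1, "stuName": "小强"})
crash()
recover()
assert buffer_pool[1]["stuName"] == "小强"  # the committed update survives
```

The disk write happens before the crash, so the update can always be replayed; memory is only ever a cache of what the log can reconstruct.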
Four, bin log file: records the entire operation
  • The redo log introduced above is a log file specific to the InnoDB storage engine, while the bin log is a MySQL-level log. What the redo log records is physical in nature, such as "what change was made to what data"; the bin log is logical in nature, more like "updated the record with id 1 in the students table". The main differences between the two are summarized as follows:
    • File size: the size of the redo log is fixed (it can also be set in the configuration; the default is generally sufficient), while the size of each bin log file can be set through the configuration parameter max_binlog_size (though modifying it is generally not recommended);
    • Implementation layer: the redo log is implemented at the InnoDB engine layer (it is specific to InnoDB), while the bin log is implemented at the MySQL layer, so all engines can use bin log logging;
    • Write mode: the redo log is written circularly; when the end is reached, writing wraps around to the beginning. The bin log is written by appending; when a file grows past the given size, subsequent logs are recorded in a new file;
    • Use scenarios: the redo log is suited to crash-safe recovery (this is in fact very similar to Redis's persistence), while the bin log is suited to master-slave replication and data recovery;
  • How is the bin log flushed to disk? Bin log flushing has its own strategy, which can be changed via sync_binlog. The default is 0, meaning writes go to the OS cache first; in other words, when a transaction commits, the data does not go directly to disk, so if the machine goes down, the bin log data in the OS cache is still lost. It is therefore recommended to set sync_binlog to 1, which writes the data directly to the disk file at commit.
  • The bin log has several formats:
    • STATEMENT: statement-based replication (SBR); every SQL statement that modifies data is recorded in the bin log:
      [Advantages]: there is no need to record the change to every row, which reduces the volume of bin logs, saves IO, and improves performance;
      [Disadvantages]: in some cases it leads to inconsistency between master and slave, for example when executing sysdate(), sleep(), and the like;
    • ROW: row-based replication (RBR); it does not record the context of each SQL statement, only which data was modified:
      [Advantages]: it avoids the cases in which stored procedures, functions, or trigger calls cannot be replicated correctly;
      [Disadvantages]: it generates a large volume of logs, especially during an alter table, when the log volume skyrockets;
    • MIXED: mixed-based replication (MBR), combining the STATEMENT and ROW modes; ordinary replication uses STATEMENT mode to record the bin log, and for operations that STATEMENT mode cannot replicate, ROW mode is used instead.
  • Since the bin log is also a log file, when does it record data? In fact, when MySQL commits a transaction, it not only writes the data in the redo log buffer to the redo log file, but also records the modified data in the bin log file, records in the redo log the name of the bin log file and the position of this modification within it, and finally writes a commit mark at the end of the redo log. This commit mark is what indicates the transaction was committed successfully.
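The divergence problem with statement-based replication can be illustrated with a toy model. Here Python's `random()` stands in for a nondeterministic SQL function like sysdate() or uuid(); the dictionaries are pretend databases, not anything MySQL actually does:

```python
import random

def run_statement(db, rng):
    """Execute the 'statement' SET col = random() against a pretend db."""
    value = rng.random()                 # the nondeterministic part
    db["col"] = value
    return value                         # the resulting row value

master, replica_sbr, replica_rbr = {}, {}, {}

# The master runs the statement and produces some row value.
row_value = run_statement(master, random.Random(1))

# STATEMENT mode: the replica RE-EXECUTES the statement, and its own
# call to random() produces a different value.
run_statement(replica_sbr, random.Random(2))

# ROW mode: the replica applies the recorded row VALUE directly.
replica_rbr["col"] = row_value

assert replica_rbr["col"] == master["col"]   # row-based stays consistent
assert replica_sbr["col"] != master["col"]   # statement-based diverged
```

This is exactly why ROW (or MIXED falling back to ROW) is needed for statements whose results depend on when and where they run.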

[Figure: commit writing the bin log and the commit mark in the redo log]

  • If the database goes down just after the data has been written to the bin log file, will the data be lost?
  • The first thing to be sure of is: if there is no commit mark at the end of the redo log, the transaction is considered to have failed. But the data is not lost, because it has already been recorded in the redo log file on disk. When MySQL restarts, the data in the redo log is restored (loaded) into the Buffer Pool.
  • Well, by this point we have basically walked through an entire update operation, but do you feel something is still missing? Notice that the updated record so far exists only in memory; even after a crash and recovery, the updated record is merely loaded back into the Buffer Pool. The record in the MySQL database files is still the old value, which means the data in memory is still, from our point of view, dirty data. What happens now?
  • In fact, MySQL has a background thread that, at certain points in time, flushes the dirty data in the Buffer Pool to the MySQL database files, so that memory and database become consistent.

[Figure: a background thread flushing dirty pages from the Buffer Pool to disk]

Summary of this article

  • On the concepts of the Buffer Pool, redo log buffer, undo log, redo log, and bin log:
    • The Buffer Pool is a very important component of MySQL: all insert, delete, and update operations against the database are performed in the Buffer Pool;
    • The undo log records what the data looked like before the operation;
    • The redo log records what the data looks like after the operation (the redo log is specific to the InnoDB storage engine);
    • The bin log records the entire operation (this is very important for master-slave replication);
  • The process from preparing to update a row to committing the transaction:
    • First, the executor queries the data according to the MySQL execution plan: it queries the buffer pool first, and if the data is not there, queries the database files and puts what it finds into the buffer pool;
    • While the data is cached in the buffer pool, its original value is written to the undo log file;
    • The update is completed in the Buffer Pool, and the updated data is added to the redo log buffer at the same time;
    • Once that is complete, the transaction can be committed, and the following three things are done on commit:
    • (First) flush the data in the redo log buffer to the redo log file;
    • (Second) write this operation record to the bin log file;
    • (Third) record in the redo log the bin log file name and the position of the updated content in the bin log, and append a commit mark at the end of the redo log.
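The commit sequence summarized above can be sketched end to end. This is a toy model under strong simplifying assumptions: the logs are plain Python lists, values are strings rather than pages, and the recovery rule is reduced to "replay only transactions whose redo entries end with a commit mark":

```python
def execute_update(logs, txn_id, old, new):
    logs["undo"].append((txn_id, old))          # old value, for rollback
    logs["redo_buffer"].append((txn_id, new))   # new value, in memory

def commit(logs, txn_id):
    logs["redo_file"].extend(logs["redo_buffer"])   # 1. flush redo buffer to disk
    logs["redo_buffer"].clear()
    logs["binlog"].append((txn_id, "UPDATE ..."))   # 2. write the bin log
    logs["redo_file"].append((txn_id, "COMMIT"))    # 3. commit mark in redo log

def recovered_value(logs, txn_id):
    """On restart: no commit mark means the transaction is treated as failed."""
    if (txn_id, "COMMIT") not in logs["redo_file"]:
        return None
    return [v for t, v in logs["redo_file"] if t == txn_id and v != "COMMIT"][-1]

logs = {"undo": [], "redo_buffer": [], "redo_file": [], "binlog": []}
execute_update(logs, txn_id=7, old="旺财", new="小强")
commit(logs, txn_id=7)
assert recovered_value(logs, txn_id=7) == "小强"
```

Writing the commit mark last is what keeps the redo log and the bin log consistent with each other: a crash before the mark discards the transaction on both sides.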

Origin blog.csdn.net/Forever_wj/article/details/113098280