MySQL - How a SQL statement is executed (detailed explanation of SQL execution)

Preface

We deal with databases every day and may write dozens of SQL statements daily, but do you know how our systems actually interact with the database? How does MySQL store our data, and how does it manage our transactions? Or is your mind basically blank beyond writing the occasional "select * from dual"? This article takes you inside MySQL, so you can thoroughly understand how a system interacts with MySQL and what MySQL does when it receives the SQL statements we send.

MySQL driver

When our system communicates with the MySQL database, requests are not sent and received out of thin air; something has to do that work for us. Most programmers who have used MySQL are at least vaguely aware of the MySQL driver: it is the driver that establishes the underlying connection to the database, and only once that connection exists can any further interaction take place. See the picture below.

So before the system talks to MySQL, the MySQL driver establishes a connection for us, and after that we only need to send SQL statements to perform CRUD operations. One SQL request establishes one connection, and multiple requests establish multiple connections. The problem is that our system is never used by just one person; there will always be multiple requests competing for connections at the same time. Our web systems are typically deployed in a Tomcat container, and Tomcat processes requests concurrently, so multiple requests will open multiple connections and then close them after use. What problem does this cause? As shown below.

When a Java system connects to the MySQL database through the MySQL driver, it does so over the TCP/IP protocol. If every request creates a new connection and then destroys it, that inevitably wastes resources and degrades performance; creating and destroying connections this frequently under multi-threaded load is clearly unreasonable and will greatly reduce system performance. But what if we kept a fixed set of connections ready, so nothing needed to be repeatedly created and destroyed? Knowledgeable readers will smile knowingly: yes, this is the database connection pool.

Database connection pool: maintains a certain number of connections so the system can obtain one easily. Take a connection from the pool when you need it and put it back when you are done. We no longer have to care about creating and destroying connections, or about how they are maintained; the connection pool handles all of that.

Common database connection pools include Druid, C3P0, and DBCP. We will not go deep into connection pool implementations here; the point is that using a pool greatly reduces the overhead of continuously creating and destroying connections. This is the famous "pooling" idea, and you will see it everywhere, from thread pools to HTTP connection pools.
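The pooling idea can be sketched in a few lines of Python (a toy illustration, not how Druid, C3P0, or DBCP are actually implemented; the `Connection` class here is a stand-in for a real driver connection):

```python
import queue

class Connection:
    """Stand-in for a real driver connection (hypothetical)."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    """A minimal fixed-size pool: connections are created once up front,
    borrowed for a request, and returned afterwards instead of being closed."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(Connection(i))   # created once, never destroyed per request

    def acquire(self):
        return self._pool.get()             # blocks if all connections are in use

    def release(self, conn):
        self._pool.put(conn)                # returned to the pool, not closed

pool = ConnectionPool(size=3)
conn = pool.acquire()
# ... execute SQL with conn ...
pool.release(conn)
```

The key point is that `acquire`/`release` replace create/destroy: the expensive TCP handshake happens only `size` times, no matter how many requests come in.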

Database connection pool

At this point we know that when our system accesses the MySQL database, connections are not created for every request but obtained from the database connection pool, which eliminates the performance loss of repeatedly creating and destroying connections. But there is still a small question: the business system is concurrent, so does MySQL accept all those requests with only a single thread?

In fact, MySQL's architecture provides such a pool as well: it too maintains a database connection pool. Both sides manage their connections through a pool, so threads no longer compete for connections and, more importantly, connections no longer need to be repeatedly created and destroyed.

At this point, the connection between the system and the MySQL database has been explained clearly. So how are these connections handled inside MySQL, and by whom?

Network connections must be handled by threads

Anyone with a little background in computer fundamentals knows that network connections are handled by threads. A network connection is, simply put, a request, and each request is processed by a corresponding thread. In other words, SQL statement requests are processed by threads inside MySQL.

How will these threads handle these requests? What will be done?

SQL interface

After receiving a request, the MySQL thread that handles it extracts the SQL statement and hands it to the SQL interface for processing.

Query parser

Suppose we have a SQL statement like this:

SELECT stuName,age,sex FROM students WHERE id=1


But this SQL is written for humans to read; how does the machine know what you mean? This is where the parser comes into play. It parses the SQL statement passed in by the SQL interface and translates it into a form MySQL can understand. For details, see MySQL - How SQL is parsed in MySQL.
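As a toy illustration of this step (nothing like MySQL's real parser, which builds a full syntax tree from a grammar), parsing begins by breaking the statement into tokens before any meaning is assigned:

```python
import re

def tokenize(sql):
    """Split a SQL string into keywords, identifiers, numbers, and '='.
    Commas and whitespace are simply skipped in this toy version."""
    return re.findall(r"[A-Za-z_]\w*|\d+|=", sql)

tokens = tokenize("SELECT stuName,age,sex FROM students WHERE id=1")
print(tokens)
# ['SELECT', 'stuName', 'age', 'sex', 'FROM', 'students', 'WHERE', 'id', '=', '1']
```

A real parser then checks these tokens against the SQL grammar and produces a tree that later components (the optimizer, the executor) can work with.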

Now that the SQL has been parsed into a form MySQL understands, is the next step simply to execute it? In theory, yes, but MySQL is much more capable than that: it will also choose the optimal query path for us.

What is the optimal query path? It means MySQL executes the query in the way it considers most efficient.

How exactly is that done? This is where MySQL's query optimizer comes in.

MySQL Query Optimizer

We don't need to care how the query optimizer is implemented internally. What we need to know is that MySQL optimizes the SQL statement in what it judges to be the best way and generates candidate execution plans. For example, if you have created multiple indexes, MySQL chooses which index to use based on the principle of minimum cost, where cost mainly consists of two parts: IO cost and CPU cost.

IO cost: the cost of loading data from disk into memory. By default, the IO cost of reading one data page is 1. MySQL reads data in units of pages: when some piece of data is needed, it does not read just that piece, but also reads the data adjacent to it into memory. This is the famous principle of locality, so MySQL reads a whole page at a time, at a cost of 1 per page. The IO cost is therefore mainly determined by the number of pages that must be read.

CPU cost: after the data is read into memory, there is also the cost of CPU operations such as checking whether rows match the conditions and sorting. This is obviously related to the number of rows; by default, the cost of examining one record is 0.2.

The MySQL optimizer calculates "IO cost + CPU cost" for each candidate index and executes with the one whose total cost is smallest.
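Using the default constants just mentioned (1 per page of IO, 0.2 per examined row), the optimizer's choice can be illustrated with a rough back-of-the-envelope calculation (a deliberate simplification with made-up page and row counts; MySQL's real cost model has many more factors):

```python
IO_COST_PER_PAGE = 1.0   # default cost of reading one data page
CPU_COST_PER_ROW = 0.2   # default cost of examining one row

def plan_cost(pages_read, rows_examined):
    """Simplified 'IO cost + CPU cost' used to compare access paths."""
    return pages_read * IO_COST_PER_PAGE + rows_examined * CPU_COST_PER_ROW

# Hypothetical numbers: a full table scan touches 100 pages / 10,000 rows,
# while a secondary index narrows the search to 4 pages / 20 rows.
full_scan = plan_cost(pages_read=100, rows_examined=10_000)   # 2100.0
index_scan = plan_cost(pages_read=4, rows_examined=20)        # 8.0

best = min(("full scan", full_scan), ("index scan", index_scan),
           key=lambda plan: plan[1])
print(best)  # ('index scan', 8.0)
```

With these (invented) statistics the index wins by a wide margin, which is exactly the kind of comparison the optimizer performs before handing a plan to the executor.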

After the optimizer has selected the optimal index and completed similar steps, it calls the storage engine interface and begins executing the SQL statement that MySQL has parsed and optimized.

Storage engine

The query optimizer calls the storage engine's interface to execute the SQL, which means the actual execution of SQL happens inside the storage engine, where data is stored in memory or on disk. (The storage engine is a very important component and will be introduced in detail later.)

Executor

The executor is a very important component: the work of all the preceding components ultimately has to go through it. The executor calls the storage engine interface according to the execution plan to complete the execution of the SQL.

First introduction to storage engines

Let's illustrate with an update statement. The SQL is as follows:

UPDATE students SET stuName = 'Xiaoqiang' WHERE id = 1


When our system sends such an update to MySQL, MySQL eventually calls the storage engine through the executor, following the process introduced above (the flow chart is the one shown earlier). When this SQL executes, the data it touches is either in memory or on disk. Operating directly on disk would mean random IO reads and writes whose speed is clearly unacceptable, so each time SQL is executed, the relevant data is loaded into memory. This memory is a very important component of InnoDB: the Buffer Pool.

Buffer Pool

The Buffer Pool is a very important memory structure in the InnoDB storage engine. As the name suggests, it acts as a cache, much like Redis does, because as we all know MySQL data ultimately lives on disk. Without the Buffer Pool, every database request would have to search the disk, incurring IO operations, which is clearly unacceptable. With the Buffer Pool, the results of the first query are stored in the pool; subsequent requests query the Buffer Pool first, and only if the data is not there do we search the disk, after which the data is placed into the Buffer Pool, as shown below.

According to the picture above, the execution steps of this SQL statement are roughly as follows:

  • The InnoDB storage engine checks whether the row with id=1 exists in the buffer pool.
  • If it does not, the row is loaded from disk and placed into the buffer pool.
  • An exclusive lock is taken on this record (so others cannot modify it while you are modifying it; this article will not focus on this mechanism, which deserves a dedicated article of its own).
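The read path in those steps can be sketched as a read-through cache (a heavy simplification: the real Buffer Pool caches 16 KB pages managed by an LRU list, not individual rows, and 'Wangcai' here is just a hypothetical old value for the row):

```python
class BufferPool:
    """Toy read-through cache: check memory first, fall back to 'disk'."""
    def __init__(self, disk):
        self.disk = disk    # dict standing in for rows in the data files
        self.pages = {}     # cached rows, keyed by primary key

    def read(self, row_id):
        if row_id in self.pages:      # 1. check the buffer pool first
            return self.pages[row_id]
        row = self.disk[row_id]       # 2. miss: load the row from disk...
        self.pages[row_id] = row      # 3. ...and keep it in the pool
        return row

disk = {1: {"stuName": "Wangcai", "age": 18, "sex": "m"}}
bp = BufferPool(disk)
first = bp.read(1)      # the first read has to go to disk
second = bp.read(1)     # the second read is served from the pool
print(first is second)  # True
```

Once a row is in the pool, later reads (and, as we will see, writes) operate on the in-memory copy rather than the disk.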

undo log file: records how the data looked before it was modified

As the name suggests, undo means to revert, as if nothing had happened. The undo log records what the data originally looked like, so that changes can be undone.

We just said that when a statement is about to be updated, the relevant data has already been loaded into the Buffer Pool. In fact there is one more operation here: at the same time as the data is loaded into the Buffer Pool, a log entry is inserted into the undo log file, recording the original value of the record with id=1.

What's the purpose of this?

The biggest feature of the InnoDB storage engine is that it supports transactions. If this update fails, that is, if the transaction fails to commit, then all operations in the transaction must be rolled back to the state before execution. In other words, a failed transaction must not affect the original data, as the picture shows.
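The undo idea can be sketched like this (a toy model; real undo records are stored in undo tablespaces with a much richer format, and 'Wangcai'/'Xiaoqiang' are just example values):

```python
class MiniTxn:
    """Toy transaction: record the old value before changing it,
    so a failed commit can be rolled back."""
    def __init__(self, table):
        self.table = table
        self.undo_log = []                            # (row_id, column, old_value)

    def update(self, row_id, column, new_value):
        old = self.table[row_id][column]
        self.undo_log.append((row_id, column, old))   # write undo BEFORE changing
        self.table[row_id][column] = new_value

    def rollback(self):
        for row_id, column, old in reversed(self.undo_log):
            self.table[row_id][column] = old          # restore pre-update values

table = {1: {"stuName": "Wangcai"}}
txn = MiniTxn(table)
txn.update(1, "stuName", "Xiaoqiang")
txn.rollback()                                        # transaction failed -> revert
print(table[1]["stuName"])  # Wangcai
```

The essential ordering is that the old value reaches the undo log before the row is changed; that is what makes the rollback always possible.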

A brief aside: MySQL is itself just a system, like the Java business systems we usually develop, written in its own implementation language and designed with the functions its designers decided it needed, or that evolved from real-world requirements. So everyone should calm down and approach MySQL as a system to be understood and become familiar with, not a black box.

At this point, the data targeted by our SQL statement has been loaded into the Buffer Pool, and the update begins; the update operation is actually performed in the Buffer Pool. Now, by the conventions we usually follow in development, when the data in a cache is inconsistent with the data in the database, we consider the cached data dirty. So hasn't the data in the Buffer Pool just become dirty data? Yes, it has: the record in the Buffer Pool now says 'Xiaoqiang' while the record in the database still says 'Wangcai'. How does MySQL handle this situation? Keep reading.

redo log file: records how the data has been modified

Apart from loading the data from disk and saving the pre-operation record to the undo log file, all the other operations so far are completed in memory, and the defining characteristic of in-memory data is that it disappears on power loss. If the server hosting MySQL went down at this moment, all the data in the Buffer Pool would be lost. This is where the redo log file works its magic.

Voice-over: redo log files are unique to InnoDB. They exist at the storage engine level, not at the MySQL server level.

The redo log records the value of the data after modification, regardless of whether the transaction has committed. For example, for the operation update students set stuName='Xiaoqiang' where id=1, the change is first recorded in the redo log buffer. What, another buffer? It's very simple: to improve efficiency, MySQL performs these operations in memory first and persists them to disk at an appropriate later moment.

By now we should be familiar with how the MySQL executor calls the storage engine to load the data for a SQL statement into the buffer pool, and which logs are written along the way. The process so far:

  • Prepare to update a SQL statement.
  • MySQL (InnoDB) first looks for the data in the buffer pool (Buffer Pool); if it is not there, it reads it from disk and loads it into the buffer pool.
  • While the data is being loaded into the Buffer Pool, the original record is saved to the undo log file.
  • InnoDB performs the update operation in the Buffer Pool.
  • The updated data is recorded in the redo log buffer.

The steps above describe normal operation, but programs are designed and optimized not only for the normal case; they must also handle edge cases and extreme situations.

If the server goes down at this point, the data in the cache is still lost. Frustrating, isn't it, that the data keeps getting lost? Could we just write straight to disk instead of keeping things in memory? Obviously not, because as explained above, the whole point of operating in memory is efficiency.

And if MySQL really does go down at this point, it doesn't matter: MySQL will treat the transaction as failed, so the data remains as it was before the update, and nothing is affected.

Okay, the row has been updated in memory, so now the new value needs to be committed, that is, the transaction needs to be committed, because only a successfully committed transaction makes the changes final in the database. Before the transaction commits, there are still a few more related operations.

Persisting the redo log buffer means writing its contents to the redo log file on disk. The usual strategy is to flush the data to disk immediately on commit (the specific strategies are covered in the summary below), as shown in the picture above.

Once the redo log buffer has been flushed to disk, what happens to our updated data if the database server goes down? The updated row itself is in memory, so is it lost? No: this time the data is not lost, because the redo log buffer has been written to disk and persisted. Even if the database crashes, MySQL will, on the next restart, restore the contents of the redo log file into the Buffer Pool. (My understanding is that this resembles Redis's persistence mechanism: on startup Redis checks its RDB and/or AOF files and restores data into memory from them.)

So far, what has been accomplished by the executor calling the storage engine interface?

  • Prepare to update a SQL statement.
  • MySQL (InnoDB) first looks for the data in the buffer pool (Buffer Pool); if it is not there, it reads it from disk and loads it into the buffer pool.
  • While the data is being loaded into the Buffer Pool, the original record is saved to the undo log file.
  • InnoDB performs the update operation in the Buffer Pool.
  • The updated data is recorded in the redo log buffer.
  • When MySQL commits the transaction, it writes the data in the redo log buffer to the redo log file. The flushing behavior can be set via the innodb_flush_log_at_trx_commit parameter:
    • 0 means do not flush to disk at commit (the log buffer is flushed in the background, roughly once per second).
    • 1 means flush to disk immediately at commit.
    • 2 means write to the OS cache at commit, leaving the OS to flush to disk later.
  • When MySQL restarts, the redo log is restored into the buffer pool.
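The three innodb_flush_log_at_trx_commit settings from the list above can be sketched as follows (a simplification: with settings 0 and 2 a background thread still flushes the log roughly once per second, which this sketch omits):

```python
def on_commit(setting, redo_buffer, os_cache, disk):
    """What happens to the redo log buffer at COMMIT for each
    innodb_flush_log_at_trx_commit setting (background flushes omitted)."""
    if setting == 0:
        pass                          # stays in MySQL's log buffer for now
    elif setting == 1:
        disk.extend(redo_buffer)      # write AND fsync: durable at commit
        redo_buffer.clear()
    elif setting == 2:
        os_cache.extend(redo_buffer)  # OS cache only: survives a MySQL crash,
        redo_buffer.clear()           # but not a machine/power failure

buf, cache, disk = ["rec1"], [], []
on_commit(1, buf, cache, disk)
print(disk)  # ['rec1'] -- with setting 1 the record is durable at commit
```

The trade-off is durability versus speed: 1 is the safest, 0 is the fastest, and 2 sits in between because the OS cache survives a MySQL process crash but not a power failure.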

At this point, the story of the MySQL executor calling the storage engine interface to execute the SQL in the execution plan has covered essentially everything InnoDB does, but we are not finished yet. Next we need to introduce the MySQL server-level log file: the bin log.

bin log file: records the entire operation process

The redo log introduced above is a log file unique to the InnoDB storage engine, while the bin log belongs to the MySQL server level. What the redo log records tends to be physical in nature, such as "what data was modified and how"; the bin log is more logical, along the lines of "the record with id 1 in the students table was updated". The main differences between the two are summarized as follows:

| Property | redo log | bin log |
| --- | --- | --- |
| File size | Fixed size (configurable; the default is generally sufficient) | The size of each bin log file can be set via the max_binlog_size parameter (changing it is generally not recommended) |
| Implementation layer | Implemented by the InnoDB engine layer (unique to the InnoDB storage engine) | Implemented by the MySQL server layer; all engines can use the bin log |
| Recording method | Written circularly: when the end of the file is reached, writing wraps around to the beginning | Written by appending: when a file exceeds the given size, subsequent logs go to a new file |
| Use cases | Crash recovery (crash-safe); quite similar to Redis persistence | Master-slave replication and data recovery |

How is the bin log flushed to disk?

There are corresponding strategies for flushing the bin log, controlled by the sync_binlog parameter. The default here is 0, meaning writes go to the OS cache first; that is, on transaction commit the data is not written directly to disk, so bin log data can still be lost if the machine crashes. It is therefore recommended to set sync_binlog to 1 so data is written directly to the disk file.

The bin log has several recording formats:

  • STATEMENT

Statement-based replication (SBR): every SQL statement that modifies data is recorded in the bin log.

[Advantages]: there is no need to record the change to every row, which reduces bin log volume, saves IO, and improves performance.

[Disadvantages]: in some cases master-slave data can become inconsistent, for example when executing nondeterministic functions such as sysdate() or sleep().

  • ROW

Row-based replication (RBR): instead of recording the context of each SQL statement, only which rows were modified, and how, is recorded.

[Advantages]: avoids the situations where calls to stored procedures, functions, or triggers cannot be replicated correctly.

[Disadvantages]: a large volume of log is generated; during an alter table in particular, the log can balloon.

  • MIXED

Mixed-based replication (MBR) combines the STATEMENT and ROW modes: ordinary operations are saved to the bin log in STATEMENT mode, while operations that STATEMENT mode cannot replicate safely are saved in ROW mode.
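Why STATEMENT mode can diverge while ROW mode cannot can be sketched with a stand-in for a nondeterministic function like sysdate() (a toy model; real replication replays binlog events, not Python calls):

```python
clock = [0]

def sysdate():
    """Stand-in for a nondeterministic SQL function: returns the call time."""
    clock[0] += 1
    return clock[0]

# STATEMENT mode ships the statement text, so the slave re-evaluates
# sysdate() at a later moment and can get a different value.
master_value = sysdate()    # evaluated on the master   -> 1
slave_value = sysdate()     # re-evaluated on the slave -> 2 (inconsistent!)

# ROW mode ships the row value produced on the master, applied verbatim.
row = sysdate()             # evaluated once, on the master
slave_row = row             # the slave stores exactly this value

print(master_value == slave_value, row == slave_row)  # False True
```

This is exactly the class of statement that MIXED mode automatically falls back to ROW format for.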

Since the bin log is also a log file, when does it actually record data?

In fact, when MySQL commits a transaction, it not only writes the data in the redo log buffer to the redo log file but also records the modified data in the bin log file. It records in the redo log the name of the bin log file and the position of the change within it, and finally writes a commit mark at the end of the redo log, signifying that the transaction committed successfully.

If the database crashes just after the data is written to the bin log file, will the data be lost?

What can be determined first is that as long as the redo log does not end with a commit mark, the transaction is treated as failed. But the data is not lost, because it has been recorded in the redo log file on disk; when MySQL restarts, the data in the redo log is restored (loaded) into the Buffer Pool.
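The commit sequence just described (redo data, then bin log, then a commit mark in the redo log) can be sketched as a recovery rule: after a crash, only transactions whose commit mark made it into the redo log count as committed (a simplification of MySQL's actual recovery protocol):

```python
def commit(txn_id, redo_log, bin_log):
    """The three commit steps from the text, in order."""
    redo_log.append(("data", txn_id))    # 1. flush redo log buffer to the redo log
    bin_log.append(("ops", txn_id))      # 2. write the operation to the bin log
    redo_log.append(("commit", txn_id))  # 3. commit mark at the end of the redo log

def recovered_txns(redo_log):
    """After a crash: only transactions whose commit mark reached
    the redo log are treated as committed."""
    return {txn for kind, txn in redo_log if kind == "commit"}

redo, binlog = [], []
commit("txn-1", redo, binlog)

# Crash between steps 2 and 3 for txn-2: no commit mark was written.
redo.append(("data", "txn-2"))
binlog.append(("ops", "txn-2"))

print(recovered_txns(redo))  # {'txn-1'} -- txn-2 is treated as failed
```

The commit mark is what makes the whole sequence atomic from the recovery code's point of view: either it is there and the transaction counts, or it is not and the transaction is rolled back via the undo log.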

Okay, we have now essentially walked through an update operation, but doesn't it feel like something is still missing? Did you notice that at this point the updated record exists only in memory? Even after a crash and recovery, the updated record is merely loaded back into the Buffer Pool, while the record in the MySQL data files on disk still holds the old value. In other words, the in-memory data is still what we would call dirty. So what happens next?

In fact, MySQL runs a background thread that flushes the dirty data in the Buffer Pool to the MySQL data files at appropriate moments, bringing memory and disk back into agreement.

Summary of this article

At this point, the concepts of, and relationships between, the Buffer Pool, the redo log buffer, and the undo log, redo log, and bin log should be basically clear.

Let's review again

  • The Buffer Pool is a very important MySQL component: all additions, deletions, and modifications to the database are first performed in the Buffer Pool.
  • The undo log records what the data looked like before the operation.
  • The redo log records what the data looks like after the operation (the redo log is unique to the InnoDB storage engine).
  • The bin log records the entire operation (this is very important for master-slave replication).

The process from preparing to update a row of data to committing the transaction:

  • First, the executor queries the data according to the MySQL execution plan, looking in the Buffer Pool first; if the data is not there, it is read from the data files on disk and placed into the Buffer Pool.
  • While the data is being cached in the Buffer Pool, its original value is written to the undo log file.
  • The update is performed in the Buffer Pool, and the updated data is appended to the redo log buffer.
  • Once the transaction's work is complete, it can be committed. Committing does the following three things:
    • flush the data in the redo log buffer into the redo log file;
    • write the operation record to the bin log file;
    • record the bin log file name and the position of the update within it in the redo log, and append a commit mark at the end of the redo log.

This marks the completion of the entire update transaction.

Conclusion

So far we have explained, from a macro perspective, how a system deals with the MySQL database: what processes MySQL runs through and what work it does when an update SQL statement is submitted to it. Further details of the Buffer Pool will be covered in subsequent articles.

Article source

  • Original link: https://blog.csdn.net/weixin_41385912/article/details/112975752
