MySQL database: the execution process of SQL statements

1. The MySQL driver of the client:

Before our system can communicate with the MySQL database, it needs to establish a connection with it. This is handled for us by the underlying MySQL driver. Once the connection is established, we only need to send SQL statements to perform CRUD operations. As shown below:

One SQL request establishes one connection, and multiple requests establish multiple connections. Suppose our system is deployed in a Tomcat container: Tomcat processes multiple requests concurrently, so multiple requests will establish multiple connections and then close them after use. What is the problem with that? A Java system connects to MySQL through the MySQL driver over the TCP/IP protocol, so if every request has to create and then destroy a connection, such frequent creation and destruction of connections will inevitably hurt the performance of our system.

To solve this problem, the idea of "pooling" is adopted: a connection pool maintains a certain number of connections. When a connection is needed, it is taken directly from the pool, and returned to the pool after use. The connection pool greatly reduces the overhead of constantly creating and destroying connections, and we no longer need to care about how connections are created, destroyed, or maintained. Common database connection pools are Druid, C3P0, and DBCP.
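The pooling idea can be sketched in a few lines of Python (a toy illustration, not any real driver's implementation; `connect_fn` stands in for whatever call actually opens a MySQL connection):

```python
import queue

class ConnectionPool:
    """Minimal connection-pool sketch: connections are created once,
    borrowed per request, and returned instead of being closed."""

    def __init__(self, connect_fn, size=1):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect_fn())  # create all connections up front

    def acquire(self):
        return self._pool.get()           # borrow an existing connection

    def release(self, conn):
        self._pool.put(conn)              # return it instead of destroying it

# Each request borrows and returns the same connection object,
# so there is no per-request create/destroy cost.
pool = ConnectionPool(connect_fn=lambda: object(), size=1)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
assert c2 is c1  # the connection was reused, not recreated
```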

 

2. Server layer of the MySQL architecture:

Before introducing the steps a SQL statement goes through on the MySQL Server, let's first look at the overall structure of MySQL:

If the picture above is not clear, the following one shows the same structure:

As the architecture diagram shows, the Server layer is mainly composed of the connector, the query cache, the parser/analyzer, the optimizer, and the executor. Each of these parts is described below.

1. Connector:

When a client wants to operate on the database, it must first establish a connection with it; the connector is responsible for establishing connections with clients, checking permissions, and maintaining and managing connections.

(1) Connection method:

MySQL supports both short and long connections. A short connection is closed immediately after the operation completes. A long connection is kept open, reducing the cost of creating and releasing connections on the server, and can be reused by subsequent requests.

(2) Connection pool:

As on the client side, to avoid the unnecessary performance loss caused by frequently creating and destroying connections, the idea of "pooling" is applied here too: connections are managed through a database connection pool, which generally holds long connections, for example with Druid, C3P0, DBCP, etc.

2. Query cache:

The query cache is disabled by default, which suggests that its use is not recommended; in fact, the query cache feature was removed entirely in MySQL 8.0.

(1) Why doesn't MySQL enable the query cache by default?

Mainly due to its usage scenarios:

① First, consider the storage format of the cache: key (the SQL statement text) - value (the result set). If the SQL statement (the key) differs even slightly, the cache is missed and the database is queried directly;

② The data in a table is not static; much of it changes frequently, and whenever the data in a table changes, all cached entries related to that table must be invalidated;
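The two points above can be sketched with a toy key-value cache (a hypothetical class for illustration; MySQL's real query cache keys on the raw statement text and tracks affected tables in a similar spirit):

```python
class QueryCache:
    """Sketch of the Server-layer query cache: key = exact SQL text,
    value = result set; any write to a table drops every cached entry
    that references that table."""

    def __init__(self):
        self._cache = {}  # sql text -> (set of tables, result)

    def get(self, sql):
        entry = self._cache.get(sql)
        return entry[1] if entry else None

    def put(self, sql, tables, result):
        self._cache[sql] = (set(tables), result)

    def invalidate(self, table):
        # A write to `table` removes every cached query that touches it.
        self._cache = {sql: e for sql, e in self._cache.items()
                       if table not in e[0]}

qc = QueryCache()
qc.put("SELECT * FROM t_user WHERE id = 1", ["t_user"], [(1, "Tom")])
# a one-character difference in the SQL text is already a cache miss
assert qc.get("SELECT * FROM t_user WHERE id = 2") is None
assert qc.get("SELECT * FROM t_user WHERE id = 1") == [(1, "Tom")]
# any update to t_user invalidates every cached query on it
qc.invalidate("t_user")
assert qc.get("SELECT * FROM t_user WHERE id = 1") is None
```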

3. Analyzer/parser:

The job of the analyzer is to parse the SQL statement to be executed and produce an abstract syntax tree; a preprocessor then checks whether the tables in the abstract syntax tree exist and, if so, whether the projected columns of the SELECT exist in those tables, and so on.

(1) Lexical analysis:

Lexical analysis disassembles the SQL into indivisible atomic symbols, called Tokens, which are classified as keywords, expressions, literals, or operators according to the dictionary of the database dialect.

(2) Syntax analysis:

Syntax analysis converts the SQL statement into an abstract syntax tree based on the Tokens produced by lexical analysis.

The following example illustrates what a SQL abstract syntax tree looks like:

SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18

The abstract syntax tree obtained after lexical and syntax analysis of the above SQL statement is as follows:

Note that, for ease of understanding, keyword Tokens in the abstract syntax tree are shown in green, variable Tokens in red, and gray indicates nodes that need to be split further.
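As a rough illustration of the lexical-analysis step, here is a minimal Python tokenizer for the example statement (a toy sketch, not MySQL's actual lexer; the token classes and the regex are simplified assumptions):

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}

# one alternative per token class: string literal, number, operator, comma, word
TOKEN_RE = re.compile(r"'[^']*'|\d+|[><=]|,|\w+")

def tokenize(sql):
    """Lexical analysis sketch: split SQL into atomic Tokens and tag
    each one as a keyword, operator, literal, or identifier."""
    tokens = []
    for tok in TOKEN_RE.findall(sql):
        if tok.upper() in KEYWORDS:
            tokens.append(("keyword", tok))
        elif tok in {">", "<", "=", ","}:
            tokens.append(("operator", tok))
        elif tok.startswith("'") or tok.isdigit():
            tokens.append(("literal", tok))
        else:
            tokens.append(("identifier", tok))
    return tokens

toks = tokenize("SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18")
assert ("keyword", "SELECT") in toks
assert ("identifier", "t_user") in toks
assert ("literal", "'ACTIVE'") in toks
assert ("operator", ">") in toks
```

The syntax-analysis stage would then arrange these Tokens into the tree shown above.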

(3) Preprocessor:

The preprocessor performs semantic verification on the generated abstract syntax tree: it verifies the queried tables and the projected columns of the SELECT, checking whether those tables and fields actually exist;

4. Optimizer:

The optimizer takes the syntax tree produced by lexical/syntax analysis and, through a series of calculations based on the MySQL data dictionary and statistics, finally produces an execution plan, including the choice of which index to use.

What does this series of calculations consist of?

(1) Logical transformation: for example, if the SQL WHERE condition contains 8 > 9, logical transformation simplifies this constant expression in the syntax tree directly to false; besides such simplification, there is also constant-expression evaluation, and so on.

(2) Cost-based optimization: using statistics about the data, the optimizer estimates costs to decide whether the SQL can use an index and, if so, which index; in multi-table queries it also determines the final join order, etc.;

Whether to use an index is decided through dynamic data sampling and statistical analysis, and any statistical analysis can be wrong. So when a SQL statement unexpectedly fails to use an index, this is one of the factors to consider.
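The logical transformation described in (1) can be sketched as constant folding (a toy illustration under simplified assumptions, not the optimizer's real rewrite engine):

```python
def fold_condition(left, op, right):
    """Logical-transformation sketch: if both operands of a comparison
    are constants, evaluate it at optimize time instead of per row."""
    if isinstance(left, int) and isinstance(right, int):
        return {">": left > right, "<": left < right, "=": left == right}[op]
    return None  # depends on row data -> leave it for the executor

# WHERE 8 > 9 folds to a constant false: the whole branch can be pruned
assert fold_condition(8, ">", 9) is False
# WHERE age > 18 cannot be folded; "age" is a column, not a constant
assert fold_condition("age", ">", 18) is None
```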

How do you check the MySQL execution plan? Just add the EXPLAIN keyword in front of the SQL statement;

5. Executor:

Through the analyzer MySQL knows what you want to do, and through the optimizer it knows how to do it, so it enters the executor stage and starts executing the statement. Following the execution plan, the executor calls the API provided by the storage engine to operate on the data and complete the SQL execution.

Before execution starts, the executor first checks whether the connected user has permission to operate on this table. If not, a permission error is returned; if so, execution proceeds according to the generated execution plan.

 

3. InnoDB storage engine:

The storage engine is the component that performs the actual operations on the underlying physical data and provides the Server layer with APIs for operating on data. Data is stored in memory or on disk. MySQL supports pluggable storage engines, including InnoDB, MyISAM, Memory, etc.; by default, MySQL uses InnoDB. As shown in the figure below, the InnoDB storage engine is divided overall into memory structures (Memory Structures) and disk structures (Disk Structures).

1. Buffer Pool:

The Buffer Pool is a very important memory structure in the InnoDB storage engine. Like Redis, it acts as a cache. MySQL data ultimately lives on disk; without a Buffer Pool, every database request would have to search the disk, inevitably incurring IO operations. With the Buffer Pool, the first query stores its result in the Buffer Pool, so subsequent requests look in the buffer pool first; only if the data is not there does InnoDB read it from disk and then place it in the Buffer Pool, as shown below.


UPDATE students SET stuName = 'Xiaoqiang' WHERE id = 1

For example, for this SQL statement, according to the picture above, the execution steps are roughly as follows:

  • (1) The InnoDB storage engine first searches the Buffer Pool for the record with id = 1
  • (2) If it is not in the cache, InnoDB loads it from disk and stores it in the Buffer Pool
  • (3) An exclusive lock is placed on the record
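The lookup path in steps (1)-(2) can be sketched as follows (a toy cache in which a plain dict stands in for the data files on disk; not InnoDB's actual page management):

```python
class BufferPool:
    """Buffer Pool sketch: an in-memory cache in front of 'disk'.
    Reads hit memory first and fall back to disk only on a miss."""

    def __init__(self, disk):
        self.disk = disk      # stands in for the data files on disk
        self.pages = {}       # row id -> row, the in-memory cache
        self.disk_reads = 0   # count the IO operations

    def read(self, row_id):
        if row_id not in self.pages:            # miss: one IO, then cache it
            self.disk_reads += 1
            self.pages[row_id] = self.disk[row_id]
        return self.pages[row_id]               # hit: no IO at all

disk = {1: {"stuName": "Wangcai"}}
bp = BufferPool(disk)
bp.read(1)
bp.read(1)
assert bp.disk_reads == 1  # the second read is served from the Buffer Pool
```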

Remarks:

The difference between buffer pool and query cache:

(1) Query cache: the query cache is located in the Server layer. MySQL Server first checks the query cache to see whether this SQL has been executed before; previously executed queries have their results stored in the query cache as key-value pairs, where the key is the SQL statement and the value is the query result. This process is what we call query caching!

(2) Buffer Pool: the Buffer Pool is located in the storage engine layer. It is a buffering mechanism designed by the MySQL storage engine to speed up data reads.

2. Undo log file: records what the data looked like before it was modified

The biggest feature of the InnoDB storage engine is its support for transactions. If a transaction fails to commit, all operations in the transaction must be rolled back to the state before execution, and this rollback is done using the undo log file.

Undo, as the name implies, means making it as if nothing had happened. The undo log records what things originally looked like, so that they can be put back.

As introduced above, when a SQL statement is about to perform an update, the data it touches is first loaded into the Buffer Pool. In fact, at the same time as the data is loaded into the Buffer Pool, a log entry is inserted into the undo log file recording the original value of the record with id = 1, so that the change can be rolled back if the transaction fails.

At this point, the data for our SQL statement has been loaded into the Buffer Pool, and the update begins. The update is actually performed in the Buffer Pool. The problem is that once the data has been updated, the data in the Buffer Pool is inconsistent with the data in the database; in other words, the data in the Buffer Pool has become dirty data. That's right: the record in the Buffer Pool says "Xiaoqiang" while the record in the database still says "Wangcai". How does MySQL handle this situation? Read on.

3. Redo log file: records what the data looks like after it has been modified

Preface: the redo log file is unique to InnoDB. It lives at the storage engine level, not at the MySQL Server level.

Apart from loading pages from disk and saving the pre-update records to the undo log file, everything so far happens in memory, and data in memory is lost on power failure. If the server running MySQL went down at this point, all the data in the Buffer Pool would be lost. This is where the redo log file works its magic.

Redo, as the name implies, means something that is going to be done (again). The redo log records the operations to be performed, for example the update students set stuName = 'Xiaoqiang' where id = 1 above; this operation is first recorded in the redo log buffer, an in-memory buffer that MySQL uses for efficiency, so these records are first accumulated in memory.

But if the server goes down at this point, the data in this buffer is still lost. Could we write it straight to disk instead of keeping it in memory? Obviously not, because as explained above, the whole point of working in memory is efficiency. And if MySQL really does go down here, it does not matter: MySQL will treat the transaction as failed, so the data stays the same as before the update, with no impact.

Now the SQL statement has been applied, and the updated value needs to be committed, i.e. the transaction needs to be committed. As long as the transaction commits successfully, the change is durable. Committing persists the redo log buffer to disk, that is, it writes the data in the redo log buffer to the redo log file on disk.

What if the database server goes down after the redo log buffer has been flushed to disk? The updated data itself is only in memory, so isn't it lost? No, this time the data is not lost, because the contents of the redo log buffer have already been written to disk and persisted. Even though the database went down, the next time MySQL restarts it restores the contents of the redo log file into the Buffer Pool.

  • (1) A SQL update statement is about to be executed
  • (2) MySQL (InnoDB) first looks for the data in the Buffer Pool; if it is not found, it reads it from disk and loads it into the Buffer Pool
  • (3) While loading it into the Buffer Pool, the original value of the record is saved to the undo log file
  • (4) InnoDB performs the update in the Buffer Pool
  • (5) The updated data is recorded in the redo log buffer
  • (6) When MySQL commits the transaction, it writes the data in the redo log buffer to the redo log file. Flushing behavior is controlled by the innodb_flush_log_at_trx_commit parameter: 0 means the log is written and flushed to disk about once per second rather than at commit, 1 means it is flushed to disk at every commit, and 2 means it is written to the OS cache at commit. Under normal circumstances, it is flushed to disk immediately (value 1)
  • (7) When MySQL restarts, the redo log is used to restore data into the Buffer Pool
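The seven steps above can be sketched end to end (a toy model in which plain Python dicts and lists stand in for pages and log files; the MiniInnoDB class and its methods are invented for illustration):

```python
class MiniInnoDB:
    """Sketch of the update flow: the undo log keeps the old value,
    the change happens in the Buffer Pool, and the redo record sits in
    the redo log buffer until commit flushes it to the redo log file."""

    def __init__(self, disk):
        self.disk = disk                # data files on disk
        self.buffer_pool = {}           # in-memory pages
        self.undo_log = []              # old values, for rollback
        self.redo_log_buffer = []       # in-memory redo records
        self.redo_log_file = []         # on-disk redo log

    def update(self, row_id, column, value):
        if row_id not in self.buffer_pool:                    # steps (1)-(2)
            self.buffer_pool[row_id] = dict(self.disk[row_id])
        self.undo_log.append(                                  # step (3)
            (row_id, column, self.buffer_pool[row_id][column]))
        self.buffer_pool[row_id][column] = value               # step (4)
        self.redo_log_buffer.append((row_id, column, value))   # step (5)

    def commit(self):
        self.redo_log_file += self.redo_log_buffer             # step (6)
        self.redo_log_buffer = []

    def crash_and_recover(self):
        self.buffer_pool = {}                                  # memory is lost
        for row_id, column, value in self.redo_log_file:       # step (7)
            self.buffer_pool.setdefault(row_id, dict(self.disk[row_id]))
            self.buffer_pool[row_id][column] = value

db = MiniInnoDB({1: {"stuName": "Wangcai"}})
db.update(1, "stuName", "Xiaoqiang")
db.commit()
db.crash_and_recover()
assert db.buffer_pool[1]["stuName"] == "Xiaoqiang"  # survives the crash
assert db.undo_log == [(1, "stuName", "Wangcai")]   # old value kept for rollback
```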

4. Bin log file: records the entire operation process

Preface: the bin log is somewhat similar to the redo log; the main differences between the two are:

(1) The redo log is a log file unique to the InnoDB storage engine, while the bin log is a MySQL Server-level log

(2) The redo log is suited to crash recovery, while the bin log is suited to master-slave replication and data recovery

The redo log records things that are physical in nature, such as "what change was made to what data". The bin log is more logical in nature, along the lines of "an update was applied to the record with id 1 in the students table".

How is the bin log flushed to disk? The flushing strategy of the bin log can be changed via sync_binlog. The default value is 0, which means the log is first written to the OS cache; that is, when a transaction commits, the data is not written directly to disk, so if the machine goes down the bin log data can still be lost. It is therefore recommended to set sync_binlog to 1, which writes the data directly to the disk file at commit.

Since the bin log is also a log file, when does it record data? In fact, when MySQL commits a transaction, it not only writes the redo log buffer to the redo log file, but also records the modification in the bin log file, records the name of the bin log file and the position of the modification within it in the redo log, and finally writes a commit mark at the end of the redo log, which indicates that the transaction committed successfully.

If the database goes down right after the data has been written to the bin log file, will the data be lost?

The first thing to be sure of is that as long as there is no commit mark at the end of the redo log, the transaction is considered failed. But the data is not lost, because it has been recorded in the redo log's disk file. When MySQL restarts, the data in the redo log is restored (loaded) into the Buffer Pool.
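The role of the commit mark in recovery can be sketched like this (a hypothetical record format invented for illustration; real redo records describe physical page changes, not dicts):

```python
def recover(redo_log_records):
    """Crash-recovery sketch: a transaction's changes are replayed only
    if its commit mark made it into the redo log; otherwise the
    transaction is treated as failed and its changes are discarded."""
    committed = {r["txn"] for r in redo_log_records if r["type"] == "commit"}
    return [r for r in redo_log_records
            if r["type"] == "change" and r["txn"] in committed]

log = [
    {"txn": 1, "type": "change", "row": 1, "value": "Xiaoqiang"},
    {"txn": 1, "type": "commit"},   # txn 1's commit mark was written
    {"txn": 2, "type": "change", "row": 2, "value": "Wangcai"},
    # txn 2 crashed before its commit mark reached the redo log
]
replayed = recover(log)
assert [r["txn"] for r in replayed] == [1]  # only txn 1 is redone
```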

Well, at this point we have basically walked through an entire update operation, but do you feel something is still missing? Notice that the updated record so far exists only in memory; even after a crash and recovery, the updated record is merely loaded back into the Buffer Pool. The record in the MySQL data file on disk is still the old value, which is to say the data in memory is still, from our point of view, dirty data. What happens now?

In fact, MySQL has a background thread that, at certain moments, flushes the dirty data in the Buffer Pool to the MySQL data files, so that memory and the database become consistent.

5. Summary:

  • (1) First, the MySQL executor calls the storage engine API to query data according to the execution plan
  • (2) The storage engine first queries the data from the Buffer Pool; if it is not there, it queries the disk, and once found, the data is placed into the Buffer Pool
  • (3) While the data is loaded into the Buffer Pool, the original value of the record is saved to the undo log file
  • (4) InnoDB performs the update in the Buffer Pool
  • (5) The updated data is recorded in the redo log buffer
  • (6) Committing the transaction does the following three things
  • (7) (First) flush the data in the redo log buffer to the redo log file
  • (8) (Second) write this operation record to the bin log file
  • (9) (Third) record the name of the bin log file and the position of the update within the bin log in the redo log, and add a commit mark at the end of the redo log
  • (10) A background thread flushes the updated data in the Buffer Pool to the MySQL data files at certain moments, so that memory and the database become consistent

 



Origin blog.csdn.net/a745233700/article/details/113927318