The Past and Present of a SQL Statement

This article analyzes how a SQL statement is executed in MySQL, including how a query flows through MySQL internally and how an update statement is completed.

MySQL Infrastructure Analysis

The figure below is a simplified architecture diagram of MySQL. It shows how a user's SQL statement is executed inside MySQL.

First, here is a brief introduction to the basic function of each component in the figure, to help you read it. Each component is described in more detail later.

  • Connector: Handles identity authentication and permissions (when logging in to MySQL).
  • Query cache: When a query statement is executed, the cache is checked first (removed in MySQL 8.0 because the feature was not very practical).
  • Analyzer: If the cache is not hit, the SQL statement goes through the analyzer. Put simply, the analyzer first works out what your SQL statement is trying to do, and then checks whether its syntax is correct.
  • Optimizer: Chooses the execution plan that MySQL considers optimal.
  • Executor: Executes the statement and returns data from the storage engine.

[Figure: simplified MySQL architecture diagram]

Simply put, MySQL is mainly divided into the Server layer and the storage engine layer:

  • Server layer: Mainly includes the connector, query cache, analyzer, optimizer, executor, and so on. All functions that span storage engines are implemented in this layer, such as stored procedures, triggers, views, and functions. It also has a general-purpose log module, the binlog.
  • Storage engine: Mainly responsible for storing and reading data. It uses a replaceable plug-in architecture and supports multiple storage engines such as InnoDB, MyISAM, and Memory. The InnoDB engine has its own log module, the redo log. InnoDB is the most commonly used storage engine and has been the default since MySQL 5.5.

Next, let's look at what each of these components in the Server layer does:

  • Connector

    • The connector handles identity authentication and permissions, much like a gatekeeper.
    • It is responsible for logging users in to the database and authenticating them, which includes verifying the account and password, checking permissions, and so on. Once the account and password have been verified, the connector looks up all of the user's permissions in the permission table, and every permission check on this connection then relies on the permission data read at that moment. In other words, as long as the connection stays open, the user is not affected even if an administrator changes the user's permissions.
  • Query cache (removed in MySQL 8.0)

    • The query cache stores the SELECT statements we execute together with their result sets. After a connection is established, executing a query statement first checks the cache: MySQL looks up whether this SQL has been executed before. Results are cached in memory as key-value pairs, where the key is the query statement and the value is the result set. On a cache hit, the result is returned directly to the client. On a miss, the subsequent steps run, and the result is cached when they finish so that the next call can use it. Even when a query is served from the cache, the user's permissions are still checked to confirm that the user is allowed to query the table.
    • Using the query cache is not recommended, because in real workloads cache entries are invalidated very frequently: updating a table clears every cached query on that table. The cache can still be worthwhile for data that is rarely updated, but in most cases we do not recommend it. The query cache was removed in MySQL 8.0; the official view was also that it is rarely useful in practice.
  • Analyzer

    • If the cache is not hit, the statement enters the analyzer, which works out what the SQL statement is meant to do. The analysis has two steps:
      • The first step is lexical analysis. A SQL statement is made up of multiple strings, so the keywords are extracted first, such as select, along with the table to query, the field names, the query conditions, and so on. Then the second step begins.
      • The second step is syntax analysis, which checks whether the SQL you entered is correct and conforms to MySQL's syntax.
    • After these two steps, MySQL is ready to execute. But how should it execute, and which way gives the best result? This is where the optimizer comes in.
  • Optimizer

    • The optimizer selects the execution plan it considers best (which is not always truly the best), for example which index to use when several are available, or which join order to use in a multi-table query.
    • Once the statement has passed through the optimizer, how it will be executed has been decided.
  • Executor

    • Once the execution plan is selected, MySQL is ready to execute. It first checks whether the user has permission. If not, an error message is returned. If so, it calls the engine's interface and returns the result of that call.
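
The key-value behaviour of the query cache described above can be sketched in a few lines of Python. This is only a toy model, not MySQL's implementation: the class `QueryCache` and its methods are hypothetical names invented for illustration, and it assumes that the cache key is the exact statement text and that any write to a table drops every cached result for that table.

```python
class QueryCache:
    """Toy model of the pre-8.0 query cache: statement text -> result set."""

    def __init__(self):
        self._results = {}   # key: exact SQL text, value: cached result set
        self._by_table = {}  # table name -> cached SQL texts that read it

    def get(self, sql):
        # Return the cached rows, or None on a miss.
        return self._results.get(sql)

    def put(self, sql, table, rows):
        self._results[sql] = rows
        self._by_table.setdefault(table, set()).add(sql)

    def invalidate(self, table):
        # A write to the table clears every cached query on it.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


cache = QueryCache()
sql = "select * from tb_student where age = 18"
cache.put(sql, "tb_student", [("张三", 18)])
hit = cache.get(sql)            # served from the cache
cache.invalidate("tb_student")  # e.g. after an UPDATE on tb_student
miss = cache.get(sql)           # None: the entry was cleared
```

The sketch makes the drawback visible: a single update to `tb_student` throws away every cached result for that table, so a frequently written table rarely gets a cache hit.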

Statement Analysis

Query Statement

With all that said, how is a SQL statement actually executed? SQL statements fall into two types: queries, and updates (insert, modify, delete). Let's analyze a query statement first:

select * from tb_student A where A.age='18' and A.name='张三';
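
As a rough illustration of the lexical analysis step described earlier, the query above could be split into tokens like this. The token classes, the `KEYWORDS` set, and the `tokenize` function are assumptions made for this sketch; MySQL's real lexer is far more complete.

```python
import re

# Hypothetical token classes for illustration only.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<string>'[^']*')"
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<symbol>[*.=;,()]))"
)
KEYWORDS = {"select", "from", "where", "and", "or", "update", "set"}


def tokenize(sql):
    """Split a SQL string into (kind, text) pairs."""
    tokens, pos = [], 0
    sql = sql.rstrip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {sql[pos]!r}")
        kind = m.lastgroup
        text = m.group(kind)
        # Words that appear in the keyword set are reclassified as keywords.
        if kind == "word" and text.lower() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
        pos = m.end()
    return tokens


tokens = tokenize("select * from tb_student A where A.age='18' and A.name='张三';")
```

From a token list like this, a parser can tell that the statement is a select, which table it reads, and which conditions apply, and can reject input that does not follow the grammar.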

Combined with the above description, we analyze the execution flow of this statement:

  • First, check whether the user has permission to run the statement. If not, an error message is returned directly. If so, versions before MySQL 8.0 query the cache first, using the SQL statement as the key to look for a result in memory. If one is found, it is returned directly; if not, go to the next step.

  • The analyzer performs lexical analysis to extract the key elements of the SQL statement. For the statement above, it identifies a select query on the table tb_student that reads all columns, with the conditions age='18' and name='张三'. It then checks the statement for syntax errors, such as misspelled keywords. If the check passes, go to the next step.

  • Next, the optimizer determines the execution plan. The statement above has two possible plans:

    • First find the students named "Zhang San" in the student table, then check whether their age is 18.
    • First find the students who are 18 years old, then pick out those named "Zhang San".
  • The optimizer then picks the plan it considers most efficient according to its own optimization algorithm (its choice is not always truly the best). Once the execution plan is confirmed, execution can begin.

  • Perform permission verification. If there is no permission, an error message is returned. If there is, the storage engine interface is called and the engine's result is returned.
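
The choice between the two plans above can be sketched as a toy cost comparison. This is not MySQL's cost model: the sample rows, the `choose_plan` and `execute` helpers, and the rule "filter on the most selective condition first" are all assumptions for illustration, and age is stored as an integer here for simplicity.

```python
# Hypothetical in-memory table. A real optimizer works from index
# statistics instead of scanning the rows as this toy does.
students = [
    {"name": "张三", "age": 18},
    {"name": "张三", "age": 20},
    {"name": "李四", "age": 18},
    {"name": "王五", "age": 18},
    {"name": "赵六", "age": 18},
]

predicates = {
    "name": lambda row: row["name"] == "张三",
    "age": lambda row: row["age"] == 18,
}


def choose_plan(rows, predicates):
    # Estimate how many rows each condition keeps, then order the
    # conditions so the one keeping the fewest rows is applied first.
    estimates = {col: sum(1 for r in rows if p(r)) for col, p in predicates.items()}
    order = sorted(estimates, key=estimates.get)
    return order, estimates


def execute(rows, predicates, order):
    # Apply the conditions one after another in the chosen order.
    for col in order:
        rows = [r for r in rows if predicates[col](r)]
    return rows


order, estimates = choose_plan(students, predicates)
result = execute(students, predicates, order)
```

With this sample data the name condition keeps 2 rows and the age condition keeps 4, so the sketch filters by name first. Both orders return the same row; only the amount of work differs.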

Update Statement

That is the execution process of a query. Now let's see how an update statement is executed. The SQL statement is as follows:

update tb_student A set A.age='19' where A.name='张三';

Let's modify Zhang San's age. This statement largely follows the same flow as the previous query, but an update must also be logged, which brings in the log modules. MySQL's own log module is the binlog (archive log), which every storage engine can use. The InnoDB engine we commonly use also has its own log module, the redo log. Let's walk through how this statement executes with InnoDB:

  • First locate Zhang San's row. The query cache is not used here, because an update clears every cached query on the table.
  • Then take the row, change the age to 19, and call the engine API to write it. The InnoDB engine saves the data in memory and records the change in the redo log, which now enters the prepare state. The engine then tells the executor that it is done and the change can be committed at any time.
  • On receiving this notification, the executor records the binlog, then calls the engine interface to move the redo log to the commit state.
  • The update is complete.
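
The order of the log writes in the steps above can be sketched as follows. The `Engine` class, its methods, and the `update_age` function are hypothetical stand-ins for InnoDB and the executor, written only to show the sequence redo log (prepare), binlog, redo log (commit). Real logs are written to disk and carry far more detail.

```python
class Engine:
    """Toy stand-in for InnoDB: in-memory rows plus a redo log."""

    def __init__(self):
        self.rows = {"张三": 18}
        self.redo_log = []  # one dict per change, with a "state" field

    def write_prepare(self, txn_id, name, age):
        # Apply the change in memory and log it in the prepare state.
        self.rows[name] = age
        self.redo_log.append({"txn": txn_id, "change": (name, age), "state": "prepare"})

    def commit(self, txn_id):
        # Move the transaction's redo log entries to the commit state.
        for entry in self.redo_log:
            if entry["txn"] == txn_id:
                entry["state"] = "commit"


def update_age(engine, binlog, txn_id, name, age):
    engine.write_prepare(txn_id, name, age)                # 1. redo log: prepare
    binlog.append({"txn": txn_id, "change": (name, age)})  # 2. executor writes binlog
    engine.commit(txn_id)                                  # 3. redo log: commit


engine, binlog = Engine(), []
update_age(engine, binlog, txn_id=1, name="张三", age=19)
```

After the call, the row holds the new age, the binlog has one entry for the transaction, and the redo log entry is in the commit state.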

Why are two log modules used here? Couldn't one be enough?

This is because MySQL did not originally include the InnoDB engine (InnoDB was added to MySQL by another company as a plug-in). MySQL's built-in engine was MyISAM, and the redo log is specific to InnoDB; other storage engines do not have it.

Using only one log module is not impossible, but InnoDB relies on the redo log to support transactions. Some readers will then ask: even with two log modules, why does it need to be so complicated? Why does the redo log introduce a prepare (pre-commit) state? Let's explain by contradiction, looking at what happens without it:

  • Write the redo log and commit it directly, then write the binlog. Suppose the machine crashes after the redo log is written but before the binlog is. After the restart, the machine recovers the data from the redo log, but the binlog has no record of it. A backup later restored from the binlog will be missing this data, and master-slave replication will lose it as well.
  • Write the binlog first, then the redo log. Suppose the machine restarts abnormally after the binlog is written. Because there is no redo log entry, the machine cannot recover this record, yet the binlog contains it. As in the previous case, the data ends up inconsistent.
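
The two failure cases above can be reproduced with a small simulation. It is a sketch under simplifying assumptions: each log is a plain list, a crash is modelled by stopping after a given number of steps, and the helper names `run` and `state_after_restart` are invented for this example.

```python
def run(steps, crash_after):
    """Write to the logs in the given order, stopping after `crash_after` steps."""
    logs = {"redo": [], "binlog": []}
    for log_name in steps[:crash_after]:
        logs[log_name].append("age=19")
    return logs


def state_after_restart(logs):
    # The primary recovers from the redo log, while backups and
    # replicas are rebuilt from the binlog.
    return {
        "primary_has_update": "age=19" in logs["redo"],
        "replica_has_update": "age=19" in logs["binlog"],
    }


# Crash between the two writes, for each ordering.
redo_first = state_after_restart(run(["redo", "binlog"], crash_after=1))
binlog_first = state_after_restart(run(["binlog", "redo"], crash_after=1))
```

In both orderings the two flags disagree after the crash: with the redo log first the primary has the update and the replica does not, and with the binlog first it is the other way round.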

The redo log's two-phase commit avoids this. Committing the redo log only after the binlog is written prevents the problems above and keeps the data consistent. But is there still an extreme case? Suppose the redo log is in the prepare state and the binlog has already been written, and then an abnormal restart occurs. What happens depends on MySQL's recovery mechanism, which works as follows:

  • Check whether the redo log entry is complete (already in the commit state). If so, commit immediately.
  • If the redo log entry is only prepared and not committed, check whether the binlog is complete. If it is, commit the redo log; if not, roll back the transaction.

This solves the problem of data consistency.
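
The recovery rule just described fits in one small function. This is a sketch of the decision only: the function name `recover` and the string labels for the states are assumptions, and how MySQL actually detects a complete binlog entry is not modelled.

```python
def recover(redo_state, binlog_complete):
    """Decide the fate of one transaction found in the redo log after a crash.

    redo_state: "commit" or "prepare" (hypothetical labels for this sketch).
    binlog_complete: whether the transaction's binlog entry was fully written.
    """
    if redo_state == "commit":
        # The redo log entry is complete, so the transaction is committed.
        return "commit"
    if redo_state == "prepare":
        # Only prepared: the binlog decides between commit and rollback.
        return "commit" if binlog_complete else "rollback"
    raise ValueError(f"unknown redo state: {redo_state!r}")


outcomes = {
    (state, complete): recover(state, complete)
    for state in ("commit", "prepare")
    for complete in (True, False)
}
```

The only combination that rolls back is a prepared redo log entry with an incomplete binlog, which is the case where neither the primary nor the replicas should see the change.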

Summary

Let's summarize:

  • MySQL is mainly divided into the Server layer and the engine layer. The Server layer mainly includes the connector, query cache, analyzer, optimizer, and executor, plus a log module (the binlog) that all storage engines can share. The redo log exists only in InnoDB.
  • The engine layer is plug-in based and currently includes mainly MyISAM, InnoDB, Memory, and others.
  • The execution flow of a query statement is: permission check (if the cache is hit) ---> query cache ---> analyzer ---> optimizer ---> permission check ---> executor ---> engine
  • The execution flow of an update statement is: analyzer ---> permission check ---> executor ---> engine ---> redo log (prepare state) ---> binlog ---> redo log (commit state)


Origin blog.csdn.net/zyb18507175502/article/details/130100926