MySQL architecture, SQL statement execution process, and execution order

MySQL basic architecture

MySQL's architecture lets it work well in many different scenarios. This flexibility comes mainly from its storage engine architecture: the pluggable storage engine design separates query processing and other system tasks from data storage and retrieval.

[Figure: MySQL architecture diagram]

  • Connector: handles identity authentication and permissions (at login).
  • Query cache: when a query statement is executed, MySQL first looks in the query cache.
  • Analyzer: on a cache miss, the SQL statement goes through the analyzer, which works out what the statement is meant to do and then checks whether its syntax is correct.
  • Optimizer: chooses the execution plan that MySQL considers optimal.
  • Executor: executes the statement and returns the data read from the storage engine.

The MySQL architecture is divided into two main layers, the Server layer and the storage engine layer:

**Server layer:** It mainly includes the connector, query cache, analyzer, optimizer, and executor. All functionality that spans storage engines is implemented in this layer, such as stored procedures, triggers, views, and functions. It also contains a general-purpose log module, the binlog.

**Storage engine layer:** It is mainly responsible for storing and reading data. It uses a replaceable plug-in architecture and supports multiple storage engines such as InnoDB, MyISAM, and Memory. The InnoDB engine has its own log module, the redo log. InnoDB is the most commonly used storage engine and has been the default since MySQL 5.5.

Basic introduction to the Server layer

1. Connector

The connector handles identity authentication and permission-related functions, much like a very senior doorman.

It is responsible for logging users in to the database and authenticating them, which includes verifying the account, the password, and the permissions. If the account and password are valid, the connector looks up all of the user's permissions in the privilege tables, and every permission check made on this connection afterwards relies on the permission data read at that moment. In other words, as long as the connection stays open, the user is unaffected even if an administrator changes the user's permissions.
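The login-time snapshot described above can be illustrated with a small Python model. This is only a sketch of the behaviour as the text describes it, not MySQL's implementation: the `PrivilegeTable` and `Connection` classes and the user `alice` are hypothetical names introduced for illustration.

```python
class PrivilegeTable:
    """Hypothetical stand-in for the server's privilege tables."""

    def __init__(self):
        self._accounts = {}  # user -> (password, set of privileges)

    def set_account(self, user, password, privileges):
        self._accounts[user] = (password, set(privileges))

    def authenticate(self, user, password):
        stored = self._accounts.get(user)
        if stored is None or stored[0] != password:
            raise PermissionError("access denied for user " + user)
        # Return a copy so the connection keeps its own snapshot.
        return set(stored[1])


class Connection:
    """Models one client connection that reads privileges once, at login."""

    def __init__(self, table, user, password):
        self.user = user
        self.privileges = table.authenticate(user, password)

    def can(self, privilege):
        # Checks use the snapshot taken at login, not the live table.
        return privilege in self.privileges


table = PrivilegeTable()
table.set_account("alice", "secret", {"SELECT", "UPDATE"})

old_conn = Connection(table, "alice", "secret")
# The administrator revokes UPDATE while old_conn is still open.
table.set_account("alice", "secret", {"SELECT"})
new_conn = Connection(table, "alice", "secret")

print(old_conn.can("UPDATE"))  # True: still uses the login-time snapshot
print(new_conn.can("UPDATE"))  # False: a new connection sees the change
```

In this model the change only becomes visible after reconnecting, which is the practical consequence the paragraph above points out.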

2. Query cache (removed in MySQL 8.0)

The query cache stores the query statements that have been executed together with their result sets.

After the connection is established, a query statement first checks the cache. MySQL checks whether this exact SQL has been executed before; earlier results are kept in memory as key-value pairs, where the key is the query statement and the value is the result set. If the key is found, the cached result is returned directly. If not, the later stages run, and the result is cached when they finish so that the next call can use it. Even when a result is served from the cache, MySQL still checks the user's permissions, that is, whether the user is allowed to query the table.

In practice the query cache is of limited use: whenever a table is updated, every cached query on that table is invalidated, so the cache only pays off for data that is rarely updated. Because so few workloads benefit, the query cache was removed in MySQL 8.0.
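A minimal Python sketch of such a key-value cache follows. It is a toy model, not MySQL's code: `QueryCache`, `run_query`, and the `execute` callback are hypothetical names. It also models the fact that, in MySQL versions that have the query cache, a write to a table removes all cached queries that use that table.

```python
class QueryCache:
    """Toy query cache: key = statement text, value = result set."""

    def __init__(self):
        self._results = {}   # statement text -> cached rows
        self._by_table = {}  # table name -> statements cached for that table

    def get(self, sql):
        return self._results.get(sql)

    def put(self, sql, table, rows):
        self._results[sql] = rows
        self._by_table.setdefault(table, set()).add(sql)

    def invalidate(self, table):
        # A write to a table drops every cached query on that table.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


def run_query(cache, sql, table, execute):
    """Return (rows, hit). `execute` is the slow path that really runs the query."""
    rows = cache.get(sql)
    if rows is not None:
        return rows, True
    rows = execute()
    cache.put(sql, table, rows)
    return rows, False


cache = QueryCache()
sql = "select * from student where age = 18"
fetch = lambda: [("Zhang San", 18)]

print(run_query(cache, sql, "student", fetch)[1])  # False: first run misses
print(run_query(cache, sql, "student", fetch)[1])  # True: second run hits
cache.invalidate("student")                        # models an UPDATE on student
print(run_query(cache, sql, "student", fetch)[1])  # False: entry was dropped
```

On a frequently updated table, almost every lookup ends up on the miss path, which is why the cache helps only with rarely changing data.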

3. Analyzer

If the query misses the cache, the statement goes to the analyzer, which works out what the SQL statement is meant to do.

The analyzer works in two steps:

1) Lexical analysis

An SQL statement is made up of several strings. The first task is to extract its key elements: keywords such as SELECT, the name of the table being queried, the field names, the query conditions, and so on. Once this is done, the analyzer moves on to the second step.

2) Syntax analysis

This step checks whether the SQL statement you entered is well formed, that is, whether it conforms to MySQL's syntax.

After these two steps, MySQL is ready to execute the statement. Deciding how to execute it, and which way gives the best result, is the job of the optimizer.
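The two analyzer steps can be sketched in Python as follows. This is a deliberately tiny illustration, not MySQL's parser: the `KEYWORDS` set covers only a handful of words, and `tokenize` and `check_select` are hypothetical helpers that handle just enough SQL for the example statement.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UPDATE", "SET"}

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<string>'[^']*')          # quoted string literal
      | (?P<number>\d+)              # integer literal
      | (?P<word>[A-Za-z_][\w.]*)    # keyword or identifier, possibly qualified
      | (?P<symbol>[*=,;()<>])       # single-character operator or punctuation
    )
""", re.VERBOSE)


def tokenize(sql):
    """Lexical analysis: split the statement into (kind, text) tokens."""
    tokens, pos = [], 0
    sql = sql.strip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "word":
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
        pos = match.end()
    return tokens


def check_select(tokens):
    """Very rough syntax check: SELECT <columns> FROM <table> ..."""
    items = [(k, t.upper() if k == "keyword" else t) for k, t in tokens]
    if not items or items[0] != ("keyword", "SELECT"):
        raise SyntaxError("statement must start with SELECT")
    if ("keyword", "FROM") not in items:
        raise SyntaxError("missing FROM clause")
    from_index = items.index(("keyword", "FROM"))
    if from_index + 1 >= len(items) or items[from_index + 1][0] != "identifier":
        raise SyntaxError("expected a table name after FROM")


tokens = tokenize("select * from student A where A.age='18' and A.name='张三';")
for token in tokens:
    print(token)
check_select(tokens)  # passes silently; a typo such as "selct" would raise
```

A misspelled keyword such as `selct` is tokenized as an ordinary identifier, so it is the syntax check, not the lexer, that rejects the statement.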

4. Optimizer

The optimizer's job is to choose the execution plan it considers optimal, for example which index to use when several are available, or in which order to join the tables in a multi-table query.
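As a rough illustration of index selection, the Python sketch below picks the candidate expected to examine the fewest rows. This is a simplification and an assumption: MySQL's real optimizer uses a richer cost model, and the index names `idx_age` and `idx_name` and all row counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    description: str
    estimated_rows: int  # rows the engine is expected to examine


def choose_plan(table_rows, index_estimates):
    """Pick the candidate expected to examine the fewest rows.

    index_estimates maps an index name to the estimated number of
    matching rows; a full table scan is always a candidate.
    """
    candidates = [Plan("full table scan", table_rows)]
    for index_name, rows in index_estimates.items():
        candidates.append(Plan(f"use index {index_name}", rows))
    return min(candidates, key=lambda plan: plan.estimated_rows)


# Made-up statistics for a student table.
best = choose_plan(
    table_rows=10_000,
    index_estimates={"idx_age": 1_200, "idx_name": 3},
)
print(best.description)  # use index idx_name
```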

5. Executor

Once the execution plan has been chosen, MySQL prepares to execute it. Before executing, it verifies that the user has permission. If not, it returns an error message; if so, it calls the engine's interface and returns the result of that call.

SQL statement execution process

The SQL statements we write most often fall into two types: queries, and updates (insert, update, delete).

Query statement
select * from student A where A.age='18' and A.name='张三';

Based on the description above, the execution flow of this statement is:

  • First, check whether the user has permission to run the statement. If not, an error message is returned immediately. If so, versions before MySQL 8.0 look up the query cache, using the SQL statement as the key. On a hit, the cached result is returned directly; otherwise, execution moves on to the next step.

  • The analyzer performs lexical analysis and extracts the key elements of the SQL statement: the operation (select), the table name (student), the fields to query (*, all fields), and the query conditions. It then checks whether the statement contains a syntax error.

  • The optimizer determines the execution plan. For the example above, there are two possible plans:

    1. First find the students named "Zhang San" in the student table, and then check whether their age is 18.

    2. First find the 18-year-old students, and then pick out the one named "Zhang San".

    The optimizer chooses which plan to execute according to its own optimization algorithm.

  • Perform permission verification. If the user lacks permission, an error message is returned; if the user has permission, the storage engine interface is called and the engine's result is returned.

Update statement
update student A set A.age='19' where A.name='张三';

This statement changes Zhang San's age. It basically follows the same flow as the query above, but an update must also be logged. MySQL has its own log module, the binlog (archive log), which is available to all storage engines; the InnoDB engine additionally has its own redo log.

Using InnoDB as the example engine, the update statement flows as follows:

  • First, query Zhang San's row; if it is cached, the cache is used here as well.
  • Take the row returned by the query, change age to 19, and call the engine API to write the row. InnoDB saves the data in memory and records the redo log, which at this point enters the prepare state. The engine then tells the executor that execution is done and the change can be committed at any time.
  • After receiving the notification, the executor writes the binlog, then calls the engine interface to move the redo log to the commit state.
  • The update is complete.
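The ordering of the steps above (redo log prepare, then binlog, then redo log commit) can be sketched with a toy Python model. `ToyEngine` and `ToyExecutor` are hypothetical classes, and the logs are plain in-memory structures rather than anything resembling MySQL's on-disk formats.

```python
class ToyEngine:
    """Hypothetical stand-in for InnoDB: in-memory rows plus a redo log."""

    def __init__(self, rows):
        self.rows = rows     # primary key -> row dict
        self.redo_log = {}   # transaction id -> "prepare" or "commit"

    def write_row(self, txn_id, key, changes):
        self.rows[key].update(changes)     # change the in-memory copy
        self.redo_log[txn_id] = "prepare"  # phase 1: redo log prepared

    def commit(self, txn_id):
        self.redo_log[txn_id] = "commit"   # phase 2: redo log committed


class ToyExecutor:
    """Hypothetical stand-in for the Server-layer executor."""

    def __init__(self, engine):
        self.engine = engine
        self.binlog = []     # server-level archive log

    def update(self, txn_id, key, changes):
        steps = []
        self.engine.write_row(txn_id, key, changes)
        steps.append("redo log: " + self.engine.redo_log[txn_id])
        self.binlog.append((txn_id, key, dict(changes)))
        steps.append("binlog: written")
        self.engine.commit(txn_id)
        steps.append("redo log: " + self.engine.redo_log[txn_id])
        return steps


engine = ToyEngine({1: {"name": "Zhang San", "age": 18}})
executor = ToyExecutor(engine)
for step in executor.update(txn_id=1, key=1, changes={"age": 19}):
    print(step)
```

The point of the model is only the order of the three log operations; durability, locking, and flushing to disk are left out.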

Question 1: Why are two log modules needed? Wouldn't one be enough?

Using a single log module is not impossible. However, the redo log is specific to the InnoDB engine, and InnoDB relies on it to support transactions. Other storage engines have no redo log and so are not crash-safe (crash-safe means that previously committed records are not lost even if the database restarts abnormally); the binlog on its own can only be used for archiving.

Question 2: Why is using the two log modules so complicated? Why does the redo log introduce a prepare (pre-commit) state?

This can be explained by contradiction. Suppose two-phase commit were not used:

  • **Scenario 1: Write the redo log and commit it directly, then write the binlog.** Suppose the machine fails right after the redo log is written, before the binlog is written. After the restart the machine recovers the data from the redo log, but the binlog has no record of it. The change is therefore missing when the data is later restored from a backup, and it is also missing from master-slave replication.
  • **Scenario 2: Write the binlog first, then the redo log.** Suppose the machine restarts abnormally right after the binlog is written. Because there is no redo log entry, the record cannot be recovered locally, yet the binlog does contain it, so the data becomes inconsistent.

With the redo log's two-phase commit, things are different: committing the redo log only after the binlog has been written prevents the problems above and keeps the data consistent. After an abnormal restart, MySQL handles each transaction as follows:

Check whether the redo log record is complete, that is, already in the commit state. If it is, commit immediately. If the redo log is only in the prepare state, check whether the binlog is complete: if it is, commit the redo log; otherwise, roll back the transaction. This keeps the data consistent.
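The recovery rule just described is a small decision table, sketched here in Python. The function name `recover` and the string values for the states are assumptions made for illustration; they are not MySQL identifiers.

```python
def recover(redo_state, binlog_complete):
    """Decide what to do with one transaction found during crash recovery.

    redo_state is "commit" or "prepare"; binlog_complete says whether the
    transaction's binlog entry was fully written before the crash.
    """
    if redo_state == "commit":
        return "commit"
    if redo_state == "prepare":
        return "commit" if binlog_complete else "rollback"
    raise ValueError(f"unknown redo log state: {redo_state!r}")


for state, binlog_ok in [("commit", True), ("prepare", True), ("prepare", False)]:
    print(state, binlog_ok, "->", recover(state, binlog_ok))
```

The only case that rolls back is a prepared redo log without a complete binlog entry, which is exactly the crash window between the first and second steps of the update flow.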

Summary
  1. MySQL is divided into the Server layer and the engine layer. The Server layer mainly includes the connector, query cache, analyzer, optimizer, executor, and a log module (binlog). This log module can be used by all storage engines; the redo log exists only in InnoDB.

  2. The engine layer is pluggable and currently includes mainly MyISAM, InnoDB, and others.

  3. The execution process of the query statement is as follows: permission verification => query cache => analyzer => optimizer => permission verification => executor => storage engine

  4. The execution process of the update statement is as follows: analyzer => permission verification => executor => storage engine => redo log (prepare state) => binlog => redo log (commit state)

SQL statement execution order

[Figure: SQL statement execution order]

Origin blog.csdn.net/weixin_44723496/article/details/113103928