MySQL principles and optimization (1) MySQL architecture and execution process

MySQL architecture and execution process

MySQL architecture summary

MySQL internal modules:


1. Connector: supports interaction between various client languages and SQL.
2. Management Services & Utilities: system management and control tools, including backup/recovery, MySQL replication, clustering, etc.
3. Connection Pool: manages resources that need to be cached, including user passwords, permissions, threads, etc.
4. SQL Interface: accepts the user's SQL commands and returns the query results the user needs.
5. Parser: parses SQL statements.
6. Optimizer: the query optimizer.
7. Cache and Buffer: the query cache; besides caching row records, there are also table caches, key caches, permission caches, etc.
8. Pluggable Storage Engines: pluggable storage engines, which expose an API to the service layer and deal with the concrete data files.

Architecture layering

Divide MySQL into three layers:

  • Connection layer: communicates with the client.
  • Service layer: performs the operations (parsing, optimization, execution).
  • Storage engine layer: deals with the hardware (stores and retrieves the data).


How a query SQL statement is executed


1. Connect

Communication protocol: MySQL supports a variety of communication protocols, can work in synchronous or asynchronous mode, and supports both long and short connections.

2. Caching

Query cache: MySQL's query cache is disabled by default (and was removed entirely in MySQL 8.0). Caching is usually handled by an ORM framework or a dedicated cache server instead.

3. Parser

  • Lexical parsing: breaks the complete SQL statement into individual tokens (keywords, identifiers, literals).
  • Syntax parsing: performs syntax checks on the SQL and then, according to the grammar rules defined by MySQL, generates a data structure from the statement: the parse tree. Any database middleware, such as Mycat, must also implement lexical and syntactic analysis.
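To make the lexical-parsing step concrete, here is a toy tokenizer, a minimal illustrative sketch and not MySQL's actual lexer; the token names and keyword list are assumptions chosen for the example.

```python
import re

# Toy lexer: splits a SQL string into typed tokens, illustrating the
# lexical-parsing step. Token categories and keywords are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r"'[^']*'"),
    ("OP",     r"[=<>*,();]"),
    ("WORD",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"select", "from", "where", "update", "set"}

def tokenize(sql):
    tokens = []
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for m in re.finditer(pattern, sql):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":          # drop whitespace
            continue
        if kind == "WORD" and text.lower() in KEYWORDS:
            kind = "KEYWORD"        # reclassify reserved words
        tokens.append((kind, text))
    return tokens
```

The real parser would then feed this token stream into the grammar rules to build the parse tree.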

4. Preprocessor

Preprocessor: checks the generated parse tree and resolves semantics the parser cannot check. For example, it verifies that table and column names exist, and checks names and aliases to make sure there are no ambiguities.

5. Query optimizer, query execution plan

Query optimizer: generates candidate execution plans from the parse tree and selects the optimal one. MySQL uses a cost-based optimizer: whichever execution plan has the smallest estimated cost is used.

  • Query execution plan: the optimizer finally turns the parse tree into a query execution plan, which is itself a data structure.

6. Storage engine

While MySQL stores data, it also organizes how the data is laid out. This storage structure is determined by the storage engine, which is why the storage engine is also called the table type.
MySQL supports multiple storage engines, and they are replaceable, so they are called pluggable storage engines.

Common storage engines: MyISAM, InnoDB

MyISAM

Its range of application is relatively narrow. Table-level locking limits read/write performance, so in web and data-warehouse configurations it is usually used for read-only or read-mostly workloads.

Features:

  • Supports table-level locking (inserts and updates lock the whole table); transactions are not supported.
  • High insert and query (SELECT) speed.
  • Stores the row count of the table (so COUNT(*) without a WHERE clause is fast).
  • Suitable for read-mostly data-analysis projects (e.g. system operation logs).

InnoDB

InnoDB is a transaction-safe MySQL storage engine with commit, rollback, and crash-recovery features to protect user data. Row-level locks improve concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common primary-key queries. To ensure data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints.

Features:

  • Supports transactions and foreign keys, so data integrity and consistency are higher.
  • Supports row-level locks and table-level locks.
  • Supports concurrent reading and writing, and non-blocking writing and reading (MVCC).
  • The special index storage method can reduce I/O and improve query efficiency.
  • Suitable for: tables with frequently updated rows, business systems with concurrent reading and writing or transaction processing.

7. Execution engine, returning results

The storage engine determines how the data is stored; the execution engine completes the operation by calling the corresponding APIs provided by the storage engine, then returns the results.

How an update SQL statement is executed

The update process is basically the same as the query process. It also needs to be processed by the parser and optimizer, and finally handed over to the executor. The difference lies in the operation after obtaining the data that meets the conditions.

1. Buffer pool

First of all, InnoDB's data lives on disk. InnoDB operates on data in a smallest logical unit called a page (index pages and data pages). We do not go to the disk for every data operation, because the disk is too slow.
InnoDB uses a buffer pool technique: pages read from disk are placed in a memory area called the Buffer Pool.
The next time the same page is read, InnoDB first checks whether it is in the buffer pool. If so, it is read directly without touching the disk again.
When modifying data, InnoDB first modifies the page in the buffer pool. When a page in memory differs from its copy on disk, it is called a dirty page. InnoDB has a dedicated background thread that writes Buffer Pool pages back to disk, batching multiple modifications into a single write every so often. This action is called flushing (of dirty pages).

2. InnoDB memory structure and disk structure

Buffer Pool is a very important structure in InnoDB. It is divided into several areas internally.
Insert image description here

memory structure

Buffer Pool

The Buffer Pool caches page information, including data pages and index pages. The default size is 128M and it can be adjusted.
What happens when the buffer pool is full? (Compare: what does Redis do when its configured memory fills up?) InnoDB manages the buffer pool with an LRU variant (a linked list split into young and old sublists, not a traditional LRU), and the pages it evicts are the cold data.
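The young/old split can be illustrated with a toy sketch. This is a deliberate simplification (real InnoDB also delays promotion based on access time via innodb_old_blocks_time); the class and capacities here are invented for illustration.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy sketch of InnoDB's young/old buffer-pool LRU. Newly read pages
    enter the old sublist; a second access promotes them to young, so a
    one-off table scan cannot evict the hot pages."""
    def __init__(self, young_cap, old_cap):
        self.young = OrderedDict()   # hot pages, most recent at the front
        self.old = OrderedDict()     # newly loaded / cold pages
        self.young_cap, self.old_cap = young_cap, old_cap

    def access(self, page_id):
        if page_id in self.young:            # already hot: move to front
            self.young.move_to_end(page_id, last=False)
        elif page_id in self.old:            # second touch: promote to young
            del self.old[page_id]
            self.young[page_id] = True
            self.young.move_to_end(page_id, last=False)
            if len(self.young) > self.young_cap:  # demote coldest young page
                demoted, _ = self.young.popitem()
                self.old[demoted] = True
                if len(self.old) > self.old_cap:
                    self.old.popitem()
        else:                                # miss: load into old sublist
            self.old[page_id] = True
            self.old.move_to_end(page_id, last=False)
            if len(self.old) > self.old_cap:      # evict the coldest page
                self.old.popitem()
```

With this scheme, a full-table scan only churns the old sublist; frequently accessed pages stay in young.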

The memory buffer greatly improves read and write performance. When a data page needs to be updated, if it already exists in the Buffer Pool it can be updated directly; otherwise it must be loaded from disk into memory first and then modified there. In other words, a buffer-pool miss costs at least one disk I/O. Is there any way to optimize this?

Change Buffer write cache

If the page belongs to a non-unique index, no uniqueness check is required, so there is no need to load the index page from disk to verify whether the value is duplicated. In that case the modification can first be recorded in memory, which speeds up write statements (INSERT, DELETE, UPDATE).
The memory area that records these changes is the Change Buffer.
The operation of applying the Change Buffer records to the data page is called merge. When does a merge occur? In several situations: when the data page is accessed, via a background thread, or when the database shuts down or the redo log fills up.
If most of the indexes in the database are non-unique, and the workload writes more than it reads and does not read data immediately after writing it, you can benefit from the Change Buffer (write cache).
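The buffer-then-merge idea can be sketched as follows. This is an illustrative model, not InnoDB's implementation; the class and method names are invented, and it assumes a non-unique index so writes need no uniqueness check.

```python
class ChangeBuffer:
    """Toy sketch: buffer modifications to secondary-index pages that are
    not in the buffer pool, and merge them when the page is finally read,
    saving a disk read at write time."""
    def __init__(self, disk_pages):
        self.disk = disk_pages      # page_id -> list of index records
        self.pool = {}              # pages currently cached in memory
        self.pending = {}           # page_id -> buffered (unmerged) changes
        self.disk_reads = 0

    def insert(self, page_id, record):
        if page_id in self.pool:    # page already cached: apply directly
            self.pool[page_id].append(record)
        else:                       # page only on disk: just buffer the change
            self.pending.setdefault(page_id, []).append(record)

    def read(self, page_id):
        if page_id not in self.pool:      # load from disk, then merge
            self.disk_reads += 1
            self.pool[page_id] = list(self.disk[page_id])
            self.pool[page_id] += self.pending.pop(page_id, [])
        return self.pool[page_id]
```

Note how an insert into an uncached page costs zero disk reads; the read that eventually touches the page pays the single I/O and triggers the merge.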

Adaptive Hash Index

(Redo) Log Buffer

If the database crashes or restarts before the dirty pages in the Buffer Pool have been flushed to disk, that data is lost. If a write was only half finished, the data file may even be corrupted, leaving the database unusable.
To avoid this, InnoDB writes every modification made to a page to a dedicated log file, and replays that file when the database starts (achieving crash safety); it also uses it to implement transaction durability.
This file is the on-disk redo log, corresponding to ib_logfile0 and ib_logfile1 in /var/lib/mysql/, 48M each by default. This scheme of pairing a log with the disk writes is MySQL's WAL (Write-Ahead Logging) technology: the key point is writing the log first and the data to disk later.

Question: writing the log is also a disk write, so why not write straight to the db file? Why write the log first and the data later?
Random I/O and sequential I/O:
The smallest unit of a disk is a sector, usually 512 bytes.
When the operating system deals with memory, the smallest unit is the page.
When the operating system reads and writes the disk, the smallest unit is the block.

If the data we need is scattered randomly across different sectors, then to find it the disk must wait for the arm to seek to the right track and the platter to rotate to the right sector, and repeat this for every piece of data until all of it is found. This is random I/O, and it reads data slowly.
Suppose we have found the first piece of data and the rest of the required data sits right after it; then no re-seek is needed and we can read it sequentially. This is called sequential I/O.
Flushing dirty pages is random I/O, while appending to a log is sequential I/O. Sequential I/O is more efficient, so writing modifications to the log first lets InnoDB delay flushing, improving system throughput.
Of course, the redo log is not written directly to disk on every change either. There is a dedicated memory area, the Log Buffer (16M by default), that holds the data about to be written to the log files, which also saves disk I/O.

Note: the contents of the redo log are used mainly for crash recovery. The data files on disk are written from the Buffer Pool. The redo log is written to its own log files, not to the data files.

So when is the Log Buffer written to the log file?
When we write to disk, the operating system itself also has a cache; flushing means writing the OS cache through to disk. The timing of writing the log buffer to disk is controlled by a parameter whose default is 1.

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
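The three values of this parameter trade durability against speed (0: write and flush about once per second; 1: write and flush on every commit, the default; 2: write to the OS cache on commit, flush about once per second). A toy model of the three policies, with invented class names for illustration:

```python
class LogBuffer:
    """Toy sketch of how innodb_flush_log_at_trx_commit affects where a
    committed transaction's redo records live after commit."""
    def __init__(self, policy):
        self.policy = policy    # 0, 1, or 2
        self.buffer = []        # records only inside the MySQL process
        self.os_cache = []      # handed to the OS, not yet fsynced
        self.disk = []          # durably on disk

    def commit(self, record):
        self.buffer.append(record)
        if self.policy == 1:    # write + fsync on every commit (durable)
            self.os_cache += self.buffer; self.buffer = []
            self.disk += self.os_cache;   self.os_cache = []
        elif self.policy == 2:  # write to OS cache on commit, fsync later
            self.os_cache += self.buffer; self.buffer = []

    def background_flush(self):  # runs roughly once per second
        self.os_cache += self.buffer; self.buffer = []
        self.disk += self.os_cache;   self.os_cache = []
```

With policy 0 a MySQL crash can lose up to a second of commits; with policy 2 only an OS crash can.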

Features:

  • The redo log is implemented by the InnoDB storage engine; not all storage engines have it.
  • It does not record the state of the data page after the update, but the changes made to the page, so it is a physical log.
  • The redo log has a fixed size, and older content is overwritten in a circular fashion.

The checkpoint marks the position up to which the log may be overwritten (everything before it has been flushed); write pos is the current write position. If write pos catches up with the checkpoint, the redo log is full: dirty pages must be flushed to disk to advance the checkpoint before new records can be written.
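The write pos / checkpoint interaction can be sketched as a simple capacity model (an illustrative abstraction, not InnoDB's actual file layout; class and method names are invented):

```python
class CircularRedoLog:
    """Toy sketch of the fixed-size circular redo log: write_pos advances
    as records are written; checkpoint advances as dirty pages are flushed.
    When write_pos would catch up with checkpoint, the log is full and
    writes must wait for flushing."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pos = 0       # total records ever written
        self.checkpoint = 0      # total records whose pages are flushed

    def free_space(self):
        return self.capacity - (self.write_pos - self.checkpoint)

    def write(self, n=1):
        if self.free_space() < n:      # log full: caller must flush first
            return False
        self.write_pos += n
        return True

    def flush_dirty_pages(self, n=1):  # advancing the checkpoint frees space
        self.checkpoint = min(self.checkpoint + n, self.write_pos)
```

This is why a too-small redo log throttles a write-heavy workload: writes stall whenever they outrun the flushing of dirty pages.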

disk structure

The tablespace can be regarded as the highest level of the InnoDB storage engine's logical structure. All data is stored in tablespaces. InnoDB tablespaces fall into 5 categories.

1. System tablespace

By default, the InnoDB storage engine has one shared tablespace (/var/lib/mysql/ibdata1), also called the system tablespace.
The InnoDB system tablespace contains the InnoDB data dictionary, the doublewrite buffer, the Change Buffer, and undo logs. If file-per-table is not enabled, it also contains the data of user-created tables and indexes.

  • Data dictionary: composed of internal system tables; stores metadata (definition information) for tables and indexes.
  • Doublewrite buffer (a major InnoDB feature):

InnoDB's page size differs from the operating system's: an InnoDB page is usually 16K while an OS page is 4K, so writing one InnoDB page to disk takes four OS-page writes.
If the storage engine crashes while writing a page to disk, only part of the page may make it out, for example just 4K before the crash. This situation is called a partial page write and can lead to data loss.

So before applying the redo log, a clean copy of the page is needed. If a write failure occurs, the page is first restored from this copy, and then the redo log is applied. This page copy is the doublewrite buffer, InnoDB's doublewrite technique, which guarantees the reliability of data pages.

Like the redo log, doublewrite consists of two parts: a doublewrite buffer in memory and a doublewrite area on disk. Because the doublewrite area is written sequentially, it does not add much overhead.
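The recovery idea, detect a torn page by checksum and repair it from the doublewrite copy, can be sketched as follows. This is an illustrative model with invented names, not InnoDB's on-disk format.

```python
import zlib

def page_checksum(data):
    # Stand-in for InnoDB's per-page checksum.
    return zlib.crc32(data)

class DoublewriteDisk:
    """Toy sketch of the doublewrite buffer: each page is first written
    whole to the doublewrite area, then to its final location. A torn
    page is detected by checksum mismatch and repaired from the copy."""
    def __init__(self):
        self.doublewrite = {}   # page_id -> (data, checksum)
        self.datafile = {}      # page_id -> (data, checksum)

    def write_page(self, page_id, data, tear=False):
        self.doublewrite[page_id] = (data, page_checksum(data))  # step 1
        if tear:  # simulate a crash mid-write: only half the page lands
            half = data[: len(data) // 2]
            self.datafile[page_id] = (half, page_checksum(data))
        else:     # step 2: write to the final location
            self.datafile[page_id] = (data, page_checksum(data))

    def recover(self):
        for pid, (data, cksum) in self.datafile.items():
            if page_checksum(data) != cksum:          # torn page detected
                self.datafile[pid] = self.doublewrite[pid]
```

After this repair, the redo log can be applied safely, since redo records describe changes to an intact page.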

By default, all tables share a system table space. This file will grow larger and larger, and its space will not shrink.

Exclusive tablespaces (file-per-table tablespaces)

We can let each table occupy its own table space. This switch is set through innodb_file_per_table and is enabled by default.
After it is enabled, each table gets its own tablespace, stored as an ibd file in the data directory (for example /var/lib/mysql/gupao/user_innodb.ibd), which holds the table's indexes and data.
However, other types of data, such as rollback (undo) information, inserted cache index pages, system transaction information, double write buffer, etc., are still stored in the original shared table space.

general tablespaces

A general tablespace is also a kind of shared tablespace, similar to ibdata1.
You can create a general tablespace to store tables from different databases; the data path and file can be customized. Syntax:

create tablespace ts2673 add datafile '/var/lib/mysql/ts2673.ibd' file_block_size=16K engine=innodb;

You can specify the tablespace when creating a table, and use ALTER TABLE to move a table into a different tablespace.

create table t2673(id integer) tablespace ts2673;

Data in different table spaces can be moved. To delete a table space, you need to delete all tables in it first.

Temporary tablespaces

Stores temporary-table data, including user-created temporary tables and internal on-disk temporary tables. It corresponds to the ibtmp1 file in the data directory. When the server shuts down normally the tablespace is deleted, and it is recreated on the next start.

Undo log tablespaces

The undo log (rollback log) records the state of the data before the transaction (SELECT generates no undo log). If an exception occurs while modifying data, the undo log can be used to roll back (maintaining atomicity).
When undo is executed, the data is only logically restored to its pre-transaction state; the physical pages are not reverted. It is a logical log.

Redo log and undo log are closely related to transactions and are collectively called transaction logs.
By default the undo log data lives in the system tablespace file ibdata1. Because the shared tablespace never shrinks automatically, you can also create separate undo tablespaces.

With these logs, let's summarize the update process. (The original value of name is wang)

update user set name = 'wg' where id = 1;
  1. When the transaction starts, this data is fetched from the memory or disk and returned to the Server's executor;
  2. The executor modifies the value of this row of data to wg;
  3. Record name = wang to undo log;
  4. Record name = wg to redo log;
  5. Call the storage engine interface and modify name = wg in the memory (Buffer Pool);
  6. Transaction commit.
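The steps above can be sketched as a toy engine that records undo (old value) and redo (new value) around the in-memory change. The class and method names are invented for illustration; real InnoDB undo/redo records are far richer.

```python
class MiniEngine:
    """Toy sketch of the update flow: record the old value in the undo log
    and the new value in the redo log, then change the buffer-pool copy,
    so the change can be rolled back or replayed after a crash."""
    def __init__(self, rows):
        self.buffer_pool = dict(rows)
        self.undo_log = []   # (row_id, old_value): for rollback / atomicity
        self.redo_log = []   # (row_id, new_value): for replay / durability

    def update(self, row_id, new_value):
        self.undo_log.append((row_id, self.buffer_pool[row_id]))
        self.redo_log.append((row_id, new_value))
        self.buffer_pool[row_id] = new_value

    def rollback(self):      # undo restores the pre-transaction state
        while self.undo_log:
            row_id, old = self.undo_log.pop()
            self.buffer_pool[row_id] = old
        self.redo_log.clear()
```

If the transaction aborts instead of committing, replaying the undo log backwards restores name = 'wang'.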

3. Binlog

The binlog records DDL and DML statements in the form of events (because it records operations rather than data values, it is a logical log) and can be used for master-slave replication and data recovery.
Unlike the redo log, the binlog is append-only and has no fixed size limit.
With the binlog enabled, we can export it as SQL statements and replay all operations to recover data.
The binlog's other role is master-slave replication: a slave server reads the master's binlog and executes it again.

With these two logs, let's take a look at the execution of the update statement:

update teacher set name='盆鱼宴' where id=1;
  1. Query the row first; if it is cached, the cache is used;
  2. Change name to '盆鱼宴', call the engine's API to write the row into memory, and record the redo log. The redo log enters the prepare state, and the engine tells the executor that execution is finished and the transaction can be committed at any time;
  3. On receiving that notification, the executor writes the binlog, then calls the storage engine interface to set the redo log to the commit state;
  4. The update is complete.
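This prepare/commit dance between the two logs is a two-phase commit. Its recovery rule can be sketched as one function (an illustrative model of the decision, with an invented function name):

```python
def crash_recovery_decision(redo_state, binlog_written):
    """Toy sketch of the two-phase-commit recovery rule: a transaction
    found in the prepare state after a crash survives only if its binlog
    record is complete; otherwise it is rolled back. This keeps the redo
    log and the binlog consistent with each other."""
    if redo_state == "commit":
        return "keep"                    # fully committed before the crash
    if redo_state == "prepare" and binlog_written:
        return "commit"                  # finish the commit during recovery
    return "rollback"                    # prepared but no binlog: undo it
```

Without this rule, a crash between the two log writes could leave a change in the data files that never reached the binlog, so a slave replaying the binlog would silently diverge from the master.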


Origin blog.csdn.net/baidu_41934937/article/details/108737659