MySQL Advanced (1): MySQL Execution Process and Architecture

How is a query executed?

select id from student;

The simple statement above queries the id column from the student table. So how does MySQL translate this statement into the instructions it needs to fetch the data and return it to the client? Here is a rough flow chart

(figure: query execution flow chart)

First of all, the data is stored on the MySQL server side; applications and tools are clients. To read the database, the first step is to establish a connection with the server. So how does the client communicate with the server?

Communication protocols

MySQL supports a variety of communication protocols, including TCP/IP, Unix socket, shared memory, and named pipes. It can use synchronous or asynchronous mode, and supports both long and short connections.

Communication type: synchronous or asynchronous

Synchronous

1. Synchronous communication depends on the called party and is limited by its performance. In other words, when the application operates on the database, the thread blocks and waits for the data to return.
2. It is generally limited to one-to-one communication; one-to-many is hard to achieve.

Asynchronous

  1. Asynchrony avoids blocking and waiting in the application, but it cannot measure the execution time of a SQL statement.
  2. Under concurrency, each asynchronous SQL execution must use a separate connection to avoid mixing up results, which puts huge pressure on the server (a connection is a thread, and thread switching consumes a lot of CPU), and asynchronous communication also adds coding complexity (the author notes being recently swamped by a company Node.js project's asynchronous I/O). It is therefore generally not recommended. If you do use async, you must use a connection pool and queue to obtain connections from the pool instead of creating new ones.

Generally speaking, we use synchronous connection to connect to the database.

Connection mode: long connection or short connection

MySQL supports both long and short connections. Generally a long connection is used, and the client keeps it in its connection pool.

On the server side, you can use the show status command to see how many connections MySQL currently has:

show global status like 'Thread%'

(figure: output of show global status like 'Thread%')

Every time a connection or a session is generated, a thread will be created on the server side to handle it.

| Field | Meaning |
| --- | --- |
| Threads_cached | Number of threads in the thread cache |
| Threads_connected | Number of currently open connections |
| Threads_created | Number of threads created to handle connections |
| Threads_running | Number of connections in a non-sleeping state, i.e. roughly the number of concurrent queries |

Keeping a long connection consumes memory. If a connection is inactive for too long, the MySQL server disconnects it automatically; the default timeout is 8 hours (28,800 seconds).

show global variables like 'wait_timeout' -- non-interactive timeout, e.g. JDBC programs
show global variables like 'interactive_timeout' -- interactive timeout, e.g. database tools such as Navicat

(figure: wait_timeout and interactive_timeout values)
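As a sketch, the timeout can also be adjusted per session; the value below is only an example:

```sql
-- example only: raise the non-interactive timeout for the current session to 16 hours
set session wait_timeout = 57600;
show session variables like 'wait_timeout';
```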

The maximum number of connections allowed by the MySQL server (max_connections) is 151 by default, and it can be raised as needed, up to a hard limit of 100,000.

show variables like 'max_connections'

(figure: max_connections value)

**Parameter levels:** MySQL parameters come in session and global levels, effective in the current session or globally respectively, but not every parameter has both levels; for example, max_connections only has a global level. When no level is specified, both queries and changes default to the session level: after you modify a parameter, it takes effect in the current session but not in other sessions. A common example is autocommit, the usual transaction commit mode:

show variables like 'autocommit';
set autocommit = on; -- session level by default, effective only in the current session
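To illustrate the two levels side by side, the sketch below uses sort_buffer_size, which (unlike max_connections) exists at both levels; the values are arbitrary examples:

```sql
-- change the value for this session only; other sessions are unaffected
set session sort_buffer_size = 4 * 1024 * 1024;
-- change the global default; picked up by sessions opened from now on
set global sort_buffer_size = 2 * 1024 * 1024;
-- compare the two levels
select @@session.sort_buffer_size, @@global.sort_buffer_size;
```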

Communication protocols

  1. TCP/IP protocol

Usually we connect to MySQL over this protocol, and the connector modules of the major programming languages implement it. For example: `mysql -h xx.xx.xx.xx -u root -p`

  2. Unix socket protocol

When logging in to the MySQL server locally, we usually use this protocol, because it requires a physical socket file whose location is defined in the configuration file; the default is /var/lib/mysql/mysql.sock. Notably, this is the most efficient of all the protocols.

  3. Shared memory protocol

Few people know this protocol and it is rarely used, because it is Windows-only. To use it you must start the server with the --shared-memory option, and only one server instance on the host can use it, so it is generally useless unless you suspect the other protocols are malfunctioning. Microsoft's SQL Server also supports this protocol.

  4. Named pipes protocol

This protocol is also Windows-only. Like shared memory, only one server per host can use it, even on different ports. Named pipes were developed for LANs: a portion of memory is used by one process to pass information to another, so the output of one process becomes the input of another. The second process can be local (on the same computer as the first) or remote (on a networked computer). Because of this, if TCP/IP is unavailable or disabled in your environment and you are on a Windows server, the database can still work. To use this protocol, start the server with the --enable-named-pipe option.

Communication modes

  • **Simplex:** data transmission between two computers is one-way only, like a remote control.
  • **Half-duplex:** data can flow in both directions (you can send to me and I can send to you), but only one direction at a time within a connection; each side must wait for the other to finish before sending, like a walkie-talkie.
  • **Full-duplex:** data flows in both directions simultaneously, like a phone call.

MySQL uses half-duplex communication. Within one connection, either the client sends data to the server or the server sends data to the client; the two cannot happen at the same time. So when the client sends a SQL statement to the server, the data cannot be split into small pieces for transmission: no matter how big the statement is, it is sent in one shot. If a statement is too long, it exceeds the server's max_allowed_packet parameter and the server reports an error; in that case the value must be increased.

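The limit can be checked and, if necessary, raised (a sketch; the 64MB value is just an example, and SET GLOBAL requires the appropriate privilege):

```sql
show variables like 'max_allowed_packet';          -- default 64M in MySQL 8.0, 4M in 5.7
set global max_allowed_packet = 64 * 1024 * 1024;  -- takes effect for new connections
```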

On the other hand, the server also sends all result data in one shot; the operation cannot be interrupted just because the client has already received the rows it wanted, which can consume a lot of network bandwidth and memory. Therefore, avoid unbounded operations in application code: instead of fetching all rows that match a condition at once, count first, and if the data volume is large, query in batches.
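For example, instead of pulling an entire large result set in one statement, it can be paged on the primary key (a sketch with a hypothetical big_table; continuing from the last seen id avoids the cost of large OFFSETs):

```sql
-- first batch
select id, payload from big_table where id > 0 order by id limit 1000;
-- next batch: continue from the last id seen in the previous batch (e.g. 1000)
select id, payload from big_table where id > 1000 order by id limit 1000;
```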

query cache

The query cache was removed in MySQL 8.0 and its use is not recommended; caching is better delegated to a dedicated cache server.

Lexical and syntax analysis and preprocessing (parser, preprocessor)

When a malformed query statement is entered, the server reports error 1064. So how does MySQL know the input is wrong? This is mainly the job of MySQL's Parser and Preprocessor modules.

(figure: parser and preprocessor in the query flow)

For example, the simple statement `select name from user where id = 1` is split into 8 tokens, including "=". Syntax analysis then performs checks on the SQL statement, such as whether quotation marks are closed. After lexical and syntax analysis is complete, MySQL parses the statement into a new data structure according to the defined grammar rules; this data structure is called the parse tree (select_lex).

(figure: parse tree)

If the SQL is lexically and syntactically correct but a table or column name does not exist, the preprocessor steps in when the statement is parsed: it checks the generated parse tree and resolves the semantics that the parser cannot, for example verifying that tables and columns exist and checking names and aliases to ensure there is no ambiguity.
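The division of labor can be seen from the error codes (a sketch; table names hypothetical):

```sql
selct name from user;            -- rejected by the parser: error 1064 (syntax error)
select name from no_such_table;  -- parses fine, but the preprocessor finds the
                                 -- table does not exist: error 1146
```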

Query optimizer and query execution plan

What is an optimizer

The parse tree is a data structure that can be recognized by the executor. Does a SQL statement have only one execution method? Or is the SQL finally executed by the database the same as the SQL sent? The answer is no, a SQL statement can be executed in multiple ways, and ultimately return the same result, they are equivalent. But if there are so many execution methods, how are these execution methods obtained? Which one to choose in the end? And by what criteria do you choose?

This is the job of MySQL's query optimizer module (Optimizer).

The purpose of the query optimizer is to generate different execution plans (Execution Plan) from the parse tree and then select the best one. MySQL uses a cost-based optimizer: whichever execution plan has the lowest cost is the one used.

-- view the cost of the last query with this command
show status like 'Last_query_cost'
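A minimal usage sketch (table name hypothetical; the cost is expressed in units of data-page reads):

```sql
select * from user where id < 100;   -- run some query first
show status like 'Last_query_cost';  -- then read the estimated cost of that query
```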

What the optimizer can do

What optimization types can MySQL's optimizer handle? A few simple examples:

  1. When we perform an associated query on multiple tables, which table's data is used as the reference table (which table is accessed first).
  2. Which index to choose when there are multiple indexes available.
  3. For the optimization of query conditions, such as removing identities such as 1=1, removing unnecessary parentheses, calculation of expressions, optimization of subqueries and join queries.
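The first two choices can be inspected with EXPLAIN and, when needed, influenced with hints (a sketch with hypothetical tables t1/t2 and index idx_name):

```sql
-- pin the join order: force t1 to be the driving table
explain select * from t1 straight_join t2 on t1.id = t2.t1_id;
-- suggest a specific index instead of letting the optimizer pick one
explain select * from t2 use index (idx_name) where name = 'x';
```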

So what is the result after being processed by the optimizer, and how does MySQL deal with it?

Optimizer execution result

The optimizer finally turns the parse tree into an execution plan (execution_plans), which is also a data structure. This execution plan is not necessarily optimal, because MySQL may not enumerate every possible plan. We can view execution-plan information by prefixing the SQL statement with EXPLAIN; the key item is the index used. To get more detail, use FORMAT = JSON:

 explain select name from user_innodb where id = 1;
 explain format = json select name from user_innodb where id = 1;

(figure: EXPLAIN output)

storage engine

basic introduction

Besides storing the data in a table, the storage structure of that data must also be organized, and this structure is determined by the storage engine. In MySQL, each table we create can specify its own storage engine; a database is not limited to a single engine. The storage engine works at the level of individual tables, so it is also called the table type, and it can be changed after the table is created.

View the storage engines of existing tables in the database

show table status from <db>

(figure: show table status output)
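Since the engine is a per-table attribute, it can be changed with ALTER TABLE (a sketch using the user_innodb table from the earlier examples; note that ALTER TABLE ... ENGINE rebuilds the table):

```sql
show create table user_innodb;            -- the current engine appears in the DDL
alter table user_innodb engine = myisam;  -- convert the table to MyISAM
alter table user_innodb engine = innodb;  -- and back
```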

By default, each database has its own folder. Before MySQL 8.0, every table had a .frm file holding its table structure definition regardless of engine (in 8.0 this moved into the data dictionary). Different storage engines store data differently and create different files: InnoDB uses a file ending in .ibd, the Memory engine has no data file, and MyISAM creates two files, one ending in .MYD (data) and one in .MYI (index).

Database data file directory

 show variables like 'datadir';
-- the default directory is /var/lib/mysql

Storage Engine Comparison

We can use this command to check which storage engines the database supports:

show engines;

(figure: show engines output)

The output describes each storage engine and whether it supports transactions, the XA protocol, and savepoints. The XA protocol is used to implement distributed transactions (split into local resource managers and a transaction manager). Savepoints implement sub-transactions (nested transactions): after creating a savepoint, the transaction can roll back to that point without affecting the operations performed before the savepoint was created.
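A quick savepoint sketch (hypothetical table t with a single int column):

```sql
begin;
insert into t values (1);
savepoint sp1;
insert into t values (2);
rollback to savepoint sp1;  -- undoes only the insert made after sp1
commit;                     -- only the row (1) is committed
```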

  • MyISAM

    MySQL's original storage engine, upgraded from ISAM (official documentation: https://dev.mysql.com/doc/refman/8.0/en/myisam-storage-engine.html). Its scope of application is relatively narrow: table-level locking limits read/write performance, so in web and data-warehouse configurations it is usually used for read-only or read-mostly workloads.

  • InnoDB

    The default storage engine since MySQL 5.5, suitable for frequently updated tables and for business systems with concurrent reads and writes or transaction processing. InnoDB is a transaction-safe (ACID-compliant) MySQL storage engine with commit, rollback, and crash-recovery capabilities to protect user data. InnoDB's row-level locking (without escalation to coarser-grained locks) and Oracle-style consistent reads improve multi-user concurrency and performance. InnoDB tables arrange data on disk to optimize queries based on the primary key: each InnoDB table has a primary key index called the clustered index, which organizes the data to minimize I/O for primary key lookups. To ensure data integrity, InnoDB supports FOREIGN KEY constraints, which check inserts, updates, and deletes to make sure they do not cause inconsistencies between related tables. Official documentation: https://dev.mysql.com/doc/refman/8.0/en/innodb-storage-engine.html

    features

    1. Supports transactions and foreign keys, so data integrity and consistency are higher.
    2. Supports row-level locks and table-level locks.
    3. Support read and write concurrency, write without blocking read (MVCC).
    4. The special index storage method can reduce IO and improve query efficiency.
  • Memory

    Stores all data in RAM for fast access in environments that need to quickly look up non-critical data. Reads and writes are very fast, but if the database restarts or crashes, all the data disappears. It is only suitable for temporary tables; it keeps table data in memory and uses hash indexes by default.

  • CSV (3 files)

    Its tables are actually text files with comma-separated values. CSV tables allow data to be imported or dumped in CSV format to exchange data with scripts and applications that read and write the same format. Because CSV tables have no indexes, data is usually kept in InnoDB tables during normal operation and CSV tables are used only during the import or export phase.

    Features: blank lines are not allowed and indexes are not supported. The format is universal and can be edited directly, which makes it suitable for moving data between different databases.

  • Archive (2 files)

    These compact, unindexed tables are used to store and retrieve large amounts of rarely referenced historical, archive, or security audit information.

    **Features:** does not support indexes, and does not support UPDATE or DELETE.

Different storage engines provide different features. They have different storage mechanisms, indexing methods, locking levels and other functions.

In different business scenarios we can choose different storage engines depending on the database requirements; this is why MySQL supports so many of them.

How to choose a storage engine

If you have high requirements for data consistency and need to support transactions, you can choose InnoDB.

If there are more data queries and less updates, and the query performance requirements are relatively high, you can choose MyISAM.

If you need a temporary table for query, you can choose Memory.

If no existing storage engine meets your requirements, you can develop your own by following the official documentation.
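The choice is made per table at creation time (a sketch with hypothetical tables):

```sql
-- transactional business data: InnoDB
create table orders (
  id int auto_increment primary key,
  amount decimal(10, 2)
) engine = innodb;

-- short-lived lookup data: Memory (contents are lost on restart)
create table tmp_lookup (
  id int primary key,
  name varchar(50)
) engine = memory;
```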

How is an update SQL executed?

How does the update process differ from the query process? The basic flow is the same: the statement still passes through the parser and the optimizer and is finally handed to the executor. The difference lies in what happens after the qualifying data is found.

buffer pool

First of all, for InnoDB the data lives on disk. To operate on it, the storage engine must first load it from disk into memory.

This raises a question: how much data should we load from disk into memory at a time? Only as much as we need? Compared with memory operations, disk I/O is very time-consuming, and if the data we need is scattered across different places on the disk, many I/O operations will be generated.

Therefore, both the operating system and the storage engine use read-ahead: when one piece of data on disk is read, the locations near it are very likely to be read soon as well; this is the principle of locality. So each read simply fetches more data than immediately needed, instead of reading only on demand.

InnoDB defines a minimum unit for reading data from disk into memory, called a page, similar to an operating-system page. The OS page size is usually 4KB, while in InnoDB the page size is 16KB. It is a logical unit, controlled by the innodb_page_size parameter, which can only be set when the data directory is first initialized.

Imagine operating directly on the disk for every page access, loading from disk each time: wouldn't that be slow? Could these pages be cached to speed up data loading?

InnoDB uses buffer pool technology: pages read from disk are placed in a memory area. The next time the same page is read, InnoDB first checks whether it is in that area; if so, it reads it directly and operates on it without loading from disk again. This memory area is called the Buffer Pool.
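The pool size can be inspected and, since MySQL 5.7, resized online (a sketch; the 256MB value is only an example and must be a multiple of the chunk size):

```sql
show variables like 'innodb_buffer_pool_size';           -- default 134217728 (128M)
set global innodb_buffer_pool_size = 256 * 1024 * 1024;  -- resize online
```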

When modifying data, first modify the pages in the memory buffer pool. When the data pages in the memory are inconsistent with the disk data, we call them dirty pages. When will the dirty pages be synchronized to the disk?

InnoDB has dedicated background threads that write Buffer Pool data back to disk, writing multiple modifications at once at intervals. This action is called flushing (writing back dirty pages).

InnoDB memory structure and disk structure

(figure: InnoDB memory and disk structure)

The memory structure mainly includes the Buffer Pool, the Change Buffer, the Log Buffer, and the Adaptive Hash Index (AHI).

  1. Buffer Pool

    The page information is cached in the Buffer Pool. The default size is 128M, which can be modified and adjusted.

  2. Log Buffer (redo log buffer)

    Because flushing is not real-time, if the database crashes or restarts before dirty pages in the Buffer Pool have been flushed to disk, that data would be lost, so the in-memory data needs a durability measure. To avoid this problem, InnoDB writes every page modification to a log file first. If some data has not yet been synchronized to disk, the database recovers from this log file on startup (making it crash-safe). The D (durability) in transactional ACID is implemented with it.

    This log file on disk is the redo log, corresponding to ib_logfile0 and ib_logfile1 in the /var/lib/mysql/ directory; by default there are two files of 48MB each. They can be inspected with: show variables like 'innodb_log%';

    1. The redo log is also written to disk, so why not write directly to the data (db) file? What is the difference between writing a log file and writing a data file?

      Data is scattered across different sectors, so writing to the data file is random I/O, while appending to the log is sequential I/O (written continuously), which is much more efficient. Writing modifications to the log file therefore secures the in-memory data while allowing the actual disk flush to be delayed, improving system throughput.

    Features:

    1. The redo log is implemented by the InnoDB storage engine; not all storage engines have it, and crash recovery is an InnoDB feature.
    2. The redo log is a physical log: it records "what modification was made on which data page".
    3. The size of the redo log is fixed and old content is overwritten; once it is full, it triggers synchronization of the buffer pool to disk to make room for subsequent writes.
  3. undo log tablespace

    The undo log (rollback log) records the state of the data before a transaction changes it (select statements generate no undo). If an exception occurs while modifying data, the undo log is used to roll back (maintaining atomicity).

    Undo only restores the data logically to its pre-transaction state; it does not operate on the physical page, so it is a log in logical format. Undo data lives in the system tablespace file ibdata1 by default; because the shared tablespace never shrinks automatically, separate undo tablespaces can also be created.

With these logs in place, let's summarize the flow of one update operation, for example update user set name = 'telangpu' where name = 'baideng';

  1. After the transaction starts, the row is fetched from memory (Buffer Pool) or disk (data file) and returned to the server's executor;
  2. The server's executor changes the value of this row to telangpu;
  3. name=baideng is recorded in the undo log;
  4. name=telangpu is recorded in the redo log;
  5. The storage engine interface is called to modify the data in the Buffer Pool;
  6. The transaction commits.
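Driven from SQL, the six steps above correspond to a single statement inside a transaction (a sketch; user table as in the example):

```sql
begin;
update user set name = 'telangpu' where name = 'baideng';
-- at this point the change lives in the Buffer Pool, the undo log and the redo log buffer
commit;  -- with innodb_flush_log_at_trx_commit = 1 (the default), the redo log
         -- is forced to disk here
```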

Background threads then do their work, flushing the modified pages from the Buffer Pool to disk.

Binlog

In addition to the log files in the InnoDB architecture, MySQL's server layer also has a log file of its own, the binlog, which is available to all storage engines.

Binlog records all DDL and DML statements in the form of events; it records operations, not data. Binlog can be used for master-slave replication and data recovery. Unlike the redo log, binlog files are append-only and have no fixed size limit.

When binlog is enabled, we can export it as SQL statements and replay all operations to restore (or archive) data.
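A few statements for checking binlog status (a sketch; the file name in the last line is whatever show binary logs returns on your server):

```sql
show variables like 'log_bin';                   -- ON if binlog is enabled
show binary logs;                                -- list the current binlog files
show binlog events in 'binlog.000001' limit 5;   -- peek at the first events of one file
```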


Origin blog.csdn.net/Hong_pro/article/details/130533081