MySQL architecture and internal modules

Demo environment:
MySQL 5.7
Storage engine: InnoDB

1. How is a query SQL statement executed?


For a program or tool to operate a database, the first step is to establish a connection with the database.

1. Communication protocol

First, the MySQL server must be running and listening on a port (3306 by default).
MySQL supports several communication protocols.
The first is TCP/IP. A programming language's connector module uses TCP to connect to the MySQL server, for example mysql-connector-java-x.x.xx.jar.

The second is the Unix socket. For example, a client on the same Linux server can connect to the MySQL server without going through the network stack; it uses a socket file (mysql.sock) on the server instead.

mysql -uroot -p123456
show variables like 'socket';

There are also Named Pipes and Shared Memory, both specific to Windows.

Communication method
The second is the communication method.

MySQL uses half-duplex communication.
Half-duplex means that at any given moment either the client is sending data to the server or the server is sending data to the client; the two cannot happen at the same time.
Therefore, when the client sends a SQL statement to the server (within one connection), the data cannot be split into pieces and sent bit by bit: no matter how big the SQL statement is, it is sent in one go.
If the packet sent to the server is too large, we must raise the MySQL server's max_allowed_packet parameter (the default in 5.7 is 4 MB).
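
For example, to check the current value and raise it at runtime (16 MB here is just an example value; a global change only affects new connections):

show variables like 'max_allowed_packet';
set global max_allowed_packet = 16 * 1024 * 1024;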

On the other hand, the server also sends all the result data in one go; the transfer cannot be interrupted just because the client has already received the rows it wanted.
Therefore, in our programs we must avoid queries that return huge result sets without a LIMIT.
Connection method
The third aspect is the connection itself.
MySQL supports both short and long connections. A short connection is closed as soon as the operation completes. A long connection is kept open so the program can reuse it for later access.
Connections that stay inactive for too long are disconnected by the MySQL server.

The default timeout (wait_timeout / interactive_timeout) is 28800 seconds, i.e. 8 hours.
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_interactive_timeout
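
To check these timeouts, and optionally shorten them at runtime (3600 is just an example value):

show global variables like 'wait_timeout';
show global variables like 'interactive_timeout';
set global wait_timeout = 3600;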

The default value of max_connections in MySQL 5.7 is 151; the documented maximum is 100000.

show variables like 'max_connections';
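
The limit can be raised at runtime (500 here is just an example value):

set global max_connections = 500;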


View the current number of connections on port 3306

netstat -an | grep 3306 | wc -l

Use SHOW FULL PROCESSLIST; to view the state of each connection and what it is currently executing.

Some common states include Sending data (reading and processing rows and sending them to the client), Copying to tmp table, Sorting result, and Locked.

2. Query Cache

MySQL has a built-in query cache module, which is off by default. The main reason is that the scenarios where it applies are limited: first, it requires the SQL statements to be exactly identical; second, as soon as any piece of data in a table changes, every cache entry for that table is invalidated.

In MySQL 8.0, the query cache was removed altogether.
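
In 5.7 you can still inspect the related settings:

show variables like 'query_cache%';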

3. Parsing and preprocessing (Parser & Preprocessor)

What are we going to do next?
If you execute a random string fkdljasklf, the server reports a 1064 error:

[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'fkdljasklf' at line 1

How does the server know that what I typed is wrong?
Or, when I enter a SQL statement with perfectly correct syntax but a table name that does not exist, how does it find out?
This is the job of MySQL's Parser and Preprocessor modules.
This step performs lexical, grammatical (syntactic), and semantic analysis on the SQL statement.

Lexical analysis
Lexical analysis breaks a complete SQL statement into individual tokens.
For example, a simple SQL statement:

select name from user where id = 1;

It breaks into 8 tokens, and the parser records each token's type and where it starts and ends.
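
Roughly, the tokens are:

select | name | from | user | where | id | = | 1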

Grammatical analysis
The second step is grammatical analysis. It checks the syntax of the SQL, such as whether single quotes are closed, and then, following the grammar rules defined by MySQL, turns the statement into a data structure. We call this data structure a parse tree.

Preprocessor
If the table name is wrong, the error is reported at the preprocessing stage.
The preprocessor examines the resulting parse tree and resolves semantics that the parser cannot: for example, it checks that tables and columns exist, and it resolves names and aliases to make sure there is no ambiguity.

4. Query Optimizer and query execution plan

What is the optimizer for?
Question: Is there only one way to execute a SQL statement? Or is the SQL that the database finally executes the same as the SQL we sent?

The answer is no. A SQL statement can be executed in many ways. But if there are so many ways, how are they enumerated? Which one is chosen in the end, and by what criteria?

This is the MySQL query optimizer module (Optimizer).

The purpose of the query optimizer is to generate different execution plans from the parse tree and then select the best one. MySQL uses a cost-based optimizer: whichever execution plan has the lowest cost is used.

Use the following command to view the cost of the query:

show status like 'Last_query_cost';

-- roughly, how many random reads of 4 KB data pages are needed to complete the lookup
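
For example (using the user table from earlier):

select * from user where id = 1;
show status like 'Last_query_cost';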

If we want to know how the optimizer works, how many execution plans it generates, and what each plan costs, what should we do?

How does the optimizer get the execution plan?
https://dev.mysql.com/doc/internals/en/optimizer-tracing.html

First we need to enable optimizer tracing (it is off by default):

SHOW VARIABLES LIKE 'optimizer_trace';
set optimizer_trace="enabled=on";

Note that turning on this switch costs performance, because the results of the optimizer's analysis have to be written into a table, so don't turn it on lightly, and turn it off (set it back to off) when you are done.

Then we execute a SQL statement, and the optimizer will generate an execution plan:

select t.tcid from teacher t, teacher_contact tc where t.tcid = tc.tcid;

At this time, the optimizer analysis process has been recorded in the system table, and we can query:

select * from information_schema.optimizer_trace\G


expanded_query is the rewritten (optimized) SQL statement.
considered_execution_plans lists all the execution plans that were considered.
Remember to turn it off:

set optimizer_trace="enabled=off";
SHOW VARIABLES LIKE 'optimizer_trace';

What can the optimizer do?
What types of optimization can MySQL's optimizer handle?
For example:
1. When we join multiple tables, which table should be the driving table?
2. For select * from user where a=1 and b=2 and c=3: if c=3 matches 100 rows, b=2 matches 200 rows, and a=1 matches 300 rows, which filter should be applied first?
3. If the conditions contain tautologies or contradictions, can they be removed? (See the sketch after this list.)
4. When querying data, can the values be fetched directly from an index?
5. For count(), min(), max(), can the value be obtained directly from an index?
6. And so on.
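
As a sketch of item 3: a trivially true condition such as 1 = 1 is stripped out during optimization, and in 5.7 running SHOW WARNINGS right after EXPLAIN displays the rewritten statement:

explain select * from user where 1 = 1 and id = 1;
show warnings;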

What the optimizer produces
The optimizer finally turns the parse tree into a query execution plan, which is a data structure.
Is this plan necessarily the optimal one? Not necessarily, because MySQL may not enumerate every possible plan.
MySQL provides a tool for viewing the plan: prefix the SQL statement with EXPLAIN to see the execution plan information.

EXPLAIN select name from user where id = 1;

5. Storage Engine

Where is our data stored? Where does the execution plan run, and who runs it?

Basic introduction to storage engines
In relational databases, data lives in tables. You can think of a table as an Excel spreadsheet. So besides storing data, a table also has to organize how that data is stored; this storage structure is determined by the storage engine, which is why a storage engine is also called a table type.

MySQL supports multiple storage engines, and they can be swapped, so they are called pluggable storage engines. Why so many engines? Isn't one enough? Because different business scenarios place different requirements on data operations, and the various engines meet those needs by providing different storage mechanisms, indexing methods, lock granularities and other features.

View the storage engine
View the storage engine of the database table:

show table status from `training`;


In MySQL, each table can specify its storage engine when it is created; a database is not limited to a single engine, and the engine can also be changed after the table is created.
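
For example (the table name is just for illustration):

create table t_engine_demo (id int primary key) engine = myisam;
alter table t_engine_demo engine = innodb;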

The path to store data in the database:

show variables like 'datadir';

Each database has its own folder; take the training database as an example.
In 5.7, whatever the storage engine, every table has a .frm file, which holds the table structure definition.


We built three tables in the database, using different storage engines.
Different storage engines store data in different ways and generate different files.
Comparison of storage engines
Common storage engines

Prior to MySQL 5.5, the default storage engine was MyISAM, which ships with MySQL. From 5.5 on, the default was changed to InnoDB, which was originally developed for MySQL by a third-party company (Innobase). Why the change? Mainly because InnoDB supports transactions and row-level locks, which better suits scenarios with high consistency requirements.

The storage engine supported by the database
We can use this command to check the support of the database for the storage engine:

SHOW ENGINES;

The output includes a description of each engine and whether it supports transactions, the XA protocol, and savepoints.

Introduction to storage engines on the official website:
https://dev.mysql.com/doc/refman/5.7/en/storage-engines.html

MyISAM (3 files: .frm, .MYD, .MYI)
These tables have a small footprint. Table-level locking limits performance in read/write workloads, so MyISAM is often used in read-only or read-mostly workloads in Web and data warehousing configurations.

Features:
1. Supports table-level locks (insert and update lock the whole table); transactions are not supported.
2. High insert and select speed.
3. Stores the table's row count, so count(*) without a condition is fast.
4. Suitable for read-mostly workloads such as data analysis.

InnoDB (2 files: .frm, .ibd)
The default storage engine in MySQL 5.7. InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints.

Features:
1. Supports transactions and foreign keys, so data integrity and consistency are higher.
2. Supports row-level locks as well as table-level locks.
3. Supports concurrent reads and writes; writing does not block reading.
4. Its special index organization reduces I/O and improves query efficiency.
5. Suitable for frequently updated tables and for business systems with concurrent reads and writes or transaction processing.

Memory (1 file: .frm)
Stores all data in RAM, for fast access in environments that require quick lookups of non-critical data. This engine was formerly known as the HEAP engine. Its use cases are decreasing; InnoDB with its buffer pool provides a general-purpose and durable way to keep most or all data in memory, and NDBCLUSTER provides fast key-value lookups for huge distributed data sets.

Features:
Data lives in memory, so reads and writes are very fast, but if the database restarts or crashes all the data disappears. Only suitable for temporary tables. Uses hash indexes by default.
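
A minimal sketch (the table name is just for illustration):

create table mem_demo (id int, name varchar(50)) engine = memory;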

CSV (3 files: .frm, .CSV, .CSM)
Its tables are really text files with comma-separated values. CSV tables let you import or dump data in CSV format, to exchange data with scripts and applications that read and write that same format. Because CSV tables are not indexed, you typically keep the data in InnoDB tables during normal operation, and only use CSV tables during the import or export stage.

Features:
Blank lines are not allowed, every column must be NOT NULL, and indexes are not supported. The format is universal and the files can be edited directly, making CSV tables handy for moving data between different databases.
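
A minimal sketch (every column must be declared NOT NULL):

create table csv_demo (id int not null, name varchar(50) not null) engine = csv;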

Archive (2 files: .frm, .ARZ)
These compact, unindexed tables are intended for storing and retrieving large amounts of seldom-referenced historical, archived, or security audit information.

Features:
Does not support indexes; does not support update or delete (only insert and select).

6. Query Execution Engine, returning the result

The execution engine uses the APIs provided by the storage engine to carry out the operations in the plan, and finally returns the data to the client, even if the result set is empty.

2. Summary of MySQL architecture

Layered Architecture
Overall, we can divide MySQL into three layers: the connection layer, the service (SQL) layer, and the storage engine layer.

Detailed explanation of the modules

1. Connector: supports the interaction between various languages and SQL, such as PHP, Python, and Java (JDBC).
2. Management Services & Utilities: system management and control tools, including backup and recovery, MySQL replication, clustering, etc.
3. Connection Pool: manages resources that need to be cached, such as user credentials, permissions, and threads.
4. SQL Interface: receives the user's SQL commands and returns the query results the user asked for.
5. Parser: parses SQL statements.
6. Optimizer: the query optimizer.
7. Cache and Buffer: the query cache; besides cached row records there are also the table cache, key cache, privilege cache, and so on.
8. Pluggable Storage Engines: storage engines that provide APIs for the service layer to work with the underlying files.

3. How is an update SQL statement executed?

In a database, the "update" operations we talk about actually include update, insert, and delete. How does the update process differ from the query process?
The basic flow is the same: the statement also goes through the parser and the optimizer and is finally handed to the executor.
The difference lies in what happens after the qualifying data has been found.
First of all, InnoDB has an in-memory buffer pool.
We do not write every data change straight to disk, because the I/O cost would be too high; instead, we write it to the buffer pool first. When a data page in memory differs from its on-disk version, we call it a dirty page.
InnoDB has dedicated threads that write buffer pool data to disk, flushing multiple modifications at once every so often. This is called flushing dirty pages.
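
You can inspect the buffer pool size and how many dirty pages it currently holds:

show variables like 'innodb_buffer_pool_size';
show global status like 'Innodb_buffer_pool_pages_dirty';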

There is a problem here: if the server fails before the dirty pages are written to disk, the changes in memory are lost. Worse, a crash halfway through flushing could even corrupt the data files. So we need a persistence mechanism.

redo log
InnoDB introduces a log file called the redo log. Every modification to in-memory data is also written to this log file. If something goes wrong with the server, we read the log file and restore the data; this is how InnoDB achieves the durability of transactions.

What are the characteristics of the redo log?
1. It records the modified values; it is a physical log.
2. Its size is fixed and old content is overwritten in a circular fashion, so it cannot be used for point-in-time recovery or to roll data back to a historical state.
3. It is implemented by the InnoDB storage engine; not every storage engine has it.
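
The relevant settings can be inspected like this:

show variables like 'innodb_log_file_size';
show variables like 'innodb_log_files_in_group';
show variables like 'innodb_flush_log_at_trx_commit';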

binlog
The MySQL Server layer also has a log file, the binlog, which can be used with all storage engines.
The binlog records all DDL and DML statements in the form of events (it records operations rather than data values, so it is a logical log) and can be used for master-slave replication and data recovery.
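
To check whether binary logging is enabled and what format it uses (SHOW BINARY LOGS reports an error if logging is off):

show variables like 'log_bin';
show variables like 'binlog_format';
show binary logs;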

Master-slave replication: the slave fetches the master's binlog and replays the events to keep its data in sync.

Data recovery: the binlog can be replayed on top of a backup (for example with the mysqlbinlog tool) to restore data to a point in time.

Unlike the redo log, the binlog is appended to and has no fixed overall size limit.
With these two logs in place, let's look at how an update statement is executed:

Take the statement update teacher set name='jim' where name='666' as an example:
1. First the row is looked up; if it is cached (in the buffer pool), the cached page is used.
2. The name is changed to jim, the engine API is called to write the row into memory, and the redo log is written at the same time. The redo log entry enters the prepare state, and the engine tells the executor that it is done and the transaction can be committed at any time.
3. On receiving that notification, the executor writes the binlog and then calls the storage engine interface to mark the redo log entry as committed.
4. The update is complete.

Question: why use two-phase commit (XA)?
For example:
Suppose we change the name to jim, the redo log has been written, but MySQL restarts before the binlog is written.
Because the redo log can restore the data, jim ends up on disk. But the change was never recorded in the binlog, so if the binlog is later used to restore data or to synchronize a slave, the data will be inconsistent.
So when both logs have to be written, the binlog acts as the transaction coordinator, telling InnoDB to prepare, commit, or roll back.
Put simply, there are two log-writing operations, much like a distributed transaction: without two-phase commit there is no way to guarantee that both succeed or both fail.
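
The durability of the two logs is controlled by two settings; the common "double 1" configuration flushes both on every commit:

show variables like 'innodb_flush_log_at_trx_commit'; -- 1 = flush the redo log on each commit
show variables like 'sync_binlog'; -- 1 = fsync the binlog on each commit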
