Chapter 04_Logical Architecture

1. Logical architecture analysis

1.1 The server processes client requests

What does the server process do with a request sent by the client process to produce the final result? The following uses a query request as an example:

[Figure: how the server processes a query request]

Let’s take a closer look at it:

[Figure: detailed MySQL logical architecture diagram]

1.2 Connectors

1.3 Layer 1: Connection Layer

Before a client system accesses the MySQL server, it first establishes a TCP connection.

After the three-way handshake succeeds, the MySQL server authenticates the account and password sent over the TCP connection and looks up the account's privileges.

  • If the user name or password is incorrect, the client receives an Access denied for user error and the client program exits.
  • If authentication succeeds, the server reads the privileges of the account from the privilege tables and associates them with the connection. All later privilege checks on this connection rely on the privileges read at this point.

Once a TCP connection has been established, a thread must be assigned to interact with that client. The server therefore maintains a thread pool for the subsequent processing. Each connection obtains a thread from the pool, which avoids the overhead of creating and destroying a thread per connection.
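The connection-layer steps above (authenticate, read privileges once, serve the connection from a pooled thread) can be sketched in Python. This is only an illustration, not how the MySQL server is implemented: the `ACCOUNTS` table, the function names, and the message texts are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the privilege tables: user -> (password, privileges).
ACCOUNTS = {"app": ("secret", {"SELECT", "INSERT"})}


def authenticate(user, password):
    """Return the account's privileges, or raise if the credentials are wrong."""
    record = ACCOUNTS.get(user)
    if record is None or record[0] != password:
        raise PermissionError(f"Access denied for user '{user}'")
    # Privileges are read once here; later checks reuse this snapshot.
    return set(record[1])


def handle_connection(user, password, statement):
    """Runs on a pooled worker thread: authenticate, then check the privilege."""
    try:
        privileges = authenticate(user, password)
    except PermissionError as exc:
        return str(exc)
    verb = statement.split()[0].upper()
    if verb not in privileges:
        return f"{verb} command denied to user '{user}'"
    return f"{verb} accepted for user '{user}'"


# A fixed pool of workers is reused across connections instead of
# creating and destroying one thread per connection.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(handle_connection, "app", "secret", "SELECT 1"),
        pool.submit(handle_connection, "app", "wrong", "SELECT 1"),
        pool.submit(handle_connection, "app", "secret", "DELETE FROM t"),
    ]
    results = [future.result() for future in futures]
```

Here the second connection fails authentication, and the third authenticates but is refused because DELETE is not among the privileges read at login.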

1.4 Layer 2: Service Layer

  • SQL Interface
    • Receives the user's SQL commands and returns the results the user asked for. For example, a SELECT ... FROM statement goes through the SQL Interface.
    • MySQL supports multiple SQL language interfaces such as DML (data manipulation language), DDL (data definition language), stored procedures, views, triggers, and user-defined functions.
  • Parser
    • Performs syntax analysis and semantic analysis on the SQL statement. The parser decomposes the statement into a data structure and passes that structure to the subsequent steps; all later handling of the statement is based on it. If an error occurs during decomposition, the SQL statement is invalid.
    • When an SQL command reaches the parser, it is validated and parsed, and a syntax tree is created for it. The syntax tree is enriched with information from the data dictionary, and the server verifies whether the client has the privilege to execute the query. After creating the syntax tree, MySQL also optimizes the syntax of the SQL query and rewrites the query.
  • Optimizer (query optimizer)
    • After the SQL statement has been parsed and before it is executed, the query optimizer determines its execution path and generates an execution plan.
    • The execution plan indicates which access method to use (full table scan or index lookup) and the join order between tables. The methods provided by the storage engine are then called according to the steps in the plan to actually execute the query, and the results are returned to the user.
    • It uses a "select-project-join" strategy for querying. For example:

      SELECT id,name FROM student WHERE gender = '女';

      This SELECT query first selects rows according to the WHERE clause, rather than retrieving the whole table and then filtering on gender. It also projects the id and name attributes first, rather than fetching all attributes and filtering afterwards. The two conditions are then joined to produce the final query result.
  • Caches & Buffers (query cache component)
    • MySQL maintains a number of caches and buffers internally. For example, the Query Cache stores the result of a SELECT statement. If the corresponding result is found there, the whole process of parsing, optimizing, and executing the query is skipped and the result is returned to the client directly.
    • The caching mechanism consists of a series of small caches, such as the table cache, record cache, key cache, and privilege cache.
    • The query cache can be shared between different clients.
    • The query cache is deprecated as of MySQL 5.7.20 and removed in MySQL 8.0.

    A short story:
    If I ask you for the value of 9+8×16-3×2×17, you might work it out on a calculator and get 35. If I ask you again for the value of 9+8×16-3×2×17, would you foolishly calculate it all over again? We have just worked it out, so you can simply state the answer.

1.5 Layer 3: Engine Layer

The pluggable storage engine layer (Storage Engines) is what actually stores and retrieves data in MySQL. It operates on the underlying data maintained at the physical server level. The server communicates with the storage engines through an API. Different storage engines provide different features, so we can choose one according to our actual needs.

The storage engines supported by MySQL 8.0.25 by default are as follows:

[Figure: storage engines supported by MySQL 8.0.25]

1.6 Storage Layer

All data, including database and table definitions, the contents of every table row, and indexes, is stored as files on the file system, and the interaction with the storage engine takes place through these files. Some storage engines, such as InnoDB, also support managing raw devices directly without a file system, but modern file system implementations make this unnecessary. Beneath the file system, you can use local disks as well as storage systems such as DAS, NAS, and SAN.

1.7 Summary

The MySQL architecture diagram was shown at the beginning of this section. To make the SQL execution process easier to follow, it can be simplified as follows:

Simplified into a three-layer structure:
1. Connection layer: the client and server establish a connection, and the client sends SQL to the server;
2. SQL layer (service layer): performs query processing on SQL statements, independently of how the database files are stored;
3. Storage engine layer: deals with the database files and is responsible for storing and reading data.

2. SQL execution process

2.1 SQL execution process in MySQL

MySQL query process:

1. Query cache: If the server finds this SQL statement in the query cache, it returns the result to the client directly; if not, the statement enters the parser stage. Because the query cache is often inefficient, the feature was removed in MySQL 8.0.

In most cases the query cache is of little use. Why?

The query cache stores query results in advance so that the next time the same query arrives, the result can be returned without executing it. Note that the query cache in MySQL caches the result of a query, not the query plan. This makes query matching much less robust: only an identical query hits the cache. Any difference in characters between two query requests (for example spaces, comments, or letter case) causes a cache miss. As a result, the hit rate of MySQL's query cache is low.

In addition, a request is not cached if it contains certain system functions, user-defined variables or functions, or refers to system tables such as those in the mysql, information_schema, and performance_schema databases. Take system functions as an example: two calls of the same function may produce different results. The function NOW returns the current time each time it is called. If a query calls this function, two executions at different times should return different results even though the query text is identical. If the result of the first query were cached, it would be wrong to reuse it for the second query.

Finally, any cache has to be invalidated at some point. MySQL's cache system monitors every table involved. Whenever the structure or data of a table is modified, for example by an INSERT, UPDATE, DELETE, TRUNCATE TABLE, ALTER TABLE, DROP TABLE, or DROP DATABASE statement, all cached queries that use the table become invalid and are removed from the cache. For databases under heavy update pressure, the query cache hit rate is very low.
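The three weaknesses described above (exact-text matching, statements that must not be cached, and invalidation on every write) can be illustrated with a toy cache in Python. This is a sketch under stated assumptions, not MySQL's implementation: the `QueryCache` class, its method names, and the sample row are hypothetical.

```python
class QueryCache:
    """Toy result cache keyed by the exact statement text (illustrative only)."""

    # Functions whose result changes between calls; such statements are not cached.
    NON_DETERMINISTIC = ("NOW(",)

    def __init__(self):
        self._results = {}   # exact SQL text -> cached result
        self._by_table = {}  # table name -> set of cached SQL texts that read it

    def get(self, sql):
        # Exact match: extra spaces or different letter case give a different key.
        return self._results.get(sql)

    def put(self, sql, tables, result):
        if any(name in sql.upper() for name in self.NON_DETERMINISTIC):
            return False
        self._results[sql] = result
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)
        return True

    def invalidate(self, table):
        # Any write to a table evicts every cached statement that uses it.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


cache = QueryCache()
# Hypothetical result row, used only for the demonstration.
cache.put("SELECT * FROM employees", ["employees"], [(101, "example")])

hit = cache.get("SELECT * FROM employees")
miss_case = cache.get("select * from employees")
miss_space = cache.get("SELECT *  FROM employees")
stored_now = cache.put("SELECT NOW()", [], ["2024-01-01 00:00:00"])

cache.invalidate("employees")  # e.g. after an INSERT into employees
after_write = cache.get("SELECT * FROM employees")
```

Only the byte-for-byte identical statement hits the cache, and a single write to `employees` discards the cached result.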

2. Parser: The parser performs syntax analysis and semantic analysis on the SQL statement.

The parser first performs "lexical analysis". The input is an SQL statement made up of strings and spaces, and MySQL needs to identify what each string is and what it represents. From the keyword "select", MySQL recognizes that this is a query statement. It also needs to identify each name, for example recognizing the string "T" as "table name T" and the string "ID" as "column ID". In the statement below, employees is a table name and employee_id and last_name are column names:

SELECT employee_id,last_name FROM employees WHERE employee_id = 101;

Next comes "syntax analysis". Based on the result of lexical analysis, the syntax analyzer (for example, Bison) applies the grammar rules to determine whether the SQL statement you entered conforms to MySQL syntax.

select department_id,job_id,avg(salary) from employees group by department_id;

If the SQL statement is correct, a syntax tree like this is generated:

[Figure: syntax tree for the statement above]

The following are the process steps of SQL lexical analysis:

[Figure: steps of SQL lexical analysis]

3. Optimizer: The optimizer determines the execution path of the SQL statement, for example whether to use a full table scan or an index lookup.

Example: The following statement executes a join between two tables:

select * from test1 join test2 using(ID)
where test1.name='zhangwei' and test2.name='mysql高级课程';

Plan 1: first fetch the ID values of the rows in table test1 where name='zhangwei', then join to table test2 on the ID value, and then check whether name in test2 equals 'mysql高级课程'.
Plan 2: first fetch the ID values of the rows in table test2 where name='mysql高级课程', then join to test1 on the ID value, and then check whether name in test1 equals 'zhangwei'.

The two plans produce the same logical result but differ in efficiency, and the optimizer's job is to decide which plan to use. Once the optimizer stage finishes, the execution plan for the statement is fixed and the statement enters the executor stage.

You may still have questions, such as how the optimizer chooses an index and whether it can choose wrongly. We will come back to these when we discuss indexes.

Within the query optimizer, processing can be divided into a logical query optimization stage and a physical query optimization stage.

4. Executor:

Up to this point no real table has been read or written; only an execution plan has been produced. The statement now enters the executor stage.

Before execution, the executor checks whether the user has the required privilege. If not, a privilege error is returned. If so, it executes the SQL query and returns the result. In versions before MySQL 8.0, if the query cache is enabled, the query result is also cached.

select * from test where id=1;

For example, if the ID column of table test has no index, the executor proceeds as follows:

  • Call the InnoDB engine interface to fetch the first row of the table and check whether the ID value is 1. If not, skip the row; if so, add the row to the result set.
  • Call the engine interface to fetch the "next row" and repeat the same check, until the last row of the table has been fetched.
  • Return the set of all rows that satisfied the condition during the traversal to the client as the result set.

At this point, the execution of the statement is complete. For tables with indexes, the execution logic is similar.

The flow of an SQL statement in MySQL is: SQL statement → query cache → parser → optimizer → executor.
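The executor's row-by-row scan for a table without an index on ID (fetch the first row, test it, fetch the next row, repeat until the last row) can be sketched as follows. The `ToyEngine` class and its `first_row`/`next_row` methods are hypothetical stand-ins for the storage engine interface, not InnoDB's real API.

```python
class ToyEngine:
    """Hypothetical stand-in for a storage engine that hands out rows in order."""

    def __init__(self, rows):
        self._rows = rows

    def first_row(self):
        # Returns (position, row), or None when the table is empty.
        return (0, self._rows[0]) if self._rows else None

    def next_row(self, position):
        # Returns the row after `position`, or None after the last row.
        nxt = position + 1
        return (nxt, self._rows[nxt]) if nxt < len(self._rows) else None


def execute_full_scan(engine, predicate):
    """Executor loop: fetch a row, test it, keep it or skip it, fetch the next."""
    result_set = []
    cursor = engine.first_row()
    while cursor is not None:
        position, row = cursor
        if predicate(row):
            result_set.append(row)
        cursor = engine.next_row(position)
    return result_set


test_table = ToyEngine([
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
])
# Equivalent in spirit to: select * from test where id=1;
rows = execute_full_scan(test_table, lambda row: row["id"] == 1)
```

Every row is visited exactly once, which is why a query like this on an unindexed column costs a full table scan.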

2.2 SQL execution principle in MySQL 8

1. Confirm whether profiling is turned on

mysql> select @@profiling;
mysql> show variables like 'profiling';

profiling=0 means it is off. We need to turn profiling on, that is, set it to 1:

mysql> set profiling=1;

2. Execute the same SQL query multiple times

Then we execute an SQL query (any SQL query will do):

mysql> select * from employees;

3. View profiles

View all profiles generated by the current session:

mysql> show profiles;  # show the most recent queries

4. View profile

Display the execution plan and view the execution steps of the statement:

mysql> show profile;

You can also query a specific Query ID, for example:

mysql> show profile for query 7;

The execution time results for the query are the same as above.

In addition, you can also query richer content:

mysql> show profile cpu,block io for query 6;

And again for the next query:

mysql> show profile cpu,block io for query 7;

2.3 SQL execution principle in MySQL 5.7

When the operations above are run in MySQL 5.7, the query process for the same SQL statement executed twice is still identical. Isn't the cache used? The query cache has to be enabled explicitly. In MySQL 5.7, set it up as follows:

1. Enable the query cache in the configuration file

Add a new line in /etc/my.cnf:

query_cache_type=1

2. Restart the mysql service

systemctl restart mysqld

3. Enable profiling

Since the service has been restarted, the following statement must be executed again to enable profiling.

mysql> set profiling=1;

4. Execute the statement twice:

mysql> select * from locations;
mysql> select * from locations;

5. View profiles

6. View profile

Display the execution plan and view the execution steps of the statement:

mysql> show profile for query 1;
mysql> show profile for query 2;

The conclusion is clear: the profile for query 2 contains far less information than the profile for query 1. The output shows that the second query retrieves its data directly from the cache.

2.4 SQL syntax order

As MySQL versions are updated, the optimizer is continually upgraded as well. The optimizer analyzes the performance cost of different execution orders and dynamically adjusts the execution order.

Requirement: query the number of people over 20 years old in each department, keep only departments where that number is at least 2, and display the information of the department with the largest number.

The following is a commonly seen query order:
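As a rough illustration of the requirement above, the Python sketch below walks hypothetical employee rows through one commonly cited logical order of clause evaluation (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT). The rows, column layout, and function name are assumptions made for the example, and the optimizer remains free to choose a different physical order.

```python
from collections import Counter

# Hypothetical rows of an employee table: (name, age, department).
employees = [
    ("ann", 25, "sales"), ("bob", 19, "sales"), ("cat", 31, "sales"),
    ("dan", 40, "ops"), ("eve", 22, "ops"), ("fay", 28, "ops"),
    ("gus", 35, "hr"),
]


def largest_department_over_20(rows):
    # FROM + WHERE: keep only people older than 20.
    filtered = [row for row in rows if row[1] > 20]
    # GROUP BY department, with COUNT(*) per group.
    counts = Counter(department for _, _, department in filtered)
    # HAVING: drop groups with fewer than 2 people.
    groups = [(dept, n) for dept, n in counts.items() if n >= 2]
    # ORDER BY count DESC, then LIMIT 1.
    groups.sort(key=lambda item: item[1], reverse=True)
    return groups[:1]


top = largest_department_over_20(employees)
```

Filtering before grouping means the HAVING step only ever sees counts of people who already passed the age condition.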

2.5 SQL execution process in Oracle (understanding)

Oracle uses the shared pool to determine whether a cached copy and an execution plan exist for an SQL statement. This step decides whether hard parsing or soft parsing is used.

Let's first take a look at the execution process of SQL in Oracle:

[Figure: SQL execution process in Oracle]

As the figure shows, an SQL statement goes through the following steps in Oracle.

1. Syntax check: checks whether the SQL is spelled correctly. If it is not, Oracle reports a syntax error.

2. Semantic check: checks whether the objects accessed in the SQL exist. For example, if a column name in a SELECT statement is written incorrectly, the system reports an error. Together, the syntax check and the semantic check ensure that the SQL statement is free of errors.

3. Permission check: checks whether the user has permission to access the data.

4. Shared pool check: the shared pool (Shared Pool) is a memory pool whose main function is to cache SQL statements and their execution plans. Oracle decides between soft parsing and hard parsing by checking whether an execution plan for the SQL statement exists in the shared pool. So how should soft parsing and hard parsing be understood?

In the shared pool, Oracle first computes a hash of the SQL statement and then looks up the library cache (Library Cache) by the hash value. If an execution plan for the statement exists, it is used directly and the statement goes straight to the "executor" step. This is soft parsing.

If the SQL statement and execution plan are not found, Oracle must create a parse tree, generate an execution plan, and enter the "optimizer" step. This is hard parsing.

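5. Optimizer: the optimizer performs the hard parse, that is, it decides how the statement will be executed, for example by creating the parse tree and generating the execution plan.
6. Executor: once the parse tree and the execution plan exist, it is known how the SQL should be executed, and the statement can be run in the executor.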

Shared pool is an Oracle term that covers the library cache, the data dictionary buffer, and so on. The library cache, discussed above, mainly caches SQL statements and execution plans. The data dictionary buffer stores the definitions of Oracle objects such as tables, views, and indexes. When an SQL statement is parsed, any object information that is needed is read from the data dictionary buffer.

The library cache step determines whether the SQL statement needs a hard parse. To improve SQL execution efficiency, hard parsing should be avoided wherever possible, because creating parse trees and generating execution plans consumes considerable resources.

How can hard parsing be avoided in favor of soft parsing? Bind variables are a major feature of Oracle. Using bind variables means placing variables in the SQL statement and changing the execution result through different variable values. The advantage is a higher chance of soft parsing. The disadvantage is that the generated execution plan may not be well optimized, so whether to use bind variables depends on the situation.

For example, we can use the following query statement:

SQL> select * from player where player_id = 10001;

You can also use a bind variable, like this:

SQL> select * from player where player_id = :player_id;

The two statements behave very differently in Oracle. If you query player_id = 10001 and then query 10002 and 10003, each query is parsed anew. The second form uses a bind variable, so after the first query an execution plan for this kind of query exists in the shared pool, and later queries are soft parsed.

Bind variables therefore reduce hard parsing and Oracle's parsing workload. The approach has a drawback, however: the statement is used as dynamic SQL, different parameter values can lead to different execution efficiency, and SQL optimization becomes harder.

Oracle's architecture diagram:

[Figure: Oracle architecture diagram]

Simplified diagram:

[Figure: simplified Oracle architecture diagram]

Summary:

Oracle and MySQL differ in how they implement SQL query processing. Oracle introduced the concept of the shared pool, which is used to decide between soft parsing and hard parsing.
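The soft/hard parsing decision and the effect of bind variables can be sketched with a toy library cache in Python. This is only an illustration: the `LibraryCache` class is hypothetical, SHA-256 stands in for whatever hash Oracle uses internally, and the "plan" is just a string.

```python
import hashlib


class LibraryCache:
    """Toy library cache: maps a hash of the SQL text to a (fake) execution plan."""

    def __init__(self):
        self._plans = {}
        self.hard_parses = 0
        self.soft_parses = 0

    def parse(self, sql_text):
        key = hashlib.sha256(sql_text.encode("utf-8")).hexdigest()
        plan = self._plans.get(key)
        if plan is not None:
            self.soft_parses += 1   # plan found: reuse it
            return plan
        self.hard_parses += 1       # plan missing: build parse tree and plan
        plan = f"plan for: {sql_text}"
        self._plans[key] = plan
        return plan


# Literal values: each statement has different text, so each one is hard parsed.
literal_cache = LibraryCache()
for player_id in (10001, 10002, 10003):
    literal_cache.parse(f"select * from player where player_id = {player_id}")

# Bind variable: the text is identical every time; only the bound value changes.
bind_cache = LibraryCache()
for player_id in (10001, 10002, 10003):
    bind_cache.parse("select * from player where player_id = :player_id")
```

With literals, all three statements are hard parsed; with the bind variable, only the first one is, and the remaining two reuse the cached plan.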

3. Database buffer pool (buffer pool)

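The InnoDB storage engine manages storage space in units of pages. Our insert, delete, update, and select operations are essentially accesses to pages (reading pages, writing pages, creating new pages, and so on). Disk I/O is time-consuming, whereas operating in memory is far more efficient. To keep the data in tables and indexes readily available, the DBMS allocates memory as a data buffer pool: before a page is actually accessed, the page on disk must be cached in the in-memory Buffer Pool.

The benefit is that disk activity is minimized, which reduces the time spent on direct disk I/O. This strategy is crucial for SQL query performance: if the index data is in the buffer pool, the cost of access drops considerably.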

3.1 Buffer pool vs query cache

Are the buffer pool and the query cache the same thing? No.
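1. Buffer Pool

First we need to understand what the buffer pool contains in the InnoDB storage engine.

In the InnoDB storage engine, part of the data is kept in memory, and the buffer pool takes up most of this memory. It caches various kinds of data, as shown in the figure below:

[Figure: contents of the InnoDB buffer pool]

As the figure shows, the InnoDB buffer pool includes data pages, index pages, the insert buffer, lock information, the adaptive hash index, and data dictionary information.

Importance of the buffer pool:

Caching principle:

The "location * frequency" principle helps us optimize I/O access efficiency.

First, location determines efficiency: the buffer pool exists so that data can be accessed directly in memory.

Second, frequency determines priority. The buffer pool has a limited size. For example, with 200 GB on disk, 16 GB of memory, and a buffer pool of only 1 GB, not all data can be loaded into the buffer pool. Priority therefore matters, and frequently used hot data is loaded first.

Read-ahead in the buffer pool:

Having understood the role of the buffer pool, we also need to understand another of its features: read-ahead.

The buffer pool exists to improve I/O efficiency, and data reads follow a "principle of locality": once some data has been used, nearby data will very likely be used too. A "read-ahead" mechanism that loads this data in advance can therefore reduce possible future disk I/O operations.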
2. Query cache

So what is the query cache?

The query cache stores query results in advance so that the next time the result can be returned without executing the query. Note that the query cache in MySQL caches the result of a query, not the query plan. Because the hit conditions are strict, and because the cache is invalidated whenever the data table changes, the hit rate is low.

3.2 How does the buffer pool read data?

The buffer pool manager tries to keep frequently used data in the pool. When the database reads a page, it first checks whether the page is in the buffer pool. If it is, the page is read directly. If it is not, the page is loaded into the buffer pool from memory or disk and then read.
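The read path described above (check the pool first, load the page on a miss) can be sketched in Python. This is a simplified illustration rather than InnoDB's actual algorithm: the `ToyBufferPool` class is hypothetical, the "disk" is a plain dictionary, and plain least-recently-used eviction is assumed for keeping frequently used pages.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Page cache with simple least-recently-used eviction (illustrative only)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # hypothetical "disk": page number -> contents
        self.pages = OrderedDict()    # cached pages, least recently used first
        self.disk_reads = 0

    def read_page(self, page_no):
        if page_no in self.pages:
            # Hit: serve from memory and mark the page as recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Miss: load the page from disk into the pool before reading it.
        self.disk_reads += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        self.pages[page_no] = self.disk[page_no]
        return self.pages[page_no]


disk = {n: f"page-{n}" for n in range(10)}
pool = ToyBufferPool(capacity=2, disk=disk)
pool.read_page(1)   # miss
pool.read_page(1)   # hit
pool.read_page(2)   # miss
pool.read_page(1)   # hit, page 1 becomes the most recently used
pool.read_page(3)   # miss, evicts page 2
```

Five reads cost only three disk accesses here, and the page that was touched most recently survives the eviction.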
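The structure and role of the cache in the database are shown in the figure below:

[Figure: structure and role of the cache in the database]

If we update data in the buffer pool while executing an SQL statement, is the data synchronized to disk immediately? No. The change is first made to the page in the buffer pool, and the database writes modified pages back to disk later rather than on every update.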
3.3 View/set buffer pool size

If you are using the InnoDB storage engine, you can check the buffer pool size through the innodb_buffer_pool_size variable. The command is as follows:

show variables like 'innodb_buffer_pool_size';

You can see that the InnoDB buffer pool size at this point is only 134217728 / 1024 / 1024 = 128 MB. We can change the buffer pool size, for example to 256 MB, as follows:

set global innodb_buffer_pool_size = 268435456;

or:

[server]
innodb_buffer_pool_size = 268435456

Then check the buffer pool size again; it has been changed to 256 MB.

3.4 Multiple Buffer Pool instances

[server]
innodb_buffer_pool_instances = 2

This indicates that we want to create 2 Buffer Pool instances.

To check the number of buffer pool instances, use the command:

show variables like 'innodb_buffer_pool_instances';

So how much memory does each Buffer Pool instance actually occupy? It is calculated with this formula:

innodb_buffer_pool_size/innodb_buffer_pool_instances

That is, the total size divided by the number of instances gives the size of each Buffer Pool instance.

3.5 Extended questions

The Buffer Pool is a core component of the MySQL memory structure. You can first think of it as a black box.

Update data process under the black box

[Figure: update data process with the Buffer Pool as a black box]

What if an error suddenly occurs halfway through an update and I want to roll back to the version before the update? If we cannot even guarantee data persistence and transaction rollback, how can we talk about crash recovery?

Answer: Redo Log & Undo Log

Origin blog.csdn.net/github_36665118/article/details/134139170