[Slow SQL Performance Optimization] The Life Cycle of a SQL Statement


1. The execution process of a simple SQL statement in MySQL

A simple diagram illustrates the components of the MySQL architecture and the relationships between them. Next, I will walk through these components using SQL statements.

For example, consider the following SQL statement:

SELECT department_id FROM employee WHERE name = 'Lucy' AND age > 18 GROUP BY department_id;

The name column is indexed. Let's analyze the execution in chronological order.

1. Client: A tool such as the MySQL command-line client, Navicat, DBeaver, or any other application sends the SQL query to the MySQL server.

2. Connector: Establishes, manages, and maintains connections with clients. When a client connects to the MySQL server, the connector verifies the client's username and password and then allocates a thread to handle the client's requests.

3. Query cache: Caches previously executed queries and their results. When a new query arrives, MySQL first checks whether an identical query and its result already exist in the query cache. If so, MySQL returns the cached result directly without executing the query again; otherwise, it continues executing the query. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this step applies only to earlier versions.
4. Analyzer:
  • Parses the query statement and checks its syntax.
  • Verifies that the table and column names are correct.
  • Generates the query tree.
5. Optimizer: Analyzes the query tree, considers the candidate execution plans, estimates the cost of each, and selects the cheapest one. In this example, the optimizer may choose the name index because name is an indexed column.
6. Executor: Following the execution plan selected by the optimizer, sends requests to the storage engine to obtain the rows that meet the conditions.
7. Storage engine (such as InnoDB):
  • Performs the actual index scan, such as an equality lookup on the name index of the employee table. Fetching columns that are not stored in the name index requires going back to the table (a lookup by primary key), which may involve disk access.
  • Before accessing the disk, checks whether the required data page already exists in the InnoDB buffer pool (Buffer Pool). If the page is in the buffer pool, the cached data is used directly. If it is not, the page is loaded from disk into the buffer pool.
8. Executor:
  • For each record found, checks again whether it satisfies the index condition on name. This is because a data page loaded into memory based on the index condition may also contain records that do not meet that condition, so the name condition must be evaluated again. If the name condition is met, the age > 18 filter is evaluated next.
  • Groups the records that meet the conditions by department_id.
  • Returns the processed result set to the client.
Throughout query execution, these components work together: the client sends the query, the connector manages the client connection, the query cache attempts to reuse previous query results, the analyzer parses the query, the optimizer selects the best execution plan, the executor carries out that plan, and the storage engine (such as InnoDB) manages data storage and access. This cooperation allows MySQL to execute queries efficiently and return result sets.
Loading index data pages into memory according to the filter conditions on indexed columns is done by the storage engine. Once the pages are in memory, the executor evaluates the filter conditions on both indexed and non-indexed columns.
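The division of labor described above can be sketched in a few lines of Python. This is only an illustrative model, not how MySQL is implemented: the sample rows, the dictionary-based name index, and the function names (build_name_index, storage_engine_lookup, execute_query) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for the employee table, keyed by primary key id.
EMPLOYEE = {
    1: {"id": 1, "name": "Lucy", "age": 25, "department_id": 10},
    2: {"id": 2, "name": "Lucy", "age": 17, "department_id": 10},
    3: {"id": 3, "name": "Tom", "age": 30, "department_id": 20},
    4: {"id": 4, "name": "Lucy", "age": 40, "department_id": 30},
}


def build_name_index(table):
    """Model a secondary index: name -> list of primary keys."""
    index = defaultdict(list)
    for pk, row in table.items():
        index[row["name"]].append(pk)
    return index


def storage_engine_lookup(table, name_index, name):
    """Storage engine: equality lookup on the name index, then go back
    to the table by primary key to fetch the full row."""
    for pk in name_index.get(name, []):
        yield table[pk]


def execute_query(table, name_index):
    """Executor: re-check name, filter on age, then group by department_id."""
    groups = {}
    for row in storage_engine_lookup(table, name_index, "Lucy"):
        if row["name"] != "Lucy":   # index condition, checked again
            continue
        if row["age"] <= 18:        # non-index condition: age > 18
            continue
        groups.setdefault(row["department_id"], []).append(row)
    # Sorted only to make the output deterministic; GROUP BY itself
    # does not promise an order.
    return sorted(groups)


NAME_INDEX = build_name_index(EMPLOYEE)
print(execute_query(EMPLOYEE, NAME_INDEX))
```

With the sample rows, only the two adult employees named Lucy survive the filters, so the query yields their two department ids.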

2. The execution order of SQL query keywords

The logical execution sequence is as follows:
1. Operations in the storage engine
(1) FROM: Specifies the tables the query reads from. The executor obtains the data of the relevant tables from the storage engine according to the execution plan selected by the optimizer.
(2) ON: Used with JOIN to specify the join condition. The executor obtains the records matching the ON condition from the storage engine. If the join condition involves an indexed column, the storage engine uses the index to speed up the lookup.
(3) JOIN: Specifies how tables are joined (INNER JOIN, LEFT JOIN, and so on). The executor obtains the joined tables' data from the storage engine according to the execution plan selected by the optimizer, then joins the data based on the JOIN type and the ON condition.
(4) WHERE: The executor filters the data returned from the storage engine and keeps only the records that satisfy the WHERE clause. If a filter condition is on an indexed column, the storage engine layer filters through the index before returning the rows.
2. Operations on the returned result set
(5) GROUP BY: The executor groups the records that satisfy the WHERE condition by the columns specified in GROUP BY.
(6) HAVING: After grouping, the executor filters the groups again according to the HAVING condition.
(7) SELECT: The executor produces the query result from the columns specified in the SELECT list, according to the execution plan selected by the optimizer.
(8) DISTINCT: The executor deduplicates the query result and returns only unique records.
(9) ORDER BY: The executor sorts the query result by the columns specified in the ORDER BY clause.
(10) LIMIT: The executor truncates the query result according to the LIMIT clause and returns only part of the records.
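To make the ordering concrete, here is a minimal Python sketch that applies the single-table stages in the order listed above. The sample rows, the function name run_query, and the query it models are assumptions for illustration only. The modeled query is SELECT DISTINCT department_id, COUNT(*) AS cnt FROM employee WHERE age > 18 GROUP BY department_id HAVING COUNT(*) >= 2 ORDER BY cnt DESC LIMIT 1. The ON and JOIN stages are omitted because only one table is involved.

```python
# Hypothetical sample rows standing in for the employee table.
ROWS = [
    {"name": "Lucy", "age": 25, "department_id": 10},
    {"name": "Tom", "age": 30, "department_id": 10},
    {"name": "Ann", "age": 17, "department_id": 10},
    {"name": "Bob", "age": 41, "department_id": 20},
    {"name": "Eve", "age": 35, "department_id": 30},
    {"name": "Joe", "age": 29, "department_id": 30},
    {"name": "Kim", "age": 52, "department_id": 30},
]


def run_query(rows):
    # (1) FROM: read the table.
    result = list(rows)
    # (4) WHERE age > 18: filter individual rows before grouping.
    result = [r for r in result if r["age"] > 18]
    # (5) GROUP BY department_id.
    groups = {}
    for r in result:
        groups.setdefault(r["department_id"], []).append(r)
    # (6) HAVING COUNT(*) >= 2: filter whole groups.
    groups = {dept: members for dept, members in groups.items()
              if len(members) >= 2}
    # (7) SELECT department_id, COUNT(*) AS cnt: build the output columns.
    projected = [{"department_id": dept, "cnt": len(members)}
                 for dept, members in groups.items()]
    # (8) DISTINCT: drop duplicate output rows (a no-op for this data).
    distinct = []
    for r in projected:
        if r not in distinct:
            distinct.append(r)
    # (9) ORDER BY cnt DESC.
    distinct.sort(key=lambda r: r["cnt"], reverse=True)
    # (10) LIMIT 1.
    return distinct[:1]


print(run_query(ROWS))
```

The sketch shows why a column alias defined in SELECT (cnt here) is available to ORDER BY but not to WHERE: the WHERE stage runs before the SELECT stage has produced it.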

3. The execution process of a join query in MySQL

SELECT s.id, s.name, s.age, es.subject, es.score FROM employee s JOIN employee_score es ON s.id = es.employee_id WHERE s.age > 18 AND es.subject_id = 3 AND es.score > 80;

In this example, there is a composite index on (subject_id, score), and age is indexed. Let's analyze the execution in chronological order.
1. Connector: When a client connects to the MySQL server, the connector establishes and manages the connection. It verifies the username and password provided by the client, confirms that the client has the appropriate permissions, and then establishes the connection.
2. Query cache: In versions that still have a query cache (it was removed in MySQL 8.0), the server checks the cache before processing the query. If the result set already exists in the query cache, the server returns the cached result directly.
3. Parser: Parses the SQL and checks that its syntax is correct. The parser breaks the statement into its component parts, such as tables, columns, and conditions. In this example, the parser identifies the tables involved (employee and employee_score) and the required columns (id, name, age, subject, score).
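4. Optimizer: Generates the execution plan from the information provided by the parser. The optimizer analyzes multiple possible execution strategies and chooses the one with the lowest cost. In this example, the optimizer may choose the age index and the composite index on (subject_id, score). For the join, the optimizer also decides on a join strategy, such as Nested-Loop Join or Hash Join. Based on table sizes, indexes, query conditions, and statistics, it also decides which table to use as the driving table and which join strategy is best. For example, if the two tables differ greatly in size, Nested-Loop Join may be a good choice, while for two tables of similar size, Hash Join or Sort-Merge Join may be more efficient. (Of these, MySQL itself implements nested-loop joins and, since 8.0.18, hash joins; it does not implement sort-merge join.)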
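5. Executor: Executes the query according to the plan generated by the optimizer, sending requests to the storage engine to obtain the rows that meet the conditions.
6. Storage engine (such as InnoDB): Manages data storage and retrieval.
  • The storage engine first receives a request from the executor, based on the optimizer's execution plan. The request may include fetching the rows that satisfy the query conditions and which access method to use (such as a full table scan or an index scan).
  • Suppose the execution plan calls for index scans. In this example, the storage engine may first scan the employee table through the age index and then scan the employee_score table through the composite index on (subject_id, score).
  • The storage engine queries the corresponding indexes as requested. In the employee index it finds the records that satisfy age > 18. In the employee_score index it finds the records that satisfy subject_id = 3 AND score > 80.
  • Once matching records are found, the storage engine needs the data pages that hold them in memory. It first checks the buffer pool (InnoDB Buffer Pool) to see whether those pages are already in memory. If they are, there is no need to load them from disk again. If they are not, the storage engine loads the pages from disk into the buffer pool.
  • Pages loaded into the buffer pool can be shared by multiple queries, which helps improve query efficiency.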
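7. Executor: Handles joins, sorting, aggregation, filtering, and similar operations.
  • Performs the join in memory, combining rows from the employee table with rows from the employee_score table.
  • Filters the joined result set, keeping only the rows that satisfy the query conditions (age > 18, subject_id = 3, score > 80).
  • Returns the filtered rows to the client as the query result.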

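As mentioned earlier, the data pages that the storage engine loads into memory based on the index conditions hold many records, some of which may not satisfy those conditions. Unless the executor checks the index conditions again, it cannot tell which records satisfy them. So although the storage engine has already evaluated them, the executor still checks the index conditions age > 18, subject_id = 3, and score > 80.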
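The buffer pool check in step 6 above (use the page if it is already in memory, otherwise read it from disk) can be sketched as a small page cache. This is a simplified model under stated assumptions: the class name BufferPool, the fake_disk_read helper, and the plain least-recently-used eviction are all illustrative. InnoDB's real buffer pool uses a more elaborate LRU variant, which this sketch does not reproduce.

```python
from collections import OrderedDict


class BufferPool:
    """Toy page cache: look in memory first, fall back to a disk read."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk
        self.pages = OrderedDict()  # page_no -> page contents, oldest first
        self.disk_reads = 0

    def get_page(self, page_no):
        if page_no in self.pages:
            # Hit: the page is already in memory, no disk access needed.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Miss: load the page from disk into the pool.
        page = self.read_page_from_disk(page_no)
        self.disk_reads += 1
        self.pages[page_no] = page
        if len(self.pages) > self.capacity:
            # Evict the least recently used page to make room.
            self.pages.popitem(last=False)
        return page


def fake_disk_read(page_no):
    """Hypothetical stand-in for reading a data page from disk."""
    return f"page-{page_no}"


pool = BufferPool(capacity=2, read_page_from_disk=fake_disk_read)
for page_no in [1, 2, 1, 3, 2]:
    pool.get_page(page_no)
print(pool.disk_reads)
```

In this run, the second request for page 1 is served from memory, while the second request for page 2 needs another disk read because page 2 was evicted when page 3 was loaded. This is also why pages cached by one query can speed up later queries that touch the same pages.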

Let's analyze this once more from a global perspective.
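1. Choosing the driving table: First, the MySQL optimizer selects one table as the "driving table". Usually, the table expected to return fewer records is chosen. If relatively few records in employee_score satisfy subject_id = 3 AND score > 80, that table may be chosen as the driving table. This is the optimizer's job: it estimates which table is more efficient to drive the join and builds the execution plan. Although the choice depends largely on the estimated number of returned records, it is also affected by other factors, such as the join type and the available indexes.
2. Filtering through the driving table's index: The driving table is filtered first. If employee_score is the driving table, the composite index on (subject_id, score) is used to select the records with subject_id = 3 AND score > 80. Here the executor follows the optimizer's plan and sends requests to the storage engine for the data it needs. The storage engine accesses the index and uses it to locate the actual data pages and fetch the rows.
3. Join: The executor joins the records selected from the driving table in the previous step with the other table (the employee table). Here it uses an index on the employee table (such as the id index) to find the matching records efficiently.
4. Further filtering: During the join, the executor considers the other filter conditions on the employee table, such as age > 18. Usually such filtering happens after the join; this is also the executor's job: during or after the join, it further filters the result set according to the plan made by the optimizer. Here, however, the leaf nodes of the age index on the employee table contain both age and the primary key id, so during the join the index can be range-scanned by age and the id values in its leaf nodes used to perform the JOIN efficiently. The filtering is therefore completed during the join, and the MySQL optimizer arranges this automatically. As this shows, when a usable index exists, MySQL can apply these filters during the join.
5. Returning the result: This is the executor's final step, returning the final query result.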
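The five steps above amount to a nested-loop join driven by employee_score. The following Python sketch models that flow under stated assumptions: the sample rows and the function name nested_loop_join are made up for illustration, and a plain loop with a dictionary lookup stands in for the index scan and the primary key lookup.

```python
# Hypothetical employee table, keyed by primary key id.
EMPLOYEE = {
    1: {"id": 1, "name": "Lucy", "age": 25},
    2: {"id": 2, "name": "Tom", "age": 17},
    3: {"id": 3, "name": "Ann", "age": 33},
}

# Hypothetical employee_score rows (the assumed driving table).
EMPLOYEE_SCORE = [
    {"employee_id": 1, "subject_id": 3, "subject": "math", "score": 90},
    {"employee_id": 2, "subject_id": 3, "subject": "math", "score": 95},
    {"employee_id": 3, "subject_id": 3, "subject": "math", "score": 70},
    {"employee_id": 3, "subject_id": 4, "subject": "art", "score": 88},
]


def nested_loop_join(scores, employees):
    results = []
    for es in scores:
        # Steps 1-2: filter the driving table
        # (models the scan of the (subject_id, score) index).
        if not (es["subject_id"] == 3 and es["score"] > 80):
            continue
        # Step 3: look up the matching employee row by primary key id.
        s = employees.get(es["employee_id"])
        if s is None:
            continue
        # Step 4: apply the remaining filter on the joined row.
        if s["age"] <= 18:
            continue
        # Step 5: emit the selected columns.
        results.append((s["id"], s["name"], s["age"],
                        es["subject"], es["score"]))
    return results


print(nested_loop_join(EMPLOYEE_SCORE, EMPLOYEE))
```

Driving from the table with fewer qualifying rows keeps the number of inner lookups small: here only two employee_score rows pass the filter, so only two lookups into employee are needed.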
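
4. Summary
This article used a simple architecture diagram to describe the components involved in a MySQL query and the relationships between them.
It traced the full life cycle of a SQL statement, from the client's request to the MySQL server through to the result returned to the client.
It walked through two kinds of SQL, a single-table query and a join query, showing the execution order and the internal component logic across the life cycle.
Working through these examples should give developers a grasp of how single-table and join queries work under the hood, laying the foundation for understanding how slow SQL arises and how to optimize it.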

This article is shared from the WeChat official account 京东云开发者 (JDT_Developers).

Origin my.oschina.net/u/4090830/blog/10143833