源码剖析SQL语句在MySQL服务层的执行过程

MySQL 基本结构

服务层
- 连接器
- 解析器，优化器，执行器
引擎层
- InnoDB, Memory, etc.

MySQL Profiling

想知道 SQL 语句的执行过程，可以直接调试 MySQL 源码，但也有更方便的方式。

开启 SQL 语句性能分析。

mysql> set profiling = 'ON';
mysql> show variables like 'profiling';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| profiling     | ON    |
+---------------+-------+
1 row in set, 1 warning (0.00 sec)
复制代码

查看历史语句分析记录，可以通过设置profiling_history_size更改保留的语句数量。

mysql> show profiles;
+----------+------------+-----------------------------------+
| Query_ID | Duration   | Query                             |
+----------+------------+-----------------------------------+
|       22 | 0.00195650 | show variables like '%profiling%' |
|       23 | 0.00035650 | set profiling_history_size = 3    |
|       24 | 0.00084950 | select * from profiling           |
+----------+------------+-----------------------------------+
3 rows in set, 1 warning (0.00 sec)
复制代码

查看相应语句的执行足迹，还有一些其余信息为了方便展示没有列出。

mysql> use information_schema;
mysql> select STATE,SOURCE_FUNCTION,SOURCE_FILE from profiling where QUERY_ID = 24 order by SEQ desc;
+--------------------------------+--------------------------------+----------------------+
| STATE                          | SOURCE_FUNCTION                | SOURCE_FILE          |
+--------------------------------+--------------------------------+----------------------+
| cleaning up                    | dispatch_command               | sql_parse.cc         |
| freeing items                  | dispatch_sql_command           | sql_parse.cc         |
| closing tables                 | mysql_execute_command          | sql_parse.cc         |
| query end                      | mysql_execute_command          | sql_parse.cc         |
| end                            | Sql_cmd_dml::execute           | sql_select.cc        |
| executing                      | Query_expression::ExecuteItera | sql_union.cc         |
| preparing                      | JOIN::optimize                 | sql_optimizer.cc     |
| statistics                     | JOIN::optimize                 | sql_optimizer.cc     |
| optimizing                     | JOIN::optimize                 | sql_optimizer.cc     |
| System lock                    | mysql_lock_tables              | lock.cc              |
| init                           | Sql_cmd_dml::execute           | sql_select.cc        |
| Opening tables                 | open_tables                    | sql_base.cc          |
| checking permissions           | check_access                   | sql_authorization.cc |
| starting                       | launch_hook_trans_begin        | rpl_handler.cc       |
| Executing hook on transaction  | launch_hook_trans_begin        | rpl_handler.cc       |
| starting                       | NULL                           | NULL                 |
+--------------------------------+--------------------------------+----------------------+
16 rows in set, 1 warning (0.00 sec)
复制代码

SQL 语句执行流程

查询缓存

历史执行的 SQL 语句和执行结果会以 Key:Value 的形式存入查询缓存中，如果新的 SQL 语句能够在查询缓存中取得结果就直接返回。但是查询缓存失效频繁，每当表上有更新时，就会清空表上的查询缓存，所以一般不开启查询缓存功能。

MySQL 8.0 移除了查询缓存，所以这部分当作补充知识了解就好了。

解析器

根据 profiling 的指示，解析的代码实现在 sql_parse.cc 文件中，入口是 dispatch_command 函数，主要功能是构造 THD 和 LEX 两个结构。

THD

代码中对于 THD 结构的说明如下。

/**
  @class THD
  For each client connection we create a separate thread with THD serving as
  a thread/connection descriptor
*/
复制代码

MySQL会为每个客户端连接创建一个线程，THD 结构为线程描述符，简单理解就是保存好多变量的地方，方便传参。

构造 THD 的核心代码如下。

bool THD::sql_parser() {
  extern int MYSQLparse(class THD * thd, class Parse_tree_root * *root);

  Parse_tree_root *root = nullptr;
  MYSQLparse(this, &root)
      
  lex->make_sql_cmd(root)
}
复制代码

LEX

代码中对于 LEX 结构的说明如下。

/**
  The LEX object currently serves three different purposes:

  - It contains some universal properties of an SQL command, such as
    sql_command, presence of IGNORE in data change statement syntax, and list
    of tables (query_tables).

  - It contains some execution state variables, like m_exec_started
    (set to true when execution is started), plugins (list of plugins used
    by statement), insert_update_values_map (a map of objects used by certain
    INSERT statements), etc.

  - It contains a number of members that should be local to subclasses of
    Sql_cmd, like purge_value_list (for the PURGE command), kill_value_list
    (for the KILL command).

  The LEX object is strictly a part of class Sql_cmd, for those SQL commands
  that are represented by an Sql_cmd class. For the remaining SQL commands,
  it is a standalone object linked to the current THD.
*/
复制代码

Parse_tree_root 结构的作用是构造 Sql_cmd 结构，该结构保存在 LEX::m_sql_cmd 字段中。

bool LEX::make_sql_cmd(Parse_tree_root *parse_tree) {
  m_sql_cmd = parse_tree->make_cmd(thd);
}
复制代码

简单理解，LEX 就是 SQL 的语法树，你可能已经观察到了，THD 和 LEX 彼此互相包含。

优化器

通过 profiling 我们可以看到下一步是走到了 mysql_execute_command 函数，这是一个足足有2000行的函数。~~（果然单函数行数不超过XX行都是忽悠人的）~~

化繁为简，这个2000行的函数最后会调用 lex->m_sql_cmd->execute(thd) 来执行命令，也正是对应了 profiling 给出的 Sql_cmd_dml::execute 函数，不难发现这里还发生了个多态。（~~C++基础捡起来~~）

这里给出 execute 的核心代码。

/**
  Execute a DML statement.
  This is the default implementation for a DML statement and uses a
  nested-loop join processor per outer-most query block.
  The implementation is split in two: One for query expressions containing
  a single query block and one for query expressions containing multiple
  query blocks combined with UNION.
*/

bool Sql_cmd_dml::execute_inner(THD *thd) {
  Query_expression *unit = lex->unit;

  if (unit->optimize(thd, /*materialize_destination=*/nullptr,
                     /*create_iterators=*/true, /*finalize_access_paths=*/true))
    return true;

  // Calculate the current statement cost.
  accumulate_statement_cost(lex);

  // Perform secondary engine optimizations, if needed.
  if (optimize_secondary_engine(thd)) return true;

  // We know by now that execution will complete (successful or with error)
  lex->set_exec_completed();
  if (lex->is_explain()) {
    if (explain_query(thd, thd, unit)) return true; /* purecov: inspected */
  } else {
    if (unit->execute(thd)) return true;
  }

  return false;
}
复制代码

简要的叙述下都做了什么：

优化器优化，unit->optimize 最后会调用 JOIN::optimize 函数，也正是 profiling 指示的。
计算成本。
如果需要，执行辅助引擎优化。
真正的执行命令。

我们这一节的重点就在 JOIN::optimize 函数上，这里给出该函数的注释说明。

/**
  Optimizes one query block into a query execution plan (QEP.)

  This is the entry point to the query optimization phase. This phase
  applies both logical (equivalent) query rewrites, cost-based join
  optimization, and rule-based access path selection. Once an optimal
  plan is found, the member function creates/initializes all
  structures needed for query execution. The main optimization phases
  are outlined below:

    -# Logical transformations:
      - Outer to inner joins transformation.
      - Equality/constant propagation.
      - Partition pruning.
      - COUNT(*), MIN(), MAX() constant substitution in case of
        implicit grouping.
      - ORDER BY optimization.
    -# Perform cost-based optimization of table order and access path
       selection. See JOIN::make_join_plan()
    -# Post-join order optimization:
       - Create optimal table conditions from the where clause and the
         join conditions.
       - Inject outer-join guarding conditions.
       - Adjust data access methods after determining table condition
         (several times.)
       - Optimize ORDER BY/DISTINCT.
    -# Code generation
       - Set data access functions.
       - Try to optimize away sorting/distinct.
       - Setup temporary table usage for grouping and/or sorting.

  @retval false Success.
  @retval true Error, error code saved in member JOIN::error.
*/
复制代码

简单理解流程如下：

我们自己写的 SQL 可能并不是效率最好的，优化器会把它转换成等价的效率更高的方式，比如去除某些不必要的条件，外连接转化为内连接等等。
通过查询条件计算表的最佳连接顺序，例如是先从 A 表取出某个字段，然后去 B 表匹配，还是反过来。

其实优化器是要是优化多表查询的，非多表查询优化空间不大。

执行器

还是根据 profiling 的指引，check_access、open_tables、JOIN::optimize 先后发生在 Sql_cmd_dml::execute 上，并不是严格的按照 profiling 给出的函数 one by one 执行，不过其实也不用太过纠结函数之前的嵌套关系问题。

这里给出 check_access 的注释说明。

/**
  @brief Compare requested privileges with the privileges acquired from the
    User- and Db-tables.
*/
复制代码

如果你用过 MySQL，那应该会知道 Grant / Revoke 相关的命令，这里就是判断当前用户是否有相应权限，补充一个知识点，权限是在建立连接时从权限表中获取的，在该连接存活期内的所有鉴权都根据建立连接时获取的权限判断。

如果鉴权成功，那么就会调用表关联的存储引擎接口打开表（创建表时可以指定存储引擎，默认是InnoDB存储引擎），通过 profiling 不难看出还调用了 mysql_lock_tables 去锁表。

launch_hook_trans_begin 函数中会使用 RUN_HOOK 宏函数去调用 Trans_delegate::trans_begin 钩子函数。

int launch_hook_trans_begin(THD *thd, TABLE_LIST *all_tables) {
    // ...
	RUN_HOOK(transaction, trans_begin, (thd, ret));
    // ...
}

int Trans_delegate::trans_begin(THD *thd, int &out) {
	// ...
    FOREACH_OBSERVER_ERROR_OUT(ret, begin, &param, out);
    // ...
}
复制代码

这里的 begin 是一个回调函数，函数签名如下。

/**
  This callback is called before a sql command is executed.

  @param param   The parameter for transaction observers
  @param out_val Return value from observer execution

  @retval 0 Success
  @retval 1 Failure
*/
typedef int (*begin_t)(Trans_param *param, int &out_val);
复制代码

该回调函数会在 SQL 命令执行前被调用，那真正的执行在哪呢，又回到了上文给出的 execute_inner 函数中，一路往里走，最终获取到结果的函数签名如下。

bool Query_expression::ExecuteIteratorQuery(THD *thd);
复制代码

根据该函数的参数和返回值不难判断出，最后的结果是保存在 THD 结构体中的，THD 是一个十分庞大贯穿始终的结构体。

它的定义足足有三千多行，并且 THD 对象所占用的内存直到连接断开才会被释放，这也正是有时候 MySQL 占用内存涨的特别快的原因。

Ending

本文以 SELECT 为例介绍了 SQL 语句在 MySQL 服务层的执行过程，最后调用存储引擎的接口获得结果，如果你对存储引擎的概念还不明确，可以理解为一个提供了增删改查接口的存储服务。

本文参考：①MySQL 8.0 源码；②《云数据库架构》；③MySQL 实战 45 讲。