MySQL · 源码分析 · 一条insert语句的执行过程

一序

本文只分析了insert语句执行的主路径，和路径上部分关键函数，很多细节没有深入。

二 SQL_parse

我们略过建立连接（其实是还没整理这一块），看sql解析部分。因为在执行execute command之前，一个SQL query首先会被放入mysql_parse()进行语法解析，源码在mysql-5.7.18/blob/master/sql/sql_parse.cc：

/**
  Parse a query.
  @param       thd     Current thread
  @param       rawbuf  Begining of the query text
  @param       length  Length of the query text
  @param[out]  found_semicolon For multi queries, position of the character of
                               the next query in the query text.
*/

void mysql_parse(THD *thd, Parser_state *parser_state)
{
  int error MY_ATTRIBUTE((unused));
  DBUG_ENTER("mysql_parse");
  DBUG_PRINT("mysql_parse", ("query: '%s'", thd->query().str));

  DBUG_EXECUTE_IF("parser_debug", turn_parser_debug_on(););

  /*
    Warning.
    The purpose of query_cache_send_result_to_client() is to lookup the
    query in the query cache first, to avoid parsing and executing it.
    So, the natural implementation would be to:
    - first, call query_cache_send_result_to_client,
    - second, if caching failed, initialise the lexical and syntactic parser.
    The problem is that the query cache depends on a clean initialization
    of (among others) lex->safe_to_cache_query and thd->server_status,
    which are reset respectively in
    - lex_start()
    - mysql_reset_thd_for_next_command()
    So, initializing the lexical analyser *before* using the query cache
    is required for the cache to work properly.
    FIXME: cleanup the dependencies in the code to simplify this.
  */
  mysql_reset_thd_for_next_command(thd);
  lex_start(thd);

  thd->m_parser_state= parser_state;
  invoke_pre_parse_rewrite_plugins(thd);
  thd->m_parser_state= NULL;

  enable_digest_if_any_plugin_needs_it(thd, parser_state);
  /* 检查query_cache，如果结果存在于cache中，直接返回 */
  if (query_cache.send_result_to_client(thd, thd->query()) <= 0)
  {
    LEX *lex= thd->lex;
    const char *found_semicolon;

    bool err= thd->get_stmt_da()->is_error();

    if (!err)
    {
       /* 解析语句 */
      err= parse_sql(thd, parser_state, NULL);
      if (!err)
        err= invoke_post_parse_rewrite_plugins(thd, false);

      found_semicolon= parser_state->m_lip.found_semicolon;
    }
  ...

   /* Actually execute the query */
          if (found_semicolon)
          {
            lex->safe_to_cache_query= 0;
            thd->server_status|= SERVER_MORE_RESULTS_EXISTS;
          }
          lex->set_trg_event_type_for_tables();
          MYSQL_QUERY_EXEC_START(
            const_cast<char*>(thd->query().str),
            thd->thread_id(),
            (char *) (thd->db().str ? thd->db().str : ""),
            (char *) thd->security_context()->priv_user().str,
            (char *) thd->security_context()->host_or_ip().str,
            0);
          if (unlikely(thd->security_context()->password_expired() &&
                       !lex->is_set_password_sql &&
                       lex->sql_command != SQLCOM_SET_OPTION &&
                       lex->sql_command != SQLCOM_ALTER_USER))
          {
            my_error(ER_MUST_CHANGE_PASSWORD, MYF(0));
            error= 1;
          }
          else
             //关键函数
            error= mysql_execute_command(thd, true);

          MYSQL_QUERY_EXEC_DONE(error);
	}
      }
    }
 else
  {
    /*
      Query cache hit. We need to write the general log here if
      we haven't already logged the statement earlier due to --log-raw.
      Right now, we only cache SELECT results; if the cache ever
      becomes more generic, we should also cache the rewritten
      query-string together with the original query-string (which
      we'd still use for the matching) when we first execute the
      query, and then use the obfuscated query-string for logging
      here when the query is given again.
    */
    thd->m_statement_psi= MYSQL_REFINE_STATEMENT(thd->m_statement_psi,
                                                 sql_statement_info[SQLCOM_SELECT].m_key);
    if (!opt_general_log_raw)
      query_logger.general_log_write(thd, COM_QUERY, thd->query().str,
                                     thd->query().length);
    parser_state->m_lip.found_semicolon= NULL;
  }

  DBUG_VOID_RETURN;
}

代码很长，这里关注点两个：

mysql_execute_command。在这之前就是parse_sql。
主要在parse_sql之前，有一个函数query_cache.send_result_to_client()，这个函数用来检查query cache。如果之前有一个一模一样的query（包括参数哦），那么其结果有可能已经缓存了，如果是这样，就直接返回结果(就是>0走else 逻辑：query hint cache)。

关于sql如何解析：r大概MySQL对query的parse是用BISON（sql_yacc.yy）来做的，实际上就是根据MySQL的语法规则，把SQL进行解析存在LEX结构体中，这里不做展开，待单独整理（shardingjdbc的sql解析也没看，有兴趣的可以自己对比下mysql的解析跟它的差异点）。我们接下来看看sql如何执行mysql_execute_command（）。

三 SQL执行（mysql_execute_command）

源码也在/sql/sql_parse.cc,这是一个超大的函数，常见的操作都在switch里面了。

/**
  Execute command saved in thd and lex->sql_command.
  @param thd                       Thread handle
  @todo
    - Invalidate the table in the query cache if something changed
    after unlocking when changes become visible.
    @todo: this is workaround. right way will be move invalidating in
    the unlock procedure.
    - TODO: use check_change_password()
  @retval
    FALSE       OK
  @retval
    TRUE        Error
*/

int
mysql_execute_command(THD *thd, bool first_level)
{
  int res= FALSE;
  LEX  *const lex= thd->lex;
  /* first SELECT_LEX (have special meaning for many of non-SELECTcommands) */
  SELECT_LEX *const select_lex= lex->select_lex;
  /* first table of first SELECT_LEX */
  TABLE_LIST *const first_table= select_lex->get_table_list();
  /* list of all tables in query */
  TABLE_LIST *all_tables;
  /* most outer SELECT_LEX_UNIT of query */
  SELECT_LEX_UNIT *const unit= lex->unit;
  DBUG_ASSERT(select_lex->master_unit() == unit);
  DBUG_ENTER("mysql_execute_command");
  /* EXPLAIN OTHER isn't explainable command, but can have describe flag. */
  DBUG_ASSERT(!lex->describe || is_explainable_query(lex->sql_command) ||
              lex->sql_command == SQLCOM_EXPLAIN_OTHER);

  thd->work_part_info= 0;
....

case SQLCOM_PREPARE:
  {
    mysql_sql_stmt_prepare(thd);
    break;
  }
  case SQLCOM_EXECUTE:
  {
    mysql_sql_stmt_execute(thd);
    break;
  }
....
case SQLCOM_INSERT:
  case SQLCOM_REPLACE_SELECT:
  case SQLCOM_INSERT_SELECT:
  {
    DBUG_ASSERT(first_table == all_tables && first_table != 0);
    DBUG_ASSERT(lex->m_sql_cmd != NULL);
    res= lex->m_sql_cmd->execute(thd);
    break;
  }

以跟Prepare Statement相关的地方，找两个常见的case：

mysql_sql_stmt_prepare 就是sql预编译绑定参数：

Prepare an SQL prepared statement. This is called from mysql_execute_command and should therefore behave like an ordinary query (e.g. should not reset any global THD data).

In case of success, OK packet is sent to the client, otherwise an error message is set in THD.

最后的结果都是在THD中。任何的数据结构想要访问query解析的结果，其都应该包含THD

详情参见官方文档介绍：https://dev.mysql.com/doc/refman/5.7/en/mysql-stmt-prepare.html

源码在sql/sql_prepare.cc

mysql_sql_stmt_execute 真正的执行

Execute prepared statement using parameter values from lex->prepared_stmt_params and send result to the client using text protocol. This is called from mysql_execute_command and therefore should behave like an ordinary query (e.g. not change global THD data, such as warning count, server status, etc). This function uses text protocol to send a possible result set.

In case of success, OK (or result set) packet is sent to the client, otherwise an error is set in THD.

官网介绍：https://dev.mysql.com/doc/refman/5.7/en/mysql-stmt-execute.html

源码在sql/sql_prepare.cc

好了，回到正题，看下插入sql的分支.

case SQLCOM_INSERT:
  case SQLCOM_REPLACE_SELECT:
  case SQLCOM_INSERT_SELECT:
  {
    DBUG_ASSERT(first_table == all_tables && first_table != 0);
    DBUG_ASSERT(lex->m_sql_cmd != NULL);
    res= lex->m_sql_cmd->execute(thd);
    break;
  }

这里跟mysql 5.6版本的不一致，没有直接到mysql_insert.

一下子断了联系，搜了下先看看m_sql_cmd是Sql_cmd类型的，是个抽象接口，具体的执行有子类实现

找到了Sql_cmd_insert，嗯就是它了，看下源码在sql/sql_insert.h，具体函数如下：

bool Sql_cmd_insert::execute(THD *thd)
{
  DBUG_ASSERT(thd->lex->sql_command == SQLCOM_REPLACE ||
              thd->lex->sql_command == SQLCOM_INSERT);

  bool res= false;
  LEX *const lex= thd->lex;
  SELECT_LEX *const select_lex= lex->select_lex;
  TABLE_LIST *const first_table= select_lex->get_table_list();
  TABLE_LIST *const all_tables= first_table;

  if (open_temporary_tables(thd, all_tables))
    return true;

  if (insert_precheck(thd, all_tables))
    return true;

  /* Push ignore / strict error handler */
  Ignore_error_handler ignore_handler;
  Strict_error_handler strict_handler;
  if (thd->lex->is_ignore())
    thd->push_internal_handler(&ignore_handler);
  else if (thd->is_strict_mode())
    thd->push_internal_handler(&strict_handler);

  MYSQL_INSERT_START(const_cast<char*>(thd->query().str));
  res= mysql_insert(thd, all_tables);
  MYSQL_INSERT_DONE(res, (ulong) thd->get_row_count_func());
...

终于找到mysql_insert了。进入 mysql_insert 源码在sql/sql_insert.cc

/**
  INSERT statement implementation
  @note Like implementations of other DDL/DML in MySQL, this function
  relies on the caller to close the thread tables. This is done in the
  end of dispatch_command().
*/

bool Sql_cmd_insert::mysql_insert(THD *thd,TABLE_LIST *table_list)
{
  DBUG_ENTER("mysql_insert");

  LEX *const lex= thd->lex;
  int error, res;
  bool err= true;
  bool transactional_table, joins_freed= FALSE;
  bool changed;
  bool is_locked= false;
  ulong counter= 0;
  ulonglong id;
....
 if ((res= table_list->view_check_option(thd)) == VIEW_CHECK_SKIP)
      continue;
    else if (res == VIEW_CHECK_ERROR)
    {
      error= 1;
      break;
    }
    error= write_record(thd, insert_table, &info, &update);
    if (error)
      break;
    thd->get_stmt_da()->inc_current_row_for_condition();
  }
  } // Statement plan is available within these braces

  error= thd->get_stmt_da()->is_error();
  free_underlaid_joins(thd, select_lex);
  joins_freed= true;
....

这个函数也很长（貌似mysql里面的函数都挺长的），insert相关的filed，value都在thd里面。没有细看过程。

进入 write_record，源码也在sql/sql_insert.cc

/**
  Write a record to table with optional deletion of conflicting records,
  invoke proper triggers if needed.
  SYNOPSIS
     write_record()
      thd   - thread context
      table - table to which record should be written
      info  - COPY_INFO structure describing handling of duplicates
              and which is used for counting number of records inserted
              and deleted.
      update - COPY_INFO structure describing the UPDATE part (only used for
               INSERT ON DUPLICATE KEY UPDATE)
  @note
  Once this record is written to the table buffer, any AFTER INSERT trigger
  will be invoked. If instead of inserting a new record we end up updating an
  old one, both ON UPDATE triggers will fire instead. Similarly both ON
  DELETE triggers will be invoked if are to delete conflicting records.
  Call thd->transaction.stmt.mark_modified_non_trans_table() if table is a
  non-transactional table.
  RETURN VALUE
    0     - success
    non-0 - error
*/

int write_record(THD *thd, TABLE *table, COPY_INFO *info, COPY_INFO *update)
{
  int error, trg_error= 0;
  char *key=0;
  MY_BITMAP *save_read_set, *save_write_set;
  ulonglong prev_insert_id= table->file->next_insert_id;
  ulonglong insert_id_for_cur_row= 0;
  MEM_ROOT mem_root;
  DBUG_ENTER("write_record");

  /* Here we are using separate MEM_ROOT as this memory should be freed once we
     exit write_record() function. This is marked as not instumented as it is
     allocated for very short time in a very specific case.
  */
  init_sql_alloc(PSI_NOT_INSTRUMENTED, &mem_root, 256, 0);
  info->stats.records++;
  save_read_set=  table->read_set;
  save_write_set= table->write_set;

  info->set_function_defaults(table);

  const enum_duplicates duplicate_handling= info->get_duplicate_handling();

  if (duplicate_handling == DUP_REPLACE || duplicate_handling == DUP_UPDATE)
  {
       DBUG_ASSERT(duplicate_handling != DUP_UPDATE || update != NULL);
       while ((error=table->file->ha_write_row(table->record[0])))
      {
      。。。。省略400行
      //处理 INSERT ON DUPLICATE KEY UPDATE 等复杂情况
  }
  else if ((error=table->file->ha_write_row(table->record[0])))
  {
    DEBUG_SYNC(thd, "write_row_noreplace");
    info->last_errno= error;
    myf error_flags= MYF(0);
    if (table->file->is_fatal_error(error))
      error_flags|= ME_FATALERROR;
    table->file->print_error(error, error_flags);
    /*
      If IGNORE option is used, handler errors will be downgraded
      to warnings and don't  have to stop the iteration.
    */
    if (thd->is_error())
      goto before_trg_err;
    table->file->restore_auto_increment(prev_insert_id);
    goto ok_or_after_trg_err;
  }
。。。。

调用存储引擎的接口 ha_write_row. 源码在sql/handler.cc

int handler::ha_write_row(uchar *buf)
{
  int error;
  Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
  DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE ||
              m_lock_type == F_WRLCK);

  DBUG_ENTER("handler::ha_write_row");
  DBUG_EXECUTE_IF("inject_error_ha_write_row",
                  DBUG_RETURN(HA_ERR_INTERNAL_ERROR); );
  DBUG_EXECUTE_IF("simulate_storage_engine_out_of_memory",
                  DBUG_RETURN(HA_ERR_SE_OUT_OF_MEMORY); );
  MYSQL_INSERT_ROW_START(table_share->db.str, table_share->table_name.str);
  mark_trx_read_write();

  DBUG_EXECUTE_IF("handler_crashed_table_on_usage",
                  my_error(HA_ERR_CRASHED, MYF(ME_ERRORLOG), table_share->table_name.str);
                  set_my_errno(HA_ERR_CRASHED);
                  DBUG_RETURN(HA_ERR_CRASHED););

  MYSQL_TABLE_IO_WAIT(PSI_TABLE_WRITE_ROW, MAX_KEY, error,
    { error= write_row(buf); })

  MYSQL_INSERT_ROW_DONE(error);
  if (unlikely(error))
    DBUG_RETURN(error);

  if (unlikely((error= binlog_log_row(table, 0, buf, log_func))))
    DBUG_RETURN(error); /* purecov: inspected */

  DEBUG_SYNC_C("ha_write_row_end");
  DBUG_RETURN(0);
}

handler 是各个存储引擎的基类，这里我们使用InnoDB引擎。再往下是进入引擎的write_row。

四 innodb执行插入

我之前整理的引擎部分与server层可以对接上了。handler对应ha_innobase 插入的表信息保存在handler中。看看write_row。源码在innobase/handler/ha_innodb.cc。

/********************************************************************//**
Stores a row in an InnoDB database, to the table specified in this
handle.
@return error code */

int
ha_innobase::write_row(
/*===================*/
	uchar*	record)	/*!< in: a row in MySQL format */
{
	dberr_t		error;
	ulint		sql_command;
	int		error_result = 0;
	bool		auto_inc_used = false;

....
/* Step-1: Validation checks before we commence write_row operation. */
/* Step-2: Intermediate commit if original operation involves ALTER
	table with algorithm = copy. Intermediate commit ease pressure on
	recovery if server crashes while ALTER is active. */
/* Step-3: Handling of Auto-Increment Columns. */
/* Step-4: Prepare INSERT graph that will be executed for actual INSERT
	(This is a one time operation) */
	if (m_prebuilt->mysql_template == NULL
	    || m_prebuilt->template_type != ROW_MYSQL_WHOLE_ROW) {

		/* Build the template used in converting quickly between
		the two database formats */

		build_template(true);
	}

	innobase_srv_conc_enter_innodb(m_prebuilt);

	/* Step-5: Execute insert graph that will result in actual insert. */
	error = row_insert_for_mysql((byte*) record, m_prebuilt);

	DEBUG_SYNC(m_user_thd, "ib_after_row_insert");

	/* Step-6: Handling of errors related to auto-increment. */
  /* Step-7: Cleanup and exit. */
}

源码很长，截取了部分注释，大概7步，这里主要进入下一级入口是第5步的row_insert_for_mysql。

源码在innobase/row/row0mysql.cc

/** Does an insert for MySQL.
@param[in]	mysql_rec	row in the MySQL format
@param[in,out]	prebuilt	prebuilt struct in MySQL handle
@return error code or DB_SUCCESS*/
dberr_t
row_insert_for_mysql(
	const byte*		mysql_rec,
	row_prebuilt_t*		prebuilt)
{
	/* For intrinsic tables there a lot of restrictions that can be
	relaxed including locking of table, transaction handling, etc.
	Use direct cursor interface for inserting to intrinsic tables. */
	if (dict_table_is_intrinsic(prebuilt->table)) {
		return(row_insert_for_mysql_using_cursor(mysql_rec, prebuilt));
	} else {
		return(row_insert_for_mysql_using_ins_graph(
			mysql_rec, prebuilt));
	}
}

可以看下insert的时候，进行的分支判断：row_insert_for_mysql_using_cursor,row_insert_for_mysql_using_ins_graph.

分支判断这里知识点， Intrinsic table：主要被优化器使用，在之前版本中使用的是MyISAM引擎来作为查询过程中产生的临时表引擎；在5.7版本中，InnoDB对临时表(intrinsic table)做了大量优化，包括独立的临时表表空间，独立的undo回滚端，减少redo log的记录。对应的row_insert_for_mysql_using_cursor直接跳过了加锁的lock_table过程。可以看看这篇了解详情。

http://mysqlserverteam.com/mysql-5-7-innodb-intrinsic-tables/

row_insert_for_mysql_using_ins_graph

源码在 /innobase/row/row0mysql.cc

/** Does an insert for MySQL using INSERT graph. This function will run/execute
INSERT graph.
@param[in]	mysql_rec	row in the MySQL format
@param[in,out]	prebuilt	prebuilt struct in MySQL handle
@return error code or DB_SUCCESS */
static
dberr_t
row_insert_for_mysql_using_ins_graph(
	const byte*	mysql_rec,
	row_prebuilt_t*	prebuilt)
{
	trx_savept_t	savept;
	que_thr_t*	thr;
	dberr_t		err;
	ibool		was_lock_wait;
	trx_t*		trx		= prebuilt->trx;
	ins_node_t*	node		= prebuilt->ins_node;
	dict_table_t*	table		= prebuilt->table;
   ....

>>>>>

待补充吧，点太多，不能展开，先看看图。