MySQL 45 Lectures reading notes, Lecture 07: The merits and faults of row locks: how to reduce their impact on performance

I. Preface

   This post is part of my reading-notes series on the Geek Time (极客时间) course "MySQL 45 Lectures".

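In MySQL, row locks are implemented at the engine layer, by each storage engine itself. Not every engine supports row locks; MyISAM, for example, does not. InnoDB does, and this lecture covers InnoDB row locks and how to raise business concurrency by reducing lock conflicts.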

 Starting from two-phase locking

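In the lecture's example operation sequence, what happens when transaction B's update statement runs? Assume field id is the primary key of table t.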

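In fact, transaction B's update is blocked and can only proceed after transaction A commits, because the row locks transaction A holds on both records are released only at commit.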

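   The point of the example: in an InnoDB transaction, a row lock is acquired when it is needed, but it is not released as soon as it is no longer needed; it is held until the transaction ends. This is the two-phase locking protocol.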
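A minimal C++ sketch of the two-phase rule, using a hypothetical ToyLockTable (not InnoDB code) in which each row has a single exclusive owner. Locks are taken row by row as a transaction needs them and are all dropped together in commit(); a conflicting request simply returns false here, where InnoDB would block the caller.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy row-lock table illustrating two-phase locking (2PL).
// Hypothetical names; this only models the rule, not InnoDB's lock manager.
class ToyLockTable {
public:
    // Growing phase: lock row `id` for transaction `trx` when it is needed.
    // Returns false on conflict (a real engine would make the caller wait).
    bool lock_row(int trx, int id) {
        auto it = owner_.find(id);
        if (it != owner_.end() && it->second != trx) {
            return false;  // another transaction still holds this row
        }
        owner_[id] = trx;
        held_[trx].insert(id);
        return true;
    }

    // Shrinking phase: every lock the transaction took is released together,
    // at commit, never earlier.
    void commit(int trx) {
        for (int id : held_[trx]) {
            owner_.erase(id);
        }
        held_.erase(trx);
    }

private:
    std::map<int, int> owner_;           // row id -> owning transaction
    std::map<int, std::set<int>> held_;  // transaction -> rows it has locked
};
```

For example, if transaction 1 locks rows 1 and 2, transaction 2's lock_row(2, 1) fails until commit(1) runs, matching transaction B's wait in the lecture's example.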

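How does knowing this help us use transactions? If your transaction needs to lock several rows, put the locks most likely to cause conflicts and hurt concurrency as late in the transaction as possible.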

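The instructor uses the example of a user buying a movie ticket, which involves updating the user's account and updating the cinema's account. When many users place orders at once, the cinema's account is clearly the hot row, so its update should go as late as possible. Anyone who has built the accounting part of a payment system knows that handling hot accounts is one of the key design points.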
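To see why ordering matters, here is a rough C++ model, assuming (hypothetically) that every statement takes one time unit and commit follows the last statement. Under two-phase locking the lock taken by statement i is then held for n - i units, so moving the hot cinema-account update from first to last shrinks its hold time from n units to 1. The insert_log step is an assumed extra statement for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cost model: statement i (0-based) of an n-statement
// transaction locks its row at time i, and 2PL keeps that lock until
// commit at time n, so the hold time is n - i units.
int hot_lock_hold_time(const std::vector<std::string>& stmts,
                       const std::string& hot_row) {
    const int n = static_cast<int>(stmts.size());
    for (int i = 0; i < n; ++i) {
        if (stmts[i] == hot_row) {
            return n - i;
        }
    }
    return 0;  // the transaction never touches the hot row
}
```

With {"cinema_account", "user_account", "insert_log"} the hot lock is held for 3 units; reordering to {"user_account", "insert_log", "cinema_account"} cuts that to 1, so other buyers queue on the cinema row for a much shorter time.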

Deadlocks and deadlock detection

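When different threads in a concurrent system form a circular dependency on resources and every thread involved is waiting for another to release something, all of them end up waiting forever. This is called a deadlock. Here is an example using row locks in a database.  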

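At this point, transaction A is waiting for transaction B to release the row lock on id=2, while transaction B is waiting for transaction A to release the row lock on id=1. A and B are each waiting for the other to release a resource, so they are deadlocked. Once a deadlock occurs, there are two strategies: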

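  • One strategy is to simply wait until a timeout. The timeout is set with the parameter innodb_lock_wait_timeout.
  • The other strategy is to run deadlock detection: when a deadlock is found, actively roll back one transaction in the deadlock chain so the others can continue. Setting innodb_deadlock_detect to on enables this logic.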

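Because a good timeout value is hard to choose (the default innodb_lock_wait_timeout is 50 seconds, far too long for an online service, while a very short value would also kill transactions that are merely waiting on an ordinary lock), InnoDB normally relies on the second strategy, active deadlock detection. It rolls back the side that is cheaper to undo, and you can view the latest deadlock log with SHOW ENGINE INNODB STATUS. innodb_deadlock_detect is on by default. Active detection finds and resolves deadlocks quickly, but it comes with extra overhead.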

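Picture the process: whenever a transaction is blocked on a lock, InnoDB checks whether the thread it depends on is itself blocked by someone else, and so on along the chain, until it can decide whether there is a circular wait, that is, a deadlock.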
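The chain-following check above amounts to looking for a cycle in a wait-for graph. A simplified C++ sketch, assuming each blocked transaction waits on exactly one other (real lock queues can make a transaction wait on several); transaction ids 1 and 2 stand for A and B from the example, and has_deadlock is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <set>

// waits_for[a] = b means transaction a is blocked on a lock held by b.
using WaitsFor = std::map<int, int>;

// Follow "who am I waiting for?" from `start`. Reaching a transaction that
// is already on the path means the chain loops back: a circular wait.
bool has_deadlock(const WaitsFor& g, int start) {
    std::set<int> on_path;
    int cur = start;
    while (true) {
        if (!on_path.insert(cur).second) {
            return true;   // revisited a transaction: deadlock
        }
        auto it = g.find(cur);
        if (it == g.end()) {
            return false;  // chain ends at a transaction that is running
        }
        cur = it->second;
    }
}
```

With edges 1 → 2 (A waits for B's lock on id=2) and 2 → 1 (B waits for A's lock on id=1), has_deadlock returns true; drop the second edge and the chain simply ends at B.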

So what about the scenario mentioned above, where every transaction has to update the same row?

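Every newly blocked thread must check whether its own arrival creates a deadlock, which is an O(n) operation. If 1,000 concurrent threads update the same row at once, deadlock detection costs on the order of 1 million operations (1,000 × 1,000). Even though the final verdict is "no deadlock", the checks burn a lot of CPU along the way. That is why you see very high CPU utilization while only a handful of transactions complete per second.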
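The "order of 1 million" figure is simply 1,000 threads times an O(n) check each. The two functions below are hypothetical cost models, not InnoDB's exact step count: the first is the lecture's n × n bound, the second assumes the k-th arrival only scans the k waiters already queued. Either way the total grows quadratically with the number of waiters.

```cpp
#include <cassert>
#include <cstdint>

// Lecture's bound: n blocked threads, each doing an O(n) check.
std::uint64_t detection_steps_upper(std::uint64_t n) {
    return n * n;
}

// Tighter count: arrival k (k = 0..n-1) scans the k waiters ahead of it,
// so the total is 0 + 1 + ... + (n - 1) = n(n - 1) / 2.
std::uint64_t detection_steps_incremental(std::uint64_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For n = 1,000 this gives 1,000,000 and 499,500 steps; doubling n to 2,000 roughly quadruples both, which is why CPU climbs much faster than throughput.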

Given this analysis, how do we fix the performance problem caused by updates to such a hot row? The crux is that deadlock detection consumes a lot of CPU.

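Small and medium-sized companies usually have nobody who can modify MySQL's source code, and a homemade patch could easily introduce serious bugs. So don't count on the database to brute-force its way through; look for solutions in the business architecture instead: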

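  1. As discussed above, avoid large transactions and split them into smaller ones.

  2. Move the queuing out of the database and into the layer in front of it. If the business cannot accept queuing, try a different design. Take the hot account: instead of one physical account, split it into 10 virtual sub-accounts and update the balance of one of them. The total balance is the sum of the sub-accounts, and it only has to reconcile correctly in the daily reconciliation.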
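A minimal C++ sketch of the sub-account idea, assuming a hypothetical SplitAccount type with 10 sub-rows and a made-up routing rule (order id modulo 10). Each credit touches only one sub-row, so concurrent orders mostly lock different rows, and the account balance is recovered by summing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical hot account split into N virtual sub-accounts.
template <std::size_t N = 10>
class SplitAccount {
public:
    // Route a credit to one sub-account (assumed rule: order id mod N).
    // In a database each sub-account would be its own row, so two orders
    // routed to different sub-accounts do not wait on each other's lock.
    void credit(std::uint64_t order_id, std::int64_t amount) {
        sub_[order_id % N] += amount;
    }

    // The account's real balance is the sum of all sub-accounts.
    std::int64_t total() const {
        return std::accumulate(sub_.begin(), sub_.end(), std::int64_t{0});
    }

    std::int64_t sub_balance(std::size_t i) const { return sub_.at(i); }

private:
    std::array<std::int64_t, N> sub_{};  // all sub-balances start at zero
};
```

This sketch only handles credits. Debits are harder, because one sub-account can run short while the total is still sufficient, which is part of why the reconciliation step matters.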

Summary

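 This lecture introduced MySQL row locks and covered two main topics: the two-phase locking protocol, and deadlocks with deadlock detection.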

The instructor left a question: to delete the first 10,000 rows of a table, there are three ways to do it:

  • Option 1: run delete from T limit 10000; directly.
  • Option 2: in one connection, run delete from T limit 500; 20 times in a loop.
  • Option 3: in 20 connections, run delete from T limit 500; simultaneously.

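Option 1: the transaction is relatively long, so it holds its locks for a long time and other clients wait longer for those resources.
Option 2: runs serially, splitting one long transaction into several short ones. Each short transaction holds its locks for less time, so other clients wait less. It also means the resources are used in slices (each run touches a different slice), which improves concurrency.
Option 3: the connections artificially create lock contention among themselves, which makes the contention worse.
Option 2 is relatively the best, though the final choice should depend on the actual business scenario.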
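Option 2 can be sketched as a loop in which each iteration is its own short transaction. The C++ below only simulates table T with a std::deque (no database is involved, and delete_in_batches is a hypothetical helper); it reports how many transactions ran and the most rows any single one held locks on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

struct BatchStats {
    std::size_t transactions = 0;     // number of short transactions run
    std::size_t max_rows_locked = 0;  // largest lock footprint of any one
};

// Simulate deleting `total` rows from the front of `table`, `batch` rows
// per transaction. Each iteration stands for BEGIN; DELETE ... LIMIT n;
// COMMIT; so its row locks are released before the next batch starts.
BatchStats delete_in_batches(std::deque<int>& table, std::size_t total,
                             std::size_t batch) {
    assert(batch > 0);  // a zero batch size would never make progress
    BatchStats s;
    std::size_t remaining = std::min(total, table.size());
    while (remaining > 0) {
        const std::size_t n = std::min(batch, remaining);
        table.erase(table.begin(),
                    table.begin() + static_cast<std::ptrdiff_t>(n));
        remaining -= n;
        ++s.transactions;
        s.max_rows_locked = std::max(s.max_rows_locked, n);
    }
    return s;
}
```

With 10,000 rows and a batch size of 500 this gives 20 transactions of at most 500 rows each, versus a single 10,000-row transaction for option 1.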

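An example from an earlier data-archiving job: suppose a table should keep only the last 3 months of records, about 15 million rows, but it holds over 100 million rows. In that case we first created a backup table. The DBA would not delete everything in one go either; the usual advice is to delete in batches from a single thread, e.g., 500 rows at a time, during off-peak business hours, after assessing the database load to decide how long the whole job should be spread over. I have really seen data migrations of this kind push the database CPU to 100%. Also, deleting the rows does not release the table space; the DBA still has to shrink the table afterwards.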

  ********************** Supplementary study *******************************

InnoDB consistent reads take no locks, so they never need deadlock detection. For operations that do take locks, the flow is:

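  • 1. InnoDB starts a transaction. When the transaction requests a lock and has to wait (wait_lock), InnoDB starts deadlock detection (deadlock_mark).
  • 2. It enters lock_deadlock_check_and_resolve, which, as the name says, detects and resolves deadlocks (in 5.7 this logic lives in DeadlockChecker::check_and_resolve, quoted below).
  • 3. The detection search is itself bounded by counters.
  • 4. Part of the detection logic is a wait-for graph: a graph is built from the lock information and the transaction wait chains, and if the graph contains a cycle, a deadlock is declared.
  • 5. When choosing which transaction to roll back, part of the internal logic compares the amount of undo each transaction has generated.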
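Step 5 can be illustrated with a small C++ sketch of victim selection. It is modeled on, not copied from, InnoDB 5.7, assuming a transaction's weight is its undo records plus the locks it holds (roughly what TRX_WEIGHT and trx_weight_ge compare). ToyTrx, weight and select_victim are hypothetical names, and the tie rule (roll back the joining transaction) is this sketch's choice.

```cpp
#include <cassert>

// Hypothetical transaction summary used only for this sketch.
struct ToyTrx {
    int id;
    unsigned long undo_records;  // work that a rollback would have to undo
    unsigned long locks_held;
};

// Assumed weight: undo records plus locks held. Lighter = cheaper to undo.
unsigned long weight(const ToyTrx& t) {
    return t.undo_records + t.locks_held;
}

// `joining` is the transaction whose lock request triggered the check;
// `holder` holds the lock it is waiting for. Roll back the lighter one,
// and on a tie roll back the joining transaction.
const ToyTrx& select_victim(const ToyTrx& joining, const ToyTrx& holder) {
    return weight(holder) >= weight(joining) ? joining : holder;
}
```

A transaction that has already written 100 undo records survives against one that has written 2, whichever side triggered the check, which matches the "roll back the cheaper side" behavior described earlier.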

Row locks are taken by the function lock_rec_lock: it first tries the fast path lock_rec_lock_fast, and falls back to lock_rec_lock_slow if that fails. Code version: MySQL 5.7.29. Source:

/*********************************************************************//**
Tries to lock the specified record in the mode requested. If not immediately
possible, enqueues a waiting lock request. This is a low-level function
which does NOT look at implicit locks! Checks lock compatibility within
explicit locks. This function sets a normal next-key lock, or in the case
of a page supremum record, a gap type lock.
@return DB_SUCCESS, DB_SUCCESS_LOCKED_REC, DB_LOCK_WAIT, DB_DEADLOCK,
or DB_QUE_THR_SUSPENDED */
static
dberr_t
lock_rec_lock(
/*==========*/
	bool			impl,	/*!< in: if true, no lock is set
					if no wait is necessary: we
					assume that the caller will
					set an implicit lock */
	ulint			mode,	/*!< in: lock mode: LOCK_X or
					LOCK_S possibly ORed to either
					LOCK_GAP or LOCK_REC_NOT_GAP */
	const buf_block_t*	block,	/*!< in: buffer block containing
					the record */
	ulint			heap_no,/*!< in: heap number of record */
	dict_index_t*		index,	/*!< in: index of record */
	que_thr_t*		thr)	/*!< in: query thread */
{
	ut_ad(lock_mutex_own());
	ut_ad(!srv_read_only_mode);
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_S
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IS));
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_X
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
	ut_ad((LOCK_MODE_MASK & mode) == LOCK_S
	      || (LOCK_MODE_MASK & mode) == LOCK_X);
	ut_ad(mode - (LOCK_MODE_MASK & mode) == LOCK_GAP
	      || mode - (LOCK_MODE_MASK & mode) == LOCK_REC_NOT_GAP
	      || mode - (LOCK_MODE_MASK & mode) == 0);
	ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

	/* We try a simplified and faster subroutine for the most
	common cases */
	switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
	case LOCK_REC_SUCCESS:
		return(DB_SUCCESS);
	case LOCK_REC_SUCCESS_CREATED:
		return(DB_SUCCESS_LOCKED_REC);
	case LOCK_REC_FAIL:
		return(lock_rec_lock_slow(impl, mode, block,
					  heap_no, index, thr));
	}

	ut_error;
	return(DB_ERROR);
}
/*********************************************************************//**
This is the general, and slower, routine for locking a record. This is a
low-level function which does NOT look at implicit locks! Checks lock
compatibility within explicit locks. This function sets a normal next-key
lock, or in the case of a page supremum record, a gap type lock.
@return DB_SUCCESS, DB_SUCCESS_LOCKED_REC, DB_LOCK_WAIT, DB_DEADLOCK,
or DB_QUE_THR_SUSPENDED */
static
dberr_t
lock_rec_lock_slow(
/*===============*/
	ibool			impl,	/*!< in: if TRUE, no lock is set
					if no wait is necessary: we
					assume that the caller will
					set an implicit lock */
	ulint			mode,	/*!< in: lock mode: LOCK_X or
					LOCK_S possibly ORed to either
					LOCK_GAP or LOCK_REC_NOT_GAP */
	const buf_block_t*	block,	/*!< in: buffer block containing
					the record */
	ulint			heap_no,/*!< in: heap number of record */
	dict_index_t*		index,	/*!< in: index of record */
	que_thr_t*		thr)	/*!< in: query thread */
{
	ut_ad(lock_mutex_own());
	ut_ad(!srv_read_only_mode);
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_S
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IS));
	ut_ad((LOCK_MODE_MASK & mode) != LOCK_X
	      || lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
	ut_ad((LOCK_MODE_MASK & mode) == LOCK_S
	      || (LOCK_MODE_MASK & mode) == LOCK_X);
	ut_ad(mode - (LOCK_MODE_MASK & mode) == LOCK_GAP
	      || mode - (LOCK_MODE_MASK & mode) == 0
	      || mode - (LOCK_MODE_MASK & mode) == LOCK_REC_NOT_GAP);
	ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

	DBUG_EXECUTE_IF("innodb_report_deadlock", return(DB_DEADLOCK););

	dberr_t	err;
	trx_t*	trx = thr_get_trx(thr);

	trx_mutex_enter(trx);

	if (lock_rec_has_expl(mode, block, heap_no, trx)) {

		/* The trx already has a strong enough lock on rec: do
		nothing */

		err = DB_SUCCESS;

	} else {

		const lock_t* wait_for = lock_rec_other_has_conflicting(
			mode, block, heap_no, trx);

		if (wait_for != NULL) {

			/* If another transaction has a non-gap conflicting
			request in the queue, as this transaction does not
			have a lock strong enough already granted on the
			record, we may have to wait. */

			RecLock	rec_lock(thr, index, block, heap_no, mode);

			err = rec_lock.add_to_waitq(wait_for);

		} else if (!impl) {

			/* Set the requested lock on the record, note that
			we already own the transaction mutex. */

			lock_rec_add_to_queue(
				LOCK_REC | mode, block, heap_no, index, trx,
				true);

			err = DB_SUCCESS_LOCKED_REC;
		} else {
			err = DB_SUCCESS;
		}
	}

	trx_mutex_exit(trx);

	return(err);
}
/**
Enqueue a lock wait for normal transaction. If it is a high priority transaction
then jump the record lock wait queue and if the transaction at the head of the
queue is itself waiting roll it back, also do a deadlock check and resolve.
@param[in, out] wait_for	The lock that the joining transaction is
				waiting for
@param[in] prdt			Predicate [optional]
@return DB_LOCK_WAIT, DB_DEADLOCK, or DB_QUE_THR_SUSPENDED, or
	DB_SUCCESS_LOCKED_REC; DB_SUCCESS_LOCKED_REC means that
	there was a deadlock, but another transaction was chosen
	as a victim, and we got the lock immediately: no need to
	wait then */
dberr_t
RecLock::add_to_waitq(const lock_t* wait_for, const lock_prdt_t* prdt)
{
	ut_ad(lock_mutex_own());
	ut_ad(m_trx == thr_get_trx(m_thr));
	ut_ad(trx_mutex_own(m_trx));

	DEBUG_SYNC_C("rec_lock_add_to_waitq");

	m_mode |= LOCK_WAIT;

	/* Do the preliminary checks, and set query thread state */

	prepare();

	bool	high_priority = trx_is_high_priority(m_trx);

	/* Don't queue the lock to hash table, if high priority transaction. */
	lock_t*	lock = create(m_trx, true, !high_priority, prdt);

	/* Attempt to jump over the low priority waiting locks. */
	if (high_priority && jump_queue(lock, wait_for)) {

		/* Lock is granted */
		return(DB_SUCCESS);
	}

	ut_ad(lock_get_wait(lock));

	dberr_t	err = deadlock_check(lock);

	ut_ad(trx_mutex_own(m_trx));

	/* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */
	if (err == DB_LOCK_WAIT) {
		thd_report_row_lock_wait(current_thd, wait_for->trx->mysql_thd);
	}
	return(err);
}
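The fast/slow split in lock_rec_lock can be summarized with a heavily simplified C++ toy model. Assumptions: a single record, only shared/exclusive modes (no gap or next-key bits), and a fast path that succeeds only when the record has no locks at all (the real lock_rec_lock_fast inspects the whole page). ToyLock, Outcome and toy_rec_lock are hypothetical; the three outcomes loosely mirror DB_SUCCESS, DB_SUCCESS_LOCKED_REC and DB_LOCK_WAIT.

```cpp
#include <cassert>
#include <vector>

enum class Outcome { kSuccess, kSuccessLockedRec, kLockWait };

struct ToyLock {
    int trx;
    bool exclusive;  // X if true, S otherwise
    bool waiting;    // request queued but not granted
};

// `queue` holds every lock request (granted or waiting) on one record.
Outcome toy_rec_lock(std::vector<ToyLock>& queue, int trx, bool exclusive) {
    // Fast path: nothing is locked here yet, so just create our lock.
    if (queue.empty()) {
        queue.push_back({trx, exclusive, false});
        return Outcome::kSuccessLockedRec;
    }
    // Slow path, like lock_rec_has_expl: we already hold a strong-enough
    // granted lock, so there is nothing to do.
    for (const ToyLock& l : queue) {
        if (l.trx == trx && !l.waiting && (l.exclusive || !exclusive)) {
            return Outcome::kSuccess;
        }
    }
    // Like lock_rec_other_has_conflicting: another transaction's request
    // (granted or waiting) conflicts, so enqueue a waiting request.
    for (const ToyLock& l : queue) {
        if (l.trx != trx && (l.exclusive || exclusive)) {
            queue.push_back({trx, exclusive, true});
            return Outcome::kLockWait;
        }
    }
    // Compatible with everything queued: grant immediately.
    queue.push_back({trx, exclusive, false});
    return Outcome::kSuccessLockedRec;
}
```

The kLockWait branch corresponds to RecLock::add_to_waitq, which is where the real code calls deadlock_check.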

The deadlock detection code:

/**
Check and resolve any deadlocks
@param[in, out] lock		The lock being acquired
@return DB_LOCK_WAIT, DB_DEADLOCK, or DB_QUE_THR_SUSPENDED, or
	DB_SUCCESS_LOCKED_REC; DB_SUCCESS_LOCKED_REC means that
	there was a deadlock, but another transaction was chosen
	as a victim, and we got the lock immediately: no need to
	wait then */
dberr_t
RecLock::deadlock_check(lock_t* lock)
{
	ut_ad(lock_mutex_own());
	ut_ad(lock->trx == m_trx);
	ut_ad(trx_mutex_own(m_trx));

	const trx_t*	victim_trx =
			DeadlockChecker::check_and_resolve(lock, m_trx);

	/* Check the outcome of the deadlock test. It is possible that
	the transaction that blocked our lock was rolled back and we
	were granted our lock. */

	dberr_t	err = check_deadlock_result(victim_trx, lock);

	if (err == DB_LOCK_WAIT) {

		set_wait_state(lock);

		MONITOR_INC(MONITOR_LOCKREC_WAIT);
	}

	return(err);
}
/** Checks if a joining lock request results in a deadlock. If a deadlock is
found this function will resolve the deadlock by choosing a victim transaction
and rolling it back. It will attempt to resolve all deadlocks. The returned
transaction id will be the joining transaction instance or NULL if some other
transaction was chosen as a victim and rolled back or no deadlock found.
@param[in]	lock lock the transaction is requesting
@param[in,out]	trx transaction requesting the lock
@return transaction instanace chosen as victim or 0 */
const trx_t*
DeadlockChecker::check_and_resolve(const lock_t* lock, trx_t* trx)
{
	ut_ad(lock_mutex_own());
	ut_ad(trx_mutex_own(trx));
	check_trx_state(trx);
	ut_ad(!srv_read_only_mode);

	/* If transaction is marked for ASYNC rollback then we should
	not allow it to wait for another lock causing possible deadlock.
	We return current transaction as deadlock victim here. */
	if (trx->in_innodb & TRX_FORCE_ROLLBACK_ASYNC) {
		return(trx);
	} else if (!innobase_deadlock_detect) {
		return(NULL);
	}

	/*  Release the mutex to obey the latching order.
	This is safe, because DeadlockChecker::check_and_resolve()
	is invoked when a lock wait is enqueued for the currently
	running transaction. Because m_trx is a running transaction
	(it is not currently suspended because of a lock wait),
	its state can only be changed by this thread, which is
	currently associated with the transaction. */

	trx_mutex_exit(trx);

	const trx_t*	victim_trx;

	/* Try and resolve as many deadlocks as possible. */
	do {
		DeadlockChecker	checker(trx, lock, s_lock_mark_counter);
		// deadlock detection happens here
		victim_trx = checker.search();

		/* Search too deep, we rollback the joining transaction only
		if it is possible to rollback. Otherwise we rollback the
		transaction that is holding the lock that the joining
		transaction wants. */
		// if the deadlock search went too deep, roll back the current (joining) transaction
		if (checker.is_too_deep()) {

			ut_ad(trx == checker.m_start);
			ut_ad(trx == victim_trx);

			rollback_print(victim_trx, lock);

			MONITOR_INC(MONITOR_DEADLOCK);

			break;

		} else if (victim_trx != NULL && victim_trx != trx) {
			// if the victim is another transaction, call trx_rollback to roll it back
			ut_ad(victim_trx == checker.m_wait_lock->trx);

			checker.trx_rollback();

			lock_deadlock_found = true; // deadlock detected

			MONITOR_INC(MONITOR_DEADLOCK);
		}

	} while (victim_trx != NULL && victim_trx != trx);

	/* If the joining transaction was selected as the victim. */
	if (victim_trx != NULL) {

		print("*** WE ROLL BACK TRANSACTION (2)\n");

		lock_deadlock_found = true;
	}

	trx_mutex_enter(trx);

	return(victim_trx);
}
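The search-and-give-up behavior of check_and_resolve can be illustrated with a hypothetical depth-first search over a wait-for graph. In this sketch only a cycle that leads back to the joining transaction counts, and max_depth / max_steps are assumed caps standing in for the counters behind checker.is_too_deep(); when they trip, the real code rolls back the joining transaction instead of searching further. toy_search and its parameters are illustrative names, not InnoDB's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Graph = std::map<int, std::vector<int>>;  // trx -> trxs it waits for

enum class SearchResult { kNoDeadlock, kDeadlock, kTooDeep };

// Depth-first walk from the joining transaction `start`.
SearchResult toy_search(const Graph& g, int start, int max_depth,
                        int max_steps) {
    std::vector<std::pair<int, int>> stack{{start, 0}};  // (trx, depth)
    std::set<int> visited{start};
    int steps = 0;
    while (!stack.empty()) {
        const int trx = stack.back().first;
        const int depth = stack.back().second;
        stack.pop_back();
        if (++steps > max_steps || depth > max_depth) {
            return SearchResult::kTooDeep;  // give up: victim is the joiner
        }
        auto it = g.find(trx);
        if (it == g.end()) {
            continue;  // this transaction is not waiting on anyone
        }
        for (int next : it->second) {
            if (next == start) {
                return SearchResult::kDeadlock;  // cycle back to the joiner
            }
            if (visited.insert(next).second) {
                stack.push_back({next, depth + 1});
            }
        }
    }
    return SearchResult::kNoDeadlock;
}
```

Capping the search trades precision for bounded cost: a very long wait chain is treated as if it were a deadlock, so the joining transaction is rolled back rather than letting one lock request scan thousands of waiters.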

Reposted from blog.csdn.net/bohu83/article/details/105187854