B-tree Locking Analysis

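Before We Begin

The weekend before last I had dinner and a chat with a good friend I hadn't seen in a long time. We got onto the question of whether skincare products actually work. My view was that "doing neither good nor harm is itself a fault"; my friend didn't get it and thought I was being too harsh.

But thinking it over carefully, the view does hold some water. It's like taking a penalty in a game: everyone else takes a step back, you don't, and from the outside you look like the one who stepped forward to take the penalty. Isn't that so?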

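Why Do We Need Locks?

A B-tree algorithm that only covers the basic operations (search, insertion, deletion, and splitting and merging leaf and internal nodes) is fairly easy to understand. Once those operations meet multiple threads, and the consistency and integrity of the data must still be guaranteed, the problem becomes much more complicated.

B-tree + concurrent operations => locks are needed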

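What Do Locks Protect?

What is protected

Think about it: in a B-tree, what exactly do we need locks to protect?

The contents stored in the B-tree leaves (locks)
The B-tree data structure itself (latches)

How the two differ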

Note: difference 1

B-tree locking, or locking in B-tree indexes, means two things. First, it means concurrency control among concurrent database transactions querying or modifying database contents and its representation in B-tree indexes. Second, it means concurrency control among concurrent threads modifying the B-tree data structure in memory, including in particular images of disk-based B-tree nodes in the buffer pool.

Beyond the difference in usage described above, locks and latches also differ in when they are needed during system recovery:

Note: difference 2

Latches and locks also differ both during system recovery and while waiting for the decision of a global transaction coordinator. While waiting, no latches are required, but retaining locks is essential to guarantee a local transaction’s ability to abide by the global coordinator’s final decision. During recovery without concurrent execution of new transactions, locks are not required, because concurrency control during forward processing prior to the system crash already ensured that active transactions do not conflict. Latches, however, are as important during recovery as during normal forward processing if recovery employs multiple threads and shared data structures such as the buffer pool.

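Note: I meant to translate these passages, but my translations always felt harder to follow than the original, so quoting the paper directly is both easier to understand and less error-prone.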

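Protecting the B-tree's Physical Structure

When multiple threads access data in memory, the order of their accesses must be coordinated. This holds whether the data is permanently memory-resident (such as the buffer pool's lookup table, or pages in an in-memory database) or only temporarily in memory (such as images of disk pages in the buffer pool).

Common kinds of latches:

Mutex: rejects any form of concurrent access
Read-write lock: readers share, writers are exclusive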
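
To make the second kind concrete, here is a minimal sketch of a read-write latch in Python. Python's standard library has no built-in read-write lock, so this one is built on `threading.Condition`; the class name `ReadWriteLatch` and its method names are made up for illustration and are not taken from any particular database engine.

```python
import threading


class ReadWriteLatch:
    """Hypothetical reader-writer latch: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # threads currently holding the latch in shared mode
        self._writer = False   # True while one thread holds it in exclusive mode

    def acquire_shared(self):
        with self._cond:
            # Readers only wait for an active writer, never for each other.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed now

    def acquire_exclusive(self):
        with self._cond:
            # A writer waits until there is no writer and no reader.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This sketch favors simplicity over fairness: a steady stream of readers can starve a waiting writer, which a production latch would have to address.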

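Issues

Latches keep data structures consistent when multiple threads access them. In particular, the following must be guaranteed:

A page image in the buffer pool must not be modified by a writer thread while another thread is reading it.
While a thread follows a page identifier from one page to another, for example from a parent node to a child node in a B-tree index, that page identifier must not be invalidated by another thread.
Pointer chasing applies not only to parent-child pointers but also to neighbor pointers, for example when a range scan walks a chain of leaf pages or when key range locking looks for the key to lock. Concurrent query execution plans, transactions, and threads may scan in ascending or descending order, which can lead to deadlock.

Note: latches usually rely on coding discipline among developers to avoid deadlock, rather than on automatic deadlock detection and resolution.
During a B-tree insertion, a child node may overflow, which requires an insertion into its parent node; if the parent overflows too, an insertion into the grandparent follows. In the extreme case, the old root of the B-tree must be split and replaced by a new root. Root-to-leaf searches and leaf-to-root modifications then acquire latches in opposite orders, which can produce deadlock.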

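Lock Coupling

With multiple concurrent threads, the ways to keep the data consistent while inserting into a B-tree are:

Take a global mutex on the whole B-tree
Use shared latches when reading and exclusive latches when updating
Update latches plus shared and exclusive latches

update Latches：Used on resources that can be updated. Prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.
Split nodes proactively during the insert's root-to-leaf traversal.

Advantage: avoids the bottleneck and failure point of the first approach

Disadvantage: wastes some storage space because nodes are split too early
First perform a root-to-leaf search using shared latches; if the insertion causes a split, restart the root-to-leaf traversal and acquire an exclusive latch on reaching the node that must be split.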
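
Lock coupling (also called latch coupling or crabbing) means that a thread descending the tree keeps the parent's latch until it has acquired the child's latch, so the pointer it follows cannot be invalidated in between. The sketch below shows only the read path, and it makes several simplifying assumptions: a plain `threading.Lock` stands in for a shared latch, the `Node` class and `search` function are hypothetical, and separator key `keys[i]` is assumed to be the smallest key reachable through `children[i + 1]`.

```python
import threading
from bisect import bisect_right


class Node:
    """Hypothetical B-tree node with one latch per node."""

    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted separator keys, or the leaf's keys
        self.children = children or []   # empty list for a leaf
        self.latch = threading.Lock()    # a mutex stands in for a real latch

    @property
    def is_leaf(self):
        return not self.children


def search(root, key):
    """Root-to-leaf search that holds at most two latches at a time."""
    node = root
    node.latch.acquire()
    while not node.is_leaf:
        child = node.children[bisect_right(node.keys, key)]
        child.latch.acquire()    # latch the child first ...
        node.latch.release()     # ... and only then let go of the parent
        node = child
    try:
        return key in node.keys
    finally:
        node.latch.release()


# A two-level example tree with three leaves.
leaves = [Node([1, 3]), Node([5, 7]), Node([9, 11])]
root = Node([5, 9], leaves)
```

Because every thread acquires latches strictly top-down, two searches cannot deadlock with each other; the deadlock risk described above only appears once some operation also needs to move upward.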

Load Balancing and Reorganization

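Load balancing and reorganization, such as node removal and load balancing across nodes, are usually implemented as asynchronous operations, because these changes are independent of content: the logical contents of the database do not change, only their representation.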

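Protecting the B-tree's Logical Structure

Locks isolate transactions that read and modify database contents from one another.

For the serializable isolation level: read locks and write locks must be held until the transaction ends.
For the weaker isolation levels (read uncommitted, read committed, repeatable read): locks may be held only briefly, at the cost of data inconsistency anomalies such as dirty reads, phantom reads, and non-repeatable reads.

Serializable locking = Lock(key value) + Lock(key gap)

Note: key range locking is implemented on top of hierarchical or multigranularity locking.

Multigranularity locking is often used for storing hot data, aiming at the finest lock granularity and the highest concurrency.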

Key Range Locking

In the simplest form of key range locking, a key and the gap to the neighbor are locked as a unit.
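
The following toy lock table sketches that simplest form. It rests on assumptions that the text above does not fix: locking key k is taken to cover the half-open range from k up to the next higher key, a minus-infinity sentinel covers the gap below the smallest key, and there is only one (exclusive) lock mode. The class `KeyRangeLocks` and its methods are hypothetical.

```python
from bisect import bisect_right


class KeyRangeLocks:
    """Toy key range lock table over a fixed, sorted set of existing keys."""

    NEG_INF = float("-inf")

    def __init__(self, keys):
        # Assumption: a lock on key k covers [k, next_key).
        self.keys = [self.NEG_INF] + sorted(keys)
        self.locked = {}   # key -> id of the transaction holding its range

    def _covering_key(self, value):
        # The existing key whose range [k, next_key) contains value.
        return self.keys[bisect_right(self.keys, value) - 1]

    def lock_range(self, txn, low, high):
        """Lock every key range overlapping [low, high], all or nothing."""
        start = bisect_right(self.keys, low) - 1
        end = bisect_right(self.keys, high)
        needed = self.keys[start:end]
        if any(self.locked.get(k, txn) != txn for k in needed):
            return False
        for k in needed:
            self.locked[k] = txn
        return True

    def can_insert(self, txn, value):
        """An insert must not land in a range locked by another transaction."""
        owner = self.locked.get(self._covering_key(value))
        return owner is None or owner == txn


locks = KeyRangeLocks([10, 20, 30, 40])
locks.lock_range("T1", 15, 30)   # T1 scans keys between 15 and 30
```

With the keys 10, 20, 30, 40, the scan by T1 locks the ranges that start at 10, 20, and 30. An insert of 25 by another transaction is blocked, which is what prevents phantoms. An insert of 35 is blocked as well even though it lies outside the scanned range, because the gap after 30 is locked together with the key 30; this over-locking is the price of treating key and gap as one unit.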

Key Range Locking and Ghost Records

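In many B-tree implementations, a user transaction that requests a deletion does not physically remove the record. Instead the record is marked as deleted, usually by setting a field that flags it as "pseudo-deleted" or as a "ghost record".

Disadvantages

Because ghost records physically exist, queries must explicitly filter them out
Even after deletion, a ghost record still occupies storage, because space reclamation happens asynchronously.

Advantages

Because of ghost records, any insertion whose key matches an existing ghost record turns into an update. The benefits:
- The lock granularity shrinks to the key value
- For systems that keep a log, this greatly reduces the log information written to disk
Ghost records reduce the complexity of transaction rollback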
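
The behavior described in this section can be sketched with a toy leaf page. Everything here is hypothetical and heavily simplified: `LeafPage` stores each record as a `[value, is_ghost]` pair in a dictionary, and `cleanup` runs synchronously even though real reclamation is asynchronous.

```python
class LeafPage:
    """Toy leaf page mapping key -> [value, is_ghost]."""

    def __init__(self):
        self.records = {}

    def insert(self, key, value):
        record = self.records.get(key)
        if record is not None and record[1]:
            # A ghost with this key exists: revive it in place,
            # so the insertion becomes an update of an existing record.
            record[0], record[1] = value, False
            return "update"
        if record is not None:
            raise KeyError(f"duplicate key {key!r}")
        self.records[key] = [value, False]
        return "insert"

    def delete(self, key):
        record = self.records.get(key)
        if record is None or record[1]:
            raise KeyError(key)
        record[1] = True   # logical delete: only the ghost flag changes

    def scan(self):
        # Queries have to filter ghosts out explicitly.
        return [(k, r[0]) for k, r in sorted(self.records.items()) if not r[1]]

    def cleanup(self):
        # Stand-in for asynchronous space reclamation.
        self.records = {k: r for k, r in self.records.items() if not r[1]}
```

Rolling back a deletion in this model only means clearing the ghost flag again, which hints at why ghost records simplify rollback. A real system would additionally have to make sure that cleanup never removes a ghost that an active transaction still holds a lock on.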

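Locking in Nonunique Indexes

In a nonunique clustered index, one key can correspond to multiple row values. Enumerating the possibilities, the locking options in this case are:

Key value locking: lock each value (locks the key together with the values of all its rows)
Lock each unique pair of value and row identifier (locks the key together with the value of one specific row)

Note: the latter allows higher concurrency.

Question to consider: does the choice between these two locking schemes depend on how the data is laid out in memory or on disk?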

Increment Lock Modes

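Increment locks are incompatible with read locks and write locks
Holding an increment lock does not pin down the locked value: a transaction holding an increment lock cannot determine the current value unless it also acquires a read lock
Read Lock + Increment Lock = Write Lock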
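
These rules can be written down as a small compatibility table. The sketch below goes one step beyond the list above by also stating that two increment locks are compatible with each other, which is the usual motivation for the mode since increments commute. The mode letters and the function names `compatible` and `combine` are chosen for illustration only.

```python
# Lock modes: S = read (shared), X = write (exclusive), I = increment.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "X"): False, ("S", "I"): False,
    ("X", "S"): False, ("X", "X"): False, ("X", "I"): False,
    ("I", "S"): False, ("I", "X"): False, ("I", "I"): True,
}


def compatible(held, requested):
    """Can `requested` be granted while another transaction holds `held`?"""
    return COMPATIBLE[(held, requested)]


def combine(a, b):
    """Mode one transaction needs if it wants both a and b on the same item."""
    if a == b:
        return a
    return "X"   # S + I = X, and anything combined with X is X
```

For example, two transactions that each add to the same counter can both hold `I`, but as soon as one of them wants to read the counter it needs `combine("S", "I")`, which is a full write lock.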

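Note: an increment lock may look like optimistic locking, but it is still acquired before the update like any other lock; the extra concurrency comes from increments commuting, so increment locks are compatible with one another.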

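Closing Thoughts

I changed the opening from "last weekend" to "the weekend before last", once again finding myself the perfect excuse and successfully putting off the goal I had set for myself. This is hardly a good state to be in, so I need to keep working hard at correcting it.

Isn't life just a process of overcoming yourself and compromising with yourself, over and over again?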