CMU Database Systems - Concurrency Control Theory

Concurrency control is one of the most difficult topics in database theory.

Start with the unit that concurrency control operates on, the transaction.

It is defined as follows:

In fact, the key requirement for a transaction is that it satisfies the ACID properties:

The formal definitions are on the left, next to the intuitive reading of each property.
Of the four, Consistency is probably the hardest to understand; the others are more intuitive. Consistency is not really a significant problem for a single-node database, but it is a major problem for a distributed database.

 

The question then is how to design a system whose transactions satisfy ACID.

A simple approach is the Strawman System.

Serial execution guarantees consistency and isolation, and copying the entire database before each transaction guarantees atomicity. This method was used in early versions of SQLite, which mainly targets embedded scenarios where traffic and data volumes are small.

But the performance of this approach is far too low.
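To make the strawman idea concrete, here is a minimal in-memory sketch. The class name `StrawmanDB` and the two example transactions are made up for illustration; a real system would copy a database file, not a Python dict.

```python
import copy


class StrawmanDB:
    """Hypothetical strawman: one transaction at a time, on a full copy."""

    def __init__(self, data):
        self.data = dict(data)

    def run(self, txn):
        # Copy the entire database before the transaction touches anything.
        working = copy.deepcopy(self.data)
        try:
            txn(working)      # the transaction only ever mutates the copy
        except Exception:
            return False      # abort: drop the copy, the original is untouched
        self.data = working   # commit: the copy becomes the database
        return True


def transfer(db):
    db["A"] -= 100
    db["B"] += 100


def failing_transfer(db):
    db["A"] -= 100
    raise RuntimeError("crash between the two updates")


db = StrawmanDB({"A": 1000, "B": 1000})
committed = db.run(transfer)          # applied as a whole
aborted = db.run(failing_transfer)    # leaves no partial update behind
```

Because `run` handles one transaction at a time and copies everything, it is easy to see why this is correct and also why it is slow.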

Next, look at each ACID property in turn and how systems are generally designed to satisfy it.

 

Atomicity

The atomicity problem:

The most commonly used method is logging (undo and redo): if a transaction aborts, it can be rolled back using the undo log, which ensures atomicity.
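A rough sketch of the undo half of this idea, assuming an in-memory key-value store (the class `UndoLogDB` and its method names are hypothetical, and the redo log and durable storage are left out entirely):

```python
_MISSING = object()  # marks "key did not exist before this write"


class UndoLogDB:
    """Hypothetical store that updates in place and keeps an undo log."""

    def __init__(self, data):
        self.data = dict(data)
        self.undo_log = []  # (key, old_value) records of the active transaction

    def write(self, key, value):
        # Log the old value first, then overwrite in place.
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # Nothing to roll back any more.
        self.undo_log.clear()

    def abort(self):
        # Replay the log backwards so the oldest value of each key wins.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old


db = UndoLogDB({"A": 1000, "B": 1000})
db.write("A", 900)
db.write("C", 1)
db.abort()                      # half-finished transaction is rolled back
after_abort = dict(db.data)

db.write("A", 900)
db.write("B", 1100)
db.commit()
after_commit = dict(db.data)
```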

Another approach is shadow paging, which is similar in spirit to MVCC: when I modify a page, I make a copy of it and change the copy. A typical user is CouchDB.
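The copy-then-modify idea can be sketched as follows. This is only an illustration under simple assumptions: pages are Python dicts, there is one transaction at a time, and the class `ShadowPagedDB` and its methods are invented names, not CouchDB's API.

```python
class ShadowPagedDB:
    """Hypothetical page store: writes go to private page copies."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page_id -> page contents (committed state)
        self.txn_pages = None      # private copies made by the active transaction

    def begin(self):
        self.txn_pages = {}

    def read(self, page_id, key):
        # Prefer the transaction's own copy if it has modified this page.
        if self.txn_pages is not None and page_id in self.txn_pages:
            return self.txn_pages[page_id][key]
        return self.pages[page_id][key]

    def write(self, page_id, key, value):
        # Copy the page on first write; the committed page is never touched.
        if page_id not in self.txn_pages:
            self.txn_pages[page_id] = dict(self.pages[page_id])
        self.txn_pages[page_id][key] = value

    def commit(self):
        # Build a new page table pointing at the modified copies, then swap it in.
        self.pages = {**self.pages, **self.txn_pages}
        self.txn_pages = None

    def abort(self):
        # Throwing away the copies is all the rollback that is needed.
        self.txn_pages = None


db = ShadowPagedDB({0: {"A": 1000}, 1: {"B": 1000}})
db.begin()
db.write(0, "A", 900)
seen_inside_txn = db.read(0, "A")
db.abort()
after_abort = db.read(0, "A")

db.begin()
db.write(0, "A", 900)
db.commit()
after_commit = db.read(0, "A")
```

Compared with an undo log, abort here is trivial, at the cost of copying whole pages on every first write.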

 

Consistency

First, consistency is logical. What does that mean?
It has nothing to do with the implementation: when you do a transfer, the data on both sides should be logically correct, no matter how the lower layers are implemented or what data structures they use.

Database consistency means that once an update has been applied, it should look the same from every point of view, for example from different transactions, different clients, and so on.
For a single-node database this is actually relatively easy to achieve.

Transaction consistency refers to application-level, external consistency, not just consistency inside the database. The application itself has to guarantee this kind of consistency.

 

Isolation

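Isolation means that while a transaction executes, it is not disturbed by any other transaction.

Because we want better database performance, we cannot run transactions serially, so transactions must execute concurrently, and that inevitably raises the problem of interleaving.

Physically isolating transactions from each other would be inefficient, so in practice the transactions are executed interleaved, while taking care to avoid conflicts.

Avoiding conflicts generally means taking locks, which naturally leads to the distinction between pessimistic and optimistic locking.

How to avoid conflicts is set aside for now.

First, what problems does interleaved execution cause?

Take this example with two transactions: one transfers money, the other adds interest.

If T1 and T2 execute serially, either order gives a consistent result (the individual balances can differ between the two orders, but the total is the same).

Note that which of T1 and T2 runs first is something the application has to control; as far as the database is concerned, either order is correct.

So the result of an interleaved execution may be good or it may be bad.

How do we judge good from bad? The intuition is simple: if the result matches that of some serial execution it is good, otherwise it is bad.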
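The good/bad distinction can be replayed in a few lines. The concrete numbers here (both accounts start at 1000, T1 moves 100 from A to B, T2 adds 6% interest) are assumptions chosen for the sketch, and integer arithmetic is used to keep the results exact.

```python
def run(schedule, a=1000, b=1000):
    """Apply a list of steps, in order, to a fresh pair of accounts."""
    state = {"A": a, "B": b}
    for step in schedule:
        step(state)
    return state


# T1: transfer 100 from A to B (assumed amounts)
def t1_debit(s):
    s["A"] -= 100

def t1_credit(s):
    s["B"] += 100

# T2: add 6% interest to both accounts (assumed rate)
def t2_interest_a(s):
    s["A"] = s["A"] * 106 // 100

def t2_interest_b(s):
    s["B"] = s["B"] * 106 // 100


serial_t1_t2 = run([t1_debit, t1_credit, t2_interest_a, t2_interest_b])
serial_t2_t1 = run([t2_interest_a, t2_interest_b, t1_debit, t1_credit])

# T2 touches each account only after T1 is done with it.
good = run([t1_debit, t2_interest_a, t1_credit, t2_interest_b])
# T2 pays interest on B before T1 has credited it: 100 earns no interest.
bad = run([t1_debit, t2_interest_a, t2_interest_b, t1_credit])


def is_good(result):
    """Good means: identical to the outcome of some serial order."""
    return result in (serial_t1_t2, serial_t2_t1)
```

With these numbers both serial orders end with a total of 2120, while the bad interleaving ends with 2114, so 6 has silently disappeared.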

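So a whole pile of concepts is introduced here, but all they are trying to say is that the result of an interleaved execution must match that of a serial execution.

This gives transaction scheduling a lot of flexibility: as long as the schedule is serializable, transactions can be scheduled concurrently in any way.

Definition of a serializable schedule: a schedule that is equivalent to some serial schedule, that is, one that produces the same result.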

 

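So how do we determine whether a schedule is serializable?

First, look at what goes wrong when a schedule is not serializable: conflicts.

The two sides of a conflict must be in different transactions and operate on the same object, and at least one of them must be a write.

So there are 3 kinds of conflict: read-write, write-read, and write-write. Read-read never conflicts.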
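The conflict rule above fits in one predicate. The `Op` tuple and its field names are made up for this sketch.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "R" for read, "W" for write
    obj: str    # object name, e.g. "A"


def conflicts(x, y):
    """Different transactions, same object, at least one write."""
    return x.txn != y.txn and x.obj == y.obj and "W" in (x.kind, y.kind)


read_read = conflicts(Op("T1", "R", "A"), Op("T2", "R", "A"))
read_write = conflicts(Op("T1", "R", "A"), Op("T2", "W", "A"))
write_read = conflicts(Op("T1", "W", "A"), Op("T2", "R", "A"))
write_write = conflicts(Op("T1", "W", "A"), Op("T2", "W", "A"))
same_txn = conflicts(Op("T1", "R", "A"), Op("T1", "W", "A"))
other_object = conflicts(Op("T1", "W", "A"), Op("T2", "W", "B"))
```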

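So the idea now is to look at the conflicts in a schedule: if the conflicting operations are ordered the same way as in some serial schedule, we can consider the schedule serializable.

Stated formally: if S is conflict equivalent to some serial schedule, then S is conflict serializable. If the conflicts above occur in an order that no serial schedule can reproduce, S cannot be conflict equivalent to a serial schedule.

Note that when checking for conflicts we generally only look at whether two transactions read and write the same object. For example, for an unrepeatable read we do not check whether a second read actually follows, and a dirty read causes no problem if no abort follows.
So this is a sufficient but not necessary condition: a schedule that is not conflict serializable does not necessarily produce a wrong result, but one that is conflict serializable is guaranteed to produce a correct one.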

 

 

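Next we need a method for deciding whether S is conflict serializable.

We can swap any adjacent operations that do not conflict, and see whether the schedule can eventually be turned into a serial schedule.

Example:

This method looks simple, but it becomes hard to carry out when there are many transactions.

So a more formal method is needed, called the dependency graph.

It simply draws an edge for each conflict dependency between transactions. If the graph has a cycle, the schedule is not conflict serializable.

For example, the example on the right is not conflict serializable.

Example 2, on the other hand, is conflict serializable, because its dependency graph has no cycle.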
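The dependency-graph test is mechanical enough to sketch in code. The two schedules below are my own small examples, not the ones from the slides, and the function names are invented for illustration. An operation is written as a `(transaction, kind, object)` tuple.

```python
from collections import defaultdict


def dependency_edges(schedule):
    """Edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (ti, ki, oi) in enumerate(schedule):
        for tj, kj, oj in schedule[i + 1:]:
            if ti != tj and oi == oj and "W" in (ki, kj):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search that reports a cycle when it meets a node still in progress."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    nodes = {n for edge in edges for n in edge}
    state = {}  # node -> "visiting" or "done"

    def visit(n):
        state[n] = "visiting"
        for m in graph[n]:
            if state.get(m) == "visiting":
                return True
            if m not in state and visit(m):
                return True
        state[n] = "done"
        return False

    return any(visit(n) for n in nodes if n not in state)


def is_conflict_serializable(schedule):
    return not has_cycle(dependency_edges(schedule))


# T1 goes first on A, but T2 goes first on B: edges in both directions.
cyclic = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T2", "R", "B"), ("T2", "W", "B"),
    ("T1", "R", "B"), ("T1", "W", "B"),
]
# T1 goes first on both A and B: only T1 -> T2.
acyclic = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "B"), ("T2", "W", "B"),
]
```

For an acyclic graph, any topological order of the transactions gives an equivalent serial schedule, which is why the cycle check is enough.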

 

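Everything so far has been about conflict serializability.

There is also a broader notion, called view serializability.

The definition is hard to understand. Judging from the example, some cases that are not conflict serializable still compute the right result. In the example the writes are blind writes, so the final value of A is determined solely by the last write, and the schedule can still be forced to be equivalent to a serial schedule.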
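A small simulation shows the blind-write effect. The schedule below is an assumption on my part (T1 reads and then writes A, while T2 and T3 write A without reading it), and comparing "who read from whom" plus "who wrote last" is only a simplified stand-in for the full definition of view equivalence on a single object.

```python
def execute(schedule):
    """Run (txn, kind) operations on one object A.

    Every write stores the writer's name, so each read records which
    transaction's value it saw, and the final value names the last writer.
    """
    value = "initial"
    reads = []
    for txn, kind in schedule:
        if kind == "R":
            reads.append((txn, value))
        else:
            value = txn
    return sorted(reads), value


# R1(A), W2(A), W1(A), W3(A): T1 -> T2 and T2 -> T1 both hold,
# so the dependency graph has a cycle.
interleaved = [("T1", "R"), ("T2", "W"), ("T1", "W"), ("T3", "W")]
# The serial order T1, T2, T3.
serial = [("T1", "R"), ("T1", "W"), ("T2", "W"), ("T3", "W")]

interleaved_view = execute(interleaved)
serial_view = execute(serial)
```

In both runs T1 reads the initial value and T3 performs the last write, so nobody can tell the two schedules apart, even though the interleaved one is not conflict serializable.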

 

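View serializability admits more, and more flexible, schedules than conflict serializability, but it is hard to check and hard to implement.

So overall the relationship looks like this: the larger the class, the more flexible the scheduling, but the more complex the mechanism and the check.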

 

Durability

 

Origin www.cnblogs.com/fxjwind/p/10979823.html