Database Transaction Isolation

Read phenomena

There are three different read phenomena that can occur when Transaction 1 reads data that Transaction 2 might have changed:

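dirty reads: a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed. (Unlike a non-repeatable read, Transaction 2 does not need to commit for the same statement in Transaction 1 to return different results on two executions.)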
non-repeatable reads: during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
phantom reads: in the course of a transaction, another transaction adds or removes rows in the set of records being read. (The set of rows that satisfies the query condition changes because of another recently committed transaction.)
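
The three phenomena can be reproduced with a toy in-memory table. This is only a sketch: `ToyTable`, its methods, and the `see_uncommitted` flag are hypothetical names invented for illustration, and the "transactions" are just interleaved calls in one thread, not a real database.

```python
class ToyTable:
    """A store with no isolation at all (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.committed = dict(rows)  # row id -> value, as of the last commit
        self.pending = {}            # uncommitted writes of the other transaction

    def write(self, row_id, value):
        self.pending[row_id] = value

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()

    def read(self, row_id, see_uncommitted=False):
        if see_uncommitted and row_id in self.pending:
            return self.pending[row_id]
        return self.committed[row_id]

    def select(self, predicate):
        return sorted(k for k, v in self.committed.items() if predicate(v))


def dirty_read():
    t = ToyTable({1: 20})
    t.write(1, 21)                           # T2 updates but does not commit
    seen = t.read(1, see_uncommitted=True)   # T1 sees the uncommitted value
    t.rollback()                             # T2 rolls back
    return seen, t.read(1)                   # T1 saw a value that never existed


def non_repeatable_read():
    t = ToyTable({1: 20})
    first = t.read(1)                        # T1 reads the row
    t.write(1, 21)
    t.commit()                               # T2 updates and commits
    return first, t.read(1)                  # T1 reads the same row again


def phantom_read():
    t = ToyTable({1: 20, 2: 25})
    in_range = lambda age: 10 <= age <= 30
    first = t.select(in_range)               # T1 runs a range query
    t.write(3, 27)
    t.commit()                               # T2 inserts a matching row, commits
    return first, t.select(in_range)         # T1 repeats the same query


print(dirty_read())           # (21, 20)
print(non_repeatable_read())  # (20, 21)
print(phantom_read())         # ([1, 2], [1, 2, 3])
```

Note that only the dirty read depends on seeing uncommitted data; the other two occur even though every value T1 reads was committed.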

Isolation level

read uncommitted: [lowest isolation level] one transaction may see not-yet-committed changes made by other transactions, so even dirty reads can occur at this isolation level.
read committed: prevents dirty reads; any data read is committed at the moment it is read. (A lock-based concurrency control implementation keeps write locks until the end of the transaction, but read locks are released as soon as the SELECT operation is performed, so non-repeatable reads can occur at this isolation level.)
repeatable read: a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range locks are not managed, so phantom reads can occur at this isolation level. [No matter how many times the same rows are read, the result is consistent.]
serializable: [highest isolation level] requires read and write locks (acquired on selected data) to be held until the end of the transaction. Range locks must also be acquired when a SELECT query uses a ranged WHERE clause, to avoid the phantom read phenomenon. [The result of concurrently executing transactions is exactly equivalent to executing them one after another.] [Adding range locks on top of the repeatable read level guarantees that two queries within one transaction return exactly the same result, so the first result can never be a subset of the second.]
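
The relationship between the four levels and the three read phenomena can be summarized as a small lookup table. The sketch below restates the descriptions above; the names `ALLOWED` and `weakest_level_preventing` are made up for illustration.

```python
# Read phenomena that each isolation level still permits,
# listed from the lowest level to the highest.
ALLOWED = {
    "read uncommitted": {"dirty read", "non-repeatable read", "phantom read"},
    "read committed": {"non-repeatable read", "phantom read"},
    "repeatable read": {"phantom read"},
    "serializable": set(),
}


def weakest_level_preventing(phenomena):
    """Return the lowest level that permits none of the given phenomena."""
    unwanted = set(phenomena)
    for level, allowed in ALLOWED.items():  # dicts keep insertion order
        if not (allowed & unwanted):
            return level


print(weakest_level_preventing({"dirty read"}))           # read committed
print(weakest_level_preventing({"non-repeatable read"}))  # repeatable read
print(weakest_level_preventing({"phantom read"}))         # serializable
```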

Multiversion concurrency control

Of the ACID properties, isolation is the most difficult to deal with in practice. Isolation guarantees that all transactions, even if running at the same time, are executed “as if” they were executed serially. In practice, achieving isolation while keeping reasonable performance requires quite a few compromises, which brings us to the topic of transaction isolation levels. There are two very different approaches to implementing isolated transactions and concurrency control: lock-based concurrency control (LBCC) and multiversion concurrency control (MVCC).

LBCC: concurrency/isolation is implemented solely through locking; in other words, if a transaction uses some piece of data (usually a row), a lock is set on that row and held until the transaction succeeds or fails. It supports the isolation levels mentioned above.
MVCC: the database creates a “previous version” (“snapshot”) of a row being modified and supplies that previous version to any other transaction running concurrently. MVCC-based databases distinguish two snapshot-based isolation levels:
- read committed: reads use a snapshot taken at the moment the read query is issued. Writes still request a write (exclusive) lock.
- repeatable read: reads use a snapshot taken at the beginning of the transaction.

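A simple example: (https://www.zhihu.com/question/279538775/answer/407458020)

Transaction A (txnId=100) modifies X so that X=1, and commits.

Another transaction B (txnId=101) starts and reads X, which is still X=1. B has not committed yet.

A third transaction C (txnId=102) modifies X so that X=2, and commits.

Transaction B reads X again. At this point:

If B runs at Read Committed, it reads the latest committed version of X, which is X=2.

If B runs at Repeatable Read, it reads the latest version of X from before the current transaction (txnId=101) began, which is the version committed by txnId=100, so X=1.

Note that whether B runs at Read Committed or Repeatable Read, it is never blocked by a lock and gets its result immediately. That is the whole point of MVCC.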
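
The example can be replayed with a toy version store. This is a minimal sketch under simplifying assumptions: `MVCCStore` and `Transaction` are hypothetical classes, every write commits atomically, and visibility is decided by a global commit counter rather than by real transaction ids.

```python
class MVCCStore:
    """Keeps every committed version of each key (illustrative only)."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_seq, value), oldest first
        self.commit_seq = 0  # bumped once per committed write

    def commit_write(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

    def snapshot(self):
        # A snapshot is simply "everything committed up to this point".
        return self.commit_seq

    def read(self, key, snapshot):
        visible = [v for seq, v in self.versions.get(key, []) if seq <= snapshot]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        self.begin_snapshot = store.snapshot()  # taken when the transaction starts

    def read(self, key):
        if self.level == "read committed":
            snap = self.store.snapshot()        # fresh snapshot for each read
        else:                                   # "repeatable read"
            snap = self.begin_snapshot          # reuse the snapshot from the start
        return self.store.read(key, snap)       # never waits on a lock


store = MVCCStore()
store.commit_write("X", 1)                      # transaction A commits X=1
b_rc = Transaction(store, "read committed")     # B, in two variants
b_rr = Transaction(store, "repeatable read")
first = (b_rc.read("X"), b_rr.read("X"))
store.commit_write("X", 2)                      # transaction C commits X=2
second = (b_rc.read("X"), b_rr.read("X"))
print(first, second)                            # (1, 1) (2, 1)
```

Because C appends a new version instead of overwriting the old one, both variants of B can read without blocking; they differ only in which snapshot they consult.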

The main advantage of MVCC-based DBs is that writers don’t block readers and vice versa (a committed modification never directly overwrites the existing data; instead it produces a new version that coexists with the old one, so reads can proceed without taking any locks).

Transaction isolation in PostgreSQL

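In PostgreSQL, you can request any of the four possible transaction isolation levels. Internally, however, only three distinct isolation levels are implemented, corresponding to Read Committed, Repeatable Read, and Serializable. If you select Read Uncommitted you actually get Read Committed, and phantom reads are not possible in PostgreSQL's implementation of Repeatable Read, so the actual isolation level may be stricter than the one you selected. (The SQL standard permits this: the four isolation levels only define which phenomena must not happen; they do not define which phenomena must happen.)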
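
The mapping from requested level to effective behavior can be written down as data. This sketch only encodes what the paragraph above states and only covers the three read phenomena from these notes; the dictionary and function names are hypothetical.

```python
# Requested level -> level PostgreSQL actually provides.
EFFECTIVE_LEVEL = {
    "read uncommitted": "read committed",  # silently upgraded
    "read committed": "read committed",
    "repeatable read": "repeatable read",
    "serializable": "serializable",
}

# Of the three read phenomena, those still possible at each effective level.
POSSIBLE_IN_POSTGRESQL = {
    "read committed": {"non-repeatable read", "phantom read"},
    "repeatable read": set(),  # stricter than the standard: no phantom reads
    "serializable": set(),
}


def possible_phenomena(requested_level):
    """Read phenomena a transaction may observe for a requested level."""
    effective = EFFECTIVE_LEVEL[requested_level.lower()]
    return POSSIBLE_IN_POSTGRESQL[effective]


print(sorted(possible_phenomena("READ UNCOMMITTED")))
# ['non-repeatable read', 'phantom read']
```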

Notes

range lock: if a WHERE clause uses a range, the database locks every possible tuple in that range plus the neighboring tuple on each side (the one just before and the one just after). If no neighboring tuple exists in a direction, it locks everything in that direction.
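
The rule above can be sketched with a sorted list of existing keys. This is an illustration of the description in this note, not of any particular engine (real databases differ in the details); `locked_interval` and `insert_blocked` are hypothetical helpers.

```python
import bisect
import math


def locked_interval(sorted_keys, lo, hi):
    """Bounds locked for `WHERE key BETWEEN lo AND hi` on an ordered index."""
    i = bisect.bisect_left(sorted_keys, lo)   # index of the first key >= lo
    j = bisect.bisect_right(sorted_keys, hi)  # index of the first key > hi
    # Extend to the neighbor on each side, or to infinity if there is none.
    lower = sorted_keys[i - 1] if i > 0 else -math.inf
    upper = sorted_keys[j] if j < len(sorted_keys) else math.inf
    return lower, upper


def insert_blocked(sorted_keys, lo, hi, new_key):
    """Would inserting new_key conflict with the range lock?"""
    lower, upper = locked_interval(sorted_keys, lo, hi)
    return lower <= new_key <= upper


keys = [5, 10, 20, 30, 40]
print(locked_interval(keys, 10, 30))     # (5, 40)
print(insert_blocked(keys, 10, 30, 25))  # True: a would-be phantom is blocked
print(insert_blocked(keys, 10, 30, 45))  # False: outside the locked interval
print(locked_interval(keys, 35, 50))     # (30, inf)
```

The last call shows the open-ended case: no key exists above 50, so the lock extends without bound in that direction.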

lost update: an update made to a data item by one transaction is lost because it is overwritten by an update from another transaction (or undone when that other transaction updates and then rolls back).
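
A lost update is easy to reproduce with two interleaved read-modify-write sequences. The first function below shows the anomaly; the `VersionedRow` class shows one common remedy, an optimistic version check, which is an addition for illustration and not something these notes prescribe. All names are hypothetical.

```python
def lost_update():
    balance = 100            # committed value
    t1_seen = balance        # T1 reads 100
    t2_seen = balance        # T2 reads 100
    balance = t1_seen + 10   # T1 writes 110
    balance = t2_seen + 20   # T2 writes 120, overwriting T1's update
    return balance           # a serial execution would give 130


class VersionedRow:
    """A row that rejects writes based on a stale read (optimistic check)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, seen_version):
        if self.version != seen_version:
            return False     # someone else wrote in between: caller must retry
        self.value = new_value
        self.version += 1
        return True


print(lost_update())                              # 120

row = VersionedRow(100)
v1, ver1 = row.read()                             # T1 reads
v2, ver2 = row.read()                             # T2 reads
ok1 = row.write_if_unchanged(v1 + 10, ver1)       # T1 succeeds
ok2 = row.write_if_unchanged(v2 + 20, ver2)       # T2 is rejected
v2, ver2 = row.read()                             # T2 retries with fresh data
ok3 = row.write_if_unchanged(v2 + 20, ver2)
print(ok1, ok2, ok3, row.value)                   # True False True 130
```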