Analysis of the write-behind logging paper

 

Write-behind logging (WBL)

Basic idea

NVM is byte-addressable, offers performance close to DRAM, and shows little gap between sequential and random access. The 2016 VLDB paper "Write-Behind Logging" designs a new logging and recovery protocol specifically for NVM. The main idea is to drop the traditional append-only redo and undo logs, while still retaining the undo information needed to roll back uncommitted transactions. Before a transaction commits, all of its modifications must be forcibly flushed, and only then is the commit mark recorded in the log; this log is the WBL discussed here. During recovery, the commit marks are analyzed and uncommitted transactions are rolled back using the undo information.

The paper builds a series of optimizations on this idea; the mechanism is described below. One complaint first: the paper is not very clearly written and is hard to follow. What follows is my own reading of the mechanism, and corrections are welcome if anything is wrong.

Mechanism

1. Several concepts

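Entry structure in the DTT (dirty tuple table): transaction ID + table ID + location of the change

Tuple structure in the data page:

tuple ID + transaction ID + begin commit timestamp + end commit timestamp + tuple ID of the previous version + data

cp: a commit timestamp; the data of transactions that committed after cp is not guaranteed to have been persisted to disk

cd: a commit timestamp greater than cp that will not be assigned to any transaction before the next group commit finishes, so (cp, cd) bounds the transactions whose changes may not be durable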
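
To make the field lists above concrete, here is a minimal Python sketch of the three structures. The class and field names (DTTEntry, TupleVersion, CommitGap, may_be_unpersisted) are my own labels and do not come from the paper. The source notes do not define cd, so the sketch assumes it is an upper bound on the commit timestamps handed out before the next group commit, and that the gap excludes both end points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DTTEntry:
    """One dirty tuple table entry: who changed what, and where."""
    txn_id: int
    table_id: int
    location: int  # position of the change; the new value itself is not stored


@dataclass
class TupleVersion:
    """One tuple version in a data page, following the field list above."""
    tuple_id: int
    txn_id: int
    begin_ts: Optional[int]       # begin commit timestamp, None until commit
    end_ts: Optional[int]         # end commit timestamp, None while current
    prev_tuple_id: Optional[int]  # tuple ID of the previous version, if any
    data: bytes


@dataclass(frozen=True)
class CommitGap:
    """The (cp, cd) pair stored in a WBL record (boundaries are an assumption)."""
    cp: int
    cd: int

    def may_be_unpersisted(self, commit_ts: int) -> bool:
        # Timestamps strictly inside the gap are not guaranteed durable.
        return self.cp < commit_ts < self.cd
```

With the gap from the example later in this post, CommitGap(cp=5, cd=100), timestamp 6 falls inside the gap while 5 does not.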

2. A transaction operation process

Begin

Perform the operations, modifying data pages in DRAM

Add an entry to the DTT; the entry does not include the inserted value

Commit

1) Record the commit timestamp t1 of each transaction

2) Scan the DTT to find the entries that belong to the transaction

3) Calculate the cp and cd values

4) Persist the tuples referenced by the DTT to disk; at this point the commit timestamp t1 is written into each tuple

5) Persist the WBL record, made up of cp and cd, to NVM

6) Signal that the group commit is complete and release the DTT entries

Rollback

    1) Roll back using the information in the DTT.
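
The steps above can be sketched as a toy, single-threaded engine in Python. This is only an illustration of the ordering, not the paper's implementation: plain dicts and lists stand in for DRAM, disk and the NVM log, every name (ToyEngine, modify, GAP_WINDOW) is hypothetical, each tuple version is assumed to live at its own location, and the rule cp = t1, cd = t1 + GAP_WINDOW is my assumption because the notes do not say how cp and cd are computed.

```python
from dataclasses import dataclass, field

GAP_WINDOW = 100  # hypothetical: how far ahead of cp the bound cd is placed


@dataclass
class ToyEngine:
    dram: dict = field(default_factory=dict)     # (table_id, location) -> [txn_id, commit_ts, value]
    durable: dict = field(default_factory=dict)  # stands in for persistent tuple storage
    wbl: list = field(default_factory=list)      # persisted (cp, cd) records
    dtt: list = field(default_factory=list)      # (txn_id, table_id, location) entries
    clock: int = 0                               # source of commit timestamps

    def modify(self, txn_id, table_id, location, value):
        # Change the version in DRAM; its commit timestamp is unknown for now.
        self.dram[(table_id, location)] = [txn_id, None, value]
        # The DTT entry records only where the change is, not the value.
        self.dtt.append((txn_id, table_id, location))

    def commit(self, txn_id):
        self.clock += 1
        t1 = self.clock                                  # 1) commit timestamp
        mine = [e for e in self.dtt if e[0] == txn_id]   # 2) scan the DTT
        cp, cd = t1, t1 + GAP_WINDOW                     # 3) assumed rule
        for _, table_id, location in mine:               # 4) stamp, then persist
            version = self.dram[(table_id, location)]
            version[1] = t1
            self.durable[(table_id, location)] = tuple(version)
        self.wbl.append((cp, cd))                        # 5) log written last
        self.dtt = [e for e in self.dtt if e[0] != txn_id]  # 6) release entries
        return t1

    def rollback(self, txn_id):
        # Undo needs only the locations in the DTT: drop the new versions.
        for owner, table_id, location in self.dtt:
            if owner == txn_id:
                self.dram.pop((table_id, location), None)
        self.dtt = [e for e in self.dtt if e[0] != txn_id]
```

The point to notice is the order inside commit: the data is persisted in step 4 before the log record is appended in step 5, which is the reverse of write-ahead logging.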

3. Diagram of a transaction operation process

[Figure: image.png]

If the system fails while trx6 is committing, then on restart the last WBL record is found by scanning the WBL log file, namely {4, (5, 100)}: the active transaction is 4, and transactions with timestamps greater than 5 are not committed. Once this analysis is done, recovery is complete and new transactions can be accepted.

But how is the dirty data on disk handled? A single recovery thread is started to scan the table records. If a record's timestamp is greater than 5, such as the record of transaction 6, it is invisible and has to be garbage-collected. The records of transactions 1, 3, 2 and 5 are visible and are left alone. Transaction 4 is in the group commit's list of uncommitted transactions, so its record is garbage-collected as well.
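
The recovery walk-through above can be sketched in Python as follows. This is a rough model under stated assumptions, not the paper's code: WBLRecord, analyze, is_visible and garbage_collect are hypothetical names, the record layout {uncommitted list, (cp, cd)} follows my reading of the example, and, as in the example, a transaction's number doubles as the timestamp on its records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WBLRecord:
    uncommitted: frozenset  # transactions listed as not committed in the group commit
    cp: int
    cd: int


def analyze(log):
    """Analysis phase: only the last WBL record in the log file is needed."""
    return log[-1]


def is_visible(ts, record):
    """Decide whether a record stamped with ts survives recovery."""
    if ts in record.uncommitted:
        return False                 # e.g. transaction 4 in the example
    if record.cp < ts < record.cd:
        return False                 # inside the gap, e.g. transaction 6
    return True


def garbage_collect(table, record):
    """Single-threaded scan that keeps only the visible records."""
    return [ts for ts in table if is_visible(ts, record)]


log = [WBLRecord(frozenset(), 3, 100), WBLRecord(frozenset({4}), 5, 100)]
last = analyze(log)
survivors = garbage_collect([1, 3, 2, 5, 4, 6], last)
```

Here the first log entry is made up purely to show that analysis skips to the last record; the second is the {4, (5, 100)} record from the example. Only the analyze step has to finish before new transactions are accepted; the scan in garbage_collect can run afterwards.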

4. Shortcomings and doubts

1) The paper does not specify how the records are garbage-collected: are they checked and handled when later transactions access them, or does a separate garbage-collection thread scan and check them? If the data volume is very large, wouldn't the scan be expensive? And once the whole scan has finished, are the WBL records that are no longer needed reclaimed?

2) In a high-availability scenario WBL cannot meet the requirements on its own; the corresponding WAL still needs to be replicated

3) The subsequent visibility check becomes more complicated, and the paper does not describe it in detail

Original paper and references

http://www.vldb.org/pvldb/vol10/p337-arulraj.pdf

http://mysql.taobao.org/monthly/2019/01/01/

https://github.com/cmu-db/peloton/wiki/Write-Ahead-Logging



Origin blog.51cto.com/yanzongshuai/2606303