Database transaction

Preface

Transactions are an important concept in database systems. Understanding them is a prerequisite for developing applications that interact with a database correctly. However, many students' understanding of transactions is one-sided and superficial: they simply equate transactions with ACID, without understanding why database systems introduce transactions in the first place, what ACID means for a transaction, and, most importantly, how a database system guarantees the ACID properties of transactions.

1. Understanding transactions

1.1 Why do we need database transactions

Transferring money is a common operation, for example transferring 100 yuan from account A to account B. From the user's point of view this is a single logical operation, but in a database system it takes at least two steps:

  • 1. Decrease the balance of account A by 100 yuan.
  • 2. Increase the balance of account B by 100 yuan.


The following problems may occur during this process:

  • 1. The first step succeeded and account A was reduced by 100 yuan, but the second step failed or the system crashed before it executed, so account B was never increased by 100 yuan.
  • 2. The system crashed just after the transfer completed, and the transfer made before the crash was lost when the system restarted and recovered.
  • 3. Another user transferred money to account B at the same time, and the concurrent operations on account B left its balance in an abnormal state.

To solve these problems, the concept of a database transaction must be introduced.

1.2 What is a database transaction

Definition: a database transaction is a set of operations that constitutes a single logical unit of work.
A typical database transaction looks like this:

BEGIN TRANSACTION  // transaction begins
SQL1
SQL2
COMMIT/ROLLBACK   // transaction commits or rolls back

There are several points to note about this definition of a transaction:

  • 1. A database transaction can include one or more database operations, but these operations form a logical whole.
  • 2. The operations that form this logical whole either all execute successfully or none of them execute.
  • 3. The operations that form a transaction either all take effect on the database or none of them do; that is, whether or not the transaction succeeds, the database always remains in a consistent state.
  • 4. The above holds even in the presence of database failures and concurrent transactions.

1.3 How transactions solve these problems

For the transfer example above, all operations involved in the transfer can be wrapped in a single transaction:

BEGIN TRANSACTION
Decrease account A by 100 yuan
Increase account B by 100 yuan
COMMIT

  • 1. When a database operation fails or the system crashes, the system can recover to the transaction boundary, so account A can never end up decreased without account B being increased.
  • 2. When multiple users operate on the database at the same time, the database can perform concurrency control at transaction granularity, so that different users' transfers into account B are isolated from each other.

Transactions make failure recovery and concurrency control more convenient for the system, thereby ensuring the consistency of the database state.
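As a concrete illustration, here is a minimal sketch of the transfer transaction using Python's built-in sqlite3 module. The `accounts` table, the account names, and the CHECK constraint are all made up for the example:

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`: either both updates apply or neither."""
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        cur.execute("COMMIT")      # both updates become durable together
    except Exception:
        cur.execute("ROLLBACK")    # neither update takes effect
        raise

# isolation_level=None disables implicit transactions; we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 0)")

transfer(conn, 'A', 'B', 100)      # succeeds: A = 900, B = 100
```

If the second UPDATE fails (here, the illustrative CHECK constraint plays the role of a crash), the ROLLBACK ensures account A is not decreased without account B being increased.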

1.4 Overview of ACID characteristics of transactions and implementation principles

Atomicity: the operations in a transaction are as indivisible as an atom; they either all succeed or all fail.

Consistency: the execution of a transaction must take the database from one consistent state to another consistent state. A consistent state means that: 1. the state of the system satisfies the data integrity constraints (primary keys, referential integrity, check constraints, etc.); 2. the state of the system reflects the real-world state the database is meant to describe; for example, the sum of the two account balances should be the same before and after a transfer.

Isolation: concurrently executing transactions do not affect one another; their effect on the database is the same as if they had executed serially. For example, if multiple users transfer money into the same account at the same time, the final balance should be the same as if they had transferred one after another.

Durability: once a transaction commits, its updates to the database are permanent; no later transaction or system failure can cause the data to be lost.

Among the ACID properties, consistency (C) is the fundamental goal of a transaction. Damage to data consistency mainly comes from two sources:

  • 1. Concurrent execution of transactions
  • 2. Transaction failure or system failure

Database systems use concurrency control technology and log recovery technology to avoid these situations.

Concurrency control technology guarantees the isolation of transactions, so that the consistent state of the database is not destroyed by concurrent operations.
Log recovery technology guarantees the atomicity of transactions, so that the consistent state is not destroyed by transaction or system failures; at the same time, committed changes to the database are not lost in a system crash, which guarantees the durability of transactions.

2. Concurrency anomalies and concurrency control technology

2.1 Common concurrency anomalies

Before explaining concurrency control techniques, let's briefly introduce the common concurrency anomalies in databases.

  • Dirty write
    A dirty write occurs when a transaction's rollback wipes out another transaction's committed modification to a data item. For example, transactions 1 and 2 both write data item A; transaction 2 commits its modification, but transaction 1 then rolls back, restoring A to the value it saw before its own write and thereby rolling back transaction 2's committed change.

  • Lost update
    A lost update occurs when a transaction overwrites another transaction's committed change to a data item, making that change appear to be lost. For example, transactions 1 and 2 both read A as 10; transaction 1 adds 10 and commits (A = 20), then transaction 2, still working from the value 10 it read, subtracts 10 and commits, leaving A at 0. Transaction 1's committed modification appears to be lost.
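The schedule above can be reproduced with a toy in-memory "database" (a plain Python dict standing in for the real thing):

```python
# A toy illustration of a lost update: both "transactions" read the same
# initial value, so the second commit overwrites the first one's effect.
db = {"A": 10}

# Both transactions take their snapshot of A before either writes.
t1_read = db["A"]          # T1 reads A = 10
t2_read = db["A"]          # T2 reads A = 10

db["A"] = t1_read + 10     # T1 commits A = 20
db["A"] = t2_read - 10     # T2 commits A = 0, overwriting T1's update

print(db["A"])             # prints 0 -- T1's +10 has silently disappeared
```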

  • Dirty read
    A dirty read occurs when a transaction reads data that another transaction has not yet committed. For example, while transaction 1 is modifying A, transaction 2 reads A; transaction 1 then rolls back, so the value of A that transaction 2 read was uncommitted dirty data.

  • Non-repeatable read
    A non-repeatable read occurs when a transaction reads the same data item twice and gets inconsistent results. The difference from a dirty read is that a dirty read sees another transaction's uncommitted dirty data, while a non-repeatable read sees data that another transaction has committed; because the data was modified in between, the two reads return different results. For example, transaction 2's committed modification of A between transaction 1's two reads makes transaction 1's results inconsistent.

  • Phantom read
    A phantom read occurs when a transaction reads a range of data twice and the results differ because of another transaction's operations. The difference from a non-repeatable read is that a non-repeatable read concerns a single row, while a phantom read concerns an indeterminate set of rows, so phantom reads usually appear in range queries with a search condition. For example, transaction 1 queries the rows with A < 5 twice; because transaction 2 inserts a row with A = 4 in between, the two query results differ.

2.2 Transaction isolation level

  1. Transactions are isolated: in theory, the execution of transactions should not affect one another, and their effect on the database should be the same as if they executed serially.
  2. However, complete isolation leads to very low concurrency and poor resource utilization, so in practice the isolation requirement is relaxed, which also relaxes the consistency requirement of the database to some extent.
  3. The SQL standard defines four isolation levels for transactions, from low to high:
  • Read uncommitted (READ UNCOMMITTED)
  • Read committed (READ COMMITTED)
  • Repeatable read (REPEATABLE READ)
  • Serializable (SERIALIZABLE)

The lower the transaction isolation level, the more concurrency anomalies may appear, but generally the greater the concurrency the system can provide.

The correspondence between isolation levels and the concurrency anomalies they allow is shown in the table below. One point must be emphasized: this correspondence is theoretical and may not hold for a specific database implementation; for example, MySQL's InnoDB storage engine uses next-key locking to eliminate phantom reads at the REPEATABLE READ level.

  Isolation level    | Dirty read | Non-repeatable read | Phantom read
  READ UNCOMMITTED   | possible   | possible            | possible
  READ COMMITTED     | no         | possible            | possible
  REPEATABLE READ    | no         | no                  | possible
  SERIALIZABLE       | no         | no                  | no

Dirty writes are disallowed at every isolation level. SERIALIZABLE avoids all of the anomalies above, but it greatly reduces the concurrent processing capacity of the system.

2.3 The realization of transaction isolation-common concurrency control technology

Concurrency control is the key to implementing transaction isolation and the different isolation levels. There are many ways to implement it; based on the strategy adopted for potentially conflicting operations, they fall into two categories: optimistic concurrency control and pessimistic concurrency control.

  • Optimistic concurrency control: assume that potentially conflicting operations will not actually conflict, allow them to execute concurrently, and only resolve a conflict when one actually occurs, for example by rolling back a transaction.
  • Pessimistic concurrency control: assume that potentially conflicting operations will conflict, and force them to execute serially by making transactions wait (locking) or abort (timestamp ordering).

2.3.1 Lock-based concurrency control

Core idea: make potentially conflicting concurrent operations, such as read-write, write-read, and write-write, mutually exclusive through locks.
Locks are usually divided into two types: shared locks and exclusive locks.

  • 1. Shared lock (S): if transaction T holds a shared lock on data item A, other transactions may also acquire shared locks on A, but not exclusive locks.
  • 2. Exclusive lock (X): if transaction T holds an exclusive lock on data item A, other transactions may acquire neither shared nor exclusive locks on A.

Lock-based concurrency control process:

  1. A transaction requests the appropriate lock for the operation it performs on a data item (a shared lock for reads, an exclusive lock for writes).
  2. The request goes to the lock manager, which decides whether to grant it based on whether the data item is already locked and whether the requested lock conflicts with the locks currently held.
  3. If the lock is granted, the requesting transaction continues; if it is denied, the requesting transaction waits until the lock is released by the other transactions.
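The grant/deny decision described above can be sketched as a toy lock manager. This is an illustrative simplification (no wait queues, no deadlock detection), not how any particular database implements it:

```python
# A minimal lock-manager sketch: shared (S) locks are compatible with each
# other; an exclusive (X) lock conflicts with everything. Real lock managers
# also keep wait queues and detect deadlocks; this only answers grant/deny.

class LockManager:
    def __init__(self):
        # data item -> {transaction id: "S" or "X"}
        self.locks = {}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        held = self.locks.setdefault(item, {})
        others = {m for t, m in held.items() if t != txn}
        if mode == "S" and "X" not in others:
            held[txn] = mode          # compatible with other S holders
            return True
        if mode == "X" and not others:
            held[txn] = mode          # grant (or upgrade this txn's S to X)
            return True
        return False                  # conflict: the transaction must wait

    def release(self, txn, item):
        self.locks.get(item, {}).pop(txn, None)
```

For example, two transactions can both hold a shared lock on A, but a third transaction's exclusive-lock request is denied until both are released.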

Possible problems:

  • Deadlock: several transactions hold locks and wait cyclically for each other's locks, so none of them can proceed.
  • Starvation: a data item A is continuously covered by shared locks, so a transaction waiting for an exclusive lock on A may never obtain it.

For potentially conflicting concurrent operations, locks turn parallel execution into serial execution, so this is a pessimistic concurrency control.

2.3.2 Concurrency control based on timestamp

Core idea: for potentially conflicting concurrent operations, use timestamp-ordering rules to select one transaction to continue and roll the others back.

The system assigns each transaction a timestamp when it starts; the timestamp can come from the system clock or from an incrementing counter. A transaction that starts earlier has a smaller timestamp than one that starts later, and a transaction that is rolled back and restarted receives a new timestamp.

Each data item Q has two timestamp fields:
W-timestamp(Q): the largest timestamp of any transaction that successfully executed write(Q)
R-timestamp(Q): the largest timestamp of any transaction that successfully executed read(Q)

The timestamp-ordering rules are as follows:

  1. Suppose transaction T issues read(Q), and the timestamp of T is TS(T):
    a. If TS(T) < W-timestamp(Q), the value of Q that T needs to read has already been overwritten; the read is rejected and T is rolled back.
    b. If TS(T) >= W-timestamp(Q), the read is executed and R-timestamp(Q) is set to the maximum of TS(T) and R-timestamp(Q).
  2. Suppose transaction T issues write(Q):
    a. If TS(T) < R-timestamp(Q), the write is rejected and T is rolled back.
    b. If TS(T) < W-timestamp(Q), the write is rejected and T is rolled back.
    c. Otherwise, the system executes the write and sets W-timestamp(Q) to TS(T).
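The rules above can be written out directly in code. `Item`, `read`, and `write` are illustrative names, and a return value of `False` stands for "reject the operation and roll the transaction back":

```python
# A sketch of the basic timestamp-ordering rules. Each data item keeps
# R- and W-timestamps; read/write return True if allowed, False if the
# issuing transaction must be rolled back (and restarted with a new timestamp).

class Item:
    def __init__(self):
        self.r_ts = 0   # largest TS of any transaction that read the item
        self.w_ts = 0   # largest TS of any transaction that wrote the item

def read(item, ts):
    if ts < item.w_ts:                     # the value T needs was overwritten
        return False                       # reject: roll T back
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    if ts < item.r_ts or ts < item.w_ts:   # a later transaction already
        return False                       # read or wrote the item
    item.w_ts = ts
    return True
```

For instance, after a transaction with timestamp 5 writes Q, a transaction with timestamp 3 that tries to read Q is rolled back, because the version it should have seen no longer exists.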

Timestamp ordering and locking are essentially the same: for potentially conflicting concurrent operations, they replace concurrent execution with serial execution, so timestamp ordering is also a pessimistic concurrency control. They differ in two main ways:

  • Locking makes conflicting transactions wait, whereas timestamp ordering rolls conflicting transactions back.
  • With locking, conflicting transactions execute in the order in which they acquire their locks (first requested, first executed); with timestamp ordering, the execution order follows the timestamp-ordering rules.

2.3.3 Validation-based concurrency control

The core idea: a transaction first applies its updates in its own workspace; a validity check is performed when it writes back to the database, and transactions that fail the check are rolled back.

Execution of a transaction under validation-based concurrency control is divided into three phases:

  1. Read phase: data items are read into local variables of the transaction; all writes are applied to these local variables and do not actually update the database.
  2. Validation phase: the transaction is checked to determine whether its writes can be applied without violating serializability; if the check fails, the transaction is rolled back.
  3. Write phase: a transaction that passes validation applies the results in its local variables to the database.
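As an illustration, the validation phase can be sketched as a simplified backward-validation check: a committing transaction passes only if no transaction that committed after it started wrote a data item it read. Real validators are more involved; the names and data structures here are made up for the example:

```python
# A simplified backward-validation sketch: a committing transaction passes
# validation only if no transaction that committed after it started wrote a
# data item it read. (Real validators also order the validation/write phases.)

committed = []   # list of (commit_order, write_set) for finished transactions

def validate(start_order, read_set):
    """True if the transaction may enter its write phase."""
    for order, write_set in committed:
        if order > start_order and write_set & read_set:
            return False          # conflict: roll the transaction back
    return True

def commit(order, write_set):
    """Record a transaction that passed validation and wrote its updates."""
    committed.append((order, write_set))
```

For example, a transaction that started at order 0 and read A fails validation once another transaction has committed a write to A at order 1.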

Validation is usually done by comparing transaction timestamps, but with rules different from those of timestamp ordering.

This method allows potentially conflicting operations to execute concurrently, because each transaction operates on local variables in its own workspace; a transaction is rolled back only when a conflict is found during validation. It is therefore an optimistic concurrency control strategy.

2.3.4 Concurrency control based on snapshot isolation

Snapshot isolation is one implementation of multi-version concurrency control (MVCC).

The core idea: the database maintains multiple versions (snapshots) of each data item; each transaction updates only its own private snapshot, and a validity check is performed just before the transaction commits, after which the transaction either commits its updates normally or is rolled back.

Because of snapshot isolation, a transaction cannot see other transactions' updates to the data items it works on. To avoid lost updates, either of the following two schemes can be used:

  • First committer wins: when transaction T is validated, the system checks whether another concurrent transaction has already written updates to the same data to the database; if so, T is rolled back, otherwise T commits normally.
  • First updater wins: a lock mechanism ensures that the first transaction to obtain the lock commits its update, and later transactions that attempt to update the same data are aborted.

Potentially conflicting operations are isolated from each other through snapshots of different versions of the data items, and conflicts are not detected until the updates are actually written to the database, so this is also an optimistic concurrency control.
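The first-committer-wins check can be sketched as follows; the global version counter and the `latest_commit` map are illustrative simplifications of what a real MVCC engine tracks:

```python
# "First committer wins" sketch: each committed write records a commit
# version; a transaction may commit only if none of the items it wants to
# write were committed by someone else after its snapshot version.

latest_commit = {}   # data item -> version at which it was last committed
clock = 0            # global commit-version counter

def try_commit(snapshot_version, write_set):
    """Validate and commit; return False if the transaction must roll back."""
    global clock
    for item in write_set:
        if latest_commit.get(item, 0) > snapshot_version:
            return False          # someone else committed first: roll back
    clock += 1
    for item in write_set:
        latest_commit[item] = clock
    return True
```

Two transactions that both took their snapshot at version 0 and both write A illustrate the rule: the first `try_commit` succeeds, and the second is rolled back because A was committed after its snapshot.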

2.3.5 Summary on concurrency control technology

The above is only an introduction to several common concurrency control techniques and does not go into especially complicated principles, for two reasons. First, truly clarifying the principles and implementation details would involve too much material and make this article far too long, which would be hard on both the author and the reader; so only the core idea and the main points of each technique are introduced, and the rest is passed over lightly. Second, concurrency control implementations are extremely diverse: lock-based schemes have many variants, and MVCC implementations vary even more and are often combined with other concurrency control methods such as locking.

3. Failure and failure recovery technology

3.1 Why do we need fault recovery technology

Failures may occur during database operation. These failures include transaction failures and system failures.

  • Transaction failure: for example, illegal input or a system deadlock makes the transaction unable to continue.
  • System failure: for example, the system crashes or halts due to a software bug or hardware error.

These failures may damage transactions and the database state, so techniques must be provided to recover from them, in order to guarantee database consistency and the atomicity and durability of transactions. A database usually records its operations in the form of a log so that it can recover from failures, so this is called log recovery technology.

3.2 The execution process of the transaction and possible problems


The execution process of the transaction can be simplified as follows:

  1. The system opens a private workspace for each transaction.
  2. Read operations copy data items from disk into the workspace; all updates before the write-back act on the copies in the workspace.
  3. Write operations output data to a buffer in memory; the buffer manager writes the data to disk at a suitable later time.

Because the database may either apply modifications to disk immediately or defer them, the following situations can arise while a transaction executes:

  • A failure occurs before the transaction commits, but some of the transaction's changes have already been written to the on-disk database. This destroys the atomicity of the transaction.
  • The transaction committed before the system crashed, but its data was still in the in-memory buffer and had not been written to disk; the committed changes are lost when the system recovers. This destroys the durability of the transaction.

3.3 Types and formats of logs

  • <T,X,V1,V2>: describes a database write: T is the unique identifier of the transaction performing the write, X is the data item written, V1 is its old value, and V2 is its new value.
  • <T,X,V1>: the undo of a database write, restoring data item X of transaction T to its old value V1; inserted during the recovery phase.
  • <T start>: transaction T begins.
  • <T commit>: transaction T commits.
  • <T abort>: transaction T aborts.

Two rules apply to the log:

  • 1. The system appends the corresponding log records to the end of the log file before it modifies the database.
  • 2. A transaction is considered committed once its commit log record has been successfully written to disk, even though the changes it made may not yet have been written to disk.

3.4 The core idea of log recovery

  • Undo: restore every data item updated by the transaction to its old value recorded in the log, and insert a <T abort> record when the undo completes.
  • Redo: restore every data item updated by the transaction to its new value recorded in the log.

Undo is used when a transaction rolls back normally or is aborted by a transaction failure. When the system recovers from a crash, it performs redo first and then undo.

The following transactions are undone: the log contains their <T start> record but neither a <T commit> nor a <T abort> record.

The following transactions are redone: the log contains their <T start> record together with a <T commit> or <T abort> record.
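These classification rules can be expressed directly in code; log records are modeled here as simple tuples, an illustrative stand-in for the record formats of section 3.3:

```python
# Classify transactions from a log, per the rules above: a transaction with
# <T start> but no <T commit>/<T abort> is undone; one with <T start> and a
# <T commit> or <T abort> record is redone.

def classify(log):
    """Return (redo, undo) lists of transaction ids, in start order."""
    started, finished = [], set()
    for rec in log:
        if rec[0] == "start":
            started.append(rec[1])
        elif rec[0] in ("commit", "abort"):
            finished.add(rec[1])
    redo = [t for t in started if t in finished]
    undo = [t for t in started if t not in finished]
    return redo, undo
```

Applied to the example log below, this classifies T0 (start + commit) for redo and T1 (start only) for undo.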

Suppose the log reads as follows when the system recovers from a crash:

<T0 start>
<T0,A,1000,950>
<T0,B,2000,2050>
<T0 commit>
<T1 start>
<T1,C,700,600>

Since T0 has both a start record and a commit record, transaction T0 is redone and the corresponding redo operations are performed.
Since T1 has only a start record, T1 is undone and the corresponding undo operations are executed; when the undo completes, an abort record is written.

3.5 Recovery process for a transaction abort/normal rollback

  1. Scan the log from back to front; for each record of the form <T,X,V1,V2> belonging to transaction T, write the old value V1 back to data item X.
  2. Write a special read-only record <T,X,V1> to the log, meaning that the data item has been restored to its old value V1. This is a read-only compensation record and never needs to be undone itself.
  3. Once the <T start> record is found, stop scanning and write a <T abort> record to the log.

3.6 Recovery process when the system crashes (with checkpoint)

A checkpoint is a special log record of the form <checkpoint L>, where L is the set of transactions not yet committed when the checkpoint record is written. The system guarantees that the modifications of transactions committed before the checkpoint have already been written to disk, so they need not be redone. Checkpoints therefore speed up the recovery process.

The recovery process when the system crashes is divided into two phases: redo phase and undo phase.

Redo phase:

  1. The system scans the log forward from the last checkpoint and initializes the undo-list to the list L in the checkpoint log record.
  2. Whenever an update record <T,X,V1,V2> or a compensation record <T,X,V> is found, the operation is redone.
  3. Whenever a <T start> record is found, T is added to the undo-list.
  4. Whenever a <T abort> or <T commit> record is found, T is removed from the undo-list.

Undo phase:

  1. The system scans the log backward from the end.
  2. Whenever a log record belonging to a transaction in the undo-list is found, the corresponding undo operation is performed.
  3. When the <T start> record of a transaction T in the undo-list is found, a <T abort> record is written and T is removed from the undo-list.
  4. When the undo-list is empty, the undo phase ends.

Summary: first redo the updates of all transactions in log order, then, for the transactions that must be undone, undo their update operations in reverse order.
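The two phases above can be sketched end to end. The tuple-based log format is the same illustrative stand-in as before, with ("comp", T, X, old) for compensation records and ("ckpt", [...]) for checkpoints; this is a teaching sketch (it assumes at least one checkpoint exists), not a production recovery algorithm:

```python
# A sketch of two-phase crash recovery with a checkpoint. Records:
#   ("start", T) / ("commit", T) / ("abort", T)
#   ("update", T, X, old, new)   -- ordinary update record
#   ("comp", T, X, old)          -- read-only compensation record
#   ("ckpt", [T, ...])           -- checkpoint listing active transactions
# `recover` mutates both `db` and `log` (it appends comp/abort records).

def recover(log, db):
    # Redo phase: scan forward from the last checkpoint.
    start = max(i for i, r in enumerate(log) if r[0] == "ckpt")
    undo_list = set(log[start][1])
    for rec in log[start:]:
        kind = rec[0]
        if kind == "update":
            db[rec[2]] = rec[4]            # redo the new value
        elif kind == "comp":
            db[rec[2]] = rec[3]            # redo the compensation
        elif kind == "start":
            undo_list.add(rec[1])
        elif kind in ("commit", "abort"):
            undo_list.discard(rec[1])

    # Undo phase: scan backward from the end of the log.
    for rec in reversed(list(log)):
        if rec[0] == "update" and rec[1] in undo_list:
            db[rec[2]] = rec[3]            # restore the old value
            log.append(("comp", rec[1], rec[2], rec[3]))
        elif rec[0] == "start" and rec[1] in undo_list:
            log.append(("abort", rec[1]))
            undo_list.discard(rec[1])
            if not undo_list:
                break                      # undo-list empty: done
    return db
```

Note how compensation records are redone but never undone, matching the rule in section 3.5, and how the undo phase writes an abort record for each undone transaction.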

3.6.1 An example of system crash recovery

The log before recovery is shown below; the system crashed after writing the last record:

<T0 start>
<T0,B,2000,2050>
<T2 commit>
<T1 start>
<checkpoint {T0,T1}>   // T2 committed before the checkpoint, so it need not be redone
<T1,C,700,600>
<T1 commit>
<T2 start>
<T2,A,500,400>
<T0,B,2000>
<T0 abort>   // T0's rollback completed; the system crashed after this record was written


4. Summary

A transaction is the basic unit of concurrency control in a database system, the basic unit of failure recovery, and thus the basic unit for maintaining the consistency of the database state. ACID captures the basic properties of transactions, and the database system guarantees the ACID properties of transactions through concurrency control technology and log recovery technology.



Origin blog.51cto.com/14954398/2584587