How to achieve database read consistency

1 Introduction

Data consistency is a key measure of data accuracy, so how do we achieve it? This article looks at how to achieve read and write consistency from the perspective of transaction properties and transaction isolation levels.

2 Consistency

1. Data consistency: usually refers to whether the logical relationships among associated data are correct and complete.

For example, a system implements read-write separation, where the read database is a replica of the write database. Xiao Li's education was previously recorded in the system as high school. After studying hard, he earned a bachelor's degree and promptly updated his record. However, replication lag was long that day, and the replica had been synchronized before his change. When the company's HR queried Xiao Li's information, the system still showed high school, and his application was rejected. This is a data inconsistency problem.

2. Consistency of the database: refers to a transaction taking the database from one consistent state to another consistent state. This is the definition of transactional consistency.

For example, there are 100 items of product A in the warehouse and 10 in the store, 110 in total. At 10:00 in the morning, the warehouse sends 50 items of product A to the store; afterwards there should be 50 items in the warehouse and 60 in the store, so the total is still 110 and the database is consistent. If, after the store receives the goods, the warehouse still shows 100 items of product A, the total becomes 160, which is a database inconsistency problem.

3 Database transactions

A database transaction is a sequence of operations that access, and possibly modify, various data items. These operations are either all performed or none are performed; the sequence is an indivisible unit of work. A transaction consists of all database operations performed between its beginning and its end.

The properties of a transaction (ACID):

  • Atomicity: All operations in a transaction are indivisible; either all of them complete or none do.
  • Consistency: Transactions executed in parallel must produce a result equivalent to executing them serially in some order, leaving the database in a consistent state.
  • Isolation: The execution of a transaction is not interfered with by other transactions, and its intermediate results must be invisible to other transactions.
  • Durability: For any committed transaction, the system must ensure its changes to the database are never lost, even if the database fails.
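As a concrete illustration of atomicity and the rollback it relies on, here is a minimal sketch using Python's built-in sqlite3 module (the table and values are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('A', 110)")
conn.commit()

# A change that fails before commit is rolled back as a unit (atomicity):
try:
    conn.execute("UPDATE stock SET qty = qty - 50 WHERE item = 'A'")
    raise RuntimeError("simulated failure before commit")
except RuntimeError:
    conn.rollback()  # the partial update is undone

qty = conn.execute("SELECT qty FROM stock WHERE item = 'A'").fetchone()[0]
print(qty)  # 110: the uncommitted deduction did not survive
```

Had the transaction committed instead, durability would require the value 60 to survive even a crash.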

4 Concurrency issues

Dirty reads, non-repeatable reads, and phantom reads may occur in a database under concurrency.

1. Dirty reads

Transaction A reads data that transaction B has modified but not yet committed. If transaction B then rolls back, the data transaction A read is dirty.
Example: Order A requires 20 items of product A and order B requires 10; the warehouse holds 20. Order B queries first, finds the stock sufficient, and deducts 10 but has not yet committed. Meanwhile order A queries, sees only 10 in stock, which is less than its order quantity, and throws an exception. Order B then fails to commit and rolls back, so the stock is 20 again. When the warehouse staff check the inventory they find 20 items, yet order A reported insufficient stock, which is baffling.
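The anomaly can be sketched in a few lines of Python, with a dictionary playing the role of a shared table that offers no isolation (all names and numbers are illustrative):

```python
# A shared "table" with no isolation: writers modify it in place.
stock = {"A": 20}

# Transaction B deducts 10 but has not committed yet:
stock["A"] -= 10

# Transaction A now reads B's uncommitted change -- a dirty read:
a_sees = stock["A"]

# Transaction B rolls back; the deduction never logically happened:
stock["A"] += 10

print(a_sees)      # 10: a value that never existed in any committed state
print(stock["A"])  # 20
```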


2. Non-repeatable read

A non-repeatable read means that, within one transaction, the same data read at the beginning and read again later (before the transaction ends) is inconsistent.
Example: The warehouse administrator queries the quantity of product A and gets 20. Then order A is shipped, deducting 10 items and committing. When the administrator queries product A again, the quantity is 10, different from the first result.


3. Phantom read

When transaction A counts a set of rows twice, and between the two counts transaction B inserts new rows and commits, the total transaction A reads the second time differs from the first. Rows seem to have appeared out of nowhere, like phantoms, hence the name phantom read.
Example: An operator queries and finds 10 orders ready for production. Meanwhile the ordering interface issues 10 more orders and another transaction adds another 10, both committing. When the operator queries again, the count has become 30 orders.


5 Transaction isolation level

Read Uncommitted (uncommitted read)
A transaction can read the uncommitted data of other transactions, so dirty reads occur. This level, often abbreviated RU, solves none of the problems above.

Read Committed (Committed read)
A transaction can only read data that other transactions have committed, never their uncommitted data. This solves dirty reads, but non-repeatable reads can still occur.

Repeatable Read (repeatable read)
This solves non-repeatable reads: reading the same data multiple times within one transaction yields the same result. However, the SQL standard does not require this level to prevent phantom reads. (InnoDB's Repeatable Read largely avoids phantoms in practice through MVCC and next-key locks.)

Serializable (serialization)
At this isolation level, all transactions execute serially: operations on the data are queued and there is no concurrency between transactions, so all of the problems above are solved, at the cost of throughput.
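The four levels and the anomalies each permits (per the ANSI SQL standard; a real engine such as InnoDB may be stricter at a given level) can be summarized in a small table in code:

```python
# Which read anomalies each ANSI isolation level still permits.
ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED":   {"non-repeatable read", "phantom read"},
    "REPEATABLE READ":  {"phantom read"},
    "SERIALIZABLE":     set(),
}

def prevents(level: str, anomaly: str) -> bool:
    """True if the given isolation level rules out the given anomaly."""
    return anomaly not in ANOMALIES[level]

print(prevents("READ COMMITTED", "dirty read"))     # True
print(prevents("REPEATABLE READ", "phantom read"))  # False (by the standard)
```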

6 Solve data read consistency

There are two families of solutions to the read consistency problem: Lock-Based Concurrency Control (LBCC) and Multi-Version Concurrency Control (MVCC).

6.1 LBCC

Since the data read before and after must be consistent, one approach is to lock the data I want to operate on when reading it, so that no other transaction may modify it. This scheme is called Lock-Based Concurrency Control (LBCC).

LBCC implements concurrency control through pessimistic locking.

If transaction A locks the data, other transactions can neither read nor write it until the lock is released. Concurrent access thus degenerates into sequential access, and for most systems today this level of performance is unacceptable.
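A minimal sketch of pessimistic locking in Python, with a threading.Lock standing in for a row lock (the stock numbers are made up):

```python
import threading

stock = {"A": 20}
lock = threading.Lock()  # plays the role of a row lock on product A

def deduct(qty: int) -> bool:
    # Pessimistic locking: take the lock before reading and hold it through
    # the write, so check-then-deduct is one indivisible step and no other
    # thread can read or modify the row in between.
    with lock:
        if stock["A"] >= qty:
            stock["A"] -= qty
            return True
        return False

threads = [threading.Thread(target=deduct, args=(10,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stock["A"])  # 0: exactly two of the three deductions succeeded
```

The lock makes the outcome deterministic regardless of thread scheduling, which is precisely the serialization that costs LBCC its concurrency.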

6.2 MVCC

To keep the data read before and after a transaction consistent, we can create a backup or snapshot for it when modifying the data, and then read the snapshot later. No matter how long the transaction is executed, the data seen inside the transaction is not affected by other transactions. Depending on the time when the transaction starts, each transaction may see different data for the same table at the same time. We call this scheme Multi Version Concurrency Control (MVCC).

MVCC is based on optimistic locking.

In InnoDB, MVCC is implemented through the version chain in the undo log together with the Read-View consistent view.

6.2.1 Undo log

The undo log is an InnoDB log. Before a transaction modifies a record, the original value is saved, and only then is the modification applied; this way, the original value can be restored if the modification fails, and other transactions can still read it. In other words, the undo log is the log used for rollback: before a transaction commits, MySQL first records the pre-update data in the undo log file, and when the transaction rolls back or the database crashes, the undo log can be used to undo the changes.

Different kinds of change record different content in the undo log:

  • Inserting a record: the corresponding undo log only needs to record the new record's primary key value. To roll back the insert, simply delete the record by that primary key.
  • Deleting a record: the corresponding undo log must record the entire content of the row. To roll back the delete, generate an insert from the recorded content and put the row back into the database.
  • Updating a record without changing the primary key: the corresponding undo log must record the content before the change. To roll back the update, write the recorded old values back by primary key.
  • Updating a record's primary key: the corresponding undo log must record the entire content of the row. To roll back, first delete the changed row, then insert the backed-up row into the database.
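The four rules above can be sketched as a toy undo log; the record layouts mirror the descriptions, not InnoDB's actual format:

```python
# A toy undo log: each record stores just enough to reverse one change.
table = {}  # primary key -> row content

def do_insert(undo, pk, row):
    table[pk] = row
    undo.append(("insert", pk))             # rollback only needs the key

def do_delete(undo, pk):
    undo.append(("delete", pk, table[pk]))  # rollback needs the full row
    del table[pk]

def do_update(undo, pk, new_row):
    undo.append(("update", pk, table[pk]))  # rollback needs the old content
    table[pk] = new_row

def rollback(undo):
    for op in reversed(undo):               # undo in reverse order
        kind, pk = op[0], op[1]
        if kind == "insert":
            del table[pk]
        else:                               # delete or update: restore old row
            table[pk] = op[2]

undo = []
do_insert(undo, 1, "A:100")
do_update(undo, 1, "A:50")
do_delete(undo, 1)
rollback(undo)
print(table)  # {}: all three changes undone, newest first
```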

undo log version chain

Each row has two hidden fields, trx_id and roll_pointer. trx_id is the id of the transaction that most recently modified the row, and roll_pointer points to the undo log record that holds the row's previous version.
Transaction ID: MySQL maintains a global variable. When a transaction needs an id, the variable's current value is assigned to it as the transaction id, and the variable is then incremented by 1.

Example:

  • Transaction A (id = 1) inserts a row X. The row's trx_id is 1 and its roll_pointer is empty (first insertion).
  • Transaction B (id = 2) updates the row. The row's trx_id becomes 2, and its roll_pointer points to the undo log record for transaction A's version.
  • Transaction C (id = 3) updates the row again. The row's trx_id becomes 3, and its roll_pointer points to the undo log record for transaction B's version.

Therefore, when multiple transactions modify the same row one after another, the hidden fields trx_id and roll_pointer are updated each time, and the undo log records of those transactions are linked together through the roll_pointer to form the undo log version chain.
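A sketch of the version chain from the example above, with field names following the article (this is illustrative, not InnoDB's on-disk layout):

```python
# A toy version chain: each update creates a new row version whose
# roll_pointer links back to the previous version.
class Version:
    def __init__(self, trx_id, value, roll_pointer=None):
        self.trx_id = trx_id              # transaction that wrote this version
        self.value = value
        self.roll_pointer = roll_pointer  # previous version in the undo chain

v1 = Version(1, "X inserted")                        # transaction A (id 1)
v2 = Version(2, "X updated by B", roll_pointer=v1)   # transaction B (id 2)
v3 = Version(3, "X updated by C", roll_pointer=v2)   # transaction C (id 3)

# Walking the chain from newest to oldest yields every historical version:
chain = []
v = v3
while v is not None:
    chain.append(v.trx_id)
    v = v.roll_pointer
print(chain)  # [3, 2, 1]
```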

6.2.2 Read-View consistency view

InnoDB maintains an array for each transaction that saves the ids of the transactions active at the moment the transaction starts. The array defines two watermarks: the low watermark (the minimum active transaction id) and the high watermark (the maximum transaction id so far plus 1). Together these constitute the current transaction's consistent view (Read-View).

A ReadView mainly contains four important fields:

  • m_ids: A list of transaction ids representing read and write transactions that are active in the current system when ReadView is generated.
  • min_trx_id: Indicates the smallest transaction id in the active read and write transactions in the current system when ReadView is generated, that is, the smallest value in m_ids.
  • max_trx_id: Indicates the id value that should be assigned to the next transaction in the system when ReadView is generated.
  • creator_trx_id: Indicates the transaction id of the transaction that generated the ReadView.

With this information, when accessing a record, just follow the steps below to determine whether a version of the record is visible:

  • If the value of the trx_id attribute of the accessed version is the same as the value of the creator_trx_id in ReadView, it means that the current transaction is accessing its own modified record, so this version can be accessed by the current transaction.
  • If the value of the trx_id attribute of the accessed version is less than the min_trx_id value in ReadView, it indicates that the transaction that generated this version has been committed before the current transaction generates ReadView, so this version can be accessed by the current transaction.
  • If the value of the trx_id attribute of the accessed version is greater than or equal to the max_trx_id value in the ReadView, the transaction that generated this version started after the current transaction created its ReadView, so this version cannot be accessed by the current transaction.
  • If the value of the trx_id attribute of the accessed version is between min_trx_id and max_trx_id, check whether it appears in the m_ids list. If it does, the transaction that generated this version was still active when the ReadView was created, so the version cannot be accessed; if it does not, that transaction had already committed, so the version can be accessed.
  • If the data of a certain version is not visible to the current transaction, then follow the version chain to find the data of the next version, continue to judge the visibility according to the above steps, and so on, until the last version in the version chain. If the last version is also not visible, it means that the record is completely invisible to the transaction, and the query result does not contain the record.
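The visibility rules above transcribe directly into code; field names follow the article, and the sample ReadView is invented for the demonstration:

```python
def visible(trx_id, read_view):
    """Can the current transaction see a row version written by trx_id?"""
    if trx_id == read_view["creator_trx_id"]:
        return True                   # our own modification
    if trx_id < read_view["min_trx_id"]:
        return True                   # committed before the ReadView was taken
    if trx_id >= read_view["max_trx_id"]:
        return False                  # started after the ReadView was taken
    # In between: visible only if that transaction was not active at snapshot time.
    return trx_id not in read_view["m_ids"]

def read(version_chain, read_view):
    """Walk the version chain newest-first; return the first visible value."""
    for trx_id, value in version_chain:
        if visible(trx_id, read_view):
            return value
    return None                       # record entirely invisible to this transaction

# Snapshot taken by transaction 5 while transactions 3 and 4 were still active:
rv = {"creator_trx_id": 5, "min_trx_id": 3, "max_trx_id": 6, "m_ids": {3, 4}}
chain = [(4, "v3"), (3, "v2"), (2, "v1")]  # newest version first
print(read(chain, rv))  # 'v1': versions from still-active trx 3 and 4 are skipped
```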

6.2.3 How to find data

1. Snapshot Read

A snapshot read, also known as a consistent read, reads a historical version of the data. A plain SELECT without locking is a snapshot read: a non-blocking read that, in simplified terms, only sees rows whose creation transaction id is less than or equal to the current transaction id and whose deletion transaction id, if any, is greater than the current transaction id.

2. Current read

A current read returns the latest committed version of a record. Locking SELECTs (FOR UPDATE / LOCK IN SHARE MODE) and INSERT, DELETE, and UPDATE statements all perform current reads.

6.2.4 Data example

As the figure in the original post (not reproduced here) shows:

Transaction A (id = 1) initializes the data.
Transaction B (id = 2) performs a query. (In this simplified model, MVCC only reads rows whose creation transaction id is less than or equal to the current transaction id and whose deletion transaction id, if any, is greater than the current transaction id.)
Transaction B's result is (product A: 10, product B: 5).

Transaction C (id = 3) inserts product C.
Transaction B (id = 2) queries again; the insert by transaction id 3 is not visible to it.
Transaction B's result is still (product A: 10, product B: 5).

Transaction D (id = 4) deletes product B.
Transaction B (id = 2) queries again; the deletion by transaction id 4 is not visible to it.
Transaction B's result is still (product A: 10, product B: 5).

Transaction E (id = 5) modifies the quantity of product A.
Transaction B (id = 2) queries again; the new version written by transaction id 5 is not visible to it.
Transaction B's result is still (product A: 10, product B: 5).

Therefore, once transaction E commits, the data obtained by a current read differs noticeably from the snapshot that transaction B keeps reading.

6.2.5 Solved problems

MVCC solves the read consistency problem well: a transaction sees only updates committed by transactions before its snapshot was taken, and never updates committed after it. MVCC also reduces the probability of deadlock and removes the blocking between reads and writes.

7 Summary

Both LBCC and MVCC can solve the read consistency problem. Which to use depends on the business scenario; MVCC and locks can also be used in combination. There is no best choice, only a better one.


Author: Chen Changhao


Origin my.oschina.net/u/4090830/blog/5580720