A Newbie's Perspective: The Basics of 3TS, Tencent's Transaction Processing Verification System, in One Article

I recently took part in studying the 3TS open source project. As a newbie, I did not know how to start analyzing the results. After reading many articles, tutorials, and books I finally understood them, so I wrote this article to help other newbies like me. Didn't all the experts start out as newbies? Guard against arrogance and impatience, keep pushing forward, and see you at the top!

Isolation keeps concurrent transactions from interfering with each other's data by governing the execution order of concurrent read and write requests (transactions). Choosing an isolation level is a trade-off between data reliability and performance.

What is a transaction?
A transaction is a series of operations treated as one unit. For example, when A transfers 10,000 to B, the two operations (A's deposit is reduced by 10,000 and B's deposit is increased by 10,000) together form one transaction. A transaction has four major properties, known as ACID; atomicity, isolation, and durability exist to achieve consistency.

  • Atomicity: The operations in a transaction either all succeed (commit) or all fail (rollback).
  • Isolation: Concurrent transactions do not interfere with each other. Isolation levels are the focus of the later discussion.
  • Durability: Once a transaction is committed, its changes are saved permanently.
  • Consistency: The data state before and after the transaction is valid, that is, the integrity constraints are not violated.

There are two main ways to implement transactions: lock-based protocols and timestamp-based protocols. The current mainstream implementation is timestamp-based, namely the MVCC mechanism.

What is an anomaly?
Transactions are the prerequisite for concurrency control, which governs the concurrent execution of different transactions to improve system efficiency. Even under ACID, insufficient isolation between concurrently executing transactions can lead to inconsistent data, that is, to data anomalies (sometimes translated as "exceptions").

1. Interpret the result graph

[Figure: 3TS result graph. x-axis: anomaly test cases; y-axis: transaction isolation levels.]

1.1 Anomaly test cases

Let's start with the x-axis (anomaly test cases). The classic data anomalies are listed below; the SQL standard defines dirty read, non-repeatable read, and phantom read, while dirty write comes from later literature:
1. Dirty Write: A transaction overwrites data written by another transaction that has not yet committed.
2. Dirty Read: Transaction A reads data written by transaction B before B commits. If B later rolls back, what A read is dirty data.
3. Non-Repeatable Read: Transaction A reads a value without modifying it. A concurrent transaction B then modifies the value and commits. When A reads the value again, it sees B's modified value rather than the value it read the first time.
ps: Databases that use MVCC typically generate a Read View at the first statement of a transaction. All later reads in the transaction use the same Read View, which avoids non-repeatable reads.
4. Phantom Read: Transaction A queries a range of values, and a concurrent transaction B inserts a row into that range and commits. When A queries the same range again, it finds one more record. Likewise, if another transaction deletes a row in the range, A finds one record missing.
ps: Non-repeatable read concerns the "same record", while phantom read concerns the "same range". MVCC snapshots solve non-repeatable reads, but in implementations that mix snapshot reads with current reads they do not avoid every phantom read; those cases need range locks.

The Coo consistency model ties all data anomalies together. It defines a partial order pair (POP) graph, and any schedule can be represented as a POP graph. Anomalies are then detected by looking for cycles: if the POP graph of a schedule contains a cycle, the schedule has a data anomaly; if it contains no cycle, consistency is satisfied.
Back to the result graph: the anomalies defined by these POP cycles fall into three categories, RAT, WAT, and IAT, which correspond to the 33 anomaly test cases in the result table.
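
The cycle check described above can be sketched in a few lines. This is only a generic depth-first cycle detector, not the actual 3TS-Coo implementation; the edge format (pairs of transaction names) and the function name has_cycle are assumptions made for illustration.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in nodes}

    def visit(node):
        state[node] = ON_PATH
        for nxt in graph.get(node, []):
            if state[nxt] == ON_PATH:      # back edge: we returned to the current path
                return True
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in sorted(nodes))

# Hypothetical schedules: an edge T1 -> T2 means T2 depends on T1.
anomalous = [("T1", "T2"), ("T2", "T1")]   # cycle, so a data anomaly
consistent = [("T1", "T2"), ("T2", "T3")]  # no cycle, so consistent
print(has_cycle(anomalous), has_cycle(consistent))
```

A schedule whose graph makes has_cycle return True would be reported as anomalous under this simplified model.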

1.2 Transaction isolation level

The y-axis of the result graph is the transaction isolation level (I did not recognize the English abbreviations at first).

PostgreSQL uses MVCC (multi-version concurrency control) to prevent dirty reads, so in PostgreSQL Read Uncommitted behaves the same as Read Committed.
At the Repeatable Read isolation level (the default) of the MySQL InnoDB engine, phantom reads are avoided in different ways depending on the type of query:

  • For snapshot reads (ordinary SELECT statements), phantom reads are avoided through MVCC.
  • For current reads (SELECT ... FOR UPDATE and similar statements), phantom reads are avoided through next-key locks (record lock + gap lock).

However, the MySQL Repeatable Read isolation level does not completely eliminate phantom reads; it only avoids them to a large extent.

(1) SQL standard isolation levels

To deal with the data anomalies caused by concurrent operations, database systems offer different isolation levels. The SQL standard defines four isolation levels; each level specifies which changes made inside a transaction are visible to other transactions and which are not. The lower the isolation level, the more concurrency is possible and the lower the overhead, but the more anomalies can occur; higher levels prevent more anomalies at the cost of greater implementation complexity and overhead.

Isolation level, explanation, and possible anomalies:

  • Read Uncommitted (RU): Changes made by a transaction are visible to other transactions before it commits. Possible anomalies: Dirty Read, Non-Repeatable Read, Phantom Read.
  • Read Committed (RC): Changes made by a transaction become visible to other transactions only after it commits. Possible anomalies: Non-Repeatable Read, Phantom Read.
  • Repeatable Read (RR): The data a transaction sees throughout its execution is consistent with what it saw when it started. Uncommitted changes are likewise invisible to other transactions. Possible anomalies: Phantom Read.
  • Serializable (SER): In MySQL, a write takes a write lock and a read takes a read lock. When the locks conflict, the later transaction must wait for the earlier one to finish before continuing, so conflicting transactions effectively run one after another. Possible anomalies: none.
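
As a compact restatement of the four levels, the sketch below encodes which anomalies each standard level still allows and looks up the weakest level that rules out a given set. The dictionary name, the function name, and the anomaly strings are assumptions chosen for this illustration; real databases may be stricter than the standard requires.

```python
# Anomalies that each SQL-standard isolation level still permits.
ANOMALIES_ALLOWED = {
    "RU": {"dirty read", "non-repeatable read", "phantom read"},
    "RC": {"non-repeatable read", "phantom read"},
    "RR": {"phantom read"},
    "SER": set(),
}

LEVELS_WEAKEST_FIRST = ("RU", "RC", "RR", "SER")

def weakest_level_preventing(anomalies):
    """Return the weakest level at which none of the given anomalies can occur."""
    unwanted = set(anomalies)
    for level in LEVELS_WEAKEST_FIRST:
        if not (ANOMALIES_ALLOWED[level] & unwanted):
            return level
    return "SER"  # unreachable in practice: SER permits none of the three

print(weakest_level_preventing({"dirty read"}))
print(weakest_level_preventing({"dirty read", "phantom read"}))
```

Reading the table this way makes the trade-off visible: each step up removes one anomaly from the allowed set.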

(2) Snapshot Isolation (SI)

The four isolation levels in the SQL standard were defined with lock-based concurrency control in mind. A later paper (A Critique of ANSI SQL Isolation Levels, 1995) described a new isolation level, SI, which avoids three read anomalies (dirty read, non-repeatable read, and phantom read) and never blocks read operations. SI works well in most application scenarios, but it still falls short of serializable because it suffers from write skew. Write skew is essentially a read-write conflict between concurrent transactions (a read-write conflict does not necessarily lead to write skew, but write skew always involves read-write conflicts), whereas Snapshot Isolation only checks for write-write conflicts when a transaction commits. To avoid write skew, the application has to adapt case by case, for example by using SELECT ... FOR UPDATE or by introducing an artificial write conflict at the application layer. Doing so effectively pushes part of the database's transaction work onto the application layer.
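
To make write skew concrete, here is a toy model of snapshot isolation with a first-committer-wins check on write sets. It is a sketch under simplifying assumptions (the whole database is copied as the snapshot, and the classes Database and SnapshotTxn and the on-call scenario are invented for illustration), not how any real engine is implemented.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)
        self.version = 0        # increases by one per committed transaction
        self.last_write = {}    # key -> version of the commit that last wrote it

class SnapshotTxn:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)     # private snapshot taken at start
        self.start_version = db.version
        self.writes = {}

    def read(self, key):
        # Own writes first, then the snapshot; never other transactions' new data.
        return self.writes[key] if key in self.writes else self.snapshot[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only write-write conflicts are checked, as the text describes.
        for key in self.writes:
            if self.db.last_write.get(key, 0) > self.start_version:
                return False    # someone committed a write to this key after we started
        self.db.version += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.last_write[key] = self.db.version
        return True

def go_off_call(txn, doctor):
    """Application rule: a doctor may leave only if two doctors are on call."""
    on_call = sum(1 for name in ("alice", "bob") if txn.read(name))
    if on_call >= 2:
        txn.write(doctor, False)

db = Database({"alice": True, "bob": True})
t1, t2 = SnapshotTxn(db), SnapshotTxn(db)
go_off_call(t1, "alice")    # reads alice and bob, writes alice
go_off_call(t2, "bob")      # reads alice and bob, writes bob
results = (t1.commit(), t2.commit())
print(results, db.data)     # both commit, and nobody is left on call
```

Both transactions commit because their write sets do not overlap, yet the rule "at least one doctor on call" is broken. Each transaction read a row that the other one wrote, which is exactly the read-write conflict that SI does not check.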

(3) Serializable Snapshot Isolation (SSI)

SSI was later proposed to provide serializability on top of SI (PostgreSQL already supports SSI).

To analyze whether a schedule under SI is serializable, a paper proposed a method based on the Dependency Serialization Graph (DSG). The rw, wr, and ww dependencies between transactions form a directed graph; if the graph has no cycle, the schedule is serializable. The approach is sound in theory, but tracking the full graph is complex and expensive, which makes it hard to use in industrial production environments. A useful property helps here: under Snapshot Isolation, every cycle in the DSG contains two consecutive rw-dependency edges, that is, some transaction has one rw edge coming in and one going out.
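
The "one in, one out" property suggests a cheap check: instead of searching the whole graph for cycles, look for a transaction that has both an incoming and an outgoing rw edge. The sketch below does only that; the function name and the (reader, writer) edge format are assumptions for illustration, and the check is conservative, so it can flag schedules that are in fact serializable.

```python
def rw_pivots(rw_edges):
    """Return the transactions that have both an incoming and an outgoing rw edge.

    rw_edges: iterable of (reader, writer) pairs, meaning the reader saw a
    version of some item that the writer later replaced.
    """
    has_outgoing, has_incoming = set(), set()
    for reader, writer in rw_edges:
        has_outgoing.add(reader)
        has_incoming.add(writer)
    return has_incoming & has_outgoing

# Write skew: T1 reads what T2 overwrites, and T2 reads what T1 overwrites.
print(sorted(rw_pivots([("T1", "T2"), ("T2", "T1")])))
# A single rw edge cannot be part of a cycle under SI, so nothing is flagged.
print(sorted(rw_pivots([("T1", "T2")])))
```

Aborting one flagged transaction breaks every potential cycle through it, at the price of some unnecessary aborts.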

1.3 Test results

The 3TS-Coo consistency check tool reports two kinds of results:

  1. Anomaly (A): The database fails to detect the data anomaly, which leaves the data inconsistent; there is no equivalent serializable execution (a POP cycle exists).
  2. Consistency:
    • Pass (P): The database passes the anomaly test case with a serializable result (no POP cycle), and the data remains consistent.
    • Rollback: The transaction is rolled back because of a rule (R), deadlock detection (D), or a timeout (T).

2. Tools

A craftsman who wants to do good work must first sharpen his tools.

Too much background would make this report unwieldy, so here I only introduce the basic concepts needed to build a 3TS test environment.

2.1 cmake

CMake is a cross-platform build tool commonly used for large C++ projects. It supports building, testing, and packaging software. It uses platform-independent configuration files to control the compilation process and generates project files, such as Visual Studio project files or Makefiles, for the compiler environment of your choice. Its workflow is as follows:
[Figure: CMake workflow chart]
configure or CMake generates the Makefile, adapting the source code configuration to the current system.
make performs the actual compilation according to the rules in the Makefile and produces executables or libraries.
Finally, make install copies the compiled files to the specified installation directory for use by other programs on the system.

Some reference links:

  • Official website: http://cmake.org.cn/
  • Compilation tool: https://blog.csdn.net/LSW1737554365/article/details/132079584
  • I also watched a CMake tutorial video on Bilibili but forgot to save the link.

2.2 ODBC

ODBC (Open Database Connectivity) is a standard interface for accessing database systems. It provides a unified API (application programming interface) that lets applications communicate with many different types of databases (such as Oracle, MySQL, SQL Server, etc.) in the same way.
So once ODBC is installed, testing any database in this project only requires installing that database and the matching version of its ODBC driver, running the configuration step to generate a Makefile, and then running make.
unixODBC official website: https://www.unixodbc.org/

2.3 PostgreSQL

Official website: https://www.postgresql.org/

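# After installation on Ubuntu, the system creates a database superuser named postgres with an empty password
sudo apt-get update
sudo apt-get install postgresql postgresql-client

# PostgreSQL ships with a command-line tool, SQL Shell (psql); on Linux you can switch to the postgres user to start it
sudo -i -u postgres

# Enter PostgreSQL; on success the prompt shows: postgres=#
psql
-- Quit PostgreSQL; on success the prompt shows: postgres@hostname:~$
\q

-- (inside psql) View user names and passwords; copy the relevant user's hashed password (without the md5 prefix) to this website to look it up: https://www.somd5.com/
SELECT rolname,rolpassword FROM pg_authid;
-- Change the password
ALTER USER postgres WITH PASSWORD 'postgres';
-- Create a new user with a password (replace username and password with your own)
CREATE USER username WITH PASSWORD 'password';

# Check the listening port (5432 by default)
sudo netstat -plunt | grep postgres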

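PostgreSQL ODBC driver: https://www.postgresql.org/ftp/odbc/versions/src/

odbcinst -j      # check that unixODBC is installed (prints its version and configuration file paths)
odbcinst -q -d   # list the ODBC drivers registered in odbcinst.ini; the PostgreSQL driver should appear once registered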

Explanation of transaction isolation levels in the PostgreSQL documentation (Chinese translation): http://www.postgres.cn/docs/12/transaction-iso.html

The following summarizes transaction isolation levels and the implementation principles of multi-version concurrency control (MVCC) in PostgreSQL.

PostgreSQL accepts the four standard transaction isolation levels, one of which is Repeatable Read. Under this level, each transaction takes a snapshot when its first statement runs. For the rest of the transaction it sees only changes committed before that snapshot, and it cannot see uncommitted changes or changes committed later by concurrent transactions. PostgreSQL implements this with the transaction IDs recorded on each row version.

MVCC is the concurrency control mechanism used by PostgreSQL. It gives each transaction a consistent snapshot of the database, so a read sees the database state as of the snapshot rather than changes made afterwards. A write creates a new version of the row instead of overwriting the original data.
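
The idea of keeping versions and reading as of a snapshot can be sketched with a tiny in-memory store. This is a deliberate simplification: it stamps versions with a commit counter, whereas PostgreSQL tracks transaction IDs on each row version, and the class and method names are assumptions for illustration.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick by snapshot."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical commit counter

    def commit_write(self, key, value):
        # A write never overwrites: it appends a new version with a new timestamp.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def take_snapshot(self):
        # A snapshot is just "everything committed up to now".
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [value for ts, value in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = VersionedStore()
store.commit_write("x", 1)
snap = store.take_snapshot()         # a repeatable-read transaction starts here
store.commit_write("x", 2)           # a concurrent transaction commits later
print(store.read("x", snap))                    # still the old value
print(store.read("x", store.take_snapshot()))   # a new snapshot sees the new value
```

Because the reader never waits for the writer and the writer never overwrites what the reader sees, reads and writes do not block each other in this model.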

3. Concurrency control algorithm

In a database, multiple transactions may access the same data item at the same time. To preserve the ACID properties, concurrent transactions must be scheduled correctly and efficiently; the techniques for doing so are called concurrency control. Implementation strategies fall into optimistic and pessimistic concurrency control, a distinction based on when conflicts are detected.
Optimistic (OCC): Every operation is allowed to proceed, and isolation and integrity constraints are checked only when the transaction commits; if there is a violation, the transaction is aborted. Optimistic concurrency control suits workloads where conflicts are rare.
Pessimistic (PCC): Each operation is checked before it runs, and if it might violate isolation or integrity constraints it is blocked. In two-phase locking, for example, a read lock blocks another transaction's write, because that write could cause the read anomalies described above. Two-phase locking is therefore a pessimistic method that prevents conflicts in advance.
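
The optimistic side can be illustrated with a commit-time validation step. The sketch below uses one common scheme (backward validation: a transaction fails if anything it read was overwritten by a transaction that committed after it started); the function name and the timestamp representation are assumptions, and real systems differ in the details.

```python
def validate_at_commit(read_set, start_ts, committed):
    """Optimistic check performed only when the transaction tries to commit.

    read_set:  set of items this transaction read
    start_ts:  logical time at which this transaction started
    committed: list of (commit_ts, write_set) for transactions already committed
    """
    for commit_ts, write_set in committed:
        overlapped_in_time = commit_ts > start_ts
        if overlapped_in_time and (read_set & write_set):
            return False   # something we read has changed: abort and retry
    return True

history = [(4, {"x"}), (6, {"y"})]
print(validate_at_commit({"x"}, start_ts=5, committed=history))  # x changed before we started
print(validate_at_commit({"y"}, start_ts=5, committed=history))  # y changed while we ran
```

A pessimistic scheme would instead have blocked the conflicting write to y while the reader held its lock, so the conflict would never reach commit time.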

3.1 Two-phase locking (2PL)

The three-level locking protocol guarantees data consistency to different degrees, while the two-phase locking protocol guarantees that concurrent schedules are correct. A DBMS generally uses two-phase locking to make concurrent schedules serializable, and serializability is what makes a schedule correct. The protocol divides each transaction into two phases:

  • Locking (growing) phase: Before reading or writing any data, the transaction must first request and obtain a lock on that data;
  • Unlocking (shrinking) phase: After releasing any lock, the transaction may not request or obtain any further lock.

If all concurrent transactions follow the two-phase locking protocol, then any concurrent schedule of these transactions is serializable.
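
The two-phase rule is easy to check mechanically: once a transaction has released any lock, it must never acquire another. The sketch below checks one transaction's sequence of lock operations; the function name and the ("lock" / "unlock", item) encoding are assumptions made for illustration.

```python
def obeys_two_phase_locking(operations):
    """Check one transaction's lock operations against the 2PL rule.

    operations: list of (action, item) pairs, where action is "lock" or "unlock",
    in the order the transaction issues them.
    """
    held = set()
    shrinking = False              # becomes True after the first unlock
    for action, item in operations:
        if action == "lock":
            if shrinking:
                return False       # acquiring after releasing violates 2PL
            held.add(item)
        elif action == "unlock":
            if item not in held:
                return False       # cannot release a lock that is not held
            held.remove(item)
            shrinking = True
        else:
            raise ValueError(f"unknown action: {action!r}")
    return True

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
print(obeys_two_phase_locking(two_phase), obeys_two_phase_locking(not_two_phase))
```

The second sequence releases A before locking B, which leaves a window in which another transaction can change A and then be read inconsistently alongside B.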

3.2 Multi-version concurrency control technology (MVCC)

MVCC (Multi-Version Concurrency Control) solves the read consistency problem under concurrent access by maintaining historical versions of data.
One way PostgreSQL uses MVCC is snapshot isolation. The core idea of snapshot isolation is that each transaction reads from a snapshot of the database; that is, everything the transaction sees was committed before the transaction started. If other transactions change some data after the current transaction has started, snapshot isolation guarantees that the current transaction does not see the new values. Because every read comes from the same past snapshot, reading a value several times never returns inconsistent results.

Snapshot isolation can produce the write skew anomaly, which makes a schedule non-serializable. This can be solved with serializable snapshot isolation (SSI).

Finally, if anything here is wrong, you are welcome to point it out. Some of the content comes from official documents, some is excerpted from papers together with my own understanding, and some is compiled from online materials. Some of the reference articles are listed below:

Reference articles

  • Detailed diagrams of database transactions and locking: https://blog.csdn.net/weixin_45670060/article/details/119977481
  • Transactions and locks in interviews: https://zhuanlan.zhihu.com/p/187345419
  • In-depth explanation: https://www.cnblogs.com/leijiangtao/p/11911644.html
  • https://zhuanlan.zhihu.com/p/133823461
  • Official website documentation: https://axingguchen.github.io/3TS/
  • Write skew explained: https://www.jdon.com/55452.html
  • Understanding MVCC isolation levels: https://cloud.tencent.com/developer/article/1529460
  • Snapshot isolation: https://blog.csdn.net/songchuwang1868/article/details/97630005
  • Understanding the four ACID properties: https://cloud.tencent.com/developer/article/1888427
  • https://blog.csdn.net/qq_52668274/article/details/129843223

Origin blog.csdn.net/BinBinCome/article/details/132564526