[Interview] 12 consecutive questions about MySQL transactions

Foreword

The peak job-hunting season of March and April ("golden March, silver April") is coming soon. I have prepared 12 consecutive questions about transactions. I believe they will be helpful for everyone to read.

1. What is a database transaction?

A transaction is a finite sequence of database operations that are either all executed or not executed at all. It is an indivisible unit of work.

Suppose A transfers 100 yuan to B: first deduct 100 yuan from A's account, then add 100 yuan to B's account. If A's 100 yuan has been deducted but the banking system fails before the money is added to B, A's balance decreases while B's balance does not increase. So you need a transaction that rolls back A's deduction. It's that simple.
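The transfer scenario can be sketched in code. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that the example runs without a server). The account table, the transfer function, and the fail_midway switch are hypothetical names used only for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Simulate the banking system failing right after the deduction.
            raise RuntimeError("system failure before crediting " + dst)
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()
        return True
    except RuntimeError:
        conn.rollback()  # undo the deduction that was already applied to src
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 500)")
conn.commit()

failed = transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)    # the deduction has been rolled back
succeeded = transfer(conn, "A", "B", 100)
after_success = balances(conn)    # both updates were committed together
```

After the simulated failure both balances are unchanged, and only the second call moves the money. In MySQL the same shape would be an explicit begin, two updates, and then commit or rollback.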

2. Four characteristics of transactions

  • Atomicity: The transaction is executed as a whole, and the operations on the database contained in it are either all executed or none of them are executed.
  • Consistency: The data is in a valid state both before the transaction starts and after it ends. If account A transfers 10 yuan to account B, the total amount of A and B stays the same whether the transfer succeeds or fails.
  • Isolation: When multiple transactions run concurrently, they are isolated from each other, and one transaction should not be interfered with by the others.
  • Durability: After the transaction is committed, the changes it made to the database are saved persistently in the database.

3. What are the isolation levels of transactions? What is the default isolation level for MySQL?

There are four isolation levels for transactions: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.

  • Read Uncommitted: Two transactions cannot modify the same data at the same time, but modified data can be read by other transactions even before the modifying transaction commits. Dirty reads, non-repeatable reads, and phantom reads can all occur at this level.
  • Read Committed: The current transaction can only read data committed by other transactions, so this level solves dirty reads, but non-repeatable reads and phantom reads remain.
  • Repeatable Read: Data that a transaction has read stays the same from that transaction's point of view, which solves non-repeatable reads. New rows can still be inserted into a range that was read, so phantom reads still exist.
  • Serializable: The highest isolation level, under which all transactions execute in a serialized order. Dirty reads, non-repeatable reads, and phantom reads are all avoided, but transaction execution costs a lot of performance.

The default transaction isolation level of MySQL is Repeatable Read (RR).

4. Why does MySQL choose RR as the default isolation level?

We know that MySQL has four isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Read Uncommitted is too low, because it allows dirty reads, and Serializable is too high, because it hurts concurrency. That leaves Read Committed (RC) and Repeatable Read (RR).

So why does MySQL choose RR as the default isolation level?

Our MySQL databases are generally deployed as clusters with a master and slaves. The master handles writes and the slaves handle reads. After the master has written data, master-slave replication synchronizes the data to the slaves.

A slave fetches the bin log from the master and replays it to keep its data consistent with the master.

The bin log actually has three formats: statement, row, and mixed. In the statement format, the bin log records the original SQL text. In the early days of MySQL, statement was the only bin log format, and with it data inconsistency may occur at the RC isolation level.

This bug is also documented on the MySQL official website.

We can reproduce this bug, assuming the table structure is as follows:

    CREATE TABLE t (
      a int(11) DEFAULT NULL,
      b int(11) DEFAULT NULL,
      KEY a (a)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    INSERT INTO t VALUES (666,2),(233,1);

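Execute these two transactions: transaction A runs update t set a=888 where b=2; and, before A commits, transaction B runs update t set b=2 where b=1; and commits. Transaction A commits afterwards.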

Because the isolation level is RC, transaction A only places a row lock on the record with b=2 and turns it into (888,2). Transaction B is not blocked by that row-level lock, so once both transactions have finished, the two rows are (888,2) and (233,2).

Now look at the bin log under RC. Transaction B commits first, so its entry is written to the bin log first, followed by the entry for transaction A. When the bin log format is statement, the original SQL text is recorded: update t set b=2 where b=1; comes first, then update t set a=888 where b=2;.

As a result, when the master synchronizes the bin log to the slave and the slave replays the SQL, the slave's data becomes (888,2) and (888,2), which is inconsistent with the master. Under RR (Repeatable Read), this does not happen because of gap locks. That is why MySQL chose RR as the default isolation level.
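To make the divergence concrete, here is a small sketch in Python (chosen only for illustration) that models the table as a list of rows and each update as a tuple. The run helper and the tuple layout are hypothetical, and locking is not modeled: the master applies the statements in the order they actually ran (A, then B), while the slave replays them in bin log order (B, then A).

```python
def run(rows, statements):
    """Apply statement-style updates in order.

    Each statement is (set_col, new_value, where_col, where_value), a toy
    stand-in for: update t set <set_col>=<new_value> where <where_col>=<where_value>
    """
    rows = [dict(r) for r in rows]  # work on a copy of the initial data
    for set_col, new_value, where_col, where_value in statements:
        for row in rows:
            if row[where_col] == where_value:
                row[set_col] = new_value
    return [(r["a"], r["b"]) for r in rows]


initial = [{"a": 666, "b": 2}, {"a": 233, "b": 1}]
stmt_a = ("a", 888, "b", 2)  # transaction A: update t set a=888 where b=2
stmt_b = ("b", 2, "b", 1)    # transaction B: update t set b=2 where b=1

master = run(initial, [stmt_a, stmt_b])  # order in which the updates really ran
slave = run(initial, [stmt_b, stmt_a])   # statement bin log order (commit order)
```

Here master ends up as [(888, 2), (233, 2)] while slave ends up as [(888, 2), (888, 2)], which is exactly the inconsistency described above.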

5. Why do many big tech companies choose the RC isolation level?

The most obvious characteristic of big Internet companies, and of some traditional enterprises, is high concurrency, so they care more about improving the concurrency of the system.
Why is concurrency better under RC than under RR?

Under RC, locking only needs row locks on the modified records. RR additionally needs Gap Locks and Next-Key Locks, so under RR the probability of deadlock is much higher. In addition, RC supports semi-consistent reads, which can greatly reduce row lock conflicts for update statements: for records that do not match the update condition, locks can be released early to improve concurrency.

  • Consistent read: Also known as snapshot read. A snapshot is a historical version of the row data. A snapshot read uses snapshot information to show query results as of a certain point in time, regardless of changes made by other transactions running at the same time.
  • Current read: Reads the latest committed value of every record.
  • Semi-consistent read: For an update statement, if a record matched by the where condition is already locked, InnoDB returns the most recently committed version of the record, and MySQL's upper layer decides whether that version matches the where condition and therefore needs to be locked.

6. In concurrent scenarios, what consistency problems does the database have?

  • Dirty read: If a transaction reads data modified by another uncommitted transaction, we call it a dirty read phenomenon.
  • Non-repeatable read: Within the same transaction, the same data is read several times and the content read is not the same each time.
  • Phantom read: A transaction first queries some records by a search condition, and before it commits, another transaction writes records (insert, delete, update) that match that search condition. This is a phantom read.
  • Lost update: Both transaction A and transaction B modify the same data, transaction A modifies first, transaction B modifies later, and transaction B's modification overwrites transaction A's modification.

7. What concurrency problems will exist in the four isolation levels?

Isolation level          Dirty read   Non-repeatable read   Phantom read
Read Uncommitted (RU)    √            √                     √
Read Committed (RC)      ×            √                     √
Repeatable Read (RR)     ×            ×                     √
Serializable             ×            ×                     ×
(√ = may occur, × = does not occur)
  • At the RU isolation level, dirty reads, non-repeatable reads, and phantom reads may occur.
  • Under the RC isolation level, non-repeatable reads and phantom reads may occur.
  • Under the RR isolation level, phantom reading may occur.
  • At the Serializable isolation level, transactions are forced to execute serially, and there will be no dirty reads, non-repeatable reads, or phantom reads.

8. How is the isolation level of MySQL implemented?

MySQL's isolation level is implemented through MVCC and locking mechanisms.

  • RU is the lowest isolation level. Reads in a transaction take no locks and do not block the reads and writes of other transactions, so dirty reads are possible.
  • The RC and RR isolation levels are implemented through MVCC.
  • Serializable is implemented through locking. Reads take shared locks and writes take exclusive locks, so reads and writes are mutually exclusive. If an uncommitted transaction is modifying certain rows, every statement that selects those rows blocks.

9. What is MVCC and its underlying principles?

MVCC stands for Multi-Version Concurrency Control. It is a concurrency control method, generally used in database management systems to achieve concurrent access to the database.

In layman's terms, multiple versions of the data exist in the database at the same time: not multiple versions of the entire database, but multiple versions of a given record. When a transaction operates on a record, it checks the transaction version ID stored in the record's hidden column, compares transaction IDs, and decides which version to read according to the transaction isolation level.

To understand the underlying principles of MVCC, you need to review a lot of relevant knowledge points. Let’s analyze it according to the following outline:

  • What is snapshot read and current read
  • Implicit fields
  • What is Undo Log
  • What is a snapshot version chain
  • Transaction version number
  • What is Read View
  • What is the process of querying a record based on MVCC
  • Based on MVCC, RC isolation level, analysis of non-repeatable read problems

9.1 What is snapshot read and current read

  • Snapshot read: Reads the visible version of the record, which may be an old version. No locks are taken. Ordinary select statements are all snapshot reads.
  • Current read: Reads the latest version of the record. Every read that locks explicitly, such as select ... for update or select ... lock in share mode, is a current read.

Snapshot read is the basis of MVCC implementation.

9.2 Implicit fields

In the InnoDB storage engine, each row has two hidden columns, trx_id and roll_pointer. If the table has neither a primary key nor a non-NULL unique key, there is also a third hidden primary key column, row_id.

9.3 What is Undo Log

The undo log, or rollback log, records the information as it was before data is modified. Before a table record is modified, the data is first copied to the undo log. If the transaction is rolled back, the data can be restored from the undo log.

Roughly speaking, when a record is deleted, the undo log records a corresponding insert, and when a record is updated, it records a corresponding opposite update.
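The idea of recording the inverse operation can be illustrated with a toy model. This is a minimal sketch in Python, not how InnoDB stores undo records: the Table class and its methods are hypothetical, and clearing the log on commit is a simplification (InnoDB keeps undo data around for MVCC until it is purged). The insert case, which is undone by a delete, is added here for completeness.

```python
class Table:
    """Toy key -> value table that keeps an undo log of inverse operations."""

    def __init__(self):
        self.rows = {}
        self.undo_log = []

    def insert(self, key, value):
        self.rows[key] = value
        self.undo_log.append(("delete", key, None))  # an insert is undone by a delete

    def update(self, key, value):
        # Save the old value first: the inverse is an update back to it.
        self.undo_log.append(("update", key, self.rows[key]))
        self.rows[key] = value

    def delete(self, key):
        # A delete is undone by re-inserting the removed value.
        self.undo_log.append(("insert", key, self.rows.pop(key)))

    def commit(self):
        self.undo_log.clear()  # simplification, see the note above

    def rollback(self):
        # Apply the inverse operations in reverse order.
        while self.undo_log:
            op, key, old_value = self.undo_log.pop()
            if op == "delete":
                del self.rows[key]
            else:  # "insert" or "update": put the old value back
                self.rows[key] = old_value


t = Table()
t.insert(1, "Sun Quan")
t.commit()

t.update(1, "Cao Cao")
t.delete(1)
t.insert(2, "Liu Bei")
t.rollback()
restored = dict(t.rows)  # back to the last committed state
```

After the rollback, restored contains only the committed row {1: "Sun Quan"}.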

What is the undo log used for?

  • Atomicity and consistency are guaranteed when a transaction is rolled back.
  • Used for MVCC snapshot reads.

9.4 Snapshot version chain

When multiple transactions modify the same row concurrently, each modification produces a new version of the row, and the rollback pointer (roll_pointer) links these versions into a linked list. This linked list is called the version chain.

9.5 Transaction version number

When a transaction starts, it obtains an auto-incrementing transaction ID from the database, and the order in which transactions started can be judged from the transaction ID (trx_id). This is the transaction version number.

9.6 What is Read View

A Read View is the read view generated when a transaction executes a SQL statement. In InnoDB, a SQL statement obtains a Read View before it executes. It is mainly used for visibility checks, that is, to decide which version of the data is visible to the current transaction.

In Read View, there are several important attributes.

  • m_ids: A list of uncommitted read and write transaction IDs in the current system.
  • min_limit_id: Indicates the smallest transaction id among active read and write transactions in the current system when Read View is generated, that is, the minimum value in m_ids.
  • max_limit_id: Indicates the id value that should be assigned to the next transaction in the system when Read View is generated.
  • creator_trx_id: The ID of the transaction that created the current Read View.

The Read View matching rules (very important) are as follows:

  1. If the version's transaction ID satisfies trx_id < min_limit_id, the transaction that produced this version committed before the Read View was generated (transaction IDs only increase), so this version is visible to the current transaction.
  2. If trx_id >= max_limit_id, the transaction that produced this version started after the Read View was generated, so this version is not visible to the current transaction.
  3. If min_limit_id <= trx_id < max_limit_id, there are three cases:

(1) If m_ids contains trx_id and trx_id equals creator_trx_id, the transaction had not committed when the Read View was generated, but the data was produced by the current transaction itself, so it is visible.
(2) If m_ids contains trx_id and trx_id is not equal to creator_trx_id, the transaction had not committed when the Read View was generated and the data was not produced by the current transaction, so it is not visible.
(3) If m_ids does not contain trx_id, that transaction committed before the Read View was generated, so the result of its modification is visible to the current transaction.
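The rules above translate almost line for line into code. This is a minimal sketch in Python that reuses this article's attribute names (m_ids, min_limit_id, max_limit_id, creator_trx_id). The ReadView class and is_visible method are hypothetical names for illustration and are not InnoDB's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    m_ids: frozenset      # IDs of read-write transactions still uncommitted
    min_limit_id: int     # smallest ID in m_ids
    max_limit_id: int     # ID the next new transaction would receive
    creator_trx_id: int   # transaction that created this Read View

    def is_visible(self, trx_id):
        """Return True if a row version written by trx_id is visible."""
        if trx_id < self.min_limit_id:    # rule 1: committed before the view
            return True
        if trx_id >= self.max_limit_id:   # rule 2: started after the view
            return False
        if trx_id in self.m_ids:          # rule 3, cases (1) and (2)
            return trx_id == self.creator_trx_id
        return True                       # rule 3, case (3): already committed


# Transactions 100 and 101 are active; the view belongs to transaction 100.
view = ReadView(frozenset({100, 101}), 100, 102, 100)
results = {trx_id: view.is_visible(trx_id) for trx_id in (99, 100, 101, 102)}

# The same check after transaction 101 has committed and a new view is taken.
later_view = ReadView(frozenset({100}), 100, 102, 100)
later_result = later_view.is_visible(101)
```

With the first view, versions written by 99 and 100 are visible and those written by 101 and 102 are not. With the later view, the version written by 101 becomes visible.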

9.7 What is the process of querying a record based on MVCC

  1. Get the version number of the transaction itself, that is, the transaction ID (trx_id)
  2. Get Read View
  3. Query the data, then compare the data's transaction version number against the Read View.
  4. If a version does not satisfy the Read View visibility rules, fetch the historical snapshot from the undo log.
  5. Finally, return the data that satisfies the rules.

InnoDB implements MVCC with Read View + Undo Log: the Undo Log saves the historical snapshots, and the Read View visibility rules determine whether a given version of the data is visible.

9.8 Based on MVCC, RC isolation level, analysis of non-repeatable read problems

To deepen everyone's understanding of MVCC, let's work through an example: the RC isolation level has the non-repeatable read problem, so let's analyze how it happens.

  1. First create the core_user table and insert one row of initial data: the row with id=1, whose name is Sun Quan.

  2. The isolation level is set to read committed (RC), and transaction A and transaction B perform query and modification operations on the core_user table at the same time.

Transaction A: select * from core_user where id=1
Transaction B: update core_user set name = 'Cao Cao'

Finally, the query result of transaction A is the record of name=Cao Cao. Let’s analyze the execution process based on MVCC:
(1) A starts a transaction, and first obtains a transaction ID of 100
(2) B starts a transaction, and obtains a transaction ID of 101
(3) Transaction A generates a Read View with the following values: m_ids = {100, 101}, min_limit_id = 100, max_limit_id = 102, creator_trx_id = 100.

Then go to the version chain and start picking the visible record from it.

In the version chain, the name column in the latest version is Sun Quan and the trx_id of this version is 100. Checking the Read View visibility rules:

min_limit_id(100) <= trx_id(100) < max_limit_id(102);
creator_trx_id = trx_id = 100;

So the record with trx_id=100 is visible to the current transaction, and the query returns the record whose name is Sun Quan.
(4) Transaction B performs the modification and changes the name to Cao Cao. The original data is copied to the undo log, then the data is modified, and the new version is marked with the transaction ID and the address of the previous version in the undo log.

(5) Transaction B commits.

(6) Transaction A executes the query again and generates a new Read View with the following values: m_ids = {100}, min_limit_id = 100, max_limit_id = 102, creator_trx_id = 100.

Then go back to the version chain again and pick the visible record from it.

In the version chain, the name column in the latest version is now Cao Cao and the trx_id of this version is 101. Checking the Read View visibility rules:

min_limit_id(100) <= trx_id(101) < max_limit_id(102);
however, trx_id=101 does not belong to the m_ids set

Therefore, the record with trx_id=101 is visible to the current transaction, and the SQL query returns the record whose name is Cao Cao.

To sum up, under the Read Committed (RC) isolation level, two identical queries in the same transaction read the same record (id=1) but returned different data (Sun Quan the first time, Cao Cao the second time), so the RC isolation level has the non-repeatable read concurrency problem.

At the RR isolation level, a transaction obtains a Read View only once, and every later query in the transaction shares it. This ensures that each query sees the same data, which is how RR solves the non-repeatable read concurrency problem.
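The difference between the two levels can be replayed with a small simulation. This is a minimal sketch in Python under stated assumptions: visible, make_view, snapshot_read, and run are hypothetical helpers, the version chain is a plain list with the newest version first, and the initial row is assumed to have been written by an earlier committed transaction with the made-up ID 90. The reader is transaction 100 and the writer is transaction 101, as in the walkthrough above.

```python
def visible(view, trx_id):
    """Read View visibility rules from section 9.6."""
    if trx_id < view["min_limit_id"]:
        return True
    if trx_id >= view["max_limit_id"]:
        return False
    if trx_id in view["m_ids"]:
        return trx_id == view["creator_trx_id"]
    return True


def make_view(active_ids, next_id, creator):
    """Build a Read View from the currently active transaction IDs."""
    return {
        "m_ids": set(active_ids),  # copied, so later commits do not change it
        "min_limit_id": min(active_ids),
        "max_limit_id": next_id,
        "creator_trx_id": creator,
    }


def snapshot_read(version_chain, view):
    """Walk the chain from newest to oldest and return the first visible name."""
    for trx_id, name in version_chain:
        if visible(view, trx_id):
            return name
    return None


def run(level):
    chain = [(90, "Sun Quan")]         # assumed initial version, already committed
    active = {100, 101}                # A = 100 (reader), B = 101 (writer)
    view = make_view(active, 102, creator=100)
    first = snapshot_read(chain, view)

    chain.insert(0, (101, "Cao Cao"))  # B updates; the old version stays in the chain
    active.discard(101)                # B commits

    if level == "RC":                  # RC: a fresh Read View for every query
        view = make_view(active, 102, creator=100)
    second = snapshot_read(chain, view)  # RR: the first Read View is reused
    return first, second


rc_result = run("RC")
rr_result = run("RR")
```

Under RC the two reads return Sun Quan and then Cao Cao. Under RR both reads return Sun Quan, because the reused Read View still lists transaction 101 as uncommitted and the read falls back to the older version in the chain.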

10. How to deal with large and long transactions? Please give some ways to deal with it.

Dealing with large and long transactions is a very important part of database design and optimization. Here are some commonly used processing methods:

  • Split a large transaction into small transactions: Split a large transaction into multiple small transactions, reduce the amount of data operated by each transaction, reduce the risk of lock competition and deadlock, and improve concurrency performance.
  • Optimize query statements: For query operations in long transactions, query performance can be improved by optimizing query statements, such as adding indexes and optimizing SQL structures.
  • Avoid occupying locks for a long time: Long transactions will occupy lock resources, causing other transactions to be unable to access corresponding data. Therefore, it is necessary to shorten the execution time of transactions as much as possible to avoid occupying locks for a long time.
  • Avoid long transaction waiting: Long transactions may cause other transactions to wait too long, affecting system performance and availability. Therefore, it is necessary to shorten transaction execution time as much as possible to avoid long transaction waiting.
  • Optimize transaction logs: Long transactions will occupy a large amount of transaction logs, resulting in a decrease in database performance. Therefore, it is necessary to optimize the writing and flushing strategies of transaction logs to improve performance.
  • Use scheduled tasks: Long-running transactions can be executed regularly through scheduled tasks to avoid long-term resource occupation.
  • Appropriately increase hardware resources: If the above methods cannot solve the problem, you can appropriately increase hardware resources, such as increasing memory, CPU, storage, etc., to improve system performance.
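The first suggestion, splitting a large transaction into small ones, might look like the following. This is a minimal sketch that again uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server). The event_log table and the delete_in_batches function are hypothetical names.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows with id < cutoff in several short transactions.

    Returns the number of batches that actually deleted rows.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM event_log WHERE id IN "
            "(SELECT id FROM event_log WHERE id < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?)", [(i, "x") for i in range(1, 1001)]
)
conn.commit()

batches = delete_in_batches(conn, cutoff=901, batch_size=200)
remaining = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
```

Here 900 rows are removed in 5 batches and 100 rows remain. In MySQL itself the subquery is unnecessary, since a single-table delete accepts order by and limit directly. Either way, locks are held only for the duration of one batch.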

11. How to optimize the performance of MySQL transactions? Please list some optimization methods.

MySQL transaction performance optimization is one of the keys to improving database performance. The following are some commonly used optimization methods:

  • Choose the right storage engine: Different storage engines have different characteristics and performance, so you need to choose the right storage engine according to specific business needs, such as MyISAM, InnoDB, Memory, etc.
  • Use appropriate indexes: Appropriate indexes can improve the efficiency of query and update operations, so you need to add appropriate indexes based on actual business conditions to avoid full table scans.
  • Avoid unnecessary locking: Unnecessary locking will reduce concurrency performance, so unnecessary locking needs to be avoided, such as optimizing query statements, using optimistic locking, etc.
  • Select an appropriate transaction isolation level: Different transaction isolation levels have different characteristics and performance impacts, so you need to select an appropriate transaction isolation level based on actual business conditions.
  • Reduce the scope of transactions: Minimize the scope of transactions as much as possible, and split large transactions into multiple small transactions, which can reduce the risk of lock competition and deadlock, and improve concurrency performance.
  • Use an appropriate transaction commit method: For single statements that never need to be rolled back together with other statements, you can rely on autocommit, which saves the explicit begin and commit statements and improves performance.
  • Avoid long transactions: Long-running transactions will take up a lot of resources and affect concurrency performance. Therefore, it is necessary to shorten the execution time of transactions as much as possible to avoid long-term transaction waiting.
  • Optimizing the hardware and configuration of the database server: Optimizing the hardware and configuration of the database server can improve database performance, such as increasing memory, optimizing disk performance, and adjusting cache size, etc.
  • Use a distributed database: For high-concurrency scenarios, you can use a distributed database architecture to distribute data to multiple database nodes to improve concurrency performance.

Of course, these methods may not be applicable to all business scenarios, and need to be selected and adjusted according to specific situations.

12. The basic principle of InnoDB's transaction implementation

InnoDB is a commonly used storage engine in MySQL that supports advanced features such as transactions and row-level locks. The following are the basic principles of InnoDB's implementation of transactions:

  • In InnoDB, each transaction has a unique transaction ID (transaction ID), which is used to distinguish different transactions.
  • InnoDB uses MVCC (multi-version concurrency control) to achieve transaction isolation. Each modification will generate a new version. When querying, you can only see the version that has been submitted before the query starts, so as to avoid reading dirty data.
  • When performing an update in a transaction, InnoDB locks the relevant data rows as needed to ensure the atomicity and consistency of the transaction. Row-level locks in InnoDB are implemented by locking index records, not the rows themselves.
  • Transactions in InnoDB support the ACID properties: atomicity, consistency, isolation, and durability. InnoDB guarantees atomicity and durability through the redo log and the undo log. The redo log records the modifications made by transactions, while the undo log records what is needed to roll them back. After a crash or other failure, InnoDB uses the redo log to reapply committed changes and the undo log to roll back uncommitted ones, which ensures data consistency and durability.
  • Transaction isolation levels in InnoDB include read uncommitted, read committed, repeatable read, and serializable. The default isolation level is Repeatable Read, which is implemented using locks and MVCC mechanisms. In the case of high concurrency, if the lock granularity is too large or the lock competition is too intense, it may cause performance bottlenecks or deadlock problems, so it needs to be optimized for specific scenarios.


Origin blog.csdn.net/u011397981/article/details/130424878