Ali interviewer: How does MySQL implement ACID?

As a new grad from a second-tier school who made it into a big tech company's back-end team, I know how hard the road is when nobody is guiding you, so I want to share my journey and experience. The series on big-tech interview questions is being updated continuously...

1. Preamble

A while ago a classmate was asked in a second-round Alibaba interview: how does MySQL implement ACID? If the question were simply to describe what ACID is, everyone could answer it. But explaining how the ACID properties are actually implemented underneath really tests your depth!

In this post I'll briefly share my understanding of how the ACID properties are realized. The discussion focuses on the InnoDB engine, and along the way we'll quickly review what a transaction is and what the isolation levels are.

2. Transactions and ACID

What is a transaction? The definitions given in textbooks are long-winded and hard to digest. My own understanding: a transaction is a series of operations that either all succeed or all fail. It has the four ACID properties. Under concurrency, problems such as dirty reads, non-repeatable reads, and phantom reads can appear, which is why the four isolation levels were introduced.

Transaction ACID properties

As a relational database, how does MySQL guarantee ACID with its most commonly used engine, InnoDB?

  • (Atomicity) Atomicity: a series of operations either all succeed or all fail
  • (Isolation) Isolation: the changes made by a transaction are visible to other transactions only after it commits
  • (Consistency) Consistency: the database always moves from one consistent state to another consistent state (the data before and after a transaction's modifications stays consistent)
  • (Durability) Durability: once a transaction commits, its modifications are permanent.
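
For intuition, here is a minimal sketch of the classic transfer scenario (the accounts table and its columns are made up purely for illustration): the two updates take effect together at COMMIT, or not at all.

    -- hypothetical table: accounts(user_id INT PRIMARY KEY, balance DECIMAL(10,2))
    START TRANSACTION;

    UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;  -- debit A
    UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;  -- credit B

    COMMIT;      -- both changes become permanent together (atomicity + durability)
    -- ROLLBACK; -- or, if anything went wrong, undo both changes instead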

Atomicity

Before talking about atomicity, I have to introduce something first: the undo log. What is it? If you want to study how it is implemented internally, read the books carefully; here I'll just share my own understanding, which is basically enough for an interview.

The undo log is the rollback log. It is used both to guarantee atomicity and to implement MVCC for isolation (MVCC is covered later). The key to atomicity is being able to undo all of the SQL statements that have already executed when the transaction is rolled back.

When a transaction modifies the database, InnoDB generates a corresponding undo log record that saves the old version of the data from before the change. If the transaction hits an error or is rolled back, InnoDB performs the opposite logical operation based on the contents of the undo log, restoring the old version.

  • For an insert statement, a delete is executed on rollback;
  • For a delete statement, an insert is executed on rollback;
  • For an update statement, the opposite update is executed on rollback, changing the data back.

In short, MySQL's atomicity is guaranteed by the undo log. A summary of the undo log's role:

Role: the undo log records the old version of the data from before the transaction's changes. It is used to implement rollback and thereby guarantee atomicity, and also to implement MVCC: the pre-modification version is saved in the undo log, and a hidden rollback pointer in the row record points to that old version.
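
To see rollback in action, a quick sketch against a throwaway table (table name and data invented for illustration):

    -- throwaway table, for illustration only
    CREATE TABLE t (id INT PRIMARY KEY, val INT);
    INSERT INTO t VALUES (1, 100);

    START TRANSACTION;
    UPDATE t SET val = 200 WHERE id = 1;  -- InnoDB writes the old value (100) into the undo log
    SELECT val FROM t WHERE id = 1;       -- this session sees 200
    ROLLBACK;                             -- the reverse update is applied from the undo log
    SELECT val FROM t WHERE id = 1;       -- 100 again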

Durability

Before talking about durability, we first need to know about the redo log. Same rule as before: for the full details read the books; here I'm only sharing my interview-level understanding.

Let's use a small everyday analogy to understand it:

The redo log is a physical log. Think of it as a small cart used when unloading goods. If we carried every single item into the warehouse one at a time, we would waste a lot of time (low efficiency, plus finding a suitable storage spot each time). With a cart, we pile the goods onto the cart first and move them into the warehouse only when the cart is full. Efficiency goes up dramatically.

MySQL uses a similar idea. When we update data, the update is first recorded in the redo log, and the actual flush to disk happens later, when the redo log is getting full or MySQL is idle. This is the WAL technique that is so often mentioned with MySQL. WAL stands for Write-Ahead Logging, and its key point is: write the log first, write the disk later — load the cart first, and move things into the warehouse when you're not busy.

In short, MySQL's durability is guaranteed by the redo log. A summary of the redo log's role:

The role of the redo log
  • Physical log: it records the modifications made to the data after the transaction starts.
  • Crash safety: the redo log is what makes InnoDB crash-safe; if the database loses power unexpectedly, it can be recovered from the redo log.
  • Fixed size, written in a circle: two pointers are maintained, write pos (the current write position) and checkpoint (the position up to which entries can be erased).
  • Analogy: the redo log is like the delivery cart. When there are too many goods to shelve one by one, put them on the cart first, and move them into the warehouse when the cart is full or the shop is free.

The following is just for understanding:
Write process: the redo log is first written to the redo log buffer, then to the file system's page cache (still not durable at this point), and finally fsync persists it to disk.
Write strategy: controlled by the innodb_flush_log_at_trx_commit parameter (my mnemonic: how InnoDB flushes the log when a transaction commits):
  0 --> at commit, leave the redo log only in the redo log buffer;
  1 --> at commit, persist the redo log directly to disk (this is one half of the often-mentioned "double 1" configuration);
  2 --> at commit, only write the redo log to the page cache.
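
You can check or change this setting yourself; a small sketch (the trade-offs in the comments follow the value descriptions above; the other "1" of the "double 1" setup is the binlog's sync_binlog parameter):

    -- check the current redo log flush strategy
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

    -- 1: fsync the redo log to disk at every commit (safest; pairs with sync_binlog = 1 as the "double 1" setup)
    -- 0: at commit, leave it in the redo log buffer; it is flushed roughly once per second
    -- 2: at commit, write it to the OS page cache; an OS crash can lose about the last second of transactions
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;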

Isolation

Speaking of isolation, we all know that MySQL offers four isolation levels to deal with the concurrency problems of dirty reads, non-repeatable reads, and phantom reads.

So how is isolation actually achieved at the different isolation levels? What is the underlying mechanism? Let's go through it next; don't worry if it doesn't all sink in at once — as usual, there is a summary at the end!

In a word: locks + MVCC.

Lock

  • Table locks
    • lock table table_name read/write
    • MyISAM automatically takes a read lock when executing select, and a write lock when executing update/delete/insert
    • A table read lock does not block reads from other threads, but it blocks their writes
    • A table write lock blocks both reads and writes from other threads
  • Row locks
    • Lock types
      • Gap lock: locks the range between index records (open at both ends); it only takes effect under the REPEATABLE READ isolation level, and it prevents multiple transactions from inserting records into the same range, which would otherwise cause phantom reads
      • Record lock: locks a single row record; what it actually locks is the index entry — if no index can be used, the lock degenerates into locking the whole table
      • Next-key lock: record lock + gap lock, a left-open, right-closed interval (used to solve phantom reads)
    • Lock modes (see the two-session sketch after this list)
      • Shared lock (read lock, S lock)
        • select … lock in share mode
        • While S locks are held on a row, other S locks can still be taken, but an X lock cannot
      • Exclusive lock (write lock, X lock)
        • select … for update
        • While an X lock is held on a row, neither an S lock nor another X lock can be taken
      • Intention locks: intention shared (IS) lock + intention exclusive (IX) lock
      • Auto-increment (AUTO-INC) lock
    • Locks are taken when they are needed but not released immediately — they are released only when the transaction commits (the two-phase locking protocol)
  • Global lock: locks the whole instance, typically used for a logical backup of the entire database
  • Deadlock
    • Two or more transactions each hold a lock on a resource the other needs and request more locks, so they wait on each other and block forever
    • InnoDB resolves this by rolling back the transaction that holds the fewest exclusive row-level locks
    • A lock wait timeout can also be configured
  • Optimistic locking and pessimistic locking
    • Pessimistic locking relies on the database's own lock mechanism — suited to write-heavy workloads
    • Optimistic locking uses a version column or a CAS-style check — suited to read-heavy workloads where conflicts are rare
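
A minimal two-session sketch of the S/X row locks and two-phase locking described above (the orders table is made up for illustration; in MySQL 8.0 LOCK IN SHARE MODE can also be written as FOR SHARE):

    -- Session A
    START TRANSACTION;
    SELECT * FROM orders WHERE id = 10 LOCK IN SHARE MODE;  -- takes an S lock on the row

    -- Session B
    SELECT * FROM orders WHERE id = 10 LOCK IN SHARE MODE;  -- OK: S locks are compatible with each other
    -- either of the following would now block until Session A commits:
    SELECT * FROM orders WHERE id = 10 FOR UPDATE;          -- X lock conflicts with A's S lock
    UPDATE orders SET status = 'paid' WHERE id = 10;        -- also needs an X lock on the row

    -- Session A
    COMMIT;  -- locks are released only here (two-phase locking), so Session B can proceed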

MVCC

Multi-Version Concurrency Control.
Principle in one sentence: version chain + Read View.
In more detail:
Version chain: the same row of data can exist in multiple versions.
In an InnoDB table, every row record carries several hidden fields: row_id, the transaction ID (trx_id), and the rollback pointer (roll_pointer).

1. InnoDB uses a clustered index organized by the primary key. If the table has no primary key, the first non-null unique index is used instead; if there is no unique index either, the hidden row_id field serves as the clustered index key.
2. When a transaction starts, it requests a transaction ID from the system. These IDs are strictly increasing, and the ID of the transaction that most recently modified a row is written into that row's trx_id field.
3. Before a change, the undo log records the old version of the data, and the rollback pointer in the row record points to that old version, forming a version chain. That is why the undo log can be used both to implement rollback (guaranteeing atomicity) and to supply the versions needed by MVCC.

[Figure 3: how the version chain is formed]
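
The hidden columns themselves cannot be selected, but on a reasonably recent MySQL you can at least watch a read-write transaction get its ID — a small sketch reusing the throwaway table t from the atomicity section:

    START TRANSACTION;
    UPDATE t SET val = val + 1 WHERE id = 1;  -- the transaction is now read-write, so it has a real trx_id,
                                              -- and this row's hidden trx_id/roll_pointer now reference it

    -- in the same or another session: list currently running InnoDB transactions and their IDs
    SELECT trx_id, trx_state FROM information_schema.INNODB_TRX;

    ROLLBACK;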

Read View: under the READ COMMITTED isolation level, a new Read View is generated for every query. Under REPEATABLE READ, a Read View is generated only once, at the start of the transaction, and every subsequent query reuses it — which is how the two levels achieve their different isolation behavior.

What does a Read View (the consistent view) contain?
An array of active transaction IDs + up_limit_id (the low-water mark) + low_limit_id (the high-water mark). (The "up" and "low" here are not typos — that is genuinely how InnoDB names them.)
The array holds the IDs of the transactions that are active (started but not yet committed) at the moment the Read View is created; the low-water mark is the smallest ID among those active transactions, and the high-water mark is the next transaction ID to be allocated, i.e. the current maximum transaction ID + 1.

How are data visibility rules implemented?
The visibility of a given data version is determined by comparing the row's trx_id with the consistent view (the Read View).

The view array divides all possible values of a row's trx_id into several cases:
[Figure 4: data version visibility rules]

Reading principle:
When a transaction T accesses a row A, it first takes the trx_id stored in the row (the ID of the transaction that most recently modified it) and compares it with the Read View that was generated when T started:

1. If the row trx_id is below the low-water mark (to the "left" of the Read View), that transaction had already committed before the view was created, so this version is visible.
2. If the row trx_id is at or above the high-water mark (to the "right" of the Read View), this version was created by a transaction started after the view, so it is not visible.
3. If the row trx_id falls between the two water marks:
   a. if it is in the active-transaction array, this version was generated by a transaction that had not yet committed, so it is not visible;
   b. if it is not in the array, this version was generated by a transaction that had already committed, so it is visible.

If the version is not visible, InnoDB takes the roll_pointer, follows the version chain to the previous version, and compares that version's transaction ID against the view array again, repeating until a visible version is found.

With this mechanism, even though the row may have been modified in the meantime, transaction A sees the same result no matter when it queries the row — which is why this is called a consistent read.
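
To watch a consistent read happen, a two-session sketch using the throwaway table t again (the default isolation level REPEATABLE READ is assumed):

    -- Session A (REPEATABLE READ)
    START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- the Read View is created here
    SELECT val FROM t WHERE id = 1;              -- suppose this returns 100

    -- Session B (autocommit)
    UPDATE t SET val = 999 WHERE id = 1;         -- adds a newer version to the row's version chain

    -- Session A
    SELECT val FROM t WHERE id = 1;  -- still 100: B's version is not visible, so InnoDB follows
                                     -- the rollback pointer to the old version (consistent read)
    COMMIT;

    -- Under READ COMMITTED, the second SELECT in Session A would return 999,
    -- because a fresh Read View is generated for every statement.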

Consistency

Consistency is the ultimate goal a transaction pursues. The atomicity, durability, and isolation discussed above all exist, in the end, to keep the database in a consistent state.

Of course, these are guarantees at the database level; consistency also has to be upheld at the application level, that is, by your business logic. For example, if a purchase only deducts the user's balance and never reduces the inventory, there is no way to keep the overall state consistent.
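
As an illustration, the business-level fix is simply to put both effects of the purchase into one transaction and check the results (table and column names invented for the sketch):

    -- hypothetical tables: accounts(user_id, balance), inventory(product_id, stock)
    START TRANSACTION;

    UPDATE accounts  SET balance = balance - 50 WHERE user_id = 1   AND balance >= 50;
    UPDATE inventory SET stock   = stock - 1    WHERE product_id = 7 AND stock >= 1;
    -- the application must check the affected-row count of each UPDATE
    -- and issue ROLLBACK if either of them did not apply

    COMMIT;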

Follow my official account "Xiaolong coding" and let's discuss together — resume reviews, Q&A, and project analysis, all to help those of you who feel lost land the offer you want.

More big-tech interview questions and breakdowns will follow, along with direct referrals to hiring managers and an exchange group where everyone can discuss and improve together. Keep it up!

Remember to like, share, and forward.
