MySQL transactions (MVCC implementation principle), locks, and SQL optimization

1. Transactions

  A database transaction is a sequence of database operations that access and manipulate data; it comprises the entire execution process from the start of the transaction to its end. Transactions are used to maintain the integrity of the database and to ensure that a batch of SQL statements is either executed in full or not executed at all. In MySQL, only databases or tables that use the InnoDB storage engine support transactions.

The properties of a transaction:

1. Atomicity: Either all operations in a transaction are executed, or none of them are. If an error occurs at any step during execution, the transaction is rolled back to the state before it began, as if it had never happened.

2. Durability:  Once a transaction is committed, its modifications to the data are permanent; they survive even a system failure.

3. Isolation:   The database allows multiple transactions to read, write, and modify the same data concurrently, which can interleave their execution and produce inconsistent results within a transaction. Isolation prevents this. MySQL provides four transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable.

4. Consistency:   The database must remain in a valid state before and after the transaction; all data written and read must conform to the predefined rules. The previous three properties exist to guarantee consistency.

How transactions are implemented

MySQL has many log files, such as the binary log, error log, and query log. The InnoDB engine provides two logs specifically for implementing transactions: the redo log and the undo log (rollback log). The redo log is used to guarantee the durability of transactions, while the undo log is used to guarantee their atomicity and isolation.

 1. Implementing atomicity

      The key to atomicity is being able to undo every SQL statement that has already executed when a transaction is rolled back. InnoDB implements rollback with the undo log: whenever a transaction modifies the database, InnoDB generates a corresponding undo log record; if the transaction fails or ROLLBACK is called, the information in the undo log is used to restore the data to its state before the modification. The undo log is a logical log that records information about the executed SQL. During a rollback, InnoDB performs the opposite of the original work according to the undo log's contents: for each insert, a delete is executed; for each delete, an insert is executed; for each update, a reverse update restores the original values.
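
A minimal sketch of atomicity in action, assuming a hypothetical table t(id INT PRIMARY KEY, num INT) that already contains the row (1, 10):

START TRANSACTION;
UPDATE t SET num = 99 WHERE id = 1;   -- InnoDB records the old value (10) in the undo log
SELECT num FROM t WHERE id = 1;       -- returns 99 inside this transaction
ROLLBACK;                             -- the undo log is applied in reverse and the row is restored
SELECT num FROM t WHERE id = 1;       -- returns 10 again
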
2. Implementing durability
   
     The redo log is an important mechanism for guaranteeing transaction durability: it ensures that committed transactions are persisted to disk even if the MySQL server crashes or goes down unexpectedly. InnoDB manages storage space in units of pages, and every insert, delete, or update ultimately operates on a complete page: the page is loaded into the buffer pool and the affected records are modified there. The modified page is not flushed to disk immediately, because flushing a whole data page when only one record changed would be wasteful. But if the data stays only in memory, a system crash at that moment would lose the change. Weighing the two, InnoDB introduces the redo log: after a modification it does not flush the page immediately but writes a log entry describing which page changed, at which offset, and what the new data is. Even if the system crashes, the data can be recovered from the redo log after restart. In addition, the redo log is written cyclically into a fixed set of files and is written to disk sequentially.
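
As a rough way to inspect the durability machinery, the redo-log-related server variables can be listed; the variable names below assume InnoDB in MySQL 5.7/8.0:

SHOW VARIABLES LIKE 'innodb_log_file%';                -- redo log file size and count
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- 1 (the default) flushes the redo log to disk at every commit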

Transaction isolation levels

   MySQL is client/server software, so many clients connect to one server and the server processes many transactions at the same time, which means the same data may be accessed by different transactions. In theory, while one transaction executes, the others could simply wait in line and access the data only after it commits, but that has a large impact on performance. Isolation levels were introduced to balance correctness against concurrency.

Check the isolation level
SELECT @@global.transaction_isolation,@@transaction_isolation;
MySQL provides four isolation levels. In actual development, choose the level that fits the requirements and the scenario; every level except serializable allows certain anomalies. A sketch of checking and setting the level per session follows the list below.
1. Read uncommitted (read uncommitted):  A transaction can read another transaction's uncommitted data, which can cause dirty reads (the other transaction may still be rolled back), phantom reads, and non-repeatable reads.
2. Read committed:  A transaction only reads data that other transactions have committed. This solves dirty reads, but non-repeatable reads and phantom reads remain.
A non-repeatable read means that two identical queries within one transaction return different results.
3. Repeatable read (repeatable read, MySQL's default isolation level):  Reading the same data multiple times within one transaction returns the same result. This solves dirty reads and non-repeatable reads, but phantom reads are still possible.
4. Serializable:  Transactions execute serially, which avoids all of the above problems; it is the safest level but also the least efficient.
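
A minimal sketch of checking and changing the isolation level for the current session (the variable name assumes MySQL 5.7.20+ / 8.0, where it is called transaction_isolation):

SELECT @@global.transaction_isolation, @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- affects only this session
SELECT @@transaction_isolation;                          -- now READ-COMMITTED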

 Transaction isolation level implementation principle (MVCC)

MVCC, or multi-version concurrency control, works together with the undo log and the version chain so that reads and writes from different transactions can execute concurrently, improving system performance.

MVCC lets the database serve read operations without locking, which improves its concurrent processing capability. With the help of MVCC, the read committed and repeatable read isolation levels can be implemented.

 InnoDB's MVCC is implemented by adding two hidden columns to each row: one stores the id of the transaction that last modified the row, the other stores a rollback pointer.

trx_id: Every time a record is changed, the id of the modifying transaction is written into this hidden column.
roll_pt: Every time a record is modified, the old version is written into the undo log; this hidden column acts as a pointer through which the record's pre-modification data can be found.

Every time a record is updated, its old value is written into the undo log as an old version. As updates accumulate, all versions are linked together through the roll_pt attribute into a linked list called the version chain; the head of the chain is the latest value of the record. Each version also carries the id of the transaction that produced it, which is crucial information for visibility checks.

What is ReadView

A ReadView (read snapshot) is used to determine which versions in the version chain are visible to the current transaction.

A ReadView contains:

  • m_ids. The ids of the read-write transactions that are active (i.e. not yet committed) in the system when the ReadView is generated.
  • min_trx_id. The smallest id among the active read-write transactions when the ReadView is generated, i.e. the minimum value in m_ids.
  • max_trx_id. The id the system would assign to the next transaction at the moment the ReadView is generated.
  • creator_trx_id. The id of the transaction that created this ReadView.

How does a ReadView determine which version of a record is visible to the current transaction?

1. If the trx_id of the accessed version equals creator_trx_id in the current ReadView, the current transaction is accessing a record it modified itself, so the version is visible.

2. If trx_id < min_trx_id in the ReadView, the transaction that produced this version had already committed before the ReadView was generated, so the version is visible to the current transaction.

3. If trx_id > max_trx_id in the ReadView, the transaction that produced this version started after the ReadView was generated, so the version is not visible to the current transaction.

4. If trx_id is between min_trx_id and max_trx_id, check whether trx_id is in m_ids; there are two cases:

   (1) If trx_id is in m_ids, the transaction that produced this version was still active when the ReadView was created, so the version is not visible.

   (2) If trx_id is not in m_ids, that transaction had already committed when the ReadView was created, so the version is visible.
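
The four rules can be illustrated with a toy query (only an illustration with hard-coded values, not InnoDB internals); assume a ReadView with creator_trx_id = 100, min_trx_id = 90, max_trx_id = 110 and m_ids = (90, 95, 100):

SELECT v.trx_id,
       CASE
         WHEN v.trx_id = 100            THEN 'visible: the current transaction''s own change'
         WHEN v.trx_id < 90             THEN 'visible: committed before the ReadView'
         WHEN v.trx_id >= 110           THEN 'invisible: started after the ReadView'
         WHEN v.trx_id IN (90, 95, 100) THEN 'invisible: still active when the ReadView was created'
         ELSE                                'visible: committed before the ReadView'
       END AS visibility
FROM (SELECT 80 AS trx_id UNION ALL SELECT 92
      UNION ALL SELECT 95 UNION ALL SELECT 120) AS v;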

The timing of ReadView generation

The biggest difference between read committed (READ COMMITTED) and repeatable read (REPEATABLE READ) lies in when they generate their ReadView.

1. Read committed: within a transaction, a new ReadView is generated before every read.

2. Repeatable read: within a transaction, a ReadView is generated only on the first read, and every subsequent read reads data through that same ReadView.
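
A minimal two-session sketch of the difference, assuming a hypothetical table t(id INT PRIMARY KEY, num INT) containing the row (1, 10):

-- Session A (REPEATABLE READ):
START TRANSACTION;
SELECT num FROM t WHERE id = 1;   -- first read: the ReadView is created here, returns 10
-- Session B:
UPDATE t SET num = 20 WHERE id = 1;
COMMIT;
-- Session A:
SELECT num FROM t WHERE id = 1;   -- still returns 10, because the same ReadView is reused
COMMIT;
-- Under READ COMMITTED, session A's second read would build a new ReadView and return 20.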

MVCC summary

   MVCC controls how concurrent transactions access the same record through the version chain (built from the undo log) and the ReadView. MySQL decides whether a version is visible to the current transaction by comparing the transaction id stored in that version with the ids recorded in the ReadView:

       the version's transaction id is checked against creator_trx_id, min_trx_id, max_trx_id and the m_ids list, as described above.

Generating a ReadView before every read versus only on the first read corresponds to read committed (READ COMMITTED) and repeatable read (REPEATABLE READ) respectively.

MySQL locks

Locks in MySQL are divided into table locks, row-level locks, and gap locks.

Table lock:   A table lock is the coarsest-grained lock in MySQL; it locks the entire table being operated on. It suits large batch operations such as table rebuilds and full-table backups, and is taken explicitly with the LOCK TABLES and UNLOCK TABLES statements.

Because a table lock covers the whole table, concurrency is poor. On the other hand, locking itself consumes resources (acquiring locks, checking locks, releasing locks, and so on), so when a large amount of data must be locked, choosing a table lock can save considerable resources. Different storage engines in MySQL use different locks: MyISAM supports table locks, while InnoDB supports both table locks and row locks.

Row lock:  A row lock is the finest-grained lock in MySQL. It locks only the rows being operated on, so other transactions can still access the data in other rows; it suits high-concurrency scenarios and is taken with the SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE statements. The cost of row locking is higher and deadlocks can occur, but the probability of lock conflicts is the lowest.

Row-level locks are also divided into shared locks and exclusive locks.

1. Shared lock (Shared Lock):  Also called a read lock, or S lock for short. Multiple transactions can hold shared locks on the same row at the same time and execute concurrently: a read lock does not block another read lock. However, while a transaction holds a shared lock on a row, other transactions cannot obtain an exclusive lock on that row and must wait for the shared lock to be released.

2. Exclusive lock (Exclusive Lock):   Also called a write lock, or X lock for short. Only one transaction can hold it on a row at a time; that transaction can both read and modify the row, and no other transaction can acquire a shared or exclusive lock on the row until the exclusive lock is released.
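
A minimal two-session sketch, assuming a hypothetical table account(id INT PRIMARY KEY, balance INT):

-- Session A takes a shared (read) lock:
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- written FOR SHARE in MySQL 8.0
-- Session B may take another shared lock on the same row,
-- but an exclusive lock request blocks until session A ends:
SELECT balance FROM account WHERE id = 1 FOR UPDATE;          -- waits
-- Session A:
COMMIT;                                                       -- session B's FOR UPDATE can now proceed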

Gap lock:  A gap lock locks an interval. InnoDB introduced gap locks to solve the phantom read problem, which also helps meet the requirements of the serializable isolation level.

              Phantom read:  A phantom read occurs when a transaction queries the same range more than once and a later query returns rows that an earlier query did not see.

For example, if the user table contains only 101 records whose userid values are 1, 2, ..., 100, 101, then the SQL select * from user where userid > 100 for update; is a range-condition retrieval. InnoDB locks not only the matching record with userid 101 but also the "gap" above 101 (even though no records exist there), preventing other transactions from inserting new rows at the end of the table.
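
A minimal two-session sketch of this gap lock under the default REPEATABLE READ level, assuming the user table from the example:

-- Session A:
START TRANSACTION;
SELECT * FROM user WHERE userid > 100 FOR UPDATE;  -- locks record 101 and the gap above it
-- Session B (blocks until session A commits or rolls back):
INSERT INTO user (userid) VALUES (102);
-- Session A:
COMMIT;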

Lock conflicts

        When multiple users access the database concurrently and request modifications to the same data at the same time, lock conflicts occur. A lock conflict means that a transaction trying to access a resource that is already locked must wait for the lock to be released; this waiting reduces database performance.

The locks involved in conflicts fall into two kinds. A shared lock (Shared Lock), also known as a read lock, is a sharing mechanism: multiple transactions can hold it at the same time, and it does not prevent other transactions from obtaining shared locks; it is used to keep concurrent reads consistent. An exclusive lock, also known as a write lock, is a mutual-exclusion mechanism: once a transaction acquires it, no other transaction can obtain a shared or exclusive lock on the same data; it is used to guarantee the atomicity of the transaction's operations.

SQL optimization

1. Try not to use select * when querying; list the specific fields instead (see the sketch after this item).

     1. It saves resources and reduces network and I/O overhead: the data is read from disk, and every field that is not actually needed adds network and I/O cost.

     2. It can also affect data security. If the table contains accounts, passwords, and so on, select * may leak user information, including private columns added later.

     3. Covering indexes will not be used.
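
A minimal sketch, assuming a hypothetical table user(id INT PRIMARY KEY, name VARCHAR(50), password VARCHAR(64)) with a secondary index idx_name(name):

SELECT * FROM user WHERE name = 'tom';         -- returns every column, including password, and must go back to the table
SELECT id, name FROM user WHERE name = 'tom';  -- only the needed columns; can be served entirely from idx_name (covering index)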

2. Avoid using or to connect conditions in the where clause.

   Using or may cause the engine to give up the index and perform a full table scan:

select id from t where num=10 or num=20 

The correct way to use it is as follows:

select id from t where num=10 
union all 
select id from t where num=20

3. A fuzzy query with a leading wildcard also leads to a full table scan (a prefix-match alternative is sketched after the example):

select id from t where name like '%abc%'
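
When the business allows prefix matching, dropping the leading wildcard lets an index on name be used (assuming such an index exists):

select id from t where name like 'abc%'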

4. Try to use numeric types instead of string types

Primary key (id): prefer a numeric type such as int for the primary key.
Gender (sex): 0 for female, 1 for male; the database has no Boolean type, and MySQL recommends tinyint.
When processing queries and joins, the engine compares strings character by character,
whereas a numeric value needs only one comparison;
strings therefore degrade query and join performance and increase storage overhead.
5. Use varchar instead of char
varchar is a variable-length type and stores data according to its actual length, which saves storage space;
char is stored at the declared length, and shorter values are padded with spaces.
For queries, searching within a relatively small field is more efficient.
6. Optimize queries: try to avoid full table scans; first consider building indexes on the columns used in where and order by.
7. Try to avoid index failure
      1. Avoid testing for null values in the where clause, otherwise the engine gives up the index and scans the whole table.

   e.g.: select id from t where num is null

You can set a default value of 0 on num to ensure that there is no null value in the num column in the table,

Then query like this: select id from t where num=0
      2. Use in and not in with caution, as they may cause a full table scan, e.g.: select id from t where num in(1,2,3). For continuous values, use between instead of in: select id from t where num between 1 and 3
      3. Try to avoid performing function operations on fields in the where clause, which causes the engine to give up the index and perform a full table scan, as in the sketch below.
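
A minimal sketch, assuming a hypothetical table orders(id INT PRIMARY KEY, create_time DATETIME) with an index idx_ct(create_time):

SELECT id FROM orders WHERE DATE(create_time) = '2024-01-01';  -- function on the column: the index is not used
SELECT id FROM orders
WHERE create_time >= '2024-01-01 00:00:00'
  AND create_time <  '2024-01-02 00:00:00';                    -- range on the bare column: idx_ct can be used
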
8. Among inner join, left join, and right join, prefer inner join

        If the results of the three joins are the same, prefer the inner join.
Inner join: keeps only the rows that match in both tables;
left join: returns all rows from the left table, even if there are no matching records in the right table;
right join: returns all rows from the right table, even if there are no matching records in the left table;
9. Improve the efficiency of the group by statement
   Counter example: group first, then filter
   Positive example: filter first, then group
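
A minimal sketch of the same idea, assuming a hypothetical table orders(user_id INT, amount INT):

-- Counter example: group everything, then discard groups in having
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING user_id > 100;
-- Positive example: filter rows first in where, then group the smaller set
SELECT user_id, COUNT(*) FROM orders WHERE user_id > 100 GROUP BY user_id;
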
10. When clearing an entire table, prefer truncate
truncate table is faster than delete and uses fewer system and transaction log resources.
The delete statement removes one row at a time and writes an entry in the transaction log for each deleted row; truncate table removes the data by deallocating the data pages used to store it.
11. Do not use too many table joins or too many indexes; keep them within about five
The more tables joined, the greater the compilation time and overhead;
each join builds a temporary table in memory;
a large join is better split into several smaller statements, which also improves readability.
12. Avoid using built-in functions on indexed columns
Using a built-in function on an indexed column invalidates the index.

SQL execution plan (EXPLAIN)

Explain:  Use explain to simulate how the optimizer executes a SQL query, so you can see how MySQL processes your SQL statement and analyze the performance bottlenecks of the query or the table structure.

The role of Explain:  

       The order in which tables are read, the type of data access operation, which indexes could be used, which indexes are actually used, the references between tables, and how many rows of each table the optimizer examines.

Add the explain keyword before a select statement; executing it returns the execution plan information instead of running the SQL.
EXPLAIN SELECT * FROM USER WHERE id = 1

The information from explain has 12 columns, which are:
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
Summary description
id: selection identifier
select_type: Indicates the type of query.
table: The table that outputs the result set
partitions: matching partitions
type: Indicates the connection type of the table
possible_keys: Indicates the index that may be used when querying
key: Indicates the index actually used
key_len: the length of the index field
ref: which columns or constants are compared with the index to select rows
rows: the number of rows scanned (the estimated number of rows)
filtered: Percentage of rows filtered by table criteria
Extra: additional details about the execution
1.id

   SELECT identifier. This is the query sequence number for the SELECT

    If the ids are the same, they can be treated as a group and are executed in order from top to bottom; across all groups, the larger the id value, the higher the priority and the earlier it is executed.

EXPLAIN SELECT * FROM employee e,dept d WHERE e.deptId = d.id

 

 EXPLAIN SELECT * FROM employee e WHERE e.deptId = (SELECT id FROM dept d WHERE d.id = 1)

2. select_type

Indicates the type of each select clause in the query
1. SIMPLE (simple SELECT, without using UNION or subqueries, etc.)
2. PRIMARY (the outermost query; if the query contains any complex subparts, the outermost select is marked as PRIMARY)
3.SUBQUERY (the first SELECT in the subquery, the result does not depend on the outer query)
4. DERIVED (SELECT of derived table, subquery of FROM clause)
5. UNION (the second or subsequent SELECT statement in a UNION)
6. UNION RESULT (the result of a UNION, covering all selects from the second one onward in the union statement)

3.type

For the table access method, it means the way MySQL finds the required rows in the table, also known as "access type".
Commonly used types are: system > const > eq_ref > ref > range > index > ALL (from left to right, performance from good to bad).
4. system: The table has only one row (a system table); this rarely appears in practice and can be ignored.
5. const: The row is found with a single index lookup; const is used when a primary key or unique index is compared with a constant.
6. eq_ref: Unique index scan; for each index key, only one row in the table matches. Common for primary key or unique index scans.
7. ref: Non-unique index scan, returning all rows that match a single value. It is essentially an index access that returns rows matching a single value, but it may find several qualifying rows, so it is a mix of lookup and scan.
8. range: Retrieves only rows in a given range, using an index to select them. The key column shows which index is used. This generally appears when the where clause contains between, <, >, in, and so on. A range index scan is better than a full table scan because it only needs to start at one point in the index and stop at another, without scanning the entire index.
9. index: Full Index Scan. The difference between index and ALL is that the index type traverses only the index tree. This is usually faster than ALL because index files are usually smaller than data files; in other words, although both ALL and index read the whole table, index reads from the index while ALL reads from the data on disk.
10. ALL: Full Table Scan; the whole table is traversed to find matching rows. In general, make sure a query reaches at least the range level, and preferably ref.
11. possible_keys: Shows the indexes that might apply to this table, one or more. If an index exists on a field involved in the query, it is listed here, but it is not necessarily the index actually used.
key: The index actually used. If NULL, no index was used or the index was invalidated.
12. key_len
Indicates the number of bytes used in the index; it can be used to compute the length of the index used in the query. Shorter is better, as long as precision is not lost.
13. ref    shows which column of the index is used or, if possible, a constant; i.e. which columns or constants are used to look up values on the indexed column
EXPLAIN SELECT * FROM employee e,dept d,admin a WHERE e.deptId = d.id AND e.adminId=a.id
AND e.age=20
14.rows
According to table statistics and index selection, roughly estimate the number of rows that need to be read to find the required records.
15.Extra
Additional information notes
Using filesort: When the query contains an ORDER BY that cannot be completed with an index, MySQL must sort the result itself; a sort that cannot use an index is called a "filesort".
Using temporary: A temporary table is used to hold intermediate results; MySQL uses temporary tables when sorting or grouping query results, commonly with order by and group by.
Using index
Indicates that the corresponding select operation uses a covering index, avoiding access to the table's data rows, which is efficient.
If Using where appears at the same time, the index is being used to look up key values; if Using where is absent, the index is being used only to read data rather than to perform a lookup.
Using where
Indicates that the where condition is used to filter rows, in addition to whatever access method is used.
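
A minimal sketch of how these values typically appear, assuming a hypothetical table employee(id INT PRIMARY KEY, name VARCHAR(50), age INT) with an index idx_age(age):

EXPLAIN SELECT name FROM employee ORDER BY name;  -- no index on name: Extra typically shows Using filesort
EXPLAIN SELECT age FROM employee WHERE age > 20;  -- covered by idx_age: Extra typically shows Using where; Using index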

Origin blog.csdn.net/weixin_71243923/article/details/129811434