MySQL InnoDB Storage Engine: Locks and Transactions

 

 

Comparison of lock and latch

A latch is often called a lightweight lock because the time for which it must be held is very short; if a latch is held for a long time, application performance suffers badly. In the InnoDB storage engine, latches are divided into mutexes (mutual exclusion locks) and rwlocks (read-write locks). Their purpose is to guarantee the correctness of concurrent threads operating on critical resources, and there is no deadlock detection mechanism for them.

 

The object of a lock is a transaction; locks are used to lock objects in the database, such as tables, pages, and rows. Generally, a lock is released only after the transaction commits or rolls back (the release time may differ between transaction isolation levels), and, as in most databases, there is a deadlock mechanism for locks. The table shows the differences between lock and latch.


 

mysql> SHOW ENGINE INNODB MUTEX;
+--------+-------------------+-------------+
| Type   | Name              | Status      |
+--------+-------------------+-------------+
| InnoDB | dict0dict.cc:1057 | os_waits=2  |
| InnoDB | log0log.cc:844    | os_waits=1  |
| InnoDB | fil0fil.cc:1690   | os_waits=1  |
| InnoDB | dict0dict.cc:1066 | os_waits=3  |
| InnoDB | log0log.cc:907    | os_waits=11 |
+--------+-------------------+-------------+

Under a DEBUG build, SHOW ENGINE INNODB MUTEX shows more information about the latches.


Description of the parameters in the Status field under a debug build


 

If the locked objects are viewed as a tree, then to lock an object at the lowest level, that is, to take the finest-grained lock, the coarser-grained objects above it must be locked first. As shown in the figure above, if an X lock is to be placed on record r on a page, intention locks (IX) must first be taken on the database, the table, and the page, and only then the X lock on record r itself. If any of these steps has to wait, the operation must wait for the coarser-grained lock to complete.
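
This hierarchy can be observed directly. Below is a minimal sketch, assuming MySQL 8.0 (where lock information is exposed in performance_schema.data_locks) and a hypothetical table t with a primary key column a:

-- Session A: request an exclusive row lock
BEGIN;
SELECT * FROM t WHERE a = 1 FOR UPDATE;

-- Session B: the table-level IX intention lock and the record-level X lock show up together
SELECT object_name, index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;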


 

 

 

 

 

InnoDB lock-related tables

The INNODB_LOCKS table

a) lock_id: the ID of the lock; for a record lock it also encodes the space ID, page number, and row number of the locked record.

b) lock_trx_id: The transaction id of the lock.

c) lock_mode: the mode of the lock.

d) lock_type: type of lock, table lock or row lock

e) lock_table: the table to be locked.

f) lock_index: The index of the lock.

g) lock_space: the id number of the innodb storage engine tablespace

h) lock_page: the page number of the locked page, or NULL if it is a table lock.

i) lock_rec: the heap number of the locked record within the page, or NULL if it is a table lock.

j) lock_data: The primary key value of the locked row, or null if the table is locked.

 

The INNODB_LOCK_WAITS table

1) requesting_trx_id: The transaction id that applies for the lock resource.

2) requested_lock_id: The id of the requested lock.

3) blocking_trx_id: The blocking transaction id.

4) blocking_lock_id: The id of the blocking lock.

 

The INNODB_TRX table

trx_id: the transaction ID
trx_state: the state of the transaction
trx_started: the time the transaction started
trx_requested_lock_id: the ID of the lock the transaction is waiting for (corresponds to innodb_locks.lock_id), or NULL if it is not waiting
trx_wait_started: the time the transaction started waiting
trx_weight: the weight of the transaction, roughly reflecting the number of rows changed and rows locked; when a deadlock must be resolved, InnoDB rolls back the transaction with the smallest weight
trx_mysql_thread_id: the MySQL thread ID of the transaction
trx_query: the SQL statement the transaction is executing
trx_operation_state: the transaction's current operation state
trx_tables_in_use: the number of tables used by the transaction's current statement
trx_tables_locked: the number of tables on which the transaction holds row locks
trx_lock_structs: the number of lock structures held by the transaction
trx_lock_memory_bytes: the memory, in bytes, occupied by the transaction's lock structures
trx_rows_locked: the number of rows locked by the transaction
trx_rows_modified: the number of rows modified by the transaction
trx_concurrency_tickets: the remaining concurrency tickets of the transaction, i.e. how much work it may still do before being swapped out (see innodb_concurrency_tickets)
trx_isolation_level: the isolation level of the transaction
trx_unique_checks: whether unique checks are enabled for the transaction
trx_foreign_key_checks: whether foreign key checks are enabled for the transaction
trx_last_foreign_key_error: the last foreign key error of the transaction
trx_adaptive_hash_latched: whether the transaction currently holds the adaptive hash index latch
trx_adaptive_hash_timeout: whether the adaptive hash index search latch is released immediately or kept across calls

An example query that joins INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS is shown below.
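
The following is a minimal sketch of such a join, assuming MySQL 5.5–5.7 where these tables live under information_schema (in MySQL 8.0 they were replaced by performance_schema.data_locks and data_lock_waits). It lists each waiting transaction together with the transaction that blocks it:

SELECT r.trx_id              AS waiting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_id              AS blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
  FROM information_schema.innodb_lock_waits w
  JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
  JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;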

 

 

 

 

 

Consistent nonlocking read means that the InnoDB storage engine uses multi-versioning to read the data of a row as of the current point in time. If the row being read is locked by a DELETE or UPDATE operation, the read does not wait for the row lock to be released; instead, InnoDB reads a snapshot of the row.
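
A minimal two-session sketch under REPEATABLE READ, assuming a hypothetical table t with an integer primary key a containing a single row (1):

-- Session A
BEGIN;
SELECT * FROM t WHERE a = 1;       -- reads the row

-- Session B
BEGIN;
UPDATE t SET a = 3 WHERE a = 1;    -- takes an X lock on the row, not yet committed

-- Session A again
SELECT * FROM t WHERE a = 1;       -- does not wait for the X lock; a snapshot (a = 1) is returned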


 

 

SQL Syntax for Consistent Locked Reads

-- exclusive lock
SELECT ......  FOR UPDATE

-- shared lock
SELECT ...... LOCK IN SHARE MODE

 

 

 

Auto-increment and locks

-- Internally implemented in the following way
SELECT max(auto_inc_col) FROM t FOR UPDATE;

Starting with MySQL 5.1.22, InnoDB provides a lightweight, mutex-based auto-increment mechanism, which greatly improves the performance of auto-increment inserts.

InnoDB provides the parameter innodb_autoinc_lock_mode to control the auto-increment locking mode; its default value is 1. Before discussing the new implementation further, it is necessary to classify the kinds of auto-increment inserts.


 

The parameter innodb_autoinc_lock_mode and its effect on auto-increment locking: there are 3 valid values, 0, 1, and 2.
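
The current mode can be checked as sketched below; note that the parameter is static, so changing it requires setting it in the configuration file and restarting the server:

-- 0 = traditional, 1 = consecutive (the default described above), 2 = interleaved
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';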


 

 

 

 

Types of MySQL locks

InnoDB has three algorithms for row locking:

1, Record Lock: A lock on a single row record.

2, Gap Lock: a gap lock locks a range but not the record itself. Its purpose is to prevent phantom reads between two current reads in the same transaction.

3, Next-Key Lock: 1+2, it locks a range and the record itself. This is the method used for row range queries, and its main purpose is to solve the phantom read problem.

InnoDB uses Record Locks for equality queries on unique indexes and primary key indexes; for ordinary secondary indexes and composite indexes it uses the next-key lock algorithm, because a unique index pins down a single definite value and there is no range that needs to be locked.

The next-key lock also solves the phantom read problem.
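
A sketch of the classic locking example, using a hypothetical table z with a unique primary key a and an ordinary secondary index on b:

-- Hypothetical table: primary key a (unique), secondary index on b
CREATE TABLE z (a INT, b INT, PRIMARY KEY (a), KEY (b)) ENGINE = InnoDB;
INSERT INTO z VALUES (1, 1), (3, 1), (5, 3), (7, 6), (10, 8);

-- Query through the unique primary key: next-key locking degrades to a record lock on a = 5
SELECT * FROM z WHERE a = 5 FOR UPDATE;

-- Query through the secondary index: a next-key lock on the range (1, 3] of index b,
-- a gap lock on (3, 6), plus a record lock on the clustered index entry a = 5
SELECT * FROM z WHERE b = 3 FOR UPDATE;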

Deadlock detection in MySQL is handled by a timeout mechanism and by the wait-for graph algorithm.

 

 

 

 

Transaction isolation levels

=================================================================================
 Isolation level      Dirty Read      NonRepeatable Read      Phantom Read
=================================================================================
 Read uncommitted     Possible        Possible                Possible
 Read committed       Impossible      Possible                Possible
 Repeatable read      Impossible      Impossible              Possible
 Serializable         Impossible      Impossible              Impossible
=================================================================================

·Read Uncommitted: dirty reads are allowed, that is, a transaction may read data modified by uncommitted transactions in other sessions.

·Read Committed: only committed data can be read. Most databases, such as Oracle, default to this level (non-repeatable reads are possible).

·Repeatable Read: queries within the same transaction are consistent with the state at the start of the transaction; this is InnoDB's default level. In the SQL standard this isolation level eliminates non-repeatable reads, but phantom reads can still occur.

·Serializable: fully serialized reads; every read requires a table-level shared lock, so reads and writes block each other.
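
The isolation level can be inspected and changed per transaction or per session; a small sketch (the system variable is named tx_isolation before MySQL 5.7.20 and transaction_isolation afterwards):

SELECT @@transaction_isolation;                          -- or @@tx_isolation on older versions
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;          -- affects only the next transaction
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- affects the whole session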

 

 

 

Blocking related parameters

-- Controls how long (in seconds) to wait for a row lock before timing out; dynamic parameter
innodb_lock_wait_timeout

-- Controls whether the whole transaction is rolled back when a lock wait times out
-- (default OFF: only the statement that timed out is rolled back); static parameter
innodb_rollback_on_timeout
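
A quick sketch of how these are used (innodb_lock_wait_timeout defaults to 50 seconds and can be changed per session; innodb_rollback_on_timeout can only be set at server startup):

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
SET @@session.innodb_lock_wait_timeout = 10;      -- dynamic: shorten the wait for this session
SHOW VARIABLES LIKE 'innodb_rollback_on_timeout';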

 

 

Lock escalation

1. Escalation is triggered when the number of locks held by a single SQL statement on one object exceeds a threshold, 5,000 by default; locks on different objects are not added together and do not trigger escalation.

2. Escalation also occurs when the memory occupied by lock resources exceeds 40% of the activated memory.

InnoDB manages locks with a bitmap for each page accessed by each transaction, so there is no lock escalation problem in InnoDB; the two conditions above describe Microsoft SQL Server.

 

 

 

 

 

The ACID properties of transactions

ACID stands for atomicity, consistency, isolation, and durability. A good transaction-processing system must provide these four properties:

Atomicity

  A transaction must be regarded as an indivisible minimum unit of work. Either all the operations in the transaction are committed successfully, or they all fail and are rolled back; a transaction can never execute only part of its operations. This is atomicity.

Consistency

  The database always moves from one consistent state to another consistent state. (In the earlier bank-transfer example, consistency ensures that even if the system crashes between the third and fourth statements, the checking account does not lose $200, because the transaction was never committed, so none of the changes made in the transaction are saved to the database.)

Isolation

  In general, the changes made by one transaction are not visible to other transactions until it finally commits. (In the earlier example, when the third statement has executed but the fourth has not yet started, another program summarizing the accounts still sees that the checking account balance has not been reduced by $200.)

Durability

  Once a transaction commits, its modifications are saved to the database permanently. (Even if the system crashes at that point, the modified data is not lost. Durability is a somewhat fuzzy notion, because there are many different levels of durability: some durability strategies provide very strong safety guarantees while others do not, and no strategy can guarantee durability 100%.)

 

 

 

Classification of transactions

Flat Transactions

Flat Transactions with Savepoints

Chained Transactions

Nested Transactions

Distributed Transactions

 

A flat transaction is the simplest type of transaction, yet in a real production environment it is probably the most frequently used. In a flat transaction all operations are at the same level: it starts with BEGIN WORK and ends with COMMIT WORK or ROLLBACK WORK, and the operations in between are either all executed or all rolled back. Flat transactions are therefore the basic building block that applications use to obtain atomic operations.

 

Flat transactions with savepoints. A plain flat transaction implicitly sets only one savepoint, the start of the transaction, so a rollback can only return to the state at the beginning of the transaction. A savepoint is established with the SAVE WORK function, which tells the system to record the current processing state. When a problem occurs, the savepoint can be used as an internal restart point, and the application logic decides whether to go back to the most recent savepoint or to an earlier one.


 

A chained transaction can be regarded as a variant of the savepoint model. In flat transactions with savepoints, the savepoints are volatile: when a system crash occurs, all savepoints disappear, and during recovery the transaction has to be re-executed from the beginning rather than from the most recent savepoint.

The idea behind chained transactions is that, when one transaction commits, the data objects that are no longer needed are released and the necessary processing context is implicitly passed to the next transaction to be started. Committing the current transaction and starting the next one are merged into a single atomic operation, which means the next transaction sees the results of the previous one as if both had run inside a single transaction. The figure shows how chained transactions work.

 

A nested transaction is a hierarchical framework in which a top-level transaction controls the transactions at the various levels below it. The transactions nested under the top-level transaction are called subtransactions, and each of them controls one local part of the work.

Using savepoints offers more flexibility than nested transactions.

However, simulating nested transactions with savepoints still differs from real nested transactions with respect to lock holding: when savepoints are used to emulate nested transactions, the user cannot choose which locks are inherited by a subtransaction and which are kept by the parent transaction.


 

 

 

Transaction control statements

start transaction    explicitly starts a transaction
begin                explicitly starts a transaction
commit (commit work)
    COMMIT WORK controls the behavior after the transaction ends, either CHAIN or RELEASE; this is governed by the parameter completion_type, whose default is 0
rollback (rollback work)
    ROLLBACK and ROLLBACK WORK work in the same way as COMMIT and COMMIT WORK
savepoint [identifier]            creates a savepoint inside the transaction; a transaction may have multiple savepoints
release savepoint [identifier]    deletes a savepoint from the transaction; running this command when the named savepoint does not exist raises an error

A combined usage example is shown below.
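
A small sketch putting the statements above together, assuming a hypothetical table t:

mysql> BEGIN;
mysql> INSERT INTO t VALUES (1);
mysql> SAVEPOINT sp1;
mysql> INSERT INTO t VALUES (2);
mysql> ROLLBACK TO SAVEPOINT sp1;   -- undoes only the work done after sp1
mysql> RELEASE SAVEPOINT sp1;
mysql> COMMIT;                      -- only the row (1) is made durable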

 

 

 

Transaction statistics

Because the InnoDB storage engine supports transactions, for InnoDB applications it is often more meaningful to look at transactions per second (TPS) in addition to questions per second (QPS).

TPS can be computed as (com_commit + com_rollback) / time. The precondition for this formula is that all transactions are committed explicitly: implicit commits and rollbacks (the default autocommit=1) are not counted in the com_commit and com_rollback counters.

show global status like 'com_commit';
show global status like 'com_rollback';
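
A minimal sketch of measuring TPS over a 60-second window, assuming MySQL 5.7 or later where the same counters are also visible in performance_schema.global_status:

SELECT VARIABLE_VALUE INTO @c1 FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_commit';
SELECT VARIABLE_VALUE INTO @r1 FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_rollback';
SELECT SLEEP(60);
SELECT VARIABLE_VALUE INTO @c2 FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_commit';
SELECT VARIABLE_VALUE INTO @r2 FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_rollback';
SELECT ((@c2 + @r2) - (@c1 + @r1)) / 60 AS tps;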

 

 

 

Distributed transactions

XA transactions are based on the two-phase commit protocol. A transaction coordinator makes sure that all participants have completed their prepare work (phase one). Once the coordinator has received the "prepared" message from every participant, it notifies all of them that the transaction can be committed (phase two). In XA transactions MySQL plays the role of a participant, not of the coordinator (the transaction manager).

MySQL XA transactions are divided into internal XA and external XA. External XA can take part in an external distributed transaction and requires the application layer to act as the coordinator. Internal XA transactions are used for transactions that span multiple storage engines within the same instance, with the binlog acting as the coordinator: when a storage engine commits, the commit information must be written to the binary log, and this constitutes an internal distributed XA transaction in which the binary log's counterpart is MySQL itself.

 

Basic syntax

XA {START|BEGIN} xid [JOIN|RESUME]     -- start an XA transaction (xid must be a unique value; the [JOIN|RESUME] clause is not supported)
XA END xid [SUSPEND [FOR MIGRATE]]     -- end an XA transaction (the [SUSPEND [FOR MIGRATE]] clause is not supported)
XA PREPARE xid                         -- prepare the XA transaction
XA COMMIT xid [ONE PHASE]              -- commit the XA transaction
XA ROLLBACK xid                        -- roll back the XA transaction
XA RECOVER                             -- list all XA transactions in the PREPARE state

Example

mysql> XA START 'xatest';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO mytable (i) VALUES(10);
Query OK, 1 row affected (0.04 sec)

mysql> XA END 'xatest';
Query OK, 0 rows affected (0.00 sec)

mysql> XA PREPARE 'xatest';
Query OK, 0 rows affected (0.00 sec)

mysql> XA COMMIT 'xatest';
Query OK, 0 rows affected (0.00 sec)

 

 

 

Redo log

The redo log guarantees atomicity and durability, locks provide isolation, and the undo log provides consistency.

The redo log consists of the redo log buffer (in memory) and the redo log files (on disk).

 

To make sure the log reaches the log file every time, InnoDB must call fsync once after each write of the redo log buffer to the redo log file. Because the redo log file is not opened with O_DIRECT, the redo log buffer is first written into the file system cache, and an fsync is required to guarantee that the redo log reaches disk. Since the efficiency of fsync depends on the disk, disk performance determines the performance of transaction commit, and therefore of the database.

The parameter innodb_flush_log_at_trx_commit controls the strategy for flushing the redo log to disk:

1 means that fsync must be called once on every transaction commit.

0 means that the redo log is not written at transaction commit; writing is done only in the master thread, which flushes once per second, so a crash at the wrong moment can lose up to the last second of transactions.

2 means that at commit the redo log is written to the redo log file, but only into the file system cache, without an fsync. With this setting, if MySQL crashes but the operating system does not, no transactions are lost; if the operating system crashes, the transactions that had not yet been flushed from the file system cache to the redo log file are lost when the database restarts.
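
A quick sketch of inspecting and changing the parameter (it is a dynamic global variable; 1 is the default and the only fully durable setting):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SET GLOBAL innodb_flush_log_at_trx_commit = 2;   -- trade some durability for commit speed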

Relationship between the redo log buffer and the redo log file groups

 

In the InnoDB storage engine, the redo log is stored in units of 512 bytes: both the redo log buffer and the redo log files are kept as blocks, called redo log blocks, each 512 bytes in size.

 

 

A log block consists of three parts: the log block header, the log body, and the log block tailer.


 

LOG_BLOCK_HDR_NO marks the position of the block in the log buffer; it increases monotonically and wraps around. It occupies 4 bytes, but since the first bit is used as the flush bit, the maximum value is 2G.

LOG_BLOCK_HDR_DATA_LEN occupies 2 bytes and records how much of the log block is in use; when the block is full the value is 0x200, meaning all 512 bytes are occupied.

LOG_BLOCK_FIRST_REC_GROUP occupies 2 bytes and gives the offset of the first log record in the block. If this value equals LOG_BLOCK_HDR_DATA_LEN, the block contains no new log records.

LOG_BLOCK_CHECKPOINT_NO occupies 4 bytes and stores the low 4 bytes of the checkpoint value at the time the log block was last written.

LOG_BLOCK_TAILER consists of a single field, LOG_BLOCK_TRL_NO, which is 4 bytes, has the same value as LOG_BLOCK_HDR_NO, and is initialized in the function log_block_init.


 

 

Redo log record format

The common header consists of the following three parts:

redo_log_type: the type of the redo log record

space: the tablespace ID

page_no: the page number (offset) within the tablespace

These are followed by the redo log body.

 

Contents of the redo log body


 

The LSN status can be viewed with SHOW ENGINE INNODB STATUS:

---
LOG
---
Log sequence number 18766833801	-- the current LSN
Log flushed up to   18766832201 -- the LSN flushed to the redo log files
Pages flushed up to 18766816420
Last checkpoint at 18766816420 -- the LSN flushed to disk (the checkpoint)

Since the checkpoint represents the LSN up to which data has been flushed to the disk pages, recovery only needs to apply the portion of the log that begins at the checkpoint.


 

Below is another diagram analyzing a complete log block.


 

 

 

undo log

To satisfy atomicity, before any data is modified the old data is first backed up into the undo log, and only then is the data changed. If an error occurs or the user executes ROLLBACK, the system can use the backup in the undo log to restore the data to the state it was in before the transaction started. Unlike the redo log, there is no separate undo log file on disk; undo records are stored in a special segment inside the database called the undo segment, and the undo segment is located in the shared tablespace.

In addition, the undo log is also used for the MVCC (multi-version concurrency control) feature of transactions.

An undo record is not a physical undo but a logical reverse operation: for an inserted record the undo log contains a corresponding delete, for an updated record a reverse update, and likewise for a deleted record a corresponding insert.

Because the undo is only logical, rolling back after inserting, say, 10,000 records does not shrink the pages and segments that were allocated for them, so the table may remain larger than before.

 

Related parameters

 

innodb_undo_directory sets the path where the rollback segment files are located
innodb_undo_logs sets the number of rollback segments, 128 by default
innodb_undo_tablespaces sets the number of files that make up the rollback segments

After a transaction commits, its undo log is not deleted immediately, because other transactions may still need the undo log to obtain earlier row versions. Whether an undo log can finally be deleted is decided by the purge thread.
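
The current values of these undo-related parameters can be checked with a single statement; note that the exact set of variables returned differs between versions (innodb_undo_logs, for example, was removed in MySQL 8.0):

SHOW VARIABLES LIKE 'innodb_undo%';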

 

 

To ensure that concurrently running transactions do not conflict when writing their own undo logs, InnoDB uses rollback segments to manage concurrent writing and persistence of the undo log. A rollback segment is really just a way of organizing undo files, and each rollback segment contains multiple undo log slots.


 

 

 

 

Insert format of the undo log


 

Update and delete formats of the undo log


 

The data dictionary table INNODB_TRX_ROLLBACK_SEGMENT can be used to inspect the rollback segments.

It shows which table and page a record was inserted into, together with the offset and length of the insert, which makes it possible to locate the undo log content within the page.

INNODB_TRX_UNDO records the undo log corresponding to each transaction, making it easy for DBAs and developers to see in detail how much undo each transaction generates.

For a DELETE, the data is not removed immediately; instead a delete flag is set on the record, and the purge thread later removes the data asynchronously.

All undo logs are placed on a list, and purge scans this list in order to decide which entries can be removed.

In the example below, purge first checks trx1 and finds that it references trx5, which is still in use, so trx1 cannot be removed; it then checks trx2 and finds that it can be removed because nothing references it any more.

 

 

Group commit is an optimization in MySQL's log handling, mainly aimed at reducing the frequent disk flushes caused by writing logs. Group commit has been improved continuously as MySQL has evolved: at first only redo log group commit was supported, while the official 5.6 release supports group commit for both the redo log and the binlog. Group commit greatly improves MySQL's transaction throughput. The following takes the InnoDB storage engine as an example to describe how group commit is implemented at each stage.

The idea of group commit is to merge the flush operations of several transactions' redo logs, reducing sequential disk writes. In InnoDB's log system, every redo record has an LSN (Log Sequence Number), and LSNs are monotonically increasing. Each transaction that performs an update generates one or more redo records; when a transaction copies its records into log_sys_buffer (which is protected by log_mutex), it obtains the current maximum LSN, so the LSNs of different transactions never collide.
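
Beyond the InnoDB side, binary log group commit can also be tuned; a small sketch, assuming MySQL 5.7 or later where these two variables exist:

SHOW VARIABLES LIKE 'binlog_group_commit_sync%';
SET GLOBAL binlog_group_commit_sync_delay = 100;          -- wait up to 100 microseconds so more transactions can join the group
SET GLOBAL binlog_group_commit_sync_no_delay_count = 10;  -- or stop waiting once 10 transactions have queued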

 

 

 

