MySQL (6): InnoDB engine architecture, transaction principles, MVCC, and MySQL management tools

InnoDB engine

logical storage structure


  • Tablespace (ibd file): one MySQL instance can have multiple tablespaces, which store records, indexes, and other data. (By default the files live in the data directory under the MySQL installation directory.)
  • Segment: divided into data segments (leaf node segments), index segments (non-leaf node segments), and rollback segments. InnoDB is an index-organized table: the data segment holds the leaf nodes of the B+ tree and the index segment holds its non-leaf nodes. A segment manages multiple extents.
  • Extent: the unit structure within a tablespace; each extent is 1 MB. With the default 16 KB page size, one extent holds 64 consecutive pages.
  • Page: the smallest unit of disk management in the InnoDB storage engine; the default page size is 16 KB. To keep pages contiguous, InnoDB requests 4-5 extents from the disk at a time.
  • Row: InnoDB stores data row by row.
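As a quick sanity check of the sizes above (plain arithmetic, not an InnoDB API):

```python
# Plain arithmetic for InnoDB's default storage units (illustrative only).
EXTENT_SIZE = 1 * 1024 * 1024   # one extent is 1 MB
PAGE_SIZE = 16 * 1024           # default page size is 16 KB

pages_per_extent = EXTENT_SIZE // PAGE_SIZE
print(pages_per_extent)  # 64
```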

Each row contains two hidden fields by default:

  • trx_id: every time a record is modified, the ID of the modifying transaction is stored in the trx_id hidden column.
  • roll_pointer: every time an index record is modified, the old version is written to the undo log; this hidden column acts as a pointer through which the previous version of the record can be found.

architecture

Starting with MySQL 5.5, InnoDB is the default storage engine. It excels at transaction processing, provides crash recovery, and is widely used in day-to-day development. The InnoDB architecture is divided into memory structures and disk structures.

memory structure

The memory structure is mainly divided into: Buffer Pool, Change Buffer, Adaptive Hash Index, and Log Buffer.

Buffer Pool
The InnoDB storage engine stores data in disk files. Because physical disks are far slower than memory, frequently used data is loaded into the buffer pool to compensate for the I/O gap and avoid a disk read on every access. InnoDB's buffer pool caches not only index pages and data pages but also undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, and more.

The buffer pool is an area of main memory that caches the real data frequently operated on disk. Insert, delete, update, and select operations first act on the data in the buffer pool (loading it from disk if it is not yet cached), and changed data is then flushed back to disk at a certain frequency, reducing disk I/O and speeding up processing.

The buffer pool is organized in pages, and the underlying layer manages pages with a linked-list structure. By status, pages fall into three types:

  • free page: a free page, not yet used.
  • clean page: a used page whose data has not been modified.
  • dirty page: a used page whose data has been modified but not yet flushed to disk, so the data in the buffer pool is inconsistent with the data on disk.
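The three page states can be illustrated with a toy buffer pool in Python. This is a conceptual sketch, not InnoDB's implementation: frames move from free to clean on load, clean to dirty on write, and dirty back to clean only after a flush.

```python
# Toy buffer pool tracking free / clean / dirty pages (illustrative only).
class BufferPool:
    def __init__(self, n_pages):
        self.free = list(range(n_pages))   # unused frames
        self.clean = {}                    # page_no -> data, unmodified
        self.dirty = {}                    # page_no -> data, modified, not yet flushed

    def read(self, page_no, load_from_disk):
        # a hit in either list avoids disk I/O
        if page_no in self.dirty:
            return self.dirty[page_no]
        if page_no in self.clean:
            return self.clean[page_no]
        # miss: load into a free frame (eviction policy omitted for brevity)
        self.free.pop()
        self.clean[page_no] = load_from_disk(page_no)
        return self.clean[page_no]

    def write(self, page_no, data, load_from_disk):
        self.read(page_no, load_from_disk)   # ensure the page is cached
        self.clean.pop(page_no, None)
        self.dirty[page_no] = data           # page becomes dirty

    def flush(self, write_to_disk):
        for page_no, data in self.dirty.items():
            write_to_disk(page_no, data)     # flushed pages are clean again
        self.clean.update(self.dirty)
        self.dirty.clear()
```

A real buffer pool also needs an eviction policy (InnoDB uses a modified LRU), which is omitted here.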

On dedicated servers, up to 80% of physical memory is commonly allocated to the buffer pool. Check the setting with: show variables like 'innodb_buffer_pool_size';

Change Buffer
The change buffer (for non-unique secondary index pages): when DML statements touch data pages that are not in the buffer pool, the disk is not modified directly; instead, the changes are cached in the change buffer. When the data is read later, the buffered changes are merged into the page in the buffer pool, and the merged page is eventually flushed to disk. (Primary key and unique indexes do not use the change buffer.)

What is the significance of the change buffer?
Unlike a clustered index, a secondary index is usually non-unique, and inserts arrive in relatively random order; deletes and updates may likewise touch non-adjacent secondary index pages. If the disk were touched on every such operation, it would cause a large amount of disk I/O. With the change buffer, changes can be merged in the buffer pool, reducing disk I/O and improving efficiency.
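The merge-on-read idea can be sketched as follows (a conceptual Python model, not InnoDB's implementation): changes to uncached secondary-index pages are buffered, and merged the next time the page is loaded.

```python
# Toy model of the change buffer (illustrative only).
change_buffer = {}          # page_no -> pending index entries
buffer_pool = {}            # page_no -> page content (a set of index entries)
disk = {7: {"alice", "bob"}}

def insert_secondary(page_no, entry):
    if page_no in buffer_pool:
        buffer_pool[page_no].add(entry)                      # cached: apply directly
    else:
        change_buffer.setdefault(page_no, []).append(entry)  # defer, no disk I/O

def read_page(page_no):
    if page_no not in buffer_pool:
        page = set(disk[page_no])                  # one disk read
        for entry in change_buffer.pop(page_no, []):
            page.add(entry)                        # merge buffered changes
        buffer_pool[page_no] = page
    return buffer_pool[page_no]

insert_secondary(7, "carol")     # page not cached: the change is buffered
print(sorted(read_page(7)))      # ['alice', 'bob', 'carol']
```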

Adaptive Hash Index
With no hash collision, a hash index needs only one lookup, whereas a B+ tree may need several, so it is highly efficient; however, hash indexes are unsuitable for range queries and fuzzy matching and work only for equality matching. InnoDB does not support user-defined hash indexes directly, but it provides an adaptive hash index to optimize queries against buffer pool data.
The InnoDB storage engine monitors queries on each index page of a table. If it observes that a hash index would improve speed under the current conditions, it builds one automatically; this is called the adaptive hash index.

The adaptive hash index requires no manual intervention; the system builds it automatically as needed.
Check whether it is enabled: show variables like '%hash_index%'; (ON means enabled)

Log Buffer
Log Buffer: the log buffer holds the log data (redo log, undo log) to be written to disk; its default size is 16 MB. The logs in the buffer are flushed to disk periodically. If a transaction updates, inserts, or deletes many rows, increasing the log buffer size can save disk I/O.

Parameters: innodb_log_buffer_size: buffer size

innodb_flush_log_at_trx_commit: controls when logs are flushed to disk. Its main values are:

  • 1: The log is written and flushed to disk on each transaction commit, default value.
  • 0: Write and flush the log to disk once per second.
  • 2: The log is written after each transaction commits and flushed to disk once per second.
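The three settings can be summarized as a small lookup table (an illustrative model, not the server's code), showing what happens at commit time for each value:

```python
# Toy model of innodb_flush_log_at_trx_commit (illustrative only).
def per_commit_actions(policy):
    """Return (log written at commit?, log flushed/fsynced at commit?)."""
    return {
        1: (True, True),    # default: written and flushed on every commit
        2: (True, False),   # written to the OS cache on commit; flushed once per second
        0: (False, False),  # written and flushed only once per second
    }[policy]

print(per_commit_actions(1))  # (True, True)
```

Only policy 1 guarantees durability of every acknowledged commit; 0 and 2 trade up to about a second of log data for fewer fsyncs.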


disk structure

System Tablespace

The system tablespace is the storage area for the change buffer. If file-per-table tablespaces are disabled, the data and indexes of all tables are also stored in the system tablespace. (In MySQL 5.x it additionally holds the InnoDB data dictionary, undo logs, etc.)

Parameter: innodb_data_file_path. The system tablespace's default file name is ibdata1.

Independent tablespaces (File-Per-Table Tablespaces)
Each file-per-table tablespace contains the data and indexes of a single InnoDB table, stored in a single data file on the file system.

Switch parameter: innodb_file_per_table; this parameter is enabled by default.

That is, each time a table is created, a corresponding tablespace file is generated.
General Tablespaces
General tablespaces must be created manually; a table can be assigned to one when it is created.

## Create a tablespace
CREATE TABLESPACE ts_name ADD DATAFILE 'file_name' ENGINE = engine_name;


## Specify the tablespace when creating a table
CREATE TABLE xxx ... TABLESPACE ts_name;


Undo Tablespaces

When a MySQL instance is initialized, it automatically creates two default undo tablespaces (initial size 16 MB) to store undo logs.

Temporary Tablespaces
InnoDB uses session temporary tablespaces and global temporary tablespaces to store data such as temporary tables created by users.

Doublewrite Buffer Files
Before flushing data pages from the buffer pool to disk, InnoDB first writes them to the doublewrite buffer files, so the data can be recovered if the system fails mid-write.
Redo Log
The redo log implements transaction durability. It consists of the redo log buffer (in memory) and the redo log file (on disk). When a transaction commits, all modification information is written to the log, which is used for data recovery if an error occurs while flushing dirty pages to disk. The redo log is not kept permanently: entries become useless once the transaction's dirty pages are safely on disk, and stale log space is reclaimed periodically. Its sole purpose is to guarantee recovery after a failure, thereby ensuring transaction durability.

The redo log files are written in a round-robin fashion, cycling between two files.
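A toy Python model of this round-robin behavior (file count and capacity are made up for illustration): when the active file fills up, writing switches to the other file, whose old records are assumed to be no longer needed.

```python
# Toy model of round-robin redo log files (illustrative only).
FILE_CAP = 4          # records per log file in this toy model
files = [[], []]      # two fixed-size redo log files
current = 0

def append_redo(record):
    global current
    if len(files[current]) == FILE_CAP:
        current = 1 - current        # switch to the other file
        files[current].clear()       # its old records are reusable
    files[current].append(record)

for i in range(10):
    append_redo(f"r{i}")
print(current, files[current])       # 0 ['r8', 'r9']
```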

Background thread
Background threads flush the data in the memory structures to the disk structures.
InnoDB's background threads fall into four categories: Master Thread, IO Thread, Purge Thread, and Page Cleaner Thread.

Master Thread
The core background thread: it schedules the other threads and asynchronously flushes buffer pool data to disk to keep the data consistent, including flushing dirty pages, merging the insert buffer, and reclaiming undo pages.

IO Thread
InnoDB makes heavy use of AIO (asynchronous non-blocking I/O) to handle I/O requests, which greatly improves database performance; the IO threads are mainly responsible for the callbacks of these requests.

Through the following command, you can view the status information of InnoDB, which includes IO Thread information.

show engine innodb status\G

Purge Thread
Mainly reclaims the undo logs of committed transactions. After a transaction commits, its undo log may no longer be needed, and this thread reclaims it.

Page Cleaner Thread
Assists the Master Thread in flushing dirty pages to disk, reducing the Master Thread's workload and blocking.

Transaction principles

A transaction is a set of operations, which is an indivisible unit of work. A transaction will submit or revoke operation requests to the system as a whole, that is, these operations will either succeed at the same time or fail at the same time.

  • Atomicity: A transaction is an indivisible minimum unit of operations, either all succeed or all fail.
  • Consistency: When a transaction is completed, all data must be in a consistent state.
  • Isolation: The isolation mechanism provided by the database system ensures that transactions run in an independent environment that is not affected by external concurrent operations.
  • Durability: Once a transaction is committed or rolled back, its changes to the data in the database are permanent.


Atomicity, consistency, and durability are guaranteed by InnoDB's two logs: the redo log and the undo log. Isolation is guaranteed by database locks and MVCC.

redo log

The redo log records the physical modification of the data page when the transaction is committed to achieve transaction durability .

The log consists of two parts: the redo log buffer (in memory) and the redo log file (on disk). After a transaction commits, all modification information is written to the redo log file, to be used for data recovery if an error occurs while flushing dirty pages to disk, thereby guaranteeing transaction durability.

What problems could exist without the redo log?
1. In InnoDB's memory structures, the main memory area is the buffer pool, which caches many data pages.
2. When a transaction performs multiple insert, delete, and update operations, InnoDB first operates on the data in the buffer pool; if the needed data is not cached, a background thread loads it from disk into the buffer pool.
3. The data is then modified in the buffer pool (the data on disk is not yet changed); the modified pages are called dirty pages.
4. At a suitable moment, a background thread flushes the dirty pages to disk so that the buffer pool and the disk stay consistent.
5. Dirty pages are not flushed in real time but only after some time. If an error occurs during that flush, the user has already been told the transaction committed successfully, yet the data was never persisted, so durability would not be guaranteed.


InnoDB solves this problem with the redo log:
1. After data in the buffer pool is inserted, deleted, or updated, the changes to the affected data pages are first recorded in the redo log buffer.
2. When the transaction commits, the contents of the redo log buffer are flushed to the redo log file on disk.
3. If, some time later, an error occurs while flushing the dirty pages to disk, the redo log can be used to recover the data, thus guaranteeing durability.
4. Once the dirty pages have been flushed successfully, the corresponding redo log entries are no longer needed and can be overwritten, which is why the two redo log files are written in a loop.

Why flush the redo log to disk on every commit instead of directly flushing the dirty pages in the buffer pool?
In business operations, data access is generally random reads and writes across the disk, and heavy random disk I/O performs poorly. The redo log, being a log file, is append-only, so it is written sequentially, and sequential writes are far faster than random writes. This write-the-log-first approach is called WAL (Write-Ahead Logging); the dirty page data is flushed to disk some time after the log is written.
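The WAL idea can be sketched in a few lines of Python (a conceptual model, not InnoDB's code): the sequential log append happens before the commit is acknowledged, so a lost page flush can be repaired by replaying the log.

```python
# Minimal WAL sketch (illustrative only).
log = []            # sequential, append-only redo log (fast to write)
pages = {"p1": 0}   # on-disk data pages (slow random writes)
cache = {"p1": 0}   # buffer pool copy

def commit(page, value):
    cache[page] = value
    log.append((page, value))   # WAL: the log reaches disk before we ack
    return "committed"

def recover():
    # replay the log after a crash that lost the dirty-page flush
    for page, value in log:
        pages[page] = value

commit("p1", 42)
# ... crash here, before the dirty page p1 was flushed ...
recover()
print(pages["p1"])  # 42
```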

Rollback log

The rollback log (undo log) records the data as it was before a modification. It serves two purposes: providing rollback, which guarantees the atomicity of transactions, and supporting MVCC (multi-version concurrency control).

Unlike the redo log, which is a physical log (recording what the data contents are), the undo log is a logical log (recording what operation each step performed). When a record is deleted, a corresponding insert record is written to the undo log; conversely, when a record is updated, a compensating update record is written (recording what the data looked like before the update). On rollback, the corresponding content is read from these logical records and the changes are undone.
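The logical-compensation idea can be sketched in Python (illustrative only; real undo records are more complex): each operation logs its inverse, and rollback replays the inverses in reverse order.

```python
# Toy undo log storing compensating (inverse) operations (illustrative only).
table = {1: "tom"}
undo = []

def do_insert(key, value):
    table[key] = value
    undo.append(("delete", key, None))        # inverse of insert is delete

def do_update(key, value):
    undo.append(("update", key, table[key]))  # remember the old value
    table[key] = value

def do_delete(key):
    undo.append(("insert", key, table[key]))  # inverse of delete is insert
    del table[key]

def rollback():
    while undo:
        op, key, old = undo.pop()             # undo in reverse order
        if op == "delete":
            del table[key]
        else:                                 # "insert" or "update"
            table[key] = old

do_update(1, "jerry")
do_insert(2, "spike")
do_delete(1)
rollback()
print(table)   # {1: 'tom'} - back to the original state
```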

Undo log destruction: The undo log is generated when a transaction is executed. When the transaction is committed, the undo log will not be deleted immediately because these logs may also be used for MVCC.

Undo log storage: The undo log is managed and recorded in segments and is stored in the rollback segment introduced earlier. It contains 1024 undo log segments internally.

MVCC

basic concept

Current read: reads the latest version of a record and, while reading, locks it so that other concurrent transactions cannot modify it. Everyday operations such as select ... lock in share mode (shared lock) and select ... for update, update, insert, delete (exclusive locks) are all current reads.
In testing we can see that even under the default RR (Repeatable Read) isolation level, transaction A can still read the data transaction B has just committed, because the query appends lock in share mode, a shared lock, making it a current read. Adding an exclusive lock likewise makes it a current read.

Snapshot read: A simple select (without locking) is a snapshot read. Snapshot read reads the visible version of the recorded data, which may be historical data. It is not locked and is a non-blocking read.

  • Read Committed: Each time you select, a snapshot read is generated.
  • Repeatable Read: The first select statement after starting a transaction is where the snapshot is read.
  • Serializable: Snapshot reads will degenerate into current reads.

Even after transaction B commits, its data cannot be seen in transaction A. A plain select is a snapshot read, and under the default RR isolation level the snapshot is taken at the first select after the transaction starts; subsequent identical selects read from that snapshot, which may not be the latest data, thereby guaranteeing repeatable reads.

MVCC: Full name Multi-Version Concurrency Control, multi-version concurrency control. It refers to maintaining multiple versions of a data so that there is no conflict in read and write operations. Snapshot reading provides a non-blocking read function for MySQL to implement MVCC. The specific implementation of MVCC also needs to rely on three implicit fields in the database record, undo log, and readView.

MVCC is implemented through InnoDB's hidden record fields, the undo log version chain, and the ReadView. MVCC plus locking achieves transaction isolation, while consistency is guaranteed by the redo log and undo log.

Three implicit fields

When we look at the structure of the table created above, we can see its explicitly defined fields. Besides these, InnoDB automatically adds up to three hidden fields:

  • DB_TRX_ID: the ID of the transaction that last inserted or modified the record.
  • DB_ROLL_PTR: the rollback pointer, pointing at the previous version of this record in the undo log.
  • DB_ROW_ID: a hidden row ID, generated only if the table has no primary key.

The first two fields are always added. Whether DB_ROW_ID is added depends on whether the table has a primary key: if it does, this hidden field is not added.

To inspect the table stu, which has a primary key, go to /var/lib/mysql/itcast/ on the server and view stu's table structure information with the following command:

ibd2sdi stu.ibd

In the columns section of the output, besides the fields specified when the table was created, there are two extra fields: DB_TRX_ID and DB_ROLL_PTR. Because the table has a primary key, there is no DB_ROW_ID hidden field.
To view the employee table without a primary key, create the table statement:

create table employee (id int , name varchar(10));

Then use the following command to view the table structure and its field information:

ibd2sdi employee.ibd

In the columns section of the output, besides the fields specified at table creation, there are three extra fields: DB_TRX_ID, DB_ROLL_PTR, and DB_ROW_ID, because the employee table has no primary key.

undolog version chain

The rollback log is generated during insert, update, and delete to facilitate data rollback.
For insert, the undo log is needed only for rollback and can be deleted as soon as the transaction commits.
For update and delete, the undo log is needed not only for rollback but also for snapshot reads, so it is not deleted immediately.

Suppose a row is inserted into the table:

DB_TRX_ID: the ID of the most recent modifying transaction; it records the transaction that inserted this record or last modified it, and transaction IDs are auto-incremented.
DB_ROLL_PTR: because this row was just inserted and has never been updated, this field is null.

Under concurrent access, four transactions operate on this table at the same time.
When transaction 2 executes its first update statement, the pre-change data is recorded in the undo log and the update is performed. DB_TRX_ID records the ID of the operating transaction (here 2, marking which transaction last touched this row), and DB_ROLL_PTR, the rollback pointer, points at the version to return to if a rollback occurs.
When transaction 3 executes its first update statement, an undo log record is likewise written, as above.
The same happens when transaction 4 executes its first update statement.

Modifications to the same record by different transactions (or the same transaction) cause the record's undo log to form a version linked list: the head of the list is the newest record and the tail is the oldest.
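The version chain can be modeled in a few lines of Python (field names mirror the hidden columns; the model itself is illustrative): each update pushes the previous version onto the chain via the roll pointer.

```python
# Toy undo log version chain (illustrative only).
class Version:
    def __init__(self, trx_id, data, roll_ptr=None):
        self.trx_id = trx_id       # DB_TRX_ID: transaction that wrote this version
        self.data = data
        self.roll_ptr = roll_ptr   # DB_ROLL_PTR: previous version in the undo log

head = Version(trx_id=1, data={"name": "A30", "age": 30})  # the original insert

def update(head, trx_id, data):
    # the new version becomes the head; the old head joins the chain
    return Version(trx_id, data, roll_ptr=head)

head = update(head, 2, {"name": "A30", "age": 3})   # transaction 2
head = update(head, 3, {"name": "A3", "age": 3})    # transaction 3
head = update(head, 4, {"name": "A30", "age": 10})  # transaction 4

chain = []
v = head
while v:
    chain.append(v.trx_id)
    v = v.roll_ptr
print(chain)   # newest first: [4, 3, 2, 1]
```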

readview

A ReadView is the basis on which MVCC determines which version of the data a snapshot-read SQL statement sees. It records and maintains the IDs of the transactions currently active (uncommitted) in the system.

ReadView contains four core fields:

  • m_ids: the set of currently active transaction IDs.
  • min_trx_id: the minimum active transaction ID.
  • max_trx_id: the pre-allocated next transaction ID (the current maximum transaction ID + 1).
  • creator_trx_id: the ID of the transaction that created this ReadView.

The ReadView stipulates the access rules for version-chain data, where trx_id is the transaction ID recorded in the undo log version entry being examined:

  • ① trx_id == creator_trx_id: the version is visible (the data was changed by the current transaction itself).
  • ② trx_id < min_trx_id: the version is visible (that transaction had already committed).
  • ③ trx_id >= max_trx_id: the version is not visible (that transaction started after the ReadView was created).
  • ④ min_trx_id <= trx_id < max_trx_id: the version is visible if trx_id is not in m_ids (that transaction had already committed).
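The four visibility rules can be written directly as a function (an illustrative sketch; the field names follow the usual ReadView descriptions):

```python
# Sketch of the four ReadView visibility rules (illustrative only).
def visible(trx_id, m_ids, min_trx_id, max_trx_id, creator_trx_id):
    if trx_id == creator_trx_id:    # rule 1: the reader's own change
        return True
    if trx_id < min_trx_id:         # rule 2: committed before the ReadView
        return True
    if trx_id >= max_trx_id:        # rule 3: started after the ReadView
        return False
    return trx_id not in m_ids      # rule 4: visible iff no longer active

# ReadView taken by transaction 5 while transactions 3, 4, 5 are active:
rv = dict(m_ids={3, 4, 5}, min_trx_id=3, max_trx_id=6, creator_trx_id=5)

print(visible(2, **rv))   # True  - committed before the snapshot
print(visible(3, **rv))   # False - still active
print(visible(6, **rv))   # False - began after the snapshot
```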
Different isolation levels have different timings for generating ReadView:

  • READ COMMITTED: A ReadView is generated every time a snapshot read is performed in a transaction.
  • REPEATABLE READ: A ReadView is generated only when a snapshot read is performed for the first time in a transaction, and the ReadView is reused subsequently.

Case: RC level

In transaction 5, the record with id 30 is queried twice. Because the isolation level is Read Committed, a ReadView is generated for each snapshot read, so the two reads use two different ReadViews. When matching, the versions in the undo log version chain are checked one by one from newest to oldest.

The first snapshot read proceeds as follows:
1. First the version with trx_id = 4 is checked against the rules: ① is not satisfied, ② is not satisfied, ③ is not satisfied, ④ is not satisfied, so matching continues with the next entry in the undo log version chain.
2. Then the version with trx_id = 3 is checked: again none of ①-④ is satisfied, so matching moves on to the next entry.
3. For the version with trx_id = 2, ① is not satisfied but ② is, so matching stops: this snapshot read returns the data of that version in the chain.
The second snapshot read proceeds as follows:
1. The version with trx_id = 4 is checked: none of ①-④ is satisfied, so matching continues with the next entry in the undo log version chain.
2. The version with trx_id = 3 is checked: ① is not satisfied but ② is, so matching stops: this snapshot read returns the data of that version.

Case: RR level

Under the RR isolation level, a ReadView is generated only at the first snapshot read in a transaction and reused for subsequent reads. RR means repeatable read: executing the same select statement twice within a transaction yields the same result. Because the ReadView is the same, the version-chain matching results are also the same, so the snapshot reads return identical data.
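This RC-versus-RR difference can be simulated with a small sketch (illustrative, not InnoDB's code): the same version chain is read through a fresh ReadView each time (RC) versus the first ReadView reused (RR).

```python
# Toy RC vs RR simulation over an undo log version chain (illustrative only).
chain = [(4, "v4"), (3, "v3"), (2, "v2")]   # (trx_id, data), newest first

def read(readview):
    m_ids, min_id, max_id, creator = readview
    for trx_id, data in chain:              # walk newest to oldest
        if (trx_id == creator or trx_id < min_id
                or (trx_id < max_id and trx_id not in m_ids)):
            return data                     # first visible version wins
    return None

rv1 = ({3, 4, 5}, 3, 6, 5)   # first read: transactions 3 and 4 still active
rv2 = ({4, 5}, 4, 6, 5)      # later read: transaction 3 has committed

rc_reads = [read(rv1), read(rv2)]   # RC: a new ReadView for each read
rr_reads = [read(rv1), read(rv1)]   # RR: the first ReadView is reused

print(rc_reads)  # ['v2', 'v3'] -> RC sees newly committed data
print(rr_reads)  # ['v2', 'v2'] -> RR repeats the same snapshot
```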

MySQL management

System database

After MySQL is installed, it comes with four system databases, with the following functions:

  • mysql: stores the information the MySQL server needs to run (time zones, replication, user accounts, privileges, etc.).
  • information_schema: provides access to database metadata (databases, tables, column data types, access privileges, etc.).
  • performance_schema: provides a low-level view of the MySQL server's runtime state for performance monitoring.
  • sys: contains views that combine information_schema and performance_schema data to help DBAs and developers tune performance.

Common tool

mysql: here mysql refers not to the MySQL server but to the MySQL client tool.

Syntax:
		mysql [options] [database]
Options:
		-u, --user=name       # specify user name
		-p, --password[=name] # specify password
		-h, --host=name       # specify server IP or domain name
		-P, --port=port       # specify connection port
		-e, --execute=name    # execute the SQL statement and exit

The -e option executes SQL statements from the client without connecting to the MySQL database interactively, which is especially convenient for batch scripts.

Example:

mysql -uroot -p123456 db01 -e "select * from stu";

mysqladmin: a client program for performing administrative operations. You can use it to check the server's configuration and current status, create and drop databases, and so on.

View the options via the help documentation:
mysqladmin --help
Syntax:
	mysqladmin [options] command ...
Options:
	-u, --user=name       # specify user name
	-p, --password[=name] # specify password
	-h, --host=name       # specify server IP or domain name
	-P, --port=port       # specify connection port

Example:

mysqladmin -uroot -p1234 drop 'test01';
mysqladmin -uroot -p1234 version;

mysqlbinlog: the binary log files generated by the server are stored in binary format; to view them as text, use the mysqlbinlog log-management tool.

Syntax:
	mysqlbinlog [options] log-files1 log-files2 ...
Options:
	-d, --database=name    # list only operations for the given database
	-o, --offset=#         # skip the first n entries of the log
	-r, --result-file=name # write the text-format output to the given file
	-s, --short-form       # display a simple format, omitting some information
	--start-datetime=date1 --stop-datetime=date2   # all logs within the given date range
	--start-position=pos1 --stop-position=pos2     # all logs within the given position range

mysqlshow: client object search tool, used to quickly find which databases, tables in the database, columns or indexes in the tables exist.

Syntax:
	mysqlshow [options] [db_name [table_name [col_name]]]
Options:
	--count  # show statistics for databases and tables (both may be omitted)
	-i       # show status information for the given database or table
Examples:
	# show the field count and row count of every table in the test database
	mysqlshow -uroot -p2143 test --count
	# show details of the book table in the test database
	mysqlshow -uroot -p2143 test book --count

Example: Query the number of tables and the number of records in each database

mysqlshow -uroot -p1234 --count

Example: view statistics for the database heima

mysqlshow -uroot -p1234 heima --count


Example: view information about the stu table in the heima database

mysqlshow -uroot -p1234 heima stu --count
Example: view information about the id field of the stu table in the heima database

mysqlshow -uroot -p1234 heima stu id --count

mysqldump: a client tool for backing up databases or migrating data between databases. The backup contains the SQL statements to create the tables and insert the data.

Syntax:
	mysqldump [options] db_name [tables]
	mysqldump [options] --database/-B db1 [db2 db3...]
	mysqldump [options] --all-databases/-A
Connection options:
	-u, --user=name       # specify user name
	-p, --password[=name] # specify password
	-h, --host=name       # specify server IP or domain name
	-P, --port=#          # specify connection port
Output options:
	--add-drop-database   # add a drop database statement before each create database statement
	--add-drop-table      # add a drop table statement before each create table statement (on by default; disable with --skip-add-drop-table)
	-n, --no-create-db    # omit the database creation statements
	-t, --no-create-info  # omit the table creation statements
	-d, --no-data         # omit the data
	-T, --tab=name        # generate two files: a .sql file with the table-creation statements and a .txt data file

Example: back up the heima database:

mysqldump -uroot -p12345678 heima > heima.sql

You can open heima.sql directly to see what the backup contains: statements to drop tables, statements to create tables, and data insert statements.

Sometimes only part of a backup is needed, say the data without the table structure, or the structure without the data; the corresponding options make this possible.

Back up only the table data of the heima database, without the table structure (-t):

mysqldump -uroot -p12345678 -t heima > heima02.sql

Open heima02.sql to view the backup: it contains only insert statements and no table structure.
Back up only the table structure, not the data (-d):

mysqldump -uroot -p12345678 -d heima > heima03.sql

Back up the table structure and data of a db01 database table into separate files (-T):

mysqldump -uroot -p1234 -T /root db01 score

Executing the command above produces an error and the data cannot be backed up, because the specified output directory /root is considered unsafe by MySQL; the files must be written to a directory MySQL trusts. The trusted directory can be checked via the system variable secure_file_priv.

Of the two generated files, score.sql records the table structure and score.txt is the table data file. Note that the data file is not a series of insert statements; it records the table's data in a fixed format according to the table structure.
mysqlimport: a client data-import tool used to import the text files exported by mysqldump with the -T option.

Syntax:
	mysqlimport [options] db_name textfile1 [textfile2...]
Example:
	mysqlimport -uroot -p2143 test /tmp/city.txt

A problem can occur here: an untrusted directory. By default the current directory is used, so the full path of a trusted directory needs to be provided.
source: imports a backed-up SQL file (executed inside the mysql client):

source /root/xxxxx.sql


Origin blog.csdn.net/weixin_43994244/article/details/129426655