MySQL Technical Insider: InnoDB Storage Engine, Study Notes for Chapter 2: The InnoDB Storage Engine

InnoDB was developed by Innobase Oy. It was the first MySQL storage engine to offer full ACID transaction support (BDB, the first MySQL storage engine to support transactions at all, has since been discontinued). It uses row-level locking, supports MVCC, provides Oracle-style consistent non-locking reads, supports foreign keys, and is designed to make the most efficient use of memory and CPU.

Oracle uses a multi-process architecture (except on Windows): multiple core background processes are responsible for database writes, log writes, checkpoints, and so on. InnoDB, by contrast, uses a multi-threaded architecture, and its master thread implements almost all of the functions of Oracle's core processes.

By default InnoDB has seven threads: four IO threads, one master thread, one lock monitoring thread, and one error monitoring thread. The number of IO threads is controlled by the innodb_file_io_threads parameter (adjustable only on Windows) and defaults to 4. To view InnoDB status:

SHOW ENGINE INNODB STATUS\G

The output shows that the four IO threads are the insert buffer thread, log thread, read thread, and write thread. On Linux the number of IO threads cannot be adjusted.

Starting with the InnoDB Plugin, the default number of IO threads was increased: read threads and write threads each default to four, and the innodb_file_io_threads parameter was replaced by innodb_read_io_threads and innodb_write_io_threads.
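Since these two parameters are not dynamic, they are typically set in the server configuration file before startup. A minimal my.cnf sketch (the values shown are the defaults and purely illustrative):

```ini
[mysqld]
# InnoDB Plugin and later: separate read and write IO thread pools
innodb_read_io_threads  = 4
innodb_write_io_threads = 4
```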

Check the InnoDB version:

SHOW VARIABLES LIKE 'innodb_version';

View the number of InnoDB read and write threads:

SHOW VARIABLES LIKE 'innodb_%io_threads';

InnoDB's memory is composed of the buffer pool (controlled by the innodb_buffer_pool_size configuration parameter), the redo log buffer (controlled by innodb_log_buffer_size), and the additional memory pool (controlled by innodb_additional_mem_pool_size).

Display the sizes of these areas, in bytes:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';    # buffer pool size
SHOW VARIABLES LIKE 'innodb_log_buffer_size';    # redo log buffer size
SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';    # additional memory pool size

The buffer pool is the largest piece of memory and caches various kinds of data. InnoDB always reads database files into the buffer pool by page (16KB per page) and then uses a least-recently-used (LRU) algorithm to decide which cached pages to retain. When a database file needs to be modified, the page is always modified in the buffer pool first; the resulting dirty pages are flushed to the file at a certain frequency.
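The buffer pool's page accounting can also be read from the server status counters, which is convenient for scripted monitoring (a sketch; the exact counters available vary by version):

```sql
-- Total, free, data, and dirty page counts in the buffer pool
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';

-- Logical read requests vs. physical reads, for estimating the cache hit ratio
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```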

View the buffer pool usage:

SHOW ENGINE INNODB STATUS\G

In the output, look at the BUFFER POOL AND MEMORY section. Buffer pool size is the number of buffer frames (each frame is 16KB). Free buffers is the number of free frames, Database pages is the number of frames in use, and Modified db pages is the number of dirty pages.

The command shows not the instantaneous state but InnoDB's state over a recent window of time. The output includes a line such as "Per second averages calculated from the last 7 seconds", indicating that the figures are averages over the past 7 seconds.

The types of data pages cached in the buffer pool include index pages, data pages, undo pages, insert buffers, adaptive hash indexes, lock information stored in InnoDB, and data dictionary information. Index pages and data pages account for a large part of the buffer pool.

The parameter innodb_buffer_pool_size specifies the buffer pool size. On 32-bit Windows, the innodb_buffer_pool_awe_mem_mb parameter can enable Address Windowing Extensions (AWE) to break through the 32-bit memory limit, but once AWE is enabled the adaptive hash index feature is automatically disabled.
Redo log records are first placed in the log buffer and then flushed to the redo log files at a certain frequency. The log buffer generally does not need to be large, because it is normally flushed to the log files about once per second; it only needs to be large enough to hold one second's worth of transaction volume.

The additional memory pool is often overlooked by DBAs, but its value matters. InnoDB manages memory through a mechanism called a memory heap. When allocating memory for certain internal data structures, InnoDB draws from the additional memory pool; when that area runs short, it allocates from the buffer pool instead. Each buffer frame in the buffer pool has a corresponding buffer control object recording information such as LRU position and locks, and these objects are allocated from the additional memory pool. Therefore, when configuring a large buffer pool, you should also increase the size of the additional memory pool.

The master thread has the highest thread priority. Internally it consists of several loops: the main loop (loop), background loop, flush loop, and suspend loop. The master thread switches between these loops according to the state of the database.

There are two major operations in the loop: operations per second and operations every ten seconds.

The main loop is paced by thread sleeps, which means the once-per-second and once-per-ten-seconds operations are not precise; under heavy load they may be delayed.

Operations performed once per second:
1. Flush the log buffer to disk, even if the transaction has not yet committed. This is why even large transactions commit quickly.
2. Possibly merge the insert buffer. InnoDB checks whether fewer than five IO operations occurred in the last second; if so, it considers the current IO pressure low and performs an insert buffer merge.
3. Flush up to 100 dirty pages from the buffer pool to disk. InnoDB checks whether the ratio of dirty pages in the buffer pool (buf_get_modified_ratio_pct) exceeds the innodb_max_dirty_pages_pct configuration parameter (default 90, meaning 90%); if it does, InnoDB considers a disk synchronization necessary and writes 100 dirty pages to disk.
4. If there is no user activity, switch to the background loop.

Operations performed every ten seconds:
1. Flush 100 dirty pages to disk. InnoDB first checks whether fewer than 200 disk IO operations occurred in the past ten seconds; if so, it considers there to be sufficient disk IO capacity and flushes 100 dirty pages to disk.
2. Merge at most five insert buffer entries.
3. Flush the log buffer to disk.
4. Delete useless undo pages (full purge). When update or delete operations are performed on a table, the original rows are marked as deleted, but their version information must be retained for consistent reads. During full purge, InnoDB determines whether deleted rows in the current transaction system can actually be removed (a query may still need the undo information of an earlier version); at most 20 undo pages are deleted each time.
5. Flush 100 or 10 dirty pages to disk. InnoDB checks the proportion of dirty pages in the buffer pool: if it exceeds 70%, 100 dirty pages are flushed; otherwise only 10 dirty pages are flushed.
6. Generate a checkpoint. InnoDB checkpoints are fuzzy checkpoints: at checkpoint time not all dirty pages are written to disk, only the pages with the oldest log sequence numbers.

The background loop performs the following operations:
1. Delete useless undo pages.
2. Merge 20 insert buffer entries.
3. Jump back to the main loop.
4. Keep flushing 100 dirty pages at a time until the dirty-page condition is met (this step is performed in the flush loop).

If there is nothing to do in the flush loop, InnoDB switches to the suspend loop, suspending the master thread to wait for an event. If the InnoDB engine is enabled but no InnoDB tables are used, the master thread remains suspended.

Master thread pseudocode (C-style):

void master_thread() {
loop:
    for (int i = 0; i < 10; ++i) {
        thread_sleep(1);                  // operations once per second
        do log buffer flush to disk
        if (last_one_second_ios < 5)
            do merge at most 5 insert buffer
        if (buf_get_modified_ratio_pct > innodb_max_dirty_pages_pct)
            do buffer pool flush 100 dirty pages
        if (no user activity)
            goto background_loop;
    }

    // operations once every ten seconds
    if (last_ten_second_ios < 200)
        do buffer pool flush 100 dirty pages
    do merge at most 5 insert buffer
    do log buffer flush to disk
    do full purge
    if (buf_get_modified_ratio_pct > 70%)
        do buffer pool flush 100 dirty pages
    else
        do buffer pool flush 10 dirty pages
    do fuzzy checkpoint
    goto loop;

background_loop:
    do full purge
    do merge 20 insert buffer
    if (not idle)
        goto loop;
    else
        goto flush_loop;

flush_loop:
    do buffer pool flush 100 dirty pages
    if (buf_get_modified_ratio_pct > innodb_max_dirty_pages_pct)
        goto flush_loop;
    goto suspend_loop;

suspend_loop:
    suspend_thread();
    waiting event
    goto loop;
}

Starting from the InnoDB Plugin, SHOW ENGINE INNODB STATUS also reports the master thread's activity. In one sample the main loop had run 45 times and the one-second sleep had also run 45 times (indicating a light load; under heavy load InnoDB optimizes by sleeping less often), the ten-second activity had run 4 times (consistent with the 1:10 ratio), and the background loop and flush loop had each run 6 times. This server's load is light enough to match the theoretical values. On a heavily loaded server the picture differs: for example, the main loop may have run 2188 times while the one-second sleep ran only 1537 times. The gap between the main-loop count and the one-second sleep count reflects the database's load.

The InnoDB Plugin is the InnoDB engine available for MySQL 5.1 and later.

InnoDB places hard-coded limits on IO when flushing the buffer pool to disk. With the rapid development of disk technology, and especially with the advent of solid-state disks, these hard-coded limits restrict InnoDB's disk IO performance, particularly write performance.

InnoDB flushes at most 100 dirty pages to disk per second and merges at most 20 insert buffer entries. A write-intensive application may generate far more than 100 dirty pages or 20 insert buffer entries per second; even if the disk could sustain more IO, InnoDB's hard-coded limits cap the writes. Moreover, after a crash, recovery may be slow because so much data was never flushed back to disk; this is especially true for the insert buffer.

Because of these shortcomings, InnoDB introduced the innodb_io_capacity parameter to describe disk IO throughput; its default value is 200:
1. When merging the insert buffer, the number of entries merged is 5% of innodb_io_capacity.
2. When flushing dirty pages, up to innodb_io_capacity dirty pages are flushed.

If you use an SSD or a RAID array of several disks, you can increase the value of innodb_io_capacity.
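In versions where innodb_io_capacity is a dynamic variable, it can be raised at runtime for fast storage; the value 2000 below is only an illustration and should be matched to the storage's real throughput:

```sql
-- Tell InnoDB the storage can sustain more IO operations per second
SET GLOBAL innodb_io_capacity = 2000;
SHOW VARIABLES LIKE 'innodb_io_capacity';
```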

Before MySQL 5.1, the default value of innodb_max_dirty_pages_pct was 90, meaning dirty pages could occupy up to 90% of the buffer pool. But since the per-second check (and the flush loop) flushes only 100 dirty pages when the ratio exceeds this value, dirty pages are actually flushed slowly when memory is large or the database is under heavy pressure, and the recovery phase after a crash can take longer. Some users found that lowering this value to 10 or 20 improved performance, but it also increases disk pressure and system load. Starting with the InnoDB Plugin, the default was changed to 75, which speeds up dirty-page flushing while keeping the disk IO load reasonable.
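The parameter is dynamic, so it can be inspected and adjusted without a restart (75 below simply mirrors the InnoDB Plugin default):

```sql
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
SET GLOBAL innodb_max_dirty_pages_pct = 75;
```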

The InnoDB Plugin also introduced the innodb_adaptive_flushing parameter, which affects how many dirty pages are flushed per second. Previously, no dirty pages were flushed when the dirty-page ratio was below innodb_max_dirty_pages_pct, and 100 were flushed when it was above. With innodb_adaptive_flushing enabled, InnoDB uses a function named buf_flush_get_desired_flush_rate to determine the most appropriate number of dirty pages to flush, judging by the rate at which redo log is being generated; as a result, some dirty pages are flushed even when the dirty-page ratio is below innodb_max_dirty_pages_pct.
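In versions that have it, adaptive flushing is on by default and the variable is dynamic; a sketch for checking and, if desired, reverting to the fixed-threshold behavior:

```sql
SHOW VARIABLES LIKE 'innodb_adaptive_flushing';
-- Disable to fall back to the fixed innodb_max_dirty_pages_pct threshold
SET GLOBAL innodb_adaptive_flushing = OFF;
```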

The insert buffer, like the data page, is itself part of the physical pages on disk, not merely an in-memory structure.

A clustered index determines the physical order of the records in a table. It is especially effective for columns that are frequently searched by range: after the clustered index locates the row containing the first value, rows with subsequent index values are guaranteed to be physically adjacent. Because a clustered index determines physical order, a table can have only one, though a clustered index may comprise multiple columns.

A table may have multiple non-clustered (secondary) indexes, and inserting into them requires discrete access to non-clustered index pages; the characteristics of the B+ tree determine the discrete nature of non-clustered index insertion. InnoDB pioneered the insert buffer design: an insert or update of a non-clustered index is not always applied directly to the index page. InnoDB first checks whether the target non-clustered index page is already in the buffer pool; if it is, the change is applied directly. Otherwise the change is first placed in the insert buffer, so that from the outside the non-clustered index appears to have been updated. At a certain frequency the insert buffer entries are merged into the non-clustered index pages, at which point multiple inserts into the same index page can usually be merged into a single operation, greatly improving the performance of non-clustered index inserts and updates.

Conditions for using the insert buffer (InnoDB applies it automatically):
1. The index is a secondary index.
2. The index is not unique. If it were unique, every insert would have to check for duplicates, and that lookup would itself cause the discrete reads the insert buffer is meant to avoid.
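As a concrete sketch, an index like idx_b below qualifies for insert buffering, while the primary key and the unique index do not (table and column names are illustrative):

```sql
CREATE TABLE t (
    a INT PRIMARY KEY,       -- clustered index: not insert-buffered
    b INT,
    c INT,
    KEY idx_b (b),           -- non-unique secondary index: can use the insert buffer
    UNIQUE KEY uniq_c (c)    -- unique: must check for duplicates, cannot use it
) ENGINE=InnoDB;
```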

If an application performs a large number of inserts and updates involving non-unique secondary indexes, and the database crashes, there may be a large volume of insert buffer entries not yet merged into the actual secondary indexes; recovery can then take a long time, in extreme cases several hours.

Insert buffer information appears in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of SHOW ENGINE INNODB STATUS. There, seg size shows the current insert buffer size (in one sample, 11336 × 16KB), free list len is the length of the free list, and size is the number of pages of merged records. The bottom row shows the performance gain: inserts is the number of records inserted, merged recs is the number of records merged, and merges is the number of merge operations. In that sample merged recs was about three times merges, meaning the insert buffer reduced the discrete IO requests against non-clustered index pages by roughly a factor of three.

One problem with the insert buffer is that under write-intensive workloads it can occupy too much buffer pool memory: by default it may take up to 50% of the buffer pool, which can affect other operations. Percona released patches to address the insert buffer occupying too much buffer pool memory.

The doublewrite mechanism gives InnoDB data reliability. When the database crashes, it may be in the middle of writing a page, having written only part of it; this is called a partial page write. Before InnoDB adopted doublewrite, data had been lost because of partial page writes. When a write fails, the redo log can be used to recover, but the redo log records physical operations on pages; if the page itself is corrupted, replaying those operations is meaningless. In other words, before applying the redo log we need a copy of the page from which to restore it, and only then can we redo. This is what doublewrite provides.

Doublewrite consists of two parts: a 2MB doublewrite buffer in memory, and 128 contiguous pages in the shared tablespace on disk, organized as two extents of 1MB each.

When dirty pages in the buffer pool are flushed, they are not written directly to the data files. Instead the dirty pages are first copied to the in-memory doublewrite buffer via the memcpy function, then written from the doublewrite buffer to the doublewrite area of the shared tablespace in two chunks of 1MB each, after which fsync is called immediately to synchronize the disk. fsync does not return until the data has actually reached the disk, rather than merely being queued in an output buffer. Because the doublewrite pages are contiguous, this is sequential IO and its overhead is modest. After the doublewrite area has been written, the pages in the doublewrite buffer are written to their actual locations in the tablespace files; those writes are discrete.

Check the double write operation:

SHOW GLOBAL STATUS LIKE 'innodb_dblwr%';

In one sample the output showed that doublewrite had written a total of 6,325,194 pages (Innodb_dblwr_pages_written) in 100,399 actual write operations (Innodb_dblwr_writes).

If the operating system crashes while a page is being written to disk, then during recovery InnoDB finds a copy of the page in the doublewrite area of the shared tablespace, copies it to the tablespace file, and then applies the redo log.

The parameter skip_innodb_doublewrite disables the doublewrite feature, which exposes the database to partial page write failures. However, if there are several slave servers, it can be used on them to provide faster performance. On a master server where data reliability matters, keep doublewrite enabled.
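Since this is a startup option, it goes in the configuration file; a minimal sketch for a replica where flush speed is preferred over write safety:

```ini
[mysqld]
# Replica only: accept the risk of partial page writes for faster flushing
skip_innodb_doublewrite
```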

Some file systems, such as ZFS, provide their own mechanism to prevent partial page writes; on such file systems doublewrite does not need to be enabled.

Hash lookups are fast and are often used for join operations, such as hash joins in SQL Server and Oracle, though neither of those databases supports hash indexes on tables. The default index type of MySQL's HEAP (MEMORY) storage engine is hash, while InnoDB has its own variation called the adaptive hash index.

InnoDB monitors lookups against the indexes on a table. If it observes that building a hash index would yield a speed improvement, it builds one; this is why the hash index is called adaptive.

The adaptive hash index is built from the B+ tree pages already in the buffer pool, so it is established very quickly, and it does not need to cover the whole table: InnoDB builds hash entries only for certain pages, according to the frequency and pattern of access. Its design idea is that the database tunes itself.

View the current use of adaptive hash index:

SHOW ENGINE INNODB STATUS\G

The output includes the size of the hash index and the number of adaptive hash index searches per second.

A hash index can serve only equality lookups; it cannot be used for range searches and similar access patterns. The non-hash searches/s figure in the output reflects those cases.

Since InnoDB is one of MySQL's storage engines, "starting and shutting down the InnoDB engine" more precisely means how InnoDB tables are handled while the MySQL instance starts and stops.

When the database is shut down, the innodb_fast_shutdown parameter affects InnoDB's behavior. It can take the values 0, 1, or 2. A value of 0 means that a full purge and all insert buffer merges must complete before MySQL shuts down, which takes time, sometimes hours. A value of 1, the default, means the full purge and insert buffer merges are skipped, but the dirty pages in the buffer pool are still flushed to disk. A value of 2 means that neither the full purge and insert buffer merges nor the dirty-page flush are performed; only the logs are written to the log files. No transactions are lost, but the next time the database starts it will perform a recovery operation.
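The variable is dynamic, so the shutdown behavior can be chosen just before stopping the server (1 below is the default):

```sql
-- Skip full purge and insert buffer merge, but still flush dirty pages
SET GLOBAL innodb_fast_shutdown = 1;
SHOW VARIABLES LIKE 'innodb_fast_shutdown';
```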

When the database is shut down abnormally (for example, mysqld is killed with the kill command, the server is rebooted while MySQL is running, or the database was shut down with innodb_fast_shutdown set to 2), InnoDB tables will be recovered the next time the database starts.

The parameter innodb_force_recovery defaults to 0, meaning all recovery operations are performed when recovery is needed. If recovery is impossible (for example, a data page is corrupted), the MySQL database may go down, and the error is written to the error log.

Sometimes a complete recovery is unnecessary because we know how we want to recover. For example, if a crash occurs during an ALTER TABLE operation, the InnoDB table will be rolled back when the database restarts; for a large table this can take a very long time, even hours. We can instead recover ourselves: drop the table and re-import its data from a backup, which may be much faster than the rollback.

innodb_force_recovery can also be set to six non-zero values; a larger value implies the effects of all smaller values:
1 (SRV_FORCE_IGNORE_CORRUPT): Ignore corrupted pages that are detected.
2 (SRV_FORCE_NO_BACKGROUND): Prevent the master thread from running (avoids the crash that would occur if the master thread attempted a full purge).
3 (SRV_FORCE_NO_TRX_UNDO): Do not perform transaction rollback.
4 (SRV_FORCE_NO_IBUF_MERGE): Do not perform insert buffer merges.
5 (SRV_FORCE_NO_UNDO_LOG_SCAN): Do not scan the undo logs; InnoDB treats uncommitted transactions as committed.
6 (SRV_FORCE_NO_LOG_REDO): Do not perform the roll-forward (redo) operation.

After innodb_force_recovery is set to a value greater than 0, only SELECT, CREATE, and DROP operations are permitted on tables; INSERT, UPDATE, and DELETE are not.
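The parameter is set in the configuration file before starting the damaged instance; a sketch (level 3 is only an example, and the setting should be removed once the data has been salvaged):

```ini
[mysqld]
# Temporary, for salvage only: skip transaction rollback during crash recovery
innodb_force_recovery = 3
```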

To simulate a failure with innodb_force_recovery at 0: first start a transaction explicitly (to prevent autocommit) and run an update that generates a large amount of rollback log, then kill the mysqld process. When MySQL restarts, it rolls back the update transaction and records the details in the error log (default suffix .err). In one run the error log showed that 8867280 rows were rolled back, from 13:40:20 to 13:49:21, taking more than nine minutes.

Simulating the same crash again, but this time with innodb_force_recovery set to 3, the error log instead shows warnings marked with three exclamation points, telling the user that the undo rollback will not be performed. The database starts quickly, but the user must check its state and confirm that the rollback is indeed unnecessary.

MySQL 5.1 adopted a pluggable architecture: each storage engine in the source code is implemented via a handler C++ base class (earlier versions were mostly implemented through function pointers). The advantage of this design is that storage engines are truly plug-ins. Previously, if a bug was found in InnoDB, you had to wait for a new release of MySQL and then recompile it; now you can download a new version of the InnoDB storage engine and replace the buggy one directly, without waiting for a new MySQL release, as long as the engine's vendor publishes a new version.

Before MySQL 5.1.38, installing the InnoDB Plugin required downloading the plugin file, decompressing it, and performing a series of installation steps. From MySQL 5.1.38 onward, MySQL ships with two versions of the InnoDB engine: the old built-in engine (built-in InnoDB) and the InnoDB Plugin at version 1.0.4. To use the new InnoDB Plugin engine, only a configuration-file change is needed. Starting with MySQL 5.5.5, InnoDB is MySQL's default storage engine.
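A commonly cited my.cnf sketch for switching a 5.1.38+ server over to the plugin (the shared-library name follows the MySQL 5.1 manual for Unix builds; exact file names and availability depend on the distribution, and the plugin's INFORMATION_SCHEMA tables need additional plugin-load entries):

```ini
[mysqld]
# Disable the built-in engine and load the InnoDB Plugin in its place
ignore_builtin_innodb
plugin-load=innodb=ha_innodb_plugin.so
default-storage-engine=InnoDB
```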


Origin blog.csdn.net/tus00000/article/details/112095472