MySQL - The InnoDB Storage Engine in Detail

Original link: https://www.jianshu.com/p/519fd7747137

Author: small North seek

Starting with MySQL 5.5, InnoDB is the default storage engine for tables. It is characterized by its row-level locking design, support for MVCC, support for foreign keys, and consistent non-locking reads, and it is designed to make the most efficient use of memory and CPU.

The main contents:

  • InnoDB architecture
  • CheckPoint technology
  • InnoDB Key Features

1. InnoDB Architecture

The figure below gives a brief overview of the InnoDB storage engine architecture:

[Figure: InnoDB storage engine architecture]

The InnoDB storage engine has several memory blocks, which together form a large memory pool. Background threads are responsible for maintaining this pool, for example by flushing modified data back to disk. Next we introduce the background threads and the memory pools.

1.1 Background threads

InnoDB runs a number of background threads, each responsible for different tasks (a quick way to list them is shown after this list). They are as follows:

  • Master Thread
    This is the core thread. It is mainly responsible for asynchronously flushing data from the buffer pool to disk to guarantee data consistency, including flushing dirty pages, merging the insert buffer, and reclaiming UNDO pages.
  • IO Thread
    InnoDB makes extensive use of asynchronous IO (AIO) to process IO requests; the IO threads are mainly responsible for handling the callbacks of these IO requests.
  • Purge Thread
    After a transaction commits, its undo log may no longer be needed, so the Purge Thread reclaims undo pages that have been used and are no longer required. InnoDB supports multiple purge threads, which speeds up the reclamation of undo pages.
  • Page Cleaner Thread
    The Page Cleaner Thread was newly introduced in InnoDB 1.2.x. Its role is to move the flushing of dirty pages into a separate thread, reducing the work of the Master Thread and the blocking of user query threads.
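
On a running server you can see these background threads through the Performance Schema; a minimal sketch, assuming the Performance Schema is enabled (exact thread names vary slightly between MySQL versions):

    -- InnoDB background threads, e.g. thread/innodb/io_read_thread,
    -- thread/innodb/page_cleaner_thread, thread/innodb/srv_purge_thread
    SELECT NAME, TYPE, PROCESSLIST_STATE
    FROM performance_schema.threads
    WHERE NAME LIKE 'thread/innodb/%';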

1.2 Memory

For this part I have tried to integrate the good material available online into a more intuitive description.
The InnoDB storage engine is disk-based, which means the data is ultimately stored on disk. Because of the huge gap between CPU speed and disk speed, InnoDB uses a buffer pool to improve the overall performance of the database. The buffer pool is simply an area of memory. When a page is read, it is first read from disk and stored in the buffer pool; the next time the same page is read, InnoDB first checks whether the page is in the buffer pool. If it is, we say the page is hit in the buffer pool and it is read directly; otherwise the page is read from disk. For a modification, the page is first modified in the buffer pool and then flushed to disk at a certain frequency; the page is not written back to disk on every change.

The buffer pool caches several types of pages: index pages, data pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, and data dictionary information. Index pages and data pages make up the largest part of the buffer pool (keep these page types in mind; they will be used as terms later). In InnoDB, the default page size in the buffer pool is 16KB.

[Figure: types of pages cached in the buffer pool]
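
You can check the page size and the total buffer pool size on your own server; a small sketch (the values in the comments are common defaults, yours may differ):

    SHOW VARIABLES LIKE 'innodb_page_size';         -- typically 16384 bytes (16KB)
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- buffer pool size in bytes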

We already know that the buffer pool is a contiguous piece of memory, so the question now is: how are pages from disk cached into the buffer pool? Are the pages just dumped into the buffer pool as-is? No. To better manage the cached pages, InnoDB creates a piece of so-called control information for each cached page. This control information includes the id of the tablespace the page belongs to, the page number, the address of the page inside the buffer pool, lock information, LSN information (which you can ignore for now), and of course various other bookkeeping data.

The control information of every cached page occupies the same amount of memory. We call the chunk of memory that holds one page's control information its control block. Control blocks and cached pages correspond one to one, and both are stored in the buffer pool: the control blocks sit at the front of the buffer pool and the cached pages at the back, so the whole memory area of the buffer pool looks like this:

[Figure: control blocks at the front of the buffer pool, cached pages at the back, with a fragment in between]

What is the fragment between the control blocks and the cached pages? Think about it: every control block corresponds to one cached page, so after allocating enough control blocks and cached pages, the remaining space may not be big enough for another control block plus cached page pair. That leftover, unusable bit of memory is what we call a fragment. Of course, if you set the buffer pool size just right, there may be no fragment at all.
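
The per-page control information described above (tablespace id, page number, modification LSNs, and so on) is exposed through information_schema; a rough sketch for inspection only, since this query can be expensive on a large buffer pool:

    SELECT SPACE, PAGE_NUMBER, PAGE_TYPE,
           OLDEST_MODIFICATION, NEWEST_MODIFICATION, IS_OLD
    FROM information_schema.INNODB_BUFFER_PAGE
    LIMIT 10;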

Now that we know the structure of the buffer pool, let's talk about how the InnoDB storage engine manages it. I was quite confused when I first studied this part (you can skip this aside): the book "MySQL Technical Insider" only mentions the LRU list and the free list in passing, without explaining how the free list is initially allocated or what its structure looks like, so I could never picture how the buffer pool is managed until I dug up more material online and figured it out. Let's take a look:

When the MySQL server first starts, the buffer pool has to be initialized: memory is allocated for it and divided into control blocks and cached pages. At this point no disk page has actually been cached yet (none have been used), but as the program runs, more and more pages from disk are cached into the buffer pool. So the question is: when a page is read from disk into the buffer pool, which cached page slot should it be placed in? Or, how do we tell which cached pages in the buffer pool are free and which are already in use? We had better record somewhere which pages are available. We can wrap every free cached page in a node and link the nodes into a linked list; this list is called the free list. Because all cached pages are free right after the buffer pool is initialized, every cached page is added to the free list. Assuming the buffer pool can hold at most n cached pages, the free list looks like this:

[Figure: the free list right after buffer pool initialization]

As the figure shows, to manage the free list there is also a piece of control information for the list itself, which records the address of the head node, the address of the tail node, and the number of nodes currently in the list. Each node of the free list records the address of a control block, and each control block records the address of its corresponding cached page, so each free list node effectively corresponds to one free cached page.

With the free list, things become easy: whenever a page needs to be loaded from disk into the buffer pool, a free cached page is taken from the free list, the control block corresponding to that cached page is filled in, and the node for that cached page is removed from the free list, indicating that the cached page is now in use.

The original author drew very good diagrams for this part and clearly put a lot of effort into them; I use them here directly.

Don't go so far that we forget why we set out.
A quick recap: why talk about the free list at all? Because we are explaining how the buffer pool is managed. The free list is the data structure that keeps track of the free cached pages in the buffer pool, which right after the database service starts means all of them.
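
The current length of the free list can be observed through information_schema; a small sketch (FREE_BUFFERS is the number of cached pages that are still unused):

    SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES
    FROM information_schema.INNODB_BUFFER_POOL_STATS;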

Let's also briefly review how the buffer pool works. It has two main jobs: accelerating reads and accelerating writes. Accelerating reads: when a data page needs to be accessed and it is already in the buffer pool, there is no need to touch the disk; the content of the page is obtained directly from the buffer pool. Accelerating writes: when a page needs to be modified, it is first modified in the buffer pool and the corresponding redo log is written, at which point the modification is considered complete. When the modified page is actually flushed back to disk is decided later by a background flush thread.

Both functions have to face an objective constraint: the machine's memory is limited, so the size of the InnoDB buffer pool is also limited. What if the pages to be cached exceed the size of the buffer pool, that is, the free list has no free cached pages left? Wouldn't that be awkward? Naturally, some old cached pages are evicted from the buffer pool so that new pages can be put in. So the question becomes: which cached pages should be evicted?

To answer that, we go back to the original reason for setting up the buffer pool: we just want to reduce IO interaction with the disk, and ideally every page we access is already cached in the buffer pool. Suppose we access a page n times in total; the number of accesses that were served from the cache divided by n is called the cache hit rate, and our expectation is to make the cache hit rate as high as possible.

How do we improve the cache hit rate? The InnoDB buffer pool uses the classic LRU algorithm to evict pages and thereby improve the cache hit rate. When there are no free cached pages left in the buffer pool, some of the least recently used cached pages must be evicted. But how do we know which cached pages were used frequently and which were rarely used recently? Once again the magical linked list comes in handy: we create another list that orders cached pages by the least-recently-used principle, so it is called the LRU list (Least Recently Used). When we need to access a page, the LRU list is handled as follows:

  • If the page is not in the buffer pool, it is loaded from disk into a cached page, and the node wrapping that cached page is inserted at the head of the list.
  • If the page is already in the buffer pool, the node corresponding to that page is moved to the head of the list.

But doing it this way causes performance problems. For example, a full table scan or a logical backup can wash away the hot data and pollute the buffer pool: all the data pages in the buffer pool get replaced in one pass, and other queries then have to load their pages from disk into the buffer pool again. Statements that do full table scans are not executed often, yet each execution churns every cached page in the buffer pool, which seriously interferes with other queries' use of the buffer pool and drags down the cache hit rate.

So the InnoDB storage engine optimizes the traditional LRU algorithm by adding a midpoint. A newly read page, although it is the most recently accessed, is not inserted at the head of the LRU list but at the midpoint position. This algorithm is called the midpoint insertion strategy. With the default configuration the page is inserted at 5/8 of the length of the list. The midpoint is controlled by the parameter innodb_old_blocks_pct.

The part of the list before the midpoint is called the new list, and the part after it is called the old list. The new list can be roughly understood as holding the most active hot data.

At the same time, the InnoDB storage engine introduces innodb_old_blocks_time, which specifies how long a page read to the midpoint must wait before it can be moved to the hot end of the LRU list. Setting this parameter helps ensure that hot data is not easily flushed out.
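
A small sketch of how to inspect and adjust these two parameters (the values in the comments are common defaults; tune them carefully on a real system):

    SHOW VARIABLES LIKE 'innodb_old_blocks%';
    -- innodb_old_blocks_pct:  percentage of the LRU list given to the old sublist
    --                         (default 37, which puts the midpoint at roughly 5/8 from the head)
    -- innodb_old_blocks_time: ms a page must stay in the old sublist before an access
    --                         can promote it to the hot end (default 1000)
    SET GLOBAL innodb_old_blocks_time = 1000;  -- illustrative value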

Good, that basically settles the LRU list; let's continue. We said earlier that updates are first performed on the page in the buffer pool, so that page and the page on disk become inconsistent; such a cached page is also called a dirty page. So when are these modified pages flushed to disk, and in what order? The simplest approach would be to synchronize every modification to the corresponding disk page immediately, but writing data to disk so frequently would seriously hurt performance (after all, the disk is slow as a turtle). So after a cached page is modified we are in no hurry to synchronize the change to disk immediately; it is synchronized at some later point in time, flushed to disk in turn by the background flush thread, and only then does the modification land on disk.

But if we don't synchronize immediately, how do we know later which pages in the buffer pool are dirty and which have never been modified? We can't simply synchronize every cached page to disk: if the buffer pool is set large, say 300GB, synchronizing that much data in one go would be painfully slow. So we have no choice but to create another linked list to track dirty pages: any page in the LRU list that has been modified must be added to this list. Because the pages in it need to be flushed to disk, it is called the FLUSH list (sometimes abbreviated FLU list). Its construction is almost the same as the free list, so we won't repeat it. "Modified" here means modified after the page was first loaded into the buffer pool; only the first modification adds the page to the FLUSH list (the code checks whether oldest_modification in the page header is 0 to decide whether this is the first modification). If the page is modified again it is not added to the FLUSH list again, because it is already there. Note that the actual data of a dirty page still lives in the LRU list; the FLUSH list merely points to the dirty pages in the LRU list through pointers. The FLUSH list is sorted by oldest_lsn (the LSN of the first modification of the page, i.e. the oldest_modification value recorded in each page header), and pages are flushed to disk in ascending order of this value to avoid data inconsistency.

Note: a dirty page exists both in the LRU list and in the FLUSH list. The LRU list manages the availability of pages in the buffer pool, while the FLUSH list manages the flushing of pages back to disk; the two do not affect each other.
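
A small sketch of how to observe dirty pages through information_schema (MODIFIED_DATABASE_PAGES is roughly the length of the FLUSH list, and a non-zero OLDEST_MODIFICATION marks a cached page as dirty; the second query can be heavy on a large buffer pool):

    SELECT POOL_ID, MODIFIED_DATABASE_PAGES
    FROM information_schema.INNODB_BUFFER_POOL_STATS;

    SELECT SPACE, PAGE_NUMBER, OLDEST_MODIFICATION, NEWEST_MODIFICATION
    FROM information_schema.INNODB_BUFFER_PAGE
    WHERE OLDEST_MODIFICATION > 0
    LIMIT 10;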

The relationship between these three important lists (LRU list, free list, FLUSH list) can be represented by the following diagram:

[Figure: relationship between the free list, LRU list, and FLUSH list]

The free list and the LRU list exchange pages with each other: pages circulate back and forth between the two lists. The FLUSH list records the dirty pages and also points into the LRU list through pointers, which is why in the figure the FLUSH list appears wrapped around the LRU list.

2. CheckPoint Technology

Having covered the buffer pool, let's talk about CheckPoint technology.
CheckPoint technology is used to solve the following problems:

  • Shortening database recovery time
  • Flushing dirty pages to disk when the buffer pool is not large enough
  • Flushing dirty pages when the redo log becomes unavailable

Shortening database recovery time: the redo log records the position of the checkpoint. Pages modified before this point have already been flushed to disk, so only the redo log after the checkpoint needs to be replayed during recovery. This greatly reduces recovery time.

When the buffer pool is not large enough, the LRU algorithm evicts the least recently used pages; if an evicted page is dirty, a checkpoint is forced so that the dirty page is flushed to disk.

"Redo log not available" refers to the portion of the redo log that cannot be overwritten. Why? Because the redo log is designed to be reused cyclically, and this portion corresponds to data that has not yet been flushed to disk. If that part of the log is not needed for database recovery, it can be overwritten; if it is needed, a checkpoint must be forced so that the buffer pool flushes pages at least up to the current position of the redo log.

How many pages does each checkpoint flush to disk? Where are the dirty pages taken from each time? When is a checkpoint triggered?

Inside the InnoDB storage engine there are two kinds of checkpoint:

  • Sharp Checkpoint
  • Fuzzy Checkpoint

A Sharp Checkpoint occurs when the database is shut down: all dirty pages are flushed back to disk. This is the default shutdown behaviour, i.e. the parameter innodb_fast_shutdown = 1.
It is not suitable for flushing while the database is running.

While the database is running, the InnoDB storage engine internally uses Fuzzy Checkpoint, flushing only a portion of the dirty pages.

A Fuzzy Checkpoint occurs in the following situations:
① Master Thread Checkpoint
The Master Thread asynchronously flushes a certain percentage of pages from the buffer pool's dirty page list to disk every second or every 10 seconds. Because the flush is asynchronous, the InnoDB storage engine can perform other operations at the same time and user query threads are not blocked.

② FLUSH_LRU_LIST Checkpoint
The InnoDB storage engine needs to ensure that the LRU list has roughly 100 free pages available. Before InnoDB 1.1.x, a user query thread would check whether the LRU list had enough free space; if not, pages at the tail of the LRU list were evicted according to the LRU algorithm, and if any of those pages were dirty a checkpoint was required for them. Hence the name: FLUSH_LRU_LIST checkpoint.

Starting with InnoDB 1.2.x, this check is performed in a separate thread (the Page Cleaner). The benefits: 1. it reduces the pressure on the Master Thread; 2. it reduces blocking of user threads.

Parameter: innodb_lru_scan_depth controls the number of available pages in the LRU list; the default value is 1024.
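
A minimal sketch of how to check and adjust this parameter (the value shown is just the default, for illustration):

    SHOW VARIABLES LIKE 'innodb_lru_scan_depth';
    SET GLOBAL innodb_lru_scan_depth = 1024;  -- how far down the LRU list the page cleaner scans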

③ Async/Sync Flush Checkpoint
This refers to the situation where the redo log is not available and pages must be forcibly flushed to disk; the pages are selected from the dirty page list.
This case exists to guarantee the reusability of the redo log. It means that the portion of redo log space that can be recycled and overwritten has become too small; put another way, a large amount of redo log has been produced in a very short time.
Several variables are involved, and the diagram is not hard to follow, so take a careful look.
In the InnoDB storage engine, versions are marked by the LSN (Log Sequence Number), an 8-byte number. Every page has an LSN, the redo log has an LSN, and the checkpoint has an LSN.
LSN written to the redo log: redo_lsn
LSN of the page most recently flushed back to disk: checkpoint_lsn
Define:
checkpoint_age = redo_lsn - checkpoint_lsn
async_water_mark = 75% * total_redo_file_size
sync_water_mark = 90% * total_redo_file_size
The flush process is shown below:

[Figure: Async/Sync Flush Checkpoint behaviour as checkpoint_age crosses async_water_mark and sync_water_mark]
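
The LSN values behind checkpoint_age can be observed on a running server; a rough sketch (the exact layout of the LOG section varies between MySQL versions):

    SHOW ENGINE INNODB STATUS\G
    -- In the LOG section, "Log sequence number" corresponds to redo_lsn and
    -- "Last checkpoint at" to checkpoint_lsn; their difference is the checkpoint age.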

④ Dirty Page too much Checkpoint
That is, there are too many dirty pages, so a checkpoint is forced to ensure the buffer pool has enough available pages.
Parameter: innodb_max_dirty_pages_pct = 75 means that a checkpoint is forced when dirty pages occupy 75% of the buffer pool. Since version 1.0.x the default is 75.
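
A small sketch of how to compare the threshold against the current amount of dirty pages (the ratio of the two status counters gives the dirty-page percentage):

    SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';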

3. InnoDB Key Features

3.1 Insert buffer

The Insert Buffer is a key feature of the InnoDB storage engine, and one of its most exciting and surprising functions. The name, however, might make you think that the insert buffer is an integral part of the buffer pool. In fact, while the InnoDB buffer pool does hold Insert Buffer information, the Insert Buffer, just like data pages, is also a component of the physical pages on disk.

In general, the primary key is the unique identifier of a row. Applications usually insert rows in increasing primary key order, so inserts into the clustered index are generally sequential and do not require random disk reads; insertion in this case is therefore still very fast. (But if the primary key is something like a UUID, inserts become as random as inserts into a secondary index.)

If the index is a non-unique, non-clustered index, the leaf nodes of the non-clustered index are not inserted sequentially during insertion; discrete non-clustered index pages must be accessed, and the random access degrades insertion performance. This is because the characteristics of the B+ tree determine the discrete nature of non-clustered index insertion.

The Insert Buffer is designed so that inserts and updates to a non-clustered index are not written directly into the index page on every operation. Instead, InnoDB first checks whether the target non-clustered index page is in the buffer pool. If it is, the insert is performed directly; if not, the change is first placed into an Insert Buffer object. To the database it looks as if the non-clustered index leaf node has already been inserted, but in fact it has not; the change is stored in another location. The Insert Buffer entries and the secondary index child-node pages are then merged at a certain frequency, and at that point multiple inserts can usually be merged into one operation (since they fall within one index page), which greatly improves insertion performance for non-clustered indexes.

Two conditions must be met:

  • The index is a secondary index;
  • The index is not unique.

The secondary index cannot be unique because, when buffering an insert, the database does not look up the index page to check the uniqueness of the inserted record. If it did, discrete reads would certainly occur, which would make the Insert Buffer meaningless.
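
In later MySQL versions the insert buffer was generalised into the change buffer. A small sketch of how to see how it is configured and what it is doing (the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of the engine status shows its size and merge counts):

    SHOW VARIABLES LIKE 'innodb_change_buffering';  -- which operations are buffered ('all' by default)
    SHOW ENGINE INNODB STATUS\G                     -- see the Ibuf: lines for size and merges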

3.2 Doublewrite (two writes)

If the insert buffer brings better write performance, doublewrite brings better data reliability.

Before introducing doublewrite, let's talk about partial write failure:
imagine a scenario in which the database crashes while writing a page from memory to disk, so that only part of the page is written. This is a partial write failure, and it can cause data loss. The data cannot be recovered through the redo log at this point, because the redo log records physical modifications to the page; if the page itself is corrupted, the redo log cannot help.

From the analysis above we know that, in the case of a partial write failure, we need a copy of the original page before we can apply the redo log. Doublewrite solves exactly this problem; its structure is shown below:

[Figure: the doublewrite architecture]

Doublewrite requires two additional parts:
1) a doublewrite buffer in memory, 2MB in size;
2) 128 contiguous pages in the shared tablespace on disk, also 2MB in size.

The principle is this:
1) When dirty pages in the buffer pool are flushed, they are not written directly to the data file; they are first copied to the doublewrite buffer in memory.
2) The doublewrite buffer is then written to the doublewrite area of the shared tablespace on disk, 1MB at a time.
3) Only after step 2 is complete is the doublewrite buffer written to the data files.

This solves the partial write failure problem mentioned above: because a copy of the data page already exists in the shared tablespace on disk, if the database crashes while the page is being written to the data file, InnoDB can, during instance recovery, find a copy of the page in the shared tablespace, use it to restore the original data page, and then apply the redo log.

Step 2 is where the extra overhead lies, but because the doublewrite area of the shared tablespace is contiguous, the cost is not large. Doublewrite can be disabled with the parameter skip_innodb_doublewrite; it is enabled by default, and it is strongly recommended that you keep it enabled.
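
A small sketch of how to confirm that doublewrite is on and how much work it is doing (Innodb_dblwr_pages_written counts pages that went through the doublewrite buffer, Innodb_dblwr_writes counts the batched writes):

    SHOW VARIABLES LIKE 'innodb_doublewrite';
    SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';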


3.3 Adaptive hash indexes

Hash lookup is very fast; in general its time complexity is O(1). The number of lookups in a B+ tree depends on the height of the tree, which in a production environment is generally 3 to 4 levels, so a query needs 3 to 4 lookups.

The InnoDB storage engine monitors queries against the index pages of each table. If it observes that building a hash index would improve speed, it builds one; this is called the adaptive hash index (Adaptive Hash Index, AHI). The AHI is constructed from the B+ tree pages in the buffer pool, so it is built very quickly, and a hash index does not have to be built for the whole table. InnoDB automatically builds hash indexes for certain hot pages based on the frequency and pattern of access.

The AHI requires that consecutive accesses to a page use the same pattern (query form). For example, for a composite index (a, b) there are the following access patterns:

  • WHERE a = xxx;
  • WHERE a = xxx AND b = xxx.

If the two queries above alternate, the InnoDB storage engine will not build an AHI for the page. In addition, the AHI has the following requirements:

  • the page has been accessed 100 times in this pattern;
  • the page has been accessed N times in this pattern, where N = number of records on the page / 16.

According to the official documentation, after enabling the AHI, read and write speed can improve by a factor of 2, and the performance of join operations on secondary indexes can improve by a factor of 5. The design idea is to let the database tune itself, without requiring the DBA to adjust it manually.
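
A small sketch of how to check whether the AHI is enabled and roughly how many cached pages are currently hashed (the counting query can be expensive on a large buffer pool):

    SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
    SELECT COUNT(*) AS hashed_pages
    FROM information_schema.INNODB_BUFFER_PAGE
    WHERE IS_HASHED = 'YES';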

3.4 Asynchronous IO (AIO)

To improve the performance of disk operations, current database systems use asynchronous IO (AIO) to handle disk operations, and InnoDB is no exception.

The counterpart of AIO is Sync IO, in which each IO operation must finish before the next one can start. However, if a user issues an index-scan query, the SQL statement may need to scan multiple index pages, that is, perform multiple IO operations. Scanning one page and waiting for it to complete before scanning the next is unnecessary. With AIO, the user can issue one IO request and immediately issue another; once all IO requests have been issued, it waits for all of the IO operations to complete.

Another advantage of AIO is IO merging: multiple IO operations can be merged into one, which can improve IOPS performance.

Before InnoDB 1.1.x, AIO was simulated by code inside the InnoDB storage engine. After that, support for kernel-level AIO, called Native AIO, was provided. Native AIO requires operating system support: Windows and Linux support it, but macOS does not. This is a factor to consider when choosing an operating system for a MySQL database server.

The parameter innodb_use_native_aio determines whether Native AIO is enabled. In the InnoDB storage engine, read-ahead is done entirely through AIO, and dirty page flushing is also done through AIO.
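
A minimal sketch for checking whether the server is using OS-level AIO (the variable is read-only at runtime and only takes effect on Linux):

    SHOW VARIABLES LIKE 'innodb_use_native_aio';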

3.5 Flushing neighbor pages

When the InnoDB storage engine flushes a dirty page, it checks all pages in the extent the page belongs to; any that are also dirty are flushed together. The advantage is that, through AIO, multiple IO writes can be merged into one IO operation. This mechanism brings significant benefits on traditional mechanical disks. However, two issues need to be considered:

Might a page that is only slightly dirty be written out, only to become dirty again very soon afterwards?
Solid-state drives already offer high IOPS; is this feature still needed?
For this reason, starting with version 1.2.x the InnoDB storage engine provides the parameter innodb_flush_neighbors to decide whether this feature is enabled. Keeping it enabled is recommended for traditional mechanical hard disks, while it can be turned off for solid-state drives.
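
A small sketch of how to check and change this behaviour (0 disables neighbor flushing, which usually suits SSDs; 1 flushes contiguous dirty neighbors in the same extent):

    SHOW VARIABLES LIKE 'innodb_flush_neighbors';
    SET GLOBAL innodb_flush_neighbors = 0;  -- e.g. on SSD storage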

Origin: blog.csdn.net/The_Inertia/article/details/105018084