[MySQL InnoDB Storage Engine Internals] study notes

I. Storage engines

1. InnoDB engine

Designed for online transaction processing (OLTP) applications.

Supports transactions and row-level locking; achieves high concurrency through multi-version concurrency control (MVCC); provides consistent non-locking reads; uses next-key locking to avoid phantom reads; clusters data by primary key (clustered index).

2. MyISAM engine

It is designed for OLAP applications.

Does not support transactions or row-level locks; uses table-level locking; supports full-text indexing.

3. Other storage engines

(omitted)

II. InnoDB architecture

1. Threading model

The InnoDB storage engine uses a multi-threaded model: several different background threads handle different tasks.

  • Master Thread: the core background thread; asynchronously flushes data from the buffer pool to disk
  • IO Thread: handles callbacks for asynchronous IO requests
  • Purge Thread: reclaims undo pages that are no longer needed
  • Page Cleaner Thread: flushes dirty pages

1.1 Master Thread

Internally it consists of several loops, including a main loop (loop) and a background loop (background loop).

Operations the main loop performs every second:

  • Flush the log buffer to disk, even if the transaction has not yet committed (this explains why even a large transaction commits quickly)
  • Merge the insert buffer
  • Flush up to n dirty pages to disk (configurable, and adjusted automatically since version 1.2.x)
  • If there is no user activity, switch to the background loop

Operations the main loop performs every 10 seconds:

  • Merge up to 5 insert buffer pages
  • Flush the log buffer to disk
  • Delete undo pages that are no longer needed
  • Flush dirty pages to disk (if more than 70% of pages are dirty, flush 100 pages; otherwise flush 10)
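The per-second and per-ten-second rules above can be sketched as plain decision functions. This is a minimal illustration, not InnoDB source code; the function names, operation labels, and the 0.75 dirty-ratio default are assumptions for the sketch.

```python
def per_second_ops(dirty_ratio, max_dirty_ratio=0.75, idle=False):
    """Rules the Master Thread applies roughly once per second."""
    ops = ["flush_log_buffer",         # even for uncommitted transactions
           "merge_insert_buffer"]
    if dirty_ratio > max_dirty_ratio:  # flush up to n dirty pages if needed
        ops.append("flush_dirty_pages")
    if idle:                           # no user activity: background loop
        ops.append("switch_to_background_loop")
    return ops

def per_ten_second_ops(dirty_ratio):
    """Rules the Master Thread applies roughly every 10 seconds."""
    ops = ["merge_up_to_5_insert_buffer_pages",
           "flush_log_buffer",
           "purge_unneeded_undo_pages"]
    # more than 70% dirty: flush 100 pages, otherwise flush 10
    ops.append("flush_100_dirty_pages" if dirty_ratio > 0.70
               else "flush_10_dirty_pages")
    return ops

print(per_ten_second_ops(0.80)[-1])  # flush_100_dirty_pages
```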

2. Memory model

2.1 Buffer pool

InnoDB is a disk-based storage engine. To bridge the performance gap between the CPU and the disk, data read from disk is cached in memory, and subsequent reads are served from the buffer pool first. Likewise, updates are applied to the buffer pool first and written back to disk by the checkpoint mechanism. The buffer pool holds index pages, data pages, undo pages, the insert buffer, lock information, and so on.

2.2 Buffer pool management (LRU List)

The buffer pool is managed with a least-recently-used (LRU) algorithm: the most frequently used pages sit at the front of the list, the least used at the end. When the pool cannot accommodate new data, pages are released starting from the tail. New pages are inserted at the list's midpoint (the 5/8 position) rather than at the head; this optimization over naive LRU prevents a one-off query that touches many pages from flushing the frequently used pages out of the buffer pool.
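The midpoint-insertion rule can be sketched with a toy page list. This is a simplification of the idea above, not InnoDB's implementation; the class name and in-memory list are illustrative.

```python
class MidpointLRU:
    """Toy midpoint-insertion LRU: new pages enter at the 5/8 point
    instead of the head, so a one-off scan cannot evict hot pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []  # index 0 = hottest (head), last = eviction candidate

    def access(self, page):
        if page in self.pages:             # hit: promote to the head
            self.pages.remove(page)
            self.pages.insert(0, page)
            return
        if len(self.pages) >= self.capacity:
            self.pages.pop()               # evict from the tail
        midpoint = (len(self.pages) * 5) // 8
        self.pages.insert(midpoint, page)  # insert at the 5/8 point

lru = MidpointLRU(capacity=8)
for p in ["a", "b", "c", "d", "e"]:
    lru.access(p)
lru.access("a")   # "a" is re-read: it becomes hot and moves to the head
```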

2.3 Dirty page management (Flush List)

When data is updated, it is first updated in the buffer pool; a modified page is called a dirty page. Dirty pages are tracked on the Flush List, and the checkpoint mechanism writes their data back to disk.

2.4 Redo log buffer

Redo log records are first stored in the redo log buffer and then synchronized to the redo log file at a certain frequency. The following three events trigger synchronization from the redo log buffer to the redo log file:

  • The Master Thread's once-per-second flush
  • Each transaction commit
  • The redo log buffer reaching a capacity threshold, typically half full
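The three triggers can be condensed into one predicate. A minimal sketch of the rules above, not server code; the parameter names are illustrative.

```python
def should_flush_redo_buffer(seconds_since_flush, committing,
                             used_bytes, capacity_bytes):
    """True when any of the three flush triggers fires."""
    if seconds_since_flush >= 1:          # Master Thread's per-second flush
        return True
    if committing:                        # a transaction is committing
        return True
    if used_bytes >= capacity_bytes / 2:  # buffer at least half full
        return True
    return False

print(should_flush_redo_buffer(0, False, 9 * 1024, 16 * 1024))  # True: over 1/2 full
```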

2.5 Checkpoint technology

To prevent a crash from losing the changes of committed transactions, data is first written to the redo log when a transaction commits, and the pages are modified afterwards. This guarantees durability (the D in ACID).

If a crash occurs, the redo log is replayed after restart to recover the data.

But this approach has the following problems:

  • If the redo log is too large, recovering data on restart is too slow
  • The redo log cannot grow without bound; it must be reused cyclically
  • What to do when the redo log becomes unavailable

Checkpoints solve these problems:

  • They shorten database recovery time
  • When the redo log is unavailable, dirty pages are flushed
  • When the buffer pool is insufficient, dirty pages are flushed to disk

Checkpoint trigger timing:

  • Master Thread checkpoint: triggered once per second
  • LRU List checkpoint: the LRU list must keep about 100 free pages; if pages about to be reclaimed are dirty, a checkpoint forces those dirty pages to disk
  • Dirty Page too much checkpoint: when dirty pages exceed a threshold, a checkpoint forces a flush of dirty pages to disk
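The trigger list above can be sketched as a priority check. This is an illustration only; the ordering, the return labels, and the 0.75 default threshold are assumptions, not InnoDB's actual values.

```python
def checkpoint_trigger(elapsed_seconds, lru_free_pages, dirty_ratio,
                       dirty_limit=0.75):
    """Return which checkpoint fires, per the trigger list above."""
    if lru_free_pages < 100:       # keep ~100 free pages on the LRU list
        return "lru_list_checkpoint"
    if dirty_ratio > dirty_limit:  # too many dirty pages
        return "dirty_page_too_much_checkpoint"
    if elapsed_seconds >= 1:       # Master Thread's periodic checkpoint
        return "master_thread_checkpoint"
    return None

print(checkpoint_trigger(0, 40, 0.2))  # lru_list_checkpoint
```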

3. Key Features

3.1 Insert buffer

(1) Why is the insert buffer needed?

We know that indexes are divided into clustered indexes and non-clustered (secondary) indexes.

With a typical auto-increment primary key, the clustered index stores records sequentially within the page, so writes need no extra random page reads and are fast. (If a UUID is used as the primary key instead, writes are much slower, because each insert requires a random read.)

In practice, tables usually have non-clustered indexes as well. Secondary index leaf pages are not visited in insertion order, so inserts require random access to discrete index pages, and this random reading degrades insert performance. The insert buffer exists to optimize insert speed in this scenario.

(2) When does the insert buffer apply?

  • The index is a secondary index
  • The index is not unique

For an insert into a non-clustered index, InnoDB first checks whether the target index page is in the buffer pool. If it is, the record is inserted into the index page directly; if not, the record is first placed into an insert buffer object, and at a certain frequency the insert buffer entries are merged into the secondary index's leaf pages.
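The routing rule above can be sketched with plain dictionaries standing in for the buffer pool and the insert buffer. A toy illustration, not InnoDB's on-disk structures; all names are invented for the sketch.

```python
def insert_secondary_record(page_id, record, buffer_pool, insert_buffer):
    """Insert directly if the index page is cached; otherwise queue it."""
    if page_id in buffer_pool:
        buffer_pool[page_id].append(record)               # direct insert
        return "direct"
    insert_buffer.setdefault(page_id, []).append(record)  # buffered
    return "buffered"

def merge_insert_buffer(page_id, buffer_pool, insert_buffer):
    """When the page is later read into the pool, merge buffered records."""
    buffer_pool.setdefault(page_id, []).extend(insert_buffer.pop(page_id, []))

pool, ibuf = {10: []}, {}
insert_secondary_record(10, "r1", pool, ibuf)  # page 10 cached -> direct
insert_secondary_record(11, "r2", pool, ibuf)  # page 11 not cached -> buffered
merge_insert_buffer(11, pool, ibuf)            # page 11 read in, merge "r2"
```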

(3) Principle

The insert buffer's data structure is also a B+ tree. When records are inserted, each record is packaged and numbered according to its insertion order, so that merges write them back in order.

3.2 Doublewrite

(1) Why is doublewrite needed?

If the server crashes while InnoDB is writing a page to disk and only part of the page has been written, the result is a partial write failure, which can cause data loss: the redo log cannot repair a page whose on-disk copy is already corrupt.

(2) Principle

Doublewrite consists of two parts: a doublewrite buffer in memory, and a contiguous area of the shared tablespace on disk. When flushing dirty pages, InnoDB first copies the dirty pages into the doublewrite buffer, then writes them sequentially into the shared tablespace area (sequential writes, so performance is barely affected), and finally writes the pages to their actual locations in the data files (discrete writes).
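The two-step write can be sketched in memory. A toy illustration of the sequence described above, not real IO code; the list and dict stand in for the contiguous doublewrite area and the data file.

```python
def flush_with_doublewrite(dirty_pages, doublewrite_area, datafile):
    """Step 1: copy pages to the contiguous doublewrite area (sequential).
    Step 2: write each page to its real slot in the data file (random).
    If step 2 tears a page, the intact copy from step 1 can repair it."""
    doublewrite_area.clear()
    doublewrite_area.extend(dirty_pages.values())  # step 1: sequential write
    for page_no, data in dirty_pages.items():      # step 2: write in place
        datafile[page_no] = data

dw_area, datafile = [], {}
flush_with_doublewrite({3: "page3-bytes", 7: "page7-bytes"}, dw_area, datafile)
```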

3.3 Adaptive hash index

A hash lookup is very fast, with O(1) query time. The number of reads for a B+ tree lookup depends on the height of the tree.

If a page is accessed frequently, and always with the same access pattern (respecting the leftmost-prefix rule of a composite index), InnoDB automatically builds a hash index for that page's data in the buffer pool to speed up lookups.
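The O(1)-versus-tree-height gap can be quantified with a back-of-envelope estimate. The fanout of ~1200 keys per page is an illustrative assumption, not a value from these notes.

```python
import math

def btree_lookup_cost(n_rows, fanout=1200):
    """Approximate page reads for a B+ tree lookup: about the tree
    height, ceil(log_fanout(n)). A hash probe needs just one."""
    return max(1, math.ceil(math.log(n_rows, fanout)))

print(btree_lookup_cost(10**8))  # 3 page reads for 100M rows vs 1 hash probe
```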

3.4 Asynchronous IO

With asynchronous IO (AIO), a thread can issue one IO request and immediately issue the next without waiting for the previous request to complete; it sends all IO requests first and then waits for all the IO operations to finish.

III. Files

A MySQL database with the InnoDB storage engine uses many types of files, each with a different purpose: parameter files, socket files, pid files, log files, table structure files, and storage engine files.

1. Log files

  • Error log: records errors encountered during startup, operation, and shutdown
  • Query log: records all queries
  • Binary log (binlog): records all data changes; used for data recovery and replication. Changes from an uncommitted transaction are kept in a buffer, which is synchronized to the binary log file when the transaction commits. Configuration controls after how many buffer writes the log is synced to disk; if the value is greater than 1, data may be lost on a crash
  • Slow query log: records queries whose execution time exceeds a specified threshold

2. InnoDB storage engine files

  • Tablespace files: store the data
  • Redo log files: store the transaction logs

IV. Tables

1. Index-organized tables

In InnoDB, table data is organized and stored in primary key order. Every table has a primary key: if none is defined explicitly, the first non-null unique index is used as the primary key; if there is no such unique index, a hidden 6-byte row ID is created automatically as the primary key.

2. Storage structure

All data is stored in tablespaces, and a tablespace is composed of segments, extents, and pages:

  • Segment: a tablespace is made up of segments; segment management is handled by the engine itself
  • Extent: each extent is 1MB in size and consists of contiguous pages
  • Page: the smallest unit of disk management; the default page size is 16KB
  • Row: data is stored row by row; a page stores at most 16384 / 2 - 200 = 7992 rows
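The row bound in the last bullet is just arithmetic on the default page size, using the formula quoted above:

```python
# Maximum rows per page, per the 16384 / 2 - 200 formula in the notes above.
page_size = 16 * 1024                     # default InnoDB page size: 16KB
max_rows_per_page = page_size // 2 - 200
print(max_rows_per_page)                  # 7992
```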

V. Indexes

 


Source: www.cnblogs.com/wangzhongqiu/p/11447267.html