One article is enough to learn MySQL storage engines!

Storage engine

The storage engine is the underlying software component of a database; the database management system (DBMS) uses the storage engine to create, query, update, and delete data. Different storage engines provide different storage mechanisms, indexing techniques, locking levels, and other features, so choosing a particular engine gives you its particular capabilities. Many database management systems now support several storage engines.

The main storage engines are: 1. MyISAM, 2. InnoDB, 3. Memory, 4. Archive, 5. Federated.

MyISAM

MyISAM was the default engine before MySQL 5.5. It does not support transactions, row-level locks, or foreign keys.

Nonclustered index

The index file and the data file are separate.

Advantage

The index tree stays shallow, typically three levels or fewer. The lower the tree, the fewer disk I/Os a lookup needs and the faster the query, and the index does not take up much memory or storage.

Disadvantage

Transactions are not supported. Transactions operate on indexes, and MyISAM's index files and data files are separate.

When inserting or updating data, the entire table must be locked, which is less efficient.

InnoDB

InnoDB has been the default storage engine since MySQL 5.5, and it supports transactions.

It is only the default, which means that you can still use MyISAM.

When creating a table, add ENGINE=MyISAM at the end of the CREATE TABLE statement to use the MyISAM storage engine.
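As a minimal sketch, the full statement could look like the one built below. The table name `demo_log` and its columns are made up for illustration, and the Python helper `create_table_sql` is hypothetical: it only assembles the SQL text so you can pass it to whichever MySQL client you use.

```python
# Hypothetical table and column names, for illustration only.
def create_table_sql(table: str, engine: str = "InnoDB") -> str:
    """Build a CREATE TABLE statement that names the storage engine explicitly."""
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT NOT NULL PRIMARY KEY,\n"
        "  message VARCHAR(255)\n"
        f") ENGINE={engine};"
    )

# Ask for MyISAM instead of the default engine.
myisam_ddl = create_table_sql("demo_log", engine="MyISAM")
print(myisam_ddl)
```

Leaving out the ENGINE clause gives you the server's default engine, which is InnoDB from MySQL 5.5 onward.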

Its index and data are in the same file (clustered index)

Clustered index

A clustered index keeps the index and the data in the same file: the index contains the data of the entire table, not just the keys. In InnoDB, therefore, the clustered index is the table, and no separate row storage is needed as in MyISAM. Each leaf node of the clustered index contains the primary key value, the transaction ID, the rollback pointer used for MVCC, and the values of all remaining columns. If the primary key is a column prefix index, InnoDB also stores the complete primary key column along with the remaining columns.
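The difference between the two layouts can be sketched with a toy model. This is not real engine code: plain dicts and a list stand in for B+ tree leaves and the data file, and all names (`index_file`, `data_file`, `clustered_index`) are invented for the illustration.

```python
# Toy model only: dicts stand in for B+ tree leaf nodes.

# Nonclustered (MyISAM-style): the index leaf stores a row address,
# and the row itself lives in a separate data file.
data_file = [
    {"id": 10, "name": "alice"},
    {"id": 20, "name": "bob"},
]
index_file = {10: 0, 20: 1}  # key -> position of the row in data_file

def lookup_nonclustered(key):
    position = index_file[key]   # step 1: search the index file
    return data_file[position]   # step 2: fetch the row from the data file

# Clustered (InnoDB-style): the leaf of the primary-key index holds the whole row.
clustered_index = {
    10: {"id": 10, "name": "alice"},
    20: {"id": 20, "name": "bob"},
}

def lookup_clustered(key):
    return clustered_index[key]  # the index lookup already yields the row

print(lookup_nonclustered(20))
print(lookup_clustered(20))
```

Both lookups return the same row; the nonclustered version simply needs one extra hop from the index to the data file.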

Advantage

It maximizes the performance of I/O-intensive applications.

InnoDB stores index columns and the related data rows together. This means that once you have accessed one row on a data page, the page is already loaded into the buffer pool; when you access another row on the same page, or the same row again, the access completes in memory without touching the disk. Memory access is orders of magnitude faster than disk access.
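The effect of page caching can be illustrated with a small simulation. The `BufferPool` class below is a hypothetical toy, not InnoDB's buffer pool: it has no eviction and simply counts how many times a page has to be fetched from a simulated disk.

```python
class BufferPool:
    """Toy page cache: counts how often a page must be read from 'disk'."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page_id -> list of rows (simulated disk)
        self.cached = {}              # pages already loaded into memory
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id not in self.cached:
            # Cache miss: pay for one disk read and keep the whole page.
            self.disk_reads += 1
            self.cached[page_id] = self.disk_pages[page_id]
        return self.cached[page_id]

    def get_row(self, page_id, slot):
        return self.get_page(page_id)[slot]

pool = BufferPool({0: ["row-a", "row-b", "row-c"]})
first = pool.get_row(0, 0)   # loads page 0 from the simulated disk
second = pool.get_row(0, 2)  # different row, same page: served from memory
print(pool.disk_reads)       # 1
```

Two row reads cost only one disk read because both rows live on the same page.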

Disadvantage

If you store a very large field, a single row may occupy one node, two nodes, or even more. The larger the rows and the more records in the table, the taller the tree: all the data lives in the leaf nodes, so more data means more leaf nodes, which means more parent nodes and a greater height. For this reason, a common recommendation since MySQL 5.5 is to keep a single table under roughly 10 million rows; beyond that, split the data across multiple databases and tables (sharding).

Try to store large fields such as pictures and large texts separately. Rather than putting them in the MySQL tables, keep them in a dedicated storage system such as FastDFS or MongoDB.

Keep only relatively small fields in the database, which improves query efficiency.

B+ tree

InnoDB's index data structure is the B+ tree.

In a B+ tree, non-leaf nodes hold only index keys and child pointers, and no row data. Leaf nodes hold index keys and row data, and no child pointers.

By default, InnoDB reads one page at a time. One page spans 4 disk blocks of 4KB each, so a page is 16KB, which means that one node is 16KB.

Assume an index key occupies 8 bytes and a pointer occupies 6 bytes, so one entry takes 14 bytes. A first-level (root) node can then hold 16KB / 14B ≈ 1170 index entries.
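The arithmetic behind the 1170 figure can be checked in a few lines. The sizes are the ones assumed in the text (a 16KB page, an 8-byte key, a 6-byte pointer); the constant names are illustrative, and page headers and other per-page overhead are ignored.

```python
PAGE_SIZE = 16 * 1024  # one 16KB page, i.e. one B+ tree node
KEY_SIZE = 8           # bytes per index key, as assumed in the text
POINTER_SIZE = 6       # bytes per child pointer, as assumed in the text

# How many (key, pointer) entries fit into one non-leaf node.
fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
print(fanout)  # 1170
```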

The first-level node fans out to the second level. The first-level node holds 1170 index keys and 1170 pointers, and each pointer points to one second-level node, so there are 1170 second-level nodes. Each non-leaf node holds 1170 index entries, so the second level holds 1170 * 1170 = 1,368,900 (about 1.37 million) index entries in total.

Take a three-level B+ tree as an example. The third-level nodes are leaf nodes and store index keys and row data. If a record is 1KB, a third-level node can store 16 records, for a total of 1170 * 1170 * 16 = 21,902,400 (about 21.9 million) records. Note that the tree is only three levels high. To query a row you need only two disk I/Os. Why two? Because the root node of each table's index is normally kept in memory. Tens of millions of records for two disk I/Os: that is the kind of performance a B+ tree offers.
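The same estimate can be generalized to any tree height. This is a back-of-the-envelope sketch under the text's assumptions (a fanout of 1170 and 16 records of 1KB per leaf); the function names `max_rows` and `disk_reads` are invented here, and `disk_reads` assumes the root node is already in memory.

```python
def max_rows(height, fanout=1170, rows_per_leaf=16):
    """Upper bound on rows in a B+ tree of the given height (levels)."""
    # Every level above the leaves multiplies the number of leaves by the fanout.
    return fanout ** (height - 1) * rows_per_leaf

def disk_reads(height):
    """Disk I/Os per lookup, assuming the root node is cached in memory."""
    return height - 1

for height in (1, 2, 3):
    print(height, max_rows(height), disk_reads(height))
# 1 16 0
# 2 18720 1
# 3 21902400 2
```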

The B+ tree has another advantage: its leaf nodes form an ordered linked list, so no extra sorting is needed, which reduces CPU consumption. More specifically, it is a doubly linked list. A singly linked list links in one direction only, so it must be traversed sequentially from the head. In a doubly linked list you can start at any node and easily visit both its predecessor and its successor, which makes range queries more efficient.
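A range scan over linked leaves can be sketched as follows. The `Leaf` class, `link`, and `range_scan` are hypothetical names for a toy model; a real engine would first descend the tree to find the starting leaf, which is skipped here.

```python
class Leaf:
    """One leaf node: sorted keys plus links to both neighbours."""

    def __init__(self, keys):
        self.keys = keys
        self.prev = None
        self.next = None

def link(leaves):
    """Chain the leaves into a doubly linked list, left to right."""
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves

def range_scan(start_leaf, low, high):
    """Collect keys in [low, high] by walking forward from start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are ordered, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

leaves = link([Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11])])
print(range_scan(leaves[0], 3, 9))  # [3, 5, 7, 9]
```

The scan never goes back up to the parent nodes, and the keys come out already sorted. The `prev` links allow the same walk in the opposite direction, for example for a descending scan.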

InnoDB's MVCC technology

MVCC (multi-version concurrency control) is a mechanism for improving concurrency. Under MVCC, each read operation sees a consistent snapshot, so reads can be non-blocking.

Why use MVCC

  1. MySQL's default storage engine is InnoDB. A large transactional storage engine like this cannot meet concurrency needs by relying on row locks alone, so row locks are usually combined with other mechanisms such as MVCC.
  2. Although row locks can control concurrent operations, their overhead is relatively high, and MVCC can replace row locks in most cases. Using MVCC reduces system overhead and improves concurrency.

MVCC principle

The principle of MVCC is similar to copy-on-write. What is copy-on-write? Writes go to a copy, so reads and writes are separated.
For example: Lao Wang and Lao Zhang are reading the same good book, and Lao Wang wants to write and draw in it. Lao Zhang is not happy about that: "Why are you scribbling? You're interrupting my reading." So Lao Wang makes a copy and scribbles on the copy, while Lao Zhang keeps reading the original. When Lao Wang is done, the reference that pointed to the original is repointed to the copy, and the modification is complete.
MVCC achieves this by saving a snapshot of the data as of a certain point in time. It allows data to have multiple versions, which are identified by a timestamp or a globally increasing transaction ID. At the same point in time, different transactions may see different data.
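The book analogy maps onto a few lines of code. The `CowBook` class is a hypothetical illustration of copy-on-write in general, not of how InnoDB stores row versions.

```python
class CowBook:
    """Readers hold a reference to an immutable snapshot; writers replace it."""

    def __init__(self, pages):
        self._current = tuple(pages)

    def snapshot(self):
        # A reader keeps whatever version it grabbed, untouched by later writes.
        return self._current

    def write(self, index, text):
        copy = list(self._current)   # 1. copy the current version
        copy[index] = text           # 2. modify only the copy
        self._current = tuple(copy)  # 3. repoint the reference to the copy

book = CowBook(["page one", "page two"])
zhang_view = book.snapshot()         # Lao Zhang starts reading
book.write(0, "page one, with Lao Wang's notes")

print(zhang_view[0])                 # page one
print(book.snapshot()[0])            # page one, with Lao Wang's notes
```

Lao Zhang's snapshot is unaffected by the write, while anyone who starts reading afterwards sees the new version.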

How does MVCC in InnoDB perform concurrency control

In this simplified model, InnoDB adds two fields to each row: the creation version and the deletion version of the row. What is stored in them is the version number (ID) of a transaction, and this number keeps increasing as new transactions are created.

For example, suppose we insert, update, delete, and select a row.

For an insert, the creation version number is 1; this creation version number is the ID of the inserting transaction.

An update is implemented in two steps: first delete, then insert. The current transaction ID is 2, so the delete version number of the existing row is set to 2, and then a new row is inserted whose creation version number is 2.

For a delete, the current transaction ID is 3, so the delete version number of the row is set to 3.

A select returns a row only if both of the following conditions are met.

1. The current transaction ID must be greater than or equal to the creation version number of the current row, which means that the row already existed when the transaction started.

2. The current transaction ID must be less than the delete version number of the current row (or the row must have no delete version yet), which means that the row still existed when the querying transaction started. If the row is deleted after the querying transaction starts, the query still returns it; this is repeatable read.
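The two conditions above can be sketched as a visibility check. This follows the simplified creation/deletion-version model described in this article, not InnoDB's actual internals; `RowVersion`, `is_visible`, and `select` are hypothetical names, and a missing delete version is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_by: int                   # creation version (transaction ID)
    deleted_by: Optional[int] = None  # delete version; None means not deleted

def is_visible(row: RowVersion, txn_id: int) -> bool:
    # Condition 1: the row was created by this transaction or an earlier one.
    created_in_time = row.created_by <= txn_id
    # Condition 2: the row is not deleted, or was deleted by a later transaction.
    not_deleted_yet = row.deleted_by is None or txn_id < row.deleted_by
    return created_in_time and not_deleted_yet

# Transaction 1 inserts the row.
versions = [RowVersion("v1", created_by=1)]
# Transaction 2 updates it: mark the old version deleted, add a new version.
versions[0].deleted_by = 2
versions.append(RowVersion("v2", created_by=2))
# Transaction 3 deletes it.
versions[1].deleted_by = 3

def select(txn_id: int):
    return [row.value for row in versions if is_visible(row, txn_id)]

print(select(1))  # ['v1']
print(select(2))  # ['v2']
print(select(3))  # []
```

Transaction 1 still sees the original value even after transactions 2 and 3 have changed and deleted the row, which is the repeatable-read behaviour described above.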
A plain walkthrough like this is a bit dry, but as long as you work through the calculations yourself, I believe you will definitely gain something!
So, have you learned it?

Origin blog.csdn.net/numbbe/article/details/109300087