The storage structure of the database

B-trees and B+ trees are mainly used to build fast indexes over data stored on disk, for example in file systems and databases.

A database can store its data in two main ways: row storage and column storage.

Row storage means that each row of data is stored in contiguous disk blocks. Similarly, column storage means that the values of each column are stored together.

The advantage of row storage is that all the fields of a single record are kept together, which suits transactional workloads. The disadvantage is that aggregate queries are slow: an aggregation spans many rows, so the data has to be read from many non-contiguous blocks on disk, and the repeated I/O makes the query slow.

The advantage of column storage is that aggregate queries and calculations are fast. Because the values of a column are stored together, one or a few I/O operations are enough to load them into memory. The shortcoming is just as clear: it is not well suited to transactional workloads. Updating several fields of a record requires several separate disk reads and writes, because those fields are not stored contiguously. In a stand-alone environment the update can be protected with locks and memory barriers, but in a distributed environment maintaining consistency is difficult and the overhead is very high.
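
The trade-off can be illustrated with a toy model that counts how many blocks each layout has to read. This is only a sketch: the table, the block size of 4 values, and every name in it are assumptions made for illustration, not how any real engine lays out its pages.

```python
BLOCK_SIZE = 4  # values per disk block (an assumed, toy-sized figure)

# Hypothetical table with columns (id, name, age, balance).
rows = [(i, f"user{i}", 20 + i, 100 * i) for i in range(8)]
NUM_ROWS, NUM_COLS = len(rows), len(rows[0])
AGE = 2  # column position of "age"

# Row storage: all fields of one row are adjacent.
row_layout = [value for row in rows for value in row]
# Column storage: all values of one column are adjacent.
col_layout = [row[c] for c in range(NUM_COLS) for row in rows]

def row_pos(r, c):
    """Position of field c of row r in the row layout."""
    return r * NUM_COLS + c

def col_pos(r, c):
    """Position of field c of row r in the column layout."""
    return c * NUM_ROWS + r

def blocks_touched(positions):
    """Number of distinct blocks that cover the given positions."""
    return len({p // BLOCK_SIZE for p in positions})

# Aggregate query: read the age of every row.
age_in_row_store = [row_pos(r, AGE) for r in range(NUM_ROWS)]
age_in_col_store = [col_pos(r, AGE) for r in range(NUM_ROWS)]

# Transactional access: read every field of row 3.
record_in_row_store = [row_pos(3, c) for c in range(NUM_COLS)]
record_in_col_store = [col_pos(3, c) for c in range(NUM_COLS)]

print("aggregate, row store:", blocks_touched(age_in_row_store))
print("aggregate, column store:", blocks_touched(age_in_col_store))
print("one record, row store:", blocks_touched(record_in_row_store))
print("one record, column store:", blocks_touched(record_in_col_store))
```

With these toy numbers the age aggregate touches 8 blocks in the row layout and 2 in the column layout, while fetching one full record touches 1 block and 4 blocks respectively.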

An index is a structure that sorts the values of one or more columns in a database table (such as the name column of a student table). It is independent of the structure of the table and makes it quick to find and locate data. Binary search is very fast, but only on sorted data. If the index were simply the table's own column data kept in sorted order, then every insert into the table would have to move other rows to preserve the order, which is costly.
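
As a minimal sketch of this idea, the snippet below keeps the table in arrival order and maintains a separate sorted list of (name, row id) pairs as the index, searched with Python's standard `bisect` module. The student table and the function names are hypothetical.

```python
import bisect

table = []       # rows in arrival order; a row id is a position in this list
name_index = []  # sorted list of (name, row_id) pairs, separate from the table

def insert(name, grade):
    """Append the row to the table and add an entry to the index."""
    row_id = len(table)
    table.append((name, grade))              # existing rows are never moved
    bisect.insort(name_index, (name, row_id))
    return row_id

def find_by_name(name):
    """Binary-search the index, then fetch the matching rows from the table."""
    i = bisect.bisect_left(name_index, (name, -1))
    matches = []
    while i < len(name_index) and name_index[i][0] == name:
        matches.append(table[name_index[i][1]])
        i += 1
    return matches

insert("wang", 90)
insert("li", 85)
insert("zhao", 70)
insert("li", 60)

print(find_by_name("li"))
```

Note that a sorted list still has to shift entries on every insert, only inside the small index rather than the table; the tree structures discussed next avoid that shifting as well.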

Index data structure

Binary search trees can be searched very efficiently, but they are poorly suited to searching on disk. The bottleneck is the usual one: a binary search tree is too tall, and the taller the tree, the more disk accesses a lookup needs, while disk access (more precisely, seeking) is slow. Improving on the binary search tree by reducing its height therefore became a research direction, and the B-tree and the B+ tree were born.
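
The effect of fanout on height can be estimated with a rough calculation. The function below is a simplified model, assuming a perfectly balanced tree in which every node has exactly `fanout` children and each level costs one disk access; the fanout of 100 is an illustrative figure, not a measured one.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

n = 1_000_000
print("binary tree levels:", levels_needed(n, 2))
print("fanout-100 tree levels:", levels_needed(n, 100))
```

In this model, one million keys need about 20 levels in a binary tree but only 3 levels when each node has 100 children, which is the difference between roughly 20 and 3 disk accesses per lookup.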

A B-tree reduces the height of the tree by storing more data in each node. Each key in a node is an index entry (for example, a combination of <order ID, serial number>). In a row-oriented database, an entry can point to a record stored on disk.
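
A lookup in such a tree can be sketched as follows. This is a hand-built, search-only illustration: the node class, the integer keys, and the "block" strings standing in for record locations are all hypothetical, and insertion and node splitting are left out.

```python
import bisect

class BTreeNode:
    def __init__(self, keys, values, children=None):
        self.keys = keys                # sorted index keys held in this node
        self.values = values            # values[i] locates the record for keys[i]
        self.children = children or []  # empty for a leaf, else len(keys) + 1 nodes

def btree_search(root, key):
    """Return (record location or None, number of nodes read)."""
    node = root
    nodes_read = 0
    while node is not None:
        nodes_read += 1  # each node visited stands for one disk access
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i], nodes_read
        node = node.children[i] if node.children else None
    return None, nodes_read

root = BTreeNode(
    keys=[20, 40],
    values=["block 7", "block 2"],
    children=[
        BTreeNode([5, 10], ["block 1", "block 9"]),
        BTreeNode([25, 30], ["block 4", "block 3"]),
        BTreeNode([50, 60], ["block 8", "block 5"]),
    ],
)

print(btree_search(root, 30))
print(btree_search(root, 40))
```

Because every node carries several keys, a key stored in the root is found after one node read, and any other key in this small tree after two.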

The B+ tree is an improvement on the B-tree. Only its leaf nodes store data; the other nodes are used only to guide searches. This sacrifices some space, namely the non-leaf nodes, but deletion and insertion become much more efficient. A deletion often only needs to remove an entry from a leaf node, which is very fast, whereas a B-tree may require complicated restructuring. An insertion likewise touches at most one root-to-leaf path of the tree. This is a good example of trading space for time.

In addition, the leaf nodes of a B+ tree are connected by a linked list, so it supports range searches well. A B-tree can also serve range searches, although it has to traverse the tree in order rather than follow a chain of leaves.

A hash map (hash table) offers very efficient lookups and does not require the data to be sorted, but it does not support range searches, aggregation searches, sorting, or fuzzy matching.
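
The range search over linked leaves, and the contrast with a hash table, can be sketched as follows. The tree is built by hand and all class and function names are hypothetical; a real B+ tree would also handle insertion, splitting, and deletion.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys
        self.values = values  # the data lives only in leaves
        self.next = None      # link to the next leaf in key order

class Internal:
    def __init__(self, separators, children):
        # children[i] holds keys < separators[i];
        # children[i + 1] holds keys >= separators[i]
        self.separators = separators
        self.children = children

def find_leaf(node, key):
    """Descend one root-to-leaf path to the leaf that may contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.separators, key)]
    return node

def range_scan(root, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi."""
    leaf = find_leaf(root, lo)
    found = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return found
            if k >= lo:
                found.append((k, v))
        leaf = leaf.next  # follow the leaf chain instead of re-descending
    return found

leaves = [Leaf([1, 3], ["a", "c"]), Leaf([5, 7], ["e", "g"]), Leaf([9, 11], ["i", "k"])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([5, 9], leaves)

print(range_scan(root, 4, 9))

# A hash table answers point lookups directly, but a range query
# has to examine every key because the keys are not ordered.
by_hash = {k: v for leaf in leaves for k, v in zip(leaf.keys, leaf.values)}
in_range = sorted((k, v) for k, v in by_hash.items() if 4 <= k <= 9)
```

The scan descends the tree once to find the starting leaf and then only follows `next` links, whereas the dictionary version visits all keys and has to sort the result afterwards.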

 

MySQL supports a variety of storage engines. The InnoDB engine uses B+ trees. For frequently accessed data it may also build a hash-based index on top (the adaptive hash index), but this is something MySQL does internally.

 

 

Origin blog.csdn.net/niu91/article/details/112298213