In-depth understanding of MySQL indexes and their physical storage

Database basics

  • The top layer handles connections and thread management; the second layer contains most of the core services, including SQL parsing, analysis, optimization, and caching, and stored procedures, triggers, and views are all implemented here; the third layer is the storage engine that is actually responsible for storing and retrieving data, such as InnoDB or MyISAM.
    Essentially this maps onto the classic three-level model of a database, with the storage engine corresponding to the internal (physical) model.

Data storage structure

  • Data can be stored with different storage engines; most storage engines store data in binary form.
  • In the InnoDB storage engine, all data is logically stored in a tablespace. The tablespace is the highest logical storage unit in the storage engine; below the tablespace are segments, extents, and pages.
  • All tablespaces in the same database instance use the same page size; by default the page size is 16KB. The default can be changed with the innodb_page_size option (only when the instance is initialized). Note that a different page size also leads to a different extent size. (A quick way to check these values on a running server is sketched after this list.)
  • A page holds multiple rows, an extent holds multiple pages, a segment holds multiple extents, and a tablespace holds multiple segments.
  • On the relationship between pages and extents: in InnoDB the minimum extent size is 1MB and an extent contains at least 64 pages, which corresponds to a page size of at most 16KB.
  • When data is inserted, the database allocates new data pages to store it, and MySQL generally requests space in units of extents. If you set up a MySQL instance locally and slowly write data into a table, you will see its data file grow in steps. Extents are the unit of space allocation; pages are the unit in which data is stored.
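A sketch of how to check these numbers on a running server. The INNODB_TABLESPACES view and its columns are as in MySQL 8.0 (older versions expose INNODB_SYS_TABLESPACES instead), and 'your_db/your_table' is a placeholder:

```sql
-- Page size configured for the instance (16KB by default; fixed at initialization)
SHOW VARIABLES LIKE 'innodb_page_size';

-- Per-tablespace page size and data file size
SELECT SPACE, NAME, PAGE_SIZE, FILE_SIZE
FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE NAME = 'your_db/your_table';
```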

Random and sequential reads

In our everyday SQL mental model, data seems to be read row by row. In fact this is not the case: as the structure above shows, the unit of a single read from disk is a page, not a row. Even if only one row is needed, a whole page is read from disk. Because a single page read is expensive, read-ahead is generally performed. The theoretical basis is the well-known principle of locality in computer science: when a piece of data is used, the data near it will usually be used soon after.

  • The data a program needs during execution is usually clustered together, and sequential disk reads are very efficient: no seek time is needed, only a small amount of rotational latency.
  • For programs with good locality, read-ahead improves I/O efficiency. The read-ahead length is generally an integral multiple of the page size.

Data reading process

  • One of the most important goals of a relational database management system is to keep the data in tables and indexes available at all times. To get as close to this goal as possible, an in-memory buffer pool is used to minimize disk activity. Each buffer pool is large enough to hold many pages, possibly thousands of them.

  • The buffer pool manager tries to keep frequently used data in the pool and avoid unnecessary disk reads. If an index or table page is found in the buffer pool, it is served quickly (a way to observe this is sketched after this list).

  • If the data is not found in the memory buffer pool, it is read from the disk server's buffer. Much like the database's buffer pool, the disk server tries to keep frequently used data in its own memory to avoid the high cost of physical disk reads. A read served from this buffer costs roughly 1ms.

    • Will there be much duplication between the two? It is worth considering what relationship exists between the DBMS's memory buffer pool and the disk server's buffer.
  • If the data is still not found in the disk server's buffer, it must be read from the disk itself. At this point, reads fall into two categories: random reads and sequential reads.
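A sketch of how to observe the buffer pool behaviour described above, using standard MySQL variable and status counters; the derived hit rate is only a rough health indicator:

```sql
-- How much memory the buffer pool has been given
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Logical page requests served from the pool vs. reads that had to hit the data files
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- A hit rate close to 1 means almost every page request is answered from memory:
--   hit_rate = 1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests
```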

Random I/O

Random I/O: the read and write operations follow one another in time, but the addresses accessed are not contiguous; they are scattered randomly across the disk's address space.

  • A page contains multiple records. We may need all of the rows on the page or only some of them, but the cost is the same: reading one page is one random I/O, which takes about 10ms, a long time! (The premise here is that we need to load several pages into memory and those pages are not contiguous on disk.)

Sequential reads

  • Mechanisms such as read-ahead and caching work far better with sequential I/O. If we read multiple contiguous pages into the buffer pool and process them in order, reading becomes very fast.
  • Sequential reads are fast and benefit from read-ahead; they need no seek time and only a small amount of rotational latency.

Index

An index is a separately stored data structure created to speed up retrieval of the data rows in a table. To take the understanding one step further: the data lives in the tablespace, and the index is just a storage structure used to accelerate lookups. It holds a certain amount of data itself; if the index is hit, the data stored in the index is returned quickly, and if not, the target data is fetched from the tablespace.

  • Generally speaking, the index itself is also large and cannot fit entirely in memory, so it is usually stored on disk in the form of an index file. Disk I/O is therefore incurred while searching the index, and compared with memory access, the cost of an I/O access is several orders of magnitude higher.

    • Therefore, the most important criterion for evaluating a data structure as an index is the asymptotic number of disk I/O operations performed during a lookup. In other words, the index should be organized so that the number of disk I/O accesses during a search is as small as possible.
  • Designers of file systems and database systems exploit disk read-ahead and set the size of a node equal to one page (an OS page is typically 4KB or 8KB; InnoDB uses 16KB), so that each node can be loaded with a single I/O. To achieve this, every time a B+Tree creates a new node it requests a full page of space (in InnoDB, an extent is requested first and pages are then allocated from it).

    • This guarantees that a node is physically stored within a single page, and because storage allocation is page-aligned, reading one node requires only one I/O.
  • The height of the tree determines the number of I/Os: the lower the tree, the better the I/O performance.

In other words, disk operations cost time and resources. So how do we improve efficiency, i.e. avoid overly frequent disk lookups? The structures databases use for storage are now generally trees, and the number of disk accesses is largely determined by the height of the tree, so we want a tree structure with a low height.

The bottleneck of database queries is disk I/O. If we used an AVL tree or a red-black tree, each tree node would store only one record, and a single disk I/O would load only that one node's data into memory. But as mentioned before, reading 1B and reading 1KB from disk take essentially the same time (a defining characteristic of disk I/O). Following this idea, we can store as much data as possible in each tree node and load more data into memory per disk I/O. This is the design principle behind the B-tree and the B+ tree: make the capacity of each node exactly, or close to, the size of a disk block (a data page, usually 16KB).
The nature of a tree structure also determines how the data is traversed, which naturally supports queries by range.
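A back-of-the-envelope version of this argument, expressed as plain SQL arithmetic. The assumptions are hypothetical: 16KB pages, an 8-byte BIGINT key, a 6-byte child page pointer, roughly 1KB per row, and page headers and per-record overhead are ignored:

```sql
SELECT
  FLOOR(16 * 1024 / (8 + 6)) AS keys_per_internal_page,  -- fan-out of ~1170 per level
  FLOOR(16 * 1024 / 1024)    AS rows_per_leaf_page,       -- ~16 rows per leaf page
  FLOOR(16 * 1024 / (8 + 6)) * FLOOR(16 * 1024 / (8 + 6)) * FLOOR(16 * 1024 / 1024)
                             AS rows_covered_by_3_levels; -- ~21 million rows
```

This rough estimate is why a three-level B+ tree is commonly said to cover on the order of twenty million rows while costing only two or three page reads per lookup.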

B-tree

(Figure: a B-tree whose root node holds the keys 17 and 35, with three child pointers P1, P2, and P3.)

  • In the figure, 17 and 35 are the keys in the root node, and there are three pointers: P1, P2, and P3. Keys smaller than 17 are searched down the left pointer, keys larger than 35 down the right pointer, and keys in between down the middle pointer.

  • In general, number of pointers = number of keys + 1 (as shown in the figure, two keys give at most three pointers).

    • The keys stored in a node share a fixed total capacity, so the length of the indexed field affects how many keys each node can hold.
    • If the indexed field is shorter (each node can hold about 4KB of data), each node holds more keys and therefore more pointers, which lowers the height of the tree. A single I/O then has a greater chance of finding the data, which reduces the overall number of I/O lookups and improves efficiency.
  • Characteristics of B-tree

    • Search is no longer binary but m-way, so the height can be far lower than that of a binary tree.
    • Both leaf and non-leaf nodes store data, and each node can hold several records. If the node size is set to the page size, e.g. 4KB, read-ahead can be fully exploited and disk I/O greatly reduced. (A node could be any multiple of a page, but if it is exactly the same size as a page, read-ahead can work node by node, which is simple and efficient.)
    • If the whole B-tree needs to be traversed, an in-order traversal is the way to obtain all keys.
      • Because of the B-tree's properties, an in-order traversal visits all keys in increasing key order.
    • However, sibling nodes hold only data and have no pointers to one another, so in-order traversal is not very efficient.

B+ tree

Features
  • Non-leaf nodes no longer store data; data is stored only in the leaf nodes, which all sit on the same level.
  • Pointers are added between adjacent leaves so that the leaf nodes form a linked list. All records can be obtained by walking this list; no in-order traversal is needed.

These improvements give the B+ tree better properties than the B-tree:

  • Range search: after locating min and max, the leaf nodes in between are the result set, with no in-order backtracking. Range queries are used heavily in SQL, and this is the B+ tree's biggest advantage over the B-tree.

    • In a B-tree there are no direct links between sibling nodes, while in a B+ tree all leaf nodes are connected by pointers, forming an ordered linked list. For queries that operate on data within a range of keys, the B+ tree is therefore more efficient. For example, to get all rows with keys in the range [10, 20], a B+ tree only needs to find the first key greater than or equal to 10 and then scan forward along the leaf-node linked list, whereas a B-tree may need to traverse several branches of the tree. (A concrete range scan is sketched after this list.)
    • This is why B+ trees are more common in database systems and file systems: these systems frequently have to handle range queries, and the design of the B+ tree serves such needs well.
    • It is much like ClickHouse: ClickHouse leans even more heavily on the primary key (as a sparse index), but the way min and max are located is still very similar to MySQL.
  • Skip list: an ordered linked list (a dynamic data structure) that supports binary-search-style lookup. Its defining feature is that it adds index layers and can serve range queries; each added index layer improves query efficiency. Search, insertion, and deletion are all *O(log n)*.

  • The B+ tree is somewhat like the skip list above: the bottom layer holds the data, and the layers above it are index layers built from ranges of the layer below. Unlike the skip list, however, which grows vertically by continually adding index levels, the B+ tree is a "skip list" that expands horizontally: it only increases the number of index entries per level and the width of each level. The benefit is fewer disk I/O operations and more efficient range searches.

    • A skip list keeps building index layers on top of the data layer to improve lookup efficiency step by step, but the number of layers grows, and disk I/O inevitably grows with it.
    • Once the B+ tree has built its index and pointer levels, inserted data widens the leaf level horizontally, keeping the number of levels small.
    • In the B+ tree, inserting data also expands the index structure horizontally. An important property of the B+ tree is that one node can hold many entries, so when data is inserted, if the current leaf node still has enough room, the new entry simply goes into that node. This is the horizontal expansion described above.
    • However, if the current leaf node is full, an operation called a split occurs: the node is split into two new nodes and a new index key is added to their parent node (this is, in effect, vertical growth!). In this way the B+ tree keeps its balance, i.e. all leaf nodes stay on the same level. So although the B+ tree may grow vertically while data is being inserted, this growth is limited; the height of a B+ tree usually stays very low, which keeps its lookups very efficient.
  • The sorted set (zset) data structure in redis is implemented on top of a skip list.

    • The reason a skip list is used instead of a B+ tree: redis is an in-memory database, whereas the B+ tree is designed for I/O-bound databases such as MySQL, where each B+ tree node corresponds to a MySQL data page (an Alibaba interview question).
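A minimal sketch of the range-scan behaviour described above. The orders table is hypothetical, and the exact EXPLAIN output depends on the data volume and server version:

```sql
CREATE TABLE orders (
  id     BIGINT PRIMARY KEY,  -- clustered index: leaves hold the full rows in id order
  amount DECIMAL(10, 2),
  note   VARCHAR(255)
) ENGINE = InnoDB;

-- InnoDB descends the B+ tree once to locate id = 1000, then follows the
-- leaf-node linked list forward until id passes 2000, with no backtracking.
EXPLAIN SELECT * FROM orders WHERE id BETWEEN 1000 AND 2000;
-- Expected: type = range, key = PRIMARY
```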

Self-balancing :

  • Self-balancing means that during insert and delete operations the data structure automatically adjusts its internal layout so that all query operations retain logarithmic time complexity even in the worst case. Self-balancing is an important property of many advanced data structures (such as B+ trees, skip lists, AVL trees, and red-black trees) and guarantees good query efficiency under any circumstances.

When to use a skip list and when to use a B+ tree:
Both skip lists and B+ trees are excellent index data structures; which one to choose depends on your specific needs and environment. Some factors that may influence the choice:

  1. Data size and memory limits: If the data set is relatively small and fits entirely in memory, a skip list may be a good choice, because it is simpler to implement than a B+ tree and its insertion, deletion, and search operations are all very efficient. If the data set is large and must be stored on disk, a B+ tree is probably the better choice, because B+ trees are optimized for disk storage and I/O.
  2. Concurrency control: In a multi-threaded environment, skip lists are usually easier to make concurrent. Because insertions and deletions only modify the pointers of a few nodes, finer-grained locking is easier to implement, which reduces lock contention and improves concurrent performance.
  3. Query type: If the application mainly performs range queries (for example, finding all data whose values fall within a certain range), a B+ tree may be the better choice, because the pointers between its leaf nodes make range queries very convenient. Skip lists can also serve range queries, but usually less efficiently.
  4. Implementation complexity: Skip lists are usually simpler and more straightforward to implement. If you need to implement the data structure yourself, a skip list may be the better choice.
  5. Frequent insertions and deletions: If the application inserts and deletes data frequently, a skip list may perform better, because these operations only modify the pointers of a few nodes.
    Overall, choose the data structure that best fits your specific needs and environment. Skip lists and B+ trees each have their strengths and weaknesses, and the choice has to be made based on your actual situation.

Understanding leaf and non-leaf nodes in the two trees
  • Leaf nodes store the actual record rows, and the rows are stored relatively compactly, which suits keeping large amounts of data on disk; non-leaf nodes store the records' keys (PK) and are used to accelerate queries, which suits keeping them in memory. (For anyone confused: doesn't the index file live only on disk? It does, but the upper-level index pages are small and tend to stay cached in the buffer pool.)
  • Non-leaf nodes do not store actual records; they store only the keys (KEY) and pointers to the next level. So with the same amount of memory, a B+ tree can hold more index entries. (A B-tree node stores not only keys but also data.)
    • Therefore, when we need to traverse all keys of the B+ tree in order, we can start directly from the leaf node holding the smallest key: locate the first leaf node from the root, then walk the linked list of leaf nodes to complete a range query.
    • Because the non-leaf nodes of a B+ tree carry no Data field, they can hold more keys and more pointers to child nodes. (Recall what the B-tree described for MongoDB looks like: every node carries both keys and data.) Since every non-leaf node in a B+ tree holds only keys and pointers, each level divides the key range more finely, so fewer levels are needed to narrow the range down. For the same amount of data, the B+ tree therefore ends up shorter and wider than the B-tree.
    • Because the B+ tree is shorter than the B-tree, it performs fewer I/Os: reading the node at each level on the way down costs one I/O, so a shorter tree means a shorter path and fewer I/Os.

Clustered (primary key) index and non-clustered (secondary) index

  • Clustered index: the physical order in which the data rows are stored matches the logical order of the clustered index values (this is why queries against it are fast). By default, the data is physically stored in primary key order (this is how records are laid out within data pages).

    • For example, suppose rows with primary keys 1, 3, 5, 7, and 9 have already been inserted into the table (and are stored in primary key order). If we now insert a row with primary key 4, the storage order of the data on disk becomes 1, 3, 4, 5, 7, 9.
    • This is also why modifications through a clustered index can be slow: an insert may force existing data to move, and moving data can disturb the physical storage order, requiring data pages to be reorganized, which is very expensive. This is why primary keys are generally auto-incrementing.
      • Clustered index queries, on the other hand, are fast!
    • InnoDB uses a clustered index: data and index are clustered together, and the Data field of the leaf entries stores the row itself, so the index is also the data. Data and index live together in one .ibd file, which is why it is called a clustered index.
  • In theory a clustered index can be created on any column you like. In practice you cannot choose it arbitrarily, or performance becomes a nightmare.

    • InnoDB always has exactly one clustered index, even if the table has no primary key.
    • If a clustered index is built on a column without the UNIQUE attribute, the engine has to make each key internally unique: SQL Server, for example, adds a hidden four-byte uniqueifier to duplicate keys, and InnoDB falls back to a hidden row id (the GEN_CLUST_INDEX) when there is neither a primary key nor a non-null unique index. These internal columns and values cannot be viewed or accessed by users.
  • If we want to query the credits and names of students whose credits are between 60 and 90, is building a clustered index on the credits column optimal?

    • Answer: no. Since only two columns are output, we can create a composite non-clustered index on credits and student name. That index is then a covering index: what it stores is exactly the data the query outputs. This performs better than making credits the clustered index, as sketched below.
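A minimal sketch of that answer with a hypothetical student table; "Using index" in the Extra column of EXPLAIN is how MySQL indicates that the query was covered:

```sql
CREATE TABLE student (
  id      BIGINT PRIMARY KEY,            -- clustered index
  name    VARCHAR(64),
  credits INT,
  KEY idx_credits_name (credits, name)   -- composite secondary (covering) index
) ENGINE = InnoDB;

-- Both output columns live in idx_credits_name, so the secondary index alone answers
-- the query; EXPLAIN should show Extra: Using index, i.e. no table return at all.
EXPLAIN SELECT credits, name FROM student WHERE credits BETWEEN 60 AND 90;
```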

MySQL 5.6 introduced index condition pushdown (ICP). During an index scan, if the WHERE clause contains conditions on columns that are part of the index, those conditions are evaluated against the index entries first, and records that fail them are filtered out directly, reducing the number of table returns: filter out the non-matching records first, then go back to the table. Of course, if few conditions can be pushed down, the performance gain is correspondingly small. A sketch follows.
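A minimal sketch of index condition pushdown with a hypothetical people table. When the optimizer chooses ICP, EXPLAIN shows "Using index condition" in the Extra column (it can be disabled via the optimizer_switch flag index_condition_pushdown):

```sql
CREATE TABLE people (
  id         BIGINT PRIMARY KEY,
  last_name  VARCHAR(64),
  first_name VARCHAR(64),
  address    VARCHAR(255),
  KEY idx_name (last_name, first_name)
) ENGINE = InnoDB;

-- The range on last_name means first_name cannot narrow the B+ tree search further,
-- but with ICP the engine still checks first_name against the index entry before
-- fetching the row, so non-matching entries never trigger a table return.
EXPLAIN SELECT * FROM people
WHERE last_name LIKE 'Zhang%' AND first_name = 'San';
-- Extra: Using index condition   (when ICP applies)
```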

Before understanding table returns, we need some basics about database indexes. In a relational database, an index is a data structure used to find data quickly. A table can have multiple indexes, each built on one or more columns of the table. An index does not store the complete row; it stores only the values of the indexed columns plus a pointer to where the complete row is stored (a page pointer, not a physical location on disk).

When we run a query with a condition, the database first searches the corresponding index for the entries that match, and then goes back to the table to fetch the complete rows using the pointers in those entries. This trip from the index back to the table is called a table return. It may involve disk I/O, because the table data may not all be in memory and may have to be read from disk, which is a relatively slow operation.

Generally speaking, a single query may involve many table returns. For example, to find the information of everyone aged between 20 and 30, the database finds all matching entries in the age index, and for each one it must go back to the table to fetch the complete row.

If the index contains all the columns a query needs, it is called a covering index. A query served by a covering index needs no table return, because all the required data can be read directly from the index. However, covering indexes cannot be used in every case: they require the index to contain every queried column, which can make the index very large, occupy a lot of storage space, and slow down write operations. When designing indexes, trade-offs therefore have to be made based on the actual queries and performance requirements.

  • Non-clustered (secondary) index: a new B+ tree built on the values of another column. For example, if the primary key is C1 and we often filter on column C2, we can create a non-clustered index on C2, i.e. a B+ tree keyed on the C2 values. A search on C2 walks this B+ tree; once a match is found, if all the selected columns are present in the leaf entry, the values are returned directly (this is index covering), otherwise a table return is needed. (See the sketch after this list.)
    • Since covering indexes reduce the number of tree searches and significantly improve query performance, using covering indexes is a common performance optimization.
    • Table return: the leaf nodes of the primary key index store the entire row, while the leaf nodes of a non-primary-key index store the primary key value together with the indexed column values.
      • In other words, can a non-clustered index always recover the primary key value? Yes; the answer is in the point above.
    • The leaf level of a non-clustered index does not coincide with the actual data pages; its leaves contain pointers to the rows in the table. A table can have many non-clustered indexes, and creating them does not reorder the data rows.
    • In MyISAM, the index is the index and the data is the data: the index lives in the XX.MYI file and the data in the XX.MYD file, which is why it is called a non-clustered index.
  • The leaf nodes of a clustered index are the final data nodes, while the leaf nodes of a non-clustered index are still index entries, holding a pointer to the final data.
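A minimal sketch of the covered vs. table-return distinction above, keeping the C1/C2 naming from the text in a hypothetical t_user table:

```sql
CREATE TABLE t_user (
  c1 BIGINT PRIMARY KEY,   -- clustered index: leaves store the full row
  c2 INT,
  c3 VARCHAR(64),
  KEY idx_c2 (c2)          -- secondary index: leaves store (c2, c1)
) ENGINE = InnoDB;

-- Covered: c2 and c1 are both present in idx_c2, so no table return is needed.
SELECT c1 FROM t_user WHERE c2 = 42;

-- Table return: c3 is not in idx_c2, so each matching entry's c1 value is used to
-- look the full row up in the clustered index.
SELECT c3 FROM t_user WHERE c2 = 42;
```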


Source: blog.csdn.net/qq_65052774/article/details/135051320