Index data structures (1)

1. Why use an index

Suppose the data is stored in a data structure such as a binary tree, as shown in the following figure.

2. Indexes and their pros and cons  

2.1 Index overview

MySQL's official definition of an index is: An index is a data structure that helps MySQL efficiently obtain data.

The nature of an index: an index is a data structure. You can think of it simply as a sorted data structure for fast lookup that satisfies a specific search algorithm. These structures reference (point to) the data in some way, so that advanced search algorithms can be implemented on top of them.

2.2 Advantages 

(1) Much like the catalog index in a university library, an index improves the efficiency of data retrieval and reduces the database's IO cost. This is the main reason for creating indexes.

(2) By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.

(3) In terms of enforcing referential integrity, an index speeds up joins between tables. In other words, queries that join a dependent child table with its parent table run faster.

(4) When a query uses grouping and sorting clauses, an index can significantly reduce the time spent grouping and sorting, and so reduce CPU consumption.

2.3 Disadvantages

Adding indexes also has many disadvantages, mainly in the following aspects:

(1) Creating and maintaining indexes takes time, and the time grows as the amount of data increases.

(2) Indexes occupy disk space. In addition to the space taken by the table data, each index takes up a certain amount of physical space on disk. If there are many indexes, the index files may reach the maximum file size sooner than the data file.

(3) Although indexes greatly improve query speed, they slow down updates to the table. When rows are inserted, deleted, or modified, the indexes must be maintained dynamically as well, which reduces the speed of data maintenance. Therefore, when deciding whether to use an index, weigh its advantages against its disadvantages.

3. Deriving the index in InnoDB

3.1 Lookup before index

Let's first look at an example of an exact match:

SELECT [column_list] FROM table_name WHERE column_name = xxx;

1. Find within a page

2. Find in many pages

Without an index, whether we search by the primary key column or by any other column, we cannot quickly locate the page that holds the record. We can only start at the first page and follow the doubly linked list of pages, searching each page for the specified record with the search method above. Because this has to traverse every data page, it is obviously very time-consuming. What if the table has 100 million records? This is the problem indexes were created to solve.
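To make the cost of a lookup without an index concrete, here is a minimal Python sketch of the full scan described above. The `Page` class, the `link` helper, and `full_scan` are hypothetical names introduced only for illustration; real InnoDB pages are binary structures, not Python objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """A toy data page: a few rows plus links to the neighbouring pages."""
    page_no: int
    records: list  # rows as tuples, e.g. (c1, c2, c3)
    prev: Optional["Page"] = field(default=None, repr=False, compare=False)
    next: Optional["Page"] = field(default=None, repr=False, compare=False)


def link(pages):
    """Chain the pages into a doubly linked list and return the first page."""
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages[0]


def full_scan(first_page, column, value):
    """Visit every page in turn; return the matches and the pages read."""
    matches, pages_read = [], 0
    page = first_page
    while page is not None:
        pages_read += 1
        # Without an index we cannot skip a page, so every row is checked.
        matches.extend(row for row in page.records if row[column] == value)
        page = page.next
    return matches, pages_read
```

However many rows match, `pages_read` always equals the total number of pages, which is exactly the cost an index is meant to avoid.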

 

3.2 Design Index

Create a table:

CREATE TABLE index_demo(
    c1 INT,
    c2 INT,
    c3 CHAR(1),
    PRIMARY KEY(c1)
 ) ROW_FORMAT = Compact;

The new index_demo table has two INT columns and one CHAR(1) column, with c1 specified as the primary key. The table uses the Compact row format to store its records. Here we simplify the row format diagram for the index_demo table:

We only show these parts of the record in the diagram:  

record_type: An attribute in the record header that indicates the type of the record. 0 means an ordinary record, 2 means the minimum record, and 3 means the maximum record. 1 has not come up yet and is discussed below.

next_record: An attribute in the record header that indicates the address offset of the next record relative to this record. We use arrows to show which record is next.

The value of each column: only the three columns of the index_demo table are recorded here, namely c1, c2 and c3.

Additional information: everything other than the three kinds of information above, including the values of the hidden columns and the record's extra information.
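The simplified record can be modelled roughly as follows. This is only a sketch under stated assumptions: `Record`, `RecordType`, and `walk` are hypothetical names, and `next_record` is counted in list slots here rather than as a byte offset inside a page.

```python
from dataclasses import dataclass
from enum import IntEnum


class RecordType(IntEnum):
    NORMAL = 0           # ordinary user record
    DIRECTORY_ENTRY = 1  # the value that is explained later in the article
    MINIMUM = 2          # minimum record of the page
    MAXIMUM = 3          # maximum record of the page


@dataclass
class Record:
    record_type: RecordType
    next_record: int  # relative offset (in slots) to the next record
    c1: int = 0
    c2: int = 0
    c3: str = ""


def walk(slots, start=0):
    """Follow next_record from the minimum record and yield user records."""
    pos = start
    while True:
        rec = slots[pos]
        if rec.record_type == RecordType.NORMAL:
            yield rec
        if rec.record_type == RecordType.MAXIMUM:
            return
        pos += rec.next_record


# Physical slot order differs from key order; next_record restores key order.
slots = [
    Record(RecordType.MINIMUM, 2),
    Record(RecordType.MAXIMUM, 0),
    Record(RecordType.NORMAL, 2, c1=1, c2=4, c3="u"),
    Record(RecordType.NORMAL, -2, c1=4, c2=4, c3="a"),
    Record(RecordType.NORMAL, -1, c1=3, c2=9, c3="d"),
]
```

The c2 and c3 values are made up for the example. Walking the chain visits the records in primary key order even though they sit in the slots in a different order.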

With the other information temporarily removed from the record format diagram and the record stood upright, it looks like this:

A schematic diagram of putting some records on a page is: 

1. A simple index design scheme 

Why do we have to traverse every data page when looking up records by some search condition? Because the records across the pages follow no particular order, we cannot tell which pages hold records matching the condition, so we have to go through all the data pages one by one. So what if we want to quickly locate the data pages that hold the records we need? We can create a directory for quickly locating the data page where a record lives. To create this directory, the following things must be done:

The primary key values of the user records in the next data page must be greater than those of the user records in the previous page.

Create a directory entry for every page.

So the directory for the pages above looks like this:

Take page 28 as an example. It corresponds to directory entry 2, and this directory entry contains the page number 28 and the minimum primary key value, 5, of the user records in that page. We only need to store the directory entries contiguously in physical memory (for example, in an array), and then we can quickly find a record by its primary key value. For example, finding the record whose primary key value is 20 takes two steps:

        1. First, use binary search to quickly determine that the record with primary key value 20 belongs to directory entry 3 (because 12 < 20 < 209), whose corresponding page is page 9.

        2. Then go to page 9 and locate the specific record using the in-page search method mentioned earlier.

At this point, the simple directory for the data pages is complete.
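The two-step lookup can be sketched in Python with the standard `bisect` module. The entries for page 28 (minimum key 5) and page 9 (minimum key 12), and the boundary key 209, come from the text; the page numbers 10 and 20 and the remaining key values are assumptions made up to complete the example, and `find_page` and `lookup` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right

# Directory entries: (minimum primary key in the page, page number),
# stored contiguously and ordered by key.
DIRECTORY = [(1, 10), (5, 28), (12, 9), (209, 20)]

# Primary keys held by each data page (sorted within the page).
DATA_PAGES = {
    10: [1, 3, 4],
    28: [5, 8, 10],
    9: [12, 20, 100],
    20: [209, 220, 300],
}


def find_page(directory, key):
    """Step 1: binary search for the last entry whose minimum key <= key."""
    min_keys = [min_key for min_key, _ in directory]
    idx = bisect_right(min_keys, key) - 1
    if idx < 0:
        return None  # key is smaller than every key in the table
    return directory[idx][1]


def lookup(directory, data_pages, key):
    """Step 2: search inside the page chosen in step 1."""
    page_no = find_page(directory, key)
    if page_no is None:
        return None
    keys = data_pages[page_no]
    i = bisect_left(keys, key)
    return page_no, i < len(keys) and keys[i] == key
```

For key 20, `find_page` picks the third entry because 12 <= 20 < 209, so only page 9 is read instead of all four pages.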

This directory has an alias called the index.

2. Index scheme in InnoDB

① Iteration 1: a page of directory entry records

This is how the directory entries used earlier look once they are put into a data page:

 

As the figure shows, we have allocated a new page, numbered 30, dedicated to storing directory entry records. Here we again emphasize how directory entry records differ from ordinary user records:

The record_type value of a directory entry record is 1, whereas the record_type value of an ordinary user record is 0.

A directory entry record has only two columns, the primary key value and the page number, whereas the columns of an ordinary user record are defined by the user and may be numerous, in addition to the hidden columns that InnoDB adds itself.

For reference: the record header also has an attribute called min_rec_mask. Only the directory entry record with the smallest primary key value in a page that stores directory entry records has a min_rec_mask value of 1; for all other records it is 0.

What they have in common: both kinds of record are stored in the same kind of data page, and both have a Page Directory generated over the primary key values, so binary search can speed up lookups by primary key. Taking the record with primary key 20 as an example, finding a record by primary key value can now be roughly divided into the following two steps:

        1. First go to the page that stores the directory entry records, page 30, and use binary search to quickly locate the corresponding directory entry. Because 12 < 20 < 209, the record is in page 9.

        2. Then go to page 9, which stores user records, and use binary search to quickly locate the user record with primary key value 20.

② Iteration 2: multiple pages of directory entry records

As the figure shows, inserting a user record with primary key value 320 requires two new data pages. Page 31 is created to store the user record. Because the original page 30 for directory entry records is full (we assumed earlier that a page can hold only 4 directory entry records), a new page 32 is needed to store the directory entry for page 31. Now that there is more than one page of directory entry records, finding a user record by primary key value takes roughly 3 steps. Take the record with primary key value 20 as an example:

1. Determine the directory entry record page

We now have two pages that store directory entry records, page 30 and page 32. The directory entries in page 30 cover primary key values in the range [1, 320), and the directory entries in page 32 cover primary key values not less than 320, so the directory entry for the record with primary key value 20 is in page 30.

2. Use the directory entry record page to determine the page where the user record is actually located. Locating a directory entry record by primary key value within a page of directory entry records has already been described.

3. Locate the specific record within the page where the user record is actually stored.

③ Iteration 3: a directory page for the directory entry record pages

As the figure shows, we have generated page 33 to store higher-level directory entries. The two records in this page represent page 30 and page 32 respectively. If the primary key value of a user record is in the range [1, 320), go to page 30 to find the more detailed directory entry records; if the primary key value is not less than 320, go to page 32 to find them.

We can describe it with the following diagram:

This data structure is called a B+ tree.
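The descent from page 33 through a directory entry page down to a user record page can be sketched as follows. Page numbers 33, 30, 32, 31, 28 and 9 and the keys 5, 12, 20, 209 and 320 follow the text; page numbers 10 and 20 and the other key values are assumptions added to complete the example. `DirectoryPage`, `UserPage`, and `search` are hypothetical names, and sending keys below the smallest entry to the leftmost child is a simplification.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class DirectoryPage:
    entries: list  # sorted (min_primary_key, child_page_no) pairs


@dataclass
class UserPage:
    keys: list  # sorted primary key values


PAGES = {
    33: DirectoryPage([(1, 30), (320, 32)]),
    30: DirectoryPage([(1, 10), (5, 28), (12, 9), (209, 20)]),
    32: DirectoryPage([(320, 31)]),
    10: UserPage([1, 3, 4]),
    28: UserPage([5, 8, 10]),
    9: UserPage([12, 20, 100]),
    20: UserPage([209, 220, 300]),
    31: UserPage([320]),
}


def search(pages, root_no, key):
    """Descend from the root; return (page numbers visited, found flag)."""
    path = [root_no]
    page = pages[root_no]
    while isinstance(page, DirectoryPage):
        min_keys = [min_key for min_key, _ in page.entries]
        # Last entry whose minimum key <= key; smaller keys go leftmost.
        idx = max(bisect_right(min_keys, key) - 1, 0)
        child_no = page.entries[idx][1]
        path.append(child_no)
        page = pages[child_no]
    i = bisect_left(page.keys, key)
    return path, i < len(page.keys) and page.keys[i] == key
```

Searching for key 20 visits pages 33, 30 and 9, one page per level, while key 320 goes through pages 33, 32 and 31.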

④ B+Tree

The nodes of a B+ tree can be divided into several levels. The bottom level, where our user records are stored, is defined as level 0, and the levels above are numbered upward from there. Earlier we made a very extreme assumption: a page that stores user records holds at most 3 records, and a page that stores directory entry records holds at most 4. In a real environment a page holds far more records than that. Suppose the data page represented by each leaf node, which stores user records, can hold 100 user records, and the data page represented by each inner node, which stores directory entry records, can hold 1000 directory entry records. Then:

If the B+ tree has only 1 level, that is, a single node storing user records, it can store up to 100 records.

If the B+ tree has 2 levels, it can store up to 1000×100 = 100,000 records.

If the B+ tree has 3 levels, it can store up to 1000×1000×100 = 100,000,000 records.

If the B+ tree has 4 levels, it can store up to 1000×1000×1000×100 = 100,000,000,000 records. That is a lot of records! Can your table hold 100,000,000,000 records?

Therefore, in general, the B+ trees we use have no more than 4 levels, so finding a record by primary key value requires searching at most 4 pages (3 directory entry pages and 1 user record page). And because each page has a so-called Page Directory, records within a page can also be located quickly by binary search.
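The capacity arithmetic can be checked with a few lines of Python. The figures of 100 user records per leaf page and 1000 directory entries per inner page are the illustrative assumptions from the text, not measured InnoDB values, and `max_records` and `levels_needed` are hypothetical helper names.

```python
def max_records(levels, leaf_capacity=100, fanout=1000):
    """Upper bound on the user records in a B+ tree with this many levels."""
    if levels < 1:
        raise ValueError("a B+ tree has at least one level")
    # Each level of directory entry pages multiplies the leaf count by fanout.
    return fanout ** (levels - 1) * leaf_capacity


def levels_needed(rows, leaf_capacity=100, fanout=1000):
    """Smallest number of levels whose capacity covers the given row count."""
    levels = 1
    while max_records(levels, leaf_capacity, fanout) < rows:
        levels += 1
    return levels
```

Under these assumptions, `levels_needed(100_000_000)` gives 3, so a table with 100 million rows can be searched by reading one page per level, three pages in total.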


Origin blog.csdn.net/m0_62436868/article/details/127096678