Summary of MySQL Index Interview Questions

Table of Contents

1. Introduction

2. Index data structure classification

3. Binary search tree

4. Red-black tree (self-balancing binary search tree)

5. B-Tree

6. B+Tree

     6.1 Overview and characteristics of B+Tree

     6.2 B+Tree storage data example

     6.3 Implementation of MyISAM storage engine index

     6.4 Implementation of InnoDB storage engine index

7. Index related interview questions analysis

     7.1 What is an index

     7.2 Classification of Indexes

     7.3 Advantages of Indexes

     7.4 Disadvantages of indexing

     7.5 When is indexing suitable

     7.6 When is indexing not suitable

     7.7 When will index failure occur?

     7.8 Why is it recommended that InnoDB must create a primary key?

     7.9 Why is it recommended to use an integer primary key

     7.10 Why is it recommended to use an auto-incrementing primary key?

     7.11 Why InnoDB non-primary key indexes store primary key values

     7.12 Why use B+Tree instead of B-Tree

8. Index mind map


1. Introduction

       I used to think that database knowledge could simply be memorized before an interview, but that does not work: memorized answers are easy to forget, and they are inflexible, so a slightly rephrased question from the interviewer leaves you stuck. Most of this article covers the underlying implementation of indexes, and the interview questions come at the end. If you can follow the low-level implementation, the interview questions will not be a problem. Readers who only want the interview questions can jump to section 7 in the table of contents, the analysis of index related interview questions. An index mind map that I compiled is attached at the end.

2. Index data structure classification

       According to the references I consulted, the tree structures that can underlie an index fall into five categories: binary search tree (Binary Search Tree), self-balancing binary search tree (Self-balancing Binary Search Tree), B-Tree, trie (Trie-Tree), and spatial data partitioning tree (Spatial Data Partitioning Tree). The details are shown in the table below:

Category | Tree names
Binary search tree (Binary Search Tree) | Binary search tree, Cartesian tree, T tree
Self-balancing binary search tree (Self-balancing Binary Search Tree) | AA tree, AVL tree, Red-Black Tree, Splay Tree
B-Tree | 2-3 tree, 2-3-4 tree, B tree, B+ tree, B* tree
Trie (Trie-Tree) | Suffix tree, radix tree, ternary search tree, fast prefix tree
Spatial data partitioning tree (Spatial Data Partitioning Tree) | R tree, R+ tree, R* tree, segment tree, priority R tree

       In this article, we mainly analyze the knowledge behind MySQL indexes: the binary search tree and the B-Tree as background, the red-black tree that interviewers often ask about, and the B+Tree that MySQL uses today.

3. Binary search tree

       Binary tree: a finite collection of n elements that is either empty or consists of one element called the root plus two disjoint binary trees, called the left subtree and the right subtree. A binary search tree is a binary tree in which every key in a node's left subtree is smaller than the node's key and every key in its right subtree is larger, so a lookup on a well-shaped tree performs like a binary search. As for why MySQL does not use a binary search tree, consider the seven values 1 to 7. If the insertion order is 4, 3, 5, 2, 6, 1, 7, the result is a tree of height 4. Readers who want to try this themselves can use a data structure visualization website.

       But if the insertion order becomes 1, 2, 3, 4, 5, 6, 7, the tree degenerates into a linked list of 7 nodes.

       Problems with binary search trees: the depth of the tree depends on the insertion order, and the more data is inserted, the deeper the tree tends to become. Query time depends mainly on the number of disk I/O operations. The deeper the tree, the more nodes a search has to visit and the worse the performance; in the worst case the tree degenerates into a linked list.
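The effect of insertion order can be reproduced with a few lines of Python. This is a minimal sketch of a plain, unbalanced binary search tree; the names Node, insert, height, and build are made up for this illustration and have nothing to do with MySQL's code.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(root):
    """Number of levels on the longest path from the root down to a leaf."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    """Insert the keys one by one, in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


print(height(build([4, 3, 5, 2, 6, 1, 7])))  # 4
print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 7, the tree is a linked list
```

The same seven keys give a height of 4 or 7 depending only on the order in which they arrive.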

4. Red-black tree (self-balancing binary search tree)

       Red-black tree: a red-black tree is a self-balancing binary search tree. It keeps the tree approximately balanced through specific operations (recoloring and rotation) during insertion and deletion, which gives good search performance. Although it is complicated, its worst-case running time is very good and it is efficient in practice: search, insert, and delete all take O(log n) time, where n is the number of elements in the tree. Features: unlike an AVL tree, which requires that the heights of the left and right subtrees of every node differ by at most 1, a red-black tree only guarantees that the longest path from the root to a leaf is at most twice as long as the shortest one. Animation of inserting the seven elements 1, 2, 3, 4, 5, 6, 7 into a red-black tree:

        Finding element 7 process: 

       The problem with red-black trees: rebalancing keeps one side of the tree from growing too long and degenerating into a linked list, which lowers the height; a lower height means fewer I/O lookups, so performance is better than that of a plain binary search tree. However, a node of a red-black tree can have only two children. The degeneration problem is solved, but with a large amount of data the overall height is still too high.
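To put a number on "still too high", the sketch below computes the fewest levels any binary tree needs for n keys. The function name min_binary_levels is made up for this illustration, and a real red-black tree can be up to about twice as tall as this lower bound.

```python
def min_binary_levels(n):
    """Fewest levels a binary tree needs to hold n keys.

    A binary tree with h levels holds at most 2**h - 1 keys, so the answer
    is the smallest h with 2**h - 1 >= n, which equals n.bit_length().
    """
    return n.bit_length()


print(min_binary_levels(7))          # 3
print(min_binary_levels(1_000_000))  # 20
```

If every level costs one disk read, a lookup in a table of one million rows can need about 20 reads even when the binary tree is perfectly balanced.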

5. B-Tree

       B-Tree: put simply, a B-Tree is a multi-way search tree. Each node stores several keys together with their data, plus pointers to its child nodes. Animation of inserting the seven elements 1, 2, 3, 4, 5, 6, 7 into a B-Tree:

       Finding element 7 process: 

Features of B-Tree

1) All leaf nodes have the same depth

2) No index element is repeated

3) The keys in a node increase from left to right

       From the query process, the B-Tree does not look much more efficient than the balanced binary tree: the query only passes through one node fewer. With a large amount of data, however, fewer nodes on the path means far fewer disk I/O operations, which greatly improves performance. In the B-Tree animation above, the elements are plain values such as 1, 2, and 3, but a database stores rows. If a database stores its data in a B-Tree, how are the rows stored?

       The specific storage is as follows:

       In database storage, each element is split into a key-data pair, where key is the primary key of the row and data is the row itself.
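The key-data layout and the search path can be sketched in Python as follows. The tree here is built by hand for illustration (at most two keys per node), and the names BTreeNode and search as well as the "row N" strings are assumptions of this sketch, not MySQL internals.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, data, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.data = data                # data[i] is the row stored with keys[i]
        self.children = children or []  # empty list for a leaf node


def search(node, key, visited=0):
    """Return (row, nodes_visited); row is None when the key is absent."""
    visited += 1
    i = bisect_left(node.keys, key)     # position of key inside this node
    if i < len(node.keys) and node.keys[i] == key:
        return node.data[i], visited
    if not node.children:
        return None, visited
    return search(node.children[i], key, visited)


# Hand-built B-Tree holding the keys 1..7, with at most two keys per node.
root = BTreeNode(
    [3, 6], ["row 3", "row 6"],
    [
        BTreeNode([1, 2], ["row 1", "row 2"]),
        BTreeNode([4, 5], ["row 4", "row 5"]),
        BTreeNode([7], ["row 7"]),
    ],
)

print(search(root, 7))  # ('row 7', 2)
print(search(root, 6))  # ('row 6', 1)
```

Finding key 7 touches two nodes here, while a balanced binary tree over the same seven keys can need three.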

       Problems with B-Tree: a B-Tree stores both the index and the data in every node, so a search has to load the data into memory along with the index. This is not cost-effective. Memory is precious, so it would be better to use it to hold more index entries.

6. B+Tree

     6.1 Overview and characteristics of B+Tree

       B+Tree: an optimization of the B-Tree that makes it better suited to implementing index structures on external storage.

Features of B+Tree:

1) Non-leaf nodes do not store data, only (redundant) index keys, so each node can hold more index entries

2) Leaf nodes contain all index fields

3) Leaf nodes are linked with pointers, which improves range access performance
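The third feature can be sketched as a chain of leaves. This is a simplified illustration: a real B+Tree would first descend from the root to find the starting leaf, whereas this sketch starts from the first leaf. The names Leaf, link, and range_scan are made up for the example.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # pointer to the next leaf in key order


def link(leaves):
    """Chain the leaves together and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, low, high):
    """Collect the keys in [low, high] by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result      # past the range, stop early
            if key >= low:
                result.append(key)
        leaf = leaf.next           # follow the pointer to the next leaf
    return result


first = link([Leaf([1, 2, 3]), Leaf([4, 5, 6]), Leaf([7, 8, 9])])
print(range_scan(first, 3, 7))  # [3, 4, 5, 6, 7]
```

Once the scan is inside the leaf level it never has to go back up to the non-leaf nodes.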

     6.2 B+Tree storage data example

B+Tree storage data sample diagram:

     6.3 Implementation of MyISAM storage engine index

For select * from tablename where id = 20, the MyISAM query process is: first search the index in the .MYI file and locate the leaf node with id = 20, read the data file address "0x6A" stored with key 20, and then use this address to locate the row with id = 20 in the .MYD file.
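The two-step MyISAM lookup can be modeled with two dictionaries. This is only a rough sketch: myi_index and myd_rows are hypothetical stand-ins for the .MYI and .MYD files, and every address and row except 0x6A for id = 20 is invented for the example.

```python
# Hypothetical stand-in for the .MYI file: index key -> address in the data file.
myi_index = {15: 0x07, 20: 0x6A, 49: 0xF3}

# Hypothetical stand-in for the .MYD file: address -> stored row.
myd_rows = {
    0x07: {"id": 15, "name": "row at 0x07"},
    0x6A: {"id": 20, "name": "row at 0x6A"},
    0xF3: {"id": 49, "name": "row at 0xF3"},
}


def myisam_lookup(key):
    """Step 1: the index gives an address. Step 2: the address gives the row."""
    address = myi_index.get(key)
    if address is None:
        return None
    return myd_rows[address]


print(myisam_lookup(20))  # {'id': 20, 'name': 'row at 0x6A'}
```

The index and the data live in different structures, so the index leaf holds only an address, never the row itself.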

     6.4 Implementation of InnoDB storage engine index

       Note: InnoDB's primary key index is a clustered index (the index and the row data are stored together), and the leaf nodes of a secondary index store the primary key value of the row.

For select * from tablename where id = 20, the InnoDB primary key index query process is: search the primary key index in the .ibd file for id = 20 and read the row data directly from the leaf node.

For select * from tablename where name = 'Alice', the InnoDB non-primary key index query process is: search the name index in the .ibd file for name = 'Alice' and get the primary key id = 18, then search the primary key index in the same file for the row with id 18.
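The difference between the two InnoDB lookups can be modeled in the same way. In this sketch, clustered_index and name_index are hypothetical dictionaries standing in for the two B+Trees, and the row with id = 20 and name 'Bob' is invented for the example.

```python
# Hypothetical clustered index: primary key -> full row (the leaf holds the data).
clustered_index = {
    18: {"id": 18, "name": "Alice"},
    20: {"id": 20, "name": "Bob"},
}

# Hypothetical secondary index on name: name -> primary key value only.
name_index = {"Alice": 18, "Bob": 20}


def find_by_id(pk):
    """Primary key lookup: one search, the row is in the leaf."""
    return clustered_index.get(pk)


def find_by_name(name):
    """Secondary lookup: first get the primary key, then search the clustered index."""
    pk = name_index.get(name)
    if pk is None:
        return None
    return find_by_id(pk)


print(find_by_id(20))         # {'id': 20, 'name': 'Bob'}
print(find_by_name("Alice"))  # {'id': 18, 'name': 'Alice'}
```

A query through the name index costs two tree searches, which is why the primary key lookup is the cheaper of the two.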

7. Index related interview questions analysis

     7.1 What is an index

       In addition to the data itself, the database maintains data structures that support specific search algorithms. These data structures reference (point to) the data in some way, so that efficient search algorithms can be run on them. These data structures are indexes.

     7.2 Classification of Indexes

       Single-valued index: an index contains only a single column, and there can be multiple single-valued indexes in a table

       Unique index: the values of the indexed column must be unique, but NULL is allowed

       Composite index: an index includes multiple columns

     7.3 Advantages of Indexes

       1) Improve data retrieval efficiency and reduce disk IO costs

       2) Because the index keeps the data sorted, it reduces the cost of sorting

     7.4 Disadvantages of indexing

       1) Although an index improves query efficiency, it slows down inserts, updates, and deletes, because MySQL has to save not only the data but also keep the index in step with the data.

       2) Maintaining indexes has a cost. A well-performing index usually takes repeated attempts and tuning to find.

     7.5 When is indexing suitable

       1) The primary key automatically creates a unique index

       2) Fields frequently used as query conditions (fields after where)

       3) Fields used to join other tables in the query (fields in a join ... on condition)

       4) When choosing between a single-value index and a composite index, prefer the composite index under high concurrency

       5) Fields used for sorting in the query

       6) Fields used for statistics or grouping in the query

     7.6 When is indexing not suitable

       1) Too little table data

       2) Frequently updated fields

       3) Fields that are never used in a where condition

     7.7 When will index failure occur?

       1) A like pattern that begins with a wildcard ('%abc') makes the index unusable, because it violates the leftmost prefix rule

       2) Any operation on the index column (calculation, function, type conversion) makes the index unusable and leads to a full table scan

       3) The storage engine cannot use the index columns to the right of a range condition. For example, with a composite index on (id, name), select id, name from student where id > 50 and name = 'Zhang San' cannot use the name part of the index

       4) Prefer covering indexes and avoid select *

       5) MySQL usually cannot use the index for a not-equal condition (!= or <>), which results in a full table scan. The reason is simple: B+Tree leaf nodes are sorted and linked by pointers, so the structure is good at ordered lookups of a given value or range, and a not-equal condition is not that kind of lookup

       6) IS NOT NULL often cannot use the index, for the same reason as above. IS NULL can use the index in MySQL, but the optimizer may still choose a full table scan when many rows match

       7) Comparing a string column with a value that is not wrapped in single quotes triggers an implicit type conversion, and the index is not used

       8) Conditions joined with or can make the index unusable, for example when one of the columns has no index
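Rule 1 can be illustrated without a database. The sketch below uses a sorted Python list as a stand-in for the ordered leaf entries of an index; the names list and the two helper functions are made up for this example, and the "examined" counts are only a rough picture of how much work each pattern needs.

```python
from bisect import bisect_left

# Stand-in for the sorted leaf entries of an index on a string column.
names = sorted(["abacus", "abc", "abcde", "bcabc", "xyzabc", "zebra"])


def prefix_search(sorted_values, prefix):
    """like 'prefix%': jump to the first candidate and stop at the first miss."""
    start = bisect_left(sorted_values, prefix)
    matches, examined = [], 0
    for value in sorted_values[start:]:
        examined += 1
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches, examined


def suffix_search(sorted_values, suffix):
    """like '%suffix': the sort order does not help, so every value is checked."""
    matches = [value for value in sorted_values if value.endswith(suffix)]
    return matches, len(sorted_values)


print(prefix_search(names, "abc"))  # (['abc', 'abcde'], 3)
print(suffix_search(names, "abc"))  # (['abc', 'bcabc', 'xyzabc'], 6)
```

The entries are ordered by their leading characters, so a pattern that fixes the prefix can use that order and a pattern that only fixes the ending cannot.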

     7.8 Why is it recommended that InnoDB must create a primary key?

       For InnoDB, if you do not define a primary key yourself, MySQL still builds a clustered index underneath to organize all the rows of the table, because the table data itself is stored in a B+Tree and that tree needs an index key. InnoDB first looks for a unique index whose columns are all NOT NULL; if there is none, it generates a hidden row ID and clusters the table on it. Why is it still recommended to always define a primary key? Database resources are precious, and we should not make MySQL do work that we can do ourselves. Put simply, it reduces database overhead.

     7.9 Why is it recommended to use an integer primary key

       Take a UUID as an example: a long string with no business meaning. Looking back at the InnoDB index structure above, is it faster to compare two integers or two strings? Comparing two integers is clearly cheaper. Strings have to be compared character by character, and the worst case is two strings that differ only in the last character, because every character must be compared before the difference is found.
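The worst case described above can be counted directly. This is a small sketch: chars_compared is a helper written for this example, and the two UUID strings are arbitrary sample values that differ only in the last character.

```python
def chars_compared(a, b):
    """Count the character comparisons needed to find the first difference."""
    for count, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return count
    return min(len(a), len(b))


# Two sample UUID strings that differ only in the last character.
uuid_a = "123e4567-e89b-12d3-a456-426614174000"
uuid_b = "123e4567-e89b-12d3-a456-426614174001"

print(len(uuid_a))                     # 36
print(chars_compared(uuid_a, uuid_b))  # 36
```

Ordering these two keys takes 36 character comparisons, while two integer keys are ordered with a single comparison.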

     7.10 Why is it recommended to use an auto-incrementing primary key?

       Recall the third feature of B+Tree above: leaf nodes are linked with pointers to improve range access. The benefit is fast range search. For a statement such as select * from tablename where id between 1 and 20, MySQL only needs to locate the entry with key 1 and then follow the linked list until it reaches 20; everything between the first and the last position is the result set. But this also brings a problem. Suppose the primary keys 1, 2, 3, 4, 6, 7 have been inserted and we now insert 5. To keep the index ordered, MySQL has to place 5 in the middle of the existing sequence, which can force a node to split and its entries to be rearranged, and that costs performance. With an auto-incrementing primary key, every new key is appended at the end, so this is avoided.
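The cost of an out-of-order insert can be shown with a toy model of leaf pages. This is a simplified sketch and not InnoDB's real page-split algorithm: a page is a sorted list, the capacity of 6 keys is chosen so that the example from the text fills exactly one page, keys are assumed to be unique, and simulate_inserts is a name made up for the example.

```python
from bisect import bisect_right, insort


def simulate_inserts(keys, capacity=6):
    """Insert unique keys into sorted pages; return (pages, splits, rows_moved)."""
    pages = [[]]
    splits = 0
    rows_moved = 0
    for key in keys:
        # Pick the page whose key range covers this key.
        first_keys = [page[0] for page in pages[1:]]
        index = bisect_right(first_keys, key)
        page = pages[index]
        if len(page) < capacity:
            insort(page, key)                 # room left: plain ordered insert
        elif key > page[-1]:
            pages.insert(index + 1, [key])    # append case: start a new page
        else:
            middle = capacity // 2            # full page, key in the middle: split
            right = page[middle:]
            del page[middle:]
            pages.insert(index + 1, right)
            splits += 1
            rows_moved += len(right)
            insort(page if key < right[0] else right, key)
    return pages, splits, rows_moved


print(simulate_inserts([1, 2, 3, 4, 5, 6, 7]))
# ([[1, 2, 3, 4, 5, 6], [7]], 0, 0)
print(simulate_inserts([1, 2, 3, 4, 6, 7, 5]))
# ([[1, 2, 3], [4, 5, 6, 7]], 1, 3)
```

In this model, increasing keys only ever append, while the late key 5 splits a full page and moves three existing entries.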

     7.11 Why InnoDB non-primary key indexes store primary key values

       The first reason is consistency: when DML operations are performed on the table, the page address of a row can change. Because a non-primary key index stores the primary key value rather than the address, it does not need to be updated when the row moves. The second reason is storage space: InnoDB row data is already clustered in the B+Tree of the primary key index. If every ordinary index stored another copy of the row, there would be as many copies of the data as there are indexes.

     7.12 Why use B+Tree instead of B-Tree

       B+Tree stores all data in leaf nodes, and all non-leaf nodes are used to store redundant indexes. This ensures that non-leaf nodes can store more indexes, because it is non-leaf nodes that determine the height of B+Tree. If non-leaf nodes can store more values, the overall height of the tree will decrease, thereby reducing the number of disk IOs and reducing system consumption.   
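A rough calculation shows how much the height can differ. The numbers below are assumptions for illustration only: a 16 KB page (InnoDB's default page size), an 8-byte integer key, a 6-byte child pointer, and 1 KB per row. The model also ignores page headers and the rows a B-Tree keeps in its non-leaf nodes.

```python
PAGE_SIZE = 16 * 1024   # bytes per page (InnoDB default)
KEY_SIZE = 8            # assumed: 8-byte integer key
POINTER_SIZE = 6        # assumed: size of a pointer to a child page
ROW_SIZE = 1024         # assumed: 1 KB of data per row


def levels_needed(rows, fanout, leaf_capacity):
    """Smallest number of levels whose leaf pages can hold the given rows."""
    levels = 1
    capacity = leaf_capacity
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


# B+Tree: non-leaf pages hold only keys and pointers, leaf pages hold the rows.
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
bplus_leaf_rows = PAGE_SIZE // ROW_SIZE

# B-Tree: every page entry carries key, pointer, and the row itself.
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)

print(bplus_fanout, bplus_leaf_rows, btree_fanout)                  # 1170 16 15
print(levels_needed(20_000_000, bplus_fanout, bplus_leaf_rows))     # 3
print(levels_needed(20_000_000, btree_fanout, btree_fanout))        # 7
```

Under these assumptions, 20 million rows fit in a B+Tree of 3 levels, while a tree that stores rows in every node needs about 7.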

8. Index mind map

 

This article is organized from:

https://www.bilibili.com/video/BV1xh411Z79d?p=3

https://www.jianshu.com/p/ac12d2c83708

https://www.cnblogs.com/guokaifeng/p/11272896.html

https://www.cnblogs.com/boothsun/p/8970952.html#autoid-6-0-0

Origin blog.csdn.net/qq_36756682/article/details/114483362