Understanding Database Indexes and the Data Structure Behind Them

An index is like the table of contents of a book or a thesis: through it you can quickly locate a particular chapter. An index likewise speeds up searches, at the cost of slower insertions and deletions, because the index must be updated whenever the data changes. Now that we know what an index is for, what data structure lies underneath it?

1 Brief introduction

  Without an index, a search must traverse the entire table. For a sequential list this is tolerable: the list lives in memory, memory access is fast, and the amount of data is usually small. A database, however, stores its data on disk, where access is much slower, and the data volume is often large, so a sequential search there is very inefficient. An index exists to avoid sequential scans of the database and make searches faster.
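To make the contrast concrete, here is a small Python sketch under simplifying assumptions: a list of hypothetical `(id, name)` rows stands in for the table, and a sorted list of `(id, position)` pairs plays the role of the index. It counts how many rows a full scan examines versus how many index entries a binary search probes. It is an in-memory illustration only, not how a real database lays out pages on disk.

```python
# Hypothetical table: 1000 (id, name) rows in insertion order. 7 and 1000 are
# coprime, so i * 7 % 1000 produces every id from 0 to 999 exactly once, unsorted.
rows = [(i * 7 % 1000, f"user{i}") for i in range(1000)]


def full_scan(table, target_id):
    """Sequential search: examine rows one by one until the id matches."""
    examined = 0
    for row in table:
        examined += 1
        if row[0] == target_id:
            return row, examined
    return None, examined


def build_index(table):
    """Toy index: (id, row position) pairs sorted by id."""
    return sorted((row[0], pos) for pos, row in enumerate(table))


def index_lookup(table, index, target_id):
    """Binary-search the sorted index, then fetch only the matching row."""
    lo, hi, probes = 0, len(index) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        key, pos = index[mid]
        if key == target_id:
            return table[pos], probes
        if key < target_id:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


index = build_index(rows)
row, scanned = full_scan(rows, 999)
hit, probes = index_lookup(rows, index, 999)
print(row, scanned, probes)  # (999, 'user857') 858 10
```

Both searches return the same row, but the scan touches 858 rows while the index needs about log2(1000), roughly 10, probes.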


2 Data structures considered for the index

  When it comes to the data structure of an index, the first candidates that come to mind are a binary search tree and a hash table. A reasonably balanced binary search tree finds an element in O(logN) time; a hash table does so in O(1) time on average.

A database index could in fact use a hash table, but there is a problem: a hash table maps each key to a storage location, so it can only answer equality queries. For a range condition such as id > 3 and id < 9, a hash table does not help.
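The limitation can be sketched in Python, with a built-in `dict` standing in for a hash index (an assumption for illustration; a real engine's hash index is organized differently). An equality lookup is a single hash probe, but a range condition gives the hash function nothing to work with, so every entry has to be checked.

```python
# Hypothetical table of 20 rows; the dict maps id -> row like a hash index.
rows = [(i, f"user{i}") for i in range(20)]
hash_index = {row[0]: row for row in rows}


def equal_lookup(index, key):
    """WHERE id = key: one hash computation, O(1) on average."""
    return index.get(key)


def range_lookup(index, lo, hi):
    """WHERE id > lo AND id < hi: a hash index offers no ordered path to
    the keys in range, so every entry is examined, O(N)."""
    checked = 0
    found = []
    for key, row in index.items():
        checked += 1
        if lo < key < hi:
            found.append(row)
    return sorted(found), checked


print(equal_lookup(hash_index, 5))  # (5, 'user5')
found, checked = range_lookup(hash_index, 3, 9)
print([r[0] for r in found], checked)  # [4, 5, 6, 7, 8] 20
```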

  The elements of a binary search tree are ordered: an in-order traversal visits them in ascending order. To find id > 3 and id < 9, for example, we locate the element with id 3 and the element with id 9; the elements between them in the in-order traversal are the result, and obtaining them through the in-order traversal costs O(N). So although a binary search tree, unlike a hash table, can handle range queries, it does not handle them efficiently:

  • Each node of a binary search tree has at most two children. When the data volume is large, the tree becomes very tall, and since every level visited may cost a disk read, lookups become slow;

  • Obtaining the in-order traversal of a binary search tree directly is not efficient either: it costs O(N).
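Both weaknesses show up in a small, unbalanced binary search tree written in Python. This is an illustrative sketch with hypothetical names: `range_query` follows the approach described above (a full in-order traversal filtered to the range, O(N)), and `height` shows how inserting keys in sorted order degenerates the tree into a chain. Insertion and traversal are iterative so a long chain does not hit Python's recursion limit.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Unbalanced BST insert; returns the (possibly new) root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def inorder(root):
    """Iterative in-order traversal: yields keys in ascending order."""
    stack, cur = [], root
    while stack or cur:
        while cur:
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        yield cur.key
        cur = cur.right


def range_query(root, lo, hi):
    """lo < key < hi via a full in-order traversal: visits all N nodes."""
    return [k for k in inorder(root) if lo < k < hi]


def height(root):
    """Number of levels, counted breadth-first."""
    level, h = ([root] if root else []), 0
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


balanced = build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])
chain = build(range(1, 16))  # sorted input: every key goes to the right

print(range_query(balanced, 3, 9))  # [4, 5, 6, 7, 8]
print(height(balanced), height(chain))  # 4 15
```

With the same 15 keys, the balanced tree has 4 levels and the chain has 15; when each level may be a disk read, that difference is what makes a tall tree slow. A traversal that prunes subtrees outside the range could do better, but the version here mirrors the approach in the text.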

Therefore, the index structure actually used is an N-ary search tree, namely the B+ tree. Compared with a B tree, it differs in two main ways:

  • The leaf nodes are linked together in key order, so a range query can walk from one leaf to the next;
  • Data is stored only in the leaf nodes; the non-leaf nodes hold only auxiliary boundary information for the search (that is, the non-leaf nodes store only ids, which guide the search quickly to the leaf holding the desired id).
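These two properties can be illustrated with a minimal, static Python sketch. It assumes the tree is bulk-loaded once from id-sorted `(id, row)` pairs with a small hypothetical fanout `order`; it does not implement insertion, node splitting, or rebalancing, and all class and function names are illustrative. Internal nodes hold only separator ids, rows live only in the leaves, and each leaf points to the next, so a range query descends once and then walks the leaf chain.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted ids
        self.values = values  # row data is stored only in leaves
        self.next = None      # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator ids only, no row data
        self.children = children  # len(children) == len(keys) + 1


def first_key(node):
    """Smallest id in a subtree: follow the leftmost path to a leaf."""
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def bulk_load(pairs, order=4):
    """Build a tree bottom-up from non-empty, id-sorted (id, row) pairs."""
    leaves = [Leaf([k for k, _ in pairs[i:i + order]],
                   [v for _, v in pairs[i:i + order]])
              for i in range(0, len(pairs), order)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    level = leaves
    while len(level) > 1:
        # Each parent gets up to `order` children; separator j is the
        # smallest id under child j + 1.
        level = [Internal([first_key(c) for c in level[i + 1:i + order]],
                          level[i:i + order])
                 for i in range(0, len(level), order)]
    return level[0]


def find_leaf(root, key):
    """Descend from the root, choosing a child by the separator ids."""
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def get(root, key):
    """Equality lookup: one root-to-leaf descent."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_query(root, lo, hi):
    """lo < id < hi: descend once, then follow the leaf links."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k >= hi:
                return out
            if k > lo:
                out.append(v)
        leaf = leaf.next
    return out


root = bulk_load([(i, f"row{i}") for i in range(1, 21)], order=4)
print(get(root, 7))             # row7
print(range_query(root, 3, 9))  # ['row4', 'row5', 'row6', 'row7', 'row8']
```

The range query never revisits the internal nodes after the first descent, which is the benefit of linking the leaves.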

In an N-ary search tree, every lookup descends from the root to a leaf through the same number of levels, so query time is stable from one record to the next, and because the leaves are linked, no separate in-order traversal is needed for range queries. Moreover, the leaf nodes can stay on disk while the non-leaf nodes are kept in memory (in a B tree, by contrast, leaf and non-leaf nodes alike store data and reside on disk). This makes searches faster and reduces the number of disk reads. Keeping the non-leaf nodes in memory is affordable because they hold only ids, so the index's actual memory footprint is modest.
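A rough height calculation shows why a wide tree needs so few disk reads. Assuming a perfectly balanced tree, N = 10^6 keys, and a hypothetical fanout of m = 1000 children per node (real fanout depends on page size and key size), the number of levels a lookup descends is roughly:

```latex
\begin{aligned}
h_{2}    &\approx \lceil \log_{2} 10^{6} \rceil = \lceil 19.93\ldots \rceil = 20 \\
h_{1000} &\approx \lceil \log_{1000} 10^{6} \rceil = \lceil 2 \rceil = 2
\end{aligned}
```

Under these assumptions, a balanced binary tree visits about 20 nodes per lookup, each potentially a disk read, while the wide tree visits about 2; if its upper level is cached in memory, the lookup costs on the order of one disk read for the leaf.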


Origin blog.csdn.net/Onion_521257/article/details/130166326