Hash index and B+ tree index summary

Let's talk about the hash index first

In the ideal case the keys are well scattered and there are no hash collisions, so the hash index maps each key to a unique position that holds only that key. A lookup then costs O(1), which is very fast and is the main advantage of a hash index. However, a hash index is not without shortcomings. The absence of collisions is only the ideal case. Usually several keys share the same hash value: after you compute the hash of a key and go to its position, other keys are stored there too, and you still have to find the real key among them. How that is done depends on the collision-handling method. The most common one is separate chaining (the linked-list method), which places a linked list at each hash position, so a lookup also has to walk that list. If collisions are severe, the query cost rises well above O(1), toward O(n) in the worst case, and the advantage of the hash index is lost.

Secondly, a hash index is not ordered, so it only suits equality queries. For a query over a range of values it is powerless: you have to check the entries one by one, instead of locating the start and end of the range and reading everything in between. For the same reason, a hash index cannot return results sorted by the indexed key, which is another of its shortcomings.

Now let's talk about the B+ tree index

The B+ tree is a strictly balanced search tree. The path from the root node to every leaf node has the same length, and each node can have many child nodes (high fan-out), so the height of the tree can be kept very low. The benefit is that fewer nodes have to be read, which is why the B+ tree is well suited to indexing external files. In an external file index you must read a node to learn the positions of its children, and reading a node corresponds to one IO, so the number of IOs needed to reach a leaf node equals the height of the tree. The lower the tree, the less IO is required. The B+ tree is also a search tree: all data is stored in the leaf nodes, and it is kept in sorted order. In a range query you therefore only need to locate the leaf entries at the lower and upper bounds of the range, and everything between them is the result. There is a further advantage: since the data is already sorted, it does not need to be sorted again.
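To make the effect of fan-out on height concrete, here is a small sketch under a simplified model: each level multiplies the number of reachable entries by the fan-out, and the leaf level is stood in for by a sorted Python list. The function names, the fan-out values, and the sample keys are assumptions for illustration only.

```python
import bisect


def min_height(num_keys, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


def range_scan(sorted_keys, low, high):
    """Locate both bounds by binary search, then take everything in between."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]  # already in key order, no extra sort


# One million keys: a binary tree versus a node with 100 children.
print(min_height(1_000_000, 2))    # 20
print(min_height(1_000_000, 100))  # 3

# Sorted keys standing in for the leaf level of the tree.
leaf_keys = [3, 8, 15, 21, 34, 42, 57, 63]
print(range_scan(leaf_keys, 10, 50))  # [15, 21, 34, 42]
```

Under this model, raising the fan-out from 2 to 100 cuts the number of levels (and so the IOs per lookup) from 20 to 3 for the same one million keys.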

Summary:

1. When there are no hash collisions, a hash index needs only one read and the query complexity is O(1), which is faster than a B+ tree.

2. However, a hash index is unordered, so it suits only equality queries, not range queries, and its results are naturally not sorted. Data retrieved through a hash index has to be sorted again if an order is required.

3. The cost of a B+ tree lookup equals the height of the tree, generally 3-5 IOs. However, the data on the leaf nodes of the B+ tree is sorted, so the index can be used for range searches, and the returned data is already sorted and does not need to be sorted again. Queries with fuzzy (prefix) matching such as "SELECT xxx FROM TABLE1 WHERE xxx LIKE 'aaa%'" are essentially range queries.

4. One more point: a multi-column index in a database generally has to be a B+ tree index. The entries of the B+ tree are ordered by the leftmost column first, and entries with equal values there are ordered by the second column, then the third, and so on in turn. Therefore a B+ tree index follows the leftmost-prefix principle: it can be used only when the query constrains the leftmost column(s) of the index.
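Points 3 and 4 can both be illustrated with sorted lists standing in for the ordered leaf entries of a B+ tree. This is a rough sketch, not a database implementation; the function names and the sample rows are hypothetical, and `prefix_range` assumes a non-empty prefix whose last character is not the largest possible code point.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Answer LIKE 'prefix%' as the range [prefix, next_prefix)."""
    # Bump the last character to get the exclusive upper bound: 'aaa' -> 'aab'.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    start = bisect.bisect_left(sorted_keys, prefix)
    end = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[start:end]


def lookup_by_first_column(entries, value):
    """Leftmost column is constrained: binary search, then a short forward scan."""
    start = bisect.bisect_left(entries, (value,))
    result = []
    for entry in entries[start:]:
        if entry[0] != value:
            break  # entries are contiguous, so the scan can stop early
        result.append(entry)
    return result


def lookup_by_second_column(entries, value):
    """Leftmost column is missing: matches are scattered, so every entry is checked."""
    return [entry for entry in entries if entry[1] == value]


names = sorted(["abc", "aaab", "aa", "baaa", "aaa", "aab", "aaaz"])
print(prefix_range(names, "aaa"))  # ['aaa', 'aaab', 'aaaz']

# Composite key (last_name, first_name, row_id), sorted column by column.
entries = sorted([
    ("Li", "Na", 2), ("Wang", "Fang", 1), ("Li", "Bo", 3),
    ("Zhang", "Na", 4), ("Wang", "Bo", 5),
])
print(lookup_by_first_column(entries, "Li"))   # [('Li', 'Bo', 3), ('Li', 'Na', 2)]
print(lookup_by_second_column(entries, "Na"))  # [('Li', 'Na', 2), ('Zhang', 'Na', 4)]
```

The rows matching the first column sit next to each other in the sorted order, while the rows matching only the second column are spread across it. That is why an index on several columns helps only when a leftmost prefix of those columns is given.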

 
