Database Summary (1): Index Technology in Databases - Hash Index

Index technology in databases: hash index

1. Hash index

A hash index is built on a hash table and is useful only for exact lookups that match every column in the index. For each row, the storage engine computes a hash code from all of the indexed columns. The hash code is a small value, and rows with different key values usually (but not always) produce different hash codes. The index stores the hash codes in a hash table, together with a pointer to each data row.

When several rows produce the same hash code, their row pointers are kept in a linked list under that hash entry, much like a HashMap. Because the index structure is very compact, hash index lookups are very fast.
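
The structure described above can be sketched in a few lines of Python. This is only a toy model, not how any real storage engine is implemented: the class name `HashIndex`, the CRC32-based hash, the bucket count, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
import zlib


class HashIndex:
    """Toy hash index: hash code -> chain of row pointers (list positions)."""

    def __init__(self, rows, columns, buckets=8):
        self.rows = rows                # the "table": a list of dict rows
        self.columns = columns          # every indexed column feeds the hash
        self.buckets = buckets          # hypothetical number of hash slots
        self.slots = defaultdict(list)  # hash code -> chain of row pointers
        for pointer, row in enumerate(rows):
            self.slots[self._hash(self._key(row))].append(pointer)

    def _key(self, row):
        return tuple(row[c] for c in self.columns)

    def _hash(self, key):
        # Small, deterministic hash code; distinct keys can still collide.
        return zlib.crc32(repr(key).encode("utf-8")) % self.buckets

    def lookup(self, *key):
        # Walk the chain and re-check the real column values in each row,
        # because the index stores only hash codes and row pointers.
        chain = self.slots.get(self._hash(key), [])
        return [self.rows[p] for p in chain if self._key(self.rows[p]) == key]


# Made-up sample data.
people = [
    {"fname": "Ann", "lname": "Lee"},
    {"fname": "Bob", "lname": "Stone"},
    {"fname": "Eve", "lname": "Young"},
]
index = HashIndex(people, ["fname"])
print(index.lookup("Eve"))
```

The lookup never compares against values stored in the index; it always follows a pointer back to the row, which is the reason for the first limitation listed below.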

Limitations of hash indexes:

  • A hash index contains only hash values and row pointers, not the column values, so the index alone cannot be used to avoid reading the rows.
  • Hash index entries are not stored in index-value order, so the index cannot be used for sorting.
  • A hash index does not support lookups on only part of the indexed columns, because the hash value is always computed from the full content of every indexed column. For example, an index on (A, B) cannot help a query that filters only on A.
  • Hash indexes support only equality comparisons: =, IN(), and <=> (note that <> and <=> are different operators). They cannot serve range queries such as WHERE price > 100.
  • Access through a hash index is very fast unless there are many hash collisions (different index column values that produce the same hash value). When a collision occurs, the storage engine must follow every row pointer in the linked list and compare the rows one by one until it has found all rows that satisfy the condition.
  • If there are many hash collisions, some index maintenance operations become expensive. For example, if a hash index is built on a column with very low selectivity (and therefore many collisions), then deleting a row from the table forces the storage engine to walk the linked list for that hash value to find and remove the reference to the deleted row. The more collisions there are, the higher the cost.
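
Two of these limitations, the cost of deletion on a low-selectivity column and the lack of range support, can be made concrete with a rough sketch. The helper names `hash_code`, `build_index`, and `delete_entry`, the bucket count, and the generated rows are hypothetical and exist only for this illustration.

```python
import zlib


def hash_code(key, buckets=16):
    # Deterministic stand-in for the engine's hash function (an assumption).
    return zlib.crc32(repr(key).encode("utf-8")) % buckets


def build_index(rows, column):
    # hash code -> chain of row pointers, as in a chained hash index.
    index = {}
    for pointer, row in enumerate(rows):
        index.setdefault(hash_code(row[column]), []).append(pointer)
    return index


def delete_entry(index, key, pointer):
    """Remove one row pointer; return how many chain entries were visited."""
    chain = index[hash_code(key)]
    for visited, candidate in enumerate(chain, start=1):
        if candidate == pointer:
            chain.remove(candidate)
            return visited
    return len(chain)


rows = [{"id": i, "status": "active", "price": i * 10} for i in range(1000)]

# Low selectivity: every row has the same status, so one chain holds all rows.
status_index = build_index(rows, "status")
steps_low = delete_entry(status_index, "active", 999)

# High selectivity: ids are unique, so the rows spread across the chains.
id_index = build_index(rows, "id")
steps_high = delete_entry(id_index, 999, 999)

# A range predicate such as price > 100 cannot use the hash codes at all,
# because they carry no ordering information: every row must be scanned.
expensive = [row["id"] for row in rows if row["price"] > 100]

print(steps_low, steps_high, len(expensive))
```

In this sketch, removing the last row from the low-selectivity index visits the whole 1000-entry chain, while the same deletion in the index on `id` visits only the entries of a single, much shorter chain.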

2. Adaptive hashing

In MySQL, the InnoDB engine has a special feature called the adaptive hash index. When InnoDB notices that some index values are accessed very frequently, it builds a hash index for them in memory on top of the B-Tree index, which gives the B-Tree index some of the advantages of a hash index, such as very fast lookups.

Create a custom hash index:

If the storage engine does not support hash indexes, you can emulate one yourself, much as InnoDB does, and still get some of their benefits. For example, a long key can be indexed with a very small index.
Idea: create a pseudo-hash index on top of a B-Tree index. This is not the same as a real hash index, because the lookup still goes through the B-Tree, but it searches on hash values instead of on the keys themselves. All you need to do is specify the hash function manually in the WHERE clause of the query.
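
A minimal sketch of this idea follows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, since SQLite indexes are also B-Trees. The table name `pseudohash`, the column `url_crc`, the index name, and the choice of CRC32 as the hash function are assumptions for illustration, not something the approach prescribes.

```python
import sqlite3
import zlib


def crc32(text):
    # Hash function applied by hand; CRC32 is only an illustrative choice.
    return zlib.crc32(text.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pseudohash ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL,"
    " url_crc INTEGER NOT NULL)"
)
# The B-Tree index covers only the small integer hash, not the long URL.
conn.execute("CREATE INDEX idx_url_crc ON pseudohash (url_crc)")

urls = ["http://www.example.com/", "http://www.example.org/a/very/long/path"]
conn.executemany(
    "INSERT INTO pseudohash (url, url_crc) VALUES (?, ?)",
    [(u, crc32(u)) for u in urls],
)


def find_by_url(url):
    # Filter on the hash so the small index can be used, then on the URL
    # itself so that rows whose hash merely collides are rejected.
    return conn.execute(
        "SELECT id, url FROM pseudohash WHERE url_crc = ? AND url = ?",
        (crc32(url), url),
    ).fetchall()


print(find_by_url("http://www.example.com/"))
```

The query keeps the comparison on the original column in addition to the comparison on the hash. Without it, two different URLs with the same CRC32 value would both be returned. The application (or a trigger) is also responsible for keeping `url_crc` up to date whenever `url` changes.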

Origin blog.csdn.net/lsx2017/article/details/113961937