Index type - Hash index

1. Introduction

Earlier we briefly introduced the database's B-Tree index. Next we introduce another index type: the hash index.

2. Introduction to hash index

A hash index is built on a hash table, and it is useful only for queries that exactly match every column in the index. For each row, the storage engine computes a hash code over all of the indexed columns. The hash code is a small value, and rows with different key values will usually produce different hash codes. The index stores the hash codes in a hash table, together with a pointer to each data row.

In MySQL, only the Memory engine explicitly supports hash indexes, and hash is also the Memory engine's default index type. It is worth noting that the Memory engine supports non-unique hash indexes: if several rows have the same hash value, the index stores their row pointers in the same hash entry as a linked list.
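The chained layout described above can be sketched in a few lines of Python. This is only an illustrative model, not how the Memory engine is actually implemented: the class name `ToyHashIndex`, the deliberately weak hash function `toy_hash`, and the use of list positions as "row pointers" are all assumptions made for the example.

```python
from collections import defaultdict


def toy_hash(key: str, slots: int = 8) -> int:
    """Deliberately weak hash (sum of character codes) so collisions are easy to trigger."""
    return sum(ord(ch) for ch in key) % slots


class ToyHashIndex:
    """Non-unique hash index: each hash code maps to a chain of row pointers."""

    def __init__(self) -> None:
        self.buckets = defaultdict(list)  # hash code -> list of row positions

    def add(self, key: str, row_pointer: int) -> None:
        self.buckets[toy_hash(key)].append(row_pointer)

    def candidates(self, key: str) -> list:
        # Only pointers are stored; the caller must still check the rows themselves.
        return list(self.buckets.get(toy_hash(key), []))


rows = ["ab", "ba", "cd"]  # the indexed column of a tiny, hypothetical table
index = ToyHashIndex()
for position, value in enumerate(rows):
    index.add(value, position)

# "ab" and "ba" have the same character sum, so they share one hash entry.
print(index.candidates("ab"))
```

Because the index keeps only hash codes and pointers, a lookup for "ab" gets back both colliding rows and has to inspect each one to decide which actually matches.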

3. Case description

  1. Create table statement
CREATE TABLE testhash (
    fname VARCHAR(50) NOT NULL,
    Iname VARCHAR(50) NOT NULL,
    KEY USING HASH (fname)
) ENGINE = MEMORY;
  2. Insert statements
INSERT INTO `test`.`testhash` (`fname`, `Iname`) VALUES ('Arjen', 'Lentz');
INSERT INTO `test`.`testhash` (`fname`, `Iname`) VALUES ('Baron', 'Schwartz');
INSERT INTO `test`.`testhash` (`fname`, `Iname`) VALUES ('Peter', 'Zaitsev');
INSERT INTO `test`.`testhash` (`fname`, `Iname`) VALUES ('Vadim', 'Tkachenko');
  3. Data content
SELECT * FROM testhash;


Assume that the index uses an imaginary hash function f() that returns the following values (these are sample values, not real data):
f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458

The data structure of the hash index is as follows:

Slot    Value
2323    Pointer to row 1
2458    Pointer to row 4
7437    Pointer to row 2
8784    Pointer to row 3

Note that the slot numbers are in sorted order, but the data rows they point to are not.

  4. Search method
SELECT Iname FROM testhash WHERE fname = 'Peter';

MySQL first calculates the hash value of 'Peter' and uses it to find the corresponding record pointer. Because f('Peter') = 8784, MySQL looks up 8784 in the index and finds the pointer to row 3. The last step is to compare the value in row 3 with 'Peter' to make sure it is the row being searched for.
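The three lookup steps can be traced with a small Python sketch. It hard-codes the imaginary f() values from the example as a dictionary and uses 1-based row numbers as "pointers"; the names `F`, `hash_index`, and `find_last_names` are invented for illustration and do not correspond to anything inside MySQL.

```python
# Table rows in insertion order; the first tuple is "row 1" in the text.
rows = [
    ("Arjen", "Lentz"),
    ("Baron", "Schwartz"),
    ("Peter", "Zaitsev"),
    ("Vadim", "Tkachenko"),
]

# The article's imaginary hash function f(), hard-coded as a lookup table.
F = {"Arjen": 2323, "Baron": 7437, "Peter": 8784, "Vadim": 2458}

# The index: hash code -> list of 1-based row numbers.
hash_index = {}
for row_number, (fname, _) in enumerate(rows, start=1):
    hash_index.setdefault(F[fname], []).append(row_number)


def find_last_names(fname):
    """Mimic: SELECT Iname FROM testhash WHERE fname = <fname>."""
    code = F.get(fname)                           # step 1: hash the search value
    matches = []
    for row_number in hash_index.get(code, []):   # step 2: follow the pointers
        row = rows[row_number - 1]
        if row[0] == fname:                       # step 3: verify the actual row
            matches.append(row[1])
    return matches


print(find_last_names("Peter"))
```

The final comparison in step 3 looks redundant here, but it is what keeps the result correct when two different names happen to share a hash code.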

4. Advantages and Disadvantages of Hash Index

Advantages:

  • The index itself only needs to store the hash values, so its structure is very compact, which also makes hash index lookups very fast.
  • Accessing hash-indexed data is very fast unless there are many hash collisions (different index column values that have the same hash value). When a collision occurs, the storage engine must traverse all the row pointers in the linked list and compare the rows one by one until it finds the match.

Disadvantages:

  • A hash index contains only hash values and row pointers, not the field values, so the values in the index cannot be used to avoid reading the rows. However, accessing rows in memory is fast, so in most cases this has no noticeable effect on performance.
  • Hash index data is not stored in index-value order, so it cannot be used for sorting.
  • Hash indexes do not support partial matches on the index columns, because the hash value is always computed from the entire contents of the indexed columns. For example, if a hash index is built on columns (A,B) and the query refers only to column A, the index cannot be used.
  • Hash indexes support only equality comparisons: =, IN(), and <=> (note that <=> and <> are different operators). They do not support range queries such as WHERE price > 100.
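The partial-match and range limitations can be illustrated with a Python dict, which is itself a hash table. This is a rough analogy under that assumption, not a model of the Memory engine; the variable names are made up for the example.

```python
# A Python dict stands in for a hash index built on the two columns (A, B).
rows = [
    ("Arjen", "Lentz"),
    ("Baron", "Schwartz"),
    ("Peter", "Zaitsev"),
    ("Vadim", "Tkachenko"),
]
index_ab = {}
for position, (a, b) in enumerate(rows):
    index_ab.setdefault((a, b), []).append(position)

# Equality on every indexed column: a single hash lookup.
full_match = index_ab.get(("Peter", "Zaitsev"), [])

# Only column A is known: there is no way to compute the hash of (A, B),
# and looking up (A,) on its own finds nothing.
partial_match = index_ab.get(("Peter",), [])

# A range condition cannot be hashed at all; the only option is to scan every row.
range_match = [pos for pos, (a, _) in enumerate(rows) if a > "B"]

print(full_match, partial_match, range_match)
```

Only the first query benefits from the hash table. The other two either miss entirely or fall back to examining every row, which is what an ordered structure such as a B-Tree avoids.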

5. Hash index in InnoDB

1. Introduction

The InnoDB engine has a special feature called the "adaptive hash index". When InnoDB notices that certain index values are being used very frequently, it builds a hash index in memory on top of the B-Tree index, which gives the B-Tree index some of the advantages of a hash index, such as fast hash lookups. This behavior is completely automatic and internal; the user cannot control or configure it, although the feature can be turned off if necessary.

2. Case description

Where the storage engine does not support hash indexes, you can imitate one yourself on top of a B-Tree index. Suppose a table stores a large number of URLs and you need to look rows up by URL. Indexing the URLs directly with a B-Tree makes the index very large, because the URLs themselves are long. Normally the query looks like this:

SELECT id FROM url WHERE url = 'http://www.mysql.com';

If you drop the index on the original url column, add an indexed url_crc column, and use CRC32 as the hash function, you can query like this instead:

SELECT id FROM url WHERE url = 'http://www.mysql.com'
    AND url_crc = CRC32('http://www.mysql.com');

This performs very well, because the MySQL optimizer uses the small, highly selective index on the url_crc column to do the lookup. Even if several rows have the same index value, the query is still fast: MySQL does a quick integer comparison on the hash value to find the index entries, then compares the candidate rows one by one and returns the one whose URL matches. This is much faster than indexing the full URL column with a B-Tree.
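The reason the query keeps both conditions can be seen in a short Python sketch. Here `zlib.crc32` is used as a stand-in for SQL's CRC32(), and the table contents, `url_crc_index`, and `find_ids` are hypothetical names chosen for the example.

```python
import zlib


def crc32_of(url: str) -> int:
    """Unsigned 32-bit CRC of the URL bytes, standing in for SQL CRC32()."""
    return zlib.crc32(url.encode("utf-8"))


# Hypothetical table contents: id -> url.
url_table = {
    1: "http://www.mysql.com",
    2: "http://www.example.com/a/very/long/path",
    3: "http://www.example.org",
}

# The small integer index on url_crc: CRC value -> ids of the rows that have it.
url_crc_index = {}
for row_id, url in url_table.items():
    url_crc_index.setdefault(crc32_of(url), []).append(row_id)


def find_ids(url: str) -> list:
    """Mimic: WHERE url_crc = CRC32(<url>) AND url = <url>."""
    candidates = url_crc_index.get(crc32_of(url), [])
    # Re-check the full URL, because two different URLs can share a CRC32 value.
    return [row_id for row_id in candidates if url_table[row_id] == url]


print(find_ids("http://www.mysql.com"))
```

The integer lookup narrows the search to a handful of candidates, and the string comparison on `url` then discards any row that merely collides on the CRC value. Dropping the `url = ...` condition would return those colliding rows as well.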


Origin blog.csdn.net/TheWindOfSon/article/details/135359887