The differences between, and applications of, several index types in MySQL

As we all know, MySQL currently has the following main index types: FULLTEXT, HASH, BTREE, and RTREE.

So, what are the functional and performance differences between these indexes?

FULLTEXT

This is the full-text index. It was originally supported only by the MyISAM engine; InnoDB has also supported it since MySQL 5.6. It can be created with CREATE TABLE, ALTER TABLE, or CREATE INDEX, but only on CHAR, VARCHAR, and TEXT columns. It is worth mentioning that when the amount of data is large, loading the data into a table that has no FULLTEXT index and then creating the index with CREATE INDEX is much faster than creating the FULLTEXT index first and then writing the data.

The full-text index was not born together with MyISAM. It appeared in order to solve the poor efficiency of fuzzy queries against text, such as WHERE name LIKE "%word%". Without a full-text index, such a query has to scan the whole table, which is extremely time-consuming when the amount of data is large. Without asynchronous IO, the process is blocked for the duration of the scan, which is a waste of time. Asynchronous IO is not explained further here; interested readers can look it up themselves.

The use of full-text indexing is not complicated:

Create: ALTER TABLE table ADD FULLTEXT INDEX `FULLINDEX` (`cname1`[,cname2…]);

Use: SELECT * FROM table WHERE MATCH(cname1[,cname2…]) AGAINST ('word' MODE);

Here MODE is the search modifier: IN BOOLEAN MODE, IN NATURAL LANGUAGE MODE, or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION (which can be shortened to WITH QUERY EXPANSION).

I (Yu'an) will not explain these three search modes in much detail here. Simply put, Boolean mode lets you attach special characters to the search words to express specific requirements: + means the word must be present, - means it must not be present, and a trailing * is a wildcard, somewhat reminiscent of regular expressions. Natural language mode is simple word matching. Natural language mode with query expansion first runs a natural language search, then adds words from the most relevant rows it returned to the search terms and searches again.
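To make the three Boolean operators concrete, here is a minimal Python sketch. It is a toy model and not MySQL's actual parser or ranking: the function name `matches_boolean` is made up for illustration, words are split on spaces only, and unmarked words are treated as optional terms that matter only when no + term is given.

```python
def matches_boolean(query: str, text: str) -> bool:
    """Toy model of the +word, -word and word* operators of Boolean mode."""
    words = text.lower().split()

    def present(term: str) -> bool:
        if term.endswith("*"):                  # trailing * acts as a prefix wildcard
            prefix = term[:-1]
            return any(w.startswith(prefix) for w in words)
        return term in words

    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    if any(present(t) for t in excluded):       # a forbidden word appears
        return False
    if required:                                # every + word must appear
        return all(present(t) for t in required)
    return any(present(t) for t in optional)    # otherwise at least one plain word


print(matches_boolean("+mysql -hash", "mysql btree index"))
print(matches_boolean("ind*", "full text search"))
```

The first call matches because the row contains "mysql" and does not contain "hash"; the second does not, because no word in the row starts with "ind".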

Readers who know a little about search engines will know the concept of word segmentation. The FULLTEXT index is also built on word segmentation. Western languages are mostly written with alphabetic characters, so text can easily be split into words on spaces. Obviously, Chinese cannot be segmented this way. What then? This is where Mysqlcft, a Chinese word segmentation plugin for MySQL, comes in. With it, you can segment Chinese. Readers who want to know more can refer to Mysqlcft. Of course, other word segmentation plugins can also be used.
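The effect of segmentation on a full-text index can be sketched in Python as a small inverted index (token -> row ids). Everything here is illustrative: the helper names are made up, and the overlapping two-character `bigrams` tokenizer is only an assumed stand-in for a real Chinese segmenter; it is not how Mysqlcft works internally.

```python
from collections import defaultdict


def build_inverted_index(rows, tokenize):
    """Map every token to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in tokenize(text):
            index[token].add(row_id)
    return index


def split_on_spaces(text):
    """Segmentation that works for space-separated languages."""
    return text.lower().split()


def bigrams(text):
    """Overlapping 2-character tokens (an assumed, simplistic segmenter)."""
    chars = text.replace(" ", "")
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


rows = {1: "full text index", 2: "hash index", 3: "全文索引"}
by_space = build_inverted_index(rows, split_on_spaces)
by_bigram = build_inverted_index(rows, bigrams)

print(sorted(by_space["index"]))        # rows containing the word "index"
print(by_space.get("索引", set()))       # empty: the Chinese text was never split
print(by_bigram.get("索引", set()))      # found once the text is cut into bigrams
```

With space splitting, the whole Chinese string becomes a single token, so a search for 索引 finds nothing; any segmenter that cuts the string into smaller units makes the row findable.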

HASH

We have been seeing and using the word hash since the day we started coding. A hash table is essentially a set of key-value pairs (key=>value). Like a function mapping in mathematics, it allows multiple keys to correspond to the same value but does not allow one key to correspond to multiple values. It is precisely this property that makes hashing well suited to indexing. To build a hash index on one or more columns, a hash value is computed from the column values by some algorithm, and that hash value corresponds to one or more rows of data (this is conceptually different from a function mapping; do not confuse the two). In Java, every class has a hashCode() method; classes that do not define it explicitly inherit it from Object. The hash code is not guaranteed to be unique per object, but it plays an important role in equality comparison between objects and in hash-based collections. There are many ways to generate hash values, and a good one keeps collisions rare. For example, in MongoDB the system generates a unique ObjectId for each document (made up of a timestamp, a host hash, the process PID, and an auto-incrementing counter), which plays a similar identifying role. Um, I seem to have wandered off topic -_-!

Because a hash index can locate an entry in a single step, without searching level by level as a tree index does, it is extremely efficient. So why do we need tree indexes at all?

I (Yu'an) will not summarize this myself. Instead, here is a quote from an article by another expert in the community: "The difference between MySQL's btree index and hash index", from the road of 14.

(1) Hash indexes can only serve "=", "IN", and "<=>" queries; they cannot be used for range queries.
Because a hash index compares the hash values produced by the hash operation, it can only be used for equality filtering, not range filtering: the ordering of the hash values is not guaranteed to be the same as the ordering of the keys before hashing.
(2) Hash indexes cannot be used to avoid sorting.
Because a hash index stores hash values, and their ordering is not necessarily the same as the ordering of the original key values, the database cannot use the index to avoid any sort operation.
(3) Hash indexes cannot be queried with a partial index key.
For a composite index, the hash value is computed over all the index keys combined, not over each key separately. So a query that supplies only the first one or several keys of the composite index cannot use the hash index.
(4) Hash indexes can never avoid reading the table data.
As we already know, a hash index stores the hash value of each index key together with the corresponding row pointer in a hash table. Because different index keys can have the same hash value, the matching records cannot be obtained from the hash index alone; the actual rows in the table still have to be read and compared to get the result.
(5) When a large number of hash values are equal, a hash index does not necessarily outperform a B-Tree index.
For index keys with low selectivity, a hash index ends up with a large number of record pointers associated with the same hash value. Locating a particular record then becomes expensive and causes many wasted accesses to the table data, so overall performance is low.
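Items (1) to (3) can be illustrated with a short Python sketch. It is only a model: a `dict` stands in for the hash index, a sorted list searched with `bisect` stands in for the ordering a B-Tree provides, and the sample column values and city names are invented.

```python
import bisect

ages = [18, 25, 18, 31, 42, 25]     # column values; the list position is the row id

# Hash index: key -> row ids (Python's dict is itself a hash table).
hash_index = {}
for row_id, age in enumerate(ages):
    hash_index.setdefault(age, []).append(row_id)

# Ordered index: (key, row id) pairs kept in key order, as a B-Tree keeps them.
ordered_index = sorted((age, row_id) for row_id, age in enumerate(ages))


def hash_equals(age):
    """Equality lookup: one probe into the hash index."""
    return hash_index.get(age, [])


def ordered_range(low, high):
    """Rows with low <= age <= high, located by two binary searches."""
    start = bisect.bisect_left(ordered_index, (low, -1))
    end = bisect.bisect_right(ordered_index, (high, len(ages)))
    return [row_id for _, row_id in ordered_index[start:end]]


# Composite hash index on (city, age): the whole tuple is hashed as one key,
# so a lookup that supplies only the city has nothing to probe with.
composite = {("Paris", 18): [0], ("Paris", 25): [1], ("Rome", 18): [2]}

print(hash_equals(18))
print(ordered_range(20, 35))
print(composite.get(("Paris",)))
```

The hash index has no equivalent of `ordered_range`: answering the same range query from `hash_index` would mean visiting every key, and its iteration order says nothing useful for ORDER BY.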

 

Let me (Yu'an) add a little about how a HASH index works, which also explains items (4) and (5) above:

When we create a hash index on one or more columns (currently explicitly supported mainly by the MEMORY engine), a structure similar to the following is generated:

hash value      storage address
1db54bc745a1    77#45b5
4bca452157d4    76#4556,77#45cc…

The hash value is computed from the data in the indexed columns by a specific algorithm, and the storage address is the address of the data row (on disk or elsewhere; the MEMORY engine in fact keeps the hash table in memory).

So when we run WHERE age = 18, the engine computes the hash value of 18 with the same algorithm ==> finds the corresponding storage addresses in the hash table ==> fetches the data at those addresses.

Therefore, every query still has to follow the stored addresses and compare the actual rows, because the hash table alone cannot tell whether an entry really holds the requested key; this is item (4). And when many rows share one hash value, the list of addresses under that entry grows, so each lookup has more rows to check and performance drops; this is item (5).
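The lookup process described above can be sketched in Python as follows. This is an illustration under assumptions, not the MEMORY engine's implementation: `toy_hash` is deliberately weak so that collisions show up, the storage addresses are reused from the table above purely as labels, and the row contents are invented.

```python
rows = {                                  # storage address -> row data
    "77#45b5": {"name": "Ann", "age": 18},
    "76#4556": {"name": "Bob", "age": 25},
    "77#45cc": {"name": "Cid", "age": 33},
}


def toy_hash(value, buckets=4):
    """Deliberately weak hash function so that different ages collide."""
    return value % buckets


# Build the hash table: hash value -> list of storage addresses.
hash_table = {}
for address, row in rows.items():
    hash_table.setdefault(toy_hash(row["age"]), []).append(address)


def find_by_age(age):
    """WHERE age = ?: hash the value, fetch candidates, re-check each row."""
    candidates = hash_table.get(toy_hash(age), [])
    # Ages 25 and 33 share a hash value here, so every candidate row has to be
    # read and compared with the requested key before it can be returned.
    return [rows[addr]["name"] for addr in candidates if rows[addr]["age"] == age]


print(find_by_age(25))
print(find_by_age(29))
```

Both 25 and 29 land in the same bucket as 33, so each call reads two rows; only the comparison against the real `age` value decides which of them, if any, is returned.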

BTREE

A BTREE index stores the index values in a tree-shaped data structure according to a certain algorithm. Readers who have studied data structures will probably remember learning about binary trees. I certainly do: I struggled with them while preparing for the soft exam, although the exam did not seem to test them much. As with a binary tree, each lookup starts at the root of the tree and descends through the nodes in turn until it reaches a leaf. Unlike a binary tree, each BTREE node holds many keys and has many children, which keeps the tree shallow.
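A root-to-leaf lookup can be sketched in Python like this. The layout is an assumption made for illustration (a B+Tree-style tree in which inner nodes hold only separator keys and leaves hold the values); the `Node` class, the hand-built two-level tree, and the `rowN` values are all hypothetical.

```python
import bisect


class Node:
    """Inner nodes have children; leaf nodes have values instead."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys            # sorted keys (separators in inner nodes)
        self.children = children    # None for a leaf
        self.values = values        # one value per key, leaves only


def search(root, key):
    """Descend from the root to a leaf, then look for the key inside the leaf."""
    node = root
    while node.children is not None:
        # Keys below keys[0] go to child 0, keys in [keys[0], keys[1]) to child 1, ...
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


root = Node(
    keys=[20, 40],
    children=[
        Node(keys=[5, 10], values=["row5", "row10"]),
        Node(keys=[20, 30], values=["row20", "row30"]),
        Node(keys=[40, 50], values=["row40", "row50"]),
    ],
)

print(search(root, 30))
print(search(root, 25))
```

Each node is searched with a binary search over its keys, and because a node fans out to many children, even a very large index needs only a few levels.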

The form of BTREE in MyISAM is slightly different from that in InnoDB.

In InnoDB, there are two forms. The first is the primary key (clustered) index, whose leaf nodes store the row data itself: not only the index key but also the other columns. The second is the secondary index, whose leaf nodes look like those of an ordinary BTREE but also store the primary key value of the row.

In MyISAM, the primary key index is not much different from the other indexes. The difference from InnoDB is that in MyISAM the leaf nodes do not store the primary key, but a pointer to the corresponding data row in the data file.
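The difference between the two layouts can be modelled in a few lines of Python. This is a simplified sketch: plain dicts stand in for the trees, a list stands in for the MyISAM data file, and the table contents and function names are invented.

```python
# InnoDB-style: the primary key index holds the complete rows (clustered).
innodb_primary = {
    1: {"id": 1, "name": "Ann", "age": 18},
    2: {"id": 2, "name": "Bob", "age": 25},
}
# A secondary index on name stores the primary key value, not a row location.
innodb_by_name = {"Ann": 1, "Bob": 2}


def innodb_find_by_name(name):
    pk = innodb_by_name[name]      # first lookup: secondary index -> primary key
    return innodb_primary[pk]      # second lookup: primary key index -> row


# MyISAM-style: rows live in a separate data file, and every index,
# the primary key included, stores the position of the row in that file.
myisam_data_file = [
    {"id": 1, "name": "Ann", "age": 18},
    {"id": 2, "name": "Bob", "age": 25},
]
myisam_primary = {1: 0, 2: 1}
myisam_by_name = {"Ann": 0, "Bob": 1}


def myisam_find_by_name(name):
    return myisam_data_file[myisam_by_name[name]]   # index -> row position -> row


print(innodb_find_by_name("Bob"))
print(myisam_find_by_name("Bob"))
```

In this model an InnoDB secondary-index lookup walks two indexes, while in MyISAM the primary key and the secondary index look the same and both point straight into the data file.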

RTREE

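RTREE is rarely used in MySQL and only supports the geometry data types. The storage engines that support these types are MyISAM, BDB, InnoDB, NDB, and ARCHIVE; spatial (RTREE) indexes themselves were for a long time available only in MyISAM, with InnoDB adding them in MySQL 5.7.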

The advantage of RTREE over BTREE is range lookup over multi-dimensional data, such as finding all geometries inside a rectangle.
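The idea behind this kind of range lookup can be sketched in Python. This is a heavily simplified, single-level illustration and not a real R-Tree (there is no node splitting or balancing): nearby points are grouped by hand, each group is summarised by its minimum bounding rectangle, and a query skips every group whose rectangle does not touch the search area. All names and coordinates are invented.

```python
def intersects(a, b):
    """Rectangles are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bounding_box(points):
    """Smallest rectangle that contains all the given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Each leaf groups nearby points under one minimum bounding rectangle.
leaves = [[(1, 1), (2, 3)], [(10, 10), (12, 11)], [(20, 2), (22, 4)]]
tree = [(bounding_box(points), points) for points in leaves]


def range_search(query):
    """Return the points inside the query rectangle and the number of leaves opened."""
    hits, opened = [], 0
    for box, points in tree:
        if not intersects(box, query):
            continue                      # the whole group is skipped unread
        opened += 1
        hits += [p for p in points
                 if query[0] <= p[0] <= query[2] and query[1] <= p[1] <= query[3]]
    return hits, opened


print(range_search((0, 0, 5, 5)))
print(range_search((9, 9, 11, 12)))
```

A BTREE can only order rows along one dimension at a time, whereas the bounding rectangles prune on both coordinates at once; in both example queries only one of the three groups is opened.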

Usage of various indexes

(1) BTREE is the default index type in MySQL and is the general-purpose choice.

(2) Because FULLTEXT does not support Chinese well, it is best not to use it for Chinese text without a word segmentation plugin. In fact, for small applications such as a blog, it is enough to build a keyword list for each entry when the data is collected and then index by keyword. This is also a good method, and one I (Yu'an) often use.

(3) For search-engine-level applications, FULLTEXT is not a good solution either. The files created by MySQL's full-text index are relatively large, and its efficiency is not very high. Even with a Chinese word segmentation plugin, support for Chinese segmentation is only average. If you really do face this kind of problem, Apache Lucene may be the better choice.

(4) Because hash tables have an unmatched advantage when handling small amounts of data, hash indexes are very well suited to caches and in-memory databases. For example, the MySQL-compatible in-memory database MemSQL, the widely used caching tool Memcached, and the NoSQL database Redis all use some form of hash index. Of course, if you do not want to learn these tools, MySQL's MEMORY engine can also meet this need.

(5) As for RTREE, I (Yu'an) have not used it yet and do not know how well it works. Readers with experience using RTREE are welcome to share it!


Origin http://43.154.161.224:23101/article/api/json?id=326446910&siteId=291194637