MySQL - Detailed explanation of indexes

Table of contents

1. Why is there an index?

2. What is an index?

3. Principle of Index

4. MySQL storage engine

5. Index data structure

6. Clustered and non-clustered indexes

7. Index design principles


1. Why is there an index?

In typical application systems the read-to-write ratio is about 10:1, and inserts and ordinary updates rarely cause performance problems. In a production environment, the operations we encounter most often, and the ones most prone to problems, are complex queries, so optimizing query statements is clearly a top priority. And when it comes to speeding up queries, we have to talk about indexes.

2. What is an index?

An index, also called a "key" in MySQL, is a data structure the storage engine uses to find records quickly. Indexes are critical to good performance, and their impact grows as the amount of data in the table grows. Index optimization is usually the most effective way to optimize query performance, and a good index can easily improve a query by orders of magnitude. An index is like the phonetic index of a dictionary: to look up a word without it, you would have to search page by page through hundreds of pages.

3. Principle of Index

The purpose of an index is to improve query efficiency, just like the table of contents of a book: first locate the chapter, then a section under that chapter, and then the page number. Similar examples include looking up a word in a dictionary, or looking up a train number or a flight.

The essence is to filter out the final desired result by constantly narrowing the range of data to examine, and at the same time to turn random events into sequential ones. In other words, with an index we can always use the same search method to locate data.

The same is true for a database, but it is obviously much more complicated, because it faces not only equality queries but also range queries (>, <, between, in), fuzzy queries (like), union queries (or), and so on. How should the database handle all of these? Think back to the dictionary example: can we divide the data into segments and then query segment by segment? In the simplest scheme, with 1,000 records, records 1 to 100 form the first segment, 101 to 200 the second, 201 to 300 the third, and so on. To find the 250th record you only need to search the third segment, which eliminates 90% of the irrelevant data in one step.

But if there are 10 million records, how many segments should there be? According to the search tree model, the average complexity is O(log N), which gives good query performance. But this overlooks a key issue: the complexity model assumes that every operation costs the same. A database implementation is more complicated. On the one hand, the data is stored on disk. On the other hand, to improve performance, a portion of the data can be read into memory for processing each time, because accessing the disk costs roughly 100,000 times as much as accessing memory. A simple search tree therefore struggles to meet such complex application scenarios.
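
The segment idea and the effect of node size on tree depth can be made concrete with a small Python sketch. The function names and the fan-out of 100 keys per node are illustrative assumptions, not MySQL internals; the point is only that a wider node means fewer levels, and therefore fewer disk reads per lookup.

```python
def segment_of(position, segment_size=100):
    """Return the 1-based segment that holds the record at a 1-based position."""
    return (position - 1) // segment_size + 1


def levels_needed(n_records, fanout):
    """Count the levels a search tree needs so that fanout**levels >= n_records."""
    levels, capacity = 0, 1
    while capacity < n_records:
        capacity *= fanout
        levels += 1
    return levels


print(segment_of(250))                 # 3: the 250th record is in the third segment
print(levels_needed(10_000_000, 2))    # 24 levels for a binary search tree
print(levels_needed(10_000_000, 100))  # 4 levels with an assumed 100 keys per node
```

If every level costs one disk read, the binary tree needs about 24 reads for 10 million records while the wide tree needs about 4, which is why the number of disk accesses matters more than the number of comparisons.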

4. MySQL storage engine

# List the storage engines supported by this server
show engines;

The two most common storage engines are MyISAM and InnoDB. The characteristics of each storage engine are shown in the following table (a dash means the feature is not supported):

Feature             InnoDB   MyISAM   Memory   Archive    BDB
Storage limit       64TB     No       Yes      No         No
Transaction safety  Yes      -        -        -          Yes
Locking mechanism   Row      Table    Table    Row        Page
B-tree index        Yes      Yes      Yes      -          Yes
Hash index          Yes      -        Yes      -          -
Full-text index     -        Yes      -        -          -
Clustered index     Yes      -        -        -          -
Data cache          Yes      -        Yes      -          -
Index cache         Yes      Yes      Yes      -          -
Compressible data   -        Yes      -        Yes        -
Space usage         High     Low      N/A      Very low   Low
Memory usage        High     Low      Medium   Low        Low
Bulk insert speed   Low      High     High     Very high  High
Foreign keys        Yes      -        -        -          -


5. Index data structure

MySQL mainly uses two index structures, the B+Tree index and the Hash index. The InnoDB storage engine uses B+Tree indexes by default, and the Memory storage engine uses Hash indexes by default.

In MySQL, only the Memory storage engine supports Hash indexes, and Hash is the default index type for Memory tables; Memory tables can also use B+Tree indexes. (Memory tables exist only in memory, lose their data when the power goes off, and are suitable for temporary tables.) A Hash index organizes data by hash value, so looking up a single record is very fast. But because each key maps to a single hashed location and the entries are scattered by the hash function, it does not support operations such as range searches and sorting. B+Tree is the most frequently used index data structure in MySQL and is the index type of the InnoDB and MyISAM storage engines. A B+Tree is not as fast as a Hash index at finding a single record, but it is used more widely because it suits operations such as sorting. After all, a database rarely operates on only a single record.

The B+ tree is a balanced multi-way tree. All leaf nodes are at the same depth, and nodes at the same level are linked by pointers. For an ordinary lookup on a B+ tree, the cost of searching from the root node to any leaf node is essentially the same and does not fluctuate significantly. Moreover, during an index-based sequential scan, the bidirectional pointers can be used to move quickly left and right, which is very efficient. For these reasons, B+ tree indexes are widely used in databases, file systems, and similar scenarios.

A Hash index uses a hash algorithm to convert the key value into a hash value. A lookup does not need to search level by level from the root node to a leaf node as in a B+ tree; a single hash computation locates the corresponding position immediately, which is very fast.
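
To illustrate why linked leaf nodes make sequential scans cheap, here is a minimal Python sketch of the leaf level of a B+ tree. It is a simplification under stated assumptions: `Leaf`, `build_leaves`, and `range_scan` are hypothetical names, keys are unique integers, the "root" is just a list of each leaf's first key, and only forward pointers are modeled.

```python
from bisect import bisect_right


class Leaf:
    """A leaf page: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, page_size):
    """Split sorted keys into linked leaf pages and return the list of pages."""
    leaves = [Leaf(sorted_keys[i:i + page_size])
              for i in range(0, len(sorted_keys), page_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Return all keys in [low, high]: one descent, then follow next pointers."""
    # Stand-in for the internal nodes: the first key of every leaf.
    separators = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(separators, low) - 1, 0)
    result, leaf = [], leaves[start]
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


leaves = build_leaves(list(range(1, 21)), page_size=4)
print(range_scan(leaves, 6, 11))  # [6, 7, 8, 9, 10, 11]
```

The scan descends once to find the starting leaf and then only follows `next` pointers, without going back to the root for each key.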

A comparison of the two indexes is as follows:

For an equality query, the hash index has a clear advantage, because the matching key can be found with a single hash computation, provided the key values are unique. If the key values are not unique, the index must first find the position of the key and then scan along the linked list until it finds the matching rows.

For a range query, the hash index is useless, because key values that were originally ordered may no longer be contiguous after hashing, so the index cannot be used to complete the range lookup. Likewise, a hash index cannot be used for sorting or for partial fuzzy queries such as like, and it does not support the leftmost matching rule of multi-column composite indexes.

The lookup efficiency of a B+ tree index is fairly uniform across keys and does not fluctuate as much as that of a B-tree. When there are many duplicate key values, the efficiency of a hash index is also extremely low because of hash collisions.
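
The difference between the two query types can be sketched in Python, using a dictionary as a stand-in for a hash index and a sorted list as a stand-in for the ordered leaf level of a B+ tree. The sample rows and function names are made up for illustration, and this is not how MySQL implements either structure.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

rows = [(1, "ann"), (2, "bob"), (3, "ann"), (4, "cid"), (5, "dee")]  # (id, name)

# Hash-style index on name: key -> list of row ids (duplicates share a bucket).
hash_index = defaultdict(list)
for row_id, name in rows:
    hash_index[name].append(row_id)

# Ordered index on name: (name, row id) pairs kept in sorted order.
ordered_index = sorted((name, row_id) for row_id, name in rows)


def equal_lookup(name):
    """Equality query: one hash computation, then walk the bucket's entries."""
    return hash_index.get(name, [])


def range_lookup(low, high):
    """Range query low <= name <= high via binary search on the ordered index."""
    names = [name for name, _ in ordered_index]
    start, end = bisect_left(names, low), bisect_right(names, high)
    return [row_id for _, row_id in ordered_index[start:end]]


print(equal_lookup("ann"))      # [1, 3]
print(range_lookup("b", "cz"))  # [2, 4]
```

The dictionary has no useful key order, so answering the range query with it would mean checking every key, whereas the ordered structure finds both ends of the range by binary search.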

Comparison summary:

  • Hash index: single-record lookups are fast, range queries are slow
  • B-tree index (a B+ tree): the amount of data it can hold grows exponentially with the number of levels (this is the InnoDB default)
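
The leftmost matching rule mentioned in the comparison follows directly from the sort order of a multi-column index. The sketch below models a composite index on two hypothetical columns, (last_name, first_name), as a sorted list of tuples; the data and function names are illustrative assumptions.

```python
from bisect import bisect_left

# A composite index on (last_name, first_name), kept in sorted order.
# Each entry is (last_name, first_name, row id).
index = sorted([
    ("li", "mei", 3), ("li", "an", 7), ("wang", "bo", 1),
    ("zhang", "an", 5), ("wang", "an", 9),
])


def by_last_name(last_name):
    """Uses the ordering: entries sharing the leading column are contiguous."""
    start = bisect_left(index, (last_name,))
    matches = []
    for entry in index[start:]:
        if entry[0] != last_name:
            break
        matches.append(entry[2])
    return matches


def by_first_name(first_name):
    """Cannot use the ordering: matches are scattered, so every entry is checked."""
    return [row_id for _, first, row_id in index if first == first_name]


print(by_last_name("wang"))  # [9, 1]
print(by_first_name("an"))   # [7, 9, 5]
```

A condition on the leading column narrows the search to one contiguous run, while a condition on the second column alone forces a full scan of the index. A hash of the combined columns offers neither option.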

6. Clustered and non-clustered indexes

The index types available in MySQL depend on the storage engine. InnoDB stores a table's data and indexes together in the .ibd file, while MyISAM stores the data in the .MYD file and the indexes in the .MYI file. The distinction between clustered and non-clustered indexes is actually very simple: it comes down to whether the data and the index are stored together.

When the InnoDB storage engine inserts data, the data must be stored together with an index. If the table has a primary key, InnoDB uses the primary key; if there is no primary key, it uses a unique key; if there is no unique key either, it uses a hidden 6-byte rowid. The index that is bound to the data in this way is the clustered index. To avoid storing the data redundantly, the leaf nodes of the other indexes store the key values of the clustered index. InnoDB therefore has both clustered and non-clustered indexes, while MyISAM has only non-clustered indexes.
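
The relationship between the clustered index and the other indexes in InnoDB can be sketched with two Python dictionaries. This is only a model of what each index stores: real InnoDB indexes are B+ trees, and the table, column names, and `find_by_email` helper are hypothetical.

```python
# Clustered index: primary key -> the full row (the data lives with the index).
clustered = {
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}

# Secondary index on email: email -> primary key only, not the row itself.
secondary_email = {row["email"]: pk for pk, row in clustered.items()}


def find_by_email(email):
    """Two steps: the secondary index yields the key, the clustered index the row."""
    pk = secondary_email.get(email)
    return None if pk is None else clustered[pk]


print(find_by_email("b@example.com")["name"])  # Bob
```

Because the secondary index holds only the primary key, the row data exists in one place, at the cost of a second lookup in the clustered index.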

7. Index design principles

When designing indexes, keep the space occupied by the indexed fields as small as possible. That is only the general direction; there are also some details to pay attention to:

  1. Columns suitable for indexing are those that appear in the where clause or are specified in a join clause.
  2. Columns with low cardinality (few distinct values) benefit little from an index, so there is no need to index them.
  3. When choosing index columns, shorter is better. You can index just a prefix of a column; it is not necessary to use the full value of every field.
  4. Don't create an index for every field in the table. More indexes are not always better.
  5. Columns that define foreign keys must be indexed.
  6. Avoid indexing frequently updated fields.
  7. Do not put too many columns in one index. You can create a composite index, but a composite index with many columns is not recommended.
  8. Do not create indexes on large text or large object columns.
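
Principles 2 and 3 can be checked with a simple measure: the number of distinct values divided by the number of rows, often called selectivity. The Python sketch below computes it for a whole column and for a prefix of a column. The sample data and the `selectivity` helper are illustrative assumptions, not a MySQL API.

```python
def selectivity(values, prefix_len=None):
    """Distinct (optionally prefix-truncated) values divided by the row count."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


emails = ["alice@example.com", "alina@example.com",
          "bob@example.com", "bobby@example.org"]
gender = ["f", "m"] * 50  # 100 rows, only 2 distinct values

print(selectivity(gender))     # 0.02: low cardinality, an index filters little
print(selectivity(emails, 3))  # 0.5: a 3-character prefix is too short
print(selectivity(emails, 4))  # 1.0: a 4-character prefix already separates all rows
```

A value close to 1 means the index narrows a lookup to very few rows. For a prefix index, the shortest prefix whose selectivity is close to that of the full column is usually a reasonable choice.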

Origin blog.csdn.net/DreamEhome/article/details/128836827