The MySQL Story—Creating High-Performance Indexes

Create high-performance indexes


1. Index Basics

The easiest way to understand how indexes work in MySQL is to think of the index at the back of a book: to find a specific topic, you usually look it up in the index first to get the corresponding page number.

In MySQL, the storage engine uses indexes in a similar way. It first finds the matching value in the index, and then finds the corresponding data row from the matching index record. Suppose you want to run the following query:
mysql> SELECT first_name FROM sakila.actor WHERE actor_id = 5;
If there is an index on the actor_id column, MySQL uses it to find the row whose actor_id is 5. That is, MySQL first looks up the value in the index and then returns all rows that contain that value.

An index can contain values from one column or from multiple columns. If the index contains multiple columns, the order of the columns matters, because MySQL can only use a leftmost prefix of the index efficiently.

The order in which equality conditions appear in a query's WHERE clause does not affect which index columns can be used. Range (inequality) conditions do matter: index columns to the right of the first range condition cannot be used for the lookup.

Types of Indexes
In MySQL, indexes are implemented at the storage engine layer rather than at the server layer.
B-Tree index
A B-Tree index is useful for the following kinds of lookups:
Full-value match: match on all columns in the index, for example all columns of key (last_name, first_name, dob).
Match the leftmost prefix: use only the first column of the index, for example only last_name.
Match a column prefix: match the beginning of a column's value, for example all people whose last name starts with J. This uses only the first column of the index.
Match a range of values: find people whose last names fall between Allen and Barry. This also uses only the first column of the index.
Match one column exactly and another by range: last_name (the first column) is matched exactly and first_name (the second column) is matched by range.
Index-only queries: the query can be answered from the index alone, without reading the data rows.

B-Tree index restrictions
If a lookup does not start from the leftmost column of the index, the index cannot be used. With the example index above, you cannot find everyone named Bill, nor everyone born on a specific date.
Columns in the index cannot be skipped. You cannot use the full index to find people whose last name is Smith and who were born on a specific date, because the first_name column is skipped; only the first column of the index is used.
If the query has a range condition on some column, none of the columns to its right can be used to optimize the lookup. For example, take the query where last_name='Smith' and first_name like 'J%' and dob='1970-02-10'. This query can only use the first two columns of the index, because LIKE is a range condition.
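The leftmost-prefix rule can be illustrated with a small model. The following is a minimal sketch, not how MySQL is implemented: it assumes a composite index can be approximated by a sorted list of tuples, and the sample rows and function names are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample rows: (last_name, first_name, dob)
people = [
    ("Smith", "John", "1985-07-04"),
    ("Allen", "Cuba", "1960-01-01"),
    ("Smith", "Bill", "1970-02-10"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Smith", "Jane", "1970-02-10"),
]

# Model of key (last_name, first_name, dob): entries kept in sorted order.
index = sorted(people)


def lookup_prefix(index, prefix):
    """Find entries whose leading columns equal `prefix` using binary search.

    A shorter tuple sorts before any longer tuple that starts with it,
    so bisect_left lands on the first matching entry.
    """
    n = len(prefix)
    pos = bisect_left(index, prefix)
    matches = []
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches


def scan_first_name(index, first_name):
    """first_name alone is not a leftmost prefix: matching entries are
    scattered through the sorted order, so every entry is examined."""
    return [entry for entry in index if entry[1] == first_name]


print(lookup_prefix(index, ("Smith",)))
print(lookup_prefix(index, ("Smith", "Jane")))
print(scan_first_name(index, "Bill"))
```

The first two calls can jump straight to the matching entries because the entries are sorted by last_name first. The third call has no such ordering to rely on and has to visit every entry, which is the situation the restrictions above describe.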

Hash Indexes
Hash indexes are implemented using hash tables and are useful only for lookups that exactly match all columns in the index.

Hash index limitations
A hash index contains only hash values and row pointers, not the indexed values themselves, so the index cannot be used to avoid reading the rows.
Hash indexes are not stored in index-value order, so they cannot be used for sorting.
Hash indexes do not support lookups on a partial set of the indexed columns.
Hash indexes only support equality comparisons.
Accessing a hash index is very fast unless there are many hash collisions. Entries whose values collide are stored in a linked list, which must be traversed to find the matching rows.
When there are many hash collisions, some index maintenance operations can become expensive.
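As a rough illustration of these limitations, here is a minimal sketch that models a hash index with a Python dictionary. The rows, column names, and helper functions are hypothetical, and Python's built-in hash() stands in for whatever hash function a storage engine would use.

```python
from collections import defaultdict

# Hypothetical sample rows.
rows = [
    {"fname": "Arjen", "lname": "Lentz"},
    {"fname": "Baron", "lname": "Schwartz"},
    {"fname": "Peter", "lname": "Zaitsev"},
    {"fname": "Vadim", "lname": "Tkachenko"},
]


def build_hash_index(rows, column):
    """Map hash(value) -> list of row positions (the 'row pointers').

    Only the hash is stored, not the value itself.
    """
    buckets = defaultdict(list)
    for pos, row in enumerate(rows):
        buckets[hash(row[column])].append(pos)
    return buckets


def hash_lookup(rows, buckets, column, value):
    """Equality lookup: hash the value, follow the pointers, then compare.

    Different values may share a bucket (a collision), so each candidate
    row still has to be read and checked.
    """
    candidates = buckets.get(hash(value), [])
    return [rows[pos] for pos in candidates if rows[pos][column] == value]


fname_index = build_hash_index(rows, "fname")
print(hash_lookup(rows, fname_index, "fname", "Peter"))
print(hash_lookup(rows, fname_index, "fname", "Pet"))  # a prefix finds nothing
```

Because the buckets are keyed by hash rather than by value, nothing in this structure helps with a prefix match, a range condition, or sorting; those would need a scan of the rows themselves.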

The InnoDB engine has a special feature called the "adaptive hash index". When InnoDB notices that some index values are accessed very frequently, it builds a hash index in memory on top of the B-Tree index, which gives the B-Tree index some of the advantages of a hash index, such as fast hash lookups.

2. Advantages of Indexes

1. Indexes greatly reduce the amount of data that the server needs to scan.
2. Indexes help the server avoid sorting and temporary tables.
3. Indexes turn random I/O into sequential I/O.

3. High-performance indexing strategy

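Isolating the Column
select actor_id from actor where actor_id+1=5
MySQL cannot rewrite actor_id+1=5 into actor_id=4, so it cannot use the index on actor_id for this query. The indexed column should always stand alone on one side of the comparison operator, not inside an expression or function call.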

Prefix indexes and index selectivity
Sometimes it is necessary to index very long character columns, which can make the index large and slow.
Often it is enough to index only the first few characters of the column, which saves a lot of index space and makes the index more efficient. However, this also reduces the selectivity of the index.
The selectivity of an index is the ratio of the number of distinct index values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1. The higher the selectivity, the more efficient the query, because a highly selective index lets MySQL filter out more rows during the lookup. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance.
To choose a prefix length, first compute the selectivity of the full column:
select count(DISTINCT city)/count(*) from city_demo
Then choose a prefix length whose selectivity comes close to that value. In this example the full-column selectivity is about 0.031, and a prefix of 7 characters is used:
ALTER table city_demo ADD KEY (city(7))
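The same calculation can be tried outside the database. This is a minimal sketch with a made-up list of city names; the function names and the `tolerance` parameter are assumptions for illustration, not part of MySQL.

```python
def selectivity(values):
    """Distinct values divided by total values (between 1/#T and 1)."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, length):
    """Selectivity when only the first `length` characters are indexed."""
    return selectivity([value[:length] for value in values])


def shortest_good_prefix(values, tolerance=0.0):
    """Smallest prefix length whose selectivity is within `tolerance`
    of the full-column selectivity."""
    target = selectivity(values)
    longest = max(len(value) for value in values)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target - tolerance:
            return length
    return longest


# Hypothetical sample data with some duplicates.
cities = ["London", "Londrina", "Los Angeles", "Lisbon",
          "Lima", "Lima", "London", "Lyon"]

print("full column:", selectivity(cities))
for length in (1, 3, 5):
    print("prefix", length, ":", prefix_selectivity(cities, length))
print("shortest prefix matching full selectivity:", shortest_good_prefix(cities))
```

Averages can hide skew: a prefix that looks good overall may still map one very common prefix to a large number of rows, so it is worth checking the most frequent prefixes as well as the ratio.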

Multi-column indexes
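Creating a separate single-column index for each column in the WHERE clause leads MySQL to fall back on index merging, which is usually a sign of poor indexing and is rarely as efficient as a well-designed multi-column index.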

Choosing the Appropriate Index Column Order
When sorting and grouping do not need to be considered, it is usually a good idea to put the most selective column first in the index. In that case the index only serves to optimize lookups for WHERE conditions. Take this query:
select * from payment where staff_id=584 and customer_id=30
To decide the column order, count how many rows match each condition:
select sum(customer_id=30), sum(staff_id=584) from payment
The column whose condition matches fewer rows is more selective for this query and should come first in the index.

When a particular condition value matches far more rows than is typical for its column, the index is of little use for that value. In the extreme case, the value matches nearly every row in the table.
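The counting step can be sketched in a few lines. The data below is generated, not taken from the sakila database, and `matching_rows` and `suggest_column_order` are hypothetical helper names.

```python
def matching_rows(rows, column, value):
    """Equivalent of SUM(column = value): how many rows satisfy the condition."""
    return sum(1 for row in rows if row[column] == value)


def suggest_column_order(rows, conditions):
    """Order the columns so the condition matching the fewest rows comes first."""
    return sorted(conditions, key=lambda col: matching_rows(rows, col, conditions[col]))


# Hypothetical payment table: two staff members, 25 customers, 100 rows.
payment = [
    {"staff_id": 583 + i % 2, "customer_id": 10 + i % 25}
    for i in range(100)
]

conditions = {"staff_id": 584, "customer_id": 30}
for column, value in conditions.items():
    print(column, "=", value, "matches", matching_rows(payment, column, value), "rows")
print("suggested order:", suggest_column_order(payment, conditions))
```

This only reflects the specific values in one query. If other queries use different values, the overall selectivity of each column is a more reliable guide than the counts for a single pair of constants.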

Clustered index
A clustered index is not a separate index type but a way of storing data. InnoDB's clustered index stores the B-Tree index and the data rows together in the same structure.

Covering Index
If an index contains the values of all the columns a query needs, it is called a covering index.
The benefits of covering indexes:
1. Index entries are usually much smaller than full data rows, so if MySQL only needs to read the index, the amount of data it accesses drops significantly.
2. Because indexes are stored in column-value order (at least within a single page), I/O-bound range queries need much less I/O than fetching each row from a random location on disk.
3. Covering indexes are especially useful for InnoDB tables because of InnoDB's clustered index. InnoDB's secondary indexes store the row's primary key value in their leaf nodes. If the secondary index covers the query, the second lookup in the primary key index is avoided.

Disadvantages of clustered indexes:
1. Insert speed depends heavily on insertion order. Inserting in primary key order is the fastest way to load data into an InnoDB table. If the data is not loaded in primary key order, it is best to reorganize the table with the OPTIMIZE TABLE command after loading.
2. Updating clustered index columns is expensive because it forces InnoDB to move each updated row to a new location. Inserts can also cause page splits, and page splits make the table take up more disk space.
3. Secondary indexes may be larger than expected because their leaf nodes store the primary key columns of the rows they reference.
4. Secondary index access requires two index lookups instead of one. For InnoDB, the adaptive hash index can reduce this repeated work.

To find a row through a secondary index, InnoDB first finds the primary key value stored in the secondary index leaf node, and then uses that value to find the row in the clustered index.
When a query is served by a covering index, the Extra column of EXPLAIN shows "Using index".
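The difference between a covered query and one that needs the second lookup can be modeled as follows. This is a simplified sketch under the assumption that the clustered index behaves like a mapping from primary key to row and a secondary index like a sorted list of (value, primary key) pairs; the function names are hypothetical.

```python
from bisect import bisect_left

# Clustered index model: primary key -> full row (sample rows for illustration).
clustered = {
    1: {"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS"},
    2: {"actor_id": 2, "first_name": "NICK", "last_name": "WAHLBERG"},
    3: {"actor_id": 3, "first_name": "ED", "last_name": "CHASE"},
}

# Secondary index on last_name: leaf entries hold (last_name, primary key).
secondary = sorted((row["last_name"], pk) for pk, row in clustered.items())


def primary_keys_for(last_name):
    """First lookup: search the secondary index and collect primary keys."""
    pos = bisect_left(secondary, (last_name,))
    keys = []
    while pos < len(secondary) and secondary[pos][0] == last_name:
        keys.append(secondary[pos][1])
        pos += 1
    return keys


def select_actor_id(last_name):
    """SELECT actor_id ... WHERE last_name = ?
    Covered: the answer is already in the secondary index entries."""
    return primary_keys_for(last_name)


def select_full_row(last_name):
    """SELECT * ... WHERE last_name = ?
    Not covered: each primary key needs a second lookup in the clustered index."""
    return [clustered[pk] for pk in primary_keys_for(last_name)]


print(select_actor_id("GUINESS"))
print(select_full_row("WAHLBERG"))
```

In this model select_actor_id never touches the clustered mapping, which corresponds to the "Using index" case, while select_full_row performs one extra lookup per matching row.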

Insert rows in the order of the primary key in InnoDB
Disadvantages of random insertion:
The target page may not be in the memory cache, so InnoDB has to read it from disk into memory before inserting the record. This causes a lot of random I/O. With sequential insertion, each record goes right after the previous one, so in most cases (whenever a new page does not have to be opened) the page is already in memory.
Because writes are out of order, InnoDB may have to split pages frequently to make room for new rows. Page splits move a lot of data, and a single split modifies at least three pages instead of one.
Because of the frequent page splits, pages become sparse and irregularly filled, which eventually leads to data fragmentation.
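The effect of insertion order on page splits can be seen in a toy simulation. This is a rough sketch, not InnoDB's actual algorithm: it assumes tiny pages holding 4 keys, splits a full page in half, and simply opens a new page when a key is appended past the end of the last page. All names and sizes are hypothetical.

```python
import random
from bisect import insort

PAGE_CAPACITY = 4  # hypothetical; real InnoDB pages hold far more rows


def load(keys, capacity=PAGE_CAPACITY):
    """Insert keys into an ordered list of sorted pages.

    Returns (pages, number_of_page_splits).
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Find the first page whose largest key is >= key (or the last page).
        i = 0
        while i < len(pages) - 1 and pages[i][-1] < key:
            i += 1
        page = pages[i]
        if len(page) < capacity:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Appending past the end: open a fresh page, nothing is moved.
            pages.append([key])
        else:
            # Full page and the key belongs inside it: split in half.
            insort(page, key)
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return pages, splits


def fill_factor(pages, capacity=PAGE_CAPACITY):
    """Fraction of the allocated page space that actually holds keys."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)


sequential = list(range(1, 41))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

for name, keys in (("sequential", sequential), ("shuffled", shuffled)):
    pages, splits = load(keys)
    print(name, "pages:", len(pages), "splits:", splits,
          "fill:", round(fill_factor(pages), 2))
```

In this model sequential keys never trigger a split and leave every page full, while out-of-order keys cause splits and leave partially filled pages, which matches the fragmentation described above.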

When does a sequential primary key cause worse results?
For highly concurrent workloads, inserting in primary key order in InnoDB can cause significant contention. The upper bound of the primary key becomes a "hot spot". Because all insertions occur here, concurrent insertions may cause gap lock contention. Another hotspot may be the AUTO_INCREMENT lock mechanism. If you encounter this problem, you may need to consider redesigning the table or application, or changing the innodb_autoinc_lock_mode configuration. If your server version does not support the innodb_autoinc_lock_mode parameter, you can upgrade to a new version of InnoDB, which may work better for this scenario.

If an index covers the columns in the WHERE condition but not every column the query selects, MySQL still has to go back to the table to fetch the data rows:
select * from products where actor='SEAN' and title like '%APOLLO%'
This can be improved with a delayed association (deferred join). The subquery finds the matching primary keys, ideally from a covering index on (actor, title, product_id), and the outer query fetches full rows only for those keys:
select * from products JOIN(
select product_id from products where actor='SEAN' and title like '%APOLLO%' )
t1 ON t1.product_id=products.product_id

Sorting using index scans
MySQL has two ways to produce ordered results: it can sort the rows, or it can scan an index in order. If the type column in EXPLAIN shows "index", MySQL is scanning the index to produce the order (do not confuse this with "Using index" in the Extra column).
MySQL can use an index to sort the results only when the column order of the index exactly matches the ORDER BY clause and all columns are sorted in the same direction (all ascending or all descending).
If the query needs to associate multiple tables, the index can be used for sorting only when all the fields referenced by the ORDER BY clause are from the first table. The restriction of the ORDER BY clause is the same as that of the search query: it needs to meet the requirements of the leftmost prefix of the index. Otherwise, MySQL needs to perform sorting operations and cannot use index sorting.
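These rules can be written down as a small check. The sketch below is a simplification with a hypothetical function name: it only tests the leftmost-prefix and same-direction conditions stated above and ignores other cases, such as leading index columns fixed by equality conditions in the WHERE clause.

```python
def can_sort_with_index(index_columns, order_by):
    """Return True if ORDER BY is a leftmost prefix of the index and
    every column sorts in the same direction.

    index_columns: tuple of column names, in index order
    order_by: list of (column, direction) pairs, direction 'ASC' or 'DESC'
    """
    if not order_by or len(order_by) > len(index_columns):
        return False
    columns = [column for column, _ in order_by]
    directions = {direction.upper() for _, direction in order_by}
    is_leftmost_prefix = columns == list(index_columns[:len(columns)])
    return is_leftmost_prefix and len(directions) == 1


# Hypothetical index used for illustration.
idx = ("rental_date", "inventory_id", "customer_id")

print(can_sort_with_index(idx, [("rental_date", "ASC"), ("inventory_id", "ASC")]))
print(can_sort_with_index(idx, [("rental_date", "DESC"), ("inventory_id", "ASC")]))
print(can_sort_with_index(idx, [("inventory_id", "ASC")]))
```

The first ORDER BY is a same-direction leftmost prefix and passes; the second mixes directions and the third skips the leading column, so both fail the check.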

Compressed Indexes
MyISAM uses prefix compression to reduce the size of indexes.

Redundant and Duplicate Indexes
Duplicate indexes are unnecessary and should be removed. Redundant indexes are sometimes justified, because they can serve queries with different conditions.

Redundant indexes differ from duplicate indexes. If index (A,B) exists, then creating index (A) as well makes it redundant, because it is just a prefix of the existing index: index (A,B) can also be used as index (A). (This kind of redundancy applies only to B-Tree indexes.) If index (B,A) is then created, it is not redundant, and neither is index (B), because B is not the leftmost prefix of index (A,B). In addition, indexes of a different type (such as hash indexes or full-text indexes) are never redundant with respect to a B-Tree index, regardless of which columns they cover.
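The prefix rule lends itself to a simple check. The following sketch assumes all indexes are B-Tree indexes on the same table and are described only by their column lists; the index names and the `find_redundant` helper are hypothetical.

```python
def is_strict_prefix(short, long):
    """True if `short` is a leftmost prefix of `long` and strictly shorter."""
    return len(short) < len(long) and tuple(long[:len(short)]) == tuple(short)


def find_redundant(indexes):
    """indexes: dict of index name -> tuple of columns.

    Returns (redundant_index, covering_index) pairs.
    """
    pairs = []
    for name, columns in indexes.items():
        for other, other_columns in indexes.items():
            if name != other and is_strict_prefix(columns, other_columns):
                pairs.append((name, other))
    return pairs


print(find_redundant({"idx_a": ("A",), "idx_a_b": ("A", "B"), "idx_b": ("B",)}))
print(find_redundant({"idx_a_b": ("A", "B"), "idx_b_a": ("B", "A")}))
```

With (A), (A,B), and (B), only (A) is reported, since (B) is not a leftmost prefix of (A,B). The pair (A,B) and (B,A) produces no result because neither is a prefix of the other.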

Indexes and Locks
InnoDB can place shared (read) locks on secondary indexes, but exclusive (write) locks require access to the primary key index.



Origin blog.csdn.net/weixin_45841848/article/details/132703322