High Performance MySQL (3): Creating High-Performance Indexes

1. Index basics

1.1 What is an index

An index, also called a "key" in MySQL (think of it as the index at the back of a book), is a data structure that the storage engine uses to find records quickly.

Indexes are implemented at the storage engine layer rather than the server layer, so different storage engines implement indexes differently. Even when several storage engines support the same index type, the underlying implementations may differ.

1.2 Advantages of indexes

(1) Indexes let the server quickly locate the requested rows in a table, which greatly reduces the amount of data the server has to scan;

(2) Indexes help the server avoid sorting and temporary tables, because the commonly used B-Tree index stores data in order;

(3) Indexes turn random I/O into sequential I/O.

1.3 Index types

1.3.1 B-Tree indexes

A B-Tree index uses the B-Tree data structure to store data.

(In fact, many storage engines, InnoDB among them, use a B+Tree: each leaf node holds a pointer to the next leaf node, which makes range traversal over the leaf nodes efficient.)

In a B-Tree, all values are stored in order, and every leaf page is the same distance from the root. The points below describe how a B-Tree index works in InnoDB:

(1) How a B-Tree works:

- A B-Tree index speeds up data access because the storage engine no longer needs a full table scan to find the required data; instead it starts searching from the root node of the index;

- The slots in the root node hold pointers to child nodes, and the storage engine follows these pointers down to the next level;

- It finds the right pointer by comparing the lookup value with the values in the node pages, which define the upper and lower bounds of the values in the child nodes;

- Eventually the storage engine either finds the value or determines that the record does not exist;

- Because a B-Tree stores the indexed columns in order, it can also serve ORDER BY and GROUP BY operations.

(2) Query types for which a B-Tree index is effective:

        Full-value match: the query matches all the columns in the index

        Leftmost-prefix match: the query uses only the first column(s) of the index

        Column-prefix match: the query matches only the beginning of a column's value

        Range match: the query finds values between A and B

        Exact match on one column plus a range match on another column

        Index-only query: the query reads only the index, not the data rows

(3) Limits of B-Tree indexes:

       If the lookup does not start from the leftmost column of the index, the index cannot be used;

       Columns in the index cannot be skipped;

       If the query has a range condition on one column, the columns to its right in the index cannot be used to optimize the lookup.
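The query types and limits above can be sketched with a small example. The people table, its columns, and the idx_name_dob index are hypothetical names chosen for illustration; the comments state what the rules above predict, and running EXPLAIN on a real server is the way to confirm the behaviour for a given MySQL version.

```sql
-- Hypothetical table; gender is deliberately left out of the index.
CREATE TABLE people (
    last_name  VARCHAR(50)    NOT NULL,
    first_name VARCHAR(50)    NOT NULL,
    dob        DATE           NOT NULL,
    gender     ENUM('m', 'f') NOT NULL,
    KEY idx_name_dob (last_name, first_name, dob)
) ENGINE=InnoDB;

-- Full-value match: every indexed column is specified.
SELECT * FROM people
WHERE last_name = 'Allen' AND first_name = 'Cuba' AND dob = '1960-01-01';

-- Leftmost-prefix match: only the first index column is used.
SELECT * FROM people WHERE last_name = 'Allen';

-- Column-prefix match: beginning of the first column's value.
SELECT * FROM people WHERE last_name LIKE 'J%';

-- Range match on the first column.
SELECT * FROM people WHERE last_name BETWEEN 'Allen' AND 'Barrymore';

-- Exact match on one column plus a range on the next.
SELECT * FROM people WHERE last_name = 'Allen' AND first_name LIKE 'K%';

-- Index-only query: all selected columns live in the index.
SELECT last_name, first_name, dob FROM people WHERE last_name = 'Allen';

-- Limit: the lookup does not start from the leftmost column.
SELECT * FROM people WHERE first_name = 'Bill';

-- Limit: first_name is skipped, so only the last_name part narrows the lookup.
SELECT * FROM people WHERE last_name = 'Smith' AND dob = '1976-12-23';

-- Limit: the range on last_name stops the lookup from using first_name and dob.
SELECT * FROM people
WHERE last_name LIKE 'J%' AND first_name = 'Bill' AND dob = '1976-12-23';
```

Newer MySQL versions add optimizations that soften some of these limits, so the comments describe the classic B-Tree rules rather than a guarantee for every release.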

1.3.2 Hash indexes

(1) How hash indexes work:

A hash index is built on a hash table and is effective only for queries that exactly match all the columns in the index. For each row, the storage engine computes a hash code over all the indexed columns; the hash index stores these hash codes and keeps a pointer to each row in the hash table.

In MySQL, only the Memory storage engine supports explicit hash indexes, and hash is also its default index type.
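As a minimal sketch of an explicit hash index on a Memory table (the table name testhash, the index name, and the sample rows are illustrative assumptions):

```sql
-- Hypothetical Memory table with an explicit hash index on fname.
CREATE TABLE testhash (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    KEY idx_fname USING HASH (fname)
) ENGINE=MEMORY;

INSERT INTO testhash (fname, lname) VALUES
    ('Arjen', 'Lentz'),
    ('Baron', 'Schwartz'),
    ('Peter', 'Zaitsev');

-- Equality lookup: the engine hashes 'Peter' and follows the row pointer.
SELECT lname FROM testhash WHERE fname = 'Peter';
```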

(2) InnoDB has a special feature called the "adaptive hash index":

When InnoDB notices that some index values are accessed very frequently, it builds a hash index in memory on top of the B-Tree index. This gives the B-Tree index some of the advantages of a hash index, such as very fast hash lookups.

How can the problem of the adaptive hash index occasionally causing MySQL to crash and restart be solved?

....to be updated

(3) Restrictions of hash indexes:

        A hash index contains only hash values and row pointers, not the field values, so the values in the index cannot be used to avoid reading the rows;

        Hash index data is not stored in index-value order, so it cannot be used for sorting;

        Hash indexes do not support lookups on only part of the indexed columns;

        Hash indexes support only equality comparisons, including =, IN(), and <=>;

        Access through a hash index is very fast unless there are many hash collisions.

(4) Example of using a hash index

Example 1: suppose you need to store a large number of URLs and look rows up by URL. If you index the URLs with a B-Tree, the index will be very large because the URLs themselves are long. Normally the query compares the full url column directly.

The optimized query: drop the original index on the url column, add an indexed url_crc column, and use CRC32 as the hash function.

This performs well because the MySQL optimizer uses the small, highly selective index on the url_crc column for the lookup. Even if several rows share the same index value, the lookup is still fast: a quick integer comparison on the hash value finds the index entries, and the matching rows are then compared one by one against the full URL.
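The URL example can be sketched as follows. The table name url_lookup and the index name are assumptions made for illustration; CRC32() is a built-in MySQL function that returns an unsigned 32-bit value.

```sql
-- Hypothetical table: the long url column is not indexed, url_crc is.
CREATE TABLE url_lookup (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url     VARCHAR(255) NOT NULL,
    url_crc INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id),
    KEY idx_url_crc (url_crc)
) ENGINE=InnoDB;

-- The hash column has to be filled in whenever a row is written.
INSERT INTO url_lookup (url, url_crc)
VALUES ('http://www.mysql.com', CRC32('http://www.mysql.com'));

-- Original form: compares the long string directly.
SELECT id FROM url_lookup WHERE url = 'http://www.mysql.com';

-- Optimized form: integer lookup on url_crc, then a check against the full
-- URL so that rows with a colliding hash are filtered out.
SELECT id FROM url_lookup
WHERE url_crc = CRC32('http://www.mysql.com')
  AND url = 'http://www.mysql.com';
```

Keeping the url comparison in the WHERE clause matters: two different URLs can share a CRC32 value, and without the second condition the query could return the wrong row.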

1.3.3 R-Tree spatial indexes

Unlike B-Tree indexes, spatial indexes do not require queries to use a leftmost prefix of the index; they index the data across all dimensions at once. Queries must use MySQL's GIS functions to work with the data.

1.3.4 Full-text indexes

A full-text index finds keywords in the text rather than comparing values directly against the index, which is closer to what a search engine does than to simple WHERE matching. It is suited to MATCH AGAINST operations rather than ordinary WHERE conditions.
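A minimal sketch of a full-text index and a MATCH AGAINST query follows. The article table and its columns are hypothetical, and the example assumes a MySQL version in which the chosen engine supports FULLTEXT indexes (InnoDB gained this in MySQL 5.6; earlier versions need MyISAM).

```sql
-- Hypothetical table with a full-text index over two text columns.
CREATE TABLE article (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    body  TEXT         NOT NULL,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- Keyword search: the column list in MATCH() must match the index definition.
SELECT id, title
FROM article
WHERE MATCH(title, body) AGAINST('optimizer' IN NATURAL LANGUAGE MODE);
```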

 

1.4 Are indexes always the best solution? When should they be used?

(1) For very small tables, a full table scan is more efficient in most cases;

(2) For medium to large tables, indexes are very effective;

(3) For very large tables, the cost of building and using indexes grows accordingly. What is needed then is a technique that directly identifies the group of data a query needs, rather than matching record by record:

- Partitioning can be used, for example;

- You can also create a "metadata table" that stores characteristics the queries need. For example, when queries aggregate data for a multi-tenant application whose data is spread across many tables, record which user's information is stored in which table, so that a query can simply ignore the tables that do not contain that user's information;

- For terabyte-scale data, locating a single record makes little sense, so block-level metadata techniques are often used in place of indexes.

 

2. High-performance indexing strategies

2.1 Isolating the column

"Isolating the column" means that the indexed column must not be part of an expression or an argument to a function. If the columns in a query are not isolated, MySQL will not use the index on them.

Bad example 1: the following query cannot use the index on actor_id:

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

Analysis: the WHERE expression is equivalent to actor_id = 4, but MySQL cannot solve this equation for actor_id automatically. Get into the habit of simplifying WHERE conditions so that the indexed column always stands alone on one side of the comparison operator.

Bad example 2: the indexed column must not be an argument to a function
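Both cases can be sketched as follows. The sakila.actor table comes from the sakila sample database used above; the event_log table and its columns are hypothetical and exist only to illustrate the function case.

```sql
-- Example 1 rewritten: the indexed column stands alone on one side.
SELECT actor_id FROM sakila.actor WHERE actor_id = 4;

-- Hypothetical table for example 2.
CREATE TABLE event_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_on DATE         NOT NULL,
    KEY idx_created_on (created_on)
) ENGINE=InnoDB;

-- Column wrapped in a function: the index on created_on cannot be used
-- to narrow the lookup.
SELECT id FROM event_log
WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(created_on) <= 10;

-- Equivalent condition with the column isolated: a plain range on created_on.
SELECT id FROM event_log
WHERE created_on >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY);
```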

2.2 Prefix indexes and index selectivity

Sometimes you need to index very long strings, as with the URL hash index described earlier. For very long columns of type BLOB, TEXT, or long VARCHAR, MySQL does not allow indexing the full length, so you need a "prefix index."

What is a prefix index: you index only the first few characters of the column. This saves index space and makes the index more efficient, but it also lowers the index's selectivity.

What is index selectivity: the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table. The higher the selectivity, the more rows MySQL can filter out during a lookup. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance.

How to choose the prefix length:

Choose a prefix long enough to give good selectivity but short enough to save space; the cardinality of the prefix should be close to the cardinality of the full column. One way to do this is to find the most common values in the column and compare that list with the list of the most common prefixes.
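One way to compare prefix lengths is to compute the selectivity of each candidate directly. This is a sketch only: city_demo is a hypothetical table assumed to be populated with realistic data, and the prefix length of 7 is an arbitrary illustration, not a recommendation.

```sql
-- Hypothetical table; fill it with representative data before measuring.
CREATE TABLE city_demo (
    city VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

-- Selectivity of the full column: distinct values / total rows.
SELECT COUNT(DISTINCT city) / COUNT(*) AS full_selectivity
FROM city_demo;

-- Selectivity of several prefix lengths, computed in one pass.
SELECT COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
       COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
       COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
       COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM city_demo;

-- Once a length is close enough to full_selectivity, index that prefix.
ALTER TABLE city_demo ADD KEY idx_city_prefix (city(7));
```

Average selectivity can hide skew, so it is still worth checking the most common prefixes as described above before settling on a length.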

2.3 Multicolumn indexes

(1) Common mistakes with multicolumn indexes

a. Creating a separate index for each column. This stems from the misconception that "every column in the WHERE condition should be indexed." Separate indexes can help somewhat, but they are far from optimal.

MySQL 5.0 introduced "index merge," which can use several single-column indexes on a table to locate the desired rows and combine the results. There are three variants: union for OR conditions, intersection for AND conditions, and unions of intersections that combine the two.

However, when the optimizer has to resort to index merge, it usually means the table is poorly indexed:

  • Intersecting several indexes (multiple AND conditions) usually means you need one multicolumn index containing all the relevant columns, not several separate single-column indexes;
  • Unioning several indexes (multiple OR conditions) can consume a lot of CPU and memory in the algorithm's buffering, sorting, and merging operations.

What should you do when you see an index merge in EXPLAIN? Check whether the query and the table structure are already optimal, and keep the use of "index merge" to a minimum.
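As a sketch of how this looks in practice, the queries below use the film_actor table from the sakila sample database, which has a primary key on (actor_id, film_id) and a separate index on film_id. Whether EXPLAIN actually reports an index merge depends on the MySQL version and the table statistics, so treat the comments as expectations rather than guaranteed output.

```sql
-- An OR across two differently indexed columns is a typical trigger for
-- an index merge (type = index_merge in the EXPLAIN output).
EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;

-- A possible rewrite: each branch uses one index on its own, and the
-- second branch excludes rows the first branch already returned.
SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1
UNION ALL
SELECT film_id, actor_id FROM sakila.film_actor
WHERE film_id = 1 AND actor_id <> 1;
```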

b. Creating a multicolumn index with the columns in the wrong order

2.4 Choosing a good column order (B-Tree indexes only, since only they are ordered)

(1) Why column order matters

The column order should match what the query's ORDER BY, GROUP BY, and similar clauses need, because the index is sorted first by the leftmost column, then by the second column, then by the third, and so on.

(2) How to choose the column order

As a rule of thumb, put the most selective column first in the index.
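The rule of thumb can be checked by measuring each column's selectivity before deciding on the order. The sketch below uses the payment table from the sakila sample database; the index name idx_customer_staff is an assumption, and the column order shown is only appropriate if customer_id turns out to be the more selective column in your data.

```sql
-- Compare the selectivity of the two candidate leading columns.
SELECT COUNT(DISTINCT staff_id) / COUNT(*)    AS staff_id_selectivity,
       COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
       COUNT(*)                               AS total_rows
FROM sakila.payment;

-- If customer_id is more selective, it goes first.
ALTER TABLE sakila.payment
    ADD KEY idx_customer_staff (customer_id, staff_id);
```

Selectivity is only one input: the sorting and grouping needs described in (1) can outweigh it for a particular workload.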

2.5 Clustered indexes (not a separate index type)

(1) What is a clustered index:

A clustered index is not a separate index type but a way of storing data: the B-Tree index and the data rows are kept in the same structure. Because the data rows cannot be stored in two different places at once, a table can have only one clustered index.

(2) Advantages of clustered indexes:

a. Related data can be kept together. For example, when implementing a mailbox you can cluster the data by user ID, so fetching all of a user's messages requires reading only a few pages from disk; without clustering, each message might require its own disk I/O.

b. Data access is faster. A clustered index keeps the index and the data in the same B-Tree, so retrieving data through a clustered index is faster.

c. Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.

(3) Disadvantages of clustered indexes:

a. Insert speed depends heavily on insertion order

b. Updating the clustered index columns is expensive

c. Full table scans can become slower

d. Secondary (nonclustered) index access requires two index lookups, because the leaf nodes of a secondary index store the row's primary key values rather than a pointer to the row
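The mailbox example can be sketched as follows. The mail_message table, its columns, and the literal values are hypothetical; the sketch relies only on the fact that InnoDB clusters rows by primary key.

```sql
-- Hypothetical mailbox table: InnoDB stores the rows in primary key order,
-- so all messages of one user sit next to each other.
CREATE TABLE mail_message (
    user_id    INT UNSIGNED    NOT NULL,
    message_id BIGINT UNSIGNED NOT NULL,
    sent_at    DATETIME        NOT NULL,
    subject    VARCHAR(200)    NOT NULL,
    PRIMARY KEY (user_id, message_id),
    KEY idx_sent_at (sent_at)
) ENGINE=InnoDB;

-- Reads one contiguous range of the clustered index.
SELECT message_id, subject FROM mail_message WHERE user_id = 42;

-- Goes through the secondary index: idx_sent_at yields primary key values,
-- which are then looked up in the clustered index to fetch subject.
SELECT subject FROM mail_message WHERE sent_at >= '2020-01-01';
```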

2.6 Covering indexes

(1) What is a covering index:

If an index contains (covers) the values of all the fields a query needs, it is called a "covering index."

(2) When covering indexes are used

Indexes are usually created for the WHERE condition, but a well-designed index should consider the whole query, not just the WHERE part. MySQL can use an index to fetch column data directly, so it does not need to read the data rows. If the leaf nodes of the index already contain the data being queried, there is no need to go back to the table; scanning only the index instead of the table improves performance.

(3) Advantages of covering indexes:

a. Only the index has to be read, which greatly reduces the amount of data accessed and improves performance;

b. Some storage engines, such as MyISAM, cache only the index in MySQL's memory and rely on the operating system to cache the data, so accessing the data requires a system call; reading only the index avoids that and is faster;

c. Covering indexes are especially helpful for InnoDB because of its clustered index. InnoDB secondary indexes store the row's primary key values in their leaf nodes, so if a secondary index covers the query, the second lookup in the primary key index is avoided.

(4) Cases where a covering index cannot be used:

a. In most cases MySQL cannot perform LIKE operations inside the index, which is a limitation of the low-level storage engine API. It can do leftmost-prefix LIKE comparisons in the index, because these can be converted into simple comparisons (for example LIKE 'abc%'), but it cannot do so for LIKE patterns that begin with a wildcard (such as LIKE '%abc' or LIKE '_abc'), so avoid overusing LIKE on indexed columns. In that case MySQL cannot compare against the index values and has to fetch the data row and compare its values.
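A covering index can be recognized in EXPLAIN output. The sketch below uses the inventory table from the sakila sample database, which has a multicolumn index on (store_id, film_id); the exact plan depends on the MySQL version, so the comments describe what is expected rather than guaranteed.

```sql
-- Both selected columns are in the (store_id, film_id) index, so the Extra
-- column is expected to show "Using index" (an index-only scan).
EXPLAIN SELECT store_id, film_id FROM sakila.inventory;

-- SELECT * also needs columns outside that index, so the index cannot
-- cover the query and the rows themselves must be read.
EXPLAIN SELECT * FROM sakila.inventory WHERE store_id = 1;
```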

2.7 Using index scans for sorting

MySQL can use an index to sort the results only when the index's column order matches the ORDER BY clause exactly and all columns are sorted in the same direction (all ascending or all descending);

If the query joins multiple tables, the index can be used for sorting only when all the columns in the ORDER BY clause refer to the first table.
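As a sketch, the rental table in the sakila sample database has a unique index on (rental_date, inventory_id, customer_id). The first query fixes the leading column to a constant and sorts by the remaining index columns; the second mixes sort directions. Actual plans vary by version, and MySQL 8.0 added descending indexes, so read the comments as the classic behaviour.

```sql
-- rental_date is a constant, and ORDER BY follows the rest of the index in
-- one direction: no "Using filesort" is expected in the Extra column.
EXPLAIN SELECT rental_id, staff_id
FROM sakila.rental
WHERE rental_date = '2005-05-25'
ORDER BY inventory_id, customer_id;

-- Mixed sort directions do not match the index order, so a filesort
-- is expected here.
EXPLAIN SELECT rental_id, staff_id
FROM sakila.rental
WHERE rental_date = '2005-05-25'
ORDER BY inventory_id DESC, customer_id ASC;
```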

2.8 Packed (prefix-compressed) indexes

MyISAM uses prefix compression to reduce index size, so that more of the index fits in memory.

2.9 Redundant and duplicate indexes

MySQL allows you to create multiple indexes on the same column, but doing so hurts performance, because each index has to be maintained separately.

(1) What is a duplicate index: an index of the same type created on the same columns in the same order as an existing index;

A common mistake when creating indexes: after defining a primary key, adding a UNIQUE constraint on the same column and then adding another index for queries. In fact, MySQL implements both UNIQUE constraints and PRIMARY KEY constraints with indexes, so this creates three duplicate indexes on the same column. There is no need to do this unless you want different types of indexes on the same column to serve different kinds of queries.

(2) What is a redundant index: if an index (A, B) exists, then creating an index (A) is redundant, because (A) is just a prefix of (A, B), and index (A, B) can also be used as index (A). This applies only to B-Tree indexes.
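Both mistakes can be sketched on a throwaway table. The table name dup_index_demo and the index names are hypothetical; recent MySQL versions may emit a warning for the duplicate definitions but still create them.

```sql
-- Duplicate indexes: id ends up indexed three times
-- (primary key, unique index, plain index).
CREATE TABLE dup_index_demo (
    id INT NOT NULL PRIMARY KEY,
    a  INT NOT NULL,
    b  INT NOT NULL,
    UNIQUE KEY uq_id (id),
    KEY idx_id (id)
) ENGINE=InnoDB;

-- The primary key already does the job, so the other two can go.
ALTER TABLE dup_index_demo
    DROP INDEX uq_id,
    DROP INDEX idx_id;

-- Redundant index: idx_a is a leftmost prefix of idx_a_b.
ALTER TABLE dup_index_demo
    ADD KEY idx_a_b (a, b),
    ADD KEY idx_a (a);

-- idx_a_b can serve lookups on a alone, so idx_a can be dropped.
ALTER TABLE dup_index_demo DROP INDEX idx_a;
```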

2.10 Indexes and locking

InnoDB row locks are efficient, but locking rows still adds overhead, and locking more rows than necessary increases lock contention and reduces concurrency. InnoDB locks rows only when it accesses them, and an index can reduce the number of rows InnoDB accesses, so queries lock fewer rows.

3. Index case studies

(1) Supporting many kinds of filtering

Look at which columns have many distinct values and which columns appear most often in WHERE clauses, and create indexes on those columns.

(2) Avoiding multiple range conditions

Suppose there is a last_online column and you want a query that shows the users who were online during the past few weeks while also filtering on an age range.

The problem: such a query has two range conditions. MySQL can use an index on the last_online column or an index on the age column, but not both at once; a multicolumn index cannot be used for more than one range condition.
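A sketch of the situation follows. The profiles table, its columns, the index name, and the three-week window are all assumptions made for illustration. The second query shows one commonly used workaround, not the only one: listing the ages explicitly turns the age range into equality comparisons, leaving last_online as the single true range.

```sql
-- Hypothetical table for the example.
CREATE TABLE profiles (
    id          INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
    sex         ENUM('M', 'F')   NOT NULL,
    age         TINYINT UNSIGNED NOT NULL,
    last_online DATETIME         NOT NULL,
    KEY idx_sex_age_online (sex, age, last_online)
) ENGINE=InnoDB;

-- Two range conditions: the index lookup stops being useful after the
-- first range column (age), so last_online cannot narrow it further.
SELECT id FROM profiles
WHERE sex = 'M'
  AND age BETWEEN 18 AND 25
  AND last_online > DATE_SUB(NOW(), INTERVAL 3 WEEK);

-- Workaround sketch: an IN() list is a set of equality comparisons, so the
-- only remaining range is the one on last_online.
SELECT id FROM profiles
WHERE sex = 'M'
  AND age IN (18, 19, 20, 21, 22, 23, 24, 25)
  AND last_online > DATE_SUB(NOW(), INTERVAL 3 WEEK);
```

Long IN() lists multiply the number of index ranges the optimizer has to consider, so this trick works best when the lists stay short.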

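(3) Optimizing sorts

If a query matches millions of rows and the WHERE clause filters only on the sex column, how should the sort be handled?

For columns with very low selectivity, such as sex, you can add a special index just for sorting. For example, an index on (sex, rating) serves queries that filter on sex and use ORDER BY rating with LIMIT; without such an index these queries would be very slow.

With pagination, queries become slow when the user pages far into the result set: as the offset grows, MySQL spends a lot of time scanning rows that it then throws away.

How to optimize: use a deferred join. A covering-index query first returns the primary keys that are needed, and those keys are then joined back to the original table to fetch the required rows, which reduces the number of rows MySQL has to scan and discard.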
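The sort index, the slow deep page, and the deferred join can be sketched together. The profiles_sorted table, its columns, and the offset of 100000 are hypothetical. In InnoDB a secondary index also carries the primary key, so the (sex, rating) index covers the inner query that selects only id.

```sql
-- Hypothetical table with an index built for filtering on sex and
-- sorting by rating.
CREATE TABLE profiles_sorted (
    id     INT UNSIGNED   NOT NULL AUTO_INCREMENT PRIMARY KEY,
    sex    ENUM('M', 'F') NOT NULL,
    rating INT            NOT NULL,
    about  VARCHAR(255)   NOT NULL,
    KEY idx_sex_rating (sex, rating)
) ENGINE=InnoDB;

-- First page: the index supplies the rows already sorted by rating.
SELECT id, rating, about FROM profiles_sorted
WHERE sex = 'M' ORDER BY rating LIMIT 10;

-- Deep page: MySQL reads and discards 100000 full rows before returning 10.
SELECT id, rating, about FROM profiles_sorted
WHERE sex = 'M' ORDER BY rating LIMIT 100000, 10;

-- Deferred join: the inner query walks only the covering index to find the
-- 10 primary keys, then the join fetches just those rows.
SELECT p.id, p.rating, p.about
FROM profiles_sorted AS p
INNER JOIN (
    SELECT id FROM profiles_sorted
    WHERE sex = 'M' ORDER BY rating LIMIT 100000, 10
) AS x USING (id)
ORDER BY p.rating;  -- re-sort the 10 joined rows; join order is not guaranteed
```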

 

 

Previous article: https://blog.csdn.net/RuiKe1400360107/article/details/103778112

Reference: High Performance MySQL, 3rd Edition

 
