High Performance MySQL (study notes) - index 4

Index Case Study

The best way to understand indexes is to work through an example. Here is one.

Suppose you are designing an online dating site. The user information table has many columns, including country, region, city, gender, eye color, and so on. The site must support searching for users by various combinations of these features, and must also allow results to be sorted and limited by the user's last online time, other members' ratings of the user, and so on.

Supporting multiple filter conditions

The selectivity of the country column is not high, but many queries use it. The selectivity of the sex column is very low, but it is also used in many queries. Therefore, given how often both are used, it is recommended to use (sex, country) as the prefix when creating the various composite indexes. The reason for this choice:

Almost all queries will use the sex column. If a query does not restrict gender, we can add AND sex IN ('m', 'f') so that MySQL can still use the index. The condition filters out no rows, so the result is the same with or without it, but with it MySQL can match the leftmost prefix of the index. This trick works well in this kind of scenario, but only if the IN() list is short.
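The IN() trick can be applied in the application code that builds the query. The following is a minimal Python sketch, not something from the book: the function name build_where, the %s placeholder style, and the ('m', 'f') value list are assumptions made for illustration.

```python
def build_where(filters, sex_values=("m", "f")):
    """Build a parameterized WHERE clause that always constrains `sex`.

    `filters` maps column names to the value they must equal. Column names
    are interpolated into the SQL text, so they must come from a trusted
    whitelist, never from user input. (Hypothetical helper.)
    """
    conditions = []
    params = []
    if "sex" in filters:
        conditions.append("sex = %s")
        params.append(filters["sex"])
    else:
        # No gender restriction: list every value so the query still
        # matches the leftmost column of a (sex, country, ...) index.
        placeholders = ", ".join(["%s"] * len(sex_values))
        conditions.append(f"sex IN ({placeholders})")
        params.extend(sex_values)
    for column, value in filters.items():
        if column == "sex":
            continue
        conditions.append(f"{column} = %s")
        params.append(value)
    return " AND ".join(conditions), params


clause, params = build_where({"country": "US"})
print(clause)   # sex IN (%s, %s) AND country = %s
print(params)   # ['m', 'f', 'US']
```

The sex condition is emitted first only for readability; the order of conditions in the WHERE clause does not affect which index MySQL can use.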

Principle: Consider all the options on the form. When designing indexes, do not only consider which indexes the existing queries need; also consider optimizing the queries themselves. If a query needs a new index that would make other queries less efficient, first consider whether the original query can be rewritten. Queries and indexes should be optimized together to find the best balance.

Next, consider which other combinations of WHERE conditions are common, and work out which of them would be slow without a suitable index. An index on (sex, country, age) is an obvious choice, and (sex, country, region, age) and (sex, country, region, city, age) indexes may be needed as well. The IN() trick lets one index serve several of these combinations, so they do not all have to be created. There is also a reason for putting age in the last column: we want MySQL to use as many index columns as possible, and a query can use the leftmost prefix of the index only up to the first range condition. Age is almost always queried as a range, for example BETWEEN 18 AND 25.

Principle: In a composite index, put the column used in range conditions last, because of leftmost-prefix matching. You can add more columns to the index and use IN() to cover the ones that do not appear in the WHERE clause, but do not overuse this.
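One way to see why the IN() trick should not be overused: each extra IN() list multiplies the number of value combinations the optimizer has to consider. The sketch below only does that arithmetic. The helper name and the example values are hypothetical.

```python
from math import prod


def in_list_combinations(in_lists):
    """Return how many value combinations a set of IN() lists expands to."""
    return prod(len(values) for values in in_lists.values())


# Hypothetical IN() lists for three leading index columns.
lists = {
    "sex": ["m", "f"],
    "country": ["US", "CA", "MX"],
    "region": ["north", "south", "east", "west"],
}
print(in_list_combinations(lists))  # 2 * 3 * 4 = 24
```

A few dozen combinations is usually harmless, but the product grows quickly once the lists get long or more columns are added.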

Avoid multiple range conditions

  WHERE eye_color IN ('brown', 'blue', 'hazel')
    AND hair_color IN ('black', 'red', 'blonde', 'brown')
    AND sex IN ('F', 'M')
    AND last_online > DATE_SUB(NOW(), INTERVAL 7 DAY)
    AND age BETWEEN 18 AND 25

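What is a range condition? WHERE actor_id > 12 is a range condition, and index columns after the range column can no longer be used. WHERE actor_id IN (1, 4, 99) is a multiple-equality condition, which does not have this limitation. The query above has two range conditions, one on last_online and one on age, and MySQL can use the index for only one of them.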
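Since an IN() list counts as multiple equality conditions, one possible way to remove a range condition on a small integer column is to rewrite it as an IN() list. This is a sketch under the assumption that age is stored as a whole number. The helper name is hypothetical and the notes above do not prescribe this exact approach.

```python
def range_to_in_list(column, low, high):
    """Rewrite `column BETWEEN low AND high` as an IN() list of integers.

    Only sensible for integer columns with a small range. `column` is
    interpolated into the SQL text, so it must be a trusted name.
    (Hypothetical helper.)
    """
    if low > high:
        raise ValueError("low must not exceed high")
    values = ", ".join(str(v) for v in range(low, high + 1))
    return f"{column} IN ({values})"


print(range_to_in_list("age", 18, 25))
# age IN (18, 19, 20, 21, 22, 23, 24, 25)
```

With age expressed this way, last_online is the only remaining range condition in the query. The cost is a longer IN() list, so the earlier warning about overusing IN() still applies.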

Optimizing sorting:

When sorting is required on top of a low-selectivity filter, you can add a dedicated index for the sort. For example, a (sex, rating) index serves the following query:

SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 10;

However, when the user pages far into the results, the query becomes very slow. As the offset increases, MySQL spends more and more time scanning rows that it then discards. Denormalization, precomputation, and caching may be the only strategies that fully solve this kind of query. There are two suggestions: 1. Limit the number of pages a user can turn; 2. Use a deferred join: fetch the required primary keys through an index-only query, then join those keys back to the original table to get the required rows. For example:

SELECT <cols> FROM profiles INNER JOIN (SELECT <primary key cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10) AS x USING (<primary key cols>);
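To check that a deferred join returns the same page as the plain offset query, the two can be compared on a small test table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows only that the results are equivalent, not the performance difference. The table layout, the id and name columns, and the generated data are assumptions for illustration, and id is added to the ORDER BY so that ties in rating sort deterministically.

```python
import sqlite3


def make_db(n_rows=1000):
    """Create an in-memory profiles table with deterministic test data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profiles ("
        "id INTEGER PRIMARY KEY, sex TEXT, rating INTEGER, name TEXT)"
    )
    conn.execute("CREATE INDEX idx_sex_rating ON profiles (sex, rating)")
    conn.executemany(
        "INSERT INTO profiles (id, sex, rating, name) VALUES (?, ?, ?, ?)",
        [(i, "M" if i % 2 else "F", (i * 37) % 101, f"user{i}")
         for i in range(1, n_rows + 1)],
    )
    return conn


def plain_page(conn, offset, limit=10):
    """Fetch one page with a plain LIMIT/OFFSET query."""
    return conn.execute(
        "SELECT id, name, rating FROM profiles WHERE sex = 'M' "
        "ORDER BY rating, id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def deferred_page(conn, offset, limit=10):
    """Fetch the same page by selecting primary keys first, then joining."""
    return conn.execute(
        "SELECT id, name, rating FROM profiles "
        "INNER JOIN (SELECT id FROM profiles WHERE sex = 'M' "
        "            ORDER BY rating, id LIMIT ? OFFSET ?) AS x USING (id) "
        "ORDER BY rating, id",
        (limit, offset),
    ).fetchall()


conn = make_db()
print(plain_page(conn, 300) == deferred_page(conn, 300))
```

The outer ORDER BY in deferred_page is there because a join does not guarantee that rows come back in the subquery's order.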

Maintaining indexes and tables

For finding and repairing corrupted tables, refer to the MySQL section of the book; I did not fully understand this part.

Summary

MySQL indexes are a very complex topic. The way MySQL and its storage engines access data, combined with the characteristics of indexes, makes indexes a powerful and flexible tool for influencing data access.

There are three principles to always keep in mind when choosing indexes and writing queries that utilize them:

1. Single-row access is very slow, especially on mechanical hard drives (SSD random I/O is much faster, but the point still holds). A lot of work is wasted if the server reads a block of data from storage just to get one of its rows. It is best when each block read contains as many of the required rows as possible. Indexes create position references that make this more efficient.

2. Accessing range data sequentially is fast, for two reasons. First, sequential I/O does not require multiple disk seeks, so it is much faster than random I/O (especially on mechanical hard disks). Second, if the server can read data in the required order, no additional sorting is needed, and GROUP BY queries do not need to sort rows and group them before aggregating.

3. Covering-index queries are very fast. If an index contains all the columns required by the query, the storage engine does not need to go back to the table to look up rows. This avoids a lot of single-row accesses.
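The third principle can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it runs without a server; in MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN. The table, index name, and bio column are assumptions for illustration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    "id INTEGER PRIMARY KEY, sex TEXT, rating INTEGER, bio TEXT)"
)
conn.execute("CREATE INDEX idx_sex_rating ON profiles (sex, rating)")

# Every column the query needs is in the index: no table lookup required.
covered = query_plan(
    conn,
    "SELECT rating FROM profiles WHERE sex = 'M' ORDER BY rating LIMIT 10",
)
# bio is not in the index, so each match needs a lookup in the table.
uncovered = query_plan(
    conn,
    "SELECT bio FROM profiles WHERE sex = 'M' ORDER BY rating LIMIT 10",
)
print(covered)
print(uncovered)
```

The first plan should report a covering index and the second should not; the exact wording of the plan text varies between SQLite versions.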

In general, when writing queries, you should choose appropriate indexes to avoid single-row lookups, use the native order of the data as much as possible to avoid additional sorting, and use covering indexes as much as possible. This is consistent with the "three-star" index rating system.

How to judge whether the indexes created for a system are reasonable:

It is recommended to analyze queries by response time: find the queries that take the longest or put the most load on the server, then check their schema, SQL, and index structure to determine whether any of them scans too many rows, does too much extra sorting or uses temporary tables, accesses data with random I/O, or returns too many columns that are not needed.

