Single key index and composite index

Reprinted from http://blog.csdn.net/linminqin/article/details/44342205

from http://talentluke.iteye.com/blog/1843868

from http://book.51cto.com/art/200906/132406.htm

8.4.5 The pros and cons of indexing and how to determine whether indexing is needed

I believe most readers know that an index can greatly improve the efficiency of data retrieval and make a Query execute faster, but not everyone realizes that, while greatly improving retrieval efficiency, an index also brings some negative effects to the database. The following is a brief analysis of the advantages and disadvantages of indexes in MySQL.

The benefits of indexing

Many readers may think the benefit of indexing is simply "improving the efficiency of data retrieval and reducing the IO cost of the database". Indeed, the biggest benefit of creating an index on a field of a table is that, when that field is used as a retrieval condition, it can greatly improve retrieval efficiency, speed up retrieval, and reduce the amount of data that has to be read. But is improving retrieval efficiency the only benefit? Of course not: an index serves another very important purpose, which is to reduce the cost of sorting data. We know that the data in each index is stored sorted by the key value of the index key. Therefore, when a Query statement contains a sorting or grouping operation, if the sort fields are exactly the same as the index key fields, MySQL Query Optimizer tells mysqld that no extra sort is needed after fetching the data, because the data retrieved through the index already satisfies the requested order.
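As a minimal sketch of this sorting benefit (the table, column, and index names are made up for illustration), an index on the sort column lets MySQL return rows in index order instead of performing a filesort:

Sql code
-- hypothetical example: index the column used in ORDER BY
CREATE INDEX idx_orders_created ON orders (created_at);
-- with the index, EXPLAIN should no longer report "Using filesort" for this query
EXPLAIN SELECT order_id, created_at FROM orders ORDER BY created_at LIMIT 100;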







What about a grouping operation? A grouping operation cannot use the index directly. However, grouping requires the data to be sorted first and then grouped, so when a Query statement contains a grouping operation and the grouping fields are exactly the same as the index key fields, mysqld can likewise take advantage of the fact that the index is already sorted and omit the sorting step of the grouping operation.

Sorting and grouping operations mainly consume memory and CPU resources. If indexes can be put to good use for sorting and grouping, the consumption of CPU resources can be greatly reduced.
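A similar minimal sketch for grouping (again with invented names): when the GROUP BY column is the leading column of an index, MySQL can walk the index in order and skip the extra sorting step that grouping would otherwise require:

Sql code
-- hypothetical example: GROUP BY on an indexed column can reuse the index order
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- with the index, EXPLAIN typically avoids "Using temporary; Using filesort" here
EXPLAIN SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;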

The disadvantages of indexing

The benefits of indexes are clear, but we cannot look only at these benefits and treat indexes as the bible of Query optimization, adding every condition in the WHERE clause to an index as soon as a Query is found to be running too slowly.

Indeed, indexes can greatly improve the efficiency of data retrieval and the performance of sorting and grouping operations, but one problem cannot be ignored: an index is a separate copy of data, completely independent of the base table data. Suppose the index idx_ta_ca is created on Column ca of Table ta. For any operation that updates Column ca, MySQL must update the index data for Column ca as well as the column in the table, and adjust the index entries whose key values changed because of the update. If Column ca is not indexed, all MySQL has to do is update Column ca in the table itself. The most obvious extra resource consumption is therefore the increased IO caused by the update and the extra computation needed to adjust the index. In addition, the index idx_ta_ca occupies storage space, and as the amount of data in Table ta grows, the space occupied by idx_ta_ca keeps growing with it, so indexes also increase the consumption of storage space.

How to determine whether to create an index

After understanding the advantages and disadvantages of an index, how do we determine whether an index should be created?

In fact, there is no clear-cut rule that defines which fields should be indexed and which should not, because application scenarios are too complex and too varied. Still, some basic decision strategies can help with the analysis.

1. Fields that are frequently used as query conditions should be indexed.

The most effective way to improve the efficiency of data queries is to reduce the amount of data that has to be accessed. From the benefits of indexes described above, we know that an index is the most effective means of reducing the IO volume of a Query that uses the index key fields as query conditions. Therefore, in general, indexes should be created on fields that are frequently used as query conditions.

2. Fields with very poor uniqueness are not suitable for a separate index, even if they are frequently used as query conditions.

What are fields with very poor uniqueness? Status fields, type fields, and the like, whose stored data is drawn from only a handful or a few dozen distinct values, each of which appears in thousands of records or more. For such fields there is usually no need to create a separate index, because even if one is created, MySQL Query Optimizer will not choose to use it most of the time. And if MySQL Query Optimizer does choose such an index, it may well cause serious performance problems: since each value of the indexed field covers a large number of records, accessing the data through the index generates a lot of random IO in the storage engine, and sometimes a lot of repeated IO as well.

This is mainly due to how data is accessed through an index scan. When we access table data through an index, MySQL follows the key-value order of the index key. Generally speaking, each data page stores multiple records, and most of those records are unlikely to be in the same order as the index key being used.

Suppose the following scenario: we use the index to find records whose key values are A and B. After finding the first matching record via key value A, MySQL reads data page X where that record lives, then continues down the index and finds that another record with key value A also matches, but this record is not on data page X; it is on data page Y. The storage engine then discards page X and reads page Y. This continues until all records with key value A have been found. Then it is key value B's turn, and it turns out the record being sought is on page X again, but page X, read earlier, has already been discarded, so it has to be read once more. At this point page X has in fact been read twice. As the search continues, such repeated reads may happen again and again, which clearly increases the amount of IO the storage engine has to perform.

Not only that: if a key value corresponds to too many records, that is, if the records returned through this key value make up a large proportion of the whole table, then because an index scan produces random IO, its efficiency is much lower than the sequential IO of a full table scan. Even without repeated reads, overall IO performance still suffers.

Many experienced query tuning experts say that when a query returns more than 15% of the rows in a table, an index scan should not be used. We cannot judge how precise that "15%" figure is, but it at least shows that fields with poor uniqueness are not good candidates for their own index.
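As a hedged illustration of such a low-selectivity field (table and values invented for the example), consider a status column with only a handful of distinct values shared by millions of rows:

Sql code
-- hypothetical example: "status" holds only a few distinct values
CREATE INDEX idx_orders_status ON orders (status);
-- if 'active' matches a large share of the table, a full scan is often cheaper,
-- and EXPLAIN may show type=ALL (full table scan) even though the index exists
EXPLAIN SELECT * FROM orders WHERE status = 'active';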

3. Fields that are updated very frequently are not suitable for indexing.

As analyzed above in the disadvantages of indexes, when an indexed field is updated, not only the table data but also the index data must be updated so that the index stays accurate. This leads to a large increase in IO, which not only hurts the response time of the updating Query but also increases the resource consumption and load of the whole storage system.

Of course, this does not mean that any field that gets updated is unsuitable for indexing; as the decision strategy says, it applies to "very frequently" updated fields. What counts as "very frequent"? Once per second? Per minute? Per hour? Honestly, it is hard to define. Most of the time it is judged by comparing, over the same period, the number of updates against the number of queries that use this field as a condition. If the field is rarely queried, perhaps only once every few hours or less, while updates are far more frequent than queries, then such a field is definitely not suitable for an index. Conversely, if the field is queried frequently and the updates are not particularly numerous, say dozens of queries or more for each update, then I personally think the additional cost of the updates is acceptable.

4. Fields that do not appear in the WHERE clause should not be indexed.
No one is going to ask why, right? I also think this one is stating the obvious, haha!
8.4.6 Single-key index or composite index

Now that we have a general understanding of the index types in MySQL, the pros and cons of indexes, and how to judge whether a field needs an index, we can turn to creating indexes to optimize a Query. In many cases the filter conditions in the WHERE clause are not on a single field; several fields often appear together as filter conditions. The question then is whether to index only the most selective field, or to build a composite index on all of the fields in the filter conditions.

There is hardly an absolute answer to this question; it has to be analyzed from several angles, weighing the advantages and disadvantages of the two approaches and choosing the better one. From the previous section we know that an index improves the performance of some queries but reduces the efficiency of some updates. A composite index contains several fields, so in theory the chance that one of them gets updated is much higher than for a single-key index, and the extra maintenance cost is correspondingly higher. On the other hand, when the query conditions in the WHERE clause cover multiple fields, a composite index built on those fields is definitely more efficient than an index built on just one of them, because the data filtered by a single-key index is incomplete: compared with the composite index, the storage engine must visit more records, which means more data accessed and higher IO cost.

Some readers may say: just create several single-key indexes. It is indeed possible to create a single-key index for each field in the WHERE clause. But does that really work well? In such a case, MySQL Query Optimizer will most of the time pick only one of those indexes and ignore the rest. Even if it chooses to use two or more indexes at once through an index merge (INDEX_MERGE), the result may not be more efficient than simply using the single most selective index, because an index merge has to access several indexes and then combine their results, which may cost more than using the one most efficient index.
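A small sketch of this trade-off, with invented table and column names: two single-column indexes may be combined through index merge, but a composite index on both columns usually serves the same query more cheaply:

Sql code
-- hypothetical example: two single-column indexes
CREATE INDEX idx_t_col1 ON t (col1);
CREATE INDEX idx_t_col2 ON t (col2);
-- EXPLAIN may show type=index_merge with "Using intersect(idx_t_col1,idx_t_col2)"
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
-- the composite alternative: one index covering both filter columns
CREATE INDEX idx_t_col1_col2 ON t (col1, col2);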

In typical application scenarios, unless one of the filter fields by itself can filter out more than 90% of the data in most cases while the other filter fields are updated very frequently, I generally lean toward creating a composite index, especially under high concurrency. When concurrency is high, even a small amount of IO saved per Query adds up to a very considerable saving because of the sheer number of executions.

Of course, creating a composite index does not mean putting every field from the query conditions into one index. We should also try to let a single index serve multiple Query statements, and try to keep the number of indexes on a table small, so as to reduce the index maintenance cost caused by updates and the storage space consumed by the indexes.
In addition, MySQL provides another index optimization feature: prefix indexes. In MySQL you can index only the leading part of a field's value, which reduces the storage space the index occupies and improves index access efficiency. Of course, prefix indexes are only suitable for fields whose prefixes have little duplication. If the prefixes of the field to be indexed repeat a lot, the selectivity of the index drops and the amount of data accessed through the index grows; in that case, although a prefix index saves storage space, it may greatly reduce Query efficiency and do more harm than good.
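A minimal sketch of a prefix index, assuming a hypothetical articles table with a long url column whose leading characters are reasonably selective:

Sql code
-- index only the first 20 characters of the column
ALTER TABLE articles ADD INDEX idx_articles_url (url(20));
-- a rough way to check how selective a prefix length is before choosing it
SELECT COUNT(DISTINCT LEFT(url, 20)) / COUNT(*) AS prefix_selectivity FROM articles;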


Excerpted from http://www.canphp.com/article/show-130.html

Compound index optimization

An index on two or more columns is called a compound index.
The additional columns in the index allow you to narrow a search, but using one index on two columns is not the same as using two separate single-column indexes. The structure of a compound index is similar to a phone book: a person's name consists of a last name and a first name, and the phone book is sorted first by last name, then by first name for people with the same last name. A phone book is very useful if you know the last name, even more useful if you know both the first and last name, but useless if you only know the first name and not the last name.
So when creating a compound index, consider the column order carefully. A compound index is useful for searches on all of its columns, or on the leading columns only; it is not useful for searches that use only the trailing columns.
For example: build a compound index on name, age, and gender.

The principles of building a composite index:

If you will often search on a single column, that column should be the first column in the compound index. If you will also perform separate searches on the second column of a two-column index, you should create another index that covers that second column on its own.
In the name/age/gender example above, if queries also need to filter on age and gender alone, you should create a new composite index on age and gender (see the sketch after these principles).
A multi-column primary key is always automatically indexed as a composite index, with the columns in the order in which they appear in the table definition rather than the order given in the primary key definition. When deciding which column should come first, consider the searches that will be performed through the primary key.
Note that a composite index should contain only a few columns, and those columns should be ones frequently used in SELECT queries. Including too many columns in a compound index brings little additional benefit, and since a considerable amount of memory is needed to store the column values of the composite index, the result can be excessive memory use and degraded performance.
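Following the name/age/gender example above, here is a minimal sketch of these principles (the staff table and its column types are assumed for illustration):

Sql code
-- compound index with the most frequently searched column first
CREATE INDEX idx_staff_name_age_gender ON staff (name, age, gender);
-- served by the compound index: leftmost-prefix searches such as
SELECT * FROM staff WHERE name = 'Tom' AND age = 30;
-- searches on age and gender alone skip the leftmost column, so a second index is needed
CREATE INDEX idx_staff_age_gender ON staff (age, gender);
SELECT * FROM staff WHERE age = 30 AND gender = 'M';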

        
Compound index optimization for sorting:

A compound index can only optimize an ORDER BY whose columns are in the same order and direction as the index, or in the exact reverse order and direction.
When creating a composite index, each column can be defined as ascending or descending. For example, define a composite index:
Sql code 
CREATE INDEX idx_example   
ON table1 (col1 ASC, col2 DESC, col3 ASC) 
 
There are three columns: col1 ascending, col2 descending, col3 ascending. Now suppose we execute two queries:
1: Select col1, col2, col3 from table1 order by col1 ASC, col2 DESC, col3 ASC
   (the sort order is the same as the index order)
2: Select col1, col2, col3 from table1 order by col1 DESC, col2 ASC, col3 DESC
   (the sort order is the exact reverse of the index order)
Both query 1 and query 2 can be optimized by this compound index.
If the query is:
Select col1, col2, col3 from table1 order by col1 ASC, col2 ASC, col3 ASC

the sort order is neither the same as nor the exact reverse of the index order, so the composite index cannot be used to optimize this query.


The role of the query optimizer in where queries:

If a multicolumn index exists on columns col1 and col2, then for a statement such as Select * from table where col1=val1 AND col2=val2, the query optimizer tries to determine which available index will find fewer rows, and then uses that index to fetch the values.
1. If a multicolumn index exists, any leftmost prefix of the index can be used by the optimizer. The column order of a composite index therefore matters and affects whether the index is chosen; try to put the more selective columns (those with fewer duplicate values) first.
For example: if a multi-column index is (col1, col2, col3),
    then searches on the index columns (col1), (col1, col2), and (col1, col2, col3) can all use the index.

Sql code
SELECT * FROM tb WHERE col1 = val1  
SELECT * FROM tb WHERE col1 = val1 and col2 = val2  
SELECT * FROM tb WHERE col1 = val1 and col2 = val2 AND col3 = val3 
 

2. If the queried columns do not form a leftmost prefix of the index, the index will not be used.
Such as:
Sql code
SELECT * FROM tb WHERE col3 = val3  
SELECT * FROM tb WHERE col2 = val2  
SELECT * FROM tb WHERE col2 = val2 and col3=val3 
 
3. A LIKE condition can use an index only when the pattern does not start with a wildcard.
For example: '%car' and '%car%' do not use an index;
    'car%' uses the index.
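A minimal illustration, assuming a hypothetical products table with an index on its name column:

Sql code
-- can use the index: the pattern has a fixed prefix
SELECT * FROM products WHERE name LIKE 'car%';
-- cannot use the index: the pattern starts with a wildcard
SELECT * FROM products WHERE name LIKE '%car';
SELECT * FROM products WHERE name LIKE '%car%';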
Disadvantages of indexes:
1. Takes up disk space.
2. They increase the time of insert and delete operations. The more indexes a table has, the slower insertion and deletion become; for systems that need fast data entry, it is not advisable to build too many indexes.

The following are some common index restriction problems

1. Using the not-equal operators (<>, !=)
In this case, even if there is an index on the column staff_num, the query still performs a full table scan:
select * from dept where staff_num <> 1000;
But such a query is genuinely needed in development; is there really no solution?
There is!
By replacing the inequality with an OR of two range conditions, the index can be used and the full table scan avoided. The statement above can be rewritten as follows, and the index can then be used.
Sql code
select * from dept where staff_num < 1000 or staff_num > 1000;
 

2. Using is null or is not null
Using is null or is not null also limits the use of indexes, because null values are not stored in the index. If there are many nulls in the indexed column, the index will not be used (unless it is a bitmap index, which will be explained in detail in a future post). Using null in SQL statements can cause a lot of trouble.
The solution to this problem is to define columns that need to be indexed as NOT NULL when creating the table.

3. Using functions on indexed columns
Applying a function to an indexed column causes the optimizer to ignore the index. The following query will not use the index:
Sql code
select * from staff where trunc(birthdate) = '01-MAY-82'; 
 
But if the function is applied to the constant side of the condition rather than to the indexed column, the index can still take effect. Rewrite the statement as follows, and the search can go through the index.
Sql code
select * from staff where birthdate < (to_date('01-MAY-82') + 0.9999); 
 

4. Comparing mismatched data types
Comparing mismatched data types is also one of the performance problems that are difficult to find.
In the following example, dept_id is a varchar2 type field and there is an index on this field, but the following statement will perform a full table scan.
Sql code
select * from dept where dept_id = 900198; 
 
This is because Oracle automatically converts the where clause to to_number(dept_id)=900198, which is the same situation as in case 3 above and limits the use of the index.
Change the SQL statement to the following form and the index can be used:
Sql code
select * from dept where dept_id = '900198'; 
 

Well, there is something to pay attention to here:

from Lao Wang's blog (http://hi.baidu.com/thinkinginlamp/blog/item/9940728be3986015c8fc7a85.html)

For example, suppose there is an articles table and we want to display a reverse-chronological list of articles under a certain category:

SELECT * FROM articles WHERE category_id = ... ORDER BY created DESC LIMIT ...

Queries like this are very common; you can find plenty of similar SQL in almost any application. Academically minded readers will point out that SELECT * is bad practice and that only the required fields should be queried, so to be thorough let's change the SQL to the following form:

SELECT id FROM articles WHERE category_id = ... ORDER BY created DESC LIMIT ...



We assume id is the primary key; the article content itself can be stored in a key-value cache such as memcached. Now even the academically minded readers should have nothing left to criticize. So how should we build an index for this SQL?

Setting aside special cases such as skewed data distribution, any competent web developer knows that for SQL like this one should build a composite index on (category_id, created). But is that the best answer? Not necessarily; this is where the point of the original post's title comes in: indexing in MySQL should take the storage engine into account!

If the storage engine is InnoDB, then a composite index on (category_id, created) is the best answer. Look at InnoDB's index structure: a secondary (non-primary-key) index additionally stores the corresponding primary key value in the leaf nodes of its B-tree. The most direct benefit of this is the covering index: the value of id can be obtained directly from the index without going to the data file.

If the storage engine is MyISAM, a composite index on (category_id, created) is not the best answer, because in MyISAM's index structure a non-primary-key index does not additionally store the corresponding primary key value. To get a covering index in this case, you should build a composite index on (category_id, created, id).
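To make the two cases concrete, here is a hedged sketch of the index DDL described above (id is assumed to be the primary key of articles; the category value below is only a placeholder):

Sql code
-- InnoDB: secondary index leaf nodes already carry the primary key, so this index covers the query
CREATE INDEX idx_articles_cat_created ON articles (category_id, created);
-- MyISAM: id must be added explicitly for the index to cover the query
CREATE INDEX idx_articles_cat_created_id ON articles (category_id, created, id);
-- with a covering index, EXPLAIN should show "Using index" for a query like
SELECT id FROM articles WHERE category_id = 1 ORDER BY created DESC LIMIT 10;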

By now you should see the point: when thinking about indexes, think more comprehensively. There are many similar issues in practice. For example, most people do not look at Cardinality (visible via SHOW INDEX FROM ...) when deciding whether an index is appropriate. Cardinality represents the number of distinct values; generally speaking, if the proportion of distinct values to the total number of rows is below roughly 20%, the Cardinality can be considered too low, and the index will mostly just slow down insert/update/delete without helping select much. Another detail is the character set chosen for indexed columns: for a field such as username that only allows English letters, underscores, and similar symbols, do not use character sets like gbk or utf-8; use a simple character set such as latin1 or ascii instead, so the index file is much smaller and access is correspondingly faster. Details like these are left for readers to pay attention to on their own.
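A hedged sketch of the Cardinality check mentioned above (table and column names illustrative):

Sql code
-- Cardinality is MySQL's estimate of the number of distinct values covered by the index
SHOW INDEX FROM articles;
-- a rough manual check of selectivity: distinct values as a fraction of total rows
SELECT COUNT(DISTINCT category_id) / COUNT(*) AS selectivity FROM articles;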
Excerpted from http://blog.chinaunix.net/uid-7692530-id-2567605.html
For a statement whose two conditions are joined with AND, when the correlation between the two columns is low, a multi-column index has some advantage.
For a statement whose two conditions are joined with AND, when the correlation between the two columns is high, a multi-column index has a great advantage.
For a statement whose two conditions are joined with OR, separate single-column indexes have the advantage, because in that case a multi-column index leads to a full table scan, while separate indexes can use the index merge optimization.
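A small sketch of these cases, with invented table and column names:

Sql code
-- conditions joined with AND: one multi-column index can satisfy both predicates
CREATE INDEX idx_t_a_b ON t (a, b);
SELECT * FROM t WHERE a = 1 AND b = 2;
-- conditions joined with OR: separate single-column indexes allow an index merge union,
-- while a composite (a, b) index alone cannot serve "b = 2" and may force a full scan
CREATE INDEX idx_t_a ON t (a);
CREATE INDEX idx_t_b ON t (b);
SELECT * FROM t WHERE a = 1 OR b = 2;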
