MySQL Topic III: B+ Tree Indexes in Practice

I. The Cost of Indexes

1. Space cost
Each index is a B+ tree, and every node of the tree is a data page. A page occupies 16KB of storage by default, so every index takes up additional disk space.

2. Time cost
An index keeps its data in order, so whenever rows are inserted, deleted, or updated, the B+ tree of every affected index must be maintained. Inserts, deletes, and updates may therefore spend extra time moving records, splitting pages, reclaiming pages, and so on, to keep the order intact.
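To make the maintenance cost concrete, here is a minimal Python sketch of an index leaf level kept as a list of sorted pages. The capacity of 4 records per page and the even split are simplifying assumptions, and `insert_key` is a hypothetical helper; InnoDB's real page size, fill factor, and split strategy differ, and page reclamation is not modeled. It shows why an insert can shift records inside a page or split a full page in two.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # assumption for illustration; a real 16KB page holds far more records

def insert_key(pages, key):
    """Insert `key` into a list of sorted, non-empty leaf pages.

    Returns True if the target page overflowed and was split in two.
    Keys stay sorted inside each page and across pages.
    """
    firsts = [page[0] for page in pages]
    i = max(bisect_right(firsts, key) - 1, 0)  # page whose key range should hold `key`
    insort(pages[i], key)                       # later records in the page shift right
    if len(pages[i]) > PAGE_CAPACITY:
        half = len(pages[i]) // 2
        pages[i:i + 1] = [pages[i][:half], pages[i][half:]]  # page split
        return True
    return False

pages = [[10, 20, 30, 40]]
print(insert_key(pages, 25), pages)  # split: the full page becomes two pages
print(insert_key(pages, 5), pages)   # no split: records shift inside the first page
```

Starting from one full page `[10, 20, 30, 40]`, inserting 25 splits it into `[10, 20]` and `[25, 30, 40]`; inserting 5 afterwards only shifts records within the first page.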

II. B+ Tree Indexes in Practice

The cases below build on the data from the previous topic in this series (see that topic for the setup). All of them use the composite (joint) index on columns (b, c, d) of table t1:

  • 1. Matching all index columns
select * from t1 where b = 1 and c = 1 and d = 1;

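The order of the conditions does not matter: the query optimizer analyzes the search conditions and decides which one to apply first according to the order of the columns in the usable index, so writing d = 1 and c = 1 and b = 1 works the same way.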
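As a mental model, the composite index can be pictured as a sorted list of (b, c, d, primary key) entries. The Python sketch below is only that model, not InnoDB's on-disk structure; the rows and the helper `lookup_exact` are hypothetical. A full match becomes a single binary search on the key (b, c, d), whatever order the conditions were written in.

```python
from bisect import bisect_left

# Hypothetical rows of t1: (primary key a, b, c, d). Sample data for illustration only.
ROWS = [(1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 2, 1), (4, 2, 1, 1), (5, 1, 1, 1)]

# Toy composite index (b, c, d): entries sorted by (b, c, d), each carrying the primary key.
INDEX_BCD = sorted((b, c, d, a) for a, b, c, d in ROWS)

def lookup_exact(index, b, c, d):
    """Primary keys of entries whose b, c and d all match (full-key match)."""
    i = bisect_left(index, (b, c, d))  # a 3-tuple sorts before every 4-tuple extending it
    keys = []
    while i < len(index) and index[i][:3] == (b, c, d):
        keys.append(index[i][3])
        i += 1
    return keys

# Conditions written in any order are mapped onto the index column order (b, c, d).
conditions = {"d": 1, "c": 1, "b": 1}
print(lookup_exact(INDEX_BCD, conditions["b"], conditions["c"], conditions["d"]))
```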

  • 2. Matching the leftmost columns (the leftmost-prefix principle)
select * from t1 where b = 1;
select * from t1 where b = 1 and c = 1;

The following SQL cannot use the index:

select * from t1 where c = 1;

The B+ tree is sorted by column b first, and column c is sorted only among records that share the same b value, so records with different b values can have c values in any order. Skipping b and searching directly by c is therefore impossible.
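A toy model makes the leftmost-prefix rule visible. Picture the composite index as a sorted list of (b, c, d, primary key) tuples (invented values, hypothetical helper names). Any run of leading columns maps to one contiguous slice found by binary search, while column c on its own is not sorted. Comparing against `math.inf` assumes numeric column values.

```python
import math
from bisect import bisect_left

# Toy composite index (b, c, d, primary key), kept in index order. Invented values.
INDEX_BCD = sorted([(1, 3, 1, 10), (1, 1, 2, 11), (2, 2, 1, 12), (2, 1, 1, 13), (3, 1, 5, 14)])

def prefix_range(index, *prefix):
    """Entries whose leading columns equal `prefix` (leftmost match), via binary search."""
    lo = bisect_left(index, prefix)                # a shorter tuple sorts before its extensions
    hi = bisect_left(index, prefix + (math.inf,))  # first entry past the prefix
    return index[lo:hi]

def scan_by_c(index, c):
    """WHERE c = ? without b: there is no sorted order to search, so check every entry."""
    return [entry for entry in index if entry[1] == c]

c_column = [entry[1] for entry in INDEX_BCD]
print(c_column == sorted(c_column))   # False: c is sorted only within each b group
print(prefix_range(INDEX_BCD, 1))     # WHERE b = 1
print(prefix_range(INDEX_BCD, 1, 1))  # WHERE b = 1 AND c = 1
print(scan_by_c(INDEX_BCD, 1))        # WHERE c = 1: full scan
```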

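  • 3. Matching a column prefix
    A string index is sorted character by character from the start of the string, so a condition on the leading characters can use the index. If only the middle or the suffix of the string is given, for example: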
select * from t1 where b like '%101%';

This query cannot use the index: strings containing '101' in the middle are not stored next to each other in sorted order, so only a full table scan works. Sometimes we need to match a string suffix instead. For example, suppose a table has a url column that stores many URLs:

www.baidu.com 
www.google.com 
www.qq.com

Assume an index has already been created on the url column. To find URLs ending in com we could write WHERE url LIKE '%com', but that query cannot use the index on url.
To use the index instead of scanning the whole table, we can turn the suffix query into a prefix query. That requires storing every value of the column reversed, so the url column holds the data below, and the query becomes WHERE url LIKE 'moc%':

moc.udiab.www 
moc.elgoog.www
moc.qq.www
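A small Python sketch of the reversed-column trick, assuming the index stores the reversed strings (the helper `ends_with` and the extra .org URL are made up for illustration). The suffix becomes a prefix, so all matches form one contiguous run in sorted order and the scan can stop at the first non-match.

```python
from bisect import bisect_left

urls = ["www.baidu.com", "www.google.com", "www.qq.com", "www.example.org"]

# Hypothetical index over the reversed column: (reversed url, original url), sorted.
reversed_index = sorted((u[::-1], u) for u in urls)

def ends_with(index, suffix):
    """Answer `url LIKE '%suffix'` as a prefix search on the reversed strings."""
    prefix = suffix[::-1]                  # 'com' -> 'moc'
    i = bisect_left(index, (prefix,))      # jump straight to the first candidate
    hits = []
    while i < len(index) and index[i][0].startswith(prefix):  # matches are contiguous
        hits.append(index[i][1])
        i += 1
    return hits

print(ends_with(reversed_index, "com"))
```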
  • 4. Matching a range of values
select * from t1 where b > 1 and b < 20000;

Because the data pages and records of the B+ tree are sorted by column b first, the query above is executed as follows:

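1) Locate the first record whose b value is greater than 1.
2) Locate the last record whose b value is less than 20000.
3) All records are connected by linked lists (records within a page by a singly linked list, data pages by a doubly linked list), so the records between these two positions can be read out easily.
4) Take the primary key values of these records and look up the complete rows in the clustered index (going back to the table).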
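The four steps can be modeled in Python with a hypothetical clustered index (a dict from primary key to full row) and the (b, c, d) index as sorted tuples. The row layout (a, b, c, d, e) and the data are assumptions for illustration. The binary searches stand in for descending the B+ tree, the slice for following the leaf links, and the dict lookups for going back to the table.

```python
from bisect import bisect_left, bisect_right

# Hypothetical clustered index of t1: primary key a -> full row (a, b, c, d, e).
CLUSTERED = {
    1: (1, 5, 1, 1, "x"),
    2: (2, 1, 2, 3, "y"),
    3: (3, 25000, 1, 1, "z"),
    4: (4, 300, 4, 4, "w"),
}

# Secondary index (b, c, d): sorted (b, c, d, a) entries, like the leaf level of the B+ tree.
INDEX_BCD = sorted((b, c, d, a) for a, b, c, d, _ in CLUSTERED.values())

def range_then_lookup(low, high):
    """WHERE b > low AND b < high: range-scan the index, then go back to the table."""
    start = bisect_right(INDEX_BCD, (low, float("inf")))  # step 1: first entry with b > low
    end = bisect_left(INDEX_BCD, (high,))                 # step 2: just past the last entry with b < high
    pks = [entry[3] for entry in INDEX_BCD[start:end]]    # step 3: the contiguous run between them
    return [CLUSTERED[pk] for pk in pks]                  # step 4: fetch each full row by primary key

print(range_then_lookup(1, 20000))
```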

Note, however, that when range conditions are placed on several columns of a composite index at the same time, only the range on the leftmost column can be used to search the B+ tree, for example:

select * from t1 where b > 1 and c > 1;

This query can be split into two parts:
1) Use the condition b > 1 to perform a range search on b; the result may contain records with many different b values.
2) Filter those records further with c > 1.
So only the b part of the composite index can be used, not the c part. Column c is sorted only among records with the same b value, and the records found by the range search on b are not necessarily in c order, so the condition on c cannot be used to search the B+ tree.
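A toy model of the two parts above, with the composite index pictured as sorted (b, c, d, primary key) tuples (invented values, hypothetical helper name). The range on b picks out a contiguous slice, but inside that slice the c values are out of order, so c > 1 can only be applied as a filter on every scanned entry.

```python
from bisect import bisect_right

# Toy composite index (b, c, d, primary key) in index order; invented values.
INDEX_BCD = sorted([(1, 5, 1, 1), (2, 0, 1, 2), (2, 7, 1, 3), (3, 1, 1, 4), (4, 0, 9, 5)])

def b_gt_and_c_gt(index, b_low, c_low):
    """WHERE b > b_low AND c > c_low on the index (b, c, d)."""
    start = bisect_right(index, (b_low, float("inf")))  # part 1: the tree search uses b only
    scanned = index[start:]                             # every entry with b > b_low
    matched = [e for e in scanned if e[1] > c_low]      # part 2: c > c_low checked entry by entry
    return scanned, matched

scanned, matched = b_gt_and_c_gt(INDEX_BCD, 1, 1)
print([e[1] for e in scanned])  # c values inside the b range are out of order
print(matched)
```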

  • 5. Exact match on one column plus a range on the next

For the same composite index, when several columns carry range conditions only the leftmost one can be used for the range search. If the leftmost column is matched exactly, however, the column to its right can be searched by range, for example:

select * from t1 where b = 1 and c > 1;
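With b fixed, the index entries for that b are sorted by c, so both ends of the scan come from binary search and nothing has to be filtered afterwards. A sketch with a toy (b, c, d, primary key) index, invented data, and a hypothetical helper name:

```python
from bisect import bisect_left, bisect_right

# Toy composite index (b, c, d, primary key) in index order; invented values.
INDEX_BCD = sorted([(1, 1, 1, 1), (1, 2, 1, 2), (1, 9, 3, 3), (2, 5, 1, 4), (0, 8, 1, 5)])

def b_eq_c_gt(index, b_value, c_low):
    """WHERE b = b_value AND c > c_low: both columns bound the index search."""
    start = bisect_right(index, (b_value, c_low, float("inf")))  # first entry with this b and c > c_low
    end = bisect_left(index, (b_value, float("inf")))            # first entry with a larger b
    return index[start:end]                                      # contiguous slice, no extra filtering

print(b_eq_c_gt(INDEX_BCD, 1, 1))
```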
  • 6. Sorting
select * from t1 order by b, c, d;

The result set must be sorted by b; records with the same b value must be sorted by c; and records with the same c value must be sorted by d. Because the B+ tree index is already sorted by exactly these rules, records can be read from the index in order, and any columns the index does not contain are fetched by going back to the table, with no separate sort step.
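A minimal sketch of the difference, assuming a toy table keyed by primary key with columns (b, c, d, e): walking the (b, c, d) index in order already yields ORDER BY b, c, d, while without the index every row must be collected and sorted (the case MySQL's EXPLAIN reports as "Using filesort"). The data and function names are hypothetical.

```python
# Hypothetical rows of t1 keyed by primary key: (b, c, d, e). Sample data only.
TABLE = {10: (2, 1, 1, "p"), 11: (1, 3, 2, "q"), 12: (1, 3, 1, "r"), 13: (1, 2, 9, "s")}

# The composite index (b, c, d) keeps (b, c, d, pk) entries permanently sorted.
INDEX_BCD = sorted((b, c, d, pk) for pk, (b, c, d, _) in TABLE.items())

def order_by_bcd_via_index():
    """ORDER BY b, c, d: read index entries in order, fetch each full row; no sort step."""
    pks = [entry[3] for entry in INDEX_BCD]
    return [(pk,) + TABLE[pk] for pk in pks]

def order_by_bcd_filesort():
    """Without the index: read every row, then sort them in memory."""
    rows = [(pk,) + row for pk, row in TABLE.items()]
    return sorted(rows, key=lambda r: (r[1], r[2], r[3]))

print(order_by_bcd_via_index() == order_by_bcd_filesort())
```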

  • 7. Grouping
select b, c, d, count(*) from t1 group by b, c, d;

This query effectively performs three grouping operations:
1) First group the records by b, putting all records with the same b value into one group.
2) Within each b group, group the records again by c, putting records with the same c value into one subgroup.
3) Split each subgroup from the previous step into smaller groups by d.

Without an index, all of this grouping has to be done in memory. With an index whose column order matches the grouping order, the B+ tree index can be used to group the records directly.
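Because equal (b, c, d) keys sit next to each other in the index, grouping can be done in a single ordered pass. This Python sketch uses `itertools.groupby`, which only merges adjacent items and therefore relies on exactly that ordering; the index entries and the helper name are invented for illustration.

```python
from itertools import groupby

# Toy composite index (b, c, d, primary key) in index order; invented values.
INDEX_BCD = sorted([(1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 1, 3), (2, 1, 1, 4), (2, 1, 1, 5), (2, 1, 1, 6)])

def group_count_via_index(index):
    """SELECT b, c, d, count(*) ... GROUP BY b, c, d in one pass over the index.

    groupby() only merges adjacent entries, which is correct here because
    equal (b, c, d) keys are contiguous in index order.
    """
    return [key + (sum(1 for _ in grp),) for key, grp in groupby(index, key=lambda e: e[:3])]

print(group_count_via_index(INDEX_BCD))
```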

  • 8. Notes on sorting or grouping with a composite index

With a composite index, note that the columns in the ORDER BY clause must be listed in the same order as the index columns. If the order is given as ORDER BY c, b, d, the B+ tree index cannot be used.
Similarly, ORDER BY b and ORDER BY b, c, which match the leftmost part of the index, can use the B+ tree index. When the leftmost columns of the composite index are matched against constants in the WHERE clause, the columns after them can still be used for sorting, like this:

select * from t1 where b = 1 order by c, d;
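The rules in this item can be condensed into a simplified check, sketched in Python. It is a hypothetical model, not the optimizer's actual logic (it ignores ASC/DESC, expressions, and cost): the ORDER BY columns must follow the index column order without gaps, except that index columns pinned by an equality with a constant in WHERE may be skipped.

```python
def index_can_sort(index_cols, order_by, const_cols=()):
    """Simplified check: can ORDER BY `order_by` be read straight from the index?

    index_cols: index column names in index order, e.g. ["b", "c", "d"].
    const_cols: columns fixed by `col = constant` in WHERE; they may be skipped.
    """
    pos = 0
    for col in order_by:
        # Skip index columns pinned to a constant: they do not change the order.
        while pos < len(index_cols) and index_cols[pos] != col and index_cols[pos] in const_cols:
            pos += 1
        if pos == len(index_cols) or index_cols[pos] != col:
            return False  # gap or out-of-order column: a separate sort would be needed
        pos += 1
    return True

print(index_can_sort(["b", "c", "d"], ["c", "d"], const_cols={"b"}))
```

For example, the call above accepts `WHERE b = 1 ORDER BY c, d`, while `index_can_sort(["b", "c", "d"], ["c", "b", "d"])` rejects the out-of-order case.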
  • 9. Cases where the index cannot be used for sorting or grouping

Mixing ASC and DESC
When a composite index is used for sorting, all sorted columns must use the same direction: either every column is sorted ASC or every column is sorted DESC.
A column in the ORDER BY clause with neither ASC nor DESC is sorted ASC by default, i.e. in ascending order.

select * from t1 order by b ASC, c DESC;

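With an ordinary ascending index, this query cannot use the index for sorting. (MySQL 8.0 supports descending index columns, so an index defined as (b ASC, c DESC) could serve this order.)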
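Why the mix fails can be seen on a toy ascending index on (b, c): scanning it forward gives b ASC, c ASC, scanning it backward gives b DESC, c DESC, and b ASC, c DESC matches neither, so an extra sort is required. The values are invented, and `directions_usable` is a hypothetical one-line summary of the rule for an ordinary ascending index.

```python
# Toy ascending index on (b, c); invented values.
INDEX_BC = sorted([(2, 1), (1, 2), (2, 2), (1, 1)])

forward = list(INDEX_BC)                               # ORDER BY b ASC, c ASC
backward = list(reversed(INDEX_BC))                    # ORDER BY b DESC, c DESC
mixed = sorted(INDEX_BC, key=lambda e: (e[0], -e[1]))  # ORDER BY b ASC, c DESC

def directions_usable(directions):
    """True if every ORDER BY column uses the same direction (a missing one means ASC)."""
    normalized = {(d or "ASC").upper() for d in directions}
    return len(normalized) <= 1

print(mixed in (forward, backward))  # False: neither scan direction yields this order
```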

III. How to Create Indexes

  • 1. Consider index selectivity

Index selectivity is the ratio of the number of distinct index values (also called the cardinality) to the number of records in the table:

	Selectivity = Cardinality / Number of records

Selectivity lies in the range (0, 1], and the higher it is, the more valuable the index. A selectivity of 1 means the column's values never repeat (the number of distinct values equals the number of records), so the column is a very good candidate for an index. A very small selectivity means the column's values repeat a lot, and the column is not suitable for indexing.
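The formula translates directly into a few lines of Python. `selectivity` is a hypothetical helper that takes a column's values as a list, standing in for `count(DISTINCT col) / count(*)`; the sample values are made up.

```python
def selectivity(values):
    """Cardinality (number of distinct values) divided by the number of records.

    Mirrors count(DISTINCT col) / count(*), except that NULL handling is ignored.
    """
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty table")
    return len(set(values)) / len(values)

print(selectivity([101, 102, 103, 104]))       # 1.0: no repeats, a good index candidate
print(selectivity(["M", "F", "M", "F", "M"]))  # 0.4: heavy repetition, a poor candidate
```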

  • 2. Consider prefix indexes

A prefix index uses a prefix of a column, instead of the whole column, as the index key. With a suitable prefix length, the selectivity of a prefix index can come close to that of a full-column index, while the shorter key reduces the size of the index file and its maintenance cost.

	The examples use the sample database provided on the MySQL website: https://dev.mysql.com/doc/employee/en/employees-installation.html
	GitHub: https://github.com/datacharmer/test_db.git

The employees table has only one index, <emp_no> (the primary key), so searching for a person by name requires a full table scan; EXPLAIN shows that no index can be used for this query:

EXPLAIN SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido';

We could index <first_name> or <first_name, last_name>. Compare the selectivity of the two:

SELECT count(DISTINCT(first_name))/count(*) AS Selectivity FROM employees.employees; -- 0.0042
SELECT count(DISTINCT(concat(first_name, last_name)))/count(*) AS Selectivity FROM employees.employees; -- 0.9313

The selectivity of <first_name> is clearly too low. <first_name, last_name> has good selectivity, but first_name and last_name together can be up to 30 characters long. Is there a way to balance length and selectivity? We can index first_name plus the first few characters of last_name, e.g. <first_name, left(last_name, 3)>, and check its selectivity:

SELECT count(DISTINCT(concat(first_name, left(last_name, 3))))/count(*) AS Selectivity FROM employees.employees; -- 0.7879

The selectivity is quite good but still some way from 0.9313, so increase the last_name prefix to 4:

SELECT count(DISTINCT(concat(first_name, left(last_name, 4))))/count(*) AS Selectivity FROM employees.employees; -- 0.9007

The selectivity is now good enough, and the index key is only 18 characters long, almost half the length of <first_name, last_name>. Create the prefix index like this:

ALTER TABLE employees.employees ADD INDEX `first_name_last_name4` (first_name, last_name(4));

A prefix index balances index size against query speed. Its drawbacks are that it cannot be used for ORDER BY or GROUP BY and cannot serve as a covering index.
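The trial-and-error search over prefix lengths above can be scripted. This Python sketch works on a made-up list of (first_name, last_name) pairs rather than the employees table, and both helper names are hypothetical. It mirrors `concat(first_name, left(last_name, n))` and returns the shortest n that reaches a target selectivity.

```python
def prefix_selectivity(names, n):
    """Selectivity of concat(first_name, left(last_name, n)) over (first, last) pairs."""
    keys = [first + last[:n] for first, last in names]  # LEFT(s, n) corresponds to s[:n]
    return len(set(keys)) / len(keys)

def shortest_prefix(names, target):
    """Smallest n whose prefix selectivity reaches `target`, or None if none does."""
    longest = max(len(last) for _, last in names)
    for n in range(longest + 1):
        if prefix_selectivity(names, n) >= target:
            return n
    return None

# Made-up sample; the article's real figures come from employees.employees.
SAMPLE = [("Eric", "Anido"), ("Eric", "Anderson"), ("Eric", "Baker"), ("Anna", "Anido"), ("Anna", "Andrews")]
print(shortest_prefix(SAMPLE, 1.0))  # 3
```

On the real table the same loop would be driven by the SQL queries shown above; the point of the sketch is only the shape of the search.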

IV. Summary

• Keep the data types of indexed columns as small as possible
• Index only a prefix of long string values
• Use auto-increment primary keys
• Find and drop duplicate and redundant indexes
• Use covering indexes for queries to avoid the cost of going back to the table




Source: blog.csdn.net/weixin_36586564/article/details/104005379