High-Performance Indexes in MySQL

Once a table grows past a certain size, every full table scan becomes very slow. A common optimization is to create the necessary indexes, which raises several questions:
  • What is an index?
  • How does an index work?
  • What is the difference between a clustered index and a non-clustered index?
  • How do you create and use indexes?

I. Introduction to the Index

MySQL's official definition is that an index is a data structure that helps MySQL retrieve data efficiently. In short, an index is a data structure.

1. Several tree structures

a. B+ tree

Simply put, a B+ tree is a balanced multiway search tree designed for disks and other block storage devices. All records are stored in the leaf nodes in key order, and neighboring leaf nodes are linked to each other by pointers.

b. Binary tree

A binary search tree is a binary tree in which every key in a node's left subtree is smaller than the node's key and every key in its right subtree is larger.

c. Balanced binary tree

A balanced binary tree is a binary search tree in which the heights of the left and right subtrees of any node differ by at most 1.

d. B-tree

A B-tree is a balanced multiway search tree: each node can hold many keys and have many children, and every leaf node is at the same distance from the root.
So what is the difference between a B-tree and a B+ tree?
  • In a B+ tree, each leaf node holds a pointer to the next leaf node
  • In a B+ tree, non-leaf nodes hold only copies of key values, and the keys together with their records are stored in the leaf nodes; in a B-tree, records are stored in non-leaf nodes as well
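
To make the leaf links concrete, here is a minimal Python sketch of a range scan over linked leaves. The Leaf class, build_leaf_chain and range_scan are hypothetical names made up for illustration, not InnoDB structures, and the scan starts at the first leaf for brevity where a real tree would first descend from the root to the starting leaf.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    """Hypothetical B+ tree leaf: sorted keys, their records, link to the next leaf."""
    keys: list
    records: list
    next: Optional["Leaf"] = None


def build_leaf_chain(rows, per_leaf=3):
    """Pack (key, record) pairs into leaves of per_leaf entries and link them."""
    rows = sorted(rows)
    leaves = [
        Leaf([k for k, _ in rows[i:i + per_leaf]], [r for _, r in rows[i:i + per_leaf]])
        for i in range(0, len(rows), per_leaf)
    ]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(first_leaf, low, high):
    """Collect records with low <= key <= high by following the leaf links."""
    out = []
    leaf = first_leaf
    while leaf is not None:
        start = bisect_left(leaf.keys, low)
        for key, record in zip(leaf.keys[start:], leaf.records[start:]):
            if key > high:
                return out  # keys are sorted, so nothing further can match
            out.append(record)
        leaf = leaf.next
    return out


leaves = build_leaf_chain([(k, f"row-{k}") for k in range(1, 11)])
result = range_scan(leaves[0], 4, 8)
```

Because the records live only in the leaves and the leaves are chained, the scan never has to go back up to a parent node once it has found its starting point.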

2. B+ tree of InnoDB engine

MySQL's InnoDB engine uses a B+ tree in which only the leaf nodes store the row data. This has the following advantages:
  • Non-leaf nodes hold only keys, so each node has a high fan-out (each node points to many lower-level nodes) and the tree stays shallow (typically 3 to 4 levels). The height of the tree determines the number of disk IOs per lookup, which in turn affects database performance; in general, the number of IOs equals the height of the tree
  • For a composite index, entries are sorted by the index columns in the order they are defined (left to right), so random IO can be turned into sequential IO, which improves IO efficiency. The sorted order also supports ORDER BY and GROUP BY and is well suited to range queries
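
The link between fan-out and height can be checked with a little arithmetic. The sketch below is only a rough estimate: the fan-out of 1000 child pointers per non-leaf page and the 16 rows per leaf page are assumed figures chosen for illustration, and btree_height is a hypothetical helper.

```python
def btree_height(total_rows, fanout, rows_per_leaf):
    """Smallest number of levels whose capacity covers total_rows."""
    height = 1
    capacity = rows_per_leaf      # a tree that is a single leaf page
    while capacity < total_rows:
        capacity *= fanout        # each extra non-leaf level multiplies capacity
        height += 1
    return height


# Assumed figures: about 1000 children per non-leaf page, about 16 rows per leaf page
h_small = btree_height(10_000, 1000, 16)
h_large = btree_height(10_000_000, 1000, 16)
```

Under these assumptions ten thousand rows fit in 2 levels and ten million rows fit in 3, which is why the tree height, and with it the number of IOs per lookup, grows so slowly.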

3. Hash index

Unlike a B-tree index, a hash index does not need to walk from the root node to a leaf node. It locates the entry in a single step, so equality lookups are faster, but the disadvantages are just as obvious:
  • Only "=", "IN" and "<=>" queries are supported; range queries are not
    • Lookups go through the hash value, so only exact matches work. Hash values do not preserve the order of the original keys, so range queries are impossible
  • Sorting is not supported
    • For the same reason as above
  • Partial-key lookups are not supported
    • The hash value is computed from all of the indexed columns together. If one or more of them is missing from the query, the hash value cannot be computed
  • Hash collisions
    • When many keys hash to the same value, a lookup has to compare the colliding entries one by one, which slows it down
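
These limits can be illustrated with a small Python sketch in which a dict stands in for a hash index and a sorted list stands in for an ordered (B-tree style) index. The sample names and the example.com address are made up, and the dict is only an analogy for how a storage engine hashes keys.

```python
from bisect import bisect_left, bisect_right

rows = {"alice": 1, "bob": 2, "carol": 3, "dave": 4}  # hypothetical name -> row id

# Hash index model: direct equality lookup; entries are placed by hash value, not key order
hash_index = dict(rows)
eq_hit = hash_index.get("carol")

# Ordered index model: sorted keys allow a range lookup with two binary searches
ordered_keys = sorted(rows)
lo = bisect_left(ordered_keys, "b")
hi = bisect_right(ordered_keys, "d")
range_hit = ordered_keys[lo:hi]   # names between "b" and "d"

# Composite key: the hash is computed over every indexed column at once
composite = {("alice", "alice@example.com"): 1}
partial_hit = composite.get("alice")   # None, the key is incomplete
```

The range lookup has no counterpart on the dict other than checking every key, which is the same reason a hash index cannot serve range or sort operations.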

4. Clustered and non-clustered indexes

a. Clustered Index

In InnoDB the data file is itself the index file. The leaf nodes of the B+ tree hold the rows themselves, keyed by the primary key. The non-leaf nodes store <key, address> pairs, where the address points to a node in the next level down.
[Figure: structure of a clustered index]

b. Non-clustered index

In a non-clustered index, the leaf nodes store the primary key of the row (the key of the clustered index), which is why the primary key should not be too long: every non-clustered index carries a copy of it. Why store the primary key rather than the address of the record? The reason is simple: the address of a record can change, while its primary key does not.
[Figure: structure of a non-clustered index]
The structure of the non-clustered index gives the following lookup process:
  • First, find the matching leaf node in the non-clustered index and read the primary key from it
  • Then use that primary key to find the matching leaf node in the clustered index and read the row
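
The two-step lookup can be sketched in a few lines of Python. Plain dicts stand in for the two B+ trees, and the sample rows, idx_username and find_by_username are hypothetical names used only for this illustration.

```python
# Clustered index model: primary key -> full row (hypothetical sample data)
clustered = {
    1: {"userId": 1, "username": "ann", "email": "ann@example.com"},
    2: {"userId": 2, "username": "ben", "email": "ben@example.com"},
}

# Non-clustered index model: indexed column value -> primary key
idx_username = {"ann": 1, "ben": 2}


def find_by_username(name):
    pk = idx_username.get(name)   # step 1: the non-clustered index yields the primary key
    if pk is None:
        return None
    return clustered[pk]          # step 2: the clustered index yields the row


row = find_by_username("ben")
missing = find_by_username("zed")
```

Every lookup through the non-clustered index pays for two traversals, which is also why a long primary key makes every non-clustered index larger.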

5. Advantages of Indexing

  • Avoids full table scans (without a usable index, rows can only be checked one by one; with an index, the row can be located through the B+ tree)
  • Helps the server avoid sorting and temporary tables (the pointers between leaf nodes support range queries efficiently, and the leaf nodes themselves are sorted by key)
  • Turns random IO into sequential IO

6. Scope of application

Indexes do not suit every situation. They work best on medium and large tables. For small tables, a full table scan is usually more efficient, and for very large tables techniques such as partitioning are worth considering.

II. How to use indexes

A primary key is normally specified when a table is created, and that determines the clustered index. So how do you add a non-clustered index?

1. Index syntax

Creating an index

-- Create the index
create index `idx_img` on newuser(`img`);

-- View the table definition
show create table newuser\G

Output:
*************************** 1. row ***************************
Table: newuser
Create Table: CREATE TABLE `newuser` (
`userId` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'User id',
`username` varchar(30) DEFAULT '' COMMENT 'User login name',
`nickname` varchar(30) NOT NULL DEFAULT '' COMMENT 'User nickname',
`password` varchar(50) DEFAULT '' COMMENT 'User login password & ciphertext root',
`address` text COMMENT 'User address',
`email` varchar(50) NOT NULL DEFAULT '' COMMENT 'User mailbox',
`phone` bigint(20) NOT NULL DEFAULT '0' COMMENT 'User phone number',
`img` varchar(100) DEFAULT '' COMMENT 'User avatar',
`extra` text,
`isDeleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
`created` int(11) NOT NULL,
`updated` int(11) NOT NULL,
PRIMARY KEY (`userId`),
KEY `idx_username` (`username`),
KEY `idx_nickname` (`nickname`),
KEY `idx_email` (`email`),
KEY `idx_phone` (`phone`),
KEY `idx_img` (`img`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8

Another common way to add an index

alter table newuser add index `idx_extra_img`(`isDeleted`, `img`);

-- View the indexes
show index from newuser;

Output:
+---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| newuser |          0 | PRIMARY       |            1 | userId      | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_username  |            1 | username    | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
| newuser |          1 | idx_nickname  |            1 | nickname    | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_email     |            1 | email       | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_phone     |            1 | phone       | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_img       |            1 | img         | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
| newuser |          1 | idx_extra_img |            1 | isDeleted   | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_extra_img |            2 | img         | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Dropping an index

drop index `idx_extra_img` on newuser;
drop index `idx_img` on newuser;

-- View the indexes
show index from newuser;

Output:
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| newuser |          0 | PRIMARY      |            1 | userId      | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_username |            1 | username    | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
| newuser |          1 | idx_nickname |            1 | nickname    | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_email    |            1 | email       | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| newuser |          1 | idx_phone    |            1 | phone       | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Forcing an index

Syntax: select * from table force index (index) where xxx

explain select * from newuser force index(PRIMARY) where userId not in (3, 2, 5);
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
-- | id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
-- |  1 | SIMPLE      | newuser | range | PRIMARY       | PRIMARY | 8       | NULL |    4 | Using where |
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+

explain select * from newuser where userId not in (3, 2, 5);
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- |  1 | SIMPLE      | newuser | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+

2. Index usage rules

When a table has several indexes, how can you tell whether a SQL statement uses an index, and which one?
The explain keyword helps here. When writing SQL you also need to understand the index matching rules, to avoid creating redundant indexes or writing statements that cannot use an index.
The test table is defined as follows:
*************************** 1. row ***************************
Table: newuser
Create Table: CREATE TABLE `newuser` (
`userId` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'User id',
`username` varchar(30) DEFAULT '' COMMENT 'User login name',
`nickname` varchar(30) NOT NULL DEFAULT '' COMMENT 'User nickname',
`password` varchar(50) DEFAULT '' COMMENT 'User login password & ciphertext root',
`address` text COMMENT 'User address',
`email` varchar(50) NOT NULL DEFAULT '' COMMENT 'User mailbox',
`phone` bigint(20) NOT NULL DEFAULT '0' COMMENT 'User phone number',
`img` varchar(100) DEFAULT '' COMMENT 'User avatar',
`extra` text,
`isDeleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
`created` int(11) NOT NULL,
`updated` int(11) NOT NULL,
PRIMARY KEY (`userId`),
KEY `idx_username` (`username`),
KEY `idx_nickname_email_phone` (`nickname`,`email`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8

a. The leftmost prefix matching principle

This rule applies to multi-column non-clustered indexes. Take the index idx_nickname_email_phone(nickname, email, phone), in which nickname is defined before email. The following statements behave as described:

-- Uses the index
explain select * from newuser where nickname='little gray gray' and email=' [email protected] ';

-- 1. Matching on nickname alone can use the index
explain select * from newuser where nickname='little gray gray';

-- Output:
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+-----------------------+
-- | id | select_type | table   | type | possible_keys      | key                | key_len | ref   | rows | Extra                 |
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+-----------------------+
-- |  1 | SIMPLE      | newuser | ref  | idx_nickname_email | idx_nickname_email | 92      | const |    1 | Using index condition |
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+-----------------------+

-- 2. email is matched, but the leftmost prefix is not, so the index is not used
explain select * from newuser where email=' [email protected] ';

-- Output:
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- |  1 | SIMPLE      | newuser | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+

b. Cannot skip a column and use subsequent index columns

That is, for the index idx_nickname_email_phone(nickname, email, phone), if the SQL filters only on nickname and phone, the index can be used for nickname but not for phone, because the email column in the middle cannot be skipped.
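
The leftmost prefix rule follows from the sort order of a composite index, which the Python sketch below tries to show. A sorted list of (nickname, email, phone) tuples stands in for the index; the sample entries, seek_prefix and scan_email are hypothetical and only model the idea, not MySQL's implementation.

```python
from bisect import bisect_left

# Composite index model: entries sorted by (nickname, email, phone)
index = sorted([
    ("amy", "amy@example.com", 111),
    ("amy", "amy@work.example", 222),
    ("bob", "bob@example.com", 333),
    ("cat", "amy@example.com", 444),
])


def seek_prefix(prefix):
    """Binary-search to the first entry starting with prefix, then read while it matches."""
    n = len(prefix)
    pos = bisect_left(index, prefix)
    out = []
    while pos < len(index) and index[pos][:n] == prefix:
        out.append(index[pos])
        pos += 1
    return out


def scan_email(email):
    """Without the leading column the matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == email]


by_nickname = seek_prefix(("amy",))
by_nickname_email = seek_prefix(("amy", "amy@example.com"))
by_email = scan_email("amy@example.com")
```

The two entries that share an email sit at opposite ends of the sorted list, so a condition on email alone gives the search nothing to seek on. The same holds for phone when email is skipped.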

c. The column after the range query cannot use the index

Operators such as >, <, between and like are range conditions. In the following SQL, neither email nor phone can be used to narrow the index lookup, because nickname is matched with a range condition:
select * from newuser where nickname like '小灰%' and email=' [email protected] ' and phone=15971112301 limit 10;
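
A small Python sketch may help show why the columns after a range condition stop being useful for seeking. A sorted list of (nickname, email) tuples stands in for the composite index; the sample entries and like_prefix_range are hypothetical.

```python
from bisect import bisect_left

# Composite index model: entries sorted by (nickname, email), hypothetical data
index = sorted([
    ("gray", "b@example.com"),
    ("gray", "z@example.com"),
    ("grayson", "a@example.com"),
    ("green", "b@example.com"),
])


def like_prefix_range(prefix):
    """Contiguous slice of entries whose nickname starts with prefix (nickname LIKE 'prefix%')."""
    start = bisect_left(index, (prefix,))
    end = start
    while end < len(index) and index[end][0].startswith(prefix):
        end += 1
    return index[start:end]


candidates = like_prefix_range("gray")
emails_in_range = [email for _, email in candidates]

# email is sorted only within one nickname, not across the whole range,
# so the remaining condition has to be applied as a filter
matches = [entry for entry in candidates if entry[1] == "b@example.com"]
```

Inside the range the emails come out as b, z, a: they are ordered only within a single nickname, so a binary search on email is not possible and each candidate has to be filtered.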

d. Columns as function arguments or as part of expressions

-- Cannot use the index
explain select * from newuser where userId+1=2 limit 1;

-- Output:
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- |  1 | SIMPLE      | newuser | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
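
The usual fix is to move the arithmetic to the constant side so that the column stands alone. The sketch below demonstrates the effect with SQLite, used here only as a stand-in because it ships with Python; its planner and plan wording differ from MySQL's, and the cut-down newuser table and the plan helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newuser (userId INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO newuser VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 101)],
)


def plan(sql):
    """Return the plan descriptions SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Column wrapped in an expression: the engine has to evaluate it for every row
wrapped = plan("SELECT * FROM newuser WHERE userId + 1 = 2")

# Same condition with the arithmetic moved to the constant side
bare = plan("SELECT * FROM newuser WHERE userId = 2 - 1")

same_rows = (
    conn.execute("SELECT * FROM newuser WHERE userId + 1 = 2").fetchall()
    == conn.execute("SELECT * FROM newuser WHERE userId = 2 - 1").fetchall()
)
```

In SQLite the first plan is a table scan and the second is a primary key search, while both queries return the same row. In MySQL the equivalent rewrite is where userId = 1, which can be checked with explain.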

3. Disadvantages of indexes

  • Indexes greatly speed up queries, but they slow down writes to the table such as INSERT, UPDATE and DELETE, because MySQL has to update the index as well as the data on every write.
  • Indexes take up disk space. This is usually not a serious problem, but if you create several composite indexes on a large table, the index files grow very quickly.

4. Precautions

  • NULL values
    • MySQL B-tree indexes do store NULL values, and a condition such as col IS NULL can use an index. Even so, declare indexed columns NOT NULL where possible, since NULLs complicate comparisons and statistics
  • Use short indexes
    • For long string columns, index only a prefix of the column if the prefix is selective enough; this saves space and IO
  • Index column ordering
    • A query generally uses only one index per table, so if an index is already used in the where clause, the columns in the order by will not use another index. Do not add a sort when the default order already meets the requirement, and avoid sorting on several columns; if it is necessary, create a composite index on those columns
  • like
    • Using like is generally discouraged. If it must be used, the pattern matters: like "%aaa%" will not use an index, while like "aaa%" will
  • Do not operate on columns
    • select * from users where YEAR(adddate)<2007; applies the function to every row, so the index cannot be used and the whole table is scanned. It can be rewritten as select * from users where adddate<'2007-01-01';
  • Try not to use NOT IN and <>

5. SQL usage strategies

a. Use one SQL statement instead of several

In general, prefer a single SQL statement over several separate queries.
However, if the statement runs very slowly, or if it involves operations such as delete that lock the table, splitting it into several statements can avoid blocking other statements.

b. Decomposing associated queries

Do the join in the application where possible, and keep each SQL statement small and simple
  • The decomposed statements are simple, which makes better use of the MySQL query cache (available before MySQL 8.0)
  • Running the decomposed statements reduces lock contention
  • Scalability and maintainability are better (the SQL is simple)
  • A join in SQL uses the nested loop algorithm, whereas the application can process the data with structures such as a hash map, which can be more efficient
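
As a rough sketch of an application-side join, the Python below combines the results of two simple queries with a hash map. The orders table, its columns and the sample rows are hypothetical; only newuser appears in this article.

```python
from collections import defaultdict

# Results of two simple queries (hypothetical data), for example:
#   select userId, nickname from newuser where userId in (1, 2);
#   select orderId, userId from orders where userId in (1, 2);
users = [(1, "ann"), (2, "ben")]
orders = [(10, 1), (11, 1), (12, 2)]

# Build a hash map from userId to that user's order ids in one pass
orders_by_user = defaultdict(list)
for order_id, user_id in orders:
    orders_by_user[user_id].append(order_id)

# Probe the map once per user instead of looping over all orders for each user
joined = [(nickname, orders_by_user.get(user_id, [])) for user_id, nickname in users]
```

Each side is read once, so the work grows with the sum of the two result sizes rather than their product.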

c. count

  • count(*) counts the number of rows
  • count(column name) counts the number of rows in which that column is not NULL
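
The difference shows up as soon as a column contains NULL. The sketch below runs the two forms on SQLite, used only because it ships with Python; the COUNT semantics shown are standard SQL and match MySQL. The table t and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, img TEXT)")
conn.executemany(
    "INSERT INTO t (img) VALUES (?)",
    [("a.png",), (None,), ("b.png",)],   # the second row has img = NULL
)

# COUNT(*) counts rows; COUNT(img) skips rows where img is NULL
total, non_null = conn.execute("SELECT COUNT(*), COUNT(img) FROM t").fetchone()
```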

d. limit

  • limit offset, size; is a paging query: offset + size rows are read, and only the last size rows are returned
    • For example, limit 1000, 20 reads 1020 matching rows and returns only the last 20, so avoid paging deep into a result set
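
One common way around deep offsets, not covered in this article, is keyset (seek) pagination: remember the last key of the previous page and continue from there. The Python model below compares how many index entries each approach touches; the id list, page_by_offset and page_by_seek are hypothetical.

```python
from bisect import bisect_right

ids = list(range(1, 5001))  # hypothetical primary keys in index order


def page_by_offset(offset, size):
    """Model of 'limit offset, size': walk offset + size entries, keep the last size."""
    touched = ids[:offset + size]
    return touched[offset:], len(touched)


def page_by_seek(last_seen_id, size):
    """Model of 'where id > last_seen_id order by id limit size': seek, then read size entries."""
    start = bisect_right(ids, last_seen_id)
    page = ids[start:start + size]
    return page, len(page)


offset_page, offset_touched = page_by_offset(1000, 20)
seek_page, seek_touched = page_by_seek(1000, 20)
```

Both calls return ids 1001 to 1020, but the offset version walks 1020 entries while the seek version reads 20. The trade-off is that seek pagination needs a unique, ordered key and cannot jump straight to an arbitrary page number.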

e. union

Push the where, order by and limit clauses down into each subquery so that each one returns fewer rows. Also, unless duplicates must be removed, use union all: a plain union applies distinct to the temporary table that holds the combined result and checks every row for uniqueness, which is expensive.
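
The difference in results between the two forms is easy to see on a tiny example. SQLite is used below only because it ships with Python; the union and union all semantics shown are standard SQL, although the cost of the duplicate check depends on the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION removes duplicate rows from the combined result
union_rows = conn.execute(
    "SELECT 1 AS n UNION SELECT 1 UNION SELECT 2"
).fetchall()

# UNION ALL keeps every row and skips the duplicate check
union_all_rows = conn.execute(
    "SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2"
).fetchall()
```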

6. Useful MySQL queries

a. View the total index size

-- Unit: GB
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024*1024), 6), ' GB') AS 'Total Index Size'
FROM information_schema.TABLES WHERE table_schema LIKE 'databaseName';

b. View the total data size

SELECT CONCAT(ROUND(SUM(data_length)/(1024*1024*1024), 6), ' GB') AS 'Total Data Size'
FROM information_schema.TABLES WHERE table_schema LIKE 'databaseName';

c. View information about all tables in the database

SELECT CONCAT(table_schema,'.',table_name) AS 'Table Name',
table_rows AS 'Number of Rows',
CONCAT(ROUND(data_length/(1024*1024*1024),6),' G') AS 'Data Size',
CONCAT(ROUND(index_length/(1024*1024*1024),6),' G') AS 'Index Size' ,
CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),6),' G') AS 'Total'
FROM information_schema.TABLES
WHERE table_schema LIKE 'databaseName';
 
