[MySQL] (13) Talking about MySQL Index Optimization Analysis

Foreword: I am "Yun Qi", a big data developer who loves technology and can write poetry. The nickname comes from a line in a poem by Wang Anshi, [ 云之祁祁,或雨于渊 ], which I really like.


On the one hand, I blog to summarize and record what I learn, bit by bit. On the other hand, I hope to help more friends who are interested in big data. If you are also interested in data middle platforms, data modeling, data analysis, or Flink/Spark/Hadoop/data warehouse development, you are welcome to follow my blog at https://blog.csdn.net/BeiisBei. Let us tap the value of data together!


Make a little progress every day: life is not about surpassing others, but about surpassing yourself! (ง •_•)ง

One, the concept of an index

1.1 What is an index

MySQL's official definition of an index is: an index is a data structure that helps MySQL retrieve data efficiently. This gives the essence of an index: it is a data structure. Put simply, it is a sorted data structure that lets you find rows quickly.

Besides the data itself, a database system also maintains data structures that support specific search algorithms. These structures reference (point to) the data in some way, so that efficient search algorithms can run on them. Such a data structure is an index. The following figure shows one possible way of indexing:

On the left is a data table with two columns and seven records; the leftmost value is the physical address of each record. To speed up searches on Col2, a binary search tree like the one on the right can be maintained. Each node holds an index key value and a pointer to the physical address of the corresponding record. Binary search can therefore locate the data in O(log n) time when the tree is balanced, and the matching records can be retrieved quickly.
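Here is a minimal Python sketch of the idea in the figure: a binary search tree keyed on Col2, where each node stores the key and the record's "physical address". The seven (address, Col2) pairs are hypothetical sample data, not values read from the figure. The tree is an in-memory toy and does not show how MySQL actually stores indexes.

```python
# Toy binary search tree index on Col2.
# The (address, Col2) pairs below are hypothetical sample data.

class Node:
    def __init__(self, key, address):
        self.key = key          # index key value (a Col2 value)
        self.address = address  # physical address of the matching record
        self.left = None        # subtree with smaller keys
        self.right = None       # subtree with larger (or equal) keys


def insert(root, key, address):
    """Insert a key and return the (possibly new) root."""
    if root is None:
        return Node(key, address)
    if key < root.key:
        root.left = insert(root.left, key, address)
    else:
        root.right = insert(root.right, key, address)
    return root


def search(root, key):
    """Return (address, nodes_visited); address is None if key is absent."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.address, visited
        node = node.left if key < node.key else node.right
    return None, visited


rows = [(0x07, 34), (0x56, 77), (0x6A, 5), (0xF3, 91),
        (0x90, 22), (0x77, 89), (0x5A, 3)]
root = None
for address, col2 in rows:
    root = insert(root, col2, address)

print(search(root, 89))  # (119, 4): address 0x77 found after visiting 4 nodes
```

Searching for 89 visits only four of the seven nodes. A bad insert order can make such a tree deeper, which is one reason databases use balanced trees instead.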

Generally speaking, the index itself is also large and usually cannot be kept entirely in memory, so it is often stored on disk in the form of an index file.

1.2 Advantages and disadvantages

Advantages:

  • Improves the efficiency of data retrieval and reduces the database's I/O cost.
  • Sorting data through the index column lowers the cost of sorting and reduces CPU consumption.

Disadvantages:

  • Although indexes greatly improve query speed, they slow down table updates such as INSERT, UPDATE, and DELETE. When a table is updated, MySQL must save the index file as well as the data. Whenever an indexed column is added or its key value changes, MySQL also has to adjust the index information.
  • In fact, an index is itself a kind of table that stores the primary key and the indexed fields and points to the records of the actual table, so indexes also take up space.
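This trade-off can be shown with a toy model. It assumes a table is just a list of rows plus one sorted Python list acting as the index; the class and method names are hypothetical. Lookups and ORDER BY on the indexed column get cheaper. In exchange, every insert writes twice and the index takes extra space.

```python
import bisect


class ToyTable:
    """Hypothetical toy table: a data list plus one sorted index on a column."""

    def __init__(self):
        self.rows = []   # "data file": rows in insertion order
        self.index = []  # "index file": sorted (key, row_id) pairs

    def insert(self, name):
        row_id = len(self.rows)
        self.rows.append(name)                     # write the data...
        bisect.insort(self.index, (name, row_id))  # ...and keep the index sorted
        return row_id

    def find(self, name):
        """Binary search the index instead of scanning every row."""
        i = bisect.bisect_left(self.index, (name,))
        if i < len(self.index) and self.index[i][0] == name:
            return self.index[i][1]
        return None

    def order_by_name(self):
        # The index is already sorted, so no separate sort step is needed.
        return [self.rows[row_id] for _, row_id in self.index]


t = ToyTable()
for name in ["carol", "alice", "bob"]:
    t.insert(name)
print(t.order_by_name())  # ['alice', 'bob', 'carol']
print(t.find("bob"))      # 2
```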

Two, MySQL indexes

2.1 B-Tree index

MySQL uses B-Tree indexes.


[Structure overview]

For a B-tree, the light blue block is called a disk block. You can see that each disk block contains several data items (shown in dark blue) and pointers (shown in yellow).

For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3.

P1 points to the disk block holding values less than 17, P2 to the disk block holding values between 17 and 35, and P3 to the disk block holding values greater than 35.

The real data is stored in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99.

Non-leaf nodes do not store real data, only data items that guide the search direction. For example, 17 and 35 do not actually exist in the data table.

[Search process]

To find data item 29, first load disk block 1 from disk into memory; this is the first I/O. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. This in-memory step is very short compared with a disk I/O and can be ignored. Through the disk address in P2 of disk block 1, disk block 3 is loaded into memory: the second I/O. Since 29 lies between 26 and 30, pointer P2 of disk block 3 is selected, and disk block 8 is loaded into memory through it: the third I/O. A binary search in memory then finds 29 and the query ends, after three I/Os in total.
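The search process above can be sketched in Python, counting one I/O per disk block loaded. The guide keys 17, 35, 26, and 30, the leaf values, and the path to 29 follow the text. The other guide keys (8, 12, 65, 87), the leaves [20, 25] and [31, 33], and the exact leaf grouping are hypothetical fillers so that every pointer has a target. The figure's block numbers are not modeled.

```python
import bisect


class Block:
    """A disk block: guide keys plus child pointers, or leaf data items."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list means this is a leaf

    def is_leaf(self):
        return not self.children


def search(root, target):
    """Return (found, io_count); each disk block loaded counts as one I/O."""
    io_count = 0
    block = root
    while True:
        io_count += 1  # load this block from disk into memory
        if block.is_leaf():
            i = bisect.bisect_left(block.keys, target)
            found = i < len(block.keys) and block.keys[i] == target
            return found, io_count
        # In-memory binary search chooses P1, P2 or P3 (cost ignored).
        block = block.children[bisect.bisect_right(block.keys, target)]


root = Block([17, 35], [
    Block([8, 12], [Block([3, 5]), Block([9, 10]), Block([13, 15])]),
    Block([26, 30], [Block([20, 25]), Block([28, 29]), Block([31, 33])]),
    Block([65, 87], [Block([36, 60]), Block([75, 79]), Block([90, 99])]),
])
print(search(root, 29))  # (True, 3)
```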

In reality, a 3-level B+ tree can hold millions of rows. If a lookup among millions of rows needs only three I/Os, the performance gain is huge. Without an index, each data item could cost one I/O, so millions of I/Os would be needed in total, which is obviously very expensive.
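A back-of-the-envelope calculation shows why three levels are enough. The sizes below are assumptions, not figures from the article. They are a 16 KB page (InnoDB's default page size), an 8-byte key, a 6-byte child pointer, and 1 KB per row. Page-header overhead is ignored, so real capacity is somewhat lower.

```python
# Rough capacity of a full B+ tree; every size here is an assumption.
PAGE_SIZE = 16 * 1024  # bytes per page
KEY_SIZE = 8           # e.g. a BIGINT primary key
POINTER_SIZE = 6       # assumed size of a child-page pointer
ROW_SIZE = 1024        # assumed size of one row


def capacity(levels):
    """Rows a full B+ tree with `levels` levels can hold under these assumptions."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # entries per internal page: 1170
    rows_per_leaf = PAGE_SIZE // ROW_SIZE            # rows per leaf page: 16
    return fanout ** (levels - 1) * rows_per_leaf


for levels in (1, 2, 3):
    print(levels, capacity(levels))
# 1 16
# 2 18720
# 3 21902400
```

Under these assumptions, three levels cover roughly 21.9 million rows, so any single row is three page reads away.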

2.2 B+Tree index


The difference between B+Tree and B-Tree

1) In a B-tree, keys and records are stored together, and the leaf nodes can be regarded as external nodes that contain no information. In a B+ tree, non-leaf nodes hold only keys and pointers to the next level of nodes, and records are stored only in the leaf nodes.

2) In a B-tree, the closer a record is to the root, the faster it is found: as soon as the key is found, the record's existence is confirmed. In a B+ tree, every record takes roughly the same time to find, because each search must go from the root down to a leaf node and compare keys there.
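The difference in search depth can be sketched with two tiny trees over the same hypothetical keys. In the B-tree every node holds records, so a key stored at the root is found after visiting one node. In the B+ tree the internal keys only guide the search, so every lookup reaches a leaf. The trees and keys are made up for illustration.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list means leaf


def btree_search(node, target):
    """B-tree: every node stores records, so a hit can stop early.
    Returns the number of nodes visited, or None if absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect.bisect_left(node.keys, target)
        if i < len(node.keys) and node.keys[i] == target:
            return visited                      # record found in this node
        node = node.children[i] if node.children else None
    return None


def bplus_search(node, target):
    """B+ tree: internal keys only guide; records live in the leaves."""
    visited = 1
    while node.children:
        node = node.children[bisect.bisect_right(node.keys, target)]
        visited += 1
    return visited if target in node.keys else None


btree = Node([20, 40], [Node([5, 10]), Node([25, 30]), Node([45, 50])])
bplus = Node([20, 40], [Node([5, 10]), Node([20, 25, 30]), Node([40, 45, 50])])

for key in (20, 25):
    print(key, btree_search(btree, key), bplus_search(bplus, key))
# 20 1 2
# 25 2 2
```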

From this perspective, a B-tree seems to perform better than a B+ tree, but in practice the B+ tree performs better. Because the non-leaf nodes of a B+ tree store no actual data, each node can hold more elements than a B-tree node, so the tree is shorter. A shorter tree means fewer disk accesses.

Although a B+ tree needs more comparisons than a B-tree to find a record, one disk access takes as long as hundreds or thousands of in-memory comparisons, so the B+ tree may well perform better in practice. In addition, the leaf nodes of a B+ tree are linked by pointers, which makes sequential traversal easy (for example, listing all files in a directory or reading all records in a table). This is why many databases and file systems use B+ trees.
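The leaf chain is what makes range queries and full traversals cheap. The sketch below links leaves that hold the leaf values listed earlier; grouping them into pairs is an assumption. It answers a range query by following next pointers. For brevity it starts at the leftmost leaf instead of descending from the root to the first matching leaf.

```python
class Leaf:
    """A B+ tree leaf: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link(leaves):
    """Chain leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(first_leaf, low, high):
    """Collect keys in [low, high] by walking the leaf chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so nothing later can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


head = link([Leaf([3, 5]), Leaf([9, 10]), Leaf([13, 15]), Leaf([28, 29]),
             Leaf([36, 60]), Leaf([75, 79]), Leaf([90, 99])])
print(range_scan(head, 10, 36))  # [10, 13, 15, 28, 29, 36]
```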

Question: in practice, why is a B+ tree better suited than a B-tree for operating-system file indexes and database indexes?

1) A B+ tree has a lower disk read/write cost

The internal nodes of a B+ tree do not hold pointers to the detailed information behind each key, so they are smaller than B-tree internal nodes. Suppose all keys of an internal node are stored in one disk block. That block can then hold more keys, so more of the keys needed for a search are read into memory in a single read. The number of I/O reads and writes drops accordingly.
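To put rough numbers on this, the sketch compares how many levels each tree needs for one million records. Every size is an assumption: 16 KB blocks, 8-byte keys, 6-byte pointers, 1 KB records, and completely full nodes. The point is the ratio between the two trees, not the exact figures.

```python
# Assumed sizes, for illustration only (page headers ignored).
BLOCK = 16 * 1024
KEY, POINTER, RECORD = 8, 6, 1024


def btree_levels(rows):
    # A B-tree node stores full records next to its keys,
    # so each block holds only a few entries.
    per_node = BLOCK // (KEY + POINTER + RECORD)  # 15 records per node
    levels, capacity = 0, 0
    while capacity < rows:
        levels += 1
        # A full tree of this height holds (per_node + 1) ** levels - 1 records.
        capacity = (per_node + 1) ** levels - 1
    return levels


def bplus_levels(rows):
    # B+ tree internal nodes hold only keys and pointers: high fan-out.
    fanout = BLOCK // (KEY + POINTER)  # 1170 children per node
    per_leaf = BLOCK // RECORD         # 16 records per leaf
    levels, capacity = 1, per_leaf
    while capacity < rows:
        levels += 1
        capacity *= fanout
    return levels


print(btree_levels(1_000_000), bplus_levels(1_000_000))  # 5 3
```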

2) A B+ tree has more stable query efficiency

Non-leaf nodes do not point to the file content itself; they only index the keys in the leaf nodes. Therefore every key lookup must follow a path from the root to a leaf. All lookups have the same path length, so every record is retrieved with the same efficiency.

2.3 Clustered index and non-clustered index

A clustered index is not a separate index type but a way of storing data. "Clustered" means that data rows and their adjacent key values are stored compactly together. As shown in the figure below, the index on the left is a clustered index because the order of the data rows on disk matches the index order.


Benefit of a clustered index: when a range of data is queried and displayed in clustered-index order, the rows sit next to each other. The database does not have to fetch data from many scattered blocks, which saves a lot of I/O.
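The I/O saving for range queries can be illustrated by counting pages touched. This is a simplified model with hypothetical numbers: 1,000 rows, 10 rows per page, and one I/O per page. In the clustered layout the rows sit in primary-key order. In the other layout the same rows are stored in a random order.

```python
import random

ROWS_PER_PAGE = 10  # hypothetical: rows that fit on one page
N_ROWS = 1000


def paginate(ids):
    """Split a storage order of row ids into fixed-size pages."""
    return [ids[i:i + ROWS_PER_PAGE] for i in range(0, len(ids), ROWS_PER_PAGE)]


def pages_touched(pages, low, high):
    """Pages that must be read to fetch ids in [low, high]: one I/O each."""
    return sum(1 for page in pages if any(low <= rid <= high for rid in page))


clustered = paginate(list(range(N_ROWS)))  # stored in primary-key order

shuffled = list(range(N_ROWS))
random.Random(0).shuffle(shuffled)         # same rows, arbitrary order
scattered = paginate(shuffled)

print(pages_touched(clustered, 200, 299))  # 10
print(pages_touched(scattered, 200, 299))  # more pages for the same 100 rows
```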

Limitations of clustered indexes: in MySQL, only the InnoDB storage engine currently supports clustered indexes; MyISAM does not. Data can be physically stored in only one order, so each MySQL table can have only one clustered index, which is normally the table's primary key.

To take full advantage of clustering, an InnoDB table's primary key should use ordered, sequential ids wherever possible. Unordered ids such as UUIDs are not recommended.
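Why ordered ids help can be sketched with a simplified page-split model. PAGE_CAPACITY and both split rules are assumptions for illustration, not InnoDB's actual algorithm. A full page is split in half on a middle insert, and a fresh page is started when appending past the current maximum. A seeded random.sample stands in for UUID-like keys.

```python
import bisect
import random

PAGE_CAPACITY = 10  # hypothetical rows per page


def insert(pages, key):
    """Insert key into a list of sorted, non-overlapping pages."""
    if not pages:
        pages.append([key])
        return
    firsts = [page[0] for page in pages]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)  # page whose range fits key
    page = pages[i]
    bisect.insort(page, key)
    if len(page) <= PAGE_CAPACITY:
        return
    if i == len(pages) - 1 and page[-1] == key:
        # Appending past the current maximum: move the new key to a fresh page.
        pages.append([page.pop()])
    else:
        # Inserting into the middle: split the overfull page in half.
        mid = len(page) // 2
        pages.insert(i + 1, page[mid:])
        del page[mid:]


def build(keys):
    pages = []
    for key in keys:
        insert(pages, key)
    return pages


sequential = build(range(1000))  # auto-increment style ids
random_ids = build(random.Random(42).sample(range(10**9), 1000))  # UUID-like order
print(len(sequential))  # 100: every page full
print(len(random_ids))  # compare: pages left partly empty by splits
```

With sequential ids every page ends up full. With random ids the same 1,000 rows spread over more, partly empty pages, which wastes space and I/O.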

2.4 Time complexity (extended)

The same problem can be solved by different algorithms, and the quality of an algorithm affects the efficiency of the algorithm and even of the whole program. Algorithm analysis aims to choose suitable algorithms and to improve them.
Time complexity describes how the amount of computation an algorithm needs grows with the size of its input, and is written in big O notation as O(...).
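As a concrete comparison of O(n) and O(log n), the sketch counts probes for a linear scan and a binary search over one million sorted keys, the kind of ordered data an index provides. The function names are illustrative.

```python
def linear_search(items, target):
    """O(n): check items one by one. Returns (index, probes)."""
    for probes, item in enumerate(items, start=1):
        if item == target:
            return probes - 1, probes
    return -1, len(items)


def binary_search(items, target):
    """O(log n): halve the sorted search range each step. Returns (index, probes)."""
    low, high, probes = 0, len(items) - 1, 0
    while low <= high:
        probes += 1
        mid = (low + high) // 2
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes


data = list(range(1_000_000))                 # sorted, like keys in an index
print(linear_search(data, 999_999))           # (999999, 1000000)
print(binary_search(data, 999_999)[1] <= 20)  # True: log2(10**6) is about 20
```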

Three, MySQL index classification

3.1 Single value index

Concept: An index contains only a single column, and a table can have multiple single-column indexes.
Syntax:

-- Create together with the table:

CREATE TABLE customer (
	id INT(10) UNSIGNED AUTO_INCREMENT,
	customer_no VARCHAR(200),
	customer_name VARCHAR(200),
	PRIMARY KEY(id),
	KEY (customer_name)  -- Σ(っ °Д °;)っ single-value index
);

-- Create a single-value index separately:
CREATE INDEX idx_customer_name ON customer(customer_name);

3.2 Unique index

Concept: the values in the index column must be unique, but NULL values are allowed (and several rows may hold NULL).

-- Create together with the table:
CREATE TABLE customer (
	id INT(10) UNSIGNED AUTO_INCREMENT,
	customer_no VARCHAR(200),
	customer_name VARCHAR(200),
	PRIMARY KEY(id),
	KEY (customer_name),
	UNIQUE (customer_no)   -- Σ(っ °Д °;)っ unique index
);

-- Create a unique index separately:
CREATE UNIQUE INDEX idx_customer_no ON customer(customer_no);
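The "NULL values are allowed" rule can be modeled in a few lines. A hypothetical UniqueIndex class rejects a repeated non-NULL value but lets any number of NULLs (None) through, since NULL is never equal to another NULL. This is a toy model of the rule, not MySQL's implementation, and the error text is made up.

```python
class UniqueIndex:
    """Toy unique index: duplicate non-NULL keys are rejected,
    but NULL (None) never equals anything, so many NULLs are allowed."""

    def __init__(self):
        self.keys = set()

    def insert(self, value):
        if value is None:
            return  # NULLs are not compared, so they never conflict
        if value in self.keys:
            raise ValueError(f"duplicate key value {value!r}")
        self.keys.add(value)


idx = UniqueIndex()
idx.insert("C001")
idx.insert(None)
idx.insert(None)        # allowed: two NULLs do not collide
try:
    idx.insert("C001")  # rejected: same non-NULL value
except ValueError as err:
    print(err)          # duplicate key value 'C001'
```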

3.3 Primary key index

Concept: once a primary key is defined, the database automatically creates an index on it; in InnoDB this index is the clustered index.

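-- Create together with the table:
CREATE TABLE customer (
	id INT(10) UNSIGNED AUTO_INCREMENT,
	customer_no VARCHAR(200),
	customer_name VARCHAR(200),
	PRIMARY KEY(id)   -- Σ(っ °Д °;)っ primary key index
);

-- Add a primary key index separately
-- (only works if the table does not have a primary key yet):
ALTER TABLE customer ADD PRIMARY KEY (customer_no);

-- Drop the primary key index
-- (if the key column is AUTO_INCREMENT, like id above, MySQL rejects this
--  unless AUTO_INCREMENT is removed first or the column has another index):
ALTER TABLE customer DROP PRIMARY KEY;

-- Modify the primary key index:
-- you must first DROP the existing primary key, then ADD the new one.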

3.4 Composite index

Concept: a single index that contains multiple columns

-- Create together with the table:
CREATE TABLE customer (
	id INT(10) UNSIGNED AUTO_INCREMENT,
	customer_no VARCHAR(200),
	customer_name VARCHAR(200),
	PRIMARY KEY(id),
	KEY (customer_name),
	UNIQUE (customer_no),
	KEY (customer_no, customer_name)  -- Σ(っ °Д °;)っ composite index
);

-- Create a composite index separately:
CREATE INDEX idx_no_name ON customer(customer_no,customer_name);
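A composite index keeps its entries sorted by the whole tuple: first customer_no, then customer_name. The sketch below uses hypothetical rows, with a row id appended to each entry. Lookups on customer_no, or on customer_no plus customer_name, can binary-search this order. A lookup on customer_name alone cannot use it and falls back to a full scan.

```python
import bisect

# Hypothetical (customer_no, customer_name, row_id) index entries,
# sorted by customer_no first and customer_name second.
entries = sorted([
    ("C003", "Bob", 3),
    ("C001", "Carol", 1),
    ("C002", "Alice", 2),
    ("C001", "Alice", 4),
])


def find_by_no(no):
    """customer_no is the leading column: binary search, then read forward."""
    i = bisect.bisect_left(entries, (no,))
    ids = []
    while i < len(entries) and entries[i][0] == no:
        ids.append(entries[i][2])
        i += 1
    return ids


def find_by_no_and_name(no, name):
    """Both columns, in index order: one binary search."""
    i = bisect.bisect_left(entries, (no, name))
    if i < len(entries) and entries[i][:2] == (no, name):
        return entries[i][2]
    return None


def find_by_name(name):
    # customer_name alone is not a prefix of the sort key,
    # so the sorted order does not help: scan every entry.
    return [e[2] for e in entries if e[1] == name]


print(find_by_no("C001"))     # [4, 1]
print(find_by_name("Alice"))  # [4, 2], found by a full scan
```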

3.5 Basic syntax

Operations and commands:

  • Create: CREATE [UNIQUE] INDEX indexName ON table_name(column_list);
  • Delete: DROP INDEX indexName ON table_name;
  • View: SHOW INDEX FROM table_name\G
  • Use the ALTER command:
      ALTER TABLE tbl_name ADD PRIMARY KEY (column_list); adds a primary key, so the index values must be unique and cannot be NULL.
      ALTER TABLE tbl_name ADD INDEX index_name (column_list); adds a normal index, in which a value may appear multiple times.
      ALTER TABLE tbl_name ADD FULLTEXT index_name (column_list); adds a FULLTEXT index for full-text search.

Four, when to create an index

4.1 Situations suitable for creating an index

  • A primary key automatically gets a unique index
  • Fields frequently used as query conditions should be indexed
  • Fields used to join with other tables in queries: create indexes for foreign key relationships
  • When choosing between single-column and composite indexes, a composite index is usually more cost-effective
  • Fields used for sorting in queries: if the sort field is accessed through an index, sorting is much faster
  • Fields used for statistics or grouping in queries

4.2 Situations not suitable for creating an index

  • Tables with very few rows
  • Tables or fields that are frequently inserted, updated, or deleted
  • Fields that are not used in WHERE conditions
  • Fields with poor filtering (few distinct values)
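One way to quantify "poor filtering" is selectivity: the number of distinct values divided by the number of rows. The sample columns below are hypothetical. A value near 1 means an index narrows the search well. A low value, such as for a two-valued gender column, means each index lookup still returns a large share of the table.

```python
def selectivity(values):
    """Distinct values / total rows; closer to 1 filters better."""
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical sample column data
gender = ["M", "F", "F", "M", "F", "M", "M", "F"]
customer_no = ["C001", "C002", "C003", "C004", "C005", "C006", "C007", "C008"]

print(selectivity(gender))       # 0.25
print(selectivity(customer_no))  # 1.0
```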


Source: blog.csdn.net/BeiisBei/article/details/108524474