How to choose between a normal index and a unique index

series of articles

1. It turns out that a select statement is executed in MySQL like this "Deadly Kick MySQL Series 1"

2. Lifelong friends redo log, binlog "Deadly Kick MySQL Series II"

3. It is difficult for MySQL strongman to "lock" "Deadly Kick MySQL Series III"

4. The love-hate relationship between S lock and X lock "Deadly Kick MySQL Series 4"

Foreword


Readers of the previous articles will have noticed that this series has not yet covered indexes and transactions. That is because both topics were already written up in earlier articles.

Here are the links, so you can go straight to them:

Demystifying MySQL Indexes

Asked about MySQL transactions right at the start, shivering...

MVCC: I heard that some people are curious about my underlying implementation

Phantom reading: I heard some people think I was killed by MVCC

Next, let's open up the world of normal and unique indexes.

1. Understand normal indexes and unique indexes

Normal index

The basic index type in MySQL. It imposes no restrictions: the indexed columns may contain duplicate values and NULL values. Its only purpose is to make queries faster.

Unique index

Values in the indexed columns must be unique, but NULL values are allowed.

A primary key index is a special kind of unique index that does not allow nulls.

Let's also cover the other two index types, since related knowledge points are easier to remember together.

Full-text index

A full-text index can only be created on CHAR, VARCHAR, and TEXT columns. So what is a full-text index? It lets you find, within a large body of text, the records whose column contains a given keyword. For example, given the text "You are a pretty boy, a pretty girl...", you may be able to find that record by searching for "pretty boy".

Spatial index

A spatial index is an index built on a column of a spatial data type. MySQL's single-value spatial data types are GEOMETRY, POINT, LINESTRING, and POLYGON. A spatial index is created with the SPATIAL keyword, and the indexed column must be declared NOT NULL. Before MySQL 5.7 only the MyISAM engine supported spatial indexes; since 5.7, InnoDB supports them as well.

How to add an index

1. Primary key index: alter table table_name add primary key (column)

2. Unique index: alter table table_name add unique (column)

3. Normal index: alter table table_name add index index_name (column)

4. Full-text index: alter table table_name add fulltext (column)

5. Multi-column index: alter table table_name add index index_name (column1,column2,column3)

2. Application scenarios

Now that you know the difference between a normal index and a unique index, let's take a look at how to choose between two indexes in some scenarios.

Mr. Ding's article mentions a business scenario from a citizen information system: looking up a person's name by ID card number.

Kaka borrows the same scenario here and walks through it in Kaka's own words.

The execution statement is select name from user where card = '6104301996xxxxxxxx';

The first reaction in this scenario must be to create an index for the card, but what index to create? Primary key indexes are definitely not recommended.

Thinking: Why can't I use the ID number as the primary key index?

3. Why shouldn't you use an overly large value as the primary key?

The primary key index structure of the InnoDB storage engine is as follows

The normal index data structure is as follows

The leaf nodes of the primary key index store the entire row of data corresponding to the primary key.

The leaf nodes of a normal index store the corresponding primary key values.

Suppose the B+Tree is three levels deep and each page (the unit read from disk) is 16 KB.

How much data can the non-leaf nodes of the B+Tree hold? Generally speaking, every table has a primary key.

In a three-level tree, the first and second levels store only key values, that is, primary key values, together with pointers to child pages.

An int takes 4 bytes and a pointer takes 6 bytes, 10 bytes in total. Rounding 1 KB to 1000 bytes for simplicity, a first-level node can hold about 16 * 1000 / 10 = 1600 keys.

Similarly, each node in the second level can also hold 1600 keys.

The third level consists of leaf nodes. Each leaf page is also 16 KB. Assuming, as in the earlier BTree calculation, that each row takes 1 KB, one leaf page holds 16 rows.

So a three-level B+Tree can store 1600 * 1600 * 16 = 40,960,000 rows.

Conclusion: an overly large primary key directly reduces the amount of data the index can hold, so using a large value as the primary key is strongly discouraged.
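The arithmetic above can be sketched as a small calculation. This is only an illustration using the article's own rounding (16 KB treated as 16,000 bytes, 6-byte pointers, 1 KB rows). The 18-byte key is an assumption added here to stand in for an ID card number stored as 18 single-byte characters; real InnoDB pages also carry headers and other overhead that this sketch ignores.

```python
PAGE_BYTES = 16 * 1000   # the article's rounding of a 16 KB page
POINTER_BYTES = 6        # pointer size used in the article
ROW_BYTES = 1000         # the article assumes about 1 KB per row


def three_level_capacity(key_bytes):
    """Rows a three-level B+Tree can hold for a given primary key size."""
    keys_per_node = PAGE_BYTES // (key_bytes + POINTER_BYTES)
    rows_per_leaf = PAGE_BYTES // ROW_BYTES
    # Level 1 fans out to level 2, level 2 fans out to the leaf pages.
    return keys_per_node * keys_per_node * rows_per_leaf


print(three_level_capacity(4))   # int key: 40960000
print(three_level_capacity(18))  # hypothetical 18-byte key: 7096896
```

Under these assumptions, moving from a 4-byte key to an 18-byte key cuts the capacity of the same three-level tree to less than a fifth.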

4. Analysis from the query perspective

Suppose we want to look up the record with card = 5. The search starts at the root of the B+ tree, descends level by level to a leaf node, and then locates the record with card = 5 inside the data page by binary search.

Normal index

For a normal index, after the first record with card = 5 is found, the search continues until it meets the first record that does not satisfy card = 5.

Unique index

For a unique index it is simpler. A unique index guarantees that values are unique, so once the record with card = 5 is found, the search stops without looking at the next record.
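The difference between the two lookups can be sketched on a sorted list standing in for the keys of one leaf page. This is a toy model, not InnoDB code; the function name `lookup` and the `examined` counter are introduced here purely for illustration.

```python
from bisect import bisect_left


def lookup(keys, target, unique):
    """Return (positions of matches, number of entries examined)."""
    pos = bisect_left(keys, target)  # binary search inside the page
    matches, examined = [], 0
    while pos < len(keys):
        examined += 1
        if keys[pos] != target:
            break                    # first entry that no longer matches
        matches.append(pos)
        if unique:
            break                    # uniqueness guarantees no more matches
        pos += 1
    return matches, examined


keys = [1, 3, 5, 7, 9]
print(lookup(keys, 5, unique=True))   # ([2], 1)
print(lookup(keys, 5, unique=False))  # ([2], 2)
```

When the value occurs once, the normal index examines exactly one extra entry, and that entry is almost always on the page already in memory.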

Does the one extra lookup of a normal index have a big impact on performance?

The impact is almost negligible. In the previous articles, Kaka introduced a term: the principle of locality.

Both data and programs tend to cluster. After a piece of data is accessed, it is very likely that the same data, or data adjacent to it, will be accessed again.

MySQL's InnoDB storage engine applies this locality principle when reading data. It reads one whole page at a time, and a page is 16 KB.

16 KB is the default page size in InnoDB. It can be adjusted through the parameter innodb_page_size.

There is one more case that is very unlikely but still worth knowing.

With a normal index, the record found may be the last record on a page, in which case the next page has to be read as well. This is a little more expensive, but it happens so rarely that its average cost is negligible for a modern CPU.

5. Understand the change buffer

First, you need to understand a new concept: the change buffer.

When the record with card = 5 needs to be updated and the data page containing it is already in memory, the page is updated directly in memory. If the page is not in memory, the update is cached in the change buffer instead. The next time a query needs this data page, the page is read into memory and the changes recorded for it in the change buffer are applied.

Next, another new concept: merge.

Applying the changes in the change buffer to the data page to obtain the latest result is called a merge. Besides being triggered by access to the data page, a merge is also performed periodically by a background thread and during a normal shutdown of the database.

Conclusion: The update operation records the record in the change buffer first, which can reduce disk I/O and improve the execution speed of the statement.
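The behaviour described above can be sketched with a toy model. The class `ToyChangeBuffer` and its attributes are hypothetical names introduced here, pages are plain dictionaries, and the buffer pool never evicts anything, so this shows only the idea and not how InnoDB is implemented.

```python
class ToyChangeBuffer:
    def __init__(self, disk_pages):
        self.disk = {pid: dict(rows) for pid, rows in disk_pages.items()}
        self.buffer_pool = {}    # pages currently held in memory
        self.change_buffer = {}  # page id -> list of pending (key, value)
        self.disk_reads = 0

    def update(self, page_id, key, value):
        if page_id in self.buffer_pool:
            # Page is in memory: update it directly.
            self.buffer_pool[page_id][key] = value
        else:
            # Page is not in memory: only record the change, no disk read.
            self.change_buffer.setdefault(page_id, []).append((key, value))

    def read(self, page_id, key):
        if page_id not in self.buffer_pool:
            self.disk_reads += 1
            page = dict(self.disk[page_id])
            # Merge: apply the pending changes for this page in order.
            for k, v in self.change_buffer.pop(page_id, []):
                page[k] = v
            self.buffer_pool[page_id] = page
        return self.buffer_pool[page_id][key]


db = ToyChangeBuffer({1: {5: "old"}})
db.update(1, 5, "new")
print(db.disk_reads)  # 0, the update did not touch the disk
print(db.read(1, 5))  # new, the merge happened when the page was read
```

The update itself costs no disk read. The read is paid later, once, when a query actually needs the page.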

Note

1. Reading a data page into memory takes up space in the buffer pool. Using the change buffer avoids reading the page, so it also avoids that memory use.

2. The change buffer is persistent data as well. It has a copy in memory and is also written to disk.

6. Under what conditions is the change buffer used?

Thinking: Why doesn't a unique index use the change buffer?

A unique index definitely cannot use it. If this answer feels a little unfamiliar, go back to the previous articles and read them carefully.

When a row is inserted into a unique index, a lookup is needed first to determine whether the record already exists in the table and whether the unique constraint would be violated. Since the data page has to be read into memory for that check anyway, the change buffer has nothing left to save.

Therefore, only normal indexes can use the change buffer.

As noted above, the change buffer uses memory from the buffer pool, so MySQL provides a parameter, innodb_change_buffer_max_size, to limit its size. Unlike most size settings, its value is a percentage: if it is set to 30, the change buffer may occupy at most 30% of the buffer pool.
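To make the percentage concrete, here is a minimal sketch that converts the setting into a byte limit. The helper name is hypothetical, and the default of 25 and the upper bound of 50 are what the MySQL documentation states as far as I know, so check them against your server version.

```python
def change_buffer_limit_bytes(buffer_pool_bytes, max_size_percent=25):
    """Upper bound on change buffer size for a given buffer pool size."""
    # Assumed valid range for innodb_change_buffer_max_size: 0 to 50.
    if not 0 <= max_size_percent <= 50:
        raise ValueError("innodb_change_buffer_max_size must be 0..50")
    return buffer_pool_bytes * max_size_percent // 100


buffer_pool = 128 * 1024 * 1024  # a 128 MiB buffer pool, as an example
print(change_buffer_limit_bytes(buffer_pool, 30))  # 40265318
```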

Thinking: In what scenarios should the change buffer not be used?

The change buffer caches update operations, so the more changes have accumulated for a data page by the time it is merged, the greater the benefit.

But it does not suit every scenario. Kaka is currently developing accounting software in which most records are queried immediately after they are updated. That runs directly against the point above: a page gets merged when only one change has been recorded for it, so there is almost no benefit.

Therefore, the change buffer only pays off in write-heavy, read-light scenarios.

Thinking: Why is the change buffer useless when you query immediately after updating?

When a record is updated, the change is first recorded in the change buffer. Because the query that follows needs the data page right away, the page is read from disk and the merge is triggered immediately. The number of random-access I/Os is not reduced at all, and the cost of maintaining the change buffer is added on top. In this business model the change buffer is counterproductive.
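The two workloads can be compared with a rough counting model. Everything here is a simplification introduced for illustration: `count_page_reads` is a hypothetical helper, each operation touches exactly one page, and pages are never evicted once loaded.

```python
def count_page_reads(ops, use_change_buffer):
    """ops: list of ("update" | "read", page_id). Returns (reads, merges)."""
    cached, pending = set(), set()
    reads = merges = 0
    for op, page in ops:
        if page in cached:
            continue                  # page already in memory, no I/O
        if op == "update" and use_change_buffer:
            pending.add(page)         # buffer the change, skip the read
            continue
        reads += 1                    # page must be read from disk
        cached.add(page)
        if page in pending:
            pending.discard(page)
            merges += 1               # buffered changes applied on load
    return reads, merges


# Query right after every update (the accounting scenario).
read_after_write = [(op, p) for p in (1, 2, 3) for op in ("update", "read")]
# Many updates and no reads (write-heavy scenario).
write_heavy = [("update", p) for p in (1, 2, 3) for _ in range(10)]

print(count_page_reads(read_after_write, True))   # (3, 3)
print(count_page_reads(read_after_write, False))  # (3, 0)
print(count_page_reads(write_heavy, True))        # (0, 0)
print(count_page_reads(write_heavy, False))       # (3, 0)
```

In the read-after-write case the change buffer saves no page reads and adds three merges. In the write-heavy case it avoids every page read in this model.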

Thinking: How to disable the change buffer

Just set the parameter innodb_change_buffer_max_size = 0. Setting innodb_change_buffering = none also turns change buffering off.

7. Analysis from the perspective of update performance

In the first case, the data page to be updated is in memory.

Unique index: check in memory whether the value already exists; if there is no conflict, write the value.

Normal index: update the value directly.

Conclusion: when the data page to be updated is in memory, a unique index performs only one more check than a normal index.

In the second case, the data page to be updated is not in memory.

Unique index: the data page has to be read into memory first, then checked for an existing record, and then updated.

Normal index: the change is simply recorded in the change buffer.

Conclusion: when the data page to be updated is not in memory, a normal index can use the change buffer, which improves performance significantly.

Note: when you change an index from a normal index to a unique index, pay attention to the effect of losing the change buffer, which will directly affect the memory hit rate.
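The four combinations above can be summarised as a small decision function. It is only a restatement of the article's cases in code; the function name and the step descriptions are made up for illustration.

```python
def update_steps(index_type, page_in_memory):
    """List the steps an update goes through in the article's four cases."""
    steps = []
    if not page_in_memory:
        if index_type == "unique":
            # The uniqueness check needs the page, so it must be read.
            steps.append("read page from disk")
        else:
            # A normal index can defer the work to the change buffer.
            return ["record change in change buffer"]
    if index_type == "unique":
        steps.append("check uniqueness")
    steps.append("update page in memory")
    return steps


for index_type in ("unique", "normal"):
    for in_memory in (True, False):
        print(index_type, in_memory, update_steps(index_type, in_memory))
```

Only the combination of a unique index and a page that is not in memory forces a random disk read at update time.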

8. Summary

Returning to the topic of the article, how to choose between a normal index and a unique index: there is almost no difference between the two in query performance. The difference lies mainly in how they affect update operations.

If your business is like Kaka's scenario, where records are queried immediately after they are updated, you can simply disable the change buffer.

If not, prefer a normal index. Together with the change buffer it can significantly improve update performance.

Persistence in learning, in writing, and in sharing is the belief Kaka has upheld throughout this career. I hope this article brings you a little help on the vast Internet. I am Kaka, see you in the next issue.


Origin blog.csdn.net/fangkang7/article/details/120803132