MySQL index (clustered index and non-clustered index)

1. Background

MySQL indexes come up constantly in interviews. Asked about indexing, most people can name the hash index and the B+ tree index. The hash index is relatively simple; the B+ tree index is somewhat more complicated.

When creating a table, we can choose among storage engines such as MyISAM and InnoDB, the two most commonly discussed. InnoDB has been the default storage engine since MySQL 5.5 (MyISAM was the default before that), so practically every MySQL installation in use today defaults to InnoDB. This article is based on InnoDB.

2. Definition

In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns of a table. It consists of those column values together with a list of logical pointers to the data pages in the table that physically hold them. An index works like the table of contents of a book: you look up the page number there and go straight to the content you need.

The index provides pointers to the data values stored in the specified columns of the table, and keeps these pointers sorted in the order you specify. The database uses the index to find a specific value, then follows the pointer to the row that contains that value. In this way, SQL statements against the table run faster, and specific information in the table can be reached quickly.

When a table holds a large number of records, there are two ways to query it. The first is a full table scan: every record is read and compared with the query conditions one by one, and the records that match are returned. This consumes a lot of database time and causes a lot of disk I/O. The second is to create an index on the table, find the index entries that match the query conditions, and then use the row locator stored in the index (a ROWID in some databases, comparable to a page number in a book) to fetch the corresponding record from the table quickly.
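The two approaches can be compared with a small Python sketch. It is only a toy model, not how MySQL stores data: the rows are made up, a plain list stands in for the table, and a sorted list of (name, position) pairs stands in for the index.

```python
from bisect import bisect_left

# Hypothetical rows (pId, name); the values are made up for illustration.
rows = [(7, "zhangsan"), (3, "wangwu"), (11, "lisi"), (5, "zhaoliu")]

def full_scan(rows, name):
    """Compare every row with the condition; return matches and rows examined."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row[1] == name:
            matches.append(row)
    return matches, examined

# A sorted list of (name, position in rows) stands in for the index.
name_index = sorted((name, pos) for pos, (_, name) in enumerate(rows))

def index_lookup(rows, name_index, name):
    """Binary-search the index, then follow the stored locator to the row."""
    i = bisect_left(name_index, (name, -1))
    matches = []
    while i < len(name_index) and name_index[i][0] == name:
        matches.append(rows[name_index[i][1]])
        i += 1
    return matches

scan_result, examined = full_scan(rows, "lisi")
index_result = index_lookup(rows, name_index, "lisi")
```

Both calls return the same row, but the scan has to examine every row in the table, while the index lookup only touches the entries that the binary search lands on.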

3. Index type

Classified by function, four kinds of index can be created in a database: the single-column index, the unique index, the primary key index, and the clustered index.

Primary key and unique indexes are relatively simple, so they are not covered here. The focus is the clustered index.

4. Clustered index and non-clustered index

Definition

The clustered index (Clustered Index) goes by several translated names that all mean the same thing. The same is true of the non-clustered index, which is also known as a secondary index.

A clustered index means that the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can have only one clustered index, because its rows can be stored in only one physical order. If an index is not a clustered index, the physical order of the rows does not match the index order. A lookup through the clustered index is faster than one through a non-clustered index, because the row data is reached directly in the index without a second lookup.

In InnoDB, table data is stored in primary key order. The clustered index is a B+ tree built on the primary key of each table. (The B+ tree is an improvement on the B-tree; if you are not familiar with it, see the previous article.) The leaf nodes store the row data of the entire table. Since the rows of a table can be ordered by only one B+ tree, a table can have only one clustered index.

This raises a question: what if the table does not have a primary key? (Usually one is defined.)

In that case, InnoDB builds the clustered index according to the following rules:

  • If there is no primary key, the first unique index whose columns are all NOT NULL is used as the clustered index of the table.
  • If there is no such index either, InnoDB implicitly generates a hidden row ID column and uses it as the clustered index.
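The selection rules above can be written down as a short Python sketch. The `Index` class and the `choose_clustered_key` function are hypothetical names used only for illustration; they are not part of MySQL.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False

def choose_clustered_key(indexes: List[Index], not_null_columns: Set[str]) -> str:
    """Return the name of the index that would serve as the clustered index."""
    # Rule 1: an explicit primary key always wins.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # Rule 2: otherwise, the first unique index with only NOT NULL columns.
    for idx in indexes:
        if idx.unique and all(col in not_null_columns for col in idx.columns):
            return idx.name
    # Rule 3: otherwise, a hidden row ID generated by the engine.
    return "<hidden row id>"

with_pk = [Index("PRIMARY", ["pId"], unique=True, primary=True),
           Index("uk_name", ["name"], unique=True)]
without_pk = [Index("idx_birthday", ["birthday"]),
              Index("uk_name", ["name"], unique=True)]
```

For example, `choose_clustered_key(without_pk, {"name"})` picks `uk_name`, while the same call with an empty set of NOT NULL columns falls through to the hidden row ID.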

P.S. **What is the difference between an auto-increment primary key and a UUID primary key?** Because the primary key is the clustered index, rows with an auto-increment id are appended in key order and stored next to each other on disk, so write performance is relatively high. UUID values arrive in effectively random order, so frequent inserts land in the middle of existing data and force InnoDB to move data between disk blocks (page splits), and write performance is relatively low.
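A rough way to see the difference is to count how many inserts land in the middle of the already stored keys instead of at the end. The Python sketch below is a simplified model: a sorted list stands in for the clustered index, and seeded random 128-bit integers stand in for UUIDs so that the run is repeatable.

```python
import random
from bisect import bisect_right

def count_mid_inserts(keys):
    """Insert keys one by one in sorted order; count inserts that are not appends."""
    stored = []
    mid_inserts = 0
    for key in keys:
        pos = bisect_right(stored, key)
        if pos != len(stored):
            mid_inserts += 1  # existing entries after pos have to make room
        stored.insert(pos, key)
    return mid_inserts

n = 1000
auto_increment_keys = list(range(1, n + 1))

rng = random.Random(42)  # fixed seed, only to make the sketch repeatable
random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for UUIDs

sequential_mid = count_mid_inserts(auto_increment_keys)
random_mid = count_mid_inserts(random_keys)
```

With sequential keys every insert is an append, so `sequential_mid` is 0. With random keys nearly every insert falls between existing keys, which in a real B+ tree is what leads to page splits.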

An example of the clustered index

First, create a table named user: pId is the primary key, and the other fields are name and birthday.

Since we are discussing the InnoDB engine, the primary key index is also the clustered index, and its underlying structure is a B+ tree.

The figure below is simplified. If you understand the structure of a B+ tree, it is easy to follow.

[Figure: simplified structure of the clustered index]

In the figure, the upper part is the B+ tree built on the primary key, and the lower part is the actual data stored on disk.

When we execute the following statement:

select * from user where pId = 11

Then the execution process is as follows:

[Figure: lookup path for pId = 11 in the clustered index]

As shown in the figure above, the search starts at the root and reaches the real data after 3 lookups. Without an index, the rows on disk would have to be scanned one by one until the matching row is found, so the index is clearly much faster. However, the B+ tree has to be maintained whenever data is written, so write performance decreases.
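The lookup can be traced with a minimal Python sketch of a three-level B+ tree. The tree layout and the rows are assumptions made for illustration (the original figure is not reproduced here), and the `Node` class and `search` function are hypothetical names.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # sorted keys
        self.children = children  # set on internal nodes only
        self.rows = rows          # set on leaf nodes only: one row per key

def search(root, key):
    """Walk from the root to a leaf; return (row or None, nodes visited)."""
    node, visited = root, 1
    while node.children is not None:
        # Keys smaller than keys[i] are in children[i]; the rest are to the right.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    if key in node.keys:
        return node.rows[node.keys.index(key)], visited
    return None, visited

def leaf(keys):
    # Leaf nodes hold the full row; the row contents here are made up.
    return Node(keys, rows=[(k, f"user{k}") for k in keys])

root = Node([9], children=[
    Node([5], children=[leaf([1, 3]), leaf([5, 7])]),
    Node([13], children=[leaf([9, 11]), leaf([13, 15])]),
])

row, visited = search(root, 11)
```

Here the search for pId = 11 goes root, then the internal node with key 13, then the leaf holding 9 and 11, so three nodes are visited before the row is returned.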

Next, let's look at the non-clustered index by adding one on the name field. The index can be added with the following command, or with a visualization tool.

CREATE INDEX name ON user(name);

What does the structure look like after adding an index on name?

[Figure: the B+ tree of the non-clustered index on name, next to the clustered index]

From the figure above, we can see that a new B+ tree is built for the index on name. Every index we add increases the size of the table and takes up disk space. There is also one difference: unlike the clustered index, the leaf nodes of a non-clustered index do not hold the real row data. They are still index entries, storing the value of the indexed field and the corresponding primary key (the clustered index key).

If we execute the following statement

select * from user where name = 'lisi'

Now that the non-clustered index is in place, how is this SQL statement executed?

Let's take another picture:

[Figure: lookup path for name = 'lisi' through the non-clustered index and then the clustered index]

From the figure above, we can see that the search starts in the non-clustered index tree and finds the leaf node for the name lisi. Using the primary key pId stored with lisi, it then searches the clustered index B+ tree and retrieves the row data for that pId.

Now look at the following statement. The requested name value is already found in the non-clustered index, so there is no need to search the clustered index afterwards, and the query is more efficient.

select name from user where name = 'lisi'
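The difference between the two queries can be sketched in Python by counting how often the clustered index is visited. This is a toy model under stated assumptions: dictionaries stand in for the two B+ trees, the rows are made up, and `select_by_name` is a hypothetical helper.

```python
# Hypothetical rows keyed by primary key; this dict stands in for the clustered index.
clustered = {
    3: {"pId": 3, "name": "wangwu", "birthday": "1992-05-01"},
    7: {"pId": 7, "name": "zhangsan", "birthday": "1990-01-15"},
    11: {"pId": 11, "name": "lisi", "birthday": "1991-08-20"},
}

# Stand-in for the index on name: indexed value -> primary keys of matching rows.
name_index = {"lisi": [11], "wangwu": [3], "zhangsan": [7]}

# Columns that can be answered from the index on name alone.
INDEX_COLUMNS = {"name", "pId"}

def select_by_name(columns, name):
    """Return (result rows, number of clustered index lookups)."""
    clustered_lookups = 0
    result = []
    covered = set(columns) <= INDEX_COLUMNS
    for pid in name_index.get(name, []):
        if covered:
            # Everything requested is already in the index entry.
            source = {"name": name, "pId": pid}
        else:
            # Go back to the clustered index to fetch the full row.
            source = clustered[pid]
            clustered_lookups += 1
        result.append(tuple(source[c] for c in columns))
    return result, clustered_lookups

all_columns, lookups_star = select_by_name(["pId", "name", "birthday"], "lisi")
only_name, lookups_name = select_by_name(["name"], "lisi")
```

Selecting every column needs one extra lookup in the clustered index per matching row, while selecting only name needs none.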

Summary

The example above shows that every additional index creates one more non-clustered index tree. Indexes should therefore not be added carelessly: every insert has to update all of these trees at the same time, so with too many indexes insert performance decreases.
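The maintenance cost can be illustrated with a small Python sketch that counts how many structures each insert has to update. The `Table` class is a hypothetical toy model: a dict stands in for the clustered index and sorted lists stand in for the non-clustered index trees.

```python
from bisect import insort

class Table:
    def __init__(self, indexed_columns):
        self.clustered = {}  # pId -> row, stand-in for the clustered index
        # One sorted list of (value, pId) per non-clustered index.
        self.secondary = {col: [] for col in indexed_columns}
        self.writes = 0      # number of index structures updated so far

    def insert(self, row):
        self.clustered[row["pId"]] = row
        self.writes += 1
        for col, entries in self.secondary.items():
            insort(entries, (row[col], row["pId"]))  # keep each index sorted
            self.writes += 1

# Made-up rows for illustration.
sample_rows = [
    {"pId": 7, "name": "zhangsan", "birthday": "1990-01-15"},
    {"pId": 11, "name": "lisi", "birthday": "1991-08-20"},
    {"pId": 3, "name": "wangwu", "birthday": "1992-05-01"},
]

bare = Table([])
indexed = Table(["name", "birthday"])
for r in sample_rows:
    bare.insert(r)
    indexed.insert(r)
```

After the same three inserts, the table with two extra indexes has performed three times as many structure updates as the table with only the clustered index.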

5. The advantages and disadvantages of indexes

Advantages:

1. Greatly speed up the retrieval of data;

2. A unique index guarantees the uniqueness of each row of data in the table;

3. Speeds up joins between tables;

4. When using grouping and sorting clauses for data retrieval, the time for grouping and sorting in the query can be significantly reduced.

Disadvantages:

1. The index needs to take up physical space.

2. When adding, deleting and modifying the data in the table, the index should also be dynamically maintained, which reduces the speed of data maintenance.

The picture is from https://zhuanlan.zhihu.com/p/62018452


Origin blog.csdn.net/hello_cmy/article/details/112673026