MySQL flow control and indexes

1. Conditional statement

if conditional statement

delimiter //
CREATE PROCEDURE proc_if ()
BEGIN

    declare i int default 0;
    if i = 1 THEN
        SELECT 1;
    ELSEIF i = 2 THEN
        SELECT 2;
    ELSE
        SELECT 7;
    END IF;

END //
delimiter ;
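
As a small sketch of how the branches are selected, the procedure above can be called directly, and a variant can take the value as a parameter instead of a local variable. The name proc_if_param and its IN parameter are assumptions made for illustration; they are not part of the original example.

```sql
-- Hypothetical variant of proc_if that receives the value as an IN parameter.
DROP PROCEDURE IF EXISTS proc_if_param;
delimiter //
CREATE PROCEDURE proc_if_param (IN i INT)
BEGIN
    IF i = 1 THEN
        SELECT 1;
    ELSEIF i = 2 THEN
        SELECT 2;
    ELSE
        SELECT 7;
    END IF;
END //
delimiter ;

-- In proc_if, i defaults to 0, so the ELSE branch runs.
CALL proc_if();
-- Here i is 2, so the ELSEIF branch runs.
CALL proc_if_param(2);
```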

2. Loop statements

while loop

delimiter //
CREATE PROCEDURE proc_while ()
BEGIN

    DECLARE num INT ;
    SET num = 0 ;
    WHILE num < 10 DO
        SELECT
            num ;
        SET num = num + 1 ;
    END WHILE ;

END //
delimiter ;

repeat loop

delimiter //
CREATE PROCEDURE proc_repeat ()
BEGIN

    DECLARE i INT ;
    SET i = 0 ;
    repeat
        select i;
        set i = i + 1;
        until i >= 5
    end repeat;

END //
delimiter ;
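
One difference between the two loops is worth a sketch: WHILE tests its condition before each pass, while REPEAT tests UNTIL after the pass, so a REPEAT body always runs at least once. The procedure name proc_while_vs_repeat and its variables are assumptions made for illustration.

```sql
DROP PROCEDURE IF EXISTS proc_while_vs_repeat;
delimiter //
CREATE PROCEDURE proc_while_vs_repeat ()
BEGIN
    DECLARE w INT DEFAULT 10;            -- already fails the WHILE condition
    DECLARE r INT DEFAULT 10;            -- already satisfies the UNTIL condition
    DECLARE while_runs INT DEFAULT 0;
    DECLARE repeat_runs INT DEFAULT 0;

    -- Condition is checked first, so the body is skipped entirely.
    WHILE w < 10 DO
        SET while_runs = while_runs + 1;
        SET w = w + 1;
    END WHILE;

    -- Body runs once before UNTIL is checked.
    REPEAT
        SET repeat_runs = repeat_runs + 1;
        SET r = r + 1;
    UNTIL r >= 5
    END REPEAT;

    SELECT while_runs, repeat_runs;      -- 0 and 1
END //
delimiter ;

CALL proc_while_vs_repeat();
```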

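loop statement

delimiter //
-- the procedure name proc_loop is illustrative
CREATE PROCEDURE proc_loop ()
BEGIN

    declare i int default 0;
    loop_label: loop

        set i=i+1;
        -- for i < 8, skip the rest of the body and start the next iteration
        if i<8 then
            iterate loop_label;
        end if;
        -- stop the loop once i reaches 10
        if i>=10 then
            leave loop_label;
        end if;
        select i;  -- reached only for i = 8 and i = 9
    end loop loop_label;

END //
delimiter ;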

3. Index overview

A database index is a data structure that speeds up data retrieval. The database maintains the index at the cost of extra writes and storage space. Indexes let the database locate data quickly without scanning every row of a table on each access. An index can be created on one or more columns of a table, which provides the basis for fast random lookups and efficient access to ordered records.

1. What is an index

Creating an index for data is, to a certain extent, like creating a table of contents for a book. What is the benefit of a table of contents? Without one, finding a specific page means flipping through the book page by page. This is called traversal, and its cost is unpredictable: you may find what you need in the first few pages, or only in the last few, so the number of lookups is hard to control and efficiency is low. Once the book has a table of contents, the search changes: you look up what you need in the table of contents and go straight to the right page.

So what exactly is an index? An index is a data structure, that is, a way of organizing data. Creating an index for the rows of a database table is like creating a table of contents for the pages of a book.

2. Why use an index?

We create an index for data to speed up queries on that data.

The cost appears on writes. When rows are added, deleted, or modified, every index on the table has to be updated as well, so these operations become slower once an index exists.

However, in many applications reads far outnumber writes (a ratio of roughly 10:1 is typical), and an insert, delete, or update usually touches a single table, so write efficiency is rarely the bottleneck. Queries, on the other hand, can be complex and span multiple tables, so overall indexes greatly improve efficiency.

3. How should we treat indexes correctly?

A common mistake is to wait until the software is found to be slow after release before adding indexes. The slowness may not be long in absolute terms, but even an SQL statement that stalls for only 2 to 3 seconds seriously affects the user experience. If you only start mending the fold after the sheep is lost, you will find that the cause is not necessarily the database: by the time you have ruled out the other possibilities one by one and traced the problem to an SQL statement, a long time may have passed. In addition, many operations staff are not good at analyzing SQL statements.

So, how should we treat indexes correctly?

First, consider indexing before the program goes online: analyze which features users are likely to use frequently, and add indexes for the queries behind them.

Second, more indexes are not always better. It is true that indexing every field speeds up queries on every field, but then even a very simple write has to update every index structure on disk, which can cause a lot of IO.

2. The index data structure

Although building an index for data is very similar to building a table of contents for a book, the two differ slightly: an index is created in two steps.

The first step is to use the value of the indexed field as the key and map it to the corresponding data.

The second step is to build a B+ tree from the keys.

As mentioned before, the default storage engine of MySQL is InnoDB, and InnoDB builds its indexes as B+ trees. So what is a B+ tree index?

Before introducing the B+ tree, here is another example to help you understand indexes. Suppose you want to buy a train ticket home for the Chinese New Year. All the train services in the country are like the rows of a table. Searching them one by one for the train that takes you home is like looking for a needle in a haystack, so how would you search?

You would first filter by place of departure and destination, which removes most records, and then filter by time period to narrow the range further. Narrowing the range step by step greatly improves query efficiency. The filter conditions offered by the ticketing system play the role of indexes.

Now let's look at how the B+ tree came about.

The B+ tree developed step by step from the binary search tree, the balanced binary tree, and the B-tree. Each stage is introduced in turn below.

binary search tree

In the user table shown in the figure, the rows are initially stored in no particular order. How do we organize this table as a binary tree?

The first step is to add an index. Suppose we index the id field. If the id of the first record is 10, we take 10 as the key and the rest of the record (such as name) as the value. After doing the same for every record, each key corresponds to one record.

The second step is to sort by key and place the keys into a binary tree like the one on the right of the figure. Each circle is called a node. A node without children is a leaf node, the topmost node is the root node, and the nodes in between are branch nodes. In a binary search tree, every key in the left subtree of a node is smaller than the key of that node, and every key in the right subtree is greater.

Suppose we want the row whose id is 8. Without the index we have to scan the rows in order, and in the figure the row is found on the fifth read. With the binary tree index, 8 is smaller than 10, so we go to the left branch; on the second level 8 is larger than 7, so we go to the right branch; on the third level we find the data, for a total of 3 lookups. More importantly, because the tree has only 3 levels, any row in this table can be found in at most 3 lookups. The number of levels is called the height of the tree, and the height is the maximum number of IO operations needed to find a row.

balanced binary tree

However, a binary search tree is not necessarily balanced. The tree in the figure also satisfies the definition of a binary search tree, but it is so tall that it no longer reduces the number of IO operations. This is why the balanced binary tree was introduced: in a balanced binary tree, the heights of the left and right subtrees of every node differ by at most 1. The balanced binary tree still has a problem. When we create an index, its key-value pairs are written to disk, and each IO reads a single node into memory. This is faster than a full scan, but it is like sending out a truck to carry one small package at a time. Why load so little on each trip?

B-tree

When data is read from disk into memory, the unit that is read is a disk block, or a page in database terms. This leads to the next stage, the B-tree. Building on the balanced binary tree, the B-tree introduces the disk block and stores many nodes in one block. Each page also holds pointers: in the figure, the root page is divided into key ranges, and each range has a pointer to a child page, which further improves query efficiency. The B-tree still has a problem. Each node in a page stores both the key and the corresponding record, so each node takes up a lot of space and each page holds fewer nodes, which makes the tree taller for the same amount of data. Is there a way to reduce the height further?

B+ tree

The B+ tree builds on the B-tree. Non-leaf nodes store only keys, and only leaf nodes store keys together with the record values. Each non-leaf page can therefore hold more keys, which further reduces the height of the tree. In addition, the leaf nodes of a B+ tree are kept in key order and linked to each other by pointers, so the B+ tree has a natural advantage for sorting. For a range query, once the first matching leaf node is found, the following leaf nodes can be read in order without searching from the root again. These properties greatly improve search efficiency.

3. Index classification and differences

1. Hash index

Indexes can be divided into hash indexes and B+ tree indexes.

For an equality query, a hash index has a clear speed advantage, because one pass through the hash algorithm locates the key. This assumes the key values are unique. If they are not, the lookup first finds the position of the key and then scans along the linked list until it reaches the matching data, which greatly reduces the efficiency of the hash index.

A hash index is unordered, so it cannot serve a range query. Even if the key values are ordered, they are no longer contiguous after hashing. A hash index therefore does not support range queries or sorting, cannot use the leftmost prefix of a multi-column joint index, and cannot serve prefix fuzzy queries.

Most workloads include combined conditions, range queries, sorting, grouping, and fuzzy queries, which a hash index cannot handle, so a B+ tree index is the recommended default. A hash index is more advantageous for equality queries on highly distinct values over large data sets.
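
As a hedged illustration of the equality versus range distinction: in MySQL, explicit hash indexes are available on the MEMORY engine, so the sketch below uses a hypothetical MEMORY table. The table session_cache, its columns, and the index name idx_token are assumptions, not part of the original article.

```sql
-- Hypothetical MEMORY table with an explicit hash index on token.
CREATE TABLE session_cache (
    token CHAR(32) NOT NULL,
    user_id INT NOT NULL,
    INDEX idx_token USING HASH (token)
) ENGINE = MEMORY;

INSERT INTO session_cache (token, user_id) VALUES
('aaa', 1), ('bbb', 2), ('ccc', 3);

-- Equality lookup: the kind of condition a hash index is built for.
EXPLAIN SELECT user_id FROM session_cache WHERE token = 'bbb';

-- Range condition: hashed keys are not ordered, so the hash index
-- cannot narrow the search.
EXPLAIN SELECT user_id FROM session_cache WHERE token > 'bbb';
```

Comparing the key column of the two EXPLAIN results shows whether idx_token was chosen for each condition.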

2. Classification and difference of B+ tree index

B+ tree indexes can be divided into clustered indexes and auxiliary indexes.

Clustered index: an index whose key is the value of the primary key field (usually id). A table has exactly one. If no primary key is specified, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index. If no such index exists, InnoDB automatically creates a hidden row ID column and clusters on it. The leaf nodes of the clustered index store the complete row for each key.

Auxiliary index: an index created on a non-primary-key field (a table can have several). The leaf nodes of an auxiliary index store the value of the corresponding primary key.

Summary:

What they have in common: both are B+ trees, non-leaf nodes store only keys, and leaf nodes store key and value.

The difference: the value for a clustered index key is the complete row, while the value for an auxiliary index key is the value of the corresponding primary key field.

Two more concepts are involved here: the back-to-table query and the covering index.

The following example explains both in detail.

First, create a table and insert data.

create table t3(
	id int primary key auto_increment,
	name varchar(16),
	age int
)engine=innodb;

insert into t3(name,age) values
("ly",18),
("tom",18),
("lili",18);

-- auxiliary index on name, needed by the examples below
create index idx_name on t3(name);

id    name    age
1     ly      18
2     tom     18
3     lili    18

Covering index: the index that the query hits already contains all the data the query needs.

For example, both of the following statements on this table are served by a covering index (the second one assumes an auxiliary index on name):

select * from t3 where id = 3;  -- covered: the clustered index stores the whole row
select name,id from t3 where name="ly";  -- covered: the auxiliary index on name stores name and id

Back-to-table query: the primary key value is obtained from the auxiliary index, and then the clustered index is searched from its root to fetch the values of the other fields.
For example, this SQL statement:

select name,id,age from t3 where name="ly";

Because age is not stored in the auxiliary index, we first get the primary key id from the auxiliary index and then look up age in the clustered index. This is the back-to-table operation.
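
The difference can be observed with EXPLAIN. The following is a minimal sketch on a separate, hypothetical copy of the table (t3_demo, with an auxiliary index idx_name; both names are assumptions) so that it can be run on its own.

```sql
-- Hypothetical copy of the example table, with an auxiliary index on name.
CREATE TABLE t3_demo (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(16),
    age INT,
    INDEX idx_name (name)
) ENGINE = InnoDB;

INSERT INTO t3_demo (name, age) VALUES ('ly', 18), ('tom', 18), ('lili', 18);

-- Covered: name and id are both stored in idx_name.
EXPLAIN SELECT name, id FROM t3_demo WHERE name = 'ly';

-- Not covered: age has to be fetched from the clustered index.
EXPLAIN SELECT name, id, age FROM t3_demo WHERE name = 'ly';
```

In the Extra column, "Using index" marks a query answered from the index alone. On a table this small the optimizer may still pick a different plan, so treat the output as indicative.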

4. Joint indexes and the leftmost prefix matching principle

A joint index is a single index built on a combination of several columns (fields) of a table.

Leftmost prefix matching: matching starts from the leftmost column of the joint index, so the query condition must include the leftmost field.

Example of the leftmost prefix:

Table Structure

create index idx_id_name_gender on s1(id,name,gender);

When the query conditions contain the following combinations of fields, the joint index can be hit because they conform to the leftmost prefix principle:

id
id name
id gender (only the id part of the index is used for the lookup)
id name gender

For example, the following condition can hit the index:

where id = 18 and name = "egon" and gender="male"

If only the gender field appears in the query condition, the index cannot be hit:

where gender="male"
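
The cases above can be checked with EXPLAIN. This is a sketch on a hypothetical table s1_demo whose columns match the joint index; the table name, column types, and the extra age column are assumptions.

```sql
-- Hypothetical table matching the columns of the joint index above.
CREATE TABLE s1_demo (
    id INT,
    name VARCHAR(16),
    gender VARCHAR(8),
    age INT,
    INDEX idx_id_name_gender (id, name, gender)
) ENGINE = InnoDB;

-- Leftmost column present: the joint index is a candidate.
EXPLAIN SELECT * FROM s1_demo WHERE id = 18;
EXPLAIN SELECT * FROM s1_demo WHERE id = 18 AND name = 'egon' AND gender = 'male';

-- Leftmost column present, middle column missing:
-- only the id part can narrow the search.
EXPLAIN SELECT * FROM s1_demo WHERE id = 18 AND gender = 'male';

-- Leftmost column missing: the index cannot be used to locate rows.
EXPLAIN SELECT * FROM s1_demo WHERE gender = 'male';
```

Compare the key and key_len columns of the results: key_len grows with the number of index columns actually used. Load some rows first if you want plans that resemble a real workload.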

5. Index pushdown technology

Index condition pushdown (ICP) is a query optimization introduced in the MySQL 5.x series.

Without index pushdown, when a query uses an auxiliary index, the storage engine locates rows through the index, reads the full rows, and returns them to the MySQL server, which then checks whether each row meets the remaining conditions.

With index pushdown, if part of the WHERE condition can be evaluated using only columns of the index, the MySQL server passes that part of the condition to the storage engine. The storage engine checks each index entry against it and reads the full row and returns it to the server only when the index entry satisfies the condition.

Index pushdown reduces the number of times the storage engine has to access the base table, and also the number of rows the MySQL server receives from the storage engine.
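
To see the effect, ICP can be switched off and on for the current session through the optimizer_switch system variable. The sketch below uses a hypothetical table icp_demo with a joint index on (name, age); the table, columns, and index name are assumptions.

```sql
-- Hypothetical table with a joint index on (name, age).
CREATE TABLE icp_demo (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(16),
    age INT,
    city VARCHAR(16),
    INDEX idx_name_age (name, age)
) ENGINE = InnoDB;

-- The range on name uses the index; age = 18 can be checked on the
-- index entry before the full row is read.
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM icp_demo WHERE name LIKE 'e%' AND age = 18;

-- Turn ICP off for this session to compare plans, then restore it.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM icp_demo WHERE name LIKE 'e%' AND age = 18;
SET optimizer_switch = 'index_condition_pushdown=on';
```

When ICP applies, the Extra column typically shows "Using index condition"; with it switched off, the same query typically shows "Using where". The optimizer may choose a different plan on very small tables.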

6. How to use indexes correctly

At this point many readers will ask: does hitting an index always make the query significantly faster?

Not necessarily. If the query range is too large, the indexed field takes up too much space or has low selectivity, or the indexed field is wrapped in a function or takes part in a calculation, then hitting the index does not significantly speed up the query.

The following points summarize how to use indexes correctly.

1 Create indexes on fields with high selectivity and a small footprint

2 A range query can hit the index, but if the range is large, the query is still slow

Solution: narrow the range of the query, or split it into segments or pages and fetch them one at a time until the whole range is covered, or combine the query with caching

3 Use index pushdown technology (enabled by default)

4 Do not wrap the query field in a function or make it part of a calculation

Example:

Not recommended (the query field takes part in a calculation):

	select count(*) from s1 where id*12 = 3;
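
A common way to fix such a statement is to move the calculation to the constant side so that the column is compared directly. The sketch below assumes that s1 has an indexed integer column id, and uses the constant 36 instead of 3 so that the condition can match an integer id.

```sql
-- Assumes s1 has an indexed integer column id.

-- Calculation on the column: the expression must be evaluated for each row,
-- so the index on id cannot be used to locate rows.
EXPLAIN SELECT COUNT(*) FROM s1 WHERE id * 12 = 36;

-- Equivalent condition with the calculation moved to the constant side:
-- the column is compared directly, so the index on id can be used.
EXPLAIN SELECT COUNT(*) FROM s1 WHERE id = 36 DIV 12;
```

Comparing the type and rows columns of the two results shows how much of the table each form has to examine.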

5 Index coverage: when the index is hit, all the fields in the select list are found in that index tree

	create index idx_name on s1(name);
	select name,id from s1 where name="egon";  -- covered by the index
	select name,id,age from s1 where name="egon";  -- not covered, a back-to-table operation is needed

Note: if the query condition is the primary key field, the query is covered no matter which fields are selected, and no back-to-table operation is needed. So when a query can filter by the primary key, prefer it.

6 Create joint indexes and follow the leftmost prefix matching principle

	create index idx_xxx on s1(name,age,gender);

7 Use the explain statement to check how a query uses indexes and to guide optimization
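
As a brief sketch of what to look at, the statement below explains one of the queries from point 5. It assumes the table s1 and the index idx_name on s1(name) already exist.

```sql
-- Assumes the table s1 and the index idx_name on s1(name).
EXPLAIN SELECT name, id FROM s1 WHERE name = 'egon';

-- Columns worth reading first in the result:
--   type   access method (for example ALL = full table scan, ref = index lookup)
--   key    the index actually chosen, NULL if none
--   rows   estimated number of rows to examine
--   Extra  notes such as "Using index" (covering index)
--          or "Using index condition" (index pushdown)

-- MySQL 8.0.18 and later can also run the query and report measured timings.
EXPLAIN ANALYZE SELECT name, id FROM s1 WHERE name = 'egon';
```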

 


Origin blog.csdn.net/weixin_47035302/article/details/131195681