MySQL Advanced (3): Introduction to Indexes and Their Use

4. Introduction to Indexes

4.1 What is an index?

  • An index is a data structure that helps MySQL retrieve data efficiently. In essence, an index is a data structure.
  • The purpose of an index is to improve query efficiency; it is comparable to the index of a dictionary.
  • In addition to the data itself, the database system maintains data structures suited to a particular search algorithm. These structures reference (point to) the data in some way.
  • An index is usually too large to be held entirely in memory, so it is typically stored on disk as an index file.

Summary: An index is a sorted data structure built for fast lookup, so it affects both the filtering done by WHERE conditions and the sorting done by ORDER BY.

Unless stated otherwise, "index" here refers to an index organized as a **B-tree (multiway search tree)**. Clustered indexes, composite indexes, prefix indexes, and unique indexes all use a B+ tree index by default.

4.2 Example of a binary-tree index structure:

[Figure: data table with a binary search tree built on Col2]

To speed up lookups on Col2, you can maintain a binary search tree like the one shown on the right of the figure. Each node holds an index key value and a pointer to the physical address of the corresponding data record, so a binary search can locate the matching record in a number of steps bounded by the height of the tree, rather than by scanning every row.
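The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MySQL stores indexes: the Col2 values, the row addresses, and the names `Node`, `insert`, and `search` are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int                         # index key value (a Col2 value)
    row_addr: int                    # stand-in for the record's physical address
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, row_addr: int) -> Node:
    """Insert a key into the (unbalanced) binary search tree."""
    if root is None:
        return Node(key, row_addr)
    if key < root.key:
        root.left = insert(root.left, key, row_addr)
    else:
        root.right = insert(root.right, key, row_addr)
    return root


def search(root: Optional[Node], key: int) -> Tuple[Optional[int], int]:
    """Return (row address or None, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.row_addr, visited
        node = node.left if key < node.key else node.right
    return None, visited


# Hypothetical (Col2 value, row address) pairs.
rows = [(34, 0x07), (77, 0x0F), (5, 0x56), (91, 0x6A),
        (22, 0x77), (89, 0xF3), (23, 0x90)]

root = None
for col2, addr in rows:
    root = insert(root, col2, addr)

addr, visited = search(root, 89)
print(hex(addr), visited)
```

With these sample values, finding 89 visits four nodes (34, 77, 91, 89) instead of scanning the rows one by one. A plain binary search tree is not balanced, so the worst case still depends on insertion order.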

Note: When rows are added or deleted, the index must be updated as well so that it stays valid. For this reason, indexes are not recommended on data that is added or deleted frequently. Logical deletion (marking a row with a deleted flag instead of physically removing it) is one way to work around the deletion problem.
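As a rough sketch of logical deletion, the Python below marks a row as deleted without touching the index. The `rows` list, the `deleted` flag name, and the dict used as a stand-in for an index are assumptions made for illustration; a real MySQL index is a B+ tree, not a dict.

```python
# Hypothetical table: a list of rows, each with a "deleted" flag column.
rows = [
    {"id": 1, "name": "a", "deleted": False},
    {"id": 2, "name": "b", "deleted": False},
    {"id": 3, "name": "c", "deleted": False},
]

# Stand-in for an index on id: key value -> position of the row.
index = {row["id"]: pos for pos, row in enumerate(rows)}


def logical_delete(row_id):
    """Mark the row as deleted; the index entry is left unchanged."""
    pos = index.get(row_id)
    if pos is None:
        return False
    rows[pos]["deleted"] = True
    return True


def find(row_id):
    """Look the row up through the index, hiding logically deleted rows."""
    pos = index.get(row_id)
    if pos is None or rows[pos]["deleted"]:
        return None
    return rows[pos]


logical_delete(2)
print(find(2))        # the row is hidden from readers
print(2 in index)     # but its index entry was never modified
```

The trade-off is that every read has to filter on the flag, and the flagged rows still occupy space until they are cleaned up.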

4.3 Advantages and disadvantages of indexes:

Advantages:

  • Like the catalog a university library builds for its books, an index makes data retrieval more efficient and reduces the database's IO cost.
  • Because the index keeps the indexed columns sorted, it lowers the cost of sorting data and reduces CPU consumption.

Disadvantages:

  • An index is in effect also a table: it stores the primary key and the indexed fields and points to the records of the actual table, so an index takes up storage space too.
  • Although an index greatly speeds up queries, it slows down writes to the table such as INSERT, UPDATE, and DELETE. When the table is updated, MySQL has to save not only the data but also the index file. Every update to an indexed column changes key values, so the corresponding index entries must be adjusted as well.
  • An index is only one factor in improving efficiency. For tables with a large amount of data, it takes time to work out and build the best indexes and to optimize the queries.

4.4 Index classification:

  1. Single-value index: the index contains only one column; a table can have several single-value indexes.
  2. Unique index: values in the indexed column must be unique, but NULL values are allowed.
  3. Composite index: the index contains multiple columns.

Basic syntax:

  • Create
CREATE [UNIQUE] INDEX indexName ON mytable(columnname(length));

ALTER TABLE mytable ADD [UNIQUE] INDEX [indexName] (columnname(length));
  • Delete
DROP INDEX indexName ON mytable;
  • View
SHOW INDEX FROM table_name;

4.5 MySQL index structures:

MySQL supports four kinds of index structure: BTREE indexes, Hash indexes, full-text indexes, and R-Tree indexes. Java development mainly uses BTREE indexes.

BTREE index:

[Figure: a B+ tree made up of disk blocks, each holding data items and pointers]

[Structure overview]

The figure above shows a B+ tree. Each light blue block is called a disk block. Each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains the data items 17 and 35 and the pointers P1, P2, and P3. P1 points to the disk block holding values smaller than 17, P2 to the disk block holding values between 17 and 35, and P3 to the disk block holding values larger than 35.

The real data lives only in the leaf nodes: 3, 5, 9, 10, and so on. Non-leaf nodes do not store real data; they store only data items that guide the search, such as 17 and 35, which do not actually exist in the data table.

[Search process]

To find data item 29, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects the P2 pointer of disk block 1. The in-memory search time is very short compared with disk IO and can be ignored. Disk block 3 is then loaded from disk into memory using the disk address held in the P2 pointer of disk block 1, which is the second IO. A binary search in disk block 3 selects its P2 pointer, and disk block 8 is loaded into memory through that pointer, which is the third IO. The search within disk block 8 then ends the query.
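The search process can be sketched in Python as follows. Only the keys of disk block 1 (17 and 35) and the path 1 → 3 → 8 come from the text; every other key and block number in `DISK` is invented to make the example complete, and reading a dict entry stands in for one disk IO.

```python
from bisect import bisect_right

# block id -> keys and child block ids (children is None for a leaf).
# Apart from block 1's keys and the 1 -> 3 -> 8 path, the contents are assumed.
DISK = {
    1: {"keys": [17, 35], "children": [2, 3, 4]},
    2: {"keys": [8], "children": [5, 6]},
    3: {"keys": [26, 30], "children": [7, 8, 9]},
    4: {"keys": [65], "children": [10, 11]},
    5: {"keys": [3, 5], "children": None},
    6: {"keys": [9, 10], "children": None},
    7: {"keys": [20, 22], "children": None},
    8: {"keys": [28, 29], "children": None},
    9: {"keys": [31, 33], "children": None},
    10: {"keys": [36, 60], "children": None},
    11: {"keys": [75, 79], "children": None},
}


def search(key, root=1):
    """Return (found, list of block ids read); each read models one IO."""
    path = []
    block_id = root
    while True:
        block = DISK[block_id]          # simulated disk read
        path.append(block_id)
        if block["children"] is None:   # leaf: the real data lives here
            return key in block["keys"], path
        # In-memory binary search picks the pointer to follow:
        # position 0 is P1, position 1 is P2, and so on.
        pointer = bisect_right(block["keys"], key)
        block_id = block["children"][pointer]


found, path = search(29)
print(found, path, "IOs:", len(path))
```

Every lookup in this tree reads exactly three blocks, one per level, regardless of which key is requested.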

In practice, a 3-level B+ tree can index millions of rows. If a lookup among millions of rows takes only three IOs, the performance gain is huge. Without an index, each data item could cost one IO, for a total of millions of IOs, which is obviously very expensive.
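A back-of-the-envelope calculation shows where the "millions of rows in three levels" figure can come from. The 16 KB page size is InnoDB's default; the key size, pointer size, and average row size below are assumptions chosen for illustration, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
KEY_BYTES = 8           # assumed: BIGINT key
POINTER_BYTES = 6       # assumed: size of a child page pointer
ROW_BYTES = 1024        # assumed: average size of one full row

# Entries per non-leaf page, and rows per leaf page.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES


def capacity(levels):
    """Approximate number of rows reachable in a tree with this many levels."""
    return fanout ** (levels - 1) * rows_per_leaf


print(fanout, rows_per_leaf)
print(capacity(3))
```

Under these assumptions a 3-level tree reaches roughly 21.9 million rows. Larger keys or rows reduce that number, but it stays in the millions for typical sizes.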

4.6 Deciding when to create an index:

4.6.1 When to create an index:

  • A primary key automatically gets a unique index.
  • Fields frequently used as WHERE conditions should be indexed.
  • Fields that join to other tables in a query should be indexed, that is, create an index for the foreign key relationship.
  • Fields used for sorting in a query: when the sort field is accessed through an index, sorting becomes much faster.
  • Fields used for aggregation or grouping in a query, because grouping generally involves sorting.
  • Under high concurrency, prefer composite indexes.

In short, indexes exist to serve WHERE lookups and ORDER BY sorting.

4.6.2 When not to create an index:

  • Tables with too few records.
  • Tables that are frequently inserted into or deleted from.
  • Table fields whose values are heavily repeated and evenly distributed. Only index the columns that are queried and sorted most often; if a column contains many duplicate values, indexing it has little practical effect.

Index selectivity is the ratio of the number of distinct values in the indexed column to the number of records in the table. If a table has 2000 records and the indexed column has 1980 distinct values, the selectivity of the index is 1980/2000 = 0.99. The closer an index's selectivity is to 1, the more efficient the index.
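The selectivity calculation is easy to reproduce. The sketch below uses a made-up column that matches the numbers in the example (2000 values, 1980 of them distinct) and a contrasting low-selectivity column; the function name `selectivity` is just for illustration.

```python
def selectivity(values):
    """Ratio of distinct values to total values in a column."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical column: 1980 distinct values plus 20 repeats of one of them.
high = list(range(1980)) + [0] * 20

# Hypothetical low-selectivity column, e.g. a two-valued status flag.
low = [0, 1] * 1000

print(len(high), selectivity(high))
print(len(low), selectivity(low))
```

The first column comes out at 0.99, as in the text, while the two-valued column comes out at 0.001, which is the kind of column where an index has little practical effect.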

Origin blog.csdn.net/weixin_44634197/article/details/108902575