MySQL database index

1. Index types

An index is a data structure that keeps the values of one or more columns of a table in sorted order, so that specific rows can be located quickly. For example, to find an employee by last name, an index lets the database go straight to the matching rows instead of searching every row in the table.

1.1 Single-column index

An index of this kind contains only one column. A table can have several single-column indexes.

  1. Primary key index

  2. Unique index

CREATE UNIQUE INDEX index_name ON table_name (column_name);

  3. Normal index

CREATE INDEX index_name ON table_name (column_name);

1.2 Composite index

Contains two or more columns.

CREATE INDEX index_name ON table_name (column_a, column_b, column_c);

Note: queries must follow MySQL's "leftmost prefix" rule to make use of a composite index.


Statement for deleting an index:

DROP INDEX indexName ON TableName;

1.3 Clustered index (also called the primary key index)

A table can have only one clustered index, and it is built on the primary key. Primary keys are usually generated as auto-increment IDs or GUIDs. An auto-increment ID is ordered and takes fewer bytes than a GUID, so it has the advantage in both performance and space.
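The ordering argument can be illustrated with a toy model. The sketch below is plain Python, not InnoDB code: a sorted list stands in for the clustered index, and the hypothetical helper count_mid_inserts counts how many inserts land somewhere other than the end. Random 128-bit integers are used as an assumed stand-in for GUIDs.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list (a stand-in for the
    clustered index) and count inserts that do not land at the end."""
    index = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos != len(index):
            mid_inserts += 1  # existing entries have to make room
        index.insert(pos, key)
    return mid_inserts


auto_increment_ids = list(range(1, 1001))
rng = random.Random(0)  # fixed seed so the run is repeatable
random_keys = [rng.getrandbits(128) for _ in range(1000)]  # GUID-like keys

sequential_mid = count_mid_inserts(auto_increment_ids)
random_mid = count_mid_inserts(random_keys)
print(sequential_mid, random_mid)
```

Sequential keys always append, while almost every random key lands in the middle of the existing data. In a real B+tree that kind of insert is what tends to cause page splits.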

2. The advantages and disadvantages of indexing

2.1 Advantages

  1. A unique index or primary key index guarantees that each row of data is unique
  2. An index can greatly speed up retrieval and reduce the number of rows that have to be examined
  3. It can speed up joins between tables
  4. It can reduce the time spent on grouping (group by) and sorting (order by)
  5. Reading data in index order lowers the cost of sorting and reduces CPU consumption

2.2 Disadvantages

  1. Creating and maintaining indexes takes time and memory
  2. Index files take up physical space
  3. Every insert / update / delete on the table also has to update the index, so write operations become slower

2.3 Points to note when adding an index

Consider which columns need an index and which do not.

  1. Adding an index on columns that are searched often improves query speed.
  2. An index on the primary key column guarantees the uniqueness of that column.
  3. Adding an index on the columns used in join conditions speeds up join queries.
  4. Columns that are often used for sorting (order by), grouping (group by) or deduplication (distinct) benefit from an index, which shortens sorting and query time.
  5. Prefer declaring indexed columns NOT NULL where possible. MySQL does store NULL values in B-tree indexes, but nullable columns take extra space and add overhead to comparisons.

2.4 When is it best to add (or not add) an index?

  1. The primary key automatically gets a unique index
  2. Fields frequently used as query conditions should be indexed
  3. Fields used to join to other tables (foreign key relationships) should be indexed
  4. Frequently updated fields are poor candidates for an index, because each update has to change the index as well as the record
  5. Fields that never appear in a WHERE condition should not be indexed
  6. When choosing between single-column and composite indexes, composite indexes tend to be the better fit under high concurrency
  7. Fields used for sorting should be indexed: if the sorted field is read through the index, sorting becomes much faster
  8. Fields used for statistics or grouping should be indexed

Cases where indexes are unsuitable:
9. Avoid too many indexes on tables that are updated frequently.
10. Tables with little data usually do not need indexes. With so few rows, reading the whole table may take no longer than traversing the index, so the index may bring no benefit.
11. Do not create indexes on columns with few distinct values.
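The point about columns with few distinct values can be put into numbers. The sketch below is a rough illustration in Python, with made-up sample columns and a hypothetical helper named selectivity; it is not how MySQL computes its own statistics.

```python
def selectivity(values):
    """Fraction of distinct values: 1.0 means every row is unique."""
    return len(set(values)) / len(values)


# Hypothetical sample columns, made up for this illustration.
gender = ["M", "F"] * 500        # 2 distinct values in 1000 rows
user_id = list(range(1000))      # 1000 distinct values in 1000 rows

gender_sel = selectivity(gender)
user_id_sel = selectivity(user_id)
print(gender_sel, user_id_sel)
```

A value close to 1 means an index lookup narrows the result to very few rows. A value close to 0 means each index entry still matches a large share of the table, so the index helps little.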

3. Scenarios and reasons for index failure

3.1 Scenarios

  1. The indexed column is used inside a calculation
  2. The query does not use the leftmost prefix of a composite index
  3. A range condition such as > is used (the index columns to the right of the range column are not used)
  4. LIKE is used with a leading % (the pattern starts with %)
  5. A not-equal condition (<>) is used
  6. The column is a string type but the value in the condition is not quoted
  7. The indexed column is not used as a query condition
  8. OR is used in the query condition (for the index to take effect, every column in the OR must be indexed).
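Scenario 1 can be sketched in a few lines. This is a simplified Python model, not MySQL internals: a sorted list stands in for an index on a hypothetical column age, and the helper names are made up for illustration.

```python
import bisect

# The "index": values of a hypothetical column `age`, kept sorted.
age_index = sorted([18, 21, 25, 29, 29, 33, 40, 52])


def find_equal(index, value):
    """WHERE age = value: binary search on the sorted index."""
    lo = bisect.bisect_left(index, value)
    hi = bisect.bisect_right(index, value)
    return index[lo:hi]


def find_by_expression(index, func, target):
    """WHERE func(age) = target: the index stores age, not func(age),
    so every entry has to be evaluated."""
    examined = 0
    matches = []
    for value in index:
        examined += 1
        if func(value) == target:
            matches.append(value)
    return matches, examined


direct = find_equal(age_index, 29)
computed, examined = find_by_expression(age_index, lambda age: age + 1, 30)
print(direct, computed, examined)
```

Both calls find the same rows, but the second one has to examine every entry. Rewriting a condition like age + 1 = 30 as age = 29 leaves the column bare so the sorted order can be used.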

3.2 Reasons

Here is a specific example of how a composite index fails.

Background needed first:
a composite index is sorted by its first field; entries with the same first field are then sorted by the second field, and so on.
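This sort order is easy to reproduce. In the sketch below, Python tuples stand in for the entries of a composite index on two hypothetical columns (a, b). The sample rows are made up, chosen so that the condition a>1 leaves b in the order 1, 4, 1, 2.

```python
# Rows of a hypothetical table with columns (a, b).
rows = [(2, 4), (1, 2), (3, 1), (2, 1), (1, 1), (3, 2)]

# A composite index on (a, b) keeps entries sorted by a, then by b.
index_ab = sorted(rows)

b_when_a_is_2 = [b for a, b in index_ab if a == 2]   # one fixed value of a
b_when_a_gt_1 = [b for a, b in index_ab if a > 1]    # a range of a values
b_all = [b for a, b in index_ab]                     # no condition on a

print(index_ab)
print(b_when_a_is_2, b_when_a_gt_1, b_all)
```

Only the first list is sorted. As soon as a is not pinned to a single value, the b values no longer appear in order, so b cannot be found by binary search.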


Applying this to the scenarios in 3.1:
2. Non-leftmost prefix: the index is searched by binary search starting from the first field. Without a condition on the first field there is nothing to search on, so the index cannot be used. This follows from the background point above: the second field is ordered only among entries whose first field is equal, and is unordered otherwise.
3. Fields to the right of ">": the condition a>1 matches the entries with a=2 and a=3. Across those entries the values of b are 1, 4, 1, 2, which is not sorted, so b cannot be searched through the index.
4. LIKE with %xx or %xx%: same as above. The pattern does not fix the leading characters, which violates the leftmost prefix rule.
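The LIKE case can be sketched the same way. This is a toy Python model with made-up names, not MySQL code. The "\uffff" sentinel is a shortcut that assumes no indexed string contains that character.

```python
import bisect

# Sorted values of a hypothetical indexed string column.
names = sorted(["adam", "alan", "alice", "bob", "carol", "malice", "zalan"])


def like_prefix(index, prefix):
    """LIKE 'prefix%': all matches sit in one contiguous slice, whose
    bounds can be found by binary search."""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + "\uffff")  # just past the prefix range
    return index[lo:hi]


def like_suffix(index, suffix):
    """LIKE '%suffix': matches can be anywhere, so every entry is checked."""
    return [name for name in index if name.endswith(suffix)]


prefix_hits = like_prefix(names, "al")
suffix_hits = like_suffix(names, "alice")
print(prefix_hits, suffix_hits)
```

The prefix matches are neighbours in the sorted order, while the suffix matches are scattered across it, which is why only the first pattern can be answered from the index order.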

4. Index data structure

This article mainly covers the data structure underlying MySQL indexes, the B+Tree. MySQL also has other index data structures, such as hash and B-tree. They share a common problem: they cannot perform a range query, or can do so only very inefficiently, because the search has to go back through the structure several times.

4.1 Overview of B+tree

Basic concepts of the B+Tree:

  1. A node stores multiple keys, not just one
  2. The leaf nodes form a singly linked list
  3. This removes the need to go back up the tree during a search
  4. The upper levels (everything above the leaf level) store only keys (0001, etc.), and the leaf level stores key-value pairs (the key together with the data or its address)
  5. Range lookups are very fast
  6. The B+Tree keeps the data sorted inside the tree, so no separate sorted file has to be generated
4.2 Disadvantages of other index data structures

  1. Hash: hashed keys are unordered, so the data has to be sorted before an ordered or range lookup is possible, and hash collisions can occur
  2. Balanced binary tree: when the tree is tall, queries slow down, and range queries also have to go back up the tree
  3. B-tree: it also has the problem of going back up the tree during a search, which is why the B+Tree was introduced
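The difference between a hash structure and a sorted structure shows up as soon as a range is involved. A minimal Python comparison follows, with a dict standing in for a hash index and a sorted list standing in for ordered leaves. The key values are made up.

```python
import bisect

keys = [17, 3, 42, 8, 25, 11]          # made-up key values

hash_index = {k: f"row{k}" for k in keys}   # good for equality lookups
sorted_index = sorted(keys)                 # ordered, like B+Tree leaves

# Equality: the hash index answers directly.
equal_hit = hash_index[8]

# Range 8 <= key <= 25 on the hash index: hashing gives no order to exploit,
# so every key is tested.
hash_range = sorted(k for k in hash_index if 8 <= k <= 25)

# Same range on the sorted index: two binary searches bound one slice.
lo = bisect.bisect_left(sorted_index, 8)
hi = bisect.bisect_right(sorted_index, 25)
sorted_range = sorted_index[lo:hi]

print(equal_hit, hash_range, sorted_range)
```

Both approaches return the same keys, but the hash version has to touch every entry and then sort the result, while the sorted version reads one contiguous slice.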

5. How is the actual data found through the index?

5.1 MySQL storage engines and lookup methods

5.1.1 MyISAM

  1. Search the primary key index to find the physical address of the matching record
  2. Then read the record from that address

5.1.2 InnoDB

  1. Primary index tree (in practice the primary key index): the entire row of data can be found directly through the primary key
  2. Auxiliary index: suppose user_name is indexed. The leaf level of that index holds only the id that corresponds to each name. The id is then looked up in the primary key index to fetch the full row and complete the query.

Summary: in InnoDB, a query through the primary index needs only one lookup, so it is generally more efficient than MyISAM. A query through an auxiliary index needs two lookups, so it may be less efficient than MyISAM.
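The two lookup paths can be modelled with dictionaries. This is a conceptual Python sketch, not the engines' actual storage format. The sample rows and all names are made up, and a list position plays the role of a physical address.

```python
# Hypothetical rows: (id, user_name, city).
rows = [(1, "ann", "Oslo"), (2, "bob", "Rome"), (3, "cat", "Lima")]

# MyISAM-style: rows live in a separate data file (a list here); every
# index, primary or not, maps a key to the row's position in that file.
myisam_data = list(rows)
myisam_pk = {row[0]: pos for pos, row in enumerate(myisam_data)}
myisam_name = {row[1]: pos for pos, row in enumerate(myisam_data)}

# InnoDB-style: the primary index holds the full row in its leaves; an
# auxiliary index holds only the primary key value.
innodb_pk = {row[0]: row for row in rows}
innodb_name = {row[1]: row[0] for row in rows}


def myisam_by_id(row_id):
    pos = myisam_pk[row_id]        # index search -> address
    return myisam_data[pos]        # address -> row


def myisam_by_name(name):
    pos = myisam_name[name]        # index search -> address
    return myisam_data[pos]        # address -> row


def innodb_by_id(row_id):
    return innodb_pk[row_id]       # one index search returns the row


def innodb_by_name(name):
    row_id = innodb_name[name]     # first search: auxiliary index -> id
    return innodb_pk[row_id]       # second search: primary index -> row


print(myisam_by_id(2), myisam_by_name("bob"), innodb_by_id(2), innodb_by_name("bob"))
```

All four calls return the same row. What differs is the number of index searches: one in every case except innodb_by_name, which needs two.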

5.1.3 Differences between the two

  1. Transactions: InnoDB supports transactions; MyISAM does not. [This is one of the important reasons MySQL changed its default storage engine from MyISAM to InnoDB]

  2. Foreign keys: InnoDB supports them; MyISAM does not. Converting an InnoDB table that contains foreign keys to a MyISAM table will fail.

  3. Indexes: InnoDB uses a clustered index. Before MySQL 5.6 it did not support full-text indexes, although the Sphinx plugin could be used to add full-text search with very good results; from 5.6 on it supports them natively. MyISAM uses non-clustered indexes and supports full-text indexes.

  4. Lock granularity: InnoDB's finest granularity is the row lock; MyISAM's finest granularity is the table lock. [This is also one of the important reasons MySQL changed its default storage engine from MyISAM to InnoDB]

6. How the two engines store tables and indexes in files

On-disk storage structure:

MyISAM: three files on disk (.frm / .MYD / .MYI)

  • .frm: table definition
  • .MYD: data file (MyData)
  • .MYI: index file (MyIndex)

InnoDB: two files on disk (.frm / .ibd); there is no separate file just for the data

  • .frm: table definition
  • .ibd: data and index file. The data is organized as a clustered index on the primary key, and the actual rows are stored in the leaf nodes (B+Tree structure)

7. Expansion

7.1 Under what circumstances does a composite index fail (with examples)

Multi-column indexes in MySQL:

  1. A joint index is also called a compound index. For a compound index, MySQL uses the fields in the index from left to right. A query can use only part of the index, but it must be the leftmost part.

    For example, an index defined as key index (a,b,c) supports lookups on three combinations: a | a,b | a,b,c. It does not support lookups on b,c. The index is very effective when the leftmost field is compared with a constant.

  2. A multi-column index is more advantageous than a separate index on each column, because the more indexes you create, the more disk space they take up and the slower updates become. The column order also matters when building a multi-column index: the most selective column should be placed first, so that it filters out more rows and the lookup is more efficient.

  3. A composite index takes effect from front to back. If a column in the middle is not used, the part of the index before that breakpoint works and the part after it does not:

    • where a=3 and b=45 and c=5: the three columns are used in index order with no breakpoint, so all of them take effect
    • where a=3 and c=5: b is the breakpoint, so a takes effect and c does not
    • where b=3 and c=4: a is the breakpoint, nothing after a takes effect, so the composite index has no effect at all
    • where b=45 and a=3 and c=5: same as the first case, all three take effect. As long as a, b and c all appear, the order in which they are written does not matter

Also note that a multi-column index on (a,b,c) is not the same as one on (a,c,b).

Classic examples:

  • select * from mytable where a=3 and b=5 and c=4; All three columns a, b and c appear in the where condition, and all of them take effect
  • select * from mytable where c=4 and b=6 and a=3; This statement is listed only to show that MySQL is not that naive. The query optimizer reorders the conditions in where before running the query, so the effect is the same as the previous statement.
  • select * from mytable where a=3 and c=7; a uses the index, b is not used, so c cannot use the index
  • select * from mytable where a=3 and b>7 and c=3; a is used, b is used, c is not. Here b is a range condition, which is also a breakpoint, but b itself still uses the index
  • select * from mytable where b=3 and c=4; a is not used, so b and c cannot use the index
  • select * from mytable where a>4 and b=7 and c=9; a is used, b is not used, c is not used
  • select * from mytable where a=3 order by b; a uses the index, and b also benefits from the index when sorting the result. As explained earlier, within any group of entries with the same a, b is already sorted
  • select * from mytable where a=3 order by c; a uses the index, but c does not help with sorting because of the breakpoint at b. EXPLAIN shows filesort
  • select * from mytable where b=3 order by a; b does not use the index, and a does not help with sorting
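The lookup rule behind the where examples can be written as a small function. This is a simplified Python model of the leftmost prefix rule as described here, with a hypothetical helper named usable_columns. It covers only equality and range conditions, does not model order by, and ignores any further optimizations the real query optimizer may apply.

```python
def usable_columns(index_columns, equal, ranged=()):
    """Return the index columns a WHERE clause can use for lookup.

    index_columns: columns of the composite index, in index order
    equal:         columns compared with = in the WHERE clause
    ranged:        columns compared with a range operator such as > or <
    """
    used = []
    for column in index_columns:
        if column in equal:
            used.append(column)      # equality: keep going to the next column
        elif column in ranged:
            used.append(column)      # range: this column is used, then stop
            break
        else:
            break                    # breakpoint: nothing to the right is used
    return used


index_abc = ("a", "b", "c")
print(usable_columns(index_abc, equal={"a", "b", "c"}))           # a=3 and b=5 and c=4
print(usable_columns(index_abc, equal={"a", "c"}))                # a=3 and c=7
print(usable_columns(index_abc, equal={"a", "c"}, ranged={"b"}))  # a=3 and b>7 and c=3
print(usable_columns(index_abc, equal={"b", "c"}))                # b=3 and c=4
print(usable_columns(index_abc, equal={"b", "c"}, ranged={"a"}))  # a>4 and b=7 and c=9
```

Because the conditions are passed as sets, the order in which they are written in the where clause has no influence on the result. Only the column order of the index matters.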

If we instead create two separate single-column indexes on a and b, MySQL handles the query differently. It typically picks the more selective of the two indexes, that is, the one that narrows the result the most, and the other index is not used, so the result is not as good as with a multi-column index.


Does IN use the index?

Answer: the index may be used

  1. With too many values in the IN list, the index lookup may stop being effective and the query falls back to a scan
  2. Too many values in the IN list can also return a large amount of data, which may cause a heap memory overflow in the application.

Origin blog.csdn.net/weixin_40597409/article/details/115270131