Some understanding of MySQL indexes

What is an index?

An index is a separate, physical storage structure that sorts the values of one or more columns in a database table. It is a collection of the values of one or more columns in a table, together with a list of logical pointers to the data pages that physically hold those values.

An index works like the table of contents of a book: you can quickly find the content you need from the page numbers it lists. The index provides pointers to the data values stored in the specified columns of the table, and keeps these pointers sorted in the order you specify.
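The sketch below is a toy Python model of that idea. It assumes an in-memory table in which each row is identified by a hypothetical row id. The index is simply the column values kept in sorted order, each paired with a pointer (the row id) back to the table. A lookup can then binary-search the index instead of scanning every row. This is only an illustration; it is not how MySQL stores indexes on disk.

```python
import bisect

# A tiny "table": row id -> row. Row ids stand in for physical row locations.
table = {
    1: {"name": "carol", "age": 41},
    2: {"name": "alice", "age": 30},
    3: {"name": "bob", "age": 25},
    4: {"name": "dave", "age": 30},
}

def build_index(table, column):
    """Return a sorted list of (value, row_id) pairs for one column."""
    return sorted((row[column], row_id) for row_id, row in table.items())

def lookup(index, value):
    """Binary-search the sorted index and return the row ids whose value matches."""
    start = bisect.bisect_left(index, (value,))  # first entry with this value
    row_ids = []
    for key, row_id in index[start:]:
        if key != value:
            break  # sorted order: no more matches after this point
        row_ids.append(row_id)
    return row_ids

age_index = build_index(table, "age")
print(age_index)               # [(25, 3), (30, 2), (30, 4), (41, 1)]
print(lookup(age_index, 30))   # [2, 4]
```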

What is the role of an index?

(1) Retrieve data quickly;

(2) Ensure the uniqueness of data records;

(3) Help enforce referential integrity between tables;

(4) When retrieving data with ORDER BY and GROUP BY clauses, indexes can reduce the time spent on sorting and grouping.

What types of indexes are there?

1. Ordinary index

This is an index with no restrictions. It can be created on columns of almost any data type, although BLOB and TEXT columns require an index prefix length. Whether the values may be NULL or must be unique is determined by the column's own constraints.

2. Unique index

Use the UNIQUE parameter to create a unique index. The values of the indexed column must be unique, although MySQL still allows multiple NULL values in a unique index. The primary key is a special unique index that additionally does not allow NULL.
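The sketch below is a minimal Python model of how a unique index rejects duplicates. It assumes the MySQL behavior described above: a UNIQUE index allows several NULLs (modeled as `None`), while a primary key rejects NULL entirely. The names `UniqueIndex` and `DuplicateKeyError` are hypothetical.

```python
class DuplicateKeyError(Exception):
    pass

class UniqueIndex:
    """Toy unique index: maps each non-NULL key to exactly one row id."""

    def __init__(self, allow_null=True):
        self.entries = {}
        self.allow_null = allow_null  # False models a PRIMARY KEY

    def insert(self, key, row_id):
        if key is None:
            if not self.allow_null:
                raise ValueError("primary key column cannot be NULL")
            return  # NULLs are not compared for uniqueness
        if key in self.entries:
            raise DuplicateKeyError(f"duplicate entry {key!r}")
        self.entries[key] = row_id

email_idx = UniqueIndex()
email_idx.insert("a@example.com", 1)
email_idx.insert(None, 2)
email_idx.insert(None, 3)             # fine: NULLs do not collide
try:
    email_idx.insert("a@example.com", 4)
except DuplicateKeyError as e:
    print(e)                          # duplicate entry 'a@example.com'
```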

3. Full-text index

Use the FULLTEXT parameter to create a full-text index. Full-text indexes can only be created on columns of type CHAR, VARCHAR and TEXT. When searching string columns that hold large amounts of text, a full-text index can speed up queries. Note: full-text searches are case-insensitive by default. To perform case-sensitive full-text searches, use a binary collation for the indexed columns.
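A full-text index is essentially an inverted index that maps each word to the documents containing it. The toy Python sketch below assumes naive `\w+` tokenization and uses lowercasing to mimic the default case-insensitive behavior. It does not model MySQL's real parser, stopword list or minimum word length.

```python
import re
from collections import defaultdict

def tokenize(text, case_sensitive=False):
    words = re.findall(r"\w+", text)
    return words if case_sensitive else [w.lower() for w in words]

def build_fulltext(docs, case_sensitive=False):
    """Map each word to the set of row ids whose text contains it."""
    inverted = defaultdict(set)
    for row_id, text in docs.items():
        for word in tokenize(text, case_sensitive):
            inverted[word].add(row_id)
    return inverted

def search(inverted, word, case_sensitive=False):
    key = word if case_sensitive else word.lower()
    return sorted(inverted.get(key, set()))

docs = {1: "MySQL index basics", 2: "Full-text search in mysql", 3: "B+tree pages"}
ci = build_fulltext(docs)                          # default: case-insensitive
print(search(ci, "MYSQL"))                         # [1, 2]
cs = build_fulltext(docs, case_sensitive=True)     # like a binary collation
print(search(cs, "MySQL", case_sensitive=True))    # [1]
```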

4. Composite index

A composite (combined) index is an index created on multiple columns of a table. Queries can use it through those columns, but only if they use the first (leftmost) column of the index.

Composite indexes and the leftmost-prefix rule: suppose a composite index is defined on columns (A, B, C). If the WHERE clause filters on B and C, on A and B, or on A and C, is the index used?

Scenarios:

  • WHERE A AND B AND C: all three index columns are used.
  • WHERE B AND A AND C: all three index columns are used. The MySQL optimizer reorders the conditions, so the effect is the same as above.
  • WHERE A AND B: columns A and B are used.
  • WHERE A AND C: only column A is used.
  • WHERE B AND C: the index is not used, because the leftmost column A is missing. MySQL 8.0.13 and later can sometimes use a skip scan in this case.
  • WHERE a=3 ORDER BY b: the index is used both for filtering on a and for sorting by b.
  • WHERE a=3 ORDER BY c: only a is used. Because b is skipped, the index cannot supply the order of c, so MySQL has to sort the results separately.
  • WHERE a=3 AND b>7 AND c=3: a and b are used, but c is not, because the range condition on b acts as a breakpoint.

Summary: the leftmost index column (A) must be present, and matching stops at a breakpoint (a missing column or a range condition). Columns after the breakpoint cannot use the index.
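The rules above can be modeled as a small Python function. This is a simplified sketch. It assumes the WHERE clause is a conjunction of conditions, each tagged as an equality or a range. It ignores optimizer features such as index condition pushdown and skip scan. The function names and the `"eq"`/`"range"` tags are hypothetical.

```python
def used_index_columns(index_cols, conditions):
    """Return the index columns usable for seeking, per the leftmost-prefix rule.

    index_cols: column names in index order, e.g. ["a", "b", "c"].
    conditions: dict column -> "eq" or "range"; order in WHERE does not matter.
    """
    used = []
    for col in index_cols:
        kind = conditions.get(col)
        if kind is None:          # gap: later columns cannot be used
            break
        used.append(col)
        if kind == "range":       # a range condition is a breakpoint
            break
    return used

def can_order_by(index_cols, conditions, order_col):
    """True if ORDER BY order_col can follow index order without a separate sort."""
    prefix = index_cols[: index_cols.index(order_col)]
    return all(conditions.get(c) == "eq" for c in prefix)

idx = ["a", "b", "c"]
print(used_index_columns(idx, {"b": "eq", "a": "eq", "c": "eq"}))     # ['a', 'b', 'c']
print(used_index_columns(idx, {"a": "eq", "c": "eq"}))                # ['a']
print(used_index_columns(idx, {"b": "eq", "c": "eq"}))                # []
print(used_index_columns(idx, {"a": "eq", "b": "range", "c": "eq"}))  # ['a', 'b']
print(can_order_by(idx, {"a": "eq"}, "b"))   # True
print(can_order_by(idx, {"a": "eq"}, "c"))   # False
```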

5. Covering index (further reading)

Before looking at covering indexes, let's first understand what a clustered index (primary key index) and a secondary index are.

Clustered index (primary key index):

A clustered index builds a B+ tree on the primary key of a table and stores the complete row data of the whole table in its leaf nodes.

The leaf nodes of a clustered index are called data pages. Because of this, in an index-organized table the data itself is part of the index.

Secondary index (auxiliary index):

Any non-primary-key index. Each leaf entry holds the key value plus a bookmark. In the InnoDB storage engine, the bookmark is the primary key value of the corresponding row.
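The sketch below is a toy Python model of InnoDB's two-step lookup through a secondary index. It assumes plain dicts stand in for the two B+ trees. The clustered index maps the primary key to the full row. The secondary index maps a key value to the primary key (the bookmark). A query on the secondary column therefore has to go back to the clustered index (a "back to the table" lookup). All data and names here are illustrative.

```python
# Clustered index: primary key -> full row (the leaf pages hold the data).
clustered = {
    10: {"id": 10, "email": "ann@example.com", "city": "Oslo"},
    20: {"id": 20, "email": "ben@example.com", "city": "Rome"},
}

# Secondary index on email: key value -> primary key (the bookmark).
email_index = {row["email"]: pk for pk, row in clustered.items()}

def find_by_email(email):
    """Look up a row by email: secondary index first, then the clustered index."""
    steps = ["secondary index"]
    pk = email_index.get(email)
    if pk is None:
        return None, steps
    steps.append("clustered index")  # the 'back to the table' lookup
    return clustered[pk], steps

row, steps = find_by_email("ben@example.com")
print(row["city"], steps)   # Rome ['secondary index', 'clustered index']
```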

Now let's look at what a covering index is. There are three common explanations:

  • Explanation 1: all columns in the SELECT list can be obtained from the index alone, without reading the data table (no lookup back to the table). In other words, the queried columns are covered by the index being used.
  • Explanation 2: indexes are a way to find rows efficiently. When the desired data can be read from the index itself, there is no need to read rows from the table. An index that contains (covers) all the fields and conditions referenced by a query is called a covering index.
  • Explanation 3: a covering index is a form of non-clustered composite index that includes all columns used in the query's SELECT, JOIN and WHERE clauses. The indexed fields cover both the selected columns and the columns in the query conditions, so the index contains all the data the query is looking for.

Not every type of index can become a covering index. A covering index must store the values of the indexed columns. Hash, spatial and full-text indexes do not store these values, so MySQL can only use B-Tree indexes as covering indexes.

Advantages:

  • Index entries are usually much smaller than full rows, so MySQL reads less data.
  • Index entries are stored in key order, so range scans need less I/O than random access to rows.
  • Storage engines can cache indexes more effectively. For example, MyISAM caches only indexes in memory and leaves data caching to the operating system.
  • Covering indexes are especially useful for InnoDB, because InnoDB organizes data as a clustered index. If a secondary index contains all the data a query needs, no further lookup in the clustered index is required.

Limitations:

  • Not every index type can act as a covering index; the index must store the column values.
  • Hash and full-text indexes do not store those values, so MySQL can only use B-Tree indexes.
  • Different storage engines implement covering indexes differently, and not all storage engines support them.
  • To benefit from a covering index, select only the columns you need instead of using SELECT *. Indexing every column just so that SELECT * is covered would make the index files too large and slow queries down.
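Whether a query is covered comes down to comparing column sets. The Python sketch below is a hedged illustration. It assumes InnoDB's behavior that every secondary index entry also carries the primary key columns. The function name `is_covering` is hypothetical. In practice you would confirm coverage with EXPLAIN, which shows "Using index" in the Extra column for a covered query.

```python
def is_covering(index_cols, primary_key_cols, select_cols, where_cols):
    """True if every column the query touches is stored in the secondary index entry."""
    available = set(index_cols) | set(primary_key_cols)  # InnoDB appends the PK
    needed = set(select_cols) | set(where_cols)
    return needed <= available

# Secondary index on (name, age); the primary key is id.
print(is_covering(["name", "age"], ["id"], ["id", "age"], ["name"]))     # True
print(is_covering(["name", "age"], ["id"], ["id", "email"], ["name"]))   # False
```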

Index methods

HASH

Hash values are almost always unique, and a hash table works like a set of key-value pairs, so it is well suited to serve as an index.

A hash index can locate an entry in a single step instead of descending level by level like a tree index, so it is extremely efficient. However, this efficiency is conditional: it only applies to "=" and "IN" lookups. Hash indexes do not help with range queries or sorting. A composite hash index also cannot be used with only a leftmost prefix of its columns.
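The toy Python comparison below makes this concrete. It assumes a dict stands in for a hash index and a sorted list stands in for a B-tree. An equality lookup is a single hash probe. A range query over the hash index, however, has to examine every key, because hashing destroys ordering. The sorted structure can jump to the start of the range and stop at its end.

```python
import bisect

keys = [17, 3, 42, 8, 25, 31]
hash_index = {k: f"row-{k}" for k in keys}   # equality: one probe
sorted_keys = sorted(keys)                   # B-tree stand-in

def hash_range(lo, hi):
    """Range query on the hash index: every key must be checked."""
    checked, hits = 0, []
    for k in hash_index:
        checked += 1
        if lo <= k <= hi:
            hits.append(k)
    return sorted(hits), checked

def sorted_range(lo, hi):
    """Range query on sorted keys: jump to lo, stop after hi; returns keys read."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end], end - start

print(hash_index[25])        # row-25
print(hash_range(8, 25))     # ([8, 17, 25], 6)
print(sorted_range(8, 25))   # ([8, 17, 25], 3)
```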

BTREE (most commonly used)

A BTREE index stores index values in a balanced multi-way tree built according to a specific algorithm. It is not a binary tree: each node can have many children. Each query starts at the root of the tree, descends through the nodes level by level, and ends at a leaf. This is the default and most commonly used index type in MySQL.

What is the difference between B+tree and B-tree?

The biggest difference between a B+ tree and a B-tree is that in a B+ tree each leaf node contains a pointer to the next leaf node. The entries in the leaf nodes point to the indexed data, not to other node pages. Non-leaf nodes serve only as an index, and all data-related information is stored in the leaf nodes. When searching, the storage engine descends from the root level by level, using binary search within each node. Internal nodes hold no data, so each non-leaf node can store more keys. This greatly reduces the height of the tree and the space it takes up. The leaf nodes also form a linked list, which makes traversing the leaves and performing range searches convenient.
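The sketch below is a minimal, hand-built B+ tree in Python. It assumes a fixed two-level tree (one root, three leaves) rather than a full implementation with inserts and node splits. Internal nodes hold only separator keys. Leaves hold the keys with their data, plus a pointer to the next leaf. A range scan descends from the root once and then follows the leaf chain.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        # keys[i] is the smallest key stored under children[i + 1]
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Descend from the root, binary-searching the separator keys at each level."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Collect (key, value) pairs with lo <= key <= hi via the leaf linked list."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next   # follow the chain pointer to the next leaf
    return out

l1 = Leaf([1, 3, 5], ["a", "b", "c"])
l2 = Leaf([7, 9], ["d", "e"])
l3 = Leaf([11, 13], ["f", "g"])
l1.next, l2.next = l2, l3
root = Internal([7, 11], [l1, l2, l3])

print(range_scan(root, 4, 11))   # [(5, 'c'), (7, 'd'), (9, 'e'), (11, 'f')]
```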

To sum up

B-tree

Every node stores both keys and data, all the nodes together form the tree, and the child pointers of the leaf nodes are null.

B+tree

1. Non-leaf nodes store only key information and no actual data, so each node can hold more keys.
2. All satellite data (the actual row data) is stored in the leaf nodes.
3. Together, the leaf nodes contain every key and its data.
4. Adjacent leaf nodes are linked by pointers.
5. The tree is shorter: the lower the tree, the fewer the I/O operations and the higher the efficiency.

B+ trees therefore have the following characteristics:

  • Range queries are particularly efficient and fast, thanks to the pointers linking the leaves.
  • A lookup for a single key is only slightly less efficient than in a B-tree, because it always has to reach the leaf level. The difference can usually be ignored.

Why are B+ trees better suited than B-trees to operating-system file indexes and database indexes in practice?

1. B+ trees have lower disk read/write costs

The internal nodes of a B+ tree do not contain pointers to the actual data of each key, so they are smaller than B-tree internal nodes. When all the keys of an internal node are stored in one disk block, that block can hold more keys. As a result, more of the keys to be searched are read into memory in a single read, and the number of I/O reads and writes goes down.
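The Python snippet below is a rough back-of-the-envelope calculation of why fatter internal nodes mean fewer disk reads. The sizes are illustrative assumptions, not MySQL's exact on-page format:

  • 16 KB pages (InnoDB's default page size);
  • an 8-byte key plus a 6-byte child pointer per B+ tree internal entry;
  • 1 KB rows, with each row including its key.

In the B-tree variant, every entry carries its full row plus a child pointer.

```python
PAGE = 16 * 1024              # assumed page size in bytes
KEY, PTR, ROW = 8, 6, 1024    # assumed sizes; ROW includes its key

def bplus_height(n_rows):
    """Levels in a B+ tree: data only in leaves, internal entries are key + pointer."""
    fanout = PAGE // (KEY + PTR)          # children per internal page
    rows_per_leaf = PAGE // ROW
    nodes = -(-n_rows // rows_per_leaf)   # ceiling division: number of leaf pages
    height = 1
    while nodes > 1:
        nodes = -(-nodes // fanout)       # pages needed one level up
        height += 1
    return height

def btree_height(n_rows):
    """Minimum levels in a B-tree whose every entry carries its row + a child pointer."""
    m = PAGE // (ROW + PTR)               # entries per page
    height, capacity = 0, 0
    while capacity < n_rows:
        height += 1
        capacity = (m + 1) ** height - 1  # keys in a full tree of this height
    return height

n = 20_000_000
print(PAGE // (KEY + PTR), PAGE // (ROW + PTR))   # 1170 15
print(bplus_height(n), btree_height(n))           # 3 7
```

If each uncached level costs roughly one page read, then under these assumptions a B+ tree reaches any of 20 million rows in about 3 reads, compared with up to 7 for the B-tree. In practice the upper levels are often already cached in memory.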

2. The query efficiency of B+tree is more stable

Non-leaf nodes do not point to the actual file content; they only index the keys stored in the leaf nodes. Therefore every key search must follow a path from the root node to a leaf node. All such paths have the same length, so every lookup has the same cost.

What is the difference between a clustered index and a non-clustered index?

Clustered index:

In a clustered index, table records are stored in the same order as the index, so queries are fast: once the record for the first matching index value is found, the remaining records follow it contiguously in storage. The disadvantage is that modifications are slow. To keep the physical order of the records consistent with the index order, data pages may have to be reorganized when records are inserted.
A clustered index is like looking up Chinese characters by pinyin in the Xinhua Dictionary. The pinyin table is arranged from a to z, and the body of the dictionary follows the same order, so the logical order matches the physical order. When you need characters pronounced a and ai, or several homophones pronounced sha at once, you may only need to turn a few pages, or simply keep reading the following lines.

Non-clustered index (nonclustered index):

A non-clustered index defines a logical order of the table's records, but the physical order of the records does not necessarily match the index order. Both kinds of index use the B+ tree structure. The leaf level of a non-clustered index is separate from the actual data pages; instead, its leaf entries contain pointers to the records in the data pages. A non-clustered index adds an extra level of indirection to lookups, but it does not force the table data to be rearranged.
A non-clustered index is like looking up Chinese characters by radical in the Xinhua Dictionary. The radical table may be ordered by stroke type (horizontal, vertical, left-falling and so on), but the body of the dictionary is ordered by pinyin from a to z. The logical order of the lookup table therefore does not correspond to the physical order of the pages. Non-clustered indexes suit columns used for grouping, columns with many distinct values, and frequently updated columns. All of these are poor fits for a clustered index.

The fundamental difference:

The fundamental difference between a clustered index and a non-clustered index is whether the order of the table records is consistent with the order of the index.
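The toy Python illustration below shows why this difference matters for range reads. It assumes 4 rows per page and uses a fixed, hypothetical shuffle for the non-clustered case. When rows are stored in key order, reading 8 consecutive keys touches only the 2 pages that hold them. When the physical order is unrelated to the index order, the same read can touch many more pages.

```python
import random

ROWS_PER_PAGE = 4
keys = list(range(1, 41))   # 40 rows, keys 1..40

# Clustered: rows are stored in key order, so page = position // ROWS_PER_PAGE.
clustered_page = {k: i // ROWS_PER_PAGE for i, k in enumerate(keys)}

# Non-clustered: physical order is unrelated to the key (fixed shuffle).
physical = keys[:]
random.Random(0).shuffle(physical)
heap_page = {k: i // ROWS_PER_PAGE for i, k in enumerate(physical)}

def pages_touched(page_of, lo, hi):
    """Number of distinct data pages read to fetch keys lo..hi."""
    return len({page_of[k] for k in range(lo, hi + 1)})

print(pages_touched(clustered_page, 9, 16))   # 2
print(pages_touched(heap_page, 9, 16))        # depends on the shuffle; never fewer than 2
```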

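Related to this, MyISAM uses non-clustered indexes, keeping its index file separate from its data file, while InnoDB stores each table's rows in a clustered primary key index. Other differences between the two storage engines: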

1. MyISAM is not transaction-safe, while InnoDB is transaction-safe.

2. MyISAM locks at the table level, while InnoDB supports row-level locks.

3. MyISAM is relatively simple and, for read-heavy workloads, has traditionally been faster than InnoDB. Small applications can consider using MyISAM.

4. Each MyISAM table is stored as files on disk, which makes it convenient to move across platforms.

5. MyISAM manages non-transactional tables and provides fast storage and retrieval as well as full-text search. If your application performs a large number of SELECT operations, MyISAM is a reasonable choice.

6. InnoDB is designed for transaction processing and supports ACID transactions. If your application performs a large number of INSERT and UPDATE operations, InnoDB is the better choice.


Source: blog.csdn.net/qq_43037478/article/details/113544147