Single column index and joint index

1. Introduction

With additional columns in the index, you can narrow the scope of the search, but using one index with two columns is different from using two separate indexes.

The structure of the joint index is similar to that of the phone book. The names of people are composed of surnames and first names. The phone book is first sorted by last name, and then the people with the same surname are sorted by first name. If you know the last name, the phone book is very useful. If you know the first and last name, the phone book is more useful, but if you only know the first name but not the last name, the phone book will be useless.

So when creating a joint index, consider the order of the columns carefully. The joint index is very useful for a search on all of its columns or on only the first few; for a search on only the later columns, it is useless.
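The phone book analogy can be sketched in a few lines of Python. This is only a toy model, not how MySQL stores an index: the list `phone_book` and the three helper functions are hypothetical names made up for illustration, and a sorted list of tuples stands in for the joint index on (last name, first name).

```python
from bisect import bisect_left

# Toy joint index on (last_name, first_name), kept sorted like a phone book.
phone_book = sorted([
    ("Smith", "Alice"),
    ("Jones", "Bob"),
    ("Smith", "Bob"),
    ("Brown", "Carol"),
    ("Jones", "Alice"),
])

def find_by_last_name(index, last):
    """Seek to the first entry with this last name, then read forward."""
    i = bisect_left(index, (last,))  # (last,) sorts before (last, anything)
    matches = []
    while i < len(index) and index[i][0] == last:
        matches.append(index[i])
        i += 1
    return matches

def find_by_full_name(index, last, first):
    """Both columns known: one binary search lands on the exact entry."""
    i = bisect_left(index, (last, first))
    return i < len(index) and index[i] == (last, first)

def find_by_first_name(index, first):
    """First name only: the sort order does not help, so scan every entry."""
    return [entry for entry in index if entry[1] == first]
```

The first two lookups jump straight to the right place because the entries are sorted by last name first. The third has to look at every entry, which is the "useless phone book" case.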

2. Single-column index

When a multi-condition query runs against several single-column indexes, the optimizer picks whichever index strategy it estimates to be cheapest: it may use just one of the indexes, or it may combine several of them. However, each single-column index is a separate B+ tree underneath, so several of them take up more space and cost some search efficiency. If the columns are only ever queried together in multi-condition queries, it is best to build a joint index.
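As a rough illustration of the difference, the sketch below models indexes as plain Python dictionaries from key to row ids. Real indexes are ordered B+ trees, so this ignores ordering; the table contents and the names `rows`, `build_index`, `idx_username`, `idx_age` and `idx_username_age` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical table: row id -> row
rows = {
    1: {"username": "alice", "age": 30},
    2: {"username": "bob", "age": 30},
    3: {"username": "alice", "age": 25},
    4: {"username": "carol", "age": 30},
}

def build_index(rows, columns):
    """Map each key (tuple of column values) to the set of matching row ids."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        index[tuple(row[c] for c in columns)].add(row_id)
    return index

# Two single-column indexes: two separate structures to store and maintain.
idx_username = build_index(rows, ["username"])
idx_age = build_index(rows, ["age"])

# One joint index on (username, age).
idx_username_age = build_index(rows, ["username", "age"])

# Strategy A: read both single-column indexes, then intersect the row ids.
candidates_by_name = idx_username[("alice",)]   # rows 1 and 3
candidates_by_age = idx_age[(30,)]              # rows 1, 2 and 4
via_two_indexes = candidates_by_name & candidates_by_age

# Strategy B: a single lookup in the joint index.
via_joint_index = idx_username_age[("alice", 30)]
```

Both strategies find the same row, but the first one reads five index entries from two structures to keep one, while the joint index goes directly to the match.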

3. The leftmost-prefix principle

As the name implies, the leftmost columns come first: any run of consecutive index columns starting from the leftmost one can be matched. Matching stops at the first range condition, so if the first column is queried by range, the columns after it cannot be used to narrow the search and a separate index needs to be built for them. When creating a joint index, place the column that business queries use most often in the WHERE clause on the far left; this also keeps the index useful for more queries. For example, if username is often used as a query condition but age is not, username needs to be placed in the first position of the joint index, that is, the leftmost.
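The rule can be written down as a small function. This is a simplified sketch of the principle as described here, not MySQL's optimizer: the function name `usable_index_columns` and the "eq"/"range" labels are hypothetical, and cost-based decisions and optimizations such as index condition pushdown are left out.

```python
def usable_index_columns(index_columns, conditions):
    """Return the index columns that can be used to narrow the search.

    index_columns: column names in index order, e.g. ["username", "age"].
    conditions: maps a column name to "eq" (equality) or "range" (>, <, BETWEEN).
    Matching walks the index from the left, continues through equality
    conditions, includes the first range condition, and then stops.
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break  # gap: this column is not in the query, later ones cannot match
        used.append(column)
        if kind == "range":
            break  # range: columns after this one cannot narrow the search
    return used

index = ["username", "age"]
only_username = usable_index_columns(index, {"username": "eq"})
only_age = usable_index_columns(index, {"age": "eq"})
both = usable_index_columns(index, {"age": "eq", "username": "eq"})
range_first = usable_index_columns(index, {"username": "range", "age": "eq"})
```

Under these assumptions `only_username` is `["username"]`, `only_age` is empty, `both` uses the whole index regardless of the order in which the conditions were written, and `range_first` stops after `username`.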

4. When a joint index and a single-column index cover the same column, which one does MySQL use?

This comes down to MySQL's own query optimizer. When a table has several indexes that could be used, MySQL chooses among them according to the estimated cost of the query.

Some people say that WHERE conditions are evaluated from left to right, so the most selective conditions should be written first. That claim can be found online, but in my own tests the MySQL optimizer reorders the conditions itself: leaving indexes aside, the order of the WHERE conditions has no effect on efficiency. What really matters is whether an index is used!

5. The nature of the joint index

Creating a joint index on (a, b, c) is equivalent to creating a single-column index on (a), a joint index on (a, b) and a joint index on (a, b, c). For the index to take effect, the query conditions must use one of these three combinations. We have also tested the combination of a and c: in that case only the a part of the index is used, and c is not.

6. Index failure

1. A LIKE pattern with % in front;

2. An IS NOT NULL check may not use the index. For OR, when only one of the columns on the two sides of OR is indexed, the index is not used; it takes effect only when the columns on both sides are indexed;

3. OR conditions in general: even with indexes on both sides, it is better to avoid writing OR when optimizing SQL;

4. Implicit data type conversion. For example, when a varchar column is compared with a number written without single quotes, the values may be converted to a numeric type for the comparison, which invalidates the index and causes a full table scan.
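Why implicit conversion (item 4) defeats the index can be seen in a small model. The sketch assumes a varchar column whose index is a sorted list of strings; `code_index`, `to_number`, `lookup_string` and `lookup_number` are hypothetical names, and `to_number` is only a rough stand-in for the database's string-to-number conversion.

```python
from bisect import bisect_left

def to_number(text):
    """Rough stand-in for string-to-number conversion (non-numeric text -> 0)."""
    try:
        return float(text)
    except ValueError:
        return 0.0

# Index on a varchar column: entries are sorted as strings.
code_index = sorted(["10", "9", "09", "9.0", "abc"])

def lookup_string(index, value):
    """Quoted literal: compare as strings, so binary search works."""
    i = bisect_left(index, value)
    return [index[i]] if i < len(index) and index[i] == value else []

def lookup_number(index, number):
    """Unquoted literal: every stored string must be converted and compared."""
    return [v for v in index if to_number(v) == number]

# Positions of the entries that equal 9 numerically, in string order.
positions = [i for i, v in enumerate(code_index) if to_number(v) == 9]
```

In string order the entries are "09", "10", "9", "9.0", "abc". The three values that equal 9 as numbers sit at positions 0, 2 and 3, with "10" in between, so there is no single place to seek to and every entry has to be converted and checked.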

7. Other knowledge points

1. The columns that need an index are those used in WHERE conditions.

2. Fields with little data do not need to be indexed: building an index has a cost, and when the amount of data is small an index is unnecessary and can even be slower.

3. A joint index has the advantage over one index per column: the more indexes are built, the more disk space they occupy and the slower data updates become. In addition, the column order matters when building a multi-column index: put the most selective column first, so that filtering is stronger and more efficient.

8. Introduction to MySQL Storage Engine

1、InnoDB

InnoDB supports transactions, foreign keys, crash recovery and concurrency control. If you have relatively high requirements for transaction integrity (such as banking) or concurrency control (such as ticket sales), InnoDB has great advantages. If you need to update and delete data frequently, you can also choose InnoDB, because it supports transaction commit and rollback.

2、MyISAM

MyISAM offers fast inserts and low disk and memory usage. If a table is mainly used for inserting new records and reading them, MyISAM can achieve high processing efficiency. It can also be used when the application's integrity and concurrency requirements are relatively low.

Note that a single database can contain tables that use different storage engines. A table with high transaction requirements can use InnoDB, while tables in the same database that are mostly queried can use MyISAM. If the database needs a temporary table for queries, you can choose the MEMORY storage engine.

9. Index structure (methods, algorithms)

Two index structures (algorithms), BTree and Hash, are commonly used in MySQL. They retrieve data in different ways, so they affect queries differently.

1、Hash

A Hash index is implemented with a hash table, which is very well suited to key-value lookups, that is, single-key or equality queries.

A Hash index handles equality queries easily. Because it locates the data in one step, unlike a BTree index, which has to go from the root node through the branch nodes before it finally reaches the leaf node, its retrieval efficiency for such queries is much higher than that of a BTree index. For range queries, however, a full table scan is required.

But why do we use BTree more than Hash? Mainly because Hash itself, by its nature, has many limitations and drawbacks:

  1. A Hash index can only satisfy "=", "IN" and "<=>" queries; it cannot serve range queries.

  2. In a joint index, a Hash index cannot be queried by only some of the index keys. For the multiple columns of a joint index, Hash either uses all of them or none at all. It does not support the leftmost prefix that BTree supports for joint indexes: when a query uses only the first one or few keys of the joint index, the Hash index cannot be used.

  3. A Hash index cannot avoid sorting. The Hash index stores the hash values computed from the keys, and the ordering of hash values is not necessarily the same as the ordering of the original keys, so the database cannot use the index to avoid any sort operation.

  4. A Hash index cannot always avoid accessing the table. The Hash index stores the hash value of each index key together with the corresponding row pointer in a hash table. Because different index keys can have the same hash value, a query cannot be completed from the Hash index alone, even if it only needs the rows for one hash key: the actual data in the table still has to be read and compared to get the correct result.

  5. When many entries share the same hash value, a Hash index is not necessarily faster than a BTree. For index keys with relatively low selectivity, a Hash index ends up with a large number of row pointers attached to the same hash value. Locating one record among them is expensive and wastes many table data accesses, which lowers overall performance.
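The first and third limitations can be seen in a toy comparison. The sketch below uses a Python dict as the hash index and a sorted list as a stand-in for an ordered (BTree-style) index; the sample data and the names `ages`, `hash_index`, `ordered_index` and the helper functions are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical column data: row id -> age
ages = {1: 25, 2: 31, 3: 25, 4: 40, 5: 18}

# Hash index: age -> row ids. Keys are stored in no useful order.
hash_index = {}
for row_id, age in ages.items():
    hash_index.setdefault(age, []).append(row_id)

# Ordered index: (age, row_id) pairs kept sorted by age.
ordered_index = sorted((age, row_id) for row_id, age in ages.items())

def hash_equal(age):
    """Equality: one probe into the hash table."""
    return hash_index.get(age, [])

def hash_range(low, high):
    """Range: no order to exploit, so every key has to be inspected."""
    return sorted(r for age, ids in hash_index.items()
                  if low <= age <= high for r in ids)

def ordered_range(low, high):
    """Range: seek to both ends with binary search, read what lies between."""
    lo = bisect_left(ordered_index, (low,))
    hi = bisect_right(ordered_index, (high, float("inf")))
    return [row_id for _, row_id in ordered_index[lo:hi]]
```

`hash_equal(25)` is a single lookup, while `hash_range(20, 35)` walks the whole table. `ordered_range(20, 35)` touches only the matching slice and returns the rows already in age order, which is why an ordered index can also save a sort.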

2、B+ Tree

The B+Tree index is the most commonly used indexing algorithm in MySQL, because it works not only with the comparison operators =, >, >=, <, <= and BETWEEN, but also with the LIKE operator, as long as the query condition is a constant that does not start with a wildcard.

For example:

select * from user where name like 'jack%';
select * from user where name like 'jac%k%';

If the pattern starts with a wildcard, or if it is not a constant, the index will not be used.

For example:

select * from user where name like '%jack';
select * from user where name like simply_name;
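One way to see why the position of the wildcard matters is to turn the literal prefix of a pattern into a range over a sorted list. This is a hedged sketch, not MySQL's implementation: `like_prefix_range` and `candidates` are hypothetical helpers, strings are compared by plain code point (real collations differ), and escape sequences are ignored.

```python
from bisect import bisect_left

def like_prefix_range(pattern):
    """Return (low, high) bounds for an index range scan, or None.

    Only the literal text before the first wildcard (% or _) can be used.
    """
    prefix = ""
    for ch in pattern:
        if ch in "%_":
            break
        prefix += ch
    if not prefix:
        return None  # leading wildcard: nothing to seek on
    # Smallest string that sorts after every string starting with prefix.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return (prefix, high)

# Sorted list standing in for an index on the name column.
names = sorted(["jack", "jackson", "jacob", "jill", "zjack"])

def candidates(index, pattern):
    """Entries the index can narrow down to; None means a full scan."""
    bounds = like_prefix_range(pattern)
    if bounds is None:
        return None
    low, high = bounds
    return index[bisect_left(index, low):bisect_left(index, high)]
```

For 'jack%' the range is from "jack" up to (but not including) "jacl", and for 'jac%k%' it is from "jac" up to "jad"; the candidates in the second case still have to be checked against the rest of the pattern. For '%jack' there is no literal prefix, so no range exists and every entry must be examined.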

3、Principles of B-Tree and B+Tree

A database holds a relatively large amount of data, and multi-way search trees are clearly better suited to that scenario. Next, we will introduce these two types of multi-way search trees. After all, what programmer can get by without a B-tree in mind?

B-tree: the B-tree has the following characteristics:

  1. Unlike a binary tree, a B-tree node can store multiple keywords and multiple subtree pointers. This is also a characteristic of B+ trees;
  2. In a B-tree of order m, every non-leaf node except the root must have between ⌈m/2⌉ and m subtrees;
  3. The root node must have at least two subtrees, unless the tree consists of the root node alone;
  4. The B-tree is a search tree, very similar to a binary search tree: the further left a subtree is, the smaller its keywords, and within a node the keywords are sorted by size;
  5. In a non-leaf node of the B-tree, the number of subtrees equals the number of keywords + 1;

B+ tree: the B+ tree is an enhanced version of the B-tree, with the following differences:

  1. The B+ tree keeps all search results in the leaf nodes, which means that a search in a B+ tree must go all the way down to a leaf node before it can return results;
  2. The number of keywords in each node of the B+ tree is the same as the number of subtree pointers;
  3. Each key of a non-leaf node of the B+ tree corresponds to one pointer, and the key is the maximum or minimum value of that subtree;
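A minimal, read-only sketch of such a B+ tree follows. It assumes the variant described in the list above, where each key in a non-leaf node is the maximum of the corresponding subtree, and it additionally assumes that leaves are linked to their right neighbour, a common design that the text does not spell out. The tree is built by hand; insertion and node splitting are left out, and all class and function names are hypothetical.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys
        self.values = values    # one value per key
        self.next = None        # assumed link to the next leaf

class Internal:
    def __init__(self, children):
        self.children = children
        # One key per child: the maximum key in that child's subtree.
        self.keys = [child.keys[-1] for child in children]

def find_leaf(node, key):
    """Descend from the root to the leaf that could hold key."""
    while isinstance(node, Internal):
        # First child whose maximum is >= key; clamp to the last child.
        i = min(bisect_left(node.keys, key), len(node.children) - 1)
        node = node.children[i]
    return node

def search(root, key):
    """Point lookup: results are only ever found in a leaf."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None

def range_scan(root, low, high):
    """Seek to the leaf for low, then follow the leaf links up to high."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append((k, v))
        leaf = leaf.next
    return out

# Hand-built tree with four leaves and two levels of non-leaf nodes.
leaves = [Leaf([k, k + 3], [f"row{k}", f"row{k + 3}"]) for k in (1, 7, 13, 19)]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([Internal(leaves[:2]), Internal(leaves[2:])])
```

With this layout the root holds the keys 10 and 22, one per child. `search(root, 13)` walks root, right subtree, third leaf; `range_scan(root, 5, 14)` descends once and then reads along the leaf level without going back up the tree.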

reference:

https://www.cnblogs.com/nov5026/p/11210078.html

Origin blog.csdn.net/guorui_java/article/details/111144253