Several principles of MySQL index optimization

1. Design MySQL indexes to minimize the operation of returning to the table;

For example, a table has the fields id, name, and sex, and only name is indexed (there is no covering index). The query select * from table where name="a" involves returning to the table: MySQL first finds the leaf node with name = "a" in the name index (the leaf holds the id of the row), then goes back and looks up the full row by that id, which amounts to walking a second index just to get the value of the sex field. To avoid this trip back to the table, build a covering index (a joint index): combine name and sex into one index, so that name and sex are both found in a single index lookup. For reference, the clustered index (by default the primary key; if there is none, the first unique index whose columns are all NOT NULL) is the index whose leaf nodes contain all the fields of the row.
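The difference between the two index designs can be seen in a query plan. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in because it needs no server, SQLite is not MySQL, and the table name person and the index names are made up for the example. In MySQL the equivalent check is EXPLAIN, where a covered query typically shows "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO person (name, sex) VALUES (?, ?)",
    [("a", "m"), ("b", "f"), ("a", "f")],
)

def plan(sql):
    """Return the planner's description of how the query would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name, sex FROM person WHERE name = 'a'"

# Index on name only: the index locates the rows, but sex must be read from the table.
conn.execute("CREATE INDEX idx_name ON person (name)")
plan_single = plan(query)

# Joint index on (name, sex): both requested columns are stored in the index itself.
conn.execute("DROP INDEX idx_name")
conn.execute("CREATE INDEX idx_name_sex ON person (name, sex)")
plan_covering = plan(query)

print(plan_single)
print(plan_covering)
```

The second plan reports a covering index, meaning the query is answered from the index alone without going back to the table.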

2. In which order should the large and small tables be placed in a join query? (Use short indexes to improve I/O efficiency during index access)

The small table should drive the large table. Why is it faster when the small table drives the large table?

2.1 A multi-table join works like nested loops. Suppose one loop runs 5 times and the other runs 1000 times. If the small loop is on the outside, the database connects to the inner table only 5 times while performing 5000 operations in total. If the large loop is on the outside, the same 5000 operations require 1000 connections to the inner table, which wastes resources and increases overhead;

For example, take the department table dept (id) and the employee table emp (id, dept_id), where emp is the table with more data, and the query select * from emp, dept where emp.dept_id=dept.id. When the query runs, dept.id is read first and each value is then matched for equality against emp.dept_id, which narrows the matching range and improves search efficiency.
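The loop counts above can be reproduced with a small simulation. This is a sketch in plain Python, not how MySQL executes joins internally: the function nested_loop_join and its counters are hypothetical, and "inner scans" stands in for what the text calls connections.

```python
def nested_loop_join(outer_rows, inner_rows, outer_key, inner_key):
    """Join two lists of dicts; count inner scans started and row comparisons."""
    inner_scans = 0
    comparisons = 0
    matches = []
    for outer in outer_rows:
        inner_scans += 1  # one pass over the inner table per outer row
        for inner in inner_rows:
            comparisons += 1
            if outer[outer_key] == inner[inner_key]:
                matches.append((outer, inner))
    return matches, inner_scans, comparisons

# 5 departments and 1000 employees, as in the example above.
dept = [{"id": d} for d in range(1, 6)]
emp = [{"id": e, "dept_id": (e % 5) + 1} for e in range(1, 1001)]

small_first = nested_loop_join(dept, emp, "id", "dept_id")
large_first = nested_loop_join(emp, dept, "dept_id", "id")

print("small table outside:", small_first[1], "inner scans,", small_first[2], "comparisons")
print("large table outside:", large_first[1], "inner scans,", large_first[2], "comparisons")
```

Both orders do the same number of comparisons and return the same matches; what changes is how many times the inner table has to be opened and scanned.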

3. A like query must follow the leftmost matching principle to use the index;

Try not to apply calculations to the queried field whose result cannot be determined in advance. A like fuzzy search does not necessarily make the index unusable. Consider the underlying data structure of the index: it is a B+ tree, in which the values in a left subtree are smaller than the values in the right subtree, so the stored data is arranged on the tree in sorted order. When we query for a value, it is compared against the keys from the root downward until the matching region of the tree is found. If the % of the like pattern comes first, as in '%es', the beginning of the value is unknown, so there is no way to tell where in the tree to start and the index cannot be used. To use the index, the pattern must follow the leftmost match, as in 'es%'. Likewise, a value that first has to be calculated is not known in advance, so it is impossible to tell which region of the tree it falls into, and the index cannot be used either.
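Why a leading % defeats an ordered index can be sketched with a sorted Python list standing in for the sorted leaf level of the tree. The function names are hypothetical, and using "\uffff" as an upper bound assumes the keys contain no such character.

```python
import bisect

def prefix_search(sorted_keys, prefix):
    """Like 'prefix%': binary search finds a contiguous range in sorted order."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

def suffix_search(sorted_keys, suffix):
    """Like '%suffix': sort order gives no starting point, so every key is checked."""
    return [key for key in sorted_keys if key.endswith(suffix)]

keys = sorted(["apple", "essay", "estate", "guess", "test", "yes"])

prefix_hits = prefix_search(keys, "es")
suffix_hits = suffix_search(keys, "es")

print(prefix_hits)
print(suffix_hits)
```

The prefix search touches only the slice between two binary-search positions, while the suffix search has to visit all keys, which is the list equivalent of a full scan.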

4. With paged queries, limit becomes slower and slower toward the later pages;

The reason is that, for example, limit 10, 10 has to scan the first 10 rows, discard them, and then fetch 10 rows, while limit 100, 10 has to scan 100 rows first. The more rows are scanned, the more I/O there is, so we want a way to make the query use the index instead. For example, if the table has a self-incrementing id, then instead of limit 100000, 10 we can run select name from emp where id > (the largest id of the previous page, here 100000) order by id limit 10, which is much more efficient.
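The two paging styles can be compared side by side. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (SQLite writes the offset as LIMIT n OFFSET m); the helper names and the 1000-row emp table are made up, and the ids are assumed to be contiguous so that both queries land on the same page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO emp (id, name) VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1, 1001)],
)

def page_by_offset(offset, size):
    """Offset paging: the engine walks past `offset` rows before returning any."""
    return conn.execute(
        "SELECT id, name FROM emp ORDER BY id LIMIT ? OFFSET ?", (size, offset)
    ).fetchall()

def page_after(last_id, size):
    """Keyset paging: seek directly to the first id greater than the last one seen."""
    return conn.execute(
        "SELECT id, name FROM emp WHERE id > ? ORDER BY id LIMIT ?", (last_id, size)
    ).fetchall()

offset_page = page_by_offset(500, 10)
keyset_page = page_after(500, 10)

print(offset_page[0], offset_page[-1])
print(keyset_page[0], keyset_page[-1])
```

In practice the caller keeps the id of the last row of each page and passes it in for the next one, so the cost of a page no longer depends on how deep it is.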

5. Use varchar instead of char as much as possible;

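char is a fixed-length type, while varchar is a variable-length type.
Use varchar instead of char wherever possible: first, a variable-length field takes less storage, which saves space;
second, for queries, searching within a relatively small field is clearly more efficient.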

6. Try to avoid using or to connect conditions in the where clause;

If one of the fields in the or condition has an index and the other does not, the engine gives up using the index and performs a full table scan;
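The effect can be observed in a query plan. The sketch below again uses Python's sqlite3 module as a stand-in, so it shows SQLite's planner rather than MySQL's; the person table and the idx_name index are hypothetical. In MySQL the same comparison would be made with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, sex TEXT)")
conn.execute("CREATE INDEX idx_name ON person (name)")  # sex deliberately has no index

def plan(sql):
    """Return the planner's description of how the query would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Condition on the indexed field only: the index is used.
plan_name_only = plan("SELECT * FROM person WHERE name = 'a'")

# Indexed field OR unindexed field: rows matching sex = 'm' can be anywhere,
# so the whole table has to be read.
plan_or = plan("SELECT * FROM person WHERE name = 'a' OR sex = 'm'")

print(plan_name_only)
print(plan_or)
```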

7. Try to avoid using the != or <> operator in the where clause; such a condition usually matches most of the rows, so the engine will often give up using the index and perform a full table scan.

8. Do not perform column operations: where age + 1 = 10;

Any operation on the column, including functions and calculation expressions, invalidates the index and causes a table scan. Move the calculation to the other side of the comparison instead, for example where age = 9.
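The difference between the two forms of the condition shows up in the query plan. This is a sketch using Python's sqlite3 module as a stand-in for MySQL; the emp table with an age column and the idx_age index are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_age ON emp (age)")

def plan(sql):
    """Return the planner's description of how the query would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Expression on the column: the index stores age, not age + 1, so it cannot be searched.
plan_expr = plan("SELECT id, name, age FROM emp WHERE age + 1 = 10")

# Same condition with the arithmetic moved to the constant side.
plan_plain = plan("SELECT id, name, age FROM emp WHERE age = 9")

print(plan_expr)
print(plan_plain)
```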

9. Make sure the associated (join) fields use the same character set in multi-table queries (otherwise an error may be reported)

10. Avoid subqueries as much as possible, and rewrite subqueries as multi-table join queries;

Generally speaking, a join query is more efficient, because a subquery may traverse the data multiple times while a join traverses it only once. If the amount of data is small, the subquery is easier to control and the difference hardly matters. If the amount of data is large, the difference between the two becomes obvious, and the join query is generally the faster choice;
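As an illustration of such a rewrite, the sketch below runs the same question once as a subquery and once as a join and checks that the results agree. It uses Python's sqlite3 module as a stand-in for MySQL, and the dept and emp contents are invented for the example; it shows only that the two forms are equivalent, not how fast either one is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "sales"), (2, "hr")])
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "ann", 1), (2, "bob", 2), (3, "cat", 1)],
)

# Subquery form: find the matching department ids, then filter employees by them.
subquery_sql = """
    SELECT e.name FROM emp e
    WHERE e.dept_id IN (SELECT d.id FROM dept d WHERE d.name = 'sales')
    ORDER BY e.id
"""

# Join form: match employees to departments directly. dept.id is unique,
# so the join cannot produce duplicate employee rows.
join_sql = """
    SELECT e.name FROM emp e
    JOIN dept d ON e.dept_id = d.id
    WHERE d.name = 'sales'
    ORDER BY e.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()

print(subquery_rows)
print(join_rows)
```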

11. Why is it not suitable to add indexes to fields with low selectivity? How is the index stored as a B+ tree?

MySQL stores its indexes as B+ trees. The purpose of an index is to improve the efficiency of data retrieval and reduce the number of disk I/O operations: our data lives on disk, so loading it involves disk I/O, and performance improves when the number of I/O operations goes down. Then why a B+ tree rather than a B-tree? The choice follows from the characteristics of MySQL as a relational database. It does not mean the B-tree is bad; for MongoDB, for example, a B-tree is the better fit. No technology is simply good or bad. The advantages of the B+ tree here are: (1) Range searches are more convenient, because a B+ tree only needs to traverse its leaf nodes to traverse the whole tree, while a B-tree has to traverse from the root node from top to bottom. (2) The number of disk I/O operations is reduced, because in a B+ tree the non-leaf nodes store only index keys and the leaf nodes store the actual data, while in a B-tree both non-leaf and leaf nodes store actual data. MySQL (InnoDB) has a default page size of 16K, and the less data each non-leaf node of the B+ tree carries, the more keys fit in a page, the lower the tree becomes, and the fewer disk I/O accesses are needed;
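The effect of node size on tree height can be estimated with rough arithmetic. The sketch below is a back-of-the-envelope model, not InnoDB's actual layout: the 8-byte key, 6-byte page pointer, and 1 KB average row are assumptions, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024  # 16K page, as stated above
KEY_BYTES = 8          # assumed: BIGINT primary key
POINTER_BYTES = 6      # assumed: size of a child-page pointer
ROW_BYTES = 1024       # assumed: average size of a full row

def levels_needed(total_rows, fanout, rows_per_leaf):
    """Smallest number of tree levels whose leaf capacity reaches total_rows."""
    levels = 1
    capacity = rows_per_leaf
    while capacity < total_rows:
        capacity *= fanout
        levels += 1
    return levels

# B+ tree: non-leaf pages hold only key + pointer pairs, leaves hold full rows.
rows_per_leaf = PAGE_SIZE // ROW_BYTES
bplus_fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)

# Crude B-tree model: every node entry carries the full row as well.
btree_fanout = PAGE_SIZE // (ROW_BYTES + KEY_BYTES + POINTER_BYTES)

total_rows = 20_000_000
bplus_levels = levels_needed(total_rows, bplus_fanout, rows_per_leaf)
btree_levels = levels_needed(total_rows, btree_fanout, btree_fanout)

print("B+ tree fanout:", bplus_fanout, "levels:", bplus_levels)
print("B-tree fanout:", btree_fanout, "levels:", btree_levels)
```

Under these assumptions a B+ tree reaches twenty million rows in three levels, while the model with rows stored in every node needs noticeably more, and each extra level is potentially one more page read per lookup.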

When a field with low selectivity is used as an index, a query may end up traversing a large share of the branches of the B+ tree. For example, if an index is added on a gender field that only holds the values male and female, a great many rows share the same key, so each value occupies many branches of the B+ tree and a lookup has to walk all of them, which leads to performance degradation.
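One common way to put a number on this is the ratio of distinct values to total rows. The sketch below is a plain Python illustration with made-up data; the function name selectivity and the 1000-row sample are assumptions for the example.

```python
def selectivity(values):
    """Ratio of distinct values to total rows; 1.0 means every row is unique."""
    return len(set(values)) / len(values)

ids = list(range(1, 1001))
genders = ["male" if i % 2 else "female" for i in ids]

id_selectivity = selectivity(ids)
gender_selectivity = selectivity(genders)

# Rows that a lookup on a single gender value would have to visit.
rows_per_gender_lookup = genders.count("male")

print(id_selectivity, gender_selectivity, rows_per_gender_lookup)
```

A lookup on the id column narrows the search to one row, while a lookup on gender still leaves half the table to visit, so the index saves very little work.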