MySQL index analysis and query optimization

B-Tree

  1. Core features:
    1. A multiway (non-binary) search tree
    2. Every node holds both index keys and data
    3. Lookup within a node is a binary search over its sorted keys

B+Tree

  1. Core features:
    1. A multiway (non-binary) search tree
    2. Only leaf nodes store data; internal nodes store keys only
    3. Lookup within a node is a binary search over its sorted keys
    4. Each leaf node holds pointers to its adjacent leaf nodes

B-Tree VS B+Tree

  1. A B+ tree lookup always descends to a leaf, so its cost is a stable O(log n). A B-tree lookup can stop at any level where the key is found, so its best case is O(1) (the key is in the root).
  2. The pointers between adjacent leaf nodes of a B+ tree make range access much cheaper: a range query finds the first matching leaf and then follows the leaf chain. In a B-tree the keys and data are spread over all levels, so a range query has to move up and down the tree with an in-order traversal, which is far less efficient.
  3. B+ trees are better suited to external storage, that is, disk storage. Because internal nodes carry no data, each node holds more keys, so a single page indexes a larger key range and the tree stays shallow.
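
The range-access advantage of linked leaves can be illustrated with a small Python sketch. The `Leaf` class and the `link` and `range_scan` helpers are hypothetical names made up for this illustration, and the scan starts at the first leaf instead of descending from a root, so this is a simplified model and not how InnoDB is implemented.

```python
from bisect import bisect_left

class Leaf:
    """A B+ tree leaf: sorted keys plus a pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, low, high):
    """Collect all keys in [low, high] by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        # Binary search for the first key >= low inside this leaf.
        i = bisect_left(leaf.keys, low)
        for key in leaf.keys[i:]:
            if key > high:
                return result  # past the range, stop early
            result.append(key)
        leaf = leaf.next  # follow the adjacent-leaf pointer
    return result

first = link([Leaf([1, 3, 5]), Leaf([7, 9, 11]), Leaf([13, 15])])
print(range_scan(first, 4, 13))
```

Once the first matching leaf is reached, the scan never goes back to an upper level; it only follows `next` pointers.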

MySQL InnoDB

Data storage :

  1. Table data is stored in a B+ tree, so the table itself is an index, called the clustered index.
  2. The key is the primary key.
  3. Every InnoDB table has a clustered index, so in effect every table has a primary key.
  4. If no primary key is declared explicitly, the first unique index whose columns are all NOT NULL is used as the clustered index; if no such index exists, InnoDB automatically creates a hidden row ID.

Secondary (normal) index :

  1. Leaf nodes store the primary key value, not the physical address of the row.
  2. A lookup therefore needs two retrievals: (1) find the primary key in the secondary index, (2) fetch the row from the clustered index by that primary key.
  3. Advantage of storing the primary key: secondary indexes do not need to change when pages split or rows move.
  4. Keep the primary key as small as possible, because every secondary index stores a copy of it and a large key wastes space.
  5. Prefer a primary key that grows in order (sequentially), which reduces the cost of index maintenance.
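
The two-step retrieval above can be sketched with plain Python dictionaries standing in for the two B+ trees. The table contents and the names `clustered`, `secondary_name` and `find_by_name` are hypothetical, chosen only for illustration.

```python
# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}

# Secondary index on name: indexed value -> primary key, not a row address.
secondary_name = {"alice": 1, "bob": 2}

def find_by_name(name):
    """Look up a row through the secondary index."""
    pk = secondary_name.get(name)   # retrieval 1: secondary index -> primary key
    if pk is None:
        return None
    return clustered[pk]            # retrieval 2: clustered index -> row

print(find_by_name("bob"))
```

Because the secondary index only stores primary keys, moving a row inside the clustered index would not require touching `secondary_name`.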

MySQL InnoDB page size :

SHOW VARIABLES LIKE 'innodb_page_size';

Finding the row with key = 8 :

  1. The root page has a fixed location in the tablespace.
  2. The root page is loaded into memory, and a search of its keys selects the pointer P6.
  3. The page pointed to by P6 is loaded into memory.
  4. A binary search within that page finds key 8.
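
The lookup steps can be sketched as a two-level search. The original figure is not available, so the page contents below (`ROOT`, `PAGES`, and every page name except P6) are assumptions made up for this example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical root page: separator keys and child page pointers.
ROOT = {"keys": [5, 10], "children": ["P5", "P6", "P7"]}

# Hypothetical leaf pages: sorted keys with their rows.
PAGES = {
    "P5": {"keys": [1, 3], "rows": ["row1", "row3"]},
    "P6": {"keys": [5, 8, 9], "rows": ["row5", "row8", "row9"]},
    "P7": {"keys": [10, 12], "rows": ["row10", "row12"]},
}

def lookup(key):
    """Return (page name, row) for key; row is None if the key is absent."""
    # Step 2: search the root page to pick the child pointer.
    child = ROOT["children"][bisect_right(ROOT["keys"], key)]
    # Step 3: "load" the page the pointer refers to.
    page = PAGES[child]
    # Step 4: binary search inside the leaf page.
    i = bisect_left(page["keys"], key)
    if i < len(page["keys"]) and page["keys"][i] == key:
        return child, page["rows"][i]
    return child, None

print(lookup(8))
```

With this layout, two page reads are enough to reach key 8, matching the steps above.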

How many rows an InnoDB B+ tree of height 3 can store :

  1. Suppose one row in the table takes 1 KB, so a 16 KB leaf page holds 16 rows.
  2. Assume the primary key ID is a bigint (8 bytes). The page pointer is 6 bytes in the InnoDB source code, so one index entry occupies 14 bytes in total.
  3. Number of pointers (index entries) stored in one non-leaf page: 16 KB = 16 * 1024 = 16384 bytes, and 16384 / 14 ≈ 1170.
  4. A B+ tree of height 2 can store 1170 * 16 = 18720 rows.
  5. A B+ tree of height 3 can store 1170 (index entries) * 1170 (index entries) * 16 (rows per page) = 21902400 rows, roughly 20 million.
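
The arithmetic above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch under the same assumptions as the text (16 KB pages, 8-byte key, 6-byte pointer, 1 KB rows); it ignores page headers and fill factor, and the constant and function names are chosen only for this example.

```python
PAGE_SIZE = 16 * 1024   # bytes per page (16 KB, as assumed above)
KEY_BYTES = 8           # bigint primary key
POINTER_BYTES = 6       # page pointer size
ROW_BYTES = 1024        # assumed row size (1 KB)

def capacity(height):
    """Approximate number of rows a B+ tree of the given height can hold."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # index entries per page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES              # rows per leaf page
    # height - 1 levels of index pages, then one level of leaf pages.
    return fanout ** (height - 1) * rows_per_leaf

for h in (1, 2, 3):
    print(h, capacity(h))
```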

High-performance indexing strategy :

  1. Separate columns :
    1. An indexed column must stand alone in the condition: it cannot be part of an expression or an argument to a function.
    2. For example: select x, y, z from table where x + 1 = 2;
    3. If x is indexed, the SQL above cannot use the index and should be written as: select x, y, z from table where x = 1;
  2. Index selectivity : the ratio of the number of distinct index values (X, X <= T) to the total number of rows in the table (T), so it ranges from 1/T to 1. The higher the selectivity, the more rows an index lookup filters out and the more efficient the query. For a unique index X = T, so its selectivity is 1, the best possible.
  3. Prefix index :
    1. When a TEXT or VARCHAR column holds very long values and must be used in queries, index only the first few characters of the column. This is a prefix index. Indexing the entire value would make the B+ tree very large and slow to search.
    2. Principle for choosing a prefix index: its selectivity should be close to the selectivity of the whole column, so that performance does not suffer much, while the prefix stays short enough not to take up too much space.
    3. How to find the best prefix length?
      1. Assumption: a table has a column named testcol of type varchar(100).
      2. Compute the selectivity of the complete column: SELECT COUNT(DISTINCT testcol) / COUNT(*) FROM table;
      3. Compute the selectivity for prefix length x: SELECT COUNT(DISTINCT LEFT(testcol, x)) / COUNT(*) FROM table;
      4. Repeat with different values of x, then pick the prefix length that best balances closeness to the full-column selectivity against prefix length.
  4. Multi-column index
    1. select x, y, z from table where x = 1 and y = 1;
    2. With a multi-column index key(x, y), MySQL first locates the index entries that match the first column and then, within that subset, the entries that match the second column, and so on. The matching rows are found in the index itself without scanning the table data.
    3. If the multi-column index is split into several single-column indexes (key(x), key(y)), MySQL usually picks only the most restrictive one for the query and the others go unused (or it falls back to the more expensive index_merge), so the multi-column index performs better in the case above.
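  5. Index column order
    1. select x, y, z from table where x = 1 and y = 1;
    2. Should the index be key(x, y) or key(y, x)? (The order of the conditions in the where clause itself does not matter.)
    3. As a rule of thumb, put the columns with high selectivity first, ordering the index columns from high to low selectivity.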
  6. Covering index
    1. If an index contains the values of all the fields a query needs, it is called a "covering index" for that query.
    2. InnoDB uses a clustered index, and a covering index avoids the lookup back into the table. The leaf nodes of an InnoDB secondary index store the corresponding primary key value, so if the secondary index already covers the queried values, the extra lookup in the primary (clustered) index is skipped, which improves efficiency.
    3. When a query is served by a covering index, "Using index" is shown in the Extra column of the execution plan.
  7. Index redundancy
    1. When an index key(a, b) exists, creating key(a) as well is redundant, because key(a) is just a prefix of the multi-column index.
    2. Creating key(b) is not redundant, because the multi-column index cannot be used for a lookup on b alone.
  8. Index related issues
    1. The more indexes, the better? No. Every index must be maintained whenever data is updated, which adds overhead, so create indexes only as needed.
    2. Which columns are suitable for indexing? Columns with high selectivity. Constant-like and enumerated fields (such as gender) are poor candidates: their low selectivity gives little benefit while still adding maintenance cost.
    3. Which field type for the primary key? Prefer auto-increment fields. On large tables, UUIDs and other non-sequential values are not suitable as primary keys.
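
The prefix-length procedure from the "Prefix index" item above can be tried out on a sample of column values before running the SQL on a real table. The sketch below is an in-memory illustration only; the sample data, the function names and the `tolerance` threshold are assumptions made for this example, not MySQL settings.

```python
def selectivity(values, prefix_len=None):
    """Distinct values / total values, optionally on a prefix of each value."""
    if not values:
        return 0.0
    if prefix_len is not None:
        values = [v[:prefix_len] for v in values]
    return len(set(values)) / len(values)

def shortest_prefix(values, tolerance=0.95):
    """Smallest prefix length whose selectivity reaches tolerance * full selectivity."""
    if not values:
        return 0
    full = selectivity(values)
    longest = max(len(v) for v in values)
    for x in range(1, longest + 1):
        if selectivity(values, x) >= tolerance * full:
            return x
    return longest

# Hypothetical sample of the testcol column.
names = ["apple", "apricot", "banana", "band", "cherry", "cherry"]
for x in range(1, 5):
    print(x, selectivity(names, x))
print("chosen prefix length:", shortest_prefix(names))
```

On a real table the same comparison is done with the two COUNT(DISTINCT ...) queries, trying several values of x.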

MySQL execution plan

  1. select_type

    id   select_type    description
    1    SIMPLE         A query without any subquery or union
    2    PRIMARY        The outermost query of a statement that contains subqueries
    3    SUBQUERY       A subquery in the select list or where clause
    4    DERIVED        A subquery in the from clause (derived table)
    5    UNION          The second or later select statement of a union
    6    UNION RESULT   The result set obtained from a union
  2. type

    type              description
    ALL               Full table scan; no index is used
    index             Full index scan: every entry of the index is read, e.g. select count(*) from tableA
    range             Index range lookup
    index_merge       Several indexes are used and their results merged, e.g. single-column indexes combined by and / or conditions
    index_subquery    A non-unique index lookup used to resolve a subquery (the subquery counterpart of ref)
    unique_subquery   A unique index lookup used to resolve a subquery (the subquery counterpart of eq_ref)
    ref_or_null       Like ref, with an additional search for rows that contain NULL
    fulltext          Full-text index search
    ref               Lookup through a non-unique index; similar to eq_ref except that several rows may match
    eq_ref            In a join, a lookup through a PRIMARY KEY or UNIQUE NOT NULL index that matches at most one row
    const             Primary key or unique index compared with a constant; at most one row matches
    system            A special case of const: the table has only one row (a system table), which often requires no disk I/O
  3. possible_keys : the indexes that could be used; note that they are not necessarily used. Any index on a field involved in the query is listed here. If this column is NULL, no index is applicable, so consider whether the SQL or the table's indexes need to be optimized.

  4. key

    1. Shows the index MySQL actually uses for the query. If no index is used, it is NULL.
    2. TIP: if the query is served by a covering index (the index contains all the data the query needs), that index may appear only in key and not in possible_keys.
    3. When type is index_merge, two or more indexes may appear here; for every other type only one index appears.
  5. key_len : the length in bytes of the index key that is used

    1. Formula for the index length of char() and varchar() columns:

      (bytes per character: utf8mb4=4, utf8=3, gbk=2, latin1=1) * column length + 1 (if the column allows NULL) + 2 (if the column is variable length, i.e. varchar)

    2. Formula for the index length of an int column: 4 + 1 (if the column allows NULL)

  6. Extra : carries a lot of additional information; the common values are:

    1. Using index: a covering index is used.
    2. Using where: a where clause is used to filter the result set.
    3. Using filesort: the rows are sorted in an extra pass, which happens when sorting by columns that no index can serve. It is expensive and should be optimized away where possible.
    4. Using temporary: a temporary table is used.
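
The index-length formulas above (the key_len column of EXPLAIN) can be checked with a small calculator. This sketch covers only the cases named in the text (char, varchar and int columns); the function names are hypothetical.

```python
# Bytes per character for the character sets listed above.
BYTES_PER_CHAR = {"utf8mb4": 4, "utf8": 3, "gbk": 2, "latin1": 1}

def string_key_len(length, charset, nullable, variable):
    """Index length of a char(length) or varchar(length) column.

    variable=True means varchar, which adds 2 length bytes.
    """
    size = BYTES_PER_CHAR[charset] * length
    if nullable:
        size += 1   # NULL flag
    if variable:
        size += 2   # length prefix of a variable-length column
    return size

def int_key_len(nullable):
    """Index length of an int column: 4 bytes plus 1 if NULL is allowed."""
    return 4 + (1 if nullable else 0)

print(string_key_len(20, "utf8mb4", nullable=True, variable=True))   # varchar(20), nullable
print(string_key_len(10, "utf8", nullable=False, variable=False))    # char(10) NOT NULL
print(int_key_len(nullable=True))
```

Comparing the computed value with the key_len reported by EXPLAIN shows how many columns of a multi-column index a query actually uses.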

SQL optimization suggestions

  1. Don't write overly complicated SQL statements: keep each statement as simple as possible and avoid nesting too many levels.
  2. Use a "temporary table" to cache intermediate results: storing intermediate results in a temporary table is an important way to simplify SQL. It avoids scanning the main table several times, which reduces blocking and improves concurrency.
  3. When using like, check whether it causes a full table scan: fuzzy queries such as select id from table where username like '%abc%' cannot use an index because the pattern starts with "%", so the query falls back to a full table scan. Unless it is really necessary, do not put % in front of the keyword.
  4. Try to avoid the not in, != and <> operators: conditions that use them in the where clause usually match most of the table, so the engine often gives up the index and performs a full table scan.
  5. Try to avoid using or to connect conditions:
    1. For conditions joined by or, if the column before or has an index but the column after it does not, none of the indexes involved will be used.
    2. Example: assume num1 has an index and num2 does not. The statement select id from t where num1=10 or num2=20 gives up the index and performs a full table scan. It can be rewritten as: select id from t where num1=10 union select id from t where num2=20. The num2 branch still cannot use an index, but at least the num1 branch does. (union removes rows that match both conditions; union all is cheaper but returns such rows twice.)
  6. Use numeric fields where possible: if a field only holds numeric information, do not define it as a character type, which slows down queries and joins and increases storage overhead.
  7. Try not to let the default value of a field be NULL:
    1. In MySQL, nullable columns are harder to optimize because they complicate indexes, index statistics, and comparison operations.
    2. MySQL B-tree indexes do store NULL values, but every nullable column costs an extra byte per index entry, and in a composite index a NULL in one column makes comparisons and uniqueness checks on that entry behave differently from ordinary values.
    3. Therefore, when designing the database, declare columns NOT NULL unless you really need to store NULL, and use 0, a special value, or an empty string in place of NULL.
  8. If a column is a string type, always enclose constant values in quotation marks in the where condition. Otherwise MySQL will not use the index even if the column has one, because it implicitly converts the values before comparing them. For example, select * from t_student where std_name = 3; should be written as select * from t_student where std_name = '3';
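  9. Use insert into ... select with caution.
    1. Statement: insert into tableA select * from tableB where date_time > '2020-07-31'
    2. Problem analysis: without a usable index on date_time, this statement scans all of tableB and locks the rows as it goes, so tableB is progressively locked and other sessions cannot modify it until the statement finishes.
    3. Solution: add an index on the date_time field, so that only the matching rows are scanned and locked.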
  10. Index NULL value problems
    1. NULL values in a unique index
      1. A unique index allows multiple rows with NULL values.
      2. If a composite unique index contains a NULL value, uniqueness is no longer enforced for that entry. For example, with unique key (email, phone), several rows with the same email can exist when phone is NULL.
      3. NULL values can only be matched with is null / is not null / <=>; operators such as =, <, > never match NULL.
    2. NULL values in an ordinary index: conditions on NULL values can still use the index.
Origin blog.csdn.net/a1774381324/article/details/121183991