[MySQL tuning] How to perform MySQL tuning? One article is enough!

Table of contents

1. Monitoring and alarming

2. Troubleshoot slow SQL

2.1 Enable slow query log 

2.2 Find the slowest SQL

2.3 Analyzing query plans 

3. MySQL Tuning

3.1 Basic optimization

3.1.1 Cache Optimization 

3.1.2 Hardware optimization

3.1.3 Parameter optimization 

3.1.4 Clean up garbage regularly

3.1.5 Use an appropriate storage engine

3.1.6 Read/write splitting

3.1.7 Sharding: splitting databases and tables

3.2 Table design optimization

3.2.1 Split mixed-business tables; separate hot and cold data

3.2.2 Replace join queries with an intermediate relation table

3.2.3 Follow the three normal forms

3.2.4 Prefer NOT NULL constraints on fields

3.2.5 Using redundant fields

3.2.6 Data type optimization

3.3 Index optimization

3.3.1 11 scenarios where indexes fail

3.3.2 Follow the index design principles

3.3.3 Join query optimization

3.3.4 Subquery optimization

3.3.5 Sorting optimization

3.3.6 GROUP BY optimization

3.3.7 Deep paging query optimization

3.3.8 Use covering indexes where possible

3.3.9 String prefix index

3.3.10 Use index condition pushdown (supported since MySQL 5.6)

3.3.11 Prefer ordinary (non-unique) indexes for write-heavy, read-light workloads

3.4 SQL optimization


MySQL tuning is mainly divided into three steps: monitoring and alarming, troubleshooting slow SQL, and MySQL tuning.

1. Monitoring and alarming

A monitoring stack (such as Prometheus + Grafana) watches MySQL; when query performance degrades, an alert notifies the operations staff.

2. Troubleshoot slow SQL

2.1 Enable slow query log 

View the number of slow queries:

show status like 'slow_queries';

Enable the slow query log and modify the slow query threshold:

set global slow_query_log = 'ON';    # enable the slow query log (global-only variable)
set global long_query_time = 1;      # slow query threshold in seconds (applies to new connections)
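
To confirm that the settings took effect, check the related variables (standard MySQL commands; the log file path will be whatever your server is configured with):

show variables like 'slow_query_log%';    # whether the log is on, and the slow_query_log_file path
show variables like 'long_query_time';    # current threshold in seconds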

2.2 Find the slowest SQL

Use the slow-query-log analysis tool mysqldumpslow to find the slowest statements:

The specific parameters of the mysqldumpslow command are as follows:

  • -a: do not abstract numbers as N and strings as 'S' (by default they are abstracted)
  • -s: how to sort the output:
  • c: number of executions
  • l: lock time
  • r: rows returned
  • t: query time
  • al: average lock time
  • ar: average number of rows returned
  • at: average query time (the default)
  • -t NUM: only show the top NUM entries
  • -g: only keep statements matching the given pattern (grep-style, case-insensitive)

Example: Sort by query time, view the top five slow query SQL statements

# Sort by query time and show the top five slow-query SQL statements
mysqldumpslow -s t -t 5 /var/lib/mysql/xxx-slow.log    

2.3 Analyzing query plans 

EXPLAIN analyzes the SQL execution plan (access type, estimated rows, index length, etc.); the main fields to focus on are:

  1. possible_keys: indexes that might be used for the query
  2. key: the index actually used
  3. key_len: the byte length of the index actually used
  4. type: access type, which shows whether an index is used. ALL (full table scan), ref (non-unique index lookup), const (primary key / unique index lookup), range (index range scan), index_merge (several indexes combined), system (table with a single row).
  5. Extra: extra information; also shows whether an index is used.
    1. Using index: a covering index is used, no table lookup is needed.
    2. Using filesort: an extra sort is required. Sorting is either index sorting or filesort; index sorting is usually faster, but filesort can win when a very large amount of data is involved, e.g. deep paging.
    3. Using index condition: index condition pushdown, supported since MySQL 5.6. When a field of a joint index is a (non-leading-wildcard) fuzzy match, the following index fields can still be checked directly in the index; only after this filtering does MySQL go back to the table to check conditions on columns that are not in the joint index.
    4. Using where: the server filters the rows itself, typically a scan without a usable index.
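
As a small illustration, assuming a student table with the joint index idx_age_name(age, NAME) that is created in the covering-index example later in this article, the key fields look like this:

EXPLAIN SELECT age, NAME FROM student WHERE age = 30;
# expected: key = idx_age_name, type = ref, Extra = Using index (covering index, no table lookup)
EXPLAIN SELECT * FROM student WHERE age = 30;
# expected: key = idx_age_name, type = ref, but SELECT * needs a table lookup, so no "Using index"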

The role of each column in the execution plan:

  • id: every SELECT block or join operation gets a number. Rows with the same id belong to one group and execute from top to bottom; among different ids, the larger id executes first. An id of NULL marks a row that merges other results, such as the UNION RESULT row.
  • select_type: the type of the query block: SIMPLE (ordinary query without subquery or UNION), PRIMARY (the outermost query), SUBQUERY, DERIVED (a subquery in FROM that produces a derived table), UNION (a later SELECT in a UNION), UNION RESULT (the merged result of a UNION).
  • table: the table the current row refers to.
  • partitions: matching partition information; NULL if the table is not partitioned.
  • type: access type, i.e. whether the plan uses an index or a full scan: ALL (full table scan), ref (non-unique index lookup), const (primary key / unique index lookup), range (index range scan), index_merge (several indexes combined), system (table with a single row).
  • possible_keys: the indexes MySQL could use for this query. If only one index is listed, the query is usually efficient; if several are listed but MySQL uses just one of them, consider whether a joint index over those columns is needed.
  • key: the index actually used. If no index is forced, MySQL picks what it estimates to be the best one for the conditions.
  • key_len: the byte length of the index actually used; shorter indexes are faster, so smaller index fields are generally better.
  • ref: what is matched against the index column in an equality lookup: const for a constant, func for an expression or function, or the name of the joined column for a join.
  • rows: the estimated number of rows that must be read; the smaller, the better.
  • filtered: the estimated percentage of the read rows that survive the remaining table conditions; rows × filtered estimates how many rows feed the next join. A value close to 100% means the access method (usually the index) already did most of the filtering.
  • Extra: extra information, usually read together with type: Using index (covering index), Using where (server-side filtering, no usable index for the condition), Using temporary (a temporary table stores the intermediate result, e.g. for sorting/grouping), Using filesort (the sort does not use an index), Using join buffer (the join condition has no index), Impossible WHERE (the WHERE clause can never be true, so there is no result set).

3. MySQL Tuning

3.1 Basic optimization

3.1.1 Cache Optimization 

Adjust MySQL parameters such as the buffer pool size; introduce Redis as an external cache. Tip: InnoDB uses the buffer pool to cache records and indexes.

3.1.2 Hardware optimization

Add memory to the server, upgrade to SSDs, distribute disk I/O across multiple devices, and use multiprocessor machines.

3.1.3 Parameter optimization 

Close unnecessary services and logs: after tuning, close slow query logs;

Adjust the maximum number of connections: max_connections;

Thread cache size: thread_cache_size; idle threads are cached so that a new connection can immediately be handed an existing thread instead of creating one;

Buffer pool size: innodb_buffer_pool_size.
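
A hedged example of inspecting and adjusting these parameters (the values below are purely illustrative; appropriate sizes depend on the machine and workload):

show variables like 'max_connections';
show variables like 'thread_cache_size';
show variables like 'innodb_buffer_pool_size';
set global max_connections = 500;                              # illustrative value
set global innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;   # 4 GB; resizable online since MySQL 5.7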

3.1.4  Clean up garbage regularly

Tables, data, logs, caches, etc. that are no longer used should be cleaned up in time to avoid occupying too many MySQL resources, thereby improving MySQL performance.

3.1.5  Use a suitable storage engine

MyISAM: suitable for read-heavy, write-light scenarios (because of table-level locks, and because its B+ tree leaves store row addresses rather than the rows themselves).

InnoDB: suitable for concurrent write scenarios (because of row-level locks, and because its B+ tree leaves store the actual records).

InnoDB: supports foreign keys and transactions; row locks make it suitable for high concurrency; it caches both indexes and data, so memory requirements are high (indexes and records must fit in the buffer pool); it suits large data volumes and performs better for inserts, updates and deletes (row-level locking allows high concurrency); it is disk-hungry (with several secondary indexes, the indexes can take more space than the records).

MyISAM: does not support foreign keys or transactions; table locks make it unsuitable for high concurrency; it caches indexes and data addresses only, so memory requirements are low (records are not cached); simple queries can be faster (InnoDB has to maintain MVCC consistency when querying and caches more data); it saves disk (the leaves do not store complete records).

Comparison:

  • Features: InnoDB supports foreign keys and transactions; MyISAM does not.
  • Locking: InnoDB uses row locks, so an operation locks only the affected rows and does not block others, which suits highly concurrent workloads; MyISAM uses table locks, so touching even one record locks the whole table, which does not suit high concurrency.
  • Caching: InnoDB caches both indexes and data, so memory requirements are high and memory size has a decisive impact on performance; MyISAM caches only indexes, not the actual data.
  • Focus: InnoDB focuses on transactions (concurrent writes, transactions, more resources); MyISAM focuses on performance (fewer resources, less overhead, simple workloads, fast queries).
  • Default engine: InnoDB from MySQL 5.5 onward; MyISAM before 5.5.

3.1.6 Read/write splitting

Read/write splitting can effectively improve query performance. Master-slave synchronization relies on the binlog and the relay log.
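
A hedged sketch of verifying that a replica is applying the primary's binlog through its relay log (SHOW REPLICA STATUS is the MySQL 8.0.22+ spelling; older versions use SHOW SLAVE STATUS):

show slave status\G      # before MySQL 8.0.22
show replica status\G    # MySQL 8.0.22 and later
# key fields: Slave_IO_Running / Slave_SQL_Running (or Replica_*) should be Yes; Seconds_Behind_Master shows the lag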

3.1.7 Sharding: splitting databases and tables

Sharding: once the data volume reaches the tens of millions, consider a vertical split (splitting databases), a horizontal split (splitting tables), or both.

concepts:

  • Split tables only: a single table holds a large amount of data and its reads/writes are the bottleneck, while the database it lives in can still sustain growth for the next few years.
  • Split databases only: the whole database is the read/write bottleneck, so the library itself is split apart.
  • Split both databases and tables: a single table holds a large amount of data and the database itself also hits a performance bottleneck, so both are split.
  • Split vertically: separate the columns. For example, the pic column of the spu table is very long, so it is better to move pic into another table (in the same or a different database).
  • Split horizontally: separate the rows. For example, if a table reaches one million rows, split it into four tables of 250,000 rows each.

Splitting principles:

  • Data volume in the tens of millions and relatively stable (state table): if you can avoid splitting, don't split; read demand can be handled by horizontal scaling.
  • Data volume in the tens of millions and possibly reaching billions or more (log/flow table): split by business and design for distributed storage.
  • Data volume in the tens of millions and possibly reaching billions or more (log/flow table): design the storage that serves statistics requirements for distributed scaling.
  • Data volume in the tens of millions when there should not be that much data (configuration table): keep it small and simple; avoid one big unified table.

Steps for sharding:

  1. MySQL tuning first: if the data volume can stay stable in the tens of millions and will not reach 100 million in the next few years, there is no need to rush to split; try MySQL tuning first to improve read and write performance.

  2. Target evaluation: estimate how many databases and tables to split into. For example, there are 2 billion rows now and an estimated 10 billion in 5 years: how many tables and how many databases? A reasonable answer is 16 databases with 1024 tables in total: each table starts at about 2 million rows (1024 tables × 2 million ≈ 2 billion) and grows to about 10 million rows in 5 years (1024 tables × 10 million ≈ 10 billion).

  3. Table split:

    • Business layer split: separate mixed business into independent business tables, and separate hot and cold data

    • Data layer split:

      • Splitting by date: This method is more common, especially splitting by date dimension. In fact, the changes at the program level are small, but the benefits in terms of scalability are great.

        • Day dimension split, such as test_20191021
        • Month dimension split, such as test_201910
        • Year dimension split, such as test_2019
      • Split by primary key range: for example, primary keys in [1, 2 million] go to one table and [2 million, 4 million] to another. The advantage is that the size of each table is controllable; the disadvantage is that traffic is not spread out, since writes concentrate on the last table.

      • Intermediate mapping table: split however you like and keep a mapping table that records which shard each key's data lives in. The advantage is flexibility; the disadvantage is that the extra mapping table complicates every query.

      • Hash segmentation: sharding_key % N (a routing sketch follows this list). The advantage is that data is spread evenly and traffic is distributed; the disadvantages are that scaling out requires data migration and that cross-node queries become a problem.

      • Split with MySQL partitioning: hash, range, etc. Not recommended, because the data is still hard to scale horizontally.

  4. Sharding_key (sharding table field) selection: Try to select the field with the highest query frequency, and then select the field according to the table splitting method.

  5. Code transformation: Modify the query and update statements in the code to adapt to the situation after the database is divided into tables.

  6. Data migration: the simplest approach is migration with downtime; migrating without downtime is more complicated, since both full and incremental synchronization must be handled.

    1. Full synchronization: migrate the existing data from the old database to the new one; the migration rate must be controlled and consistency with incremental data must be solved.

      1. Scheduled task: a scheduled job reads the old database and writes into the new one
      2. Middleware: use a migration middleware to copy the data
    2. Incremental synchronization: while the migration is running, inserts, updates and deletes must not be lost or applied to the wrong database

      1. Synchronous double write: write to the new and the old database synchronously;
      2. Asynchronous double write (recommended): write to the old database, subscribe to its binlog and replay it asynchronously into the new database
      3. Middleware synchronization tool: synchronize data to the target tables according to configured rules
  7. Data consistency verification and compensation: assuming the asynchronous double-write scheme, after the migration completes, compare the new and old databases record by record; matching records are skipped, mismatches are compensated:

    1. Record exists in the new database but not in the old one: delete it from the new database
    2. Record exists in the old database but not in the new one: insert it into the new database
    3. Record exists in both: compare all fields and, if they differ, overwrite the new database with the old database's values
  8. Gray-scale read switching: a gray release sits between black (the old version) and white (the new version): let some users keep using the old version while others start using the new one, and if the new version holds up, gradually migrate all users over for a smooth release. Principles:

    1. If a problem appears, switch back to the old database promptly
    2. Ramp up the gray traffic slowly at first and faster later, observing for a while after each step
    3. Support flexible rules: gray-release by store dimension, or by percentage / permillage of traffic
  9. Stop the old and use the new: offline the old library, read and write with the new library.
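
A minimal sketch of the hash routing mentioned in the splitting methods above, assuming 16 tables t_order_0 … t_order_15 sharded by user_id % 16 (the table names and sharding key are hypothetical; in practice the application or a sharding middleware computes the route):

SELECT 123457 % 16;                               # => 1, so this user's orders live in t_order_1
SELECT * FROM t_order_1 WHERE user_id = 123457;   # the application rewrites the table name accordingly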

3.2 Table design optimization

3.2.1 Split mixed-business tables; separate hot and cold data

For example, a large task table is separated into a task table and a historical task table. After the tasks in the task table are completed, they are moved to the historical task table. The task table is hot data, and the historical task table is cold data, which improves query performance.
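
A minimal sketch of the archiving step, assuming task and task_history tables with identical structure and a status column (all names hypothetical):

# run inside one transaction, or in small batches by id range, so the move stays consistent
INSERT INTO task_history SELECT * FROM task WHERE status = 'FINISHED';
DELETE FROM task WHERE status = 'FINISHED';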

3.2.2 Replace join queries with an intermediate relation table

For example, instead of joining the attribute table and the attribute-group table directly, use an "attribute – attribute-group relation table" that stores each attribute id together with the id of the group it belongs to.
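
A sketch of such a relation table, with names modeled loosely on the mall example (adjust to the actual schema):

CREATE TABLE attr_attrgroup_relation (
  id            BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  attr_id       BIGINT UNSIGNED NOT NULL,
  attr_group_id BIGINT UNSIGNED NOT NULL,
  KEY idx_attr_id (attr_id),
  KEY idx_attr_group_id (attr_group_id)
);
# find an attribute's group without joining attr and attr_group directly
SELECT attr_group_id FROM attr_attrgroup_relation WHERE attr_id = 1;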

3.2.3 Follow the three normal forms

First normal form: every column is atomic and cannot be subdivided; second normal form: the table has exactly one primary key and every non-key column fully depends on the whole primary key; third normal form: non-key columns depend on the primary key directly, not transitively through other non-key columns.

3.2.4 Prefer NOT NULL constraints on fields

① NULL values can cause null-pointer problems in the application that reads the query result;

② Aggregate functions become inaccurate because they ignore NULL;

③ "=" can never match NULL; only IS NULL can;

④ Any operation between NULL and another value yields NULL, which is easy to mistake for 0;

⑤ A NULL value takes more space than an empty string: the empty string has length 0, while NULL needs 1 bit in the row's NULL bitmap;

⑥ Without a covering index, IS NOT NULL cannot use the index
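
The points above can be reproduced directly in the client (standard MySQL behavior; t and col are placeholders):

SELECT NULL = NULL;                    # NULL, not 1 — "=" never matches NULL, only IS NULL does
SELECT 1 + NULL;                       # NULL — any arithmetic with NULL yields NULL, easy to mistake for 0
SELECT COUNT(col), COUNT(*) FROM t;    # COUNT(col) skips NULL rows, COUNT(*) does not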

3.2.5 Using redundant fields

Although a table should not have too many columns, redundant fields can be added to avoid joins and improve query efficiency

3.2.6 Data type optimization

 Integer type:

Consider the value range; in the early stage, INT is a safe default. Non-negative columns should be UNSIGNED, since the same number of bytes then covers a larger range. Primary keys generally use BIGINT; boolean flags use TINYINT.

Don't use the text type when you can use integers:

Large integers tend to take up less storage space than textual data.

Avoid using TEXT, BLOB data classes:

These two large data types cannot be sorted with in-memory temporary tables; only on-disk temporary tables can be used, which is very slow. It is recommended to avoid them, or to split them off into a separate extension table. A LONGBLOB column can store up to 4 GB.

Avoid using enumerated types:

Sorting is slow.

Use TIMESTAMP to store time:

TIMESTAMP uses 4 bytes and DATETIME uses 8; TIMESTAMP supports automatic initialization and automatic update on modification. Its drawback is that it can only represent times up to 2038; if a larger range is needed, the time can instead be stored as DATETIME or as a BIGINT timestamp.

DECIMAL stores floating point numbers:

The DECIMAL type stores exact fixed-point numbers and does not lose precision in calculations, which matters especially for financial data. The space it occupies is determined by the declared precision: every 9 digits take 4 bytes, and the decimal point takes one byte. It can also store integers larger than BIGINT allows. 
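
A minimal sketch pulling these suggestions together (the table and columns are hypothetical):

CREATE TABLE account (
  id         BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,         # bigint primary key
  is_active  TINYINT UNSIGNED NOT NULL DEFAULT 1,                # boolean stored as tinyint
  balance    DECIMAL(16,2) NOT NULL DEFAULT 0,                   # exact type for money
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP        # auto-assigned time
);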

3.3 Index optimization

3.3.1 11 scenarios where indexes fail

For details, please refer to:

MySQL Advanced - 11 Cases of Index Failure - Programmer Sought

 Try to match all values:

When querying age and classId and name, the (age,classId,name) index is faster than (age,classId).

Consider the leftmost prefix:

Put the most frequently queried columns on the left of a joint index. An index on (a, b, c) can only serve lookups on (a, b, c), (a, b), and (a).

Primary keys should be as ordered as possible:

If primary keys arrive out of order, the target position must be found before inserting, and if the data page at that position is already full it has to be split, which hurts performance. Use an auto-increment key, or the ordered UUID strategy available in MySQL 8.0.

Calculations and functions lead to index failure:

Calculations such as where num+1=2, functions such as abs(num) take the absolute value
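
For example, on the student table used throughout this article (the rewrite keeps the column untouched so the index stays usable):

EXPLAIN SELECT * FROM student WHERE id + 1 = 2;       # calculation on the indexed column: full scan
EXPLAIN SELECT * FROM student WHERE id = 2 - 1;       # same condition, column untouched: uses the primary key
EXPLAIN SELECT * FROM student WHERE ABS(age) = 30;    # function on the column: an index on age cannot be used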

Type conversion invalidates the index:

For example name=123 instead of name='123'. Another example is the use of different character sets.

The column index on the right side of the range condition is invalid:

For example (a, b, c) joint index, query conditions a, b, c, if b uses a range query, then the c index on the right of b is invalid. It is recommended to put the fields that require range query at the end. Ranges include: (<) (<=) (>) (>=) and between.

When there is no covering index, "not equal to" invalidates the index:

Because "not equal to" cannot be accurately matched, the efficiency of full table scanning the secondary index tree and then returning to the table is not as good as direct full table scanning of the clustered index tree. However, when using a covering index, the data volume of the joint index is small, and the space required to load it into the memory is smaller than that of the clustered index tree, and it does not need to be returned to the table. The indexing efficiency is better than that of the full table scan clustered index tree.

Covering index: An index that contains data that satisfies the query results is called a covering index, and does not require operations such as returning to the table.

When the index is not covered, the left fuzzy query causes the index to fail:

For example LIKE '%abc'. Because the beginning of the string cannot be matched exactly. Same reason as above.

When the index is not covered, is not null, not like cannot use the index:

Because it cannot be matched exactly. Same reason as above.

There are non-indexed columns before and after "OR", causing the index to fail:

In MySQL, even if the condition on the left of or is satisfied, the condition on the right still needs to be judged.

Indexing fails with different charsets:

It is recommended to use utf8mb4. Different character sets need to be converted before comparison, which will cause the index to fail. 

3.3.2 Follow the index design principles

 For details, please refer to:

MySQL Advanced - Index Creation and Design Principles - Vincewm's Blog - CSDN Blog

  1.  Naming: The number of index fields should not exceed 5, and the naming format is "idx_col1_col2"
  2. Build indexes on frequently queried columns (especially grouping, range, sorting queries);
  3. For frequently updated tables, do not create too many indexes
  4. Fields with unique characteristics are suitable for creating indexes;
  5. Very long varchar fields, suitable for creating prefix indexes based on discrimination and length;
  6. When multiple fields need to be indexed, the joint index is better than the single-value index;
  7. Avoid creating too many indexes and avoid index failure;
  8. Try to use an ordered field as the primary key: an out-of-order key may have to be inserted into an already full data page, forcing a page split after the insert and hurting performance; 

3.3.3 Join query optimization

For details, please refer to:

MySQL Advanced - 11 Cases of Index Failure - Programmer Sought

In outer joins, prioritize indexing the join field of the driven table:

In an outer join (e.g. LEFT JOIN), the right table is the driven table, and it should get the index: the left table is read in full anyway, while the right table is probed through the join condition, so an index on the right table's join column pays off more.

In inner joins, the optimizer makes the table without a usable index the driving one, and lets the small table drive the large one:

The optimizer prefers to use a table that has an index on the join column as the driven table. When neither table has such an index, it lets the small table drive the large one. Creating an index on the driven table's join column greatly improves query efficiency.

The join columns of the two tables must have the same type:

The data types (and character sets) of the join columns must match exactly, otherwise implicit type conversion invalidates the index.
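
A small sketch with the student and class tables used elsewhere in this article (the index name is illustrative):

# LEFT JOIN: student (left) is the driving table and is read in full;
# class (right) is the driven table and is probed per row, so index its join column
CREATE INDEX idx_monitor ON class(monitor);
EXPLAIN SELECT * FROM student s LEFT JOIN class c ON s.stuno = c.monitor;
# expected: the row for c shows type = ref and key = idx_monitor;
# s.stuno and c.monitor must have the same type and character set, or the index is skipped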

3.3.4 Subquery optimization

For details, please refer to:

MySQL Advanced - 11 Cases of Index Failure - Programmer Sought

Use joins instead of subqueries: join the tables directly where possible rather than nesting subqueries (this reduces the number of queries executed). A subquery uses the result of one SELECT as a condition of another SELECT.

# Get all students who are not class monitors
SELECT a.* FROM student a WHERE a.stuno NOT IN (
    SELECT monitor FROM class b
    WHERE monitor IS NOT NULL
);
# Optimized into a join query
SELECT a.* FROM student a LEFT OUTER JOIN class b 
ON a.stuno = b.monitor WHERE b.monitor IS NULL;

Multiple queries instead of subqueries: subqueries are not recommended; either split the SQL into several simple queries combined in application code, or use JOIN instead of the subquery.

3.3.5 Sorting optimization

For details, please refer to:

MySQL advanced articles - sorting, grouping, paging optimization - Programmer Sought

  • The optimizer automatically selects the sorting method: MySQL supports index sorting and FileSort sorting. The index ensures the order of records and has high performance. It is recommended to use. FileSort sorting is in-memory sorting. When the amount of data is large, temporary files are generated and sorted on the disk. The efficiency is low and it takes up a lot of CPU. It's not that FileSort is necessarily inefficient, it may be efficient in some cases. For example, in the case of left fuzzy, "not equal to", not null and other index failures that do not cover the index, the efficiency of full table scanning is higher than that of non-clustered index tree traversal and table return.
  • To meet the leftmost prefix: where conditions and order by fields create a joint index, the order needs to meet the leftmost prefix. For example, index (a,b,c), query where a=1 order by b,c.
  • Either all ascending or all descending: The sort order must be either all DESC or all ASC. Out of order will cause the index to fail.
  • When the number of rows to sort is large, the index, although still usable, is less efficient than filesort: above roughly 10,000 rows to sort, the optimizer stops sorting via the index. It is recommended to reduce the data volume with LIMIT and WHERE filters. With a large result set, sorting via the index means a table lookup for every row afterwards, which performs very poorly and is slower than sorting in memory with FileSort. Using LIMIT does not guarantee index sorting either; what matters is the amount of data, and when it is too large the optimizer falls back to FileSort.
  • The sorting index on the right side of the range query is invalid: for example, index (a,b,c), query where a>1 order by b,c, resulting in b,c sorting cannot use the index, and filesort is required.
  • When a range condition filters out a lot of data, put the index on the range field first: when both a [range condition] field and a [GROUP BY or ORDER BY] field are candidates for the index, and the range condition filters a lot while little data remains to be sorted, prefer indexing the range field. Even though the range query then prevents the sort from using the index, this is still faster than indexing only the sort field. If the range condition filters only a little, index the sort field first instead.
  • Tuning FileSort: When Index sorting cannot be used, the FileSort method needs to be tuned. For example, increase sort_buffer_size (sort buffer size) and max_length_for_sort_data (maximum length of sorted data)
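
For example, with a joint index on (age, classid, NAME) on the student table (assuming those columns exist, as in the index-failure examples):

CREATE INDEX idx_age_classid_name ON student(age, classid, NAME);
EXPLAIN SELECT * FROM student WHERE age = 30 ORDER BY classid, NAME LIMIT 100;
# leftmost prefix satisfied: the index provides the order, no "Using filesort"
EXPLAIN SELECT * FROM student WHERE age > 30 ORDER BY classid LIMIT 100;
# range on age leaves classid unordered: expect "Using filesort"
EXPLAIN SELECT * FROM student WHERE age = 30 ORDER BY classid ASC, NAME DESC LIMIT 100;
# mixed ASC/DESC cannot use the index before MySQL 8.0 descending indexes: expect "Using filesort"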

3.3.6 GROUP BY optimization

Basically the same idea as sorting.

Sorting and grouping consume a lot of CPU; avoid them when you can.

WHERE is more efficient than HAVING: WHERE filters before grouping, while HAVING filters after grouping.
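
For example, on the same student table:

EXPLAIN SELECT classid, AVG(age) FROM student WHERE age > 20 GROUP BY classid;
# WHERE filters rows before grouping — preferred
EXPLAIN SELECT classid, AVG(age) FROM student GROUP BY classid HAVING AVG(age) > 20;
# HAVING filters after grouping — use it only when the condition needs the aggregate itself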

3.3.7 Deep paging query optimization

The requirement is to return the 2000000~2000010th record

For a table with an ordered primary key, sorted by the primary key: filter first, then take the page, i.e. query directly for the ids beyond the offset.

EXPLAIN SELECT * FROM student WHERE id > 2000000 LIMIT 10;

For a table with an unordered primary key, sorted by the primary key: page over the primary key first, then join back to the original table; the table is joined to a subquery that sorts and limits only the primary key, with the primary key as the join column. Because the inner query reads only the primary key from the clustered index tree, no per-row table lookups are needed while sorting and paging, so it is fast.

EXPLAIN SELECT * FROM student t,(SELECT id FROM student ORDER BY id LIMIT 2000000,10) a WHERE t.id = a.id;

For a table with an ordered (auto-increment) primary key, sorted by a non-primary-key column: take the last record x of the previous page; then every record on the target page has an id smaller than x.id (because the order is descending and the effective sort key is (age, id) with an auto-increment id) and an age less than or equal to x.age.

EXPLAIN SELECT * FROM student WHERE id<#{x.id} AND age<=#{x.age} ORDER BY age DESC LIMIT 10;

3.3.8 Use covering indexes where possible

For details, please refer to:

MySQL Advanced - Covering Index, Prefix Index, Index Pushdown, SQL Optimization, Primary Key Design_vincewm's Blog-CSDN Blog

A covering index contains all the data needed to produce the query result. Because no table lookup is required, such queries are efficient, and "left fuzzy" and "not equal to" conditions no longer invalidate the index when it covers the query.

Example:

# Without a covering index, a leading-wildcard (left fuzzy) query makes the index unusable
CREATE INDEX idx_age_name ON student(age, NAME);
EXPLAIN SELECT * FROM student WHERE NAME LIKE '%abc';

Covering index: An index that contains data that satisfies the query results is called a covering index, and does not require operations such as returning to the table.

Indexes are one way to find rows efficiently, but in general databases can also use indexes to find data for a column, so it doesn't have to read the entire row. After all, the index leaf nodes store the data they index; when the desired data can be obtained by reading the index, there is no need to read the row.

Covering index is a form of non-clustered index, which includes all columns used in the SELECT, JOIN and WHERE clauses in the query (that is, the indexed fields are exactly the fields involved in the covered query conditions). Simply put, the index column + primary key contains the columns queried between SELECT and FROM.

3.3.9 String prefix index

For example (email(6)), add an index to the string prefix instead of the entire string, and the length of the prefix should be chosen according to the degree of discrimination and the length.

Example:

MySQL supports prefix indexes. By default, if you create an index without specifying a prefix length, the index will contain the entire string.

mysql> alter table teacher add index index1(email);
# or
mysql> alter table teacher add index index2(email(6));

What is the difference between these two definitions in terms of data structure and storage? Schematically, index1 stores the whole email string in each entry, while index2 stores only its first 6 characters, so index2 entries are smaller but less selective.

If index1 is used (the index contains the entire string), the order of execution is as follows:

  1. Find the record that satisfies the index value of '[email protected]' from the index tree of index1, and obtain the value of ID2;
  2. Go back to the table to find the row whose primary key value is ID2 on the primary key, judge that the value of email is correct, and add this row record to the result set;
  3. Take the next record at the position just found on the index tree of index1, and find that the condition of email='[email protected]' is no longer satisfied, and the loop ends.

In this process, it is only necessary to retrieve data once from the primary key index, so the system considers that only one row has been scanned.

If index2 is used (the index contains the string prefix email(6)), the execution sequence is as follows:

  1. Find the record that satisfies the index value of 'zhangs' from the index2 index tree, and the first one found is ID1;
  2. Go back to the table and find out the row whose primary key value is ID1 on the primary key, and judge that the value of email is not ' [email protected] ', and discard the record in this row;
  3. Take the next record at the location just found on index2, and find that it is still 'zhangs', take out ID2, and then go back to the table to fetch the entire row on the ID index and then judge that the value is correct this time, and add this row to the result set ;
  4. Repeat the previous step until the value obtained on index2 is not 'zhangs' , the loop ends.

That is to say, using the prefix index and defining the length can save space without adding too much extra query cost. The degree of discrimination has been mentioned before, and the higher the degree of discrimination, the better . Because the higher the degree of discrimination, the fewer duplicate key values.
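
A common way to pick the prefix length is to compare the selectivity of a few candidate lengths against the full column; a prefix whose ratio is close to the full column's is usually good enough:

SELECT COUNT(DISTINCT email) / COUNT(*)           AS full_col,
       COUNT(DISTINCT LEFT(email, 4)) / COUNT(*)  AS prefix4,
       COUNT(DISTINCT LEFT(email, 6)) / COUNT(*)  AS prefix6,
       COUNT(DISTINCT LEFT(email, 8)) / COUNT(*)  AS prefix8
FROM teacher;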

3.3.10 Use index condition pushdown (supported since MySQL 5.6)

When a field in the joint index is a fuzzy query (non-left fuzzy), after the field is judged, the next few fields can be judged directly. After the judgment is filtered, return to the table to judge the conditions of the fields that are not included in the joint index.

For example, with an index on (name, age) and a query on name LIKE 'z%' AND age = ? AND address = ?, the fuzzy match leaves the age values unordered within the matching index range.

While scanning the joint index tree, MySQL not only checks name but also evaluates the age condition directly in the index; only the filtered rows are looked up in the table to check address. If index pushdown is disabled, the fields that follow the (non-leading-wildcard) fuzzy field in the joint index cannot be checked inside the index tree and must be checked after the table lookup.

Detailed explanation:

Index Condition Pushdown (ICP, Index Condition Pushdown) is a new feature in MySQL 5.6. It is an optimized way to use indexes to filter data at the storage engine layer.

  • Without ICP: when a field of the joint index is a (non-leading-wildcard) fuzzy match, the fields after it cannot be checked directly in the index; they can only be checked after the table lookup.
  • With ICP enabled: when a field of the joint index is a (non-leading-wildcard) fuzzy match, the fields after it can still be checked directly in the index; only after this filtering does MySQL go back to the table to check the conditions on columns that are not part of the joint index. The main gain is filtering before the table lookup, which reduces the number of lookups. Main application: a (non-leading-wildcard) fuzzy match leaves the following index fields unordered, so without ICP they could only be checked after the table lookup, whereas with ICP they are checked inside the joint index tree and no extra lookups are needed.

If there is no ICP , the storage engine will traverse the index to locate the rows in the base table, and return them to the MySQL server, and the MySQL server will evaluate whether the conditions behind WHERE are reserved.
After ICP is enabled , if part of the WHERE condition can be filtered using only the columns in the index, the MySQL server will put this part of the WHERE condition into the storage engine filter. The storage engine then filters the data by using the index entries and reads rows from the table only if this condition is met.

Benefits: ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine. However, the acceleration effect of ICP depends on the proportion of data filtered by ICP in the storage engine. 

Example:

Without index pushdown: with an index on (name, age) and a query on name LIKE 'z%' AND age = ?, the fuzzy match leaves age unordered, so while scanning the joint index only name is checked; age can only be checked after the table lookup.

With index pushdown: for the same index and a query on name LIKE 'z%' AND age AND address, the scan of the joint index checks name and also filters on age, and only then looks up the table to check address.

CREATE INDEX idx_name_age ON student(name,age);
# Index fails: without a covering index, a leading-wildcard fuzzy match invalidates the index
EXPLAIN SELECT * FROM student WHERE name like '%bc%' AND age=30;
# Index works: with index condition pushdown (introduced in MySQL 5.6), name and age in the WHERE clause are both in the joint index, so they are filtered inside the index without a table lookup
EXPLAIN SELECT * FROM student WHERE `name` like 'bc%' AND age=30;
# Index works: name uses the index, age is filtered via index pushdown, and classid is not in the joint index, so a table lookup is still needed.
EXPLAIN SELECT * FROM student WHERE `name` like 'bc%' AND age=30 AND classid=2;

Benefits:  In some scenarios, ICP can greatly reduce the number of table returns and improve performance. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine. However, the acceleration effect of ICP depends on the proportion of data filtered by ICP in the storage engine .
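
ICP is enabled by default since MySQL 5.6; for comparison (for example, to reproduce the EXPLAIN output above with and without it), it can be toggled through the optimizer_switch system variable:

SET optimizer_switch = 'index_condition_pushdown=off';   # "Using index condition" disappears from Extra
SET optimizer_switch = 'index_condition_pushdown=on';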

3.3.11 Prefer ordinary (non-unique) indexes for write-heavy, read-light workloads

For queries, ordinary and unique indexes perform almost identically; for updates, the ordinary index is more efficient, because the change buffer (write cache) can buffer the change to a data page in memory, and the merge operation that applies it runs on the next access to the page or periodically in the background, after which the data pages are written to disk.

The change buffer will be written to the redo log when the transaction is committed to ensure data persistence. Ordinary index: without any restrictions, such as create index idx_name on student(name). Unique index: The UNIQUE parameter restricts the index to be unique, such as create UNIQUE index idx_name on student(name).

Detailed explanation: 

Write cache (change buffer):

When a data page needs to be updated, it is updated directly if it is already in memory; if it is not, InnoDB caches the update operation in the change buffer, without affecting data consistency, so the page does not have to be read from disk first. The next time a query needs this data page, it is read into memory and the operations recorded for it in the change buffer are applied, which keeps the data logically correct.

merge: applying the operations in the change buffer to the original data page to obtain the up-to-date page is called merge. Besides being triggered when the data page is accessed, merge is also performed periodically by a background thread and during a normal database shutdown.

If the update operation can be recorded in the change buffer first to reduce disk reads , the execution speed of the statement will be significantly improved. Moreover, reading data into memory requires the buffer pool, so this method can also avoid occupying memory and improve memory utilization.

The update of the unique index cannot use the change buffer , in fact, only ordinary indexes can be used.

Make a distinction:

  • Reads go through the buffer pool;
  • The redo log has a redo log buffer , which is to write the updated data in the buffer pool into the redo log buffer. When the transaction is committed, the redo log buffer is flushed to the redo log file or page cache according to the flushing strategy.
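
The two kinds of index from this section, plus the InnoDB variable that caps how much of the buffer pool the change buffer may use (the value set below is only illustrative):

CREATE INDEX idx_name ON student(NAME);                 # ordinary index: updates can be buffered in the change buffer
CREATE UNIQUE INDEX uk_stuno ON student(stuno);         # unique index: the page must be read to check uniqueness, so no change buffer
show variables like 'innodb_change_buffer_max_size';    # percentage of the buffer pool, default 25
set global innodb_change_buffer_max_size = 30;          # illustrative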

3.4 SQL optimization

For details, please refer to:

MySQL Advanced - Covering Index, Prefix Index, Index Pushdown, SQL Optimization, Primary Key Design_vincewm's Blog-CSDN Blog

 Reasonable choice of EXISTS and IN:

Follow the principle of letting the small table drive the large one: when the outer (left) table is small, use EXISTS; when the outer table is large and the subquery table is small, use IN.

Try COUNT(1) or COUNT(*):

With InnoDB, for COUNT(1) and COUNT(*) the query optimizer prefers to count over the secondary index tree that takes the least space; the clustered index tree is used only when no secondary index is available, because it occupies much more space. You could also write COUNT(column of the smallest secondary index), but it is simpler to let the optimizer choose automatically. With MyISAM it does not matter: the total row count is stored, so the complexity is O(1).

Try to SELECT (explicit fields):

It is recommended to specify the fields. The query optimizer takes time to parse the "*" symbol for all column names, and the "*" symbol cannot use the covering index.

Try to use "LIMIT" when scanning the whole table:

When a query has to scan the whole table and you know how many rows the result set should contain, add a LIMIT so the scan stops once enough rows have been found instead of reading the whole table. If an index is used, the LIMIT is not needed for this purpose.

Prefer LIMIT N; use LIMIT M, N sparingly:

Especially when the table, or the offset M, is large.

Split a long transaction into multiple small transactions:

Commit as early as reasonable, prefer programmatic transactions over declarative ones, and keep transaction granularity small. Committing a transaction releases: the rollback segment information used to undo data, the locks it holds, and space in the redo/undo log buffers.

Check first and then delete:

UPDATE and DELETE statements must have an explicit WHERE condition; run the corresponding SELECT first to confirm which rows will be affected.

Try UNION ALL instead of UNION:

UNION ALL does not deduplicate and is faster.
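
Two of the rewrites above, sketched with illustrative table names (small_t is assumed to be much smaller than big_t; t_2023 and t_2024 are arbitrary):

# outer table small → EXISTS; outer table large and subquery table small → IN
SELECT * FROM small_t s WHERE EXISTS (SELECT 1 FROM big_t b WHERE b.small_id = s.id);
SELECT * FROM big_t b WHERE b.small_id IN (SELECT id FROM small_t);
# UNION ALL skips the deduplication pass that UNION performs, so it is faster when duplicates are impossible or acceptable
SELECT id FROM t_2023 UNION ALL SELECT id FROM t_2024;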

Origin blog.csdn.net/qq_40991313/article/details/131059110