MySQL database optimization steps

Table of contents

1. From what aspects can database tuning be performed?

2. Optimization steps

1. View system performance parameters

2. Compare page overhead

3. Locate the SQL statement that executes slowly: query the slow log

4. View SQL execution cost: Show Profile

5. Analyze the query statement: EXPLAIN

 3. Index optimization and query optimization

1. In case of index failure

2. Correlation query optimization

3. The underlying principle of the JOIN statement

4. Subquery optimization

5. Sorting optimization

6. GROUP BY group optimization, LIMIT page optimization

7. Covering index

8. Index condition push down (ICP)

9. Other query optimization strategies

4. Other tuning strategies for the database

1. The goal of tuning

2. How to locate the tuning problem

3. Dimensions and steps of tuning

4. Optimize the MySQL server

5. Optimize the database structure

6. Large table optimization

7. Other tuning operations


1. From what aspects can database tuning be performed?

  1. Indexes are ineffective or not fully used: create or adjust indexes
  2. Too many JOINs in join queries (a design flaw or an unavoidable requirement): optimize the SQL
  3. Server tuning and parameter settings (buffers, number of threads, etc.): adjust my.cnf
  4. Too much data: split databases and tables (sharding)

Although there are many SQL query optimization techniques, they fall broadly into physical query optimization and logical query optimization.

  1. Physical query optimization works through techniques such as indexes and table join methods. The key point here is learning how to use indexes.
  2. Logical query optimization improves query efficiency through equivalent SQL transformations. Put simply, writing the same query another way may be more efficient.

2. Optimization steps

1. View system performance parameters

  • Connections: The number of connection attempts to the MySQL server.
  • Uptime: How long the MySQL server has been running, in seconds.
  • Slow_queries: The number of slow queries.
  • Innodb_rows_read: The number of rows read from InnoDB tables.
  • Innodb_rows_inserted: The number of rows inserted by INSERT operations.
  • Innodb_rows_updated: The number of rows updated by UPDATE operations.
  • Innodb_rows_deleted: The number of rows deleted by DELETE operations.
  • Com_select: The number of query operations.
  • Com_insert: The number of insert operations. A batched INSERT is counted only once.
  • Com_update: The number of update operations.
  • Com_delete: The number of delete operations.
# Replace parameter_name with one of the names above
SHOW STATUS LIKE 'parameter_name';
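
As a rough illustration, the counters above can be read one at a time or in groups. The sketch uses only the status names listed above; the GLOBAL scope is an assumption that server-wide totals are wanted rather than the current session's values.

```sql
-- Server-wide totals since the server started
SHOW GLOBAL STATUS LIKE 'Connections';
SHOW GLOBAL STATUS LIKE 'Uptime';
SHOW GLOBAL STATUS LIKE 'Slow_queries';

-- A LIKE pattern matches several counters at once
SHOW GLOBAL STATUS LIKE 'Innodb_rows_%';

-- A WHERE clause selects an explicit set of counters
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Com_select', 'Com_insert', 'Com_update', 'Com_delete');
```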

2. Compare page overhead

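  • Last_query_cost: The optimizer's estimated cost of the most recently compiled query. It is used here to compare page overhead, since the cost reflects how many pages the query has to read.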
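
A minimal sketch of such a comparison, assuming a table tb1 with an integer primary key id (the id values are arbitrary examples). Last_query_cost is a session-level value, so it has to be read in the same session, right after the statement it describes.

```sql
-- Look up a single row by primary key, then read the cost of that statement
SELECT * FROM tb1 WHERE id = 900001;
SHOW STATUS LIKE 'Last_query_cost';

-- Look up a range of rows, then read the cost again and compare the two values
SELECT * FROM tb1 WHERE id BETWEEN 900001 AND 900100;
SHOW STATUS LIKE 'Last_query_cost';
```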

3. Locate the SQL statement that executes slowly: query the slow log

  • long_query_time: A SQL statement whose execution time exceeds this value (in seconds) is treated as a slow query.

3.1 Turn on the slow query log (it is off by default)

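# Temporary change
# Turn on the slow query log (a global variable, so GLOBAL is required)
SET GLOBAL slow_query_log = ON;

# Change the slow query threshold, in seconds (1 is only an example value)
SET GLOBAL long_query_time = 1;
SET long_query_time = 1;


# Permanent change
# Edit the my.cnf configuration file, set the parameters under [mysqld], then restart the server.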
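
Once the settings in 3.1 are in place, a quick check along these lines can confirm that they took effect. This is only a sketch: SELECT SLEEP(2) is a hypothetical stand-in for a slow statement, and it counts as slow only if the session threshold is below 2 seconds.

```sql
-- Confirm the log is on and see where the log file is written
SHOW VARIABLES LIKE 'slow_query_log%';

-- Threshold of the current session (a SET GLOBAL change only affects new connections)
SHOW VARIABLES LIKE 'long_query_time';

-- Hypothetical slow statement: sleeps for 2 seconds
SELECT SLEEP(2);

-- Read the slow query counter before and after to see whether it changed
SHOW GLOBAL STATUS LIKE 'Slow_queries';
```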

3.2 Analyzing slow query statements

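# See how many slow queries have been recorded (Slow_queries is a status variable)
SHOW GLOBAL STATUS LIKE 'Slow_queries';
# Use mysqldumpslow to inspect the slow query log, sorted by query time
mysqldumpslow -a -s t /var/lib/mysql/table-slow.log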

4. View SQL execution cost: Show Profile

# Turn on the SHOW PROFILE feature
SET profiling = 'ON';

# List the most recently executed statements
SHOW PROFILES;

# Show the profile of one statement
SHOW PROFILE FOR QUERY 1;
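
SHOW PROFILE can also be limited to particular resource types. A short sketch, assuming the table tb1 used elsewhere in this article; the query number 1 is a placeholder and should be replaced with the Query_ID reported by SHOW PROFILES.

```sql
SET profiling = 'ON';

-- Any statement to be profiled
SELECT COUNT(*) FROM tb1;

-- Find the Query_ID of that statement
SHOW PROFILES;

-- Show only CPU and block I/O figures for that statement
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;
```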

5. Analyze the query statement: EXPLAIN

1. Basic syntax

EXPLAIN SELECT * FROM table1;

DESCRIBE SELECT * FROM table1;

2. Meaning of each column in the EXPLAIN output

  • id: The unique id of each SELECT in the statement
  • select_type: The query type
  • table: The table name
  • partitions: The matching partitions
  • type: The access method for a single table
  • possible_keys: The indexes that may be used
  • key: The index actually used
  • key_len: The length of the index actually used
  • ref: When an index column is matched by equality, the object (column or constant) it is compared with
  • rows: The estimated number of records to be read
  • filtered: The estimated percentage of records that remain after the table is filtered by the search conditions
  • Extra: Additional information

3. Four output formats of EXPLAIN

  1. Traditional format
  2. JSON format
  3. TREE format
  4. MySQL Workbench visual output
# The format name is TRADITIONAL, JSON, or TREE
EXPLAIN FORMAT=JSON SELECT * FROM table1;

Detailed explanation of id

  1. Rows with the same id belong to one group and are executed in order from top to bottom
  2. Across groups, the larger the id, the higher the priority and the earlier it is executed
  3. Each distinct id represents an independent query; the fewer the queries, the better

Detailed explanation of select_type

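  • SIMPLE: A simple SELECT that uses no UNION or subquery
  • PRIMARY: The outermost SELECT
  • UNION: The second or a later SELECT in a UNION
  • UNION RESULT: The result of a UNION
  • SUBQUERY: The first SELECT in a subquery
  • DEPENDENT SUBQUERY: The first SELECT in a subquery that depends on the outer query
  • DEPENDENT UNION: The second or a later SELECT in a UNION that depends on the outer query
  • MATERIALIZED: A materialized subquery
  • UNCACHEABLE SUBQUERY: A subquery whose result cannot be cached and must be re-evaluated for each row of the outer query
  • UNCACHEABLE UNION: The second or a later SELECT in a UNION that belongs to an uncacheable subquery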

Detailed explanation of type

  1. From best to worst, the possible values are: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL
  2. Some of these matter more than others. The SQL performance goal: reach at least the range level, aim for the ref level, and ideally reach the const level. (Required by the Alibaba Development Manual)

key_len detailed explanation

  1. Mainly useful for joint (multi-column) indexes
  2. For the same index, a larger key_len means more of the index's columns are being used, which is better.
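
A small sketch of how key_len can be compared, assuming a hypothetical table key_len_demo with a joint index on two NOT NULL INT columns. An INT takes 4 bytes in an index, so key_len would be expected to grow when the second column is also matched; the actual plan depends on the server version and the data.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE key_len_demo (
  id INT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  VARCHAR(20),
  KEY idx_a_b (a, b)
);

-- Only the first column of idx_a_b can be matched
EXPLAIN SELECT * FROM key_len_demo WHERE a = 1;

-- Both columns of idx_a_b can be matched; compare key_len with the plan above
EXPLAIN SELECT * FROM key_len_demo WHERE a = 1 AND b = 2;
```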

Extra details

        Omitted.


 3. Index optimization and query optimization

1. In case of index failure

  1. Performing a calculation on the indexed column
  2. Applying a function to the indexed column
  3. LIKE with a leading wildcard (%XXX, a left fuzzy match). MySQL indexes match from the left, so a right fuzzy match (XXX%) can use the index, but a left fuzzy match violates the leftmost principle and cannot
  4. Negation and range conditions: NOT IN, != and <> often cannot use the index, and in a joint index the columns to the right of a range condition (>, <) cannot use it
  5. The queried field is not the leftmost field of a joint index, again because of the leftmost principle
  6. The field type does not match, most often through an implicit type conversion. If mobile is a string column, mobile=1356 still returns results because MySQL converts the values for the comparison, but it cannot use the index; mobile='1356' can
  7. In an OR condition, the left side is an indexed field and the right side is not. The index is not used, because OR takes the union of both sides and the unindexed side still requires a full scan
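
Several of the cases above can be checked with EXPLAIN by comparing the type and key columns of each pair of plans. The table student_demo and its indexes are hypothetical, and the optimizer's choice also depends on the data, so this is a sketch of what to try rather than a guaranteed result.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE student_demo (
  id     INT PRIMARY KEY AUTO_INCREMENT,
  name   VARCHAR(20),
  mobile VARCHAR(20),
  age    INT,
  KEY idx_name (name),
  KEY idx_mobile (mobile),
  KEY idx_age (age)
);

-- Case 1: calculation on the indexed column vs. the bare column
EXPLAIN SELECT * FROM student_demo WHERE age + 1 = 21;
EXPLAIN SELECT * FROM student_demo WHERE age = 20;

-- Case 2: function on the indexed column vs. an equivalent right fuzzy match
EXPLAIN SELECT * FROM student_demo WHERE LEFT(name, 3) = 'abc';
EXPLAIN SELECT * FROM student_demo WHERE name LIKE 'abc%';

-- Case 3: left fuzzy match
EXPLAIN SELECT * FROM student_demo WHERE name LIKE '%abc';

-- Case 6: implicit type conversion vs. a matching string literal
EXPLAIN SELECT * FROM student_demo WHERE mobile = 1356;
EXPLAIN SELECT * FROM student_demo WHERE mobile = '1356';
```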
     

General advice:

  1. For single-column indexes, try to choose the index with the best selectivity for the current query.
  2. When choosing a joint index, the more selective a field is in the current query, the earlier it should appear in the index column order.
  3. When choosing a joint index, try to choose one that covers as many of the fields in the current query's WHERE clause as possible.
  4. When choosing a joint index, if a field may be used in a range query, try to put it last in the index column order.
  5. In short, when writing SQL statements, try to avoid causing index failure.
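
The joint-index advice above might be applied as follows. The table orders_demo, its columns, and the assumption that user_id is more selective than status are all hypothetical.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE orders_demo (
  id         INT PRIMARY KEY AUTO_INCREMENT,
  user_id    INT NOT NULL,
  status     TINYINT NOT NULL,
  created_at DATETIME NOT NULL
);

-- Equality columns first (most selective one leading), range column last
CREATE INDEX idx_user_status_created ON orders_demo (user_id, status, created_at);

-- All three WHERE fields can be served by the one joint index
EXPLAIN SELECT * FROM orders_demo
WHERE user_id = 42 AND status = 1 AND created_at >= '2023-01-01';
```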

2. Correlation query optimization

  1. When using JOIN, add indexes to the driven table's join columns first.
  2. For inner joins, the query optimizer decides which table is the driving table and which is the driven table (usually the small table drives the large one).
  3. Where possible, use a multi-table join directly instead of a subquery (this reduces the number of queries).
  4. Subqueries are not recommended. Either split the subquery into separate statements and combine the results in the application, or replace the subquery with a JOIN.

3. The underlying principle of the JOIN statement

  1. Use small tables to drive large tables (in essence, this reduces the amount of data in the outer loop)
    -- Recommended form
    SELECT tb1.b, tb2.* FROM tb1 STRAIGHT_JOIN tb2 ON (tb1.b = tb2.b)
    WHERE tb2.id <= 100;

  2. Index the join columns of the driven table (reduces the number of matching loops over the inner table)
  3. Increase join_buffer_size (the more data cached at one time, the fewer scans of the inner table)
  4. Avoid selecting unnecessary fields from the driving table (the fewer the fields, the more rows fit in the join buffer)
  5. Use hash join (available from MySQL 8.0.18)
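
Points 2 and 3 above could be tried out like this, assuming the tb1 and tb2 tables from the recommended statement. The index name idx_b and the 1 MB buffer size are example choices, not recommendations.

```sql
-- Point 2: index the join column of the driven table
ALTER TABLE tb2 ADD INDEX idx_b (b);

-- Point 3: check the join buffer, then enlarge it for the current session only
SHOW VARIABLES LIKE 'join_buffer_size';
SET SESSION join_buffer_size = 1048576;  -- 1 MB, example value

-- Inspect the resulting plan
EXPLAIN SELECT tb1.b, tb2.*
FROM tb1 STRAIGHT_JOIN tb2 ON (tb1.b = tb2.b)
WHERE tb2.id <= 100;
```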

4. Subquery optimization

  1. Try to use JOIN instead of subquery
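
A sketch of such a rewrite, using two hypothetical tables. The two statements return the same rows only because monitor_demo.student_id is unique here; with duplicates, the JOIN form would repeat students.

```sql
-- Hypothetical tables, for illustration only
CREATE TABLE pupil_demo (
  id   INT PRIMARY KEY,
  name VARCHAR(20)
);
CREATE TABLE monitor_demo (
  student_id INT PRIMARY KEY
);

-- Subquery form
SELECT * FROM pupil_demo
WHERE id IN (SELECT student_id FROM monitor_demo);

-- Equivalent JOIN form
SELECT p.*
FROM pupil_demo p
JOIN monitor_demo m ON p.id = m.student_id;
```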

5. Sorting optimization

MySQL has two sorting methods: filesort and index sorting.

  1. With index sorting, the index already guarantees the order of the data, so no extra sort is needed. It is efficient and uses few resources.
  2. Filesort is generally performed in memory and uses a lot of CPU. If the result to be sorted is large, it spills to temporary files on disk, which is inefficient.

Optimization suggestions

  1. Indexes can serve both the WHERE clause and the ORDER BY clause: in WHERE to avoid a full table scan, and in ORDER BY to avoid filesort.
  2. Try to let an index complete the ORDER BY. If WHERE and ORDER BY use the same column, a single-column index is enough; if they use different columns, use a joint index.
  3. When an index cannot be used, tune the filesort:
    1. Increase sort_buffer_size
    2. Increase max_length_for_sort_data
    3. Do not use SELECT * with ORDER BY
  4. Avoid index failure in sorting, such as mixing ascending and descending order, omitting the leftmost index column, skipping a middle index column, sorting by a non-indexed column, or combining the sort with a range condition such as IN().
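
To see whether a sort is served by an index, look for "Using filesort" in the Extra column of EXPLAIN. The table sort_demo and its index are hypothetical, and the buffer size is an example value.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE sort_demo (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  age  INT NOT NULL,
  name VARCHAR(20) NOT NULL,
  note VARCHAR(100),
  KEY idx_age_name (age, name)
);

-- WHERE on the first index column, ORDER BY on the second: one joint index serves both
EXPLAIN SELECT id, age, name FROM sort_demo WHERE age = 20 ORDER BY name;

-- Mixed ascending and descending order on an all-ascending index
EXPLAIN SELECT id, age, name FROM sort_demo ORDER BY age ASC, name DESC;

-- If filesort cannot be avoided, check and enlarge the sort buffer for this session
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 1048576;  -- 1 MB, example value
```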

6. GROUP BY group optimization, LIMIT page optimization

GROUP BY group optimization

  1. GROUP BY uses indexes in almost the same way as ORDER BY. GROUP BY can use an index directly even when no filter condition uses it. GROUP BY sorts first and then groups, so it follows the leftmost-prefix rule of the index
  2. When an index cannot be used, increase max_length_for_sort_data and sort_buffer_size
  3. WHERE is more efficient than HAVING. If a condition can be written in WHERE, do not write it in HAVING
  4. Use ORDER BY sparingly. Check with the business side whether sorting is needed at all, or do the sorting on the client side. Statements with ORDER BY, GROUP BY, and DISTINCT consume a lot of CPU, and the database's CPU resources are extremely precious.
  5. For statements that include ORDER BY, GROUP BY, or DISTINCT, keep the result set filtered by the WHERE condition within 1000 rows, otherwise the SQL will be very slow.
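
Point 3 above (WHERE before HAVING) can be sketched with a hypothetical table emp_demo; the department ids and the salary threshold are made-up values.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE emp_demo (
  id      INT PRIMARY KEY AUTO_INCREMENT,
  dept_id INT NOT NULL,
  salary  DECIMAL(10, 2) NOT NULL,
  KEY idx_dept (dept_id)
);

-- Less efficient: the row filter is written in HAVING, after grouping
SELECT dept_id, AVG(salary)
FROM emp_demo
GROUP BY dept_id
HAVING dept_id IN (10, 20);

-- Preferred: filter rows in WHERE before grouping
SELECT dept_id, AVG(salary)
FROM emp_demo
WHERE dept_id IN (10, 20)
GROUP BY dept_id;

-- HAVING stays for conditions on aggregates, which WHERE cannot express
SELECT dept_id, AVG(salary) AS avg_salary
FROM emp_demo
WHERE dept_id IN (10, 20)
GROUP BY dept_id
HAVING avg_salary > 5000;
```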

LIMIT page optimization

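# Before optimization, not recommended
SELECT * FROM tb1 LIMIT 2000000, 10;

# Optimization 1: paginate on the index, then go back to the table by primary key
SELECT * FROM tb1 t1,
(SELECT id FROM tb1 ORDER BY id LIMIT 2000000, 10) t2
WHERE t1.id = t2.id;

# Optimization 2: if the primary key is auto-increment with no gaps, use WHERE to jump straight to the position
SELECT * FROM tb1 WHERE id > 2000000 ORDER BY id LIMIT 10;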

7. Covering index

Definition: When an index already contains all the information a query needs, there is no need to go back to the table.

Benefits:

  1. It avoids the second lookup that InnoDB makes from a secondary index into the table (going back to the table)
  2. Random I/O can be turned into sequential I/O, which speeds up the query
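
A covering index shows up as "Using index" in the Extra column of EXPLAIN. The sketch below uses a hypothetical table cover_demo; InnoDB secondary indexes also carry the primary key, which is why id counts as covered.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE cover_demo (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  age  INT NOT NULL,
  name VARCHAR(20) NOT NULL,
  note VARCHAR(100),
  KEY idx_age_name (age, name)
);

-- Every selected column is in idx_age_name (plus the primary key): no table lookup needed
EXPLAIN SELECT id, age, name FROM cover_demo WHERE age = 20;

-- note is not in the index, so the query has to go back to the table
EXPLAIN SELECT id, age, name, note FROM cover_demo WHERE age = 20;
```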

8. Index condition push down (ICP)

Explanation: When a non-clustered index is used, the WHERE conditions that involve only index columns are checked on the index entries first, before going back to the table. This reduces the number of rows fetched from the table.

Conditions of use:

  1. ICP can be used when the table access type is range, ref, eq_ref, or ref_or_null
  2. ICP can be used for InnoDB and MyISAM tables, including partitioned InnoDB and MyISAM tables
  3. For InnoDB tables, ICP is used only for secondary indexes. The goal of ICP is to reduce the number of full-row reads and thereby reduce I/O operations. When the SQL uses a covering index, ICP is not supported, because it would not reduce I/O in that case.
  4. Conditions on correlated subqueries cannot use ICP
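
When ICP is applied, EXPLAIN reports "Using index condition" in the Extra column. The following sketch assumes a hypothetical table people_demo and switches ICP off and on through optimizer_switch to compare the two plans.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE people_demo (
  id       INT PRIMARY KEY AUTO_INCREMENT,
  zipcode  VARCHAR(10) NOT NULL,
  lastname VARCHAR(30) NOT NULL,
  address  VARCHAR(100),
  KEY idx_zip_last (zipcode, lastname)
);

-- zipcode narrows the index range; the lastname condition can be checked on the index entry
EXPLAIN SELECT * FROM people_demo
WHERE zipcode = '95054' AND lastname LIKE '%son%';

-- Turn ICP off for this session and compare the Extra column
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM people_demo
WHERE zipcode = '95054' AND lastname LIKE '%son%';

-- Restore the default
SET optimizer_switch = 'index_condition_pushdown=on';
```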

9. Other query optimization strategies

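  1. The difference between EXISTS and IN
    # When table B is small, use IN
    SELECT * FROM A WHERE cc IN (SELECT cc FROM B);
    
    # When table A is small, use EXISTS
    SELECT * FROM A WHERE EXISTS (SELECT cc FROM B WHERE B.cc = A.cc);
  2. COUNT(*) and COUNT(column) behave differently: COUNT(column) skips NULL values. Counting also differs between engines: MyISAM stores the table's row count, while InnoDB has to scan to count rows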
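
The difference between COUNT(*) and COUNT(column) is easy to see with a nullable column. The table count_demo is a hypothetical three-row example.

```sql
-- Hypothetical table with one NULL in column cc
CREATE TABLE count_demo (
  id INT PRIMARY KEY,
  cc INT
);
INSERT INTO count_demo VALUES (1, 10), (2, NULL), (3, 30);

-- Counts every row: returns 3
SELECT COUNT(*) FROM count_demo;

-- Counts only rows where cc is not NULL: returns 2
SELECT COUNT(cc) FROM count_demo;
```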

4. Other tuning strategies for the database

1. The goal of tuning

  1. Save system resources as much as possible so that the system can serve a greater load. (higher throughput)
  2. Improve the response speed of user operations through sound structural design and parameter tuning. (faster response)
  3. Reduce system bottlenecks and improve the overall performance of the MySQL database.

2. How to locate the tuning problem

  1. User feedback (main)
  2. Log analysis (main)
  3. Server resource usage monitoring
  4. Database internal status monitoring

3. Dimensions and steps of tuning

  1. First, choose an appropriate database.
  2. Optimize the table design
    1. The table structure should follow the three normal forms as far as possible
    2. If there are many queries, especially multi-table join queries, denormalization can be used to improve query efficiency.
    3. Choose appropriate data types.
  3. Optimize logical queries
  4. Optimize physical queries
  5. Use Redis or Memcached as a cache
  6. Database-level optimization
    1. Read-write separation
    2. Data sharding

4. Optimize the MySQL server

Optimize server hardware

  1. Configure more memory, which reduces disk I/O and allows larger buffers.
  2. Configure a high-speed disk system
  3. Distribute disk I/O sensibly
  4. Configure multiple processors

Optimize MySQL parameters

  1. innodb_buffer_pool_size: The size of the InnoDB cache for table and index data
  2. key_buffer_size: The size of the index buffer (used by MyISAM)
  3. table_cache: The number of tables that can be open at the same time (named table_open_cache in current versions)
  4. query_cache_size: The size of the query cache (the query cache was removed in MySQL 8.0)
  5. query_cache_type: Whether the query cache is used
  6. sort_buffer_size: The size of the buffer allocated to each thread that needs to sort
  7. join_buffer_size: The buffer size available to join operations, for example join_buffer_size = 8M
  8. read_buffer_size: The size of the buffer allocated to each thread for each table it scans sequentially
  9. innodb_flush_log_at_trx_commit: Controls when log buffer data is written to the log file and flushed to disk
  10. innodb_log_buffer_size: The size of the buffer used by the transaction log
  11. max_connections: The maximum number of connections allowed to MySQL
  12. back_log: The size of the backlog of pending connection requests on the listening TCP port
  13. thread_cache_size: The number of threads the server keeps cached for reuse
  14. wait_timeout: The number of seconds the server waits for activity on a non-interactive connection before closing it
  15. interactive_timeout: The number of seconds the server waits for activity on an interactive connection before closing it
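
Before changing any of these parameters, it helps to read the current values and the related status counters. The sketch below only uses variable names from the list above plus two standard status counters; the numbers assigned are example values, not recommendations.

```sql
-- Read current settings
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'thread_cache_size';

-- Related counters that hint at whether a setting is too small
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL STATUS LIKE 'Threads_created';

-- Change a dynamic variable at runtime (lost on restart unless also set in my.cnf)
SET GLOBAL max_connections = 500;        -- example value

-- Per-thread buffers can be changed for the current session only
SET SESSION sort_buffer_size = 2097152;  -- 2 MB, example value
```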

5. Optimize the database structure

  1. Split tables: separate hot and cold data
  2. Add intermediate tables
  3. Add redundant fields
  4. Optimize data types
    1. Optimize integer types
    2. When either a text type or an integer type would work, prefer the integer type
    3. Avoid the TEXT and BLOB data types
    4. Avoid ENUM, because ORDER BY on it is inefficient
    5. Use TIMESTAMP to store time
    6. Use DECIMAL fixed-point numbers instead of floating-point numbers
  5. Optimize the speed of inserting records
    1. Disable indexes before inserting
    2. Disable uniqueness checks before inserting
    3. Use bulk inserts
    4. Try to use LOAD DATA INFILE instead of INSERT
    5. Disable foreign key checks before inserting
    6. Disable autocommit before inserting
  6. Use NOT NULL constraints
  7. Analyze, check, and optimize tables
    # Analyze the table: immediately refreshes the selectivity statistics of its indexes
    ANALYZE TABLE tb1;
    
    # Check the table
    CHECK TABLE tb1;
    
    # Optimize the table; this only helps with column types that hold many bytes
    OPTIMIZE TABLE tb1;

  8. Each of the methods above has advantages and disadvantages, so weigh them carefully before applying an optimization.
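
The insert-speed suggestions in item 5 above might be combined as follows for an InnoDB table. The table load_demo and its rows are hypothetical, and the checks should be switched off only when the incoming data is known to be valid.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE load_demo (
  id  INT PRIMARY KEY,
  val VARCHAR(20)
);

-- Switch off checks and autocommit for this session before the load
SET unique_checks = 0;
SET foreign_key_checks = 0;
SET autocommit = 0;

-- One multi-row INSERT instead of many single-row statements
INSERT INTO load_demo (id, val) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');
COMMIT;

-- Restore the session settings afterwards
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;
```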

6. Large table optimization

  1. Limit the scope of queries
  2. Read-write separation
  3. Vertical splitting of databases and tables
  4. Horizontal splitting

7. Other tuning operations

  1. Server-side statement timeout handling
  2. Create a general tablespace
  3. Invisible indexes
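
Each of these three features has a short SQL form in MySQL 8.0. The names ts_demo, ts_table_demo, and idx_b are hypothetical, and the statements assume a table tb1 that already has an index named idx_b.

```sql
-- 1. Statement timeout: limit one SELECT to 1000 ms with an optimizer hint
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM tb1;

-- Or set the limit for every SELECT in the current session
SET SESSION max_execution_time = 1000;

-- 2. General tablespace: create it, then place a table in it
CREATE TABLESPACE ts_demo ADD DATAFILE 'ts_demo.ibd' ENGINE = InnoDB;
CREATE TABLE ts_table_demo (id INT PRIMARY KEY) TABLESPACE ts_demo;

-- 3. Invisible index: hide an index from the optimizer to test whether it is still needed
ALTER TABLE tb1 ALTER INDEX idx_b INVISIBLE;

-- Make it visible again if queries get slower
ALTER TABLE tb1 ALTER INDEX idx_b VISIBLE;
```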


Origin blog.csdn.net/iuu77/article/details/128996958