This article explains the underlying principles of databases and discusses MySQL optimization.

I watched a movie some time ago called "The Age of Heroes", and there was a sentence in it that said: "Life is such a dog, it chases me so hard that I don't even have time to pee calmly."

In this era when smart people are everywhere, what is scarce is not intelligence but single-mindedness: the willingness to go all in on one thing.

The most terrifying thing in life is to live a life of mediocrity and still comfort yourself that it is ordinary and valuable.

Database tuning

Slow query log

The slow query log is a built-in MySQL feature that records SQL statements whose execution time exceeds a specified threshold.

The slow query log has many details, so I have collected some notes on it here. Let's take a look.

Related parameters and default values

| Parameter | Effect | Default |
| --- | --- | --- |
| log_output | Where the log is written. FILE (the default) writes to a file; TABLE records the log to mysql.slow_log. Multiple destinations can be combined, e.g. FILE,TABLE | FILE |
| long_query_time | A statement is logged only if it runs longer than this many seconds; decimals can express sub-second thresholds | 10 |
| log_queries_not_using_indexes | Whether to log SQL that uses no index; such statements are logged regardless of long_query_time. Recommended off in production and on in development | OFF |
| log_throttle_queries_not_using_indexes | Works with log_queries_not_using_indexes: when that option is on, this limits how many no-index statements can be written per minute (0 = no limit) | 0 |
| min_examined_row_limit | A statement is logged only if it examines at least this many rows | 0 |
| log_slow_admin_statements | Whether to log administrative statements, i.e. ALTER TABLE, ANALYZE TABLE, CHECK TABLE, CREATE INDEX, DROP INDEX, OPTIMIZE TABLE, and REPAIR TABLE | OFF |
| slow_query_log_file | Path of the slow query log file | host_name-slow.log in the data directory |
| log_slow_slave_statements | Set on a replica; determines whether SQL exceeding long_query_time during replication is logged. Has no effect when the binlog format is ROW | OFF |
| log_slow_extra | When log_output=FILE, whether to record extra fields (available since MySQL 8.0.14); has no effect when log_output=TABLE | OFF |

Usage

Method 1: Modify the configuration file my.cnf and add the following parameters in the [mysqld] section.
For example:

[mysqld]
#...
log_output = 'FILE,TABLE'
slow_query_log = ON
long_query_time = 0.001

Then restart MySQL:

service mysqld restart

Method 2: Set through global variables

This method can take effect without restarting, but once restarted, the configuration will become invalid.

For example:

set global log_output = 'FILE,TABLE';
set global slow_query_log = 'ON';
set global long_query_time = 0.001;

After this setting, the slow query log will be recorded to the file and the mysql.slow_log table at the same time.
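As a quick sanity check, you can read the settings back (a sketch; these are standard MySQL system variables):

```sql
-- Verify that the slow-query-log settings took effect
SHOW GLOBAL VARIABLES LIKE 'slow_query_log';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
SHOW GLOBAL VARIABLES LIKE 'log_output';
```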

Analyze slow query logs

Analyzing slow query log tables

When log_output = TABLE, you can directly use the following statement to analyze:

select * from `mysql`.slow_log

Then do various queries, statistics, and analysis according to the conditions.
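For example, a minimal sketch (assuming MySQL 5.7+, where sql_text is stored as a BLOB and needs a CONVERT to read):

```sql
-- Ten slowest statements recorded in mysql.slow_log
SELECT start_time, query_time, lock_time, rows_sent, rows_examined,
       CONVERT(sql_text USING utf8mb4) AS sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
```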

Analyze slow query log files

mysqldumpslow
When log_output = FILE, you can use mysqldumpslow to analyze the log:

[server@server-test ~]$ mysqldumpslow --help
Usage: mysqldumpslow [ OPTS... ] [ LOGS... ]

Parse and summarize the MySQL slow query log. Options are

  --verbose    verbose
  --debug      debug
  --help       write this text to standard output

  -v           show more detailed information
  -d           debug
  -s ORDER     what to sort by (al, at, ar, c, l, r, t), default 'at'
                al: average lock time
                ar: average rows returned
                at: average query time
                 c: count
                 l: lock time
                 r: rows returned
                 t: query time
  -r           reverse the -s sort order
  -t NUM       show only the top n queries
  -a           don't abstract all numbers to N and strings to 'S'
  -n NUM       abstract numbers with at least n digits within names
  -g PATTERN   grep: only show lines matching the pattern
  -h HOSTNAME  slow logs are named <hostname>-slow.log; -h reads the log for the
               given hostname (default '*', i.e. read all slow logs)
  -i NAME      name of the MySQL server instance (if using the mysql.server startup script)
  -l           don't subtract lock time from total time
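A couple of typical invocations (a sketch; the log path is hypothetical — substitute your own slow_query_log_file value):

```shell
# Top 10 statements sorted by total query time (-s t), keeping literal values (-a)
mysqldumpslow -s t -t 10 -a /var/lib/mysql/host-slow.log

# Only statements matching a pattern, sorted by occurrence count (-s c)
mysqldumpslow -s c -g "user_info" /var/lib/mysql/host-slow.log
```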

pt-query-digest
We can also use pt-query-digest to analyze slow query log files. pt-query-digest is a tool developed by Percona and is part of the Percona Toolkit suite. I won't explore it in detail here; interested readers can research it on their own.

EXPLAIN detailed explanation

EXPLAIN can help us analyze MySQL's execution plan for a statement.

Usage

EXPLAIN can be used to analyze SQL execution plans. Let's demonstrate:

[Screenshot: EXPLAIN output for the query below]

From this we can see how the SQL statement

SELECT * from user_info where nickname = 'Ant'

is executed.

The output contains the following columns:

| Column | Name when format=json | Meaning |
| --- | --- | --- |
| id | select_id | Unique identifier of the statement |
| select_type | (none) | Query type |
| table | table_name | Table name |
| partitions | partitions | Matching partitions |
| type | access_type | Join (access) type |
| possible_keys | possible_keys | Possible index choices |
| key | key | Index actually chosen |
| key_len | key_length | Length of the chosen index |
| ref | ref | Columns compared to the index |
| rows | rows | Estimate of rows to be examined |
| filtered | filtered | Percentage of rows that satisfy the condition |
| Extra | (none) | Additional information |

Next, let’s analyze the execution results:
[Screenshot: EXPLAIN result for the query above]
First, select_type here is SIMPLE, which means this is a simple query. table shows that the queried table is user_info. type is ALL, which means a full table scan occurs. possible_keys, key, and key_len are all empty, indicating that no index is used. rows indicates that executing this SQL requires scanning more than 250,000 rows. filtered is 10%. Finally, Extra is Using where, indicating that a WHERE condition is applied.

According to this analysis, the performance of this SQL is relatively poor.
Let’s execute it and see:
[Screenshot: the query takes more than 800 ms to execute]
We see that it takes more than 800 milliseconds, and the performance is indeed not good.
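Since the plan shows no usable index, a natural experiment (a sketch against the demo table; the index name is made up) is to index the filter column and re-check the plan:

```sql
-- Hypothetical fix: index the column used in the WHERE clause
ALTER TABLE user_info ADD INDEX idx_nickname (nickname);

-- Re-running EXPLAIN should now show type=ref and key=idx_nickname
EXPLAIN SELECT * from user_info where nickname = 'Ant';
```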

Let’s take a look at the following demonstration:

[Screenshot: EXPLAIN returning two rows]
Here we find that EXPLAIN returns two rows. The id field is useful when there are multiple rows: it describes the order of execution. If an EXPLAIN result contains several id values, say id=1 and id=2, rows with the larger id execute first; rows with the same id (as in the figure above, where both are 1) execute from top to bottom.
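A simple way to see multiple id values (a sketch reusing the demo table) is a scalar subquery; the SUBQUERY row with id=2 is evaluated before the PRIMARY row with id=1:

```sql
-- Typically produces two rows: id=1 (PRIMARY) and id=2 (SUBQUERY)
EXPLAIN SELECT * FROM user_info WHERE id = (SELECT MAX(id) FROM user_info);
```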

SQL performance analysis

Generally speaking, EXPLAIN meets the need to analyze SQL in most scenarios. But if you want to analyze a statement in more detail, EXPLAIN may not be enough. So let's discuss how to go deeper inside a SQL statement: which steps it performed, how long each step took, and where the performance bottleneck is.

There are three ways to go inside a SQL statement and analyze its performance:

  • SHOW PROFILE
  • INFORMATION_SCHEMA.PROFILING
  • PERFORMANCE_SCHEMA
SHOW PROFILE

SHOW PROFILE is a performance analysis command of MySQL that can track various resource consumption of SQL. The usage format is as follows:

SHOW PROFILE [type [, type] ... ]
    [FOR QUERY n]
    [LIMIT row_count [OFFSET offset]]

type: {
    ALL                   display all information
  | BLOCK IO              display counts of block input and output operations
  | CONTEXT SWITCHES      display counts of voluntary and involuntary context switches
  | CPU                   display user and system CPU usage times
  | IPC                   display counts of messages sent and received
  | MEMORY                display memory-related overhead (not currently implemented)
  | PAGE FAULTS           display overhead from page faults
  | SOURCE                display the function name and its location in the source code
  | SWAPS                 display swap counts
}

By default, SHOW PROFILE only displays two columns: Status and Duration. If you want to display more information, you can specify type.

  • Use the following command to check whether SHOW PROFILE is supported; YES means it is supported. MySQL has supported SHOW PROFILE since version 5.0.37.

    select @@have_profiling;
    
  • Check whether SHOW PROFILE is currently enabled, 0 means not enabled, 1 means enabled

    select @@profiling;
    
  • Use the following command to enable or disable performance analysis for the current session. Set to 1 to enable it and to 0 to disable it.

    set profiling = 1;
    
  • Use the SHOW PROFILES command to perform a summary performance analysis of the recently sent SQL statements. The number of entries displayed is controlled by the profiling_history_size session variable, which has a default value of 15. The maximum value is 100. Setting the value to 0 has the practical effect of disabling analysis.

  • Display 15 items by default:

    show profiles;
    
    -- use profiling_history_size to adjust the number of entries shown
    set profiling_history_size = 100;
    
  • Use show profile to analyze the specified query:

    mysql> SHOW PROFILES;
    +----------+----------+--------------------------+
    | Query_ID | Duration | Query                    |
    +----------+----------+--------------------------+
    |        0 | 0.000088 | SET PROFILING = 1        |
    |        1 | 0.000136 | DROP TABLE IF EXISTS t1  |
    |        2 | 0.011947 | CREATE TABLE t1 (id INT) |
    +----------+----------+--------------------------+
    3 rows in set (0.00 sec)
    
    mysql> SHOW PROFILE;
    +----------------------+----------+
    | Status               | Duration |
    +----------------------+----------+
    | checking permissions | 0.000040 |
    | creating table       | 0.000056 |
    | After create         | 0.011363 |
    | query end            | 0.000375 |
    | freeing items        | 0.000089 |
    | logging slow query   | 0.000019 |
    | cleaning up          | 0.000005 |
    +----------------------+----------+
    7 rows in set (0.00 sec)
    
    -- By default, only the Status and Duration columns are shown; specify a type to display more information.
    mysql> SHOW PROFILE FOR QUERY 1;
    +--------------------+----------+
    | Status             | Duration |
    +--------------------+----------+
    | query end          | 0.000107 |
    | freeing items      | 0.000008 |
    | logging slow query | 0.000015 |
    | cleaning up        | 0.000006 |
    +--------------------+----------+
    4 rows in set (0.00 sec)
    
    -- show CPU-related overhead
    mysql> SHOW PROFILE CPU FOR QUERY 2;
    +----------------------+----------+----------+------------+
    | Status               | Duration | CPU_user | CPU_system |
    +----------------------+----------+----------+------------+
    | checking permissions | 0.000040 | 0.000038 |   0.000002 |
    | creating table       | 0.000056 | 0.000028 |   0.000028 |
    | After create         | 0.011363 | 0.000217 |   0.001571 |
    | query end            | 0.000375 | 0.000013 |   0.000028 |
    | freeing items        | 0.000089 | 0.000010 |   0.000014 |
    | logging slow query   | 0.000019 | 0.000009 |   0.000010 |
    | cleaning up          | 0.000005 | 0.000003 |   0.000002 |
    +----------------------+----------+----------+------------+
    7 rows in set (0.00 sec)
    
    
  • After the analysis is completed, remember to turn off the SHOW PROFILE function:

    set profiling = 0;
    

TIPS

  • The official MySQL documentation states that SHOW PROFILE has been deprecated and recommends using Performance Schema as a replacement.
  • On some systems, profiling is only partially available. For example, some functions do not work on Windows (SHOW PROFILE relies on the getrusage() API, which Windows does not support). In addition, profiling is per-process, not per-thread, which means the activity of other threads may affect the timing information you see.
INFORMATION_SCHEMA.PROFILING

INFORMATION_SCHEMA.PROFILING is used for performance analysis. Its content corresponds to the information produced by the SHOW PROFILE and SHOW PROFILES statements. The table contains no data unless profiling is enabled with set profiling = 1;. It includes the following fields:

  • QUERY_ID: unique identifier of the statement
  • SEQ: a sequence number showing the display order of rows with the same QUERY_ID value
  • STATE: analysis status
  • DURATION: How long it lasted in this state (seconds)
  • CPU_USER, CPU_SYSTEM: User and system CPU usage (seconds)
  • CONTEXT_VOLUNTARY, CONTEXT_INVOLUNTARY: How many voluntary and involuntary context switches occurred
  • BLOCK_OPS_IN, BLOCK_OPS_OUT: Number of block input and output operations
  • MESSAGES_SENT, MESSAGES_RECEIVED: Number of messages sent and received
  • PAGE_FAULTS_MAJOR, PAGE_FAULTS_MINOR: Major and minor page fault information
  • SWAPS: How many SWAPs occurred
  • SOURCE_FUNCTION, SOURCE_FILE, SOURCE_LINE: Where in the source code the current state executes

TIPS

  • SHOW PROFILE essentially uses INFORMATION_SCHEMA.PROFILING
  • The INFORMATION_SCHEMA.PROFILING table has been deprecated and may be removed in a future release in favor of the Performance Schema. For details, see "Query Profiling Using Performance Schema"
  • The following two SQLs are equivalent:
    SHOW PROFILE FOR QUERY 2;
    
    SELECT STATE, FORMAT(DURATION, 6) AS DURATION
    FROM INFORMATION_SCHEMA.PROFILING
    WHERE QUERY_ID = 2 ORDER BY SEQ;
    
PERFORMANCE_SCHEMA

PERFORMANCE_SCHEMA is the performance analysis approach recommended by MySQL; SHOW PROFILE and INFORMATION_SCHEMA.PROFILING will be removed eventually. PERFORMANCE_SCHEMA was introduced in MySQL 5.6, so it is only available in MySQL 5.6 and later. You can use SHOW VARIABLES LIKE 'performance_schema'; to check whether it is enabled. It is enabled by default starting from MySQL 5.7.

Let's use PERFORMANCE_SCHEMA to achieve similar effects to SHOW PROFILE:

  • Check whether performance monitoring is enabled

    mysql> SELECT * FROM performance_schema.setup_actors;
    +------+------+------+---------+---------+
    | HOST | USER | ROLE | ENABLED | HISTORY |
    +------+------+------+---------+---------+
    | %    | %    | %    | YES     | YES     |
    +------+------+------+---------+---------+
    

    The default is enabled.

  • You can also execute SQL statements similar to the following to monitor only the SQL executed by the specified user:

    mysql> UPDATE performance_schema.setup_actors
      	 SET ENABLED = 'NO', HISTORY = 'NO'
       	WHERE HOST = '%' AND USER = '%';
    
    mysql> INSERT INTO performance_schema.setup_actors
           (HOST,USER,ROLE,ENABLED,HISTORY)
           VALUES('localhost','test_user','%','YES','YES');
    

    In this way, only the SQL sent by the test_user user on the localhost machine will be monitored. SQL sent from other hosts and other users will not be monitored.

  • Execute the following SQL statement to enable relevant monitoring items:

    mysql> UPDATE performance_schema.setup_instruments
           SET ENABLED = 'YES', TIMED = 'YES'
           WHERE NAME LIKE '%statement/%';
           
    mysql> UPDATE performance_schema.setup_instruments
           SET ENABLED = 'YES', TIMED = 'YES'
           WHERE NAME LIKE '%stage/%';
           	       
    mysql> UPDATE performance_schema.setup_consumers
           SET ENABLED = 'YES'
           WHERE NAME LIKE '%events_statements_%';
           	
    mysql> UPDATE performance_schema.setup_consumers
           SET ENABLED = 'YES'
           WHERE NAME LIKE '%events_stages_%';  
           
    
  • Use the user who turns on monitoring to execute SQL statements, such as:

    mysql> SELECT * FROM employees.employees WHERE emp_no = 10001;
    +--------+------------+------------+-----------+--------+------------+
    | emp_no | birth_date | first_name | last_name | gender | hire_date |
    +--------+------------+------------+-----------+--------+------------+
    |  10001 | 1953-09-02 | Georgi     | Facello   | M      | 1986-06-26 |
    +--------+------------+------------+-----------+--------+------------+
    
  • Execute the following SQL to obtain the EVENT_ID of the statement.

    mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
           FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%10001%';
    +----------+----------+--------------------------------------------------------+
    | event_id | duration | sql_text                                               |
    +----------+----------+--------------------------------------------------------+
    |       31 | 0.028310 | SELECT * FROM employees.employees WHERE emp_no = 10001 |
    +----------+----------+--------------------------------------------------------+
    

    This step is similar to SHOW PROFILES.

  • Execute the following SQL statement for performance analysis, so that you can know the information at various stages of this statement.

    mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
           FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=31;
    +--------------------------------+----------+
    | Stage                          | Duration |
    +--------------------------------+----------+
    | stage/sql/starting             | 0.000080 |
    | stage/sql/checking permissions | 0.000005 |
    | stage/sql/Opening tables       | 0.027759 |
    | stage/sql/init                 | 0.000052 |
    | stage/sql/System lock          | 0.000009 |
    | stage/sql/optimizing           | 0.000006 |
    | stage/sql/statistics           | 0.000082 |
    | stage/sql/preparing            | 0.000008 |
    | stage/sql/executing            | 0.000000 |
    | stage/sql/Sending data         | 0.000017 |
    | stage/sql/end                  | 0.000001 |
    | stage/sql/query end            | 0.000004 |
    | stage/sql/closing tables       | 0.000006 |
    | stage/sql/freeing items        | 0.000272 |
    | stage/sql/cleaning up          | 0.000001 |
    +--------------------------------+----------+
    
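Between experiments you may want a clean slate; the Performance Schema history tables support TRUNCATE (a sketch, assuming you have the required privileges):

```sql
-- Clear previously collected statement and stage history
TRUNCATE performance_schema.events_statements_history_long;
TRUNCATE performance_schema.events_stages_history_long;
```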

OPTIMIZER_TRACE

Let’s talk about another powerful tool for analyzing SQL: OPTIMIZER_TRACE, the optimizer trace.

It can track various decisions made by the optimizer and understand the execution details of the optimizer, thereby helping us understand the execution process of SQL and optimize SQL.

OPTIMIZER_TRACE is a function introduced in MySQL 5.6. This function is turned off by default. After it is turned on, the following statements can be analyzed:

  • SELECT
  • INSERT
  • REPLACE
  • UPDATE
  • DELETE
  • EXPLAIN
  • SET
  • DECLARE
  • CASE
  • IF
  • RETURN
  • CALL

OPTIMIZER_TRACE related parameters

See https://dev.mysql.com/doc/internals/en/system-variables-controlling-trace.html

  • optimizer_trace
    • optimizer_trace: the master switch; default: enabled=off,one_line=off
    • enabled: whether optimizer_trace is on; on means enabled, off means disabled.
    • one_line: whether to store the trace on a single line. When off, the trace is stored in standard, nicely formatted JSON; setting it to on saves some space at the cost of readability.
  • optimizer_trace_features
    • Control the content tracked by optimizer_trace. Default value: greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on, which means turning on all tracking items.
    • greedy_search: Whether to track greedy search. For details on the greedy algorithm, see:https://blog.csdn.net/qq_37763204/article/details/79289532
    • range_optimizer: whether to track the range optimizer
    • dynamic_range: whether to track dynamic range optimization
    • repeated_subselect: Whether to track subqueries. If set to off, only the execution of the first Item_subselect will be tracked.
  • optimizer_trace_limit: Controls how many results optimizer_trace displays, default 1
  • optimizer_trace_max_mem_size: maximum memory allowed for trace information, default 1048576
  • optimizer_trace_offset: The offset of the first optimizer trace to be displayed, default -1.
  • end_markers_in_json: if the JSON structure is large, it can be hard to match closing brackets with their opening ones. To help the reader, setting this to on adds a comment near each closing bracket. The default is off.

The above parameters can be set with SET statements. For example, use the following command to turn on OPTIMIZER_TRACE:

SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on;

You can also use SET GLOBAL to turn it on globally, but even if it is turned on globally, each Session can only track its own executed statements:

SET GLOBAL OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on

The two parameters optimizer_trace_limit and optimizer_trace_offset are often used together, for example:

SET optimizer_trace_offset=<OFFSET>,optimizer_trace_limit=<LIMIT>

These two parameters are used together, somewhat like MySQL's LIMIT clause.
By default, since optimizer_trace_offset=-1 and optimizer_trace_limit=1, only the most recent statement is recorded, and one trace is displayed at a time.

Usage

  1. Turn on the OPTIMIZER_TRACE function and set the number of data entries to be displayed:
    SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on;
    SET optimizer_trace_offset=-30,optimizer_trace_limit=30;
    
  2. Send the query you want to analyze, for example:
    select *
    from user_info 
    where nickname = 'Ant'
    and ctime > '2021-02-01'
    
  3. Analyze with the following statement; you will obtain results similar to the following:
    mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE limit 30 ;
    

[Screenshot: rows of INFORMATION_SCHEMA.OPTIMIZER_TRACE]
The QUERY column shows the SQL that was executed, and the TRACE column holds the trace itself: one long JSON object. Here is the result I copied:

{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `user_info`.`id` AS `id`,`user_info`.`username` AS `username`,`user_info`.`password` AS `password`,`user_info`.`real_name` AS `real_name`,`user_info`.`sex` AS `sex`,`user_info`.`birthday` AS `birthday`,`user_info`.`card_id` AS `card_id`,`user_info`.`mark` AS `mark`,`user_info`.`partner_id` AS `partner_id`,`user_info`.`group_id` AS `group_id`,`user_info`.`nickname` AS `nickname`,`user_info`.`avatar` AS `avatar`,`user_info`.`phone` AS `phone`,`user_info`.`add_ip` AS `add_ip`,`user_info`.`last_time` AS `last_time`,`user_info`.`last_ip` AS `last_ip`,`user_info`.`now_money` AS `now_money`,`user_info`.`brokerage_price` AS `brokerage_price`,`user_info`.`integral` AS `integral`,`user_info`.`sign_num` AS `sign_num`,`user_info`.`status` AS `status`,`user_info`.`level` AS `level`,`user_info`.`spread_uid` AS `spread_uid`,`user_info`.`spread_time` AS `spread_time`,`user_info`.`user_type` AS `user_type`,`user_info`.`is_promoter` AS `is_promoter`,`user_info`.`pay_count` AS `pay_count`,`user_info`.`spread_count` AS `spread_count`,`user_info`.`clean_time` AS `clean_time`,`user_info`.`addres` AS `addres`,`user_info`.`adminid` AS `adminid`,`user_info`.`login_type` AS `login_type`,`user_info`.`union_id` AS `union_id`,`user_info`.`open_id` AS `open_id`,`user_info`.`superior_user_id` AS `superior_user_id`,`user_info`.`is_indentor` AS `is_indentor`,`user_info`.`indentor_level_name` AS `indentor_level_name`,`user_info`.`direct_superior_user_id` AS `direct_superior_user_id`,`user_info`.`member_level_name` AS `member_level_name`,`user_info`.`upgrade_time` AS `upgrade_time`,`user_info`.`password_app` AS `password_app`,`user_info`.`store_name` AS `store_name`,`user_info`.`rank_indentor_id` AS `rank_indentor_id`,`user_info`.`rank_member_id` AS `rank_member_id`,`user_info`.`manage_pending` AS `manage_pending`,`user_info`.`manage_done` AS `manage_done`,`user_info`.`develop_pending` AS `develop_pending`,`user_info`.`develop_done` AS 
`develop_done`,`user_info`.`range_pending` AS `range_pending`,`user_info`.`range_done` AS `range_done`,`user_info`.`corpus_pending` AS `corpus_pending`,`user_info`.`corpus_done` AS `corpus_done`,`user_info`.`yeji_pending` AS `yeji_pending`,`user_info`.`yeji_done` AS `yeji_done`,`user_info`.`commission_pending` AS `commission_pending`,`user_info`.`commission_done` AS `commission_done`,`user_info`.`can_edit_material` AS `can_edit_material`,`user_info`.`poster_url` AS `poster_url`,`user_info`.`ctime` AS `ctime`,`user_info`.`mtime` AS `mtime`,`user_info`.`health_vip_ctime` AS `health_vip_ctime`,`user_info`.`makeup_vip_ctime` AS `makeup_vip_ctime`,`user_info`.`purchase_balance` AS `purchase_balance`,`user_info`.`ay_card_money` AS `ay_card_money`,`user_info`.`other_rank_id` AS `other_rank_id`,`user_info`.`superior_other_id` AS `superior_other_id`,`user_info`.`star_health_vip_ctime` AS `star_health_vip_ctime`,`user_info`.`star_makeup_vip_ctime` AS `star_makeup_vip_ctime`,`user_info`.`lock_money` AS `lock_money` from `user_info` where ((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            "table_dependencies": [
              {
                "table": "`user_info`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [
              {
                "table": "`user_info`",
                "table_scan": {
                  "rows": 229694,
                  "cost": 8441
                } /* table_scan */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`user_info`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 229694,
                      "access_type": "scan",
                      "resulting_rows": 229694,
                      "cost": 54380,
                      "chosen": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 229694,
                "cost_for_plan": 54380,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`user_info`",
                  "attached": "((`user_info`.`nickname` = 'Ant') and (`user_info`.`ctime` > '2021-02-01'))"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "refine_plan": [
              {
                "table": "`user_info`"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

We can see that the whole JSON is divided into three parts: join_preparation (the preparation phase), join_optimization (the optimization phase), and join_execution (the execution phase).

It is not hard to see the power of OPTIMIZER_TRACE: it reveals the execution details of a SQL statement and tells us the various costs involved. If you want to go deep inside a statement while tuning, OPTIMIZER_TRACE is the tool to reach for.
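When you are done tracing, it is worth switching the feature back off and restoring the defaults, e.g.:

```sql
SET optimizer_trace="enabled=off";
-- Restore the default window of traces to keep
SET optimizer_trace_offset=-1, optimizer_trace_limit=1;
```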

MySQL database diagnostic commands

We have finished covering EXPLAIN, SHOW PROFILE, and OPTIMIZER_TRACE, so we can now analyze a problematic SQL statement with ease. But problems can also occur in the database itself. How do we locate those?
Let’s discuss common database diagnostic commands together.

MySQL provides many diagnostic commands. Let's pick a few important ones and discuss them:

SHOW PROCESSLIST

Function:
SHOW [FULL] PROCESSLIST shows the currently running threads. If the user executing the command has the PROCESS privilege, they can see all threads; otherwise they can only see their own threads (those associated with the current account). Without the FULL keyword, only the first 100 characters of the Info field are shown.

SHOW PROCESSLIST is useful for understanding what is going on when you encounter a "too many connections" error. MySQL reserves one extra connection for users with the CONNECTION_ADMIN privilege (or the deprecated SUPER privilege), which ensures that administrators can always connect and inspect the system.
Threads can be killed using the KILL statement.
Syntax:

SHOW [FULL] PROCESSLIST

Execution result:
[Screenshot: SHOW PROCESSLIST output]
As can be seen from the result, the result contains the following columns:

  • Id: The unique identifier of the connection, which is the return of the CONNECTION_ID() function.

  • User: The MySQL user who issued the statement.

    • system_user denotes a non-client thread spawned by the server to handle internal tasks, for example a replication I/O or SQL thread on a replica, or a delayed-row handler. For system_user, the Host field is empty.
    • unauthenticated user refers to a thread that is connected to a client but has not yet finished authenticating.
    • event_scheduler refers to the event scheduler.

    Note that the system_user value of the User field and the SYSTEM_USER privilege are not the same thing: the former denotes internal threads, while the latter distinguishes system accounts from ordinary accounts.

  • Host: The host name of the client that issued the statement (empty when User is system_user). For TCP/IP connections it is reported in host_name:client_port format, which makes it easier to tell which client is doing what.

  • db: On which database the command is currently executed. If no database is specified, the value is NULL

  • Command: The command being executed by the current thread.

  • Time: How long the thread has been in its current state, in seconds. For a replica's SQL thread, this value is the number of seconds between the timestamp of the last replicated event and the actual time of the replica machine.

  • State: refers to the operation, event or state that the thread is executing. Most states correspond to very fast operations. If a thread remains in a given state for a long time, you need to troubleshoot.

  • Info: The statement the thread is executing, or NULL if it is not executing one. The statement shown is the one sent to the server, or an innermost statement if that statement executes other statements. For example, if a CALL statement executes a stored procedure that is running a SELECT, the Info field shows the SELECT statement.

Among them, Command value:

  • Binlog Dump: a thread on the main library, used to send binlog content to a slave library
  • Change user: the thread is performing a change-user operation
  • Close stmt: the thread is closing a prepared statement
  • Connect: a replication slave is connected to its master
  • Connect Out: a replication slave is connecting to its master
  • Create DB: the thread is performing a create-database operation
  • Daemon: a thread internal to the server, not a thread servicing a client connection
  • Debug: the thread is generating debugging information
  • Delayed insert: the thread is a delayed-insert handler
  • Drop DB: the thread is performing a drop-database operation
  • Error: the thread has encountered an error
  • Execute: the thread is executing a prepared statement
  • Fetch: the thread is fetching the results of a prepared statement
  • Field List: the thread is retrieving a table's field information
  • Init DB: the thread is selecting a default database
  • Kill: the thread is killing another thread
  • Long Data: the thread is retrieving long data from the result of a prepared statement
  • Ping: the thread is handling a server-ping request
  • Prepare: the thread is preparing a prepared statement
  • Processlist: the thread is generating information about server threads
  • Query: the thread is executing a statement
  • Quit: the thread is terminating
  • Refresh: the thread is flushing tables, logs, or caches, or resetting status variables or replication server information
  • Register Slave: the thread is registering a slave library
  • Reset stmt: the thread is resetting a prepared statement
  • Set option: the thread is setting or resetting a client statement-execution option
  • Shutdown: the thread is shutting down the server
  • Sleep: the thread is waiting for the client to send a statement
  • Statistics: the thread is generating server status information
  • Table Dump: the thread is sending table contents to a slave server
  • Time: unused

Note
In fact, the result of SHOW PROCESSLIST is obtained from the INFORMATION_SCHEMA.PROCESSLIST table.
Therefore, executing

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST

yields the same results.

Practical SQL

-- Group by client IP to see which client holds the most connections
select client_ip, count(client_ip) as client_num
from (select substring_index(host,':',1) as client_ip
		from `information_schema`.processlist) as connect_info
group by client_ip
order by client_num desc;

-- View running threads sorted by Time descending, to spot threads that have been executing for a long time
select * 
from `information_schema`.processlist
where Command != 'Sleep'
order by Time desc;

-- Build kill statements for all threads that have been sleeping for more than 5 minutes, for later cleanup
select concat('kill ',id, ';')
from `information_schema`.processlist
where Command = 'Sleep'
	and Time > 300
order by Time desc;

SHOW STATUS

Function: View server status information.
Syntax:

SHOW [GLOBAL | SESSION] STATUS
	[LIKE 'pattern' | WHERE expr]

Example:

SHOW STATUS
SHOW GLOBAL STATUS like '%Slow%'

SHOW VARIABLES

Function: View MySQL system variables
Syntax:

SHOW [GLOBAL | SESSION] VARIABLES
	[LIKE 'pattern' | WHERE expr]

Example:

SHOW VARIABLES;

SHOW TABLE STATUS

Function: View the status of tables and views
Syntax:

SHOW TABLE STATUS
	[{FROM | IN} db_name]
	[LIKE 'pattern' | WHERE expr]

Example:

SHOW TABLE STATUS from employees;

SHOW INDEX

Function: View index related information
Syntax:

SHOW [EXTENDED] {INDEX | INDEXES | KEYS}
	{FROM | IN} tbl_name
	[{FROM | IN} db_name]
	[WHERE expr]

Example:

SHOW INDEX FROM mytable FROM mydb;
SHOW INDEX FROM mydb.mytable;

SHOW ENGINE

Function: Display relevant information about the storage engine
Syntax:

SHOW ENGINE engine_name {STATUS | MUTEX}

Example:

-- For how to interpret the InnoDB output, see: https://dev.mysql.com/doc/refman/8.0/en/innodb-standard-monitor.html
SHOW ENGINE INNODB STATUS
SHOW ENGINE INNODB MUTEX

Database indexes

Before discussing this part of the topic, let us first understand a few concepts:

Balanced binary tree

  • The height difference between the left and right subtrees of each node is at most 1
  • With n nodes, the depth of the tree is log2(n), and query time complexity is O(log2(n))

Everyone must be familiar with balanced binary trees, so we won’t discuss them in detail here.
Students who don’t understand can read this article: "What is a Balanced Binary Tree (AVL)"

B-Tree(Balance Tree)

The "-" in B-Tree is a hyphen, not a minus sign. Its full name is Balance Tree, meaning a balanced multi-way search tree.

Look at the picture below: (Disk block 3 in the picture should also have 3 child nodes, but the picture cannot be placed and is not drawn)
Insert image description here

In the figure, the gray cells represent pointers, each pointing to the disk block of a child node. A keyword is a primary key or index value, and Data is the data associated with that keyword.
For example, 17-Data means Data is the row whose primary key (or index value) is 17.

Suppose we want to find the row whose primary key is 5. A B-Tree search works roughly like this: first look at the root node's keywords, 17 and 35; since 5 is less than 17, follow the P1 pointer to disk block 2. The keywords in disk block 2 are 8 and 12, and 5 is less than 8, so follow disk block 2's P1 pointer to disk block 5, where the row is finally found.

B-Tree Features
  • The root node has x child nodes, where 2 <= x <= m and m is the order of the tree
    • Assuming m=3, the root node can have 2 or 3 children
  • An intermediate node has y child nodes, where m/2 <= y <= m
    • Assuming m=3, an intermediate node has at least 2 and at most 3 children
  • Each intermediate node contains n keywords, where n = number of child nodes - 1, sorted in ascending order
    • If an intermediate node has 3 child nodes, it holds 2 keywords, in ascending order
  • Pi (i=1,…,n+1) are pointers to the root nodes of the subtrees: P[1] points to the subtree whose keywords are less than Key[1], P[i] points to the subtree whose keywords fall in (Key[i-1], Key[i]), and P[n+1] points to the subtree whose keywords are greater than Key[n]
    • For example, with keywords Key1 and Key2: P1 points to the subtree with keywords smaller than Key1, P2 to the subtree with keywords between Key1 and Key2, and P3 to the subtree with keywords larger than Key2

In plain terms: a node with n keywords has n+1 pointers. For example, disk block 2 in the figure has two keywords and therefore three pointers, P1, P2, and P3, which point to the disk block for keywords smaller than 8, the one for keywords between 8 and 12, and the one for keywords greater than 12.

B-Tree can effectively reduce the height of the tree. The larger the order of the tree, the lower the height and the fewer the number of queries.

B+Tree

B+Tree is an optimization based on B-Tree. The InnoDB storage engine in MySQL uses B+Tree to implement its storage structure.

After understanding B-Tree, it is relatively easy to understand B+Tree.

As shown in the picture:
Insert image description here

Suppose we want to find the row whose keyword is 8. The process is roughly: first compare 8 with 5, 28, and 65 in the root node; 8 is greater than 5 and less than 28, so follow the P1 pointer to disk block 2. There, compare 8 with 5, 10, and 20; 8 is greater than 5 and less than 10, so follow disk block 2's P1 pointer down to the data.

We can find that B+Tree and B-Tree are quite similar, but there are also some differences.

The difference between B-Tree and B+Tree
  • A node with n child nodes in B+Tree contains n keywords
    • B-Tree is a node with n child nodes and has n-1 keywords.
  • In B+Tree, all leaf nodes contain information about all keywords of the parent node, and the leaf nodes are linked in ascending order according to the size of the keywords to form an ordered linked list.
    • The leaf nodes of B-Tree do not include all keywords
  • Non-leaf nodes in B+Tree are only used for indexing and do not save data records. Records are stored in leaf nodes.
    • In B-Tree, non-leaf nodes save both indexes and data records.

As you can see in the picture above, each disk block has 3 keywords and 3 corresponding pointers, and the keywords 5, 28, and 65 in disk block 1 appear again in disk blocks 2, 3, and 4 respectively. Every keyword in a parent node has a copy recorded in a leaf node; B-Tree has no such requirement.

B-Tree vs B+Tree

Let's compare B-Tree and B+Tree. Suppose we want to query data with the keyword 5. (where id = 5)

After analysis we find that in this case the two query processes do not differ much. However, since the intermediate nodes of a B+Tree are used only for indexing, a B+Tree can store more keywords in the same space, so it is relatively short and fat, and should therefore need fewer disk I/Os.
Because the intermediate nodes of a B-Tree also store data, its query efficiency is not very stable: in the best case the data is found right at the root node, in the worst case only at a leaf node. A B+Tree always has to go down to a leaf node.

Let's make another comparison: querying data with keywords between 5 and 10 (where id between 5 and 10).

In this case, with a B-Tree you have to look up 5, then 6, and so on until 10, and finally assemble the results and return them.
With a B+Tree you only have to look up 5; once 5 is found, the search traverses the ordered linked list from that node until it reaches 10.
So B+Tree performs better than B-Tree for range queries.

InnoDB storage method

The InnoDB engine uses a B+Tree index, and what is stored differs by index type: for a primary key index, the leaf nodes store the primary key and the row data; for a non-primary-key index (secondary index, auxiliary index), the leaf nodes store the index value and the primary key.

In other words, if the SQL statement you send uses a non-primary-key index, MySQL first finds the primary key through that index, and then finds the row via the primary key.
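This two-step lookup can be sketched with a hypothetical table (the table, column names, and index name below are assumptions for illustration):

```sql
-- Hypothetical InnoDB table: the secondary index idx_age stores (age, id)
-- in its leaf nodes; the clustered index stores the full rows.
CREATE TABLE user_info (
    id   BIGINT PRIMARY KEY,
    name VARCHAR(45),
    age  TINYINT,
    KEY idx_age (age)
) ENGINE = InnoDB;

-- Two lookups: idx_age yields the primary key, then the clustered index
-- is searched again for the full row
SELECT * FROM user_info WHERE age = 20;

-- One lookup: id and age are both in idx_age's leaf nodes, so the
-- clustered index does not need to be touched at all
SELECT id, age FROM user_info WHERE age = 20;
```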

MyISAM storage method

MyISAM also uses B+Tree, but regardless of whether the index is a primary key or not, the leaf nodes store pointers to data blocks.
In other words, in MyISAM, indexes and data are stored separately.

MyISAM VS InnoDB

We call InnoDB's storage method a clustered index, and MyISAM's storage method a non-clustered index.

Hash index

Next, let's talk about the Hash index.
As shown in the picture:

Insert image description here
Here, keys are the fields the index is built on, buckets form a hash table mapping the hash value computed from the index field to the physical location of the corresponding data, and entries are the actual rows.
So the core of a Hash index is the hash table: each row is stored into the buckets according to the hash code computed from the index field, and at query time the hash code is computed for the search keyword and looked up in the buckets. Normally, since a hash index is based on a hash table, the time complexity is O(1) and the performance is excellent.
As for why the time complexity is O(1): if you know the source code of HashMap or HashSet it will be obvious; if not, you can look it up yourself. This part is relatively simple, so I won't go into it further.

If a hash collision occurs, the query goes roughly like this: for example, the hash codes of Xiao Feilong and Xiao Youzi in the picture are both 139. If the query condition is Xiao Youzi, its hash code is computed as 139, the buckets are searched, and a pointer array with hash code 139 is found (it may also be a linked list, depending on how the storage engine implements it); the corresponding rows are then located through that array or list. As you can see, performance degrades under hash collisions, so when using hash indexes, try to avoid them.

Hash index support

For now, MySQL's Memory engine supports explicit Hash indexes.
We can play like this:

create table test_hash_table(
	name varchar(45) not null,
	age tinyint(4) not null,
	key using hash(name)
) engine = memory;

In this way, the name field gets a hash index.

Of course, besides the Memory engine, InnoDB also makes use of hashing: InnoDB supports an "Adaptive Hash Index". When InnoDB notices that an index is used very frequently, it builds an additional hash index in memory on top of the B+Tree to improve query efficiency.

We have no way to intervene in this adaptive index directly; it only has an on/off switch. You can check its state with show variables like 'innodb_adaptive_hash_index'.

Insert image description here

You can see that it is turned on by default.
If you want to turn off this feature, you can use set global innodb_adaptive_hash_index = 'OFF'.
Generally speaking, this does not need to be modified.
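The two commands mentioned above, written out:

```sql
-- Check the switch state (ON by default)
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
-- Turn the feature off (usually unnecessary)
SET GLOBAL innodb_adaptive_hash_index = 'OFF';
```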

Spatial index (R-Tree index)

The spatial index is used to store GIS data and is built based on R-Tree, so it is also called R-Tree index.
In the early days, only MyISAM supported spatial indexes. Starting from MySQL 5.7, InnoDB also supports spatial indexes.

Since MySQL's support for GIS is not yet very complete, most people will not use these features. As this knowledge is rarely needed, I won't introduce it in detail; it is enough to know that it exists. If you want to understand the underlying structure of R-Tree, you can read "Classic Search Algorithm—R-Tree", which is very well written.
If you want to know how to use spatial indexes, you can read "Simple Use of MySQL Spatial Index".

Full text index

Full-text indexes mainly serve full-text search. Before MySQL 5.7, the full-text index did not support Chinese and was often paired with Sphinx.
Starting from MySQL 5.7, a built-in parser, ngram, supports Chinese.
Official document address: https://dev.mysql.com/doc/refman/8.0/en/fulltext-search-ngram.html

Even so, times have changed!
Nowadays, for full-text search we are more familiar with dedicated search engines such as ElasticSearch or Solr, so full-text indexes are not used much. This is just an introduction; if you are interested, you can read "MySQL Full-text Index".

B-Tree(B+Tree) & Hash index characteristics and limitations

Above, we have discussed the underlying structures of the four indexes. We know that InnoDB mainly uses B+Tree and also has an adaptive Hash index.
Here, we will discuss the characteristics and limitations of B-Tree (B+Tree) and Hash index.

Here we will not distinguish between B-Tree and B+Tree; both are collectively called B-Tree,
because this part applies to both.

B-Tree Features

B-Tree index matching is fairly complete:
The query condition can match the key value exactly. For example, if we create an index on the name field, where name = '小飞龙' can use the index.
It can match ranges: create an index on the age field, and the index can also be used when the query condition is where age > 20.
It can match prefixes: with an index on name, where name like '小%' can also use the index, but when % comes first the index cannot be used. That is, a trailing wildcard can use the index; a leading wildcard cannot.

B-Tree restrictions

For example, if we create a combined index, index(name,age,sex) acts on three fields.

  • When the query conditions do not include the leftmost column, the index cannot be used at all.
    • where age = 5 and sex = 1 cannot use the index.
  • If columns in the index are skipped, the index cannot be fully used.
    • where name = '小飞龙' and sex = 1: because sex skips the age field, only the name column of the index can be used.
  • If the query contains a range (or fuzzy) condition on some column, none of the columns to its right can use the index.
    • where name = '小飞龙' and age > 20 and sex = 1: because age uses a range query, sex cannot use the index; only the name and age columns can be used.

These three restrictions make up the famous "leftmost matching principle": the index matches from its leftmost column first, and when the conditions above are not satisfied, the index cannot be fully used.
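The three restrictions can be sketched against the index(name,age,sex) example (the table definition below is an assumption for illustration):

```sql
CREATE TABLE person (
    name VARCHAR(45),
    age  TINYINT,
    sex  TINYINT,
    KEY idx_name_age_sex (name, age, sex)
);

-- All three index columns are usable
SELECT * FROM person WHERE name = '小飞龙' AND age = 20 AND sex = 1;
-- Leftmost column missing: the index cannot be used at all
SELECT * FROM person WHERE age = 20 AND sex = 1;
-- age skipped: only the name column of the index is used
SELECT * FROM person WHERE name = '小飞龙' AND sex = 1;
-- Range on age: sex, to its right, cannot use the index
SELECT * FROM person WHERE name = '小飞龙' AND age > 20 AND sex = 1;
```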

Hash index features

Generally speaking, a Hash index performs better than B-Tree (B+Tree): as long as hashes do not collide, its time complexity is O(1).

Hash index limitations

But its limitations are also obvious. First, a Hash index does not store index values in sorted order, so it cannot be used for sorting. This means that if your query contains an order by clause, the hash index cannot help.

Second, it does not support partial matching of index columns.
The hash value is computed from all the indexed columns together. For example, if we build a hash index on the two fields a and b, and your query condition is where a = 1 alone, the hash index cannot be used.

The hash index only supports equivalent queries (such as =, IN). If it is a range query or fuzzy query, the hash index cannot be used.

Finally, the performance of the Hash index depends on the Hash conflict. The more serious the Hash conflict is, the more serious the performance degradation will be.

Principles for creating indexes

Let’s talk about the principles of creating indexes:

  • In which scenarios is it recommended to create an index?
  • In which scenarios is it not recommended to create an index?
Suggested scenarios for creating an index

1) For select statements and fields that are frequently used as where conditions, you can consider creating an index for this field. If you often need to filter data in multiple fields, you can consider a combined index, but when using a combined index, you must consider the leftmost matching principle.

For example: there is an employee table (employees) with two fields, first_name and last_name. Since both are often used to filter data, a combined index is created.
Suppose we have a dynamic query where first_name is a required condition and last_name is optional. index(first_name, last_name) satisfies the various query cases; index(last_name, first_name), by contrast, would leave where first_name = 'Wei' unable to use the index, since last_name is only optional.

2) Fields used in the where condition of update/delete statements also need indexes.
This is because an update or delete statement first queries the matching rows according to the where condition and then updates or deletes them; an index speeds up that query step.

3) Indexes also need to be created for fields that need to be grouped or sorted.

4) The fields used by distinct also need to consider using indexes.

5) If the field value has unique constraints, you can also create an index. For example, unique indexes and primary key indexes have unique constraints.

6) For multi-table queries, create indexes on the join fields, and keep their types consistent. Inconsistent types may cause implicit conversion, which can render the index unusable.

Scenarios where index creation is not recommended

1) Because the index is to quickly locate the query data, if it is not used in your query conditions, then there is no need to create an index for this field.

2) If there is very little data in the table, there is no need to create an index.

3) If a field in a table has a lot of duplicate data, the data selectivity is very low. Then creating an index has little effect, and it is not recommended to create an index. This is because the higher the selectivity of the index, the better the query efficiency, because more rows can be filtered during the search process.

4) For frequently updated fields, if you create an index, you must consider the cost of index maintenance. The index must also be updated when modifying or deleting data. If a field is modified very frequently and there are few queries, it is not recommended to create an index.

However, it should be noted that these are only general principles. In actual projects, you must learn and apply them flexibly, and do not stick to dogma. Principles are only used to guide work, and should be reasonably adapted based on actual conditions in actual projects.

Index failure and solutions

Let’s discuss the scenarios and solutions for index failure.
Here I have summarized 7 scenarios that lead to index failure:

  1. Index columns are not independent. Independent means: the column cannot be part of an expression or a parameter of a function.
  2. Used left blur
  3. Some fields queried using OR do not have indexes
  4. String condition is not quoted using ' '
  5. Queries that do not comply with the leftmost matching principle
  6. It is recommended to add NOT NULL constraints to index fields
  7. Implicit conversion causes index failure

Let’s take a look at these scenarios.
1) The index column is not independent
Insert image description here
The is_indentor column here is used as part of the expression, so the index cannot be used in this case. Let's use explain to analyze:
Insert image description here
You can see that the type is ALL and a full table scan has occurred.

Solution: compute the expression's value in advance and pass it in, so that the left side of the = in the SQL where condition carries no calculation.

So it can be rewritten like this:
Insert image description here
You can see that type becomes ref, which improves performance a lot.
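Since the screenshots are not reproduced here, a sketch of the same idea (the table and the id column are assumptions for illustration):

```sql
-- Bad: the indexed column sits inside an expression, so the index is skipped
SELECT * FROM user_info WHERE id + 1 = 5;
-- Good: compute the value in advance; the column stands alone on the left
SELECT * FROM user_info WHERE id = 4;
```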

Look at this scenario again, the index field is used as a parameter of the function:
For example, this sql
Here, because nickname is a parameter of the SUBSTRING function, there is no way to use the index.
Insert image description here
Solution: compute the result in advance and pass it in; do not apply functions on the left side of the where condition. Alternatively, use an equivalent SQL query.

The SQL above actually achieves the same effect as like 'Ant%', so it can be rewritten as:
Insert image description here
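A sketch of that rewrite (table and column names assumed from the surrounding text):

```sql
-- Bad: nickname is wrapped in a function, so its index cannot be used
SELECT * FROM user_info WHERE SUBSTRING(nickname, 1, 3) = 'Ant';
-- Good: the equivalent prefix match can use the index
SELECT * FROM user_info WHERE nickname LIKE 'Ant%';
```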

2) Used left fuzzy query

With a leading wildcard, the index cannot be used.
The solution: try to avoid leading-wildcard queries; if you cannot, consider using a search engine.

3) Some fields queried using OR do not have indexes
For example, the following SQL

Insert image description here

Here nickname has no index, while is_indentor does, so with OR the index cannot be used; you can see that type is ALL and a full table scan occurred.

Solution: Add an index to another field.

After adding the index to nickname, execute again:
Insert image description here

You can see that the type has become index_merge, which is called index merge. The Extra field tells us Using sort_union(nickname,is_indentor); that is, MySQL used the two indexes nickname and is_indentor and merged the results. Index merge is an internal optimization mechanism of MySQL: it scans the two indexes separately and then merges the two result sets. The benefit is that a full table scan is avoided.

4) The string condition is not quoted using ''
Insert image description here
For example, here my is_indentor is a varchar type, but the query writes it as a number; in this case, too, the index cannot be used.

Solution: quote the string with ''.
Insert image description here
At this time type becomes ref.
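A sketch of the two forms (is_indentor being VARCHAR, as above; the table name is assumed):

```sql
-- Bad: the number forces an implicit cast, so the index cannot be used
SELECT * FROM user_info WHERE is_indentor = 1;
-- Good: quoting the value lets the index be used (type becomes ref)
SELECT * FROM user_info WHERE is_indentor = '1';
```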

5) It does not comply with the leftmost matching principle
Let's adjust the index first and put rank_indentor_id after nickname,
Insert image description here
and then query with rank_indentor_id as the condition; you can see that a full table scan is used.

Insert image description here
Solution: Adjust the index order.
Insert image description here
Let’s change it like this and look at the execution results.
Insert image description here
You can see that the index is now used.

6) It is recommended to add NOT NULL constraints to index fields

This is a suggestion to add NOT NULL constraints to indexed fields. The reasoning: a single-column index cannot store NULL values, a composite index cannot store rows whose indexed columns are all NULL, and when using IS NULL the index cannot be used.
The official documentation also describes this clearly. Roughly: declaring columns NOT NULL makes SQL execute faster and saves the storage of the extra NULL bit as well as the cost of NULL checks.
https://dev.mysql.com/doc/refman/8.0/en/data-size.html
Insert image description here
Therefore, when your business has no need to store NULL values, it is recommended to declare all fields NOT NULL and give them default values.
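For example, applied to one column (table and column names assumed):

```sql
-- Declare the column NOT NULL and give it a default value
ALTER TABLE user_info
    MODIFY nickname VARCHAR(45) NOT NULL DEFAULT '';
```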

7) Implicit conversion causes index failure

Because there is currently no such table, it is impossible to demonstrate. You can experiment by yourself.

Solution: Try to be more standardized when creating indexes, such as using int or bigint uniformly.

Index tuning tips

This part of the content mainly shares some practical index tuning techniques. mainly include:

  • Index tuning for long fields
  • Tips for using composite indexes
  • covering index
  • Sorting optimization
  • Redundant duplicate index optimization
Long field index tuning

In real projects, we may need to index very long string fields.
If the index is very long, it takes up a lot of space, and an index on such a huge field does not query efficiently. So how can we optimize it?

The first method is to create an additional field that stores a value representing the long field, such as its hash code.
This additional field should meet the following requirements:

  1. The field length should be relatively small, SHA1/MD5 is inappropriate.
  2. Hash collisions should be avoided as much as possible. Popular choices are CRC32() or FNV64()

For example, if nickname in the user_info table holds large values, we can create nickname_hash as the indexed field. The corresponding SQL becomes:

select * from user_info 
where nickname_hash = CRC32('Ant')
	and nickname = 'Ant'

This way you can put the index directly on the nickname_hash field. The reason the query still carries the condition nickname = 'Ant' is so that the SQL returns the correct result even when there is a hash collision.
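One possible way to set up the nickname_hash column (the exact DDL is not given in the text, so this is an assumption; the hash could equally be maintained in application code or by triggers):

```sql
ALTER TABLE user_info
    ADD COLUMN nickname_hash INT UNSIGNED NOT NULL DEFAULT 0,
    ADD KEY idx_nickname_hash (nickname_hash);

-- Backfill existing rows; new writes must keep the hash in sync
UPDATE user_info SET nickname_hash = CRC32(nickname);
```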

However, this optimization is powerless for fuzzy queries: since we hash the complete value of nickname, like queries cannot be optimized this way.
What if we still want the index to be smaller?

MySQL supports the prefix index.
The syntax for creating a prefix index is as follows:

alter table user_info add key (nickname(5));

This means taking the first 5 characters of nickname as the index. But which number is most appropriate? We want it as small as possible, to save space and improve performance, while also keeping the selectivity of the index high enough.

Here, I would like to introduce to you the index selectivity formula:
Index selectivity = number of distinct index values / total number of records in the table
The larger the result, the higher the selectivity and the better the performance.

We can play like this:
Insert image description here
The value obtained this way is the selectivity of the complete column, i.e. the maximum selectivity this field can reach.

Then we do this:
Select a test value of 3, and the result is 0.5737
Insert image description here

Try 5: the computed value is 0.6669
Insert image description here
Try 6: the computed value is 0.6787
We can see that from 5 onward the value is already close to the maximum, and increasing the length further changes little. So, after testing, 5 is the reasonable choice.
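The selectivity checks from the screenshots, written out (table and column names assumed; the numbers are those quoted above):

```sql
SELECT COUNT(DISTINCT nickname) / COUNT(*) FROM user_info;          -- full-column (maximum) selectivity
SELECT COUNT(DISTINCT LEFT(nickname, 3)) / COUNT(*) FROM user_info; -- 0.5737
SELECT COUNT(DISTINCT LEFT(nickname, 5)) / COUNT(*) FROM user_info; -- 0.6669
SELECT COUNT(DISTINCT LEFT(nickname, 6)) / COUNT(*) FROM user_info; -- 0.6787
```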

After this analysis, we find that a prefix index makes the table more efficient while staying transparent to the application: no application changes are needed and the cost is low, so it is an easy optimization to adopt. But it also has limitations: it cannot be used for order by or group by, and it cannot serve as a covering index.

Single column index vs combined index

Let's discuss the differences in query execution between single-column indexes and combined indexes.

//todo is being updated...

Origin blog.csdn.net/qq_45455361/article/details/121021997#comments_24617409