MySQL Basics (28) Index Optimization and Query Optimization

What dimensions can be used for database tuning? In short:

  • Indexes failing or not fully used - create or fix indexes
  • Too many JOINs in associated queries (design defects or unavoidable requirements) - SQL optimization
  • Server tuning and parameter settings (buffers, thread count, etc.) - adjust my.cnf
  • Too much data - split databases and tables

Knowledge points about database tuning are very scattered. Different DBMS, different companies, different positions, and different projects encounter different problems. Here we divide it into three chapters to explain in detail.

Although there are many techniques for SQL query optimization, they can be divided into two major parts: physical query optimization and logical query optimization.

  • Physical query optimization works through techniques such as indexes and table join methods; the key point is mastering the use of indexes.
  • Logical query optimization improves query efficiency through equivalent transformations of the SQL. Put bluntly, rewriting the query differently may lead to higher execution efficiency.
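As a small taste of logical optimization, here is a hedged sketch of an equivalent transformation using this article's student and class tables (the address value is made up); both statements return the same rows because class.id is unique, but the join form often executes more efficiently:

# Subquery form
SELECT * FROM student WHERE classId IN (SELECT id FROM class WHERE address = 'xxx');
# Equivalent join form
SELECT s.* FROM student s JOIN class c ON s.classId = c.id WHERE c.address = 'xxx';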

1. Data preparation

Insert 500,000 rows into the student table and 10,000 rows into the class table.

Step 1: Create table

CREATE TABLE `class` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`className` VARCHAR(30) DEFAULT NULL,
`address` VARCHAR(40) DEFAULT NULL,
`monitor` INT NULL ,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE `student` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`stuno` INT NOT NULL ,
`name` VARCHAR(20) DEFAULT NULL,
`age` INT(3) DEFAULT NULL,
`classId` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`)
#CONSTRAINT `fk_class_id` FOREIGN KEY (`classId`) REFERENCES `t_class` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Step 2: Set parameters
Turn on the setting that allows function creation:

set global log_bin_trust_function_creators=1; # Without GLOBAL, the setting only applies to the current session.

Step 3: Create functions
to guarantee that each generated row differs

# Randomly generates a string of length n
DELIMITER //
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
DECLARE chars_str VARCHAR(100) DEFAULT
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE return_str VARCHAR(255) DEFAULT '';
DECLARE i INT DEFAULT 0;
WHILE i < n DO
SET return_str =CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
SET i = i + 1;
END WHILE;
RETURN return_str;
END //
DELIMITER ;
# To drop it:
# DROP FUNCTION rand_string;

Randomly generate class numbers

# Generates a random number between from_num and to_num
DELIMITER //
CREATE FUNCTION rand_num (from_num INT ,to_num INT) RETURNS INT(11)
BEGIN
DECLARE i INT DEFAULT 0;
SET i = FLOOR(from_num +RAND()*(to_num - from_num+1)) ;
RETURN i;
END //
DELIMITER ;

# To drop it:
# DROP FUNCTION rand_num;

Step 4: Create stored procedure

# Stored procedure that inserts data into the student table
DELIMITER //
CREATE PROCEDURE insert_stu( START INT , max_num INT )
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0; # commit the transaction manually
REPEAT # loop
SET i = i + 1; # increment the counter
INSERT INTO student (stuno, name ,age ,classId ) VALUES
((START+i),rand_string(6),rand_num(1,50),rand_num(1,1000));
UNTIL i = max_num
END REPEAT;
COMMIT; # commit the transaction
END //
DELIMITER ;
# To drop it:
# DROP PROCEDURE insert_stu;

Create a stored procedure to insert data into the class table

# Stored procedure that inserts random data into the class table
DELIMITER //
CREATE PROCEDURE `insert_class`( max_num INT )
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO class ( classname,address,monitor ) VALUES
(rand_string(8),rand_string(10),rand_num(1,100000));
UNTIL i = max_num
END REPEAT;
COMMIT;
END //
DELIMITER ;
# To drop it:
# DROP PROCEDURE insert_class;

Step 5: Call the stored procedure

# Call the stored procedure: add 10,000 rows to the class table
CALL insert_class(10000);

# Call the stored procedure: add 500,000 rows to the student table (the second call adds 1,000,000 more)
CALL insert_stu(100000,500000);
CALL insert_stu(600000,1000000);

Step 6: Delete a table's indexes
Create a stored procedure for it:

DELIMITER //
CREATE PROCEDURE `proc_drop_index`(dbname VARCHAR(200),tablename VARCHAR(200))
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE ct INT DEFAULT 0;
    DECLARE _index VARCHAR(200) DEFAULT '';
    DECLARE _cur CURSOR FOR SELECT index_name FROM
    information_schema.STATISTICS WHERE table_schema=dbname AND table_name=tablename AND
    seq_in_index=1 AND index_name <>'PRIMARY' ;
    # Each cursor must be paired with its own DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=... to signal the end of the cursor
    DECLARE CONTINUE HANDLER FOR NOT FOUND set done=2 ;
    # If no more rows are returned, the program continues and the variable done is set to 2
    OPEN _cur;
    FETCH _cur INTO _index;
    WHILE _index<>'' DO
        SET @str = CONCAT("drop index " , _index , " on " , tablename );
        PREPARE sql_str FROM @str ;
        EXECUTE sql_str;
        DEALLOCATE PREPARE sql_str;
        SET _index='';
        FETCH _cur INTO _index;
    END WHILE;
    CLOSE _cur;
END //
DELIMITER ;

Execute the stored procedure (passing the database name and table name):

CALL proc_drop_index("dbname","tablename");

2. Index failure cases

One of the most effective ways to improve performance in MySQL is to design appropriate indexes for tables. Indexes provide efficient access to data and speed up queries, so they have a crucial impact on query speed.

  • Using an index lets MySQL quickly locate records in a table, increasing query speed and improving database performance.
  • If a query does not use an index, it scans every record in the table, which is very slow for large data volumes.

By default, most indexes are built as B+ trees. Only spatial column types use R-trees, and MEMORY tables also support hash indexes.

Ultimately, the optimizer decides whether an index is used. What does the optimizer base this on? Cost (cost-based optimizer), not rules (rule-based optimizer) and not semantics: whichever plan costs less wins. Moreover, whether a SQL statement uses an index also depends on the database version, the data volume, and the data selectivity.

Note: cost is an estimate of resource consumption, not elapsed time.

2.1 Full-value matching (my favorite)

Full-value matching means the query's conditions match every column of a composite index, so the whole index takes effect at once.

The SQL statements that often appear in the system are as follows:

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30;
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30 and classId=4;
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30 and classId=4 AND name = 'abcd';

Execute before indexing: (pay attention to execution time)

mysql> SELECT SQL_NO_CACHE * FROM student WHERE age=30 and classId=4 AND name = 'abcd' ;
Empty set, 1 warning (0.28 sec)

Create the indexes:

CREATE INDEX idx_age ON student(age);

CREATE INDEX idx_age_classid ON student(age, classId);

CREATE INDEX idx_age_classid_name ON student(age, classId, name);

After creating the index, execute:

mysql> SELECT SQL_NO_CACHE * FROM student WHERE age=30 and classId=4 AND name = 'abcd';
Empty set, 1 warning (0.01 sec)

It can be seen that the query time before creating the index is 0.28 seconds, and the query time after creating the index is 0.01 seconds. The index helps us greatly improve the query efficiency.

2.2 Best left prefix rule

When MySQL builds a joint index, it will follow the best left prefix matching principle, that is, leftmost priority. When retrieving data, matching starts from the leftmost of the joint index.

Example 1:

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.name = 'abcd';
# Uses idx_age_classid_name; Extra shows Using index condition

Example 2:

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.classid=1 AND student.name = 'abcd' ;

# No index matched: the leftmost column age is missing.

Example 3: Can the index idx_age_classid_name still be used normally?

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE classid=4 and student.age=30 AND student.name = 'abcd' ;

If multiple columns are indexed, the leftmost prefix rule must be followed: the query must start from the leftmost column of the index and must not skip columns. In Example 3 the conditions appear in a different order in the WHERE clause, but the optimizer reorders equality conditions, so the full index is still used.

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.name ='abcd';

**Conclusion:** MySQL can create indexes on multiple fields; one index can include up to 16 fields. For a multi-column index, the filtering conditions must follow the order in which the index was created; once a column is skipped, the columns after it in the index cannot be used. If the query conditions do not use the first of these fields at all, the multi-column (composite) index is not used.

Expansion: Alibaba "Java Development Manual"

The index file has the leftmost prefix matching feature of B-Tree. If the value on the left is undetermined, this index cannot be used.

2.3 Primary key insertion order

For a table using the InnoDB storage engine, the data is stored in the leaf nodes of the clustered index. Records live in data pages, and both the pages and the records within them are ordered by primary key value from small to large. If the primary key values of inserted records increase sequentially, a data page is filled up and then the next page is started. Things get messier when the inserted primary key values jump up and down. Suppose a data page is full and stores primary key values between 1 and 100.

If a record with primary key value 9 must be inserted now, it belongs in that full page. Since the page has no room, the page must be split (页面分裂): a new page is allocated and some records are moved into it. What do page splits and record moves mean? Performance loss! To avoid such unnecessary cost as much as possible, it is best for inserted records to have sequentially increasing primary key values.

So the suggestion: give the primary key the AUTO_INCREMENT attribute and let the storage engine generate primary key values itself instead of inserting them manually. For example, the person_info table:

CREATE TABLE person_info(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    birthday DATE NOT NULL,
    phone_number CHAR(11) NOT NULL,
    country varchar(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name_birthday_phone_number (name(10), birthday, phone_number)
);

Our custom primary key column id has the AUTO_INCREMENT attribute, so the storage engine automatically fills in an incremented primary key value for each inserted record.
Such primary keys take little space, are written sequentially, and reduce page splits.

2.4 Index failure caused by calculation, function, type conversion (automatic or manual)

1. Which of these two SQL statements is written better?

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name LIKE 'abc%';

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE LEFT(student.name,3) = 'abc';
# This index fails, because a function is applied to the column.

2. Create index

CREATE INDEX idx_sno ON student (stuno) ;

3. First case: the index takes effect

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name LIKE 'abc%';
mysql> SELECT SQL_NO_CACHE * FROM student WHERE student.name LIKE 'abc%';
+---------+---------+--------+------+---------+
| id | stuno | name | age | classId |
+---------+---------+--------+------+---------+
| 5301379 | 1233401 | AbCHEa | 164 | 259 |
| 7170042 | 3102064 | ABcHeB | 199 | 161 |
| 1901614 | 1833636 | ABcHeC | 226 | 275 |
| 5195021 | 1127043 | abchEC | 486 | 72 |
| 4047089 | 3810031 | AbCHFd | 268 | 210 |
| 4917074 | 849096 | ABcHfD | 264 | 442 |
| 1540859 | 141979 | abchFF | 119 | 140 |
| 5121801 | 1053823 | AbCHFg | 412 | 327 |
| 2441254 | 2373276 | abchFJ | 170 | 362 |
| 7039146 | 2971168 | ABcHgI | 502 | 465 |
| 1636826 | 1580286 | ABcHgK | 71 | 262 |
| 374344 | 474345 | abchHL | 367 | 212 |
| 1596534 | 169191 | AbCHHl | 102 | 146 |
...
| 5266837 | 1198859 | abclXe | 292 | 298 |
| 8126968 | 4058990 | aBClxE | 316 | 150 |
| 4298305 | 399962 | AbCLXF | 72 | 423 |
| 5813628 | 1745650 | aBClxF | 356 | 323 |
| 6980448 | 2912470 | AbCLXF | 107 | 78 |
| 7881979 | 3814001 | AbCLXF | 89 | 497 |
| 4955576 | 887598 | ABcLxg | 121 | 385 |
| 3653460 | 3585482 | AbCLXJ | 130 | 174 |
| 1231990 | 1283439 | AbCLYH | 189 | 429 |
| 6110615 | 2042637 | ABcLyh | 157 | 40 |
+---------+---------+--------+------+---------+
401 rows in set, 1 warning (0.01 sec)

Second case: the index fails

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE LEFT(student.name,3) = 'abc';
mysql> SELECT SQL_NO_CACHE * FROM student WHERE LEFT(student.name,3) = 'abc';
+---------+---------+--------+------+---------+
| id | stuno | name | age | classId |
+---------+---------+--------+------+---------+
| 5301379 | 1233401 | AbCHEa | 164 | 259 |
| 7170042 | 3102064 | ABcHeB | 199 | 161 |
| 1901614 | 1833636 | ABcHeC | 226 | 275 |
| 5195021 | 1127043 | abchEC | 486 | 72 |
| 4047089 | 3810031 | AbCHFd | 268 | 210 |
| 4917074 | 849096 | ABcHfD | 264 | 442 |
| 1540859 | 141979 | abchFF | 119 | 140 |
| 5121801 | 1053823 | AbCHFg | 412 | 327 |
| 2441254 | 2373276 | abchFJ | 170 | 362 |
| 7039146 | 2971168 | ABcHgI | 502 | 465 |
| 1636826 | 1580286 | ABcHgK | 71 | 262 |
| 374344 | 474345 | abchHL | 367 | 212 |
| 1596534 | 169191 | AbCHHl | 102 | 146 |
...
| 5266837 | 1198859 | abclXe | 292 | 298 |
| 8126968 | 4058990 | aBClxE | 316 | 150 |
| 4298305 | 399962 | AbCLXF | 72 | 423 |
| 5813628 | 1745650 | aBClxF | 356 | 323 |
| 6980448 | 2912470 | AbCLXF | 107 | 78 |
| 7881979 | 3814001 | AbCLXF | 89 | 497 |
| 4955576 | 887598 | ABcLxg | 121 | 385 |
| 3653460 | 3585482 | AbCLXJ | 130 | 174 |
| 1231990 | 1283439 | AbCLYH | 189 | 429 |
| 6110615 | 2042637 | ABcLyh | 157 | 40 |
+---------+---------+--------+------+---------+
401 rows in set, 1 warning (3.62 sec)

The type is "ALL", which means that no index is used, the query time is 3.62 seconds, and the query efficiency is much lower than before.

Another example:

  • An index is set on the field stuno of the student table.

    CREATE INDEX idx_sno ON student(stuno);
    
    EXPLAIN SELECT SQL_NO_CACHE id, stuno, NAME FROM student WHERE stuno+1 = 900001;
# The calculation causes the index to fail
    

  • Result: the type is ALL, because the calculation causes the index to fail.

  • Index optimization takes effect (no calculations are made):

    EXPLAIN SELECT SQL_NO_CACHE id, stuno, NAME FROM student WHERE stuno = 900000;
    


Another example:

  • An index is set on the field name of the student table

    CREATE INDEX idx_name ON student(name); # already executed above
    
  • Index failure:

    EXPLAIN SELECT id,stuno,name FROM student WHERE SUBSTRING( name,1,3)='abc';
    ## The function causes the index to fail; use LIKE 'abc%' instead
    

2.5 Type conversion causes index failure

Which of the following SQL statements can use the index? (Assume an index exists on the name field.)

# Index not used
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE name=123;
# name=123 triggers an implicit type conversion, so the index fails

# Index used
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE name='123';

2.6 Index columns to the right of a range condition are invalid

ALTER TABLE student DROP INDEX idx_name;
ALTER TABLE student DROP INDEX idx_age;
ALTER TABLE student DROP INDEX idx_age_classid;

show index from student;

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.classId>20 AND student.name = 'abc' ;

Because a range search is used on classId, the index columns to the right of the range column become invalid (name cannot use the index).

Tip: to limit index failure caused by range conditions, consider placing the columns filtered by fixed (equality) values first.

For example, for the query above:

create index idx_age_name_cid on student(age, name, classId);

Here name is placed before the range-searched classId, so the index takes effect for all three conditions.
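A quick sanity check, reusing the query above (a sketch; the exact plan depends on your data and version):

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.classId>20 AND student.name = 'abc';
# With idx_age_name_cid in place, key_len should show that age, name, and classId are all used.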

What counts as a range?

  1. Greater than or equal to, greater than, less than or equal to, less than
  2. BETWEEN

Range queries are common in application development, e.g. amount ranges and date ranges. Consider placing such columns last when creating a composite index.

2.7 Not-equal (!= or <>) invalidates the index

  • Create an index for the name field

    CREATE INDEX idx_name ON student(NAME);
    
  • Check whether the index is invalid

    EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name !='abc';
    

Unfortunately, an index can only locate values you already know; a not-equal condition forces examining everything else, so the optimizer falls back to a full table scan.

2.8 IS NULL can use the index, but IS NOT NULL cannot

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age IS NULL;


# IS NOT NULL: the index fails
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age IS NOT NULL;


Conclusion: when designing tables, it is best to give fields a NOT NULL constraint with a default value. For example, default an INT field to 0 and a character field to the empty string.

Extension: similarly, NOT LIKE cannot use the index and results in a full table scan.

2.9 LIKE starting with the wildcard % invalidates the index

In a query using the LIKE keyword, if the match string starts with "%", the index does not work. The index works only when "%" is not the first character.
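A minimal sketch, assuming the idx_name index on student(name) from section 2.7:

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE name LIKE '%abc'; # leading %: type=ALL, the index fails
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE name LIKE 'abc%'; # no leading %: type=range, the index works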

Expansion: Alibaba "Java Development Manual"
[Mandatory] Left-fuzzy or full-fuzzy page search is strictly prohibited. If necessary, please use a search engine to solve it.

2.10 An unindexed column on either side of OR invalidates the index

In the WHERE clause, if the condition column before OR is indexed but the condition column after OR is not, the index fails. In other words, the index is used only when the columns in both conditions around OR are indexed.

Because OR means either condition may hold, indexing only one of the condition columns is pointless: as long as one condition column is unindexed, a full table scan is needed anyway, so the indexed column's index goes unused too.
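A minimal sketch, assuming an index exists on age but not on classid (the values are made up):

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=10 OR classid=100; # one side unindexed: full table scan
# After also indexing classid, both sides are indexed and the optimizer
# can combine them (e.g. an index_merge plan):
CREATE INDEX idx_cid ON student(classid);
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=10 OR classid=100;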

2.11 Use utf8mb4 uniformly for database and table character sets

Using utf8mb4 uniformly (supported since version 5.5.3) gives better compatibility, and a unified character set avoids garbled text caused by character set conversions. When character sets differ, values must be converted before comparison, which causes index failure.
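A hedged example of unifying one table's character set (run per table; also verify that collations match on join columns):

ALTER TABLE student CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;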

2.12 Exercises and general advice

Exercise: suppose index(a,b,c).

General advice:

  • For single-column indexes, try to choose an index with better filtering for the current query.
  • When choosing a composite index, place the field with the best filtering in the current query as early as possible in the index column order.
  • When choosing a composite index, try to pick one that covers as many of the fields in the current query's WHERE clause as possible.
  • When choosing a composite index, if a field may appear in a range query, try to put that field at the end of the index column order.

In short, when writing SQL statements, try to avoid causing index failure.

3. Related query optimization

3.1 Data preparation

# category table
CREATE TABLE IF NOT EXISTS `type`(
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY ( `id` )
);

# book table
CREATE TABLE IF NOT EXISTS `book`(
	`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `card` INT(10) UNSIGNED NOT NULL,
	PRIMARY KEY (`bookid`)
);

# Add 20 records to the type table (execute the INSERT 20 times)
INSERT INTO type (card) VALUES (FLOOR(1 +(RAND() * 20)));

# Add 20 records to the book table (execute the INSERT 20 times)
INSERT INTO book(card) VALUES (FLOOR(1 +(RAND() * 20)));


3.2 Use left outer join

Let’s start the EXPLAIN analysis

EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;

Conclusion: type is ALL for both tables.
Add an index to optimize:

# Add an index
ALTER TABLE book ADD INDEX Y(card); # on the driven table; avoids a full table scan of it

EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;


You can see that the type in the second row has changed to ref and rows has dropped markedly, an obvious optimization. This is determined by the LEFT JOIN property: the LEFT JOIN condition determines how to search rows from the right table; all rows of the left table must be present anyway, so the right (driven) table is the key point and must be indexed.

If you can only index one side, index the driven table.

ALTER TABLE `type` ADD INDEX X (card); # on the driving table; its full table scan cannot be avoided

EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;

then:

DROP INDEX Y ON book;
EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;

With the driven table's index removed, the plan reverts to using the join buffer (Block Nested-Loop Join).

3.3 Use inner joins

Prerequisite: drop the previous indexes.

drop index X on type;
drop index Y on book; # (skip this step if the index was already dropped)

Switch to an inner join (MySQL chooses the driving table automatically):

EXPLAIN SELECT SQL_NO_CACHE * FROM type INNER JOIN book ON type.card=book.card;

Add index optimization:

ALTER TABLE book ADD INDEX Y (card);

EXPLAIN SELECT SQL_NO_CACHE * FROM type INNER JOIN book ON type.card=book.card;

# Add an index on type as well
ALTER TABLE type ADD INDEX X (card);
# Observe the execution plan
EXPLAIN SELECT SQL_NO_CACHE * FROM type INNER JOIN book ON type.card=book.card;

After adding the index on type here, the driving table and the driven table are still the same. After adding more data to type, the optimizer judges which table's data set is smaller and uses that one as the driving table.

Conclusions:

  • For inner joins, the optimizer decides the driving table: whichever table the optimizer estimates to cost less becomes the driving table.

  • If only one of the two tables has an index on the join column, the indexed table is used as the driven table.

    • Reason: the driving table must be fully scanned regardless, so it is better to scan the table that has no index.
  • When both are indexed, the table with more data is used as the driven table (small table drives large table).

    • Reason: the driving table is always fully read, while a large driven table can be probed quickly through its index.

3.4 Principle of join statement

The join method connects multiple tables; in essence, the tables' data is matched in loops. MySQL supports only one family of table-association algorithms: nested loops (Nested-Loop Join). If the associated tables hold a lot of data, the join takes a long time to execute. In MySQL 5.5 and later, MySQL optimizes nested execution by introducing the BNLJ algorithm.

1. Driver table and driven table

The driving table is the main (outer) table; the driven table is the secondary (inner) table.

  • For inner joins:

    SELECT * FROM A JOIN B ON ...
    

    Does A have to be the driving table? Not necessarily. The optimizer optimizes based on your query and decides which table to read first: the table read first is the driving table, and the other is the driven table. You can check with the EXPLAIN keyword.

    • For outer joins:

      SELECT * FROM A LEFT JOIN B ON ...
      # or
      SELECT * FROM B RIGHT JOIN A ON ...
      

      Usually, everyone thinks that A is the driving table and B is the driven table. But not necessarily. The test is as follows:

      CREATE TABLE a(f1 INT,f2 INT,INDEX(f1))ENGINE=INNODB;
      
      CREATE TABLE b(f1 INT,f2 INT)ENGINE=INNODB;
      
      INSERT INTO a VALUES(1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
      
      INSERT INTO b VALUES (3,3),(4,4),(5,5),(6,6),(7,7),(8,8);
      
      # Test 1
      EXPLAIN SELECT * FROM a LEFT JOIN b ON (a.f1=b.f1) WHERE (a.f2=b.f2);
      
      # Test 2
      EXPLAIN SELECT * FROM a LEFT JOIN b ON (a.f1=b.f1) AND (a.f2=b.f2);
      
      

      Test 1 result: surprisingly, a is not simply taken as the driving table. Following up with show warnings reveals why: the WHERE condition a.f2=b.f2 discards NULL-extended rows, so the optimizer rewrites the LEFT JOIN into an inner join and is then free to choose the driving table.
      Test 2 result: with the extra condition in the ON clause instead, the query remains a true left join and a is the driving table; show warnings \G confirms the rewritten statement.

2.Simple Nested-Loop Join (simple nested loop join)

The algorithm is quite simple: take one row from driving table A, traverse driven table B, and put every match into the result set, and so on; every record of driving table A is judged against the records of driven table B.

This example is a full table scan on both sides, with no index involved.

Clearly this method is very inefficient. With 100 rows in table A and 1,000 rows in table B, A*B = 100,000 comparisons. Of course MySQL does not join tables this crudely, so the next two optimizations of the Nested-Loop Join appeared.

3.Index Nested-Loop Join (Index Nested Loop Join)

The optimization idea of Index Nested-Loop Join is mainly to reduce the number of matches against the inner table, so it requires an index on the driven table. The driving table's match conditions probe the inner table's index directly, avoiding comparison with every record of the inner table and greatly reducing the number of inner-table matches.

Each record in the driving table is looked up through the driven table's index. Because the cost of an index lookup is fairly fixed, the MySQL optimizer prefers the table with fewer records as the driving (outer) table.

If the driven table's join column is indexed, efficiency is high; but if the index is a secondary index rather than the primary key, an extra back-to-table lookup is needed. A driven table joined on its primary key index is more efficient.

4.Block Nested-Loop Join (block nested loop join)

If an index exists, the index method is used to join. If the join column has no index, the driven table must be scanned too many times: on every pass, the driven table's records are loaded into memory, one record of the driving table is matched against them, and the memory is cleared; then the next driving-table record is read and the driven table is loaded again, over and over, which greatly increases the number of IOs. To reduce IO on the driven table, the Block Nested-Loop Join method appeared.

It no longer fetches driving-table rows one by one, but chunk by chunk: a join buffer (join buffer缓冲区) is introduced, caching part of the driving table's join-related columns (as much as the join buffer holds); then during one full scan of the driven table, each driven-table record is matched against all the driving-table records in the join buffer at once, in memory. This merges many comparisons of the simple nested loop into one pass and lowers the access frequency of the driven table.

Notice:

The join buffer caches not only the join columns but also the columns after SELECT (of the driving table).

In a SQL statement joining N tables, N-1 join buffers are allocated. So when querying, reduce unnecessary fields, letting the join buffer hold more rows.
parameter settings:

  • block_nested_loop

    Check the state of block_nested_loop via show variables like '%optimizer_switch%'. It is enabled by default.

  • join_buffer_size

    Whether the driving table can be loaded in one chunk depends on whether the join buffer can hold all of its data. By default, join_buffer_size=256k.

    mysql> show variables like '%join_buffer%';
    +------------------+--------+
    | Variable_name    | Value  |
    +------------------+--------+
    | join_buffer_size | 262144 |
    +------------------+--------+
    1 row in set (0.00 sec)
    

    The maximum join_buffer_size is 4GB on 32-bit systems; 64-bit operating systems can use a Join Buffer larger than 4GB (except 64-bit Windows, where larger values are truncated to 4GB with a warning).

5.Join summary

1. Overall efficiency comparison: INLJ > BNLJ > SNLJ

2. Always use a small result set to drive a large result set (the essence is to reduce the amount of data in the outer loop; "small" is measured by table rows × size per row).

# STRAIGHT_JOIN fixes the join order as "driving table STRAIGHT_JOIN driven table",
# so the optimizer cannot pick the driving table itself.
# The point here: t2 has more columns, so the same join buffer holds fewer t2 rows,
# which makes t2 a poor choice of driving table.
select t1.b,t2.* from t1 straight_join t2 on (t1.b=t2.b) where t2.id<=180; # recommended

select t1.b,t2.* from t2 straight_join t1 on (t1.b=t2.b) where t2.id<=100; # not recommended

3. Add indexes on the driven table's match columns (reduces the number of inner-table loop matches).

4. Increase the join buffer size (the more data cached at once, the fewer full scans of the inner table); see the sketch after this list.

5. Reduce unnecessary field queries on the driving table (the fewer the fields, the more rows the join buffer caches).

6. When deciding which table should be the driving table, filter both tables by their own conditions first; then compare the total data volume of the columns participating in the join. The table with the smaller volume is the "small table" and should be the driving table.
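A hedged sketch of point 4 (the value is illustrative; size it to your memory budget, since it applies per connection):

# Raise the join buffer for the current session only
SET SESSION join_buffer_size = 4 * 1024 * 1024; # 4MB
SHOW VARIABLES LIKE 'join_buffer_size';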

3.5 Summary

  • Ensure that the driven table's JOIN field has an index created.
  • Fields being joined must have exactly matching data types.
  • For LEFT JOIN, choose the small table as the driving table and the large table as the driven table, reducing the outer loop count.
  • For INNER JOIN, MySQL automatically chooses the table with the smaller result set as the driving table. Trust MySQL's optimization strategy.
  • If multiple tables can be joined directly, join them directly rather than using subqueries (reduces the number of queries).
  • Subqueries are not recommended: either split the subquery SQL and combine the results in application code across multiple queries, or replace the subquery with a JOIN.
  • Derived tables cannot have indexes.

3.6 Hash Join

BNLJ is abandoned starting from MySQL 8.0.20, because hash join was added in MySQL 8.0.18 and has been used by default since.


  • Nested Loop:
    Nested Loop is the better choice when the joined data subset is small.

  • Hash Join is the common way to join large data sets. The optimizer takes the smaller (relatively smaller) of the two tables, builds a hash table in memory on the join key, then scans the larger table and probes the hash table to find matching rows.

    • This method works well when the smaller table fits entirely in memory; the total cost is then the cost of accessing both tables once.

    • When the table is too large to fit completely in memory, the optimizer splits it into several partitions; parts that do not fit are written to temporary segments on disk. A large temporary segment is then needed to maximize I/O performance.

    • It performs well in environments with large unindexed tables and parallel queries, offering the best performance in such cases; many call it the workhorse of joins. Hash Join applies only to equi-joins (e.g. WHERE A.COL1=B.COL2), which is determined by the nature of hashing.

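A hedged check on MySQL 8.0.18+, reusing section 3's type and book tables (assuming no index on card):

EXPLAIN FORMAT=TREE SELECT * FROM type JOIN book ON type.card = book.card;
# The plan tree is expected to contain a line such as "Inner hash join".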

4. Subquery optimization

MySQL has supported subqueries since version 4.1. Subqueries let a SELECT statement nest another query: the result of one SELECT serves as a condition of another SELECT. A subquery can accomplish in one statement what would logically take many steps.

Subqueries are an important MySQL feature that let a single SQL statement express more complex queries. However, subqueries do not execute efficiently. Reasons:

① To execute a subquery, MySQL must build a temporary table for the inner query's result; the outer query then reads records from that temporary table, which is dropped afterwards. This consumes a lot of CPU and IO resources and produces many slow queries.

② The subquery's result set is stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers to some degree.

③ The larger the subquery's result set, the bigger the impact on query performance.

In MySQL, JOIN queries can replace subqueries. A join needs no temporary table and is faster than a subquery; if indexes are used in the query, performance is better still.

Example 1: Query the information of students who are monitors in the student table

  • Use subquery

    #Create an index on the monitor column of the class table
    CREATE INDEX idx_monitor ON class(monitor);
    EXPLAIN SELECT * FROM student stu1
    WHERE stu1.stuno IN (
        SELECT monitor
        FROM class c
        WHERE monitor IS NOT NULL);
    
  • Recommendation: Use multi-table query

    EXPLAIN SELECT stu1.* FROM student stu1 JOIN class c
    ON stu1.stuno = c.monitor
    WHERE c.monitor IS NOT NULL;
    

Example 2: query all students who are not class monitors. Not recommended:

  • subquery

    EXPLAIN SELECT SQL_NO_CACHE a.* FROM student a
    WHERE a.stuno NOT IN (
        SELECT monitor FROM class b WHERE monitor IS NOT NULL);
    
  • Modify to multi-table query

    EXPLAIN SELECT SQL_NO_CACHE a.*
    FROM student a LEFT OUTER JOIN class b ON a.stuno = b.monitor
    WHERE b.monitor IS NULL;
    

Conclusion: Try not to use NOT IN or NOT EXISTS, use LEFT JOIN Xxx ON xx WHERE xx IS NULL instead

5. Sorting optimization

5.1 Sorting optimization

Question: the WHERE condition fields get indexes, but why do ORDER BY fields also need them?

Answer:

MySQL supports two sorting methods: FileSort and Index sorting.

  • With Index sorting, the index guarantees the ordering of the data, so no extra sort is needed; this is more efficient.
  • FileSort sorting generally happens in memory and occupies more CPU; if the result set to sort is large, temporary file I/O on disk is used, which is inefficient.

Optimization suggestions:

  1. In SQL, indexes can be used in the WHERE clause and in the ORDER BY clause: in WHERE to avoid a full table scan, in ORDER BY to avoid FileSort sorting. Of course, in some cases a full table scan or a FileSort is not necessarily slower than using an index; but in general, avoid them to improve query efficiency.
  2. Try to complete the ORDER BY sort through an Index. If WHERE and ORDER BY use the same column, a single-column index suffices; if they use different columns, use a composite index.
  3. When an Index cannot be used, the FileSort method needs to be tuned.

5.2 Testing

Delete the created indexes in the student table and class table.

# Option 1:
DROP INDEX idx_monitor ON class;

DROP INDEX idx_cid ON student;
DROP INDEX idx_age ON student;
DROP INDEX idx_name ON student;
DROP INDEX idx_age_name_classid ON student;
DROP INDEX idx_age_classid_name ON student;

# Option 2:
call proc_drop_index('test', 'student');

Can the following queries use the index? Can Using filesort be eliminated?

Process one:

EXPLAIN SELECT SQL_NO_CACHE * FROM student ORDER BY age,classid;

EXPLAIN SELECT SQL_NO_CACHE * FROM student ORDER BY age,classid limit 10;


Process 2: ORDER BY without LIMIT; the index fails.

# Create the index
CREATE INDEX idx_age_classid_name ON student (age,classid, NAME);
# Without LIMIT, the index fails
EXPLAIN SELECT SQL_NO_CACHE * FROM student ORDER BY age, classid;

The optimizer here reckons that every row would still need a back-to-table lookup, which would take longer than scanning without the index.

Try a covering index instead: with no back-to-table lookup needed, the optimizer considers the index faster and uses it (Using index).

Add a LIMIT condition: the LIMIT reduces the number of back-to-table lookups, so the optimizer considers using the index faster and uses it.

Process 3: wrong column order in ORDER BY; the index fails.

CREATE INDEX idx_age_classid_stuno ON student (age,classid,stuno) ;

# Which of the following fail to use the index?

# Not used: violates the leftmost prefix rule
EXPLAIN SELECT * FROM student ORDER BY classid LIMIT 10;

# Not used: violates the leftmost prefix rule
EXPLAIN SELECT * FROM student ORDER BY classid, NAME LIMIT 10;

# Used
EXPLAIN SELECT * FROM student ORDER BY age, classid, stuno LIMIT 10;
# Used
EXPLAIN SELECT * FROM student ORDER BY age, classid LIMIT 10;
# Used
EXPLAIN SELECT * FROM student ORDER BY age LIMIT 10;

Process 4: inconsistent directions in ORDER BY invalidate the index (wrong order: no index; mixed directions: no index).

# age DESC reverses direction relative to classid ASC: the index fails
EXPLAIN SELECT * FROM student ORDER BY age DESC, classid ASC LIMIT 10;

# No leftmost prefix: the index fails
EXPLAIN SELECT * FROM student ORDER BY classid DESC, NAME DESC LIMIT 10;

# age ASC but classid DESC: directions differ; the optimizer decides a filesort
# is faster, and the index fails
EXPLAIN SELECT * FROM student ORDER BY age ASC, classid DESC LIMIT 10;

# Backward index scan: the index is used, traversed in reverse
EXPLAIN SELECT * FROM student ORDER BY age DESC, classid DESC LIMIT 10;

Process 5: No filtering, no indexing

EXPLAIN SELECT * FROM student WHERE age=45 ORDER BY classid;

EXPLAIN SELECT * FROM student WHERE age=45 ORDER BY classid , name;


EXPLAIN SELECT * FROM student WHERE classid=45 order by age;

EXPLAIN SELECT * FROM student WHERE classid=45 order by age limit 10;

The first query's sort here is Using filesort, which is easy to understand.

Why isn't the second one Using filesort?

Here type = index and key = idx_age_classid_name: the optimizer plans a complete traversal of the idx_age_classid_name index. The index itself is stored in ascending age order, so as soon as the first ten rows with classid=45 are encountered during the traversal, the scan can stop and fetch the data from the table.

Summary:

INDEX a_b_c(a, b, c)

# ORDER BY can use the leftmost prefix of the index
- ORDER BY a
- ORDER BY a, b
- ORDER BY a, b, c
- ORDER BY a DESC, b DESC, c DESC


# If the WHERE clause fixes the leftmost prefix of the index to constants, ORDER BY can use the index
- WHERE a = const ORDER BY b, c
- WHERE a = const AND b = const ORDER BY c
- WHERE a = const AND b > const ORDER BY b, c

# Cannot use the index for sorting
- ORDER BY a ASC, b DESC, c DESC /* inconsistent sort directions */
- WHERE g = const ORDER BY b, c /* leading column a missing */
- WHERE a = const ORDER BY c /* column b skipped */
- WHERE a = const ORDER BY a, d /* d is not part of the index */
- WHERE a in (...) ORDER BY b, c /* for sorting, multiple equality values act like a range */

Only one index is used per table access; there is no way to use one index for WHERE and another for ORDER BY.

But a composite index can serve both.

5.3 Case practice

In the ORDER BY clause, try to use Index sorting and avoid using FileSort sorting.

Before executing the case, clear the index on the student, leaving only the primary key:

DROP INDEX idx_age ON student;
DROP INDEX idx_age_classid_stuno ON student;
DROP INDEX idx_age_classid_name ON student;
# or
call proc_drop_index('test', 'student');

show index from student;

Scenario: Query students who are 30 years old and whose student number is less than 101000, sorted by user name

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME;


mysql>  SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME;
+-----+--------+--------+------+---------+
| id  | stuno  | name   | age  | classId |
+-----+--------+--------+------+---------+
| 417 | 100417 | bBAYtX |   30 |     159 |

....

| 372 | 100372 | xwODCc |   30 |     764 |
+-----+--------+--------+------+---------+
18 rows in set, 1 warning (0.17 sec)

Conclusion: type is ALL, the worst case; Extra shows Using filesort, also the worst case. Optimization is a must.

Optimization ideas:

Option 1: In order to remove filesort, we can build the index

# Create a new index
CREATE INDEX idx_age_name ON student(age , NAME);

EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME;

Option 2: try to have both the WHERE filter and the sort use the index

create index idx_age_stuno_name on student(age,stuno,name);
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME;

Although the second option still shows Using filesort, it is actually faster.

reason:

All sorting happens after condition filtering. So if the condition filters out most of the data, sorting the remaining few hundred or few thousand rows is not very costly, and even index-optimized sorting would bring limited gains. By contrast, for the condition stuno<101000, not using an index means scanning tens of thousands of rows, which is very expensive; so putting the index on this field is the most cost-effective, optimal choice.

Conclusions:
1. When two indexes exist simultaneously, MySQL automatically picks the better plan (for this example, MySQL picks idx_age_stuno_name). But as the data volume changes, the chosen index may change too.

2. When choosing between a [range condition] field and a [GROUP BY or ORDER BY] field, first observe how many rows the condition field filters out.
If it filters heavily and leaves little data to sort, prefer indexing the range field; and vice versa.

Thinking: would the following index also work here?

DROP INDEX idx_age_stuno_name ON student;

# Certainly: even the three-column index effectively used only its first two columns here.
CREATE INDEX idx_age_stuno ON student(age, stuno);

5.4 filesort algorithm: two-way sorting and one-way sorting

If the sort field is not on an index column, filesort has two algorithms: two-pass (two-way) sort and single-pass (one-way) sort.

Two-pass sort (slow)

  • Before MySQL 4.1, two-pass sort was used: literally, scan the disk twice to finally get the data. First read the row pointer and the ORDER BY column and sort them; then scan the sorted list and re-read the corresponding row data for output according to the values in the list.
  • In other words: fetch the sort field from disk, sort in the buffer, then fetch the other fields from disk.

Fetching one batch of data requires two disk scans. As everyone knows, IO is very time-consuming, so after MySQL 4.1 a second, improved algorithm appeared: single-pass sort.

Single-pass sort (fast)

Read all the columns needed by the query from disk, sort them in the sort buffer (sort_buffer) by the ORDER BY column, then output the sorted result directly. It is more efficient, avoiding a second read of the data, and turns random IO into sequential IO; but it uses more space, because it keeps entire rows in memory.

Conclusions and raised questions

  • Since single-pass came later, it is generally better than two-pass.
  • But single-pass has a problem:
    • In sort_buffer, single-pass takes much more space than two-pass, because it holds all the fields. The total size of the fetched data may exceed sort_buffer's capacity, so only sort_buffer-sized chunks can be sorted at a time (creating tmp files and doing a multi-way merge); after sorting one chunk, the next sort_buffer-full is fetched and sorted, and so on, causing multiple I/Os.
    • Single-pass was meant to save one I/O pass, but can instead cause a large number of I/O operations: the loss outweighs the gain.

Optimization Strategy

1. Try to increase sort_buffer_size

  • Whichever algorithm is used, raising this parameter improves efficiency. Raise it according to the system's capability, because the parameter is allocated per process (per connection); it can be adjusted between 1M and 8M. In MySQL 5.7, the InnoDB storage engine's default value is 1048576 bytes (1MB).

    mysql> SHOW VARIABLES LIKE '%sort_buffer_size%';
    +-------------------------+---------+
    | Variable_name           | Value   |
    +-------------------------+---------+
    | innodb_sort_buffer_size | 1048576 |
    | myisam_sort_buffer_size | 8388608 |
    | sort_buffer_size        | 262144  |
    +-------------------------+---------+
    3 rows in set (0.00 sec)
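
    A hedged per-session example (the value is illustrative):

    SET SESSION sort_buffer_size = 4 * 1024 * 1024; # 4MB, for this connection only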
    

2. Try to increase max_length_for_sort_data

  • Increasing this parameter will increase the probability of improving the algorithm.

    mysql> SHOW VARIABLES LIKE '%max_length_for_sort_data%';
    +--------------------------+-------+
    | Variable_name            | Value |
    +--------------------------+-------+
    | max_length_for_sort_data | 4096  |
    +--------------------------+-------+
    1 row in set (0.00 sec)
    
  • But if it is set too high, the probability that the total data volume exceeds sort_buffer_size increases; the obvious symptoms are high disk I/O activity and low processor usage. If the total length of the columns that need to be returned is greater than max_length_for_sort_data, the two-pass algorithm is used; otherwise the single-pass algorithm. It can be adjusted between 1024 and 8192 bytes.

**3. SELECT * is taboo with ORDER BY; query only the fields you need.** Reasons:

  • When the combined size of the queried fields is less than max_length_for_sort_data and the sort field is not a TEXT or BLOB type, the improved algorithm (single-pass sort) is used; otherwise the old algorithm (two-pass sort) is used.
  • With either algorithm the data may exceed sort_buffer_size; tmp files are then created for merge sorting, causing multiple I/Os. The risk is greater with the single-pass algorithm, so raise sort_buffer_size accordingly.

6.GROUP BY optimization

  • GROUP BY uses indexes under almost the same principles as ORDER BY; GROUP BY can use an index even if no filter condition uses it.
  • GROUP BY sorts first and then groups, following the best left prefix rule for index construction.
  • When index columns cannot be used, increase the max_length_for_sort_data and sort_buffer_size parameter settings.
  • WHERE is more efficient than HAVING; if a condition can be written in WHERE, do not write it in HAVING.
  • Reduce the use of ORDER BY; where the business allows, skip sorting, or move sorting into application code.
  • Statements such as ORDER BY, GROUP BY, and DISTINCT consume a lot of CPU, and the database's CPU resources are extremely precious.
  • For query statements containing ORDER BY, GROUP BY, or DISTINCT, keep the result set filtered by the WHERE conditions within 1,000 rows; otherwise the SQL will be very slow.
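A minimal sketch of the leftmost prefix rule for GROUP BY, assuming the idx_age_classid_name(age, classid, name) index from section 5.2 still exists (recreate it if it was dropped):

EXPLAIN SELECT SQL_NO_CACHE age, classid, COUNT(*) FROM student GROUP BY age, classid; # leftmost prefix: the index is usable
EXPLAIN SELECT SQL_NO_CACHE classid, COUNT(*) FROM student GROUP BY classid; # skips age: expect Using temporary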

7. Optimize paging queries

In general, paging queries can be sped up by creating covering indexes. A common and very painful case is LIMIT 2000000,10: MySQL must sort the first 2,000,010 records, return only records 2,000,001 to 2,000,010, and discard all the rest; the cost of this query-and-sort is very high.

EXPLAIN SELECT * FROM student LIMIT 2088800,10;

Optimization idea 1:
Complete sorting and paging operations on the index, and finally associate back to other column contents required by the original table query based on the primary key.

EXPLAIN SELECT * FROM student t, ( SELECT id FROM student ORDER BY id LIMIT 2000000,10) a WHERE t.id = a.id;

Optimization idea two (often unusable)

This solution is suitable for tables with auto-incrementing primary keys, and can convert Limit queries into queries at a certain location.

EXPLAIN SELECT * FROM student WHERE id > 2080880 LIMIT 10;

It is unreliable: in production, IDs may have been deleted, and real query conditions are rarely this simple.

8. Prioritize covering indexes

8.1 What is a covering index?

Understanding method one: an index is a way to find rows efficiently, but a database can also use an index to fetch a column's data, so it does not have to read the entire row. After all, the index's leaf nodes store the data they index; when the desired data can be obtained by reading the index itself, there is no need to read the rows. An index that contains the data satisfying the query result is called a covering index.

**Understanding method two:** a form of non-clustered composite index that includes all columns used in the query's SELECT, JOIN, and WHERE clauses (that is, the indexed fields are exactly the fields involved in the query conditions).

Simply put, the index columns + primary key contain all the columns queried between SELECT and FROM.

**Example 1:** what a covering index looks like (index columns + primary key).

#Drop the earlier index
DROP INDEX idx_age_stuno ON student;
CREATE INDEX idx_age_name ON student(age, NAME);

EXPLAIN SELECT * FROM student WHERE age <>20;


EXPLAIN SELECT id, age , NAME FROM student WHERE age <> 28;

The queries above all use the declared index. Not so in the following case: the select list adds a classid column, and the index is no longer used:

EXPLAIN SELECT id, age , NAME,classid FROM student WHERE age <> 28;

Example two:

EXPLAIN SELECT * FROM student WHERE NAME LIKE '%abc';


CREATE INDEX idx_age_name ON student (age , NAME);
EXPLAIN SELECT id, age, NAME FROM student WHERE NAME LIKE '%abc';


# The index no longer covers the query
EXPLAIN SELECT id, age, NAME, classid FROM student WHERE NAME LIKE '%abc';

With classid added to the select list, the index no longer covers the query, and the result is that the index is not used.

As mentioned before, not-equal conditions and left-fuzzy LIKE normally invalidate an index. Why is the index used here? Because the optimizer discovers that all the needed data already lives in the index: it can return the result by traversing the index directly, and traversing the index certainly touches less data than scanning the whole table, so there is less IO.

Everything is a matter of cost.

8.2 Pros and cons of covering indexes

Benefits:
1. Avoid secondary query of Innodb table index (table return)

InnoDB stores data in clustered index order. For InnoDB, a secondary index stores the row's primary key value in its leaf nodes. When querying through a secondary index, after the matching key value is found, a second lookup through the primary key is needed to fetch the data we actually want.

With a covering index, the needed data can be obtained from the secondary index's key values directly, avoiding the second lookup through the primary key and reducing IO operations, which improves query efficiency.

2. Can turn random IO into sequential IO to speed up query efficiency

Because a covering index is stored in key-value order, an IO-intensive range search needs far less IO than reading each row randomly from disk; using a covering index also turns what would be random disk reads during access into sequential reads of the index.

3. The amount of data in the index is smaller and more compact.

An index entry is necessarily smaller than the full row, which reduces IO.

Because covering indexes can reduce the number of tree searches and significantly improve query performance, using covering indexes is a common performance optimization method.

Disadvantages:

Maintaining index fields always has a cost. So there are trade-offs to consider when building redundant indexes to support covering queries; weighing them is the job of the business DBA or the business data architect.

9. How to add index to string

There is a teacher table, which is defined as follows:

create table teacher(
ID bigint unsigned primary key,
email varchar(64),
...
)engine=innodb;

The lecturer needs to log in using his email address, so a statement similar to this must appear in the business code:

mysql> select col1, col2 from teacher where email='xxx';  

If there is no index on the email field, this statement can only do a full table scan.

9.1 Prefix index

MySQL supports prefix indexes. By default, if you create an index without specifying a prefix length, the index contains the entire string.

mysql> alter table teacher add index index1(email);
# or
mysql> alter table teacher add index index2(email(6));

What are the differences in data structure and storage between these two definitions? With index1, each index entry stores the entire email string; with index2, each entry stores only the first 6 bytes, so the entries are smaller but more of them share the same key value.
If index1 is used (that is, the index structure of the entire email string), the execution sequence is as follows:

  1. Find the record whose index value is '[email protected]' from the index1 index tree, and obtain the value of ID2;
  2. Go to the primary key and find the row whose primary key value is ID2, judge that the value of email is correct, and add this row of records to the result set;
  3. Take the next record at the position just found in the index1 index tree, find that the email value no longer matches the condition, and end the loop.

During this process, you only need to retrieve data from the primary key index once, so the system thinks that only one row has been scanned.

If you are using index2 (i.e. email(6) index structure), the execution sequence is as follows:

  1. Find records whose index value matches 'zhangs' in the index2 index tree; the first one found gives ID1;
  2. Go to the primary key, find the row whose primary key value is ID1, judge that its email value is not the one being searched for, and discard the row;
  3. Take the next record at the position just found on index2, find that it is still 'zhangs', take its ID2, fetch the whole row from the primary key index, judge that this time the value matches, and add the row to the result set;
  4. Repeat the previous step until the value fetched from index2 is no longer 'zhangs'; the loop ends.

In other words, **a prefix index with a well-chosen length can save space without adding too much extra query cost.**
Discrimination (selectivity) was discussed earlier: the higher, the better, because higher selectivity means fewer duplicate key values.
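A hedged way to choose the prefix length for this section's teacher table: compare the selectivity of several prefix lengths against the full column.

SELECT
  COUNT(DISTINCT LEFT(email, 4)) / COUNT(*) AS sel4,
  COUNT(DISTINCT LEFT(email, 6)) / COUNT(*) AS sel6,
  COUNT(DISTINCT LEFT(email, 8)) / COUNT(*) AS sel8,
  COUNT(DISTINCT email) / COUNT(*) AS sel_full
FROM teacher;
# Choose the shortest prefix whose selectivity is close to sel_full.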

9.2 The impact of prefix index on covering index

Conclusion:

With a prefix index, the index entry no longer contains the complete column value, so MySQL must always go back to the row to verify the value; a prefix index therefore cannot act as a covering index to optimize query performance. This is another factor to consider when deciding whether to use a prefix index.
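A minimal illustration, assuming the index1/index2 definitions above:

SELECT id, email FROM teacher WHERE email = 'xxx';
# With index1(email): id and email are both in the index entry, so the query is
# covered and no row lookup is needed.
# With index2(email(6)): only the first 6 bytes are stored, so MySQL must read
# the full row to verify email; the covering-index optimization is lost.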

10. Index pushdown

10.1 Comparison before and after use

Index Condition Pushdown (ICP) is a new feature in MySQL 5.6. It is an optimization method that uses indexes to filter data at the storage engine layer.

  • Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server, which evaluates the remaining WHERE conditions to keep or discard each row.
  • With ICP enabled, if parts of the WHERE condition can be evaluated using only columns of the index, the MySQL server pushes those conditions down to the storage engine. The storage engine then filters using the index entries and reads a row from the table only when the condition is satisfied.
    • Benefit: ICP reduces how often the storage engine must access the base table and how often the MySQL server must ask the storage engine for rows.
    • However, the ICP speedup depends on what proportion of the data the ICP filtering eliminates inside the storage engine.

example:

Suppose key1 has an index, and the query also filters with a condition such as key1 LIKE '%a'.

The LIKE '%a' condition cannot drive an index lookup, but it can be evaluated inside the index itself: entries that fail it are filtered out before any back-to-table lookup, so far fewer rows are fetched from the table. Without index pushdown, all candidate rows would have to be fetched from the table first and filtered afterwards; those rows may sit on different pages, causing extra IO.

The condition keeps being checked against index entries until an entry no longer satisfies the range.

10.2 Turning on/off ICP

  • Index conditional pushdown is enabled by default. This can be controlled by setting system variables optimizer_switch:index_condition_pushdown

    #Turn on index condition pushdown
    SET optimizer_switch = 'index_condition_pushdown=on';
    #Turn off index condition pushdown
    SET optimizer_switch = 'index_condition_pushdown=off';
    
    
  • When index condition pushdown is used, the Extra column in the EXPLAIN output shows Using index condition.

10.3 ICP use case

Create table

CREATE TABLE `people` (
	`id` INT NOT NULL AUTO_INCREMENT,
	`zipcode` VARCHAR(20) COLLATE utf8_bin DEFAULT NULL,
	`firstname` VARCHAR(20) COLLATE utf8_bin DEFAULT NULL,
	`lastname` VARCHAR(20) COLLATE utf8_bin DEFAULT NULL,
	`address` VARCHAR(50) COLLATE utf8_bin DEFAULT NULL,
	PRIMARY KEY (`id`),
	KEY `zip_last_first` (`zipcode`, `lastname`, `firstname`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin;

Insert data

INSERT INTO `people` VALUES
('1', '000001', '三', '张', '北京市'),
('2', '000002', '四', '李', '南京市'),
('3', '000003', '五', '王', '上海市'),
('4', '000001', '六', '赵', '天津市');

The table defines the composite index zip_last_first (zipcode, lastname, firstname). If we know a person's zip code but are not sure of the exact last name, we can search like this:

SELECT * FROM people
WHERE zipcode = '000001'
AND lastname LIKE '%张%'
AND address LIKE '%北京市%';


Run EXPLAIN on this SQL: the Extra column shows Using index condition, indicating that index condition pushdown is used. It also shows Using where, meaning the statement contains conditions on non-index columns that must still be filtered at the server layer; here that is address LIKE '%北京市%', since address is not part of the index.
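A sketch of the corresponding EXPLAIN output (exact values depend on your data and MySQL version):

EXPLAIN SELECT * FROM people
WHERE zipcode = '000001' AND lastname LIKE '%张%' AND address LIKE '%北京市%';
# key:   zip_last_first
# Extra: Using index condition; Using where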

10.4 Performance comparison of turning ICP on and off

The stored procedure below exists mainly to insert a large number of rows with zipcode 000001, so that the query has plenty of rows to filter at the storage engine layer and the I/O saved by ICP becomes measurable (with too little data, the buffer pool caches every page and masks the effect).

DELIMITER //
CREATE PROCEDURE insert_people( max_num INT )
BEGIN
DECLARE i INT DEFAULT 0;
	SET autocommit = 0;
	REPEAT
	SET i = i + 1;
	INSERT INTO people ( zipcode, firstname, lastname, address ) VALUES ('000001','六','赵','天津市');

	UNTIL i = max_num
	END REPEAT;
	COMMIT;
END //
DELIMITER ;


Call the stored procedure:

call insert_people(1000000);

First, enable profiling:

# check the current profiling settings
mysql> show variables like 'profiling%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| profiling              | OFF   |
| profiling_history_size | 15    |
+------------------------+-------+
set profiling = 1;

Execute the SQL statement; index pushdown is on by default:

SELECT * FROM people WHERE zipcode= '000001' AND lastname LIKE '%张%';

Execute the same SQL again with index pushdown disabled via an optimizer hint:

SELECT /*+ NO_ICP(people) */ * FROM people WHERE zipcode='000001' AND lastname LIKE '%张%';

View all profiles generated by the current session:

show profiles;

The results are as follows (profile screenshots omitted). Comparing several runs, the query with ICP enabled is consistently faster; storing more data makes the effect even more obvious.
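To see where the time goes in each run, you can inspect an individual query's profile (replace the query IDs with the Query_ID values that SHOW PROFILES printed in your session):

SHOW PROFILE CPU, BLOCK IO FOR QUERY 2;  # the run with ICP
SHOW PROFILE CPU, BLOCK IO FOR QUERY 3;  # the run without ICP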

10.5 Scanning process before and after using ICP

Index scan without ICP:

  • Storage engine layer: fetches the complete rows for every index record that satisfies the index key condition and returns them to the server layer.
  • Server layer: applies the remaining WHERE conditions to the returned rows, one by one, until the last row.

Index scan with ICP:

  • Storage engine layer: first determines the range of index records that satisfy the index key condition, then applies the pushed-down index filter to the index entries themselves. Only records that pass the index filter are looked up in the table and returned as full rows to the server layer; records that fail it are discarded without any table access or transfer to the server.
  • Server layer: applies the remaining table filter conditions to the returned rows for final filtering.

Cost difference: without ICP, the storage layer returns many rows that the index filter would have eliminated; with ICP, those rows are dropped early, saving both the back-to-table lookup and the hand-off to the server layer. As noted above, the speedup depends on the proportion of rows ICP filters out inside the storage engine.

10.6 Conditions for using ICP

  1. ICP can be used when the table access type is range, ref, eq_ref, or ref_or_null.

  2. ICP can be used for InnoDB and MyISAM tables, including partitioned InnoDB and MyISAM tables.

  3. For InnoDB tables, ICP applies only to secondary indexes. The goal of ICP is to reduce the number of full-row reads and thereby reduce I/O operations.

  4. ICP is not used when the query is served by a covering index, because in that case ICP would not reduce I/O: a covering index needs no table lookup at all, while the whole point of ICP is to cut down on back-to-table reads, which presuppose that table lookups happen. (See the sketch after this list.)

  5. Conditions on correlated subqueries cannot use ICP.
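Condition 4 can be checked directly (a sketch using the people table above): once the select list is fully covered by the index, EXPLAIN no longer reports Using index condition:

# covered query: no table lookup happens, so there is nothing for ICP to save
EXPLAIN SELECT zipcode, lastname, firstname FROM people
WHERE zipcode = '000001' AND lastname LIKE '%张%';
# Extra: Using where; Using index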

11. Ordinary index vs unique index

From a performance perspective, should you choose a unique index or a normal index? What is the basis for selection?

Assume we have a table whose primary key column is ID. The table has a field k, with an index on k, and the values in field k are all distinct.

The table creation statement for this table is

mysql> create table test(
id int primary key,
k int not null,
name varchar(16),
index (k)
)engine=InnoDB;

The (ID, k) values of records R1~R5 in the table are (100,1), (200,2), (300,3), (500,5), and (600,6) respectively.
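To reproduce these rows (the name values are hypothetical placeholders; only id and k are specified above):

INSERT INTO test(id, k, name) VALUES
(100, 1, 'n1'), (200, 2, 'n2'), (300, 3, 'n3'), (500, 5, 'n4'), (600, 6, 'n5');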

11.1 Query process

Assume that the statement to execute the query is select id from test where k=5.

  • For the ordinary index, after the first record satisfying the condition (5,500) is found, the engine must read the next record and keep going until it hits the first record that no longer satisfies k=5.
  • For the unique index, since the index guarantees uniqueness, the search stops as soon as the first matching record is found.

So how big is the performance gap caused by this difference? The answer: negligible. InnoDB reads data page by page, so by the time the k=5 record is found, its data page is already in memory; the ordinary index's extra "read next record" is just one in-memory pointer move and one comparison.

11.2 Update process

In order to illustrate the impact of ordinary indexes and unique indexes on update statement performance, let's introduce the change buffer.

When a data page needs to be updated, and the page is in memory, it is updated directly. If the page is not in memory, then (as long as data consistency is not affected) InnoDB caches the update operation in the change buffer, so the page does not have to be read from disk first. When a later query needs to access the page, the page is read into memory and the change-buffer operations for that page are applied to it. This guarantees the correctness of the data logic.

Applying the change-buffer operations to the original data page to obtain the up-to-date result is called merge.
Besides being triggered by access to the page, merges are performed periodically by background threads, and also during a normal database shutdown.

If an update can first be recorded in the change buffer, disk reads are reduced and the statement executes noticeably faster. Moreover,
reading a page into memory occupies the buffer pool, so the change buffer also avoids tying up memory and improves memory utilization.
Updates to a unique index cannot use the change buffer; in practice only ordinary indexes can.

Suppose we want to insert a new record (4,400) into this table. What does InnoDB do? If the target page is in memory, a unique index checks that 4 does not already exist between 3 and 5 and then inserts, while an ordinary index simply inserts: the difference is tiny. If the target page is not in memory, a unique index must read the page from disk to perform the uniqueness check before inserting (an expensive random read), whereas an ordinary index merely records the update in the change buffer and finishes, greatly reducing random disk access.

11.3 Usage scenarios of change buffer

  1. How to choose between an ordinary index and a unique index? The two are equivalent in query performance; the main consideration is
    their impact on update performance. Therefore, it is recommended to prefer ordinary indexes where possible.

  2. In practice, the combination of ordinary indexes and the change buffer noticeably speeds up
    updates on tables with large amounts of data.

  3. If every update is immediately followed by a query of the same record, you should disable the change buffer (see the sketch after this list),
    since each merge would be triggered right away and buffering would only add maintenance cost. In other cases, the change buffer improves update performance.

  4. Since unique indexes cannot use the change buffer optimization, if the business can tolerate it, prefer
    non-unique indexes from a performance standpoint. But what if uniqueness "cannot be guaranteed by the business"?

    - First, business correctness comes first. The premise of this discussion is that the business code already guarantees that no duplicate
    values are written. If the business cannot guarantee this, or requires the database to enforce the constraint, then you have no choice but to create a unique index.
    In that case, the value of this section is to give you one more
    troubleshooting idea when you run into slow bulk inserts and a low memory hit rate.

    - Second, in some "archive database" scenarios you can consider ordinary indexes. For example, online data is kept for only half a year and historical data is moved to an archive. At that point the archived data is already guaranteed free of unique-key conflicts, so to improve archiving efficiency you can consider changing the table's unique indexes into ordinary indexes.
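As mentioned in point 3, the change buffer is controlled by real server variables; a minimal sketch of checking and disabling it:

SHOW VARIABLES LIKE 'innodb_change_buffer%';
# innodb_change_buffer_max_size: max size as a percentage of the buffer pool (default 25)
# innodb_change_buffering: all / none / inserts / deletes / changes / purges (default all)
SET GLOBAL innodb_change_buffering = 'none';  # disable the change buffer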

12. Other query optimization strategies

12.1 The difference between EXISTS and IN

Question:

I don't quite understand when EXISTS should be used and when IN should be used. Is the selection criterion whether the table's index can be used?

Answer:

Indexing is a prerequisite; the real criterion is the size of the tables. You can take "the small table drives the big table" (小表驱动大表) as the rule of thumb: that gives the highest efficiency.

For example:

SELECT * FROM A WHERE cc IN (SELECT cc FROM B)

SELECT * FROM A WHERE EXISTS (SELECT cc FROM B WHERE B.cc=A.cc)

When A is smaller than B, use EXISTS. EXISTS effectively runs an outer loop over A, with logic similar to:

for i in A
	for j in B
		if j.cc == i.cc then ...

When B is smaller than A, use IN, because its implementation logic is similar to:

for i in B
	for j in A
		if j.cc == i.cc then ...

In short, the smaller table drives the join: if table A is smaller, use EXISTS; if table B is smaller, use IN.

12.2 Efficiency of COUNT(*) vs COUNT(specific field)

Question: MySQL offers three ways to count the rows of a table: SELECT COUNT(*), SELECT COUNT(1), and SELECT COUNT(specific field). How do the three compare in query efficiency?

Answer:
Premise: Counting the non-NULL rows of a specific field is a different question entirely; comparing execution efficiency only makes sense when the results are the same.

Point 1: COUNT(*) and COUNT(1) both count all results, and there is essentially no difference between them (their execution times may differ slightly, but you can treat their efficiency as equal). If there is a WHERE clause, they count the rows matching the filter; if not, they count all rows in the table.

Point 2: With the MyISAM storage engine, counting a table's rows is only O(1): each MyISAM table keeps a row_count value in its metadata, and table-level locking guarantees its consistency.

With the InnoDB storage engine, because InnoDB supports transactions and uses row-level locks and MVCC, it cannot maintain a row_count variable the way MyISAM does; it has to scan the full table, looping and counting with O(n) complexity.

Point 3 (key point): In the InnoDB engine, if you use COUNT(specific field) to count rows, try to make it use a secondary index. The primary key's index is the clustered index, which carries the full row data and is obviously larger than a secondary (non-clustered) index. COUNT(*) and COUNT(1) do not need to read specific rows, only to count them, so the system automatically picks a smaller secondary index to do the counting.

If there are multiple secondary indexes, the one with the smaller key_len is used for the scan; only when no secondary index exists does the count fall back to the primary key index.
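You can verify the optimizer's index choice with EXPLAIN (a sketch against the student table from the data-preparation section; the key shown depends on which secondary indexes exist):

EXPLAIN SELECT COUNT(*) FROM student;
# if a secondary index exists, the key column shows that index instead of PRIMARY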

12.3 About SELECT *

In table queries, it is recommended to name the fields explicitly instead of using * as the select list, i.e. use SELECT <field list>. Reasons:

① During parsing, MySQL must query the data dictionary to expand "*" into the full list of column names, which costs extra resources and time.

② SELECT * cannot benefit from covering indexes. (A quick illustration follows.)
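A sketch using the test table from section 11 (exact Extra text varies by version):

EXPLAIN SELECT id, k FROM test WHERE k = 5;  # Extra: Using index  (covered by the index on k)
EXPLAIN SELECT *     FROM test WHERE k = 5;  # full rows must be fetched from the clustered index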

12.4 Impact of LIMIT 1 on optimization

This applies to SQL statements that would otherwise scan the whole table. If you are certain there is only one matching row, adding LIMIT 1
lets the scan stop as soon as the match is found, which speeds up the query. (See the sketch below.)

If the table already has a unique index on the filtered field, the query is answered through the index without a full table scan, and LIMIT 1 is unnecessary.
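For example (a sketch; the name value is made up, and name has no index in the student table, so the statement would otherwise scan the whole table):

SELECT * FROM student WHERE name = 'mEJUfj' LIMIT 1;  # stops scanning at the first match
# with a UNIQUE index on the filtered column, LIMIT 1 adds nothing:
# e.g. if stuno had a unique index: SELECT * FROM student WHERE stuno = 100001;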

12.5 Use COMMIT more

Wherever possible, COMMIT in your programs as often as is appropriate. Performance improves because COMMIT releases resources, reducing what the program must hold on to.

Resources released by COMMIT:

  • rollback segment information used to recover data
  • locks acquired by the program's statements
  • space in the redo / undo log buffer
  • the internal overhead of managing the three resources above

13. How is the primary key designed in Taobao database?

Let’s talk about a practical question: How is the primary key of Taobao’s database designed?

Some outrageously wrong answers still circulate on the Internet year after year and have even become so-called MySQL "military regulations". One of the most glaring mistakes concerns MySQL primary key design.

Most people answer confidently: use an 8-byte BIGINT as the primary key instead of INT!

Such an answer looks at the primary key only from the database level, without thinking about it from the business angle. Should the primary key simply be an auto-increment ID? Judged by the standards of 2022, an architecture that uses an auto-increment primary key might not even earn a passing grade.

13.1 Problems with auto-increment IDs

Using an auto-increment ID as the primary key is simple and easy to understand, and almost all databases support auto-increment types (though implementations differ). But beyond being simple, auto-increment IDs have drawbacks in the following areas:

  1. Not reliable enough

    Auto-increment IDs suffer from the counter-rollback problem, which was not fixed until MySQL 8.0 (the counter is now persisted).

  2. Not secure

    Externally exposed interfaces make it very easy to guess the corresponding information. For example, with an endpoint like /User/1/ it is easy to guess
    user ID values and the total number of users, and easy to crawl data through the interface.

  3. Poor performance

    Auto-increment IDs perform poorly, since the value has to be generated on the database server side.

  4. Extra interaction

    The application also needs to issue an extra call such as last_insert_id() to learn the value just inserted, which costs one more network round-trip; in a massively concurrent system, one extra SQL statement is one more slice of overhead. (A concrete example follows this list.)

  5. Only locally unique

    Most importantly, an auto-increment ID is unique only within the current database instance, not globally across servers. For today's distributed systems, this is a nightmare.
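The extra round-trip from point 4, concretely (a sketch using the student table from the data-preparation section; the values are illustrative):

INSERT INTO student (stuno, name, age, classId) VALUES (999999, 'abc', 20, 1);
SELECT LAST_INSERT_ID();  # a second statement, i.e. one more network interaction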

13.2 Business fields as primary keys

To uniquely identify a member, the member information table needs a primary key. How should we set it so that it achieves our ideal goals? Here we first consider using a business field as the primary key.

The table data looks like this (figure: sample rows of the member information table):

In this table, which field is more appropriate?

  • Choosing the card number (cardno)

    The membership card number (cardno) looks suitable, because it cannot be empty and is unique: it can identify one member record.

mysql> CREATE TABLE demo.membermaster
-> (
-> cardno CHAR(8) PRIMARY KEY, -- membership card number as the primary key
-> membername TEXT,
-> memberphone TEXT,
-> memberpid TEXT,
-> memberaddress TEXT,
-> sex TEXT,
-> birthday DATETIME
-> );
Query OK, 0 rows affected (0.06 sec)

Different membership card numbers correspond to different members; the field cardno uniquely identifies a member. If that were always true, card number and member would be one-to-one, and the system would operate correctly.

But in reality, a membership card number may be reused. For example, Zhang San moves away because of a job change and stops shopping at the store, returning the membership card, so he is no longer a member there. Not wanting card number "10000001" to sit idle, the merchant issues it to Wang Wu.

From the system design point of view, this change merely modifies the member information for card number "10000001" in the member table and does not break data consistency. That is, after the modification every module of the system sees the updated member information; there is no state in which some modules see the old data and others the new, causing inconsistency inside the system. So at the information system level there is no problem.

But at the business level of the system there are big problems, and they affect the merchant's business.

For example, suppose we have a sales transaction table (trans) that records every sale in detail. On December 1, 2020, Zhang San bought a book in the store for 89 yuan, so the system holds a transaction record of Zhang San buying the book (figure: the corresponding trans row).
Next, we query the membership sales record on December 1, 2020:

mysql> SELECT b.membername,c.goodsname,a.quantity,a.salesvalue,a.transdate
-> FROM demo.trans AS a
-> JOIN demo.membermaster AS b
-> JOIN demo.goodsmaster AS c
-> ON (a.cardno = b.cardno AND a.itemnumber=c.itemnumber);
+------------+-----------+----------+------------+---------------------+
| membername | goodsname | quantity | salesvalue | transdate |
+------------+-----------+----------+------------+---------------------+
| 张三       |           | 1.000    | 89.00      | 2020-12-01 00:00:00 |
+------------+-----------+----------+------------+---------------------+
1 row in set (0.00 sec)

If membership card "10000001" is then issued to Wang Wu, we update the member information table accordingly. Running the same query now gives:

mysql> SELECT b.membername,c.goodsname,a.quantity,a.salesvalue,a.transdate
-> FROM demo.trans AS a
-> JOIN demo.membermaster AS b
-> JOIN demo.goodsmaster AS c
-> ON (a.cardno = b.cardno AND a.itemnumber=c.itemnumber);
+------------+-----------+----------+------------+---------------------+
| membername | goodsname | quantity | salesvalue | transdate |
+------------+-----------+----------+------------+---------------------+
| 王五       |           | 1.000    | 89.00      | 2020-12-01 00:00:00 |
+------------+-----------+----------+------------+---------------------+
1 row in set (0.01 sec)

This time the result says: Wang Wu bought a book on December 1, 2020 and spent 89 yuan. Clearly wrong! Conclusion: never use the membership card number as the primary key.

  • Choosing the member's phone number or ID number

Can a member's phone number be the primary key? No. In practice, phone numbers are also reclaimed by carriers and reissued to other people.

What about the national ID number? It seems feasible, since ID numbers never repeat and correspond one-to-one to a person. The problem is that the ID number is personal private information, and customers may not be willing to hand it over; forcing members to register their ID numbers would drive many customers away. Customer phone numbers share the same problem, which is why the member table design allows both the ID number and the phone number to be empty.

Therefore, it is recommended not to use business-related fields as primary keys. After all, as the project's technical designers, none of us can predict which business fields will be duplicated or reused over the project's whole life cycle as the business evolves.

Lesson:

When first starting out with MySQL, many people make the mistake of using a business field as the primary key, assuming they understand the business requirements. But reality often defies expectations, and the cost of changing the primary key later is very high.

13.3 Taobao’s primary key design

In Taobao's e-commerce business, the order service is a core service. So, how is the primary key of Taobao's order table designed? Is it an auto-increment ID?

Open Taobao and look at some order information (figure: a Taobao order list showing four order numbers).

As the figure shows, the order number is not an auto-increment ID! Let's look closely at the four order numbers above:

1550672064762308113
1481195847180308113
1431156171142308113
1431146631521308113

The order numbers are 19 digits long, the last 5 digits of each are identical (08113), and the first 14 digits increase monotonically.

A bold guess: Taobao's order ID is designed as:

Order ID = time + de-duplication field + last 6 digits of the user ID

Such a design can be globally unique and is extremely friendly to distributed system queries.

13.4 Recommended primary key design

Non-core business: use an auto-increment ID as the primary key of the corresponding tables, e.g. alarms, logs, monitoring data.

Core business: the primary key design should at least be globally unique and monotonically increasing: globally unique so it stays unique across systems, monotonically increasing so inserts do not hurt database performance.

The simplest primary key design is recommended here: UUID.

Features of UUID:

Globally unique, occupies 36 bytes, data is out of order, and insertion performance is poor.

Get to know UUIDs:

  • Why are UUIDs globally unique?
  • Why does UUID take up 36 bytes?
  • Why are UUIDs unordered?

The composition of a MySQL UUID is as follows:

UUID = time + UUID version (16 bytes) - clock sequence (4 bytes) - MAC address (12 bytes)

Let’s take the UUID value e0ea12d4-6473-11eb-943c-00155dbaa39d as an example:

Insert image description here
Why are UUIDs globally unique?

The time part of a UUID occupies 60 bits and stores a TIMESTAMP-like value, counting in 100ns units since 1582-10-15 00:00:00.00. The UUID's time precision is therefore higher than TIMESTAMP's, and a duplicate in the time dimension can only occur within the same 100ns tick.

The clock sequence guards against duplicates when the clock is set back; the MAC address is what makes the value globally unique.

Why does UUID take up 36 bytes?

UUIDs are stored as strings and include the informationless "-" separators, so 36 bytes are needed in total.

Why are UUIDs random and unordered?

Because the UUID layout places the low-order time bits first, and that part changes constantly, the resulting values are unordered.

Modified (ordered) UUID

If the high and low time bits are swapped, the value becomes monotonically increasing over time. MySQL 8.0 can store the time-high bits before the time-low bits, turning the UUID into an ordered UUID.

MySQL 8.0 also solves the UUID storage problem: it strips the meaningless "-" characters from the UUID string and stores the value in a binary type, shrinking the storage to 16 bytes.

Both are done with the uuid_to_bin function provided by MySQL 8.0; MySQL likewise provides bin_to_uuid for the reverse conversion:

SET @uuid = UUID();
SELECT @uuid, uuid_to_bin(@uuid), uuid_to_bin(@uuid,TRUE);
# uuid_to_bin(@uuid): stores the UUID as 16-byte binary
# uuid_to_bin(@uuid,TRUE): swaps to time-high/mid/low order, so the UUID becomes monotonically increasing

Converting a UUID with uuid_to_bin(@uuid, TRUE) yields an ordered UUID: globally unique + monotonically increasing. Isn't that exactly the primary key we want?
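bin_to_uuid is the inverse function and takes the same swap flag; a quick round-trip check:

SET @uuid = UUID();
SELECT bin_to_uuid(uuid_to_bin(@uuid, TRUE), TRUE) = @uuid;  # returns 1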

13.5 Ordered UUID performance test

How do the 16-byte ordered UUID and the earlier 8-byte auto-increment ID compare in performance and storage space?

In a test inserting 100 million rows, each 500 bytes with 3 secondary indexes, the results (figure: insert-time comparison) show the ordered UUID to be the fastest of the options. In actual business use, the ordered UUID can even be generated on the application side, which also reduces the number of SQL interactions.

Also, although the ordered UUID is 8 bytes larger than the auto-increment ID, in this test it added only about 3 GB of storage, which is acceptable.

In today's Internet environment, database designs that use an auto-increment ID as the primary key are strongly discouraged; a globally unique scheme like the ordered UUID is far more recommended.

Moreover, in real business systems the primary key can additionally embed business and system attributes, such as the user's tail number or the machine-room ID. Such primary key designs are a real test of an architect's skill.

What if you are not on MySQL 8.0?

Assign the primary key values manually!

Take the membership tables of multiple branch stores as an example: if the data generated on each machine must later be merged, primary keys could collide.

The headquarters MySQL database can keep a management information table with a field recording the current maximum membership number.

When a store adds a member, it first obtains that maximum from the headquarters database, adds 1 to it, uses the result as the new member's id, and at the same time updates the recorded maximum in the headquarters management table.

Because every store allocates numbers by operating on the same field in the same headquarters MySQL database, membership-number conflicts between stores are eliminated. A minimal sketch follows.
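A minimal sketch of this scheme (the table and column names here are hypothetical):

# headquarters-side table holding the current maximum member number
CREATE TABLE hq_id_allocator (
  biz_name VARCHAR(30) PRIMARY KEY,
  max_id   BIGINT NOT NULL
);
INSERT INTO hq_id_allocator VALUES ('member', 0);

# allocate the next id atomically and read it back in the same session
UPDATE hq_id_allocator
   SET max_id = (@new_id := max_id + 1)
 WHERE biz_name = 'member';
SELECT @new_id;  # use this as the new member's id

The UPDATE takes a row lock, so two stores allocating concurrently cannot obtain the same number.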
