[MySQL performance optimization series] select count(*) runs far faster on a secondary index than on the primary key index, can you believe it?

question

During data testing on MySQL 5.7, a select count(*) on a table with more than a million rows took about 20 seconds, even though the query used the primary key index. Why does the query take so long, and how can it be optimized? Let's look at the SQL involved.

Validation analysis

guess

First, a guess: why is the query still so slow when it uses the primary key index?

Because no secondary index has been created.

Sharp readers will ask: can a secondary index really be faster than the primary key index? Yes. For count statistics on a large table with wide rows, it can be much faster.

Some background.

In the InnoDB storage engine, count(*) first reads the index pages from disk into the buffer pool and then scans them to count the rows. InnoDB prefers a secondary index for this scan; if there is none, it falls back to the primary key (clustered) index, which takes much longer.

In the MyISAM storage engine, count(*) without a where condition directly returns the row count stored with the table.
When a where condition is added to the count, neither engine can use a stored row count: both have to scan the matching rows (through an index if one fits the condition, otherwise the whole table) and count them.


Clustered index: Each InnoDB table has a special index that stores the data of every row, called the clustered index (usually the primary key). The clustered index holds both the B-Tree structure and the row data, so its size is approximately equal to the size of the table data.

Secondary index: All other indexes on the table are secondary indexes. A secondary index stores only the indexed columns and the primary key columns, so it is usually much smaller.
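The size gap between the two kinds of index can be inspected directly. The following is a minimal sketch, assuming persistent InnoDB statistics are enabled (the default in 5.7) and that the table is test.test as in the rest of this article; the figures are sampled statistics, not exact sizes.

```sql
-- Pages and approximate size of each index on test.test.
-- stat_name = 'size' is the number of pages in the index.
SELECT index_name,
       stat_value AS pages,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'test'
  AND table_name = 'test'
  AND stat_name = 'size';
```

The PRIMARY row represents the clustered index, so it should be close to the table's data size, while a secondary index on a narrow column should be far smaller.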

Let's verify this.

verify

Next, we mainly compare the difference between the primary key index and the secondary index in the case of count(*).

0. Check the index information in the table

 show index from test;

Only the primary key index.

1. The SQL statement in question is as follows:

SELECT count(*) from test;

2. The execution results are as follows:

1306725
> OK
> Time: 17.397s

3. View the execution plan:

desc SELECT count(*) from test

Indeed, the primary key index is used.

4. Restart the database to empty the buffer pool

Windows:

net stop mysql
net start mysql

Linux:

service mysqld restart

5. Check the buffer pool usage

select * from sys.innodb_buffer_stats_by_table where object_schema = 'test';

6. Execute SELECT count(*) from test again, then check the buffer pool usage once more
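The sys view above aggregates per table. For a per-index breakdown of what is cached, a sketch like the following can be used, assuming the account has the PROCESS privilege. Querying INNODB_BUFFER_PAGE is expensive on a large buffer pool, so this is meant for a test instance only.

```sql
-- Buffer pool pages currently cached for test.test, grouped by index.
-- table_name is stored with backticks, so they are stripped before comparing.
SELECT index_name,
       COUNT(*) AS pages,
       ROUND(COUNT(*) * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM information_schema.INNODB_BUFFER_PAGE
WHERE REPLACE(table_name, '`', '') = 'test.test'
GROUP BY index_name;
```

With only the primary key present, the cached pages should all belong to PRIMARY.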

7. Add a secondary index and repeat the above verification

ALTER TABLE `test`.`test`
ADD INDEX `idx_id`(`id`) USING BTREE;

Check the secondary index size by comparing the total table size before and after adding the index: about 15M.

SELECT CONCAT(ROUND(SUM((data_length+index_length)/1024/1024),2),'MB') AS DATA FROM information_schema.`TABLES` WHERE table_schema='test' AND table_name='test';

The data size of the table before adding the index is 931.50MB, and the data size of the table after adding the index is 916.98MB. (InnoDB sizes in information_schema are estimates.)

Execute:

SELECT count(*) from test
1306725
> OK
> Time: 0.198s

View the execution plan: the secondary index is used.
Check the buffer pool: the buffered data size is basically the same as the secondary index size.
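Once both indexes exist, the two access paths can also be compared side by side with index hints. This is a sketch rather than part of the original test: it assumes the idx_id index created in step 7, and the timings are only comparable if the buffer pool is in the same state (for example, freshly restarted) before each run.

```sql
-- Count through the clustered index (reads every row page).
SELECT COUNT(*) FROM test FORCE INDEX (PRIMARY);

-- Count through the secondary index (reads only the small index pages).
SELECT COUNT(*) FROM test FORCE INDEX (idx_id);

-- Confirm which index each statement uses (see the "key" column).
EXPLAIN SELECT COUNT(*) FROM test FORCE INDEX (PRIMARY);
EXPLAIN SELECT COUNT(*) FROM test FORCE INDEX (idx_id);
```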

summary

The verification is consistent with the guess. Without a secondary index, select count(*) goes through the primary key index and loads the entire table data into the buffer pool. With a secondary index, only the index pages need to be read into the buffer pool, and the query becomes dramatically faster: in this test it went from 17.397s to 0.198s, roughly 88 times faster. The above was tested on MySQL 5.7. On MySQL 8.0 (from 8.0.14), the innodb_parallel_read_threads variable enables parallel reads of the clustered index, which can improve the query speed further.

An example of a parallel query is as follows (the value must be between 1 and 256):

set local innodb_parallel_read_threads=16;
select count(*) from test;

small expansion

Which one to use for count(*), count(1), count(0), and count(column name)?

Just use count(*)

Alibaba specification reference:

[Mandatory] Do not use count(column name) or count(constant) in place of count(*). count(*) is the standard syntax for counting rows defined by SQL92; it is independent of the database and of whether values are NULL. Explanation: count(*) counts rows whose value is NULL, but count(column name) does not.

[Mandatory] count(distinct col) returns the number of distinct non-NULL values in the column. Note that count(distinct col1, col2) returns 0 if one of the columns is entirely NULL, even if the other column has distinct values.

[Mandatory] When the values of a column are all NULL, count(col) returns 0 but sum(col) returns NULL, so watch out for NPE problems when using sum().
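The three rules above can be checked on a tiny throwaway table. This is an illustrative sketch; the table count_demo and its columns are made up for the example and are not part of the original test.

```sql
CREATE TEMPORARY TABLE count_demo (a INT, b INT);
INSERT INTO count_demo VALUES (1, NULL), (2, NULL), (NULL, NULL);

SELECT COUNT(*)             AS all_rows,        -- 3: every row, NULL or not
       COUNT(a)             AS non_null_a,      -- 2: the NULL in a is skipped
       COUNT(b)             AS non_null_b,      -- 0: b is entirely NULL
       COUNT(DISTINCT a, b) AS distinct_pairs,  -- 0: any NULL excludes the pair
       SUM(b)               AS sum_b,           -- NULL, not 0
       COALESCE(SUM(b), 0)  AS safe_sum_b       -- 0: one way to avoid the NULL
FROM count_demo;

DROP TEMPORARY TABLE count_demo;
```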

Query the row count, data size, index size and total size of each table in a schema

SELECT
CONCAT(a.table_schema,'.',a.table_name) AS 'table_name',
-- table_rows is an estimated row count for InnoDB, shown here in thousands
CONCAT(ROUND(table_rows/1000,4),'K') AS 'row_count',
CONCAT(ROUND(data_length/(1024*1024),4),'M') AS 'data_size',
CONCAT(ROUND(index_length/(1024*1024),4),'M') AS 'index_size',
CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),4),'G') AS 'total_size'
FROM
information_schema.TABLES a
WHERE
a.table_schema = 'test'
ORDER BY index_length DESC;


Origin blog.csdn.net/qq_35764295/article/details/127670548