Recently I was optimizing a statistics interface. With several hundred thousand rows of data, its response time had reached 20s. Reading the code, I found three main statistical methods. After I optimized the logic of one of them, the response time dropped to under 3s, which was still short of the target (under 1s). I then looked at the SQL statements of the other two methods and found that one query took more than two seconds:
SELECT
FLOW_TO AS flowTo,
COUNT( DISTINCT RELATED_ID ) AS count
FROM
or_flow_schedule
WHERE
DATE_FORMAT( CREATE_TIME, '%Y-%m-%d' ) = DATE_FORMAT( NOW(), '%Y-%m-%d' )
GROUP BY FLOW_TO
After reading the SQL statement, I added an index to the table for this query.
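The index definition itself is not reproduced in this post. Purely as an illustration, assuming a single-column index on CREATE_TIME (the column filtered in the WHERE clause) and a hypothetical name idx_create_time, it could look roughly like this:

```sql
-- Assumption: the original index is not shown in the post;
-- the index name and the column choice here are illustrative only.
CREATE INDEX idx_create_time ON or_flow_schedule (CREATE_TIME);
```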
Running EXPLAIN on the query, I found that the index I had just built was not being used. Based on how indexes and queries work (worth reading up on if either is unfamiliar), I suspected that DATE_FORMAT was invalidating the index, and this turned out to be the case. See "mysql DATE_FORMAT causes index failure".
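A check of this kind can be sketched by prefixing the original query with EXPLAIN. In the output, the `key` column shows which index (if any) MySQL chose, and `rows` is its estimate of how many rows it must examine. The index stores raw CREATE_TIME values, while the predicate compares the result of DATE_FORMAT( CREATE_TIME, ... ), so the optimizer cannot turn the condition into a range lookup on that index.

```sql
-- Inspect the execution plan of the original query.
-- Look at the `key` and `rows` columns of the result.
EXPLAIN
SELECT
  FLOW_TO AS flowTo,
  COUNT( DISTINCT RELATED_ID ) AS count
FROM
  or_flow_schedule
WHERE
  -- wrapping the column in a function hides it from the index
  DATE_FORMAT( CREATE_TIME, '%Y-%m-%d' ) = DATE_FORMAT( NOW(), '%Y-%m-%d' )
GROUP BY FLOW_TO;
```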
So I replaced DATE_FORMAT with BETWEEN ... AND:
SELECT
FLOW_TO AS flowTo,
COUNT( DISTINCT RELATED_ID ) AS count
FROM
or_flow_schedule
WHERE
CREATE_TIME BETWEEN '2021-11-18 00:00:00' AND '2021-11-18 23:59:59'
GROUP BY FLOW_TO
This time the query uses the index built at the beginning: the number of rows examined dropped from the initial 250,000 (25W) to 641, and the query time fell to 0.2s.
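The literal dates in the rewritten query cover a single fixed day. To keep the "today" behavior of the original NOW()-based query, one option is a range built from CURDATE(). This is a sketch and not the query measured above; it assumes CREATE_TIME is a DATETIME or TIMESTAMP column. The half-open range (>= start, < next day) also includes rows with fractional seconds after 23:59:59.

```sql
SELECT
  FLOW_TO AS flowTo,
  COUNT( DISTINCT RELATED_ID ) AS count
FROM
  or_flow_schedule
WHERE
  -- the column is compared bare, so an index on CREATE_TIME stays usable
  CREATE_TIME >= CURDATE()
  AND CREATE_TIME < CURDATE() + INTERVAL 1 DAY
GROUP BY FLOW_TO;
```

Running EXPLAIN on this variant is a quick way to confirm that it is planned the same way as the BETWEEN version.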