A simple analysis of a MySQL slow log monitoring false positive

This article walks through the analysis and resolution of a false positive in MySQL slow log monitoring, in the hope of helping readers better understand and use MySQL.

Previously, for various reasons, some alarms went unattended. Over the recent holiday I ruled out the obvious human factors and noticed that the database slow log alarms were a bit strange: they did not match reality. After receiving the instant-messaging alert, I would go to the database a short while later to check, and the slow log performance never looked as bad as reported (the threshold I had set was 60 seconds).

I checked the logic at the code level several times and found no obvious problem, yet after several rounds the issue remained. That prompted me to dig in and find the root cause. The backend uses an ORM-based model, and the data is stored in the table behind the model MySQL_slowlog_sql_history.

The code level is similar to the following logic:

MySQL_slowlog_sql_history.objects.filter(create_time__gt='2020-01-29 11:00:00', Query_time_pct_95__gt=60)

The incoming time is dynamic, and the threshold is 60 seconds, so in theory any alarm that fires should indicate a real problem. To verify further, I raised the threshold to 600, but alarms still fired: slow queries of only 7 to 8 seconds were still being reported. Using debug, I captured the SQL generated by the ORM:

SELECT...`mysql_slowlog_sql_history`.`create_time`, `mysql_slowlog_sql_history`.`memo` 
FROM `mysql_slowlog_sql_history` 
WHERE (`mysql_slowlog_sql_history`.`create_time` > '2020-01-29 11:00:00' AND `mysql_slowlog_sql_history`.`Query_time_pct_95` > '600') LIMIT 21; 
args=(u'2020-01-29 11:00:00', u'600')

The SQL looked fine at first glance. I ran an equivalent query directly in the MySQL client, and it behaved correctly, returning only results above 600 seconds:

select ip_addr,db_port from mysql_slowlog_sql_history 
where create_time>'2020-01-29 00:00:00' and Query_time_pct_95 > 600;

I began to reflect on this discrepancy. What could be the reason? When I looked at the field definitions of the model, it started to dawn on me, and I quickly set about verifying it. For ease of explanation, I created a test table test_dummy:

create table test_dummy(id int primary key auto_increment,Query_time_pct_95 varchar(100));

Insert a few rows of test data:

insert into test_dummy(Query_time_pct_95) values('8.83736'),('7.70056'),('5.09871'),('4.32582');
+----+-------------------+
| id | Query_time_pct_95 |
+----+-------------------+
|  1 | 8.83736           |
|  2 | 7.70056           |
|  3 | 5.09871           |
|  4 | 4.32582           |
+----+-------------------+
4 rows in set (0.00 sec)

Then run the following two statements as a comparative test.

mysql> select * from test_dummy where Query_time_pct_95 > 600;
Empty set (0.00 sec)

mysql> select * from test_dummy where Query_time_pct_95 > '600';
+----+-------------------+
| id | Query_time_pct_95 |
+----+-------------------+
|  1 | 8.83736           |
|  2 | 7.70056           |
+----+-------------------+
2 rows in set (0.00 sec)
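The same behavior can be reproduced in plain Python, which makes the two comparison semantics easy to see side by side. This is a minimal sketch using the values from the test table above (it mirrors, rather than calls, MySQL):

```python
# Values stored in the VARCHAR column Query_time_pct_95.
values = ["8.83736", "7.70056", "5.09871", "4.32582"]

# String-vs-string comparison is lexicographic, like MySQL's
# `Query_time_pct_95 > '600'`: '8...' and '7...' sort after '600'
# because '8' and '7' come after '6'.
string_matches = [v for v in values if v > "600"]
print(string_matches)  # ['8.83736', '7.70056']

# Numeric comparison, like MySQL's `Query_time_pct_95 > 600`,
# where MySQL casts the string column to a number: nothing exceeds 600.
numeric_matches = [v for v in values if float(v) > 600]
print(numeric_matches)  # []
```

This is exactly why the monitoring query, which bound the threshold as the string '600', kept matching 7-to-8-second queries.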

As the test shows, with an integer literal no rows come back, because MySQL casts the varchar column to a number and none of the values exceed 600. With a string literal, however, the comparison is lexicographic, character by character from the left, so '8.83736' and '7.70056' both sort after '600'. In other words, the database treats these comparisons very differently when the column is a character type. The quick fix was to change the column type to float at the database level (e.g. `alter table mysql_slowlog_sql_history modify Query_time_pct_95 float`); the precision loss this introduces is negligible here. After verifying again, the problem no longer reproduced.

That concludes the detailed analysis and solution of this MySQL slow log monitoring false positive.

Origin: blog.csdn.net/yaxuan88521/article/details/113784984