I have a MariaDB 10.4 with a hung table (about 100 million rows) for storing crawled posts. The table contains 4x columns, and one of them is lastUpadate
(datetime) and indexed.
Recently I try to select posts by lastUpdate. Most of them returns fast with index used, but some takes minutes with fewer records returned and looks like a table scan.
This is the query explain without conditions.
> explain select 1 from SourceAttr;
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| 1 | SIMPLE | SourceAttr | index | NULL | idxCreateDate | 5 | NULL | 79830491 | Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
This is the query explain and number of rows returned for the slow one. The number of rows
in the explain is almost equals to the above one.
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | SourceAttr | index | idxLastUpdate | idxLastUpdate | 5 | NULL | 79827437 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
394454 rows in set (14 min 40.908 sec)
The is the fast one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | SourceAttr | range | idxLastUpdate | idxLastUpdate | 5 | NULL | 3699041 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
1352552 rows in set (2.982 sec)
Any reason what might cause this ?
Thanks a lot.
When you see type: index
it's called an index scan. This is almost as bad as a table-scan.
Notice the rows: 79827437
in the EXPLAIN of the two slow queries. This means it's examining over 79 million items in the scanned index, either idxCreateDate
or idxLastUpdate
. So it's basically examining every index entry, which takes nearly as long as examining every row of the table.
Whereas the quick query says rows: 3699041
so it's estimating less than 3.7 million rows examined. More than 20x fewer.