tiflash引擎的使用小总结

作者: sustyle 原文来源: https://tidb.net/blog/50baeeb8

1 背景

我们公司有一种特殊的数据分析场景,大数据部门会定期抽取业务数据进行分析然后T+1更新报表。这个抽数任务经常会影响业务的正常访问。经过调研后发现tiflash可以解决这种场景带来的问题。

但是,近期我们公司在使用tiflash引擎的过程中遇到一些与预期不符合的问题,本文就来整理总结一下。

tidb版本4.0.13

2、初体验

根据官方文档的介绍,tiflash在统计信息count(*)这种场景有很大的优势,所以我们就测试了这种使用场景,结果和描述的一致。

(1)准备测试数据

mysql> show create table tb_monitor_disk_info\G
*************************** 1. row ***************************
       Table: tb_monitor_disk_info
Create Table: CREATE TABLE `tb_monitor_disk_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `rshost` varchar(20) NOT NULL DEFAULT '' COMMENT '主机地址',
  `part` varchar(50) NOT NULL DEFAULT '' COMMENT '分区信息',
  `disk_info` json DEFAULT NULL COMMENT 'disk信息 json串, 单位是GB',
  `a_time` datetime NOT NULL DEFAULT '2022-01-01 00:00:00',
  PRIMARY KEY (`id`),
  KEY `idx_rshost` (`rshost`),
  KEY `idx_a_time` (`a_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=307494024
1 row in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.07 sec)

mysql> select * from tb_monitor_disk_info limit 10;
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
| id        | rshost         | part      | disk_info                                         | a_time              |
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
| 260383923 | 192.168.64.160 | /dev/sdb1 | {"free": "3900", "total": "5361", "used": "1461"} | 2023-02-01 03:39:28 |
| 260383924 | 192.168.64.77  | /dev/sda1 | {"free": "2617", "total": "2748", "used": "19"}   | 2023-02-01 03:39:28 |
| 260383925 | 192.168.66.224 | /dev/sdb1 | {"free": "1147", "total": "3574", "used": "2426"} | 2023-02-01 03:39:27 |
| 260383926 | 192.168.64.158 | /dev/sdb1 | {"free": "2746", "total": "5361", "used": "2614"} | 2023-02-01 03:39:28 |
| 260383927 | 192.168.65.33  | /dev/sdb1 | {"free": "2328", "total": "5361", "used": "3033"} | 2023-02-01 03:39:28 |
| 260383928 | 192.168.65.93  | /dev/dfa1 | {"free": "2947", "total": "2978", "used": "31"}   | 2023-02-01 03:39:28 |
| 260383929 | 192.168.64.68  | /dev/sda1 | {"free": "2607", "total": "2748", "used": "28"}   | 2023-02-01 03:39:28 |
| 260383930 | 192.168.65.83  | /dev/sdb1 | {"free": "3346", "total": "3574", "used": "227"}  | 2023-02-01 03:39:27 |
| 260383931 | 192.168.64.245 | /dev/sdb1 | {"free": "3819", "total": "5321", "used": "1233"} | 2023-02-01 03:39:29 |
| 260383932 | 192.168.64.249 | /dev/sdb1 | {"free": "3972", "total": "5321", "used": "1080"} | 2023-02-01 03:39:29 |
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
10 rows in set (0.02 sec)

mysql> select * from information_schema.tiflash_replica;
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME           | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
| dbzz_monitor | tb_monitor_disk_info |      455 |             2 |                 |         1 |        1 |
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.01 sec)

mysql> 

所有演示都是采用我们线上的监控测试数据,总量大概5000w左右,并且已经部署好tiflash引擎。

(2)测试tiflash

  • 使用tikv做统计查询
mysql> set SESSION tidb_isolation_read_engines = "tikv";
Query OK, 0 rows affected (0.00 sec)

mysql> desc select count(*) from tb_monitor_disk_info;
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
| id                         | estRows     | task      | access object                                        | operator info                    |
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
| StreamAgg_24               | 1.00        | root      |                                                      | funcs:count(Column#14)->Column#6 |
| └─IndexReader_25           | 1.00        | root      |                                                      | index:StreamAgg_8                |
|   └─StreamAgg_8            | 1.00        | cop[tikv] |                                                      | funcs:count(1)->Column#14        |
|     └─IndexFullScan_22     | 53220210.00 | cop[tikv] | table:tb_monitor_disk_info, index:idx_rshost(rshost) | keep order:false                 |
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.12 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.32 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.10 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.14 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.07 sec)

mysql> 
  • 使用tiflash做统计查询
mysql> set SESSION tidb_isolation_read_engines = "tiflash";
Query OK, 0 rows affected (0.00 sec)

mysql> desc select count(*) from tb_monitor_disk_info;
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
| id                         | estRows     | task              | access object              | operator info                   |
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
| StreamAgg_16               | 1.00        | root              |                            | funcs:count(Column#8)->Column#6 |
| └─TableReader_17           | 1.00        | root              |                            | data:StreamAgg_8                |
|   └─StreamAgg_8            | 1.00        | batchCop[tiflash] |                            | funcs:count(1)->Column#8        |
|     └─TableFullScan_15     | 53220210.00 | batchCop[tiflash] | table:tb_monitor_disk_info | keep order:false                |
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.07 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.05 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.05 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.06 sec)

mysql> 

可以看到在使用tikv引擎做全表统计行数的时候耗时大概2秒多,使用tiflash引擎做全表行数统计大概只需要100 ms以内,提升还是很大的。

3、不满足预期

在使用tiflash组件前,我们只是验证了一下统计查询对性能的提升,并没有对起针对性的测试调研,比如模拟业务的查询场景。

这是一个反面例子,强烈建议上线前要充分测试验证,有条件的还要模拟线上业务场景做充分测试。

以至于,我们抽数任务在使用tiflash引擎后,查询SQL并没有走tiflash,经过测试发现,使用索引的查询条件就没法使用tiflash。所以从使用体验来看,tiflash并没有那么舒服,也不像网上吹虚的那样,具体测试如下:

mysql> desc select count(*) from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| id                          | estRows    | task      | access object                                        | operator info                                                     |
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| StreamAgg_24                | 1.00       | root      |                                                      | funcs:count(Column#8)->Column#6                                   |
| └─IndexReader_25            | 1.00       | root      |                                                      | index:StreamAgg_9                                                 |
|   └─StreamAgg_9             | 1.00       | cop[tikv] |                                                      | funcs:count(1)->Column#8                                          |
|     └─IndexRangeScan_23     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info, index:idx_a_time(a_time) | range:[2023-02-10 00:00:00,2023-02-11 00:00:00), keep order:false |
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+----------+
| count(*) |
+----------+
|  1775971 |
+----------+
1 row in set (0.25 sec)

mysql> desc select * from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| id                             | estRows    | task      | access object                                        | operator info                                                     |
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| IndexLookUp_13                 | 1513586.15 | root      |                                                      |                                                                   |
| ├─IndexRangeScan_11(Build)     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info, index:idx_a_time(a_time) | range:[2023-02-10 00:00:00,2023-02-11 00:00:00), keep order:false |
| └─TableRowIDScan_12(Probe)     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info                           | keep order:false                                                  |
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> desc select * from tb_monitor_disk_info ignore index(idx_a_time) where a_time >= '2023-02-10' and a_time < '2023-02-11';
+-------------------------+-------------+--------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| id                      | estRows     | task         | access object              | operator info                                                                                                                                      |
+-------------------------+-------------+--------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| TableReader_10          | 1513586.15  | root         |                            | data:Selection_9                                                                                                                                   |
| └─Selection_9           | 1513586.15  | cop[tiflash] |                            | ge(dbzz_monitor.tb_monitor_disk_info.a_time, 2023-02-10 00:00:00.000000), lt(dbzz_monitor.tb_monitor_disk_info.a_time, 2023-02-11 00:00:00.000000) |
|   └─TableFullScan_8     | 53220210.00 | cop[tiflash] | table:tb_monitor_disk_info | keep order:false                                                                                                                                   |
+-------------------------+-------------+--------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> 

针对上述现象,只能使用ignore index语法,使其不使用索引才会走tiflash。

也可以使用 set SESSION tidb_isolation_read_engines = "tiflash";

4、总结

抽数是我们公司一个重要的业务场景,具有一定的特殊性,基本都是通过时间字段取遍历增量数据或者全量数据,十分消耗资源,所以基于这个背景我们上线了tiflash组件提供这类业务场景进行抽数,但是实际使用中发现并没有得到预期的效果:

(1)我们的抽数SQL非统计类且因为使用了索引,可能tidb识别为代价不高的查询,所以还是下发到tikv,导致我们的业务正常访问请求收到影响。

(2)为了解决上述(1)的问题,有如下两个方案。

  • 修改业务逻辑。

这个方案需要去修改业务逻辑,将抽数语句改成ignore index,或者在连接tidb的时候设置tidb_isolation_read_engines,总的来说对业务来说也不太友好,所以需要看实际情况,业务能不能接受。

  • 修改tidb的配置

另一种方案就是新加一个tidb节点,将新tidb节点的引擎强制使用tiflash。

[isolation-read]
engines = ["tiflash"]

但是需要注意,没有启用tiflash的表是没法查数的,所以这个新加的节点不能暴露给业务使用,否则可能会导致查询报错。

综上所述,每个环境的业务场景都比较复杂,存在不同情况的查询慢问题,tiflash的使用场景也存在一定的限制,所以想通过tiflash解决线上查询性能问题一定要先测试。

5、写在最后

本文对tiflash引擎在线上环境做了简单测试总结,各公司的业务场景也不一样,需求也不同,还可能碰上其他未知的问题,本文所有内容仅供参考。

猜你喜欢

转载自blog.csdn.net/TiDBer/article/details/131259598