A Puzzle About LIKE Queries When Testing SQL on TiDB

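I recently took on a core new project whose main business logic is inherited from a legacy system. The legacy system runs on Oracle. Its usual concurrency is not high compared with Internet workloads, but its SQL is very complex, and joins across five or six tables are routine. With the group moving away from Oracle, the main database candidate for the new project is TiDB, a NewSQL database.
Because the launch schedule is tight, expecting the developers to rewrite the SQL is not realistic, so my current concern is how TiDB performs on complex SQL. During recent testing I ran into the behavior below, which I cannot explain, and I hope someone more knowledgeable can shed light on it: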
MySQL [db1]> select
-> DISTINCT v.vehicle_no vehicleNo,
-> '2019-05-31 08:59:00' as expireDate,
-> b.batch_name as batchName,
-> r.is_distribute isDistribute,
-> date_format(r.date_created, 'yyyy-MM-dd') dateCreated,
-> r.tmr_id tmrId,
-> t.customer_id customerId,
-> r.task_group_id taskGroupId,
-> 'test' as codeDesc,
-> 'test' as robotName,
-> c.campaign_name campaignName,
-> v.policy_end_date policyEndDate,
-> 'test' as listRank,
-> s.special_dial_org_name specialName,
-> 'test' as secondOrg
-> from t_pub_task t,
-> t_pub_robot_communicate r,
-> t_pub_campaign c,
-> t_pub_batch b left join
-> t_aas_dialorg_custcount_source s on b.tcims_batch_id = s.batch_id,
-> t_pc_vehicle v
-> where r.task_group_id = t.task_group_id
-> and t.vehicle_id is not null
-> and t.vehicle_id = v.nets_vehicle_id
-> and t.batch_id = b.batch_id
-> and t.campaign_id = c.campaign_id
-> and t.team_id = '1000002832'
-> and v.nets_cust_id = t.customer_id
-> and r.list_type = 7
-> and c.biz_model = '1'
-> and r.tmr_id IS NULL
-> AND r.is_distribute = 'N'
-> AND (r.date_created >= date_format('20090531092304', '%Y-%m-%d 00:00:00'))
-> AND (r.date_created < DATE_ADD(date_format('20190531092304', '%Y-%m-%d 00:00:00'), interval 1 day))
-> AND r.robot_id = '22222'
-> AND (C.EXPIRED_DATE = '2011-04')
-> AND b.batch_name like '%test%'
-> AND t.org_id = '201'
-> AND exists (select e.list_rank
-> from t_pub_wx_entry_auto_call e
-> where t.TASK_GROUP_ID = e.task_group_id
-> and e.list_rank = 'A');
Empty set (3.54 sec)

MySQL [db1]> select
-> DISTINCT v.vehicle_no vehicleNo,
-> '2019-05-31 08:59:00' as expireDate,
-> b.batch_name as batchName,
-> r.is_distribute isDistribute,
-> date_format(r.date_created, 'yyyy-MM-dd') dateCreated,
-> r.tmr_id tmrId,
-> t.customer_id customerId,
-> r.task_group_id taskGroupId,
-> 'test' as codeDesc,
-> 'test' as robotName,
-> c.campaign_name campaignName,
-> v.policy_end_date policyEndDate,
-> 'test' as listRank,
-> s.special_dial_org_name specialName,
-> 'test' as secondOrg
-> from t_pub_task t,
-> t_pub_robot_communicate r,
-> t_pub_campaign c,
-> t_pub_batch b left join
-> t_aas_dialorg_custcount_source s on b.tcims_batch_id = s.batch_id,
-> t_pc_vehicle v
-> where r.task_group_id = t.task_group_id
-> and t.vehicle_id is not null
-> and t.vehicle_id = v.nets_vehicle_id
-> and t.batch_id = b.batch_id
-> and t.campaign_id = c.campaign_id
-> and t.team_id = '1000002832'
-> and v.nets_cust_id = t.customer_id
-> and r.list_type = 7
-> and c.biz_model = '1'
-> and r.tmr_id IS NULL
-> AND r.is_distribute = 'N'
-> AND (r.date_created >= date_format('20090531092304', '%Y-%m-%d 00:00:00'))
-> AND (r.date_created < DATE_ADD(date_format('20190531092304', '%Y-%m-%d 00:00:00'), interval 1 day))
-> AND r.robot_id = '22222'
-> AND (C.EXPIRED_DATE = '2011-04')
-> AND b.batch_name like '%t%'
-> AND t.org_id = '201'
-> AND exists (select e.list_rank
-> from t_pub_wx_entry_auto_call e
-> where t.TASK_GROUP_ID = e.task_group_id
-> and e.list_rank = 'A');
Empty set (0.67 sec)

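The two statements above are the same SQL with different values for one bind variable:
the first uses b.batch_name like '%test%'
the second uses b.batch_name like '%t%'
The execution plans of the two statements are identical (only the value differs, after all). The second statement's filter is looser, so its intermediate result set is larger and its join should cost more than the first one's. In other words, the second statement should take longer, yet the test results show the opposite!
In repeated follow-up tests, whenever the value after like contained several letters (such as 'aa' or 'abc'), the execution time stayed at about 3.5 s;
whenever the value was a single letter or Chinese text (such as 'a' or '业务'), the execution time dropped sharply to about 0.7 s, the exact opposite of what I expected.
To check whether the like query itself was the cause, I also ran the following test: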
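One way to check the assumption that both patterns really produce the same plan is to compare their EXPLAIN output side by side. The following is a minimal sketch on a reduced two-table join. It uses only tables and columns that appear in the statements above, but the reduction itself is an assumption made for illustration and may not reproduce the plan of the full six-table query.

```sql
-- Reduced illustration (an assumption, not the original statement):
-- only t_pub_task and t_pub_batch, joined on the same key as in the full query.
EXPLAIN
SELECT b.batch_name
FROM t_pub_task t
JOIN t_pub_batch b ON t.batch_id = b.batch_id
WHERE t.org_id = '201'
  AND b.batch_name LIKE '%test%';

-- Same statement with the single-character pattern. Compare the join order,
-- the join operators and the estimated row counts with the output above.
EXPLAIN
SELECT b.batch_name
FROM t_pub_task t
JOIN t_pub_batch b ON t.batch_id = b.batch_id
WHERE t.org_id = '201'
  AND b.batch_name LIKE '%t%';
```

If the two outputs differ in operator tree or in estimated rows, the "same plan" assumption would not hold for that query shape, and the same comparison would be worth repeating on the full statements.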
MySQL [db1]> select count(*) from (select batch_name from t_pub_batch where batch_name like '%t%') a;
+----------+
| count(*) |
+----------+
|    25739 |
+----------+
1 row in set (3.05 sec)

MySQL [db1]> select count(*) from (select batch_name from t_pub_batch where batch_name like '%test%') a;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (3.04 sec)
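As shown, TiDB handles the like query for these two values with no noticeable difference in efficiency (3.05 s versus 3.04 s), so the problem does not lie in the like query itself, and the result set for like '%t%' is indeed much larger (25739 rows versus 2). So the question is: what makes the execution times of the SQL above defy common sense? I am noting the problem down here for now and am still looking for the answer.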
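A possible next step, sketched here rather than taken from the tests above, is EXPLAIN ANALYZE, which in TiDB actually executes the statement and reports actual row counts and per-operator execution information next to the optimizer's estimates. The sketch uses the single-table statements from the test. Putting the same prefix in front of the two full statements should show which operator accounts for the gap between 3.54 s and 0.67 s.

```sql
-- Executes the statement and reports runtime details for each operator.
EXPLAIN ANALYZE
SELECT count(*)
FROM (SELECT batch_name FROM t_pub_batch WHERE batch_name LIKE '%test%') a;

-- Same check for the single-character pattern; compare actual rows
-- and execution info per operator with the output above.
EXPLAIN ANALYZE
SELECT count(*)
FROM (SELECT batch_name FROM t_pub_batch WHERE batch_name LIKE '%t%') a;
```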
