MySQL: creating a custom pseudo-hash index

Why emulate a hash index on the InnoDB storage engine: InnoDB does not let you create hash indexes, only B-tree indexes.

-- Create the resource table
CREATE TABLE `my_resource`  (
  `id` int(32) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `resource_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'resource name',
  `type` int(2) NOT NULL DEFAULT 0 COMMENT 'resource type: 0 other, 1 video, 2 file',
  `url` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'resource URL',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;


-- Create an index on the url column for lookups
create index idx_url on my_resource (url asc);


-- Insert sample data
insert into my_resource (resource_name,type,url) values('庆余年_1',1,'https://v.qq.com/x/cover/rjae621myqca41h/Cz4BRyhUY3.html');
insert into my_resource (resource_name,type,url) values('庆余年_2',1,'https://v.qq.com/x/cover/rjae621myqca41h/tlTt4CiBc1.html');
insert into my_resource (resource_name,type,url) values('庆余年_3',1,'https://v.qq.com/x/cover/rjae621myqca41h/QdTUsz40Cb.html');
insert into my_resource (resource_name,type,url) values('庆余年_4',1,'https://v.qq.com/x/cover/rjae621myqca41h/dFdlXmOd9n.html');
insert into my_resource (resource_name,type,url) values('庆余年_5',1,'https://v.qq.com/x/cover/rjae621myqca41h/TsvFR6b8wg.html');
insert into my_resource (resource_name,type,url) values('庆余年_6',1,'https://v.qq.com/x/cover/rjae621myqca41h/YgWeaPrYpc.html');


-- Inspect the execution plan
explain select url from my_resource order by url ;

As the execution plan shows, the query uses the index idx_url, but key_len is 767 (255 characters x 3 bytes per utf8 character, plus 2 bytes for the length). When optimizing an index, the key should be kept as short as possible.

Reason: the index length directly determines the size of the index file, affects the speed of inserts, updates, and deletes, and indirectly affects query speed (a larger index takes up more memory).
         To shorten the index, index only a prefix of the column value, cut from the left:
         1: the shorter the prefix, the more values repeat, the lower the selectivity, and the less effective the index.
         2: the longer the prefix, the fewer values repeat and the higher the selectivity, so the index is more effective, but the cost is greater: inserts, updates, and deletes are slower, and query speed suffers indirectly.
Therefore, strike a balance between selectivity and length.

 

Solution: try prefixes of different lengths and measure the selectivity of each.

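-- Take the first N characters of url (N = 15 here; try several values) and
-- compute the ratio of distinct prefixes to the total number of rows.
-- LEFT() is used because a prefix index covers the leftmost characters.
select count(distinct left(url,15))/count(*) from my_resource;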

A ratio of 1 is the best possible selectivity: every prefix is distinct.
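To compare several lengths in one pass, the selectivity expression can be repeated per length. The sketch below is an illustration only: the lengths 20, 42, and 43 and the index name idx_url_prefix are assumptions chosen for the sample rows, whose URLs share their first 41 characters, so only longer prefixes can tell them apart.

```sql
-- Selectivity of several candidate prefix lengths, side by side.
select
  count(distinct left(url, 20)) / count(*) as sel_20,
  count(distinct left(url, 42)) / count(*) as sel_42,
  count(distinct left(url, 43)) / count(*) as sel_43
from my_resource;

-- Once a length with acceptable selectivity is found, a prefix index
-- could be created on it (the index name is hypothetical).
create index idx_url_prefix on my_resource (url(43));
```

The rest of this article does not rely on such a prefix index; it takes the pseudo-hash route instead.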

Custom pseudo-hash index (the core idea)

Premise

Thinking about data types, a natural question arises: instead of comparing the column as a string, could we compare numbers?
This is the direction of the optimization. At the lowest level all data is binary. A number can be compared directly on its binary value, whereas a string has to be compared character by character, with each character looked up through the character set and collation rules, which adds extra work. Strings also take up much more space than numbers, so a page holds fewer index entries, and more pages have to be read.

Following this direction, try a custom hash index. Common hash functions include MD5, CRC32, and SHA1; of these, only CRC32 returns a numeric value.
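The difference in return types can be checked directly. In MySQL, CRC32() returns a 32-bit unsigned integer, while MD5() and SHA1() return hexadecimal strings of 32 and 40 characters. The user variable @u below is just a convenience introduced for this sketch.

```sql
-- Sample input: one of the URLs inserted earlier.
set @u = 'https://v.qq.com/x/cover/rjae621myqca41h/Cz4BRyhUY3.html';

select
  crc32(@u)        as crc32_value,  -- unsigned integer, fits easily in BIGINT
  md5(@u)          as md5_value,    -- hex string
  length(md5(@u))  as md5_length,   -- 32
  sha1(@u)         as sha1_value,   -- hex string
  length(sha1(@u)) as sha1_length;  -- 40
```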

Pseudo-hash index creation:

1. Add a column to the table that stores the hash of the field value, and put an index on that column (a B-tree index in InnoDB).

-- Add a bigint column that stores the pseudo-hash value of url
ALTER TABLE my_resource ADD COLUMN `url_crc32` bigint(10) NOT NULL COMMENT 'pseudo-hash value of url' AFTER `url`;
-- Create an index on the new column
create index idx_url_crc32 on my_resource (url_crc32 asc);
-- Drop the index on url
drop index idx_url on my_resource;

2. Create a trigger

delimiter $$
CREATE TRIGGER my_resource_url_crc32_trigger BEFORE INSERT ON my_resource FOR EACH ROW 
BEGIN 
SET NEW.url_crc32 = crc32(NEW.url); -- compute crc32 of url and store the result in url_crc32
END; $$
delimiter ;
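The trigger above fires only on INSERT. If url can ever be updated, the stored hash would go stale, so an analogous BEFORE UPDATE trigger is probably needed as well. A minimal sketch, with a hypothetical trigger name:

```sql
delimiter $$
CREATE TRIGGER my_resource_url_crc32_update_trigger BEFORE UPDATE ON my_resource FOR EACH ROW
BEGIN
  -- Recompute the hash so it always matches the current url.
  SET NEW.url_crc32 = crc32(NEW.url);
END$$
delimiter ;
```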

3. Insert new data and view the results
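A sketch of this step, under two assumptions: rows inserted before the trigger existed still need their hash filled in, and the new row uses a made-up resource name and example.com URL purely for illustration.

```sql
-- Backfill rows that were inserted before the trigger was created.
update my_resource set url_crc32 = crc32(url);

-- Insert a new row; the trigger is expected to fill url_crc32.
-- (Hypothetical sample values. If the server rejects the omitted
-- NOT NULL column, pass 0 explicitly; the trigger overwrites it.)
insert into my_resource (resource_name, type, url)
values ('sample_video', 1, 'https://example.com/video/sample.html');

-- View the results: every row should now carry a hash value.
select id, url, url_crc32 from my_resource;
```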

4. View the execution plan

explain select url_crc32 from my_resource order by url_crc32;

key_len is now 8 (the size of a BIGINT), down from 767 for the index on url.
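The ORDER BY query only demonstrates the shorter key. For an actual lookup, a common pattern (not shown in the original steps, so treat it as a sketch) filters on the hashed column so that idx_url_crc32 can be used, and also compares the full url so that a CRC32 collision cannot return the wrong row.

```sql
explain
select id, resource_name, url
from my_resource
where url_crc32 = crc32('https://v.qq.com/x/cover/rjae621myqca41h/Cz4BRyhUY3.html')
  -- the second condition filters out rows whose hash merely collides
  and url = 'https://v.qq.com/x/cover/rjae621myqca41h/Cz4BRyhUY3.html';
```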

5. Shortcomings of hash indexes

1) A hash cannot handle range comparisons, only equality comparisons.

2) A hash cannot be used for sorting, because hash values are effectively randomly distributed. (The ORDER BY in this example is only there to show the effect in the execution plan; it is not a meaningful sort.)

3) Partial (prefix) indexing is not supported, for example index a(10).

4) A hash index cannot serve as a covering index.

5) Hash collisions: the more collisions there are, the higher the cost of handling them.
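To get a feel for point 5 on real data, one can check whether any two different URLs currently share a hash value. This is an illustrative query, not part of the original steps; the alias distinct_urls is introduced here.

```sql
-- List hash values shared by more than one distinct url.
-- An empty result means there are no collisions in the current data.
select url_crc32, count(distinct url) as distinct_urls
from my_resource
group by url_crc32
having count(distinct url) > 1;
```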

 


Origin blog.csdn.net/qq_22049773/article/details/103881285