Principles of Three Hash Modulo Algorithms in Distributed Systems - Ordinary Hash Modulo, Hash Ring, Hash Slot

Foreword

1. Hash-based key placement is widely used in distributed cache clusters. It generally comes in three forms: ordinary hash modulo, consistent hashing (the hash ring), and hash slots.
2. Usage scenario: suppose there is a user registration system whose number of users keeps growing, so the user data has to be spread across several servers.

1. Ordinary Hash modulo

1. Start by creating 4 servers (called canisters here), then hash each registered user ID and take the result modulo the number of canisters. For example, if the user ID is "matt" and hashing it yields 17, then 17 % 4 = 1, so "matt" is stored on canister 1. When another user registers, repeat the step: if the hash is 23, then 23 % 4 = 3 and the user goes to canister 3. Reads work the same way: hash the ID, take the remainder, and read from the corresponding canister.
2. The problem with this scheme appears when all 4 canisters are full and a fifth has to be added. The divisor changes from 4 to 5, so most IDs now map to a different canister than the one that holds their data. For example, 17 % 5 = 2, so "matt" would be looked up on canister 2 although it was stored on canister 1. Stored data can no longer be found through the hash, and the only remedy is to migrate nearly all of the data and rebuild the mapping.
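The remapping problem can be measured with a small sketch. Everything here is an assumption made for illustration: the SHA-256 based `stable_hash`, the helper names, and the synthetic `user0` ... `user9999` IDs are not from the original article. For a uniform hash, a key keeps its canister only when its hash leaves the same remainder modulo 4 and modulo 5, which is the case for about one key in five.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic integer hash (assumed here; any stable hash would do).

    Python's built-in hash() is randomized per process for strings,
    so it is not suitable for placing data across servers.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_canister(key: str, num_canisters: int) -> int:
    """Ordinary hash modulo: hash the key, then take the remainder."""
    return stable_hash(key) % num_canisters


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose canister changes when the count goes old_n -> new_n."""
    moved = sum(1 for k in keys if pick_canister(k, old_n) != pick_canister(k, new_n))
    return moved / len(keys)


# Hypothetical user IDs, only for the experiment.
keys = [f"user{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys change canister when growing from 4 to 5")
```

The large majority of keys would have to be migrated for a single added canister, which is the motivation for the hash ring in the next section.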

2. Hash ring

1. The hash ring solves the dynamic expansion problem. At initialization, create one user canister and map its ID onto the ring by hashing the canister ID and taking the result modulo 2^32. Each user ID is mapped onto the same ring in the same way (hash of the ID modulo 2^32). A user is stored on the first canister found by moving clockwise from the user's position on the ring. Since only Canister 1 exists at this point, every user resolves to Canister 1.
[Figure: user IDs mapped onto the hash ring, all resolving clockwise to Canister 1]
2. When Canister 1 reaches its rated capacity, Canister 2 is created dynamically and the hash of its ID is mapped onto the ring as well. The clockwise storage rule on the ring then has to be re-evaluated.
The stored IDs are then checked against the new ring: each ID belongs to the first canister clockwise from its hash.
This causes data migration. While there was only one canister, the nearest canister clockwise from every ID hash was Canister 1. Now that Canister 2 has been inserted, data index 1 and data index 2, which were originally stored on Canister 1, are migrated to Canister 2 so that a query by ID finds the right canister.
Queries follow the same rule: hash the ID and move clockwise to the nearest canister.
3. When Canister 2 reaches its rated capacity, create Canister 3 dynamically, map it onto the ring, and repeat the migration step above. Each new canister therefore causes only a small amount of data migration, because only the IDs that lie between the new canister and the canister before it on the ring have to move. For example, when Canister 3 is added, data index 1, which belonged to Canister 2, is migrated to the newly created Canister 3, so inserts and lookups on the ring stay consistent.
4. One problem with the hash ring is hash skew: one canister may own a large arc of the ring while another owns a small one, so some canisters hold a lot of data and others very little. To avoid this, each real node is represented by many virtual nodes spread around the ring so that the ring space is shared roughly evenly. Data that lands on a virtual node is then mapped to its real node, which evens out the load.
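The ring steps above, including virtual nodes, can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the `HashRing` class, the SHA-256 based `ring_hash`, the `canister#i` naming of virtual nodes, and the count of 100 virtual nodes per canister are all hypothetical choices that do not come from the original article.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # positions on the ring are 0 .. 2^32 - 1


def ring_hash(key: str) -> int:
    """Map a string to a ring position (assumed hash: SHA-256, reduced modulo 2^32)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class HashRing:
    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per real canister
        self._points = []      # sorted ring positions of all virtual nodes
        self._owner = {}       # ring position -> real canister name

    def add(self, canister: str) -> None:
        """Place the canister's virtual nodes on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{canister}#{i}")
            if point in self._owner:
                continue  # position already taken (very unlikely); skip it
            bisect.insort(self._points, point)
            self._owner[point] = canister

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        idx = bisect.bisect_left(self._points, ring_hash(key))
        if idx == len(self._points):
            idx = 0  # wrap around past the largest position
        return self._owner[self._points[idx]]


# Hypothetical user IDs, only for the experiment.
keys = [f"user{i}" for i in range(10_000)]
ring = HashRing(vnodes=100)
ring.add("canister-1")
ring.add("canister-2")
before = {k: ring.lookup(k) for k in keys}

ring.add("canister-3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to canister-3")
```

Every key that changes owner moves to the newly added canister, and keys never move between the existing canisters. With more virtual nodes per canister the shares tend to become more even, at the cost of a larger sorted list to search.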

3. Hash slot

1. The hash slot works on a principle similar to the hash ring, except that the mapping is fixed in the initial state. The slot is computed as crc16(ID) % 16384, which gives a value from 0 to 16383 (this is the slot count used by Redis Cluster), and data is stored according to a table that maps slot ranges to canisters. For example, when the value is between 0 and 3000 the ID is stored on canister 1, when it is between 3001 and 6000 it is stored on canister 2, and so on.
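2. When a new canister is added, adjust the mapping table to assign some slots to it, then migrate the data in those slots to the new canister. In theory, hash slots are less prone to hash skew than the hash ring, because the slot ranges are assigned explicitly instead of depending on where the node hashes happen to land.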
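A slot table along these lines can be sketched as below. Several things are assumptions for illustration: the CRC16 variant (the CCITT polynomial with initial value 0, as computed by Python's `binascii.crc_hqx`), the helper names, the third slot range, and the slots handed to the hypothetical `canister-4`. Only the first two ranges follow the example in the text.

```python
import binascii

NUM_SLOTS = 16384  # slots are numbered 0 .. 16383


def slot_for(key: str) -> int:
    """crc16(key) % 16384, using the CRC-CCITT polynomial with initial value 0."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def build_table(ranges):
    """Expand (first_slot, last_slot, canister) ranges into a per-slot owner list."""
    table = [None] * NUM_SLOTS
    for first, last, canister in ranges:
        for slot in range(first, last + 1):
            table[slot] = canister
    return table


# Initial mapping; the third range is a hypothetical completion of "and so on".
table = build_table([
    (0, 3000, "canister-1"),
    (3001, 6000, "canister-2"),
    (6001, 16383, "canister-3"),
])

# Hypothetical user IDs, only for the experiment.
keys = [f"user{i}" for i in range(10_000)]
before = {k: table[slot_for(k)] for k in keys}

# Add a canister: reassign a block of slots in the table, then migrate their data.
for slot in range(12000, NUM_SLOTS):
    table[slot] = "canister-4"

after = {k: table[slot_for(k)] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to canister-4")
```

A key's slot number never changes; only the owner of a slot does. Migration is therefore limited to the keys whose slots were reassigned, and the operator decides exactly which slots those are.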

Reprinted from: blog.csdn.net/matt45m/article/details/126091236