Distributed data storage and consistent hashing

Data distribution design principles

Data uniformity: data should be spread across the storage nodes as evenly as possible, and user access should be balanced across nodes as well

Data stability: when a storage node fails and has to be removed, or when the cluster is scaled out, the placement computed by the distribution rules should stay as stable as possible, without large-scale data migration

Heterogeneous nodes: the hardware configurations of different storage nodes may differ considerably

Fault domain isolation: isolate fault domains to ensure the reliability and availability of the data

Performance stability: the efficiency of data storage and queries should be guaranteed

Data distribution methods

Hash

The core idea: choose a hash function and compute the storage node for each piece of data from its hash value. Suitable for scenarios where the nodes are of the same type and the number of nodes is relatively fixed
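A minimal sketch of plain hash placement, assuming the common "hash modulo node count" rule (the source does not name a specific function). The names stable_hash and node_for and the node labels are hypothetical. The sketch also shows why this method fits a fixed node count: going from 3 to 4 nodes is expected to remap roughly three quarters of the keys.

```python
import hashlib

def stable_hash(key):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def node_for(key, nodes):
    """Pick a node by hashing the key modulo the number of nodes."""
    return nodes[stable_hash(key) % len(nodes)]

# Hypothetical node names and keys, for illustration only.
keys = [f"key-{i}" for i in range(1000)]
three = ["node-a", "node-b", "node-c"]
four = three + ["node-d"]

# Count how many keys land on a different node once a fourth node is added.
moved = sum(node_for(k, three) != node_for(k, four) for k in keys)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys change node when a fourth node is added")
```

The placement itself is a single modulo operation, which is why the method is cheap when the cluster size does not change.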

Consistent hashing

The core idea: the storage nodes and the data are both hashed onto a ring whose ends are joined. A storage node is usually hashed by its IP address, and each piece of data belongs to the first storage node found clockwise from the data's position on the ring. This solves the stability problem of plain hashing well: when a node joins or leaves, only the adjacent node clockwise from it on the hash ring is affected. Suitable for scenarios where the nodes are of the same type and the number of nodes changes
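The ring lookup described above can be sketched as follows. This is an illustration under stated assumptions, not a production implementation: the HashRing class, the 32-bit ring size, the MD5-based ring_hash, and the IP addresses are all hypothetical, and hash collisions between nodes are ignored.

```python
import bisect
import hashlib

def ring_hash(value):
    """Map a string to a position on a ring of size 2**32 (assumed size)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of the nodes
        self._owner = {}   # ring position -> node
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = ring_hash(node)  # nodes are hashed by their IP address
        bisect.insort(self._points, point)
        self._owner[point] = node

    def remove(self, node):
        point = ring_hash(node)
        self._points.remove(point)
        del self._owner[point]

    def lookup(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_left(self._points, ring_hash(key))
        if idx == len(self._points):  # past the last node: wrap around
            idx = 0
        return self._owner[self._points[idx]]

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("10.0.0.3")
after = {k: ring.lookup(k) for k in keys}

# Only keys that lived on the removed node should have moved.
moved = [k for k in keys if before[k] != after[k]]
only_removed_node_keys_moved = all(before[k] == "10.0.0.3" for k in moved)
print(len(moved), "keys moved;", only_removed_node_keys_moved)
```

Keeping the node positions in a sorted list makes each lookup a binary search, and removing a node hands its keys to its clockwise successor while every other key stays where it was.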

Consistent hashing with bounded loads

The core idea: each storage node is given an upper limit on the amount of data it may store, which controls the non-uniformity caused by adding or removing storage nodes
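A minimal sketch of the per-node limit, under assumptions the source does not spell out: when the first clockwise node is already at its limit, the key moves on to the next node on the ring, and the limit is set to 1.25 times the average load. The class name BoundedLoadRing, the factor 1.25, and the addresses are hypothetical.

```python
import bisect
import hashlib
import math

def ring_hash(value):
    """Map a string to a position on a ring of size 2**32 (assumed size)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2**32)

class BoundedLoadRing:
    def __init__(self, nodes, max_keys_per_node):
        self.max_keys = max_keys_per_node
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._points = [point for point, _ in self._ring]
        self.load = {n: 0 for n in nodes}  # keys currently held per node
        self.placement = {}                # key -> node

    def assign(self, key):
        """Place a key on the first clockwise node that is below its limit."""
        if key in self.placement:          # already placed: do not count twice
            return self.placement[key]
        start = bisect.bisect_left(self._points, ring_hash(key))
        for step in range(len(self._ring)):
            _, node = self._ring[(start + step) % len(self._ring)]
            if self.load[node] < self.max_keys:
                self.load[node] += 1
                self.placement[key] = node
                return node
        raise RuntimeError("every node is at its limit")

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = [f"key-{i}" for i in range(1000)]
# Assumed policy: allow each node 25% more than the average load.
limit = math.ceil(1.25 * len(keys) / len(nodes))

ring = BoundedLoadRing(nodes, limit)
for k in keys:
    ring.assign(k)
print(ring.load)
```

With only four positions on the ring the arcs can be very uneven, so the limit is what keeps any single node from absorbing most of the keys.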

Consistent hashing with virtual nodes

The core idea: according to each node's performance, each node is assigned a different number of virtual nodes. The virtual nodes are mapped onto the hash ring, and the data is then mapped and stored according to the consistent hashing algorithm
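A sketch of weighted virtual nodes, assuming each node's performance is expressed as an integer weight and each unit of weight maps to 100 virtual nodes. The weights, the "#vn" label scheme, and the function names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value):
    """Map a string to a position on a ring of size 2**32 (assumed size)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2**32)

def build_ring(weights, vnodes_per_weight=100):
    """Place weight * vnodes_per_weight virtual nodes per physical node."""
    ring = []
    for node, weight in weights.items():
        for i in range(weight * vnodes_per_weight):
            # Each virtual node hashes a distinct label but maps back
            # to the same physical node.
            ring.append((ring_hash(f"{node}#vn{i}"), node))
    ring.sort()
    return ring

def lookup(ring, points, key):
    """Return the physical node owning the first virtual node clockwise."""
    idx = bisect.bisect_left(points, ring_hash(key))
    return ring[idx % len(ring)][1]  # modulo wraps past the last position

# Hypothetical weights: a stronger machine gets more virtual nodes.
weights = {"small": 1, "medium": 2, "large": 4}
ring = build_ring(weights)
points = [point for point, _ in ring]

counts = Counter(lookup(ring, points, f"key-{i}") for i in range(10000))
print(counts)
```

Because a node's share of the ring grows with its number of virtual nodes, the key counts should come out roughly in proportion to the weights, which is how this variant accommodates heterogeneous hardware.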

Comparison

The difference between data sharding and data partitioning

Data sharding divides along the data dimension: the data set is split into multiple subsets in some manner, and different subsets are stored on different storage blocks;

Data partitioning divides along the storage dimension: the storage blocks belong to different nodes and different physical partitions

Origin www.cnblogs.com/battlescars/p/hash.html