Big Data Basic Theory (1) - Data Sharding and Routing

1. Overall introduction

  Today, with data volumes surging, a single machine can no longer store or process datasets at the PB scale and beyond; large-scale clusters are required instead. The scalability of a system has therefore become an important measure of its quality.

  At present, mainstream big data storage and computing systems usually achieve scalability through horizontal expansion. The massive data to be stored and processed must first be partitioned and distributed across machines through data sharding. Once the data is sharded, the problem of finding where a given record is stored is called data routing.

  Data sharding and data replication are closely related. Horizontal scaling is achieved through data sharding, while high availability is guaranteed through data replication: to keep data available in the event of a system failure, the same data must be copied and stored in multiple places. Replication also improves read efficiency, since a client can choose the physically closest replica to read from, which increases the concurrency of read operations and the efficiency of individual reads.

  The figure below shows the relationship between data sharding and data replication:

[Figure: relationship between data sharding and data replication]

2. Abstract model

  The following figure shows a general abstract model of data sharding and routing:

[Figure: general model of data sharding and routing]

  We can view this as a two-level mapping. The first level is the key-partition mapping, which maps data records into the data shard space; this mapping is many-to-one, i.e., one data shard holds many records. The second level is the partition-machine mapping, which maps data shards onto physical machines; it is generally also many-to-one, i.e., one physical machine hosts multiple data shards.

  When sharding data, the data is horizontally partitioned into many shards according to the key-partition mapping, and the shards are then placed on the corresponding physical machines according to the partition-machine mapping.

  When routing data, to query the value of a record with Get(key), first find the corresponding data shard via the key-partition mapping, then look up the partition-machine table to determine which physical machine stores that shard; the value corresponding to the key can then be read from that machine.
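  To make the two-level mapping concrete, here is a minimal, self-contained Python sketch of the abstract model; all names (NUM_PARTITIONS, key_partition, partition_machine, route) are hypothetical, and CRC32 stands in for whatever hash function a real system would use:

```python
import zlib

NUM_PARTITIONS = 8  # size of the data shard space (illustrative)

# Second-level mapping: partition -> machine, many-to-one
# (here, three machines each host several partitions).
partition_machine = {p: f"machine-{p % 3}" for p in range(NUM_PARTITIONS)}

def key_partition(key: str) -> int:
    """First-level mapping: key -> partition, many-to-one."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def route(key: str) -> str:
    """Data routing for Get(key): locate the machine storing the record."""
    return partition_machine[key_partition(key)]

print(route("user:42"))  # e.g. machine-1
```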

3. Hash sharding

  Hash functions are commonly used for data sharding. Three widely used hash sharding methods are Round Robin, virtual buckets, and consistent hashing; each is introduced in detail below.

3.1 Round Robin

  Round Robin, commonly known as hash modulo, is one of the most widely used data sharding methods. With k physical machines, sharding is realized through the hash function:

H(key) = hash(key) mod k

  Number the physical machines from 0 to k-1. For a record whose primary key is key, H(key) gives the number of the physical machine that stores it, so all data can be spread across the k machines; lookups use the same hash function to locate the machine holding the data.
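  As a quick illustration (a sketch, not any particular system's implementation), hash-modulo sharding is essentially a one-liner; CRC32 is again only a stand-in hash:

```python
import zlib

K = 4  # number of physical machines, numbered 0 .. k-1

def machine_for(key: str, k: int = K) -> int:
    """H(key) = hash(key) mod k."""
    return zlib.crc32(key.encode()) % k

# Writes and reads apply the same function, so no routing table is needed.
print(machine_for("order-1001"))
```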

  The advantage of this method is its simplicity, but it lacks flexibility: if a new physical machine is added to the distributed system, the hash function becomes:

H(key) = hash(key) mod (k+1)

  In that case, the previously established mapping between records and the machines storing them is almost entirely disrupted, and the data must be redistributed according to the new hash function, as the following quick experiment shows.
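  Reusing machine_for from the sketch above: when going from k=4 to k=5 machines, a key keeps its placement only when hash(key) mod 20 falls in {0, 1, 2, 3}, so roughly 80% of all records must move:

```python
keys = [f"key-{i}" for i in range(100000)]
moved = sum(machine_for(k, 4) != machine_for(k, 5) for k in keys)
print(f"{moved / len(keys):.1%} of keys change machines")  # about 80%
```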

  Viewed through the abstract model of sharding and routing introduced above, the root cause of this inflexibility is that Round Robin merges the roles of physical machine and data shard: the mapping function and the machines are tightly coupled.

3.2 Virtual bucket

  Couchbase is an in-memory distributed NoSQL database. For shard management, it proposed the virtual bucket mechanism, which works as follows:

[Figure: Couchbase virtual bucket mechanism]

  A virtual bucket layer is inserted between the records and the physical machines: a record is first mapped to the corresponding virtual bucket by the hash function, and the mapping from virtual buckets to physical machines is then maintained through an explicit table.

  Compared with Round Robin, virtual buckets turn the original single-level mapping into a two-level one, which improves scalability: when a new physical machine is added, only the affected entries in the partition-machine mapping table need to be adjusted, making expansion far more flexible.
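  A sketch of the idea in Python follows; the bucket count of 1024 mirrors Couchbase's default vBucket number, but the table contents and machine names here are invented for illustration:

```python
import zlib

NUM_VBUCKETS = 1024

# Second-level mapping, kept as an explicit, editable table.
vbucket_machine = {vb: f"machine-{vb % 3}" for vb in range(NUM_VBUCKETS)}

def route(key: str) -> str:
    vb = zlib.crc32(key.encode()) % NUM_VBUCKETS  # key -> virtual bucket
    return vbucket_machine[vb]                    # virtual bucket -> machine

# Adding machine-3 only rewrites some table entries; the key -> vbucket
# mapping is untouched, so no records need to be re-hashed.
for vb in range(0, NUM_VBUCKETS, 4):
    vbucket_machine[vb] = "machine-3"
```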

3.3 Consistent hashing

  A distributed hash table (DHT) is a common technique in P2P networks and distributed storage. It is the distributed extension of a hash table: it considers how, in a distributed environment spanning multiple physical machines, to support add/delete/update/lookup operations on hashed data. Consistent hashing is one implementation of the DHT concept. This section mainly introduces the consistent hashing algorithm proposed in the Chord system.

  The following figure is a schematic of a consistent hashing algorithm with a hash space of length 2^5 (m=5), so the representable range of the hash space is 0~31, arranged as a ring. Each machine is mapped into the hash space by hashing its IP address and port number. The five nodes in the figure represent different machines, denoted Ni, where i is the node's position in the hash space; for example, node N14 stores the key-value data whose hashed primary keys fall in the range 6~14. Each machine node records the addresses of its predecessor and successor on the ring, forming a directed ring.

[Figure: hash directed ring]

Routing table

  Once the ring is constructed as above, there is no central management node in the P2P environment, so how are queries performed? The most intuitive way is to traverse the machine nodes Ni sequentially along the directed ring, but that is inefficient. To speed up queries, each machine node can be given a routing table. The routing table holds m entries, and the i-th entry (for i = 0 .. m-1) records the machine node responsible for the ring position at distance 2^i from the current node. For example, for node N14 in the figure above, the routing table is as follows:

| distance | 1 (2^0) | 2 (2^1) | 4 (2^2) | 8 (2^3) | 16 (2^4) |
|----------|---------|---------|---------|---------|----------|
| machine node | N20 | N20 | N20 | N25 | N5 |
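  The routing table above can be reproduced with a small sketch; m, the node list, and the helper names are taken from this example rather than from any Chord implementation:

```python
M = 5
RING = 2 ** M                      # hash space 0 .. 31
NODES = sorted([5, 14, 20, 25, 29])

def successor(point: int) -> int:
    """The first node at or clockwise after `point` on the ring."""
    for n in NODES:
        if n >= point:
            return n
    return NODES[0]                # wrap around past 31

def finger_table(n: int):
    """The i-th entry covers the ring position 2**i past node n."""
    return [(2 ** i, successor((n + 2 ** i) % RING)) for i in range(M)]

print(finger_table(14))
# [(1, 20), (2, 20), (4, 20), (8, 25), (16, 5)] -- matches the table above
```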

Consistent Hash Routing Algorithm

  With the routing table, you can use the consistent hash routing algorithm to query.

  Algorithm idea: the query is completed cooperatively by passing messages between nodes. Let Nc be the node currently executing the operation (initially the node Ni that received the request), let Ns be the successor of Nc, and let j = H(key) be the hash of the key being queried.

  Step 1: Check whether c < j <= s (as an interval on the ring). If true, the search ends: Nc sends a message to Ns asking for the value of the key, and Ns returns the query result to Ni (every message carries the identity of its originator Ni).

  Step 2: If the check in step 1 fails, Nc searches its routing table for Nh, the node with the largest number smaller than j, and sends a message to Nh asking it to continue the search on behalf of Ni. Nh then becomes the current node Nc, and the query proceeds recursively through steps 1 and 2.
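  Continuing the finger-table sketch above, the two steps translate into a short recursive routine. Interval arithmetic on the ring is the only subtle part; the fallback to the successor when no preceding finger exists is an assumption of this sketch:

```python
def in_range(j: int, lo: int, hi: int) -> bool:
    """True if j lies in the ring interval (lo, hi]."""
    if lo < hi:
        return lo < j <= hi
    return j > lo or j <= hi       # the interval wraps past 0

def find_node(nc: int, j: int) -> int:
    """Node that stores hash value j, searching from node nc."""
    ns = successor((nc + 1) % RING)          # successor of nc
    if in_range(j, nc, ns):                  # step 1: Ns holds j
        return ns
    # step 2: largest routing-table node preceding j, then recurse
    preceding = [n for _, n in finger_table(nc) if in_range(n, nc, (j - 1) % RING)]
    nh = preceding[-1] if preceding else ns
    return find_node(nh, j)

print(find_node(14, 27))  # -> 29, matching the walkthrough below
```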

  For example, as shown in the figure below, a query for a key with H(key)=27 arrives at node N14. By step 1 of the algorithm, N14 finds that 27 is not covered by its successor N20, so it enters step 2: it searches its routing table, finds N25, the node with the largest number less than 27, and sends the request to N25 to look up the key on its behalf. N25 applies step 1 and finds that 27 falls within its successor N29's range, so it asks N29 to perform the query. After completing the query, N29 returns the corresponding value directly to N14. Why to N14? Because, as noted above, every message carries the identity of its originator, N14.

[Figure: routing a query with H(key)=27 starting from N14]

  According to this example, it can be seen that the search process of the consistent hash routing algorithm is similar to a binary search.

New node

  To add a new machine node Nnew to the P2P network, it first connects to an arbitrary existing node Nx, uses Nx's routing algorithm to look up the hash value Hash(Nnew) = new, and thereby finds Nnew's successor Ns; let Np denote the predecessor of Ns. For Nnew to join the P2P network, the relationships among these nodes must be re-established.

In the non-concurrent case, two steps are performed:

  step1. Update the predecessor and successor records of Np, Nnew, and Ns accordingly;
  step2. Re-shard and redistribute the data: migrate the data stored on Ns that should now be owned by Nnew over to Nnew;

In the concurrent case, to guarantee data correctness, the following two steps are performed instead:

  step1. Set the successor of Nnew to Ns and its predecessor to null;
  step2. Rely on the periodic stability check that every node in the P2P network performs. This step completes the update of predecessor and successor pointers as well as the data migration; note that it is not run specially for newly joined nodes;

Stability Detection Algorithm

  Algorithm idea and process:

  step1. Assume Ns is the successor of Nc. Nc asks Ns for its predecessor Np;

  step2. If Np lies between Nc and Ns, Nc records Np as its new successor;

  step3. Let Nx be the successor of Nc; it may be Ns or Np, depending on the previous step. If the predecessor of Nx is null, or Nc lies between Nx and its predecessor, then Nc sends a message telling Nx that Nc is its predecessor, and Nx sets its predecessor to Nc;

  step4. Nx migrates the portion of its data that now belongs to Nc (records with hash values no greater than c) to Nc; a runnable sketch follows below.
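  Here is a minimal sketch of these steps with plain Python objects (the data migration in step 4 is only noted in a comment); the Node class and helper names are invented for this illustration:

```python
class Node:
    def __init__(self, nid: int):
        self.id = nid
        self.predecessor = None    # None plays the role of "null"
        self.successor = None

def between(x: int, lo: int, hi: int) -> bool:
    """True if x lies strictly inside the ring interval (lo, hi)."""
    if lo < hi:
        return lo < x < hi
    return x > lo or x < hi        # the interval wraps past 0

def stabilize(nc: Node) -> None:
    ns = nc.successor
    np_ = ns.predecessor                        # step 1: ask Ns for Np
    if np_ is not None and between(np_.id, nc.id, ns.id):
        nc.successor = np_                      # step 2: closer successor found
    nx = nc.successor                           # step 3: notify the successor
    if nx.predecessor is None or between(nc.id, nx.predecessor.id, nx.id):
        nx.predecessor = nc
        # step 4 would migrate records with hash values <= nc.id to nc here

# Reproducing the N5/N8/N14 example analyzed below:
n5, n8, n14 = Node(5), Node(8), Node(14)
n5.successor, n5.predecessor = n14, n14         # stable two-node ring
n14.successor, n14.predecessor = n5, n5
n8.successor = n14                              # N8 joins: step 1 of the join
stabilize(n8)                                   # N14's predecessor becomes N8
stabilize(n5)                                   # N5 -> N8, N8's predecessor = N5
print(n5.successor.id, n8.predecessor.id)       # 8 5
```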

  Let's analyze an example. Starting from a stable P2P environment containing N5 and N14, the state after N8 joins is as follows:

[Figure: state after N8 joins]

  step1. N8 starts a stability check and finds that the predecessor of N14 is N5; it enters step 2.

  step2. Since N5 is not between N8 and N14, nothing is done; it enters step 3.

  step3. N8 lies between N14 and N14's predecessor N5, so N8 tells N14 that its predecessor should be N8, and N14 points its predecessor to N8. The state is now:

[Figure: N14's predecessor now points to N8]

  step4. The records on N14 whose hash values fall between 6 and 8 are migrated to N8. With that, N8 has completed one round of the stability check.

  After N8's stability check completes, N5's own check runs some time later. N14 tells N5 that its predecessor is now N8 rather than N5 itself, so N5 changes its successor to N8. Because N8's predecessor is null, N5 informs N8 to take N5 as its predecessor, and N8 sets its predecessor to N5. After N5's stability check, the system state becomes the figure below:

[Figure: final state after N5's stability check]

Node leaves the P2P network

  There are two ways a node can leave the P2P network; here we mainly discuss normal departure. Before leaving normally, a node makes preparations: it notifies the relevant nodes to update their predecessor and successor pointers, then migrates the data it holds to its successor. Because the departure invalidates entries in other machines' routing tables, those entries can be refreshed using the method described above for joining a new node.

  Abnormal departure is generally caused by machine failure; its impact can be mitigated by keeping replicas of the same data on multiple machines.

Virtual node

  Because the hash function maps machines to essentially random positions on the ring, the load can become unbalanced. Moreover, machine heterogeneity is common: configurations range from high-end to low-end, yet consistent hashing treats all machines equally, so a low-end machine may end up carrying a high load.

  Dynamo modified consistent hashing by introducing the concept of virtual nodes: each physical node is virtualized into several virtual nodes, which are mapped to different positions on the ring. On the one hand this balances the load as much as possible; it also accommodates machine heterogeneity, since a more capable machine can be assigned more virtual nodes.
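  One common way to realize this, sketched below under the assumption of an MD5-based 32-bit ring (Dynamo's actual implementation details differ), is to hash each machine several times, with stronger machines receiving more virtual nodes:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Map a string to a point on a 32-bit ring."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self):
        self.ring = []                         # sorted (point, machine) pairs

    def add(self, machine: str, vnodes: int) -> None:
        for i in range(vnodes):                # one ring position per virtual node
            bisect.insort(self.ring, (h(f"{machine}#vn{i}"), machine))

    def route(self, key: str) -> str:
        points = [p for p, _ in self.ring]
        idx = bisect.bisect_left(points, h(key)) % len(self.ring)
        return self.ring[idx][1]               # first machine clockwise from the key

ring = ConsistentHashRing()
ring.add("machine-a", vnodes=100)              # high-end machine: more virtual nodes
ring.add("machine-b", vnodes=50)               # low-end machine: fewer, hence less load
print(ring.route("user:42"))
```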

4. Range sharding

  Range sharding first sorts all records by primary key, then divides the primary key space into data shards, each shard storing all records whose primary keys fall within its range. The storage system maintains a shard mapping table in which each entry records a shard's minimum primary key and the address of the physical machine holding it.

[Figure: range sharding]
  When records are added, deleted, modified, or queried, the physical machine holding the relevant shard is found by looking up the mapping table, as sketched below. On each physical machine, shards are generally managed with an LSM tree, a data index structure optimized for efficient writes.
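  A minimal routing sketch for range sharding follows; the shard boundaries and machine addresses are invented, and bisect stands in for whatever index a real system keeps over its mapping table:

```python
import bisect

# Shard mapping table: (minimum primary key of the shard, machine address),
# kept sorted by minimum key.
shard_table = [
    ("a", "10.0.0.1:9000"),
    ("h", "10.0.0.2:9000"),
    ("p", "10.0.0.3:9000"),
]
min_keys = [mk for mk, _ in shard_table]

def route(primary_key: str) -> str:
    """Binary-search the mapping table for the shard covering the key."""
    idx = bisect.bisect_right(min_keys, primary_key) - 1
    return shard_table[max(idx, 0)][1]

print(route("monkey"))  # -> 10.0.0.2:9000, the shard starting at "h"
```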
