Distributed Storage System Design (3): Storage Structure (repost)

NoSQL storage systems generally adopt the Key-Value data type. The Key-Value structure is simple and easy to store, which makes it a good fit for distributed NoSQL storage. However, such a simple data type limits what a business can store; for example, some businesses need to store list-type data. To address this, the system extends the Key-Value type to support storing multiple fields and lists under one key, broadening the business scenarios the store can serve. This article introduces the data types supported by this distributed storage system and how the data is laid out in memory.

Data types

  1. Key-Value

Key-Value is the simplest data type. Neither the key nor the value carries any structure: the business serializes data on write and deserializes it on read, while the system treats everything as an opaque binary sequence. This generality lets a key or value hold any type of data, as long as the serialized length does not exceed the system's limit.

  2. Key-Fields

Key-Fields supports multiple fields under one key, with each field identified by an integer tag. The fields corresponding to different tags may have different lengths. Field contents are also serialized, opaque data, which keeps the type general. For data that naturally has several fields per key, Key-Fields is much friendlier than packing everything into a single value.

  3. Key-Rows

Key-Rows supports multiple rows under one key; each row is itself a Key-Fields record, so the number and length of fields can differ from row to row, which is very flexible. Key-Rows can store lists and therefore supports more complex business scenarios, such as keeping a user's shopping order history. A sketch covering all three types follows this list.
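To make the three types concrete, here is a minimal in-memory sketch. All struct and field names are hypothetical, not taken from the system itself; it only illustrates the shape of the data described above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Key-Value: key and value are opaque byte sequences; the storage
// layer never interprets them, so any serialized data fits.
struct KeyValue {
    std::string key;    // serialized key, opaque to the store
    std::string value;  // serialized value, opaque to the store
};

// Key-Fields: one key maps to multiple fields, each identified by an
// integer tag; field lengths may differ from tag to tag.
struct Field {
    uint32_t    tag;   // integer tag identifying the field
    std::string data;  // serialized field content, variable length
};

struct KeyFields {
    std::string        key;
    std::vector<Field> fields;
};

// Key-Rows: one key maps to a list of rows; each row is itself a
// Key-Fields-style record, so row widths and field lengths may vary.
struct KeyRows {
    std::string                     key;
    std::vector<std::vector<Field>> rows;  // e.g. one row per shopping order
};
```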

Multi-level hash table

All data lives in memory. To keep data from being lost when a process restarts, it is placed in shared memory, which also lets multiple processes access it. For Key-Value data, the most efficient structure is naturally a hash table. This section introduces a multi-level hash table implementation that is simple, efficient, and robust, making it well suited to production use.

The multi-level hash table resolves conflicts by re-hashing across levels. As shown in the figure below, a linear array is divided into N levels, and level i has a bucket size (number of slots) of Ni (1 ≤ i ≤ N). To insert a key, Hash(Key) % Ni gives the candidate position at each level. Probing starts at level 1; on a conflict it continues at the next level until a vacancy is found, and if all N levels conflict, the insertion fails. Lookups follow the same probing sequence.

Note that before inserting data, all levels must first be searched for an existing copy of the key, which costs some performance. Consider a key first inserted at level 3; later, a conflicting entry at level 2 on its probe path is deleted. If a subsequent write of the same key simply took the vacancy at level 2, the old copy at level 3 would be stranded and could never be cleaned up, wasting space. For NoSQL storage systems the share of newly inserted data is relatively small, and most operations are updates and reads, so the multi-level hash table maintains good performance in this scenario.
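Below is a simplified, single-process sketch of the probe and insert logic just described. The real table lives in shared memory and stores indices rather than strings; the class, method names, and use of std::hash are illustrative assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// A minimal multi-level hash table over a single linear array,
// resolving collisions by probing one slot per level.
class MultiLevelHash {
public:
    explicit MultiLevelHash(std::vector<size_t> bucket_sizes)
        : sizes_(std::move(bucket_sizes)) {
        size_t total = 0;
        for (size_t n : sizes_) { offsets_.push_back(total); total += n; }
        slots_.resize(total);  // one initially empty slot per array element
    }

    // Probe level by level: the slot at level i is offset_i + hash % N_i.
    std::optional<std::string> Find(const std::string& key) const {
        size_t h = std::hash<std::string>{}(key);
        for (size_t i = 0; i < sizes_.size(); ++i) {
            const Slot& s = slots_[offsets_[i] + h % sizes_[i]];
            if (s.used && s.key == key) return s.value;
        }
        return std::nullopt;
    }

    // Insert must first probe every level: if the key already exists at a
    // higher level, it is updated in place; taking an earlier vacancy
    // instead would strand a stale copy that could never be cleaned up.
    bool Insert(const std::string& key, const std::string& value) {
        size_t h = std::hash<std::string>{}(key);
        Slot* first_free = nullptr;
        for (size_t i = 0; i < sizes_.size(); ++i) {
            Slot& s = slots_[offsets_[i] + h % sizes_[i]];
            if (s.used && s.key == key) { s.value = value; return true; }
            if (!s.used && first_free == nullptr) first_free = &s;
        }
        if (first_free == nullptr) return false;  // conflict at all N levels
        *first_free = Slot{true, key, value};
        return true;
    }

private:
    struct Slot {
        bool        used = false;
        std::string key;
        std::string value;
    };
    std::vector<size_t> sizes_;    // N_i: bucket size of each level
    std::vector<size_t> offsets_;  // start index of each level in slots_
    std::vector<Slot>   slots_;
};
```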

Two main metrics measure the quality of a multi-level hash table: space utilization and the average number of probes per lookup. Experiments show that more levels mean a stronger ability to absorb conflicts and therefore higher space utilization, but also more probes on average. The two metrics constrain each other and must be balanced for the actual workload.

The bucket size of each level should be a distinct prime number; it can be shown mathematically that when the per-level sizes are pairwise coprime, data spreads more uniformly across the levels, reducing the probability of collisions. The simplest choice is consecutive primes for consecutive levels. Is there a better choice, one that yields higher space utilization and fewer average probes at the same level count? When the levels are similar in size, the lower levels absorb more of the data, so making bucket sizes decrease from low levels to high levels improves space utilization. With the level count fixed, reducing the average probe count argues for making the low-level buckets as large as possible; at the same time, to preserve the high levels' ability to absorb conflicts, their buckets should not be too small. In summary, bucket sizes should decrease from low levels to high levels, falling steeply at the low end and gently at the high end, and a geometric sequence matches this shape well.

A comparison was run between bucket sizes following a geometric sequence with ratio 1.5 and uniformly distributed bucket sizes, inserting random keys and taking the space utilization at the first failed insertion as the result. At the same level count, geometric sizing gives higher space utilization and fewer average probes. Twenty levels is a good choice: the multi-level hash table then reaches about 93% space utilization with roughly 3 probes per lookup on average.
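As one way to realize this sizing scheme, the following sketch derives per-level bucket sizes from a decreasing geometric series with ratio 1.5 and rounds each size up to a prime. The capacity, level count, and function names are illustrative assumptions, not the system's actual parameters.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

static bool IsPrime(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

static uint64_t NextPrime(uint64_t n) {
    while (!IsPrime(n)) ++n;
    return n;
}

// Split `capacity` slots across `levels` levels so that level i is
// roughly `ratio` times the size of level i+1 (level 1 is the largest),
// then round each level's size up to a prime.
std::vector<uint64_t> BucketSizes(uint64_t capacity, int levels,
                                  double ratio = 1.5) {
    std::vector<double> weights(levels);
    double total_weight = 0.0, w = 1.0;
    for (int i = levels - 1; i >= 0; --i) {  // highest level gets weight 1
        weights[i] = w;
        total_weight += w;
        w *= ratio;
    }
    std::vector<uint64_t> sizes(levels);
    for (int i = 0; i < levels; ++i)
        sizes[i] = NextPrime(
            static_cast<uint64_t>(capacity * weights[i] / total_weight));
    return sizes;
}

int main() {
    // Example: one million slots over the 20 levels the experiment favors.
    for (uint64_t n : BucketSizes(1000000, 20))
        std::printf("%llu\n", static_cast<unsigned long long>(n));
}
```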

In the data types listed above, keys and data are variable-length, while hash table nodes are fixed-length; storing data directly in the nodes would waste space. To improve memory utilization, a structure combining the hash table with linked lists is used. As shown in the figure below, each hash node stores an index, and the key and data themselves are stored in a chain of fixed-length blocks. The number of blocks can then be allocated according to the data size, which is efficient and avoids waste. The block size itself needs careful tuning: if it is too large, the padding waste inside the last block grows; if it is too small, each record occupies more blocks, which hurts performance.
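Here is a sketch of that node-plus-block-chain layout. The block size, field names, and free-list allocator are illustrative assumptions; the real system manages these structures inside shared memory.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kInvalidBlock = 0xFFFFFFFF;
constexpr size_t   kBlockPayload = 64;  // balances padding waste in the
                                        // last block against chain length

struct Block {
    uint32_t next = kInvalidBlock;    // index of the next block in the chain
    uint8_t  payload[kBlockPayload];  // slice of the serialized key+data
};

struct HashSlot {
    bool     used = false;
    uint32_t key_hash = 0;          // quick filter before walking the chain
    uint32_t head = kInvalidBlock;  // first block of this record's chain
    uint32_t length = 0;            // total bytes of serialized key+data
};

// Copy `data` into as many fixed-length blocks as needed, pulled from a
// free list, and return the head index to store in the hash slot.
// Assumes the free list holds enough blocks for the request.
uint32_t StoreChain(std::vector<Block>& pool,
                    std::vector<uint32_t>& free_list,
                    const uint8_t* data, size_t len) {
    uint32_t head = kInvalidBlock, prev = kInvalidBlock;
    for (size_t off = 0; off < len; off += kBlockPayload) {
        uint32_t idx = free_list.back();
        free_list.pop_back();
        size_t n = std::min(kBlockPayload, len - off);
        std::memcpy(pool[idx].payload, data + off, n);
        pool[idx].next = kInvalidBlock;
        if (prev == kInvalidBlock) head = idx;
        else pool[prev].next = idx;
        prev = idx;
    }
    return head;
}
```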

Read-write conflict

For a given storage space, a single read-write process is the simplest design: there are no read-write conflicts, but the single process's throughput becomes a bottleneck. Raising throughput requires multi-process or multi-thread concurrency, which introduces read-write conflicts.

If multiple write processes operated on one storage space concurrently, a mutual exclusion mechanism would be required. Because the storage structure uses linked lists, a coarse-grained lock would be needed to prevent corrupted chains, and the locking would be complicated to implement. A previous article in this series proposed a better solution: divide a single machine's storage space into multiple storage units, each handled by exactly one write process. Write exclusivity is then guaranteed by the partitioning itself, and this lock-free design also performs better.
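A minimal sketch of how keys might be routed to single-writer units; the unit count and hash choice are assumptions for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <string>

constexpr uint32_t kNumUnits = 16;  // one dedicated write process per unit

// All writes for a key hash to the same unit, hence the same single
// writer; mutual exclusion falls out of the partitioning, with no lock.
uint32_t UnitFor(const std::string& key) {
    return static_cast<uint32_t>(std::hash<std::string>{}(key) % kNumUnits);
}
```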

With writer mutual exclusion solved, conflicts between the write process and read processes remain: if a read process reads a piece of data while it is being written, it may see incomplete data. This is solved with verification. A verification value is recorded with the data and updated on every write; a reader verifies the data after reading it, and if verification fails, a read-write conflict has occurred and the read must be retried. To reduce the probability of such conflicts, an update does not overwrite the original fixed-length block chain in place; instead, following the process shown in the figure below, a new block chain is allocated and the data is written there.
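The sketch below models the verification scheme in a single process, using a checksum as the verification value. The real system applies this to shared-memory block chains rather than strings, and the checksum choice, record layout, and retry count are assumptions.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// FNV-1a over the bytes; any checksum or version counter would do.
static uint32_t Checksum(const std::string& s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

struct Record {
    std::string data;      // stands in for the fixed-length block chain
    uint32_t    checksum;  // verification value, updated on every write
};

// Writer: stage the new value on a fresh chain (modeled by the local
// string) and publish it with its checksum. Because the old blocks are
// not overwritten in place, a concurrent reader mostly sees either the
// old chain or the new one, rarely a torn mix.
void Write(Record& r, std::string new_data) {
    uint32_t sum = Checksum(new_data);
    r.data = std::move(new_data);  // real system: swap the block-chain index
    r.checksum = sum;
}

// Reader: retry while verification fails, i.e. while a write races us.
std::optional<std::string> Read(const Record& r, int max_retries = 3) {
    for (int i = 0; i < max_retries; ++i) {
        std::string copy = r.data;
        if (Checksum(copy) == r.checksum) return copy;
    }
    return std::nullopt;  // persistent conflict; caller decides what to do
}
```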

Most businesses using distributed storage write rarely and read frequently. Under the conflict-handling mechanism above, one storage unit can serve one write process and multiple read processes simultaneously, giving read operations high concurrency, which fits write-light, read-heavy workloads very well.

Origin www.cnblogs.com/jerryliuxin/p/12698444.html