The data distribution principle of the Yunhong hyper-converged cluster

Hyper-convergence builds data centers by integrating computing, storage, network and virtualization resources on low-cost x86 servers, replacing traditional SAN storage with software-defined infrastructure and placing more emphasis on data management and control.

Yunhong hyper-convergence integrates the self-developed server virtualization platform CNware® with the high-performance distributed storage system WinStore. Yunhong's distributed storage is based on open-source Ceph, with in-depth optimization and additional feature development. In 2015, the first year of hyper-convergence, Yunhong launched Winhong HCI v1.0. WinStore runs inside the hypervisor in a modular fashion rather than inside a virtual machine; it combines the local SSDs and HDDs of multiple physical machines into a virtual storage pool, uses multiple x86 servers to share the storage load, and uses location servers to locate stored information. This not only improves the reliability, availability, and read/write efficiency of the system, but also makes it easy and convenient to scale out.

Data in a Yunhong hyper-converged cluster is distributed in multiple copies on a fully distributed architecture. The CRUSH algorithm ensures that data is spread as evenly as possible across the nodes, and the primary copies of the data are automatically distributed across different nodes.


To understand the data distribution of a hyper-converged cluster, several important concepts need to be introduced first:

Object - the logical object produced after a file is sliced; its maximum size is usually limited to 2 MB or 4 MB, so that the underlying storage can be organized and managed.

PG (Placement Group) - organizes and maps the storage of objects. Specifically, one PG is responsible for organizing several objects (thousands or even more), but an object can only be mapped to one PG; that is, PG and object have a "one-to-many" mapping relationship.

OSD (Object Storage Device) - its main functions include data storage, replica handling, data recovery, data backfilling, rebalancing of data distribution, and providing some data-related monitoring information. To effectively keep two copies of a piece of data, at least 2 OSDs are required. One PG is mapped to n OSDs, and each OSD carries a large number of PGs; that is, PG and OSD have a "many-to-many" mapping relationship. In practice, n is at least 2, and at least 3 in a production environment. A single OSD may host hundreds of PGs.
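As a rough illustration of these mapping relationships, consider the following minimal Python sketch. It is not the WinStore implementation; PG_NUM, REPLICAS and the function names are invented for the example, and a trivial stand-in replaces CRUSH.

import hashlib

PG_NUM = 128      # number of placement groups in the pool (illustrative)
REPLICAS = 3      # n: at least 2, and at least 3 in a production environment

def pg_for_object(object_id):
    # Every object maps to exactly one PG ("one-to-many": one PG, many objects).
    digest = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    return digest % PG_NUM

def osds_for_pg(pg_id, osd_ids):
    # Each PG maps to n OSDs, and each OSD carries many PGs ("many-to-many").
    # A real cluster uses CRUSH here; this stand-in just picks n distinct OSDs.
    start = pg_id % len(osd_ids)
    return [osd_ids[(start + i) % len(osd_ids)] for i in range(REPLICAS)]

osds = list(range(12))                  # a cluster with 12 OSDs
pg = pg_for_object("rbd_data.0001")     # object -> exactly one PG
print(pg, osds_for_pg(pg, osds))        # PG -> n OSDs (the first acts as the primary)

The stand-in osds_for_pg only shows the shape of the mapping; the real placement is computed by CRUSH, as described below.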


The hyper-converged cluster uses the CRUSH algorithm for data distribution. The CRUSH algorithm is the cornerstone of Yunhong distributed storage: it is a scalable pseudo-random data distribution algorithm used to control how data is placed, and it can distribute data efficiently and stably across a structured storage cluster.


Features of CRUSH Algorithm

1. Decentralized architecture with no metadata server, so read and write performance does not degrade as the cluster expands;

2. In the same environment, the results obtained from similar inputs show no correlation, while the result obtained from the same input is deterministic;

3. Data is distributed as evenly as possible across all hard disks on every node of the cluster;

4. When the number of storage targets changes because nodes are added or removed, the amount of data migrated within the cluster is minimized.
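A small experiment makes the second and third properties concrete. This is a sketch only: rendezvous (highest-random-weight) hashing stands in for CRUSH, and all names are invented for the illustration.

import hashlib
from collections import Counter

def score(pg_id, osd_id):
    return int(hashlib.md5(f"{pg_id}:{osd_id}".encode()).hexdigest(), 16)

def place(pg_id, osds, replicas=3):
    # Deterministic: the same pg_id and OSD set always give the same result.
    return sorted(osds, key=lambda osd: score(pg_id, osd), reverse=True)[:replicas]

osds = list(range(10))
primary_load = Counter(place(pg, osds)[0] for pg in range(1024))
print(place(42, osds) == place(42, osds))   # True: same input, same output
print(primary_load)                         # roughly even, about 100 primary PGs per OSD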


Data distribution process

With the concepts and the algorithm clarified, analyzing the data distribution process becomes much more straightforward.

In the data distribution of the hyper-converged cluster, virtualization and distributed storage are deployed together, and the virtual disks of WinServer virtual machines directly use the rbd block devices provided by WinStore distributed storage. When a file is written on a virtual machine, the file is first divided into multiple objects, and each object may be written to a different HDD (OSD), so the data is spread evenly across the entire cluster.
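A minimal sketch of the slicing step might look like this (assuming a 4 MB object size; slice_into_objects and the naming scheme are invented for illustration):

OBJECT_SIZE = 4 * 1024 * 1024   # assume 4 MB per object for this illustration

def slice_into_objects(image_name, data):
    # Yield (object_id, chunk) pairs; each chunk becomes an independent object
    # that may end up on a different OSD in the cluster.
    for offset in range(0, len(data), OBJECT_SIZE):
        chunk = data[offset:offset + OBJECT_SIZE]
        yield f"{image_name}.{offset // OBJECT_SIZE:016x}", chunk

for object_id, chunk in slice_into_objects("vm-disk-01", b"x" * (9 * 1024 * 1024)):
    print(object_id, len(chunk))   # three objects: 4 MB, 4 MB and 1 MB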


Specific operation process

1. WinStore first splits the file into multiple objects (2 MB each) and generates the object IDs;

2. Based on the pool to which the virtual disk's storage pool belongs and the object ID, the ID of the PG to which the object belongs is obtained through a hash and a modulo operation. When a PG is created in the Pool, the OSDs it should reside on are already calculated with the CRUSH algorithm and the PG is created there. In other words, by the time a client writes an object, the PG already exists and the mapping between PG and OSD has already been determined (see the sketch after this list);

3. The PG value is passed to the CRUSH algorithm, which returns the corresponding primary and replica OSDs;

4. After the data is written to the SSDs corresponding to the primary and replica OSDs, a write-success acknowledgement is returned to the upper-layer business system;

5. Finally, WinStore flushes the data from the SSDs to the persistent HDD disks according to certain rules.
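Putting the five steps together, a highly simplified sketch of the write path could look as follows. All names are invented for the illustration; a placeholder function stands in for CRUSH, and plain dictionaries stand in for the SSD and HDD tiers.

import hashlib

PG_NUM, REPLICAS = 128, 3
OSDS = list(range(12))
ssd_cache, hdd_store = {}, {}     # stand-ins for the SSD cache tier and the HDD tier

def crush(pg_id):
    # Placeholder for CRUSH: deterministically pick the primary and replica OSDs of a PG.
    ranked = sorted(OSDS, key=lambda osd: hashlib.md5(f"{pg_id}:{osd}".encode()).hexdigest())
    return ranked[:REPLICAS]

def write(object_id, data):
    pg_id = int(hashlib.md5(object_id.encode()).hexdigest(), 16) % PG_NUM   # step 2: hash + modulo
    for osd in crush(pg_id):                                                # step 3: primary and replicas
        ssd_cache[(osd, object_id)] = data                                  # step 4: acknowledge after the SSD write
    # step 5 happens later, driven by a flush policy (see flush_to_hdd below)

def flush_to_hdd():
    # Step 5: move cached data from the SSD tier to the persistent HDD tier.
    while ssd_cache:
        key, value = ssd_cache.popitem()
        hdd_store[key] = value

write("rbd_data.0001", b"hello")
flush_to_hdd()
print(sorted(hdd_store))   # the object now resides on its primary and replica OSDs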



The CRUSH algorithm, rather than an ordinary hash algorithm, is used for the mapping from PG to OSD. One reason is that CRUSH is configurable: the administrator's configuration parameters determine the strategy for mapping PGs to physical OSD locations. The other reason is CRUSH's special "stability": when a new OSD is added and the system scale grows, the mapping between most PGs and OSDs stays unchanged; only a small number of PG mappings change and cause data migration. Neither this configurability nor this stability is provided by ordinary hash algorithms, which is why the CRUSH algorithm is also one of the core components of WinStore.
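That stability can be made concrete with a small comparison (a sketch with invented names; rendezvous hashing again stands in for CRUSH's stable placement):

import hashlib

def mod_place(pg_id, osds):
    # "Ordinary" hashing: take the PG id modulo the number of OSDs.
    return osds[pg_id % len(osds)]

def stable_place(pg_id, osds):
    # Rendezvous hashing as a stand-in for CRUSH's stable placement.
    return max(osds, key=lambda osd: hashlib.md5(f"{pg_id}:{osd}".encode()).hexdigest())

pgs = range(1024)
old, new = list(range(10)), list(range(11))   # grow the cluster from 10 to 11 OSDs

moved_mod = sum(mod_place(pg, old) != mod_place(pg, new) for pg in pgs)
moved_stable = sum(stable_place(pg, old) != stable_place(pg, new) for pg in pgs)
print(moved_mod, moved_stable)   # roughly 930 PGs move with modulo, only about 90 with the stable scheme

With plain modulo placement roughly 90% of the PGs would migrate after adding one OSD, while the stable scheme moves only about the 1/11 of PGs that naturally belong on the new OSD.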







