From RAID to Replica Distribution in Distributed Systems

This article was first published on my personal blog, "Tobe's ravings".

When faced with the computation and storage of large-scale data, there are two general approaches:

  • Vertical scaling (scale up): improve processing power by upgrading the hardware of a single machine, such as its CPU, memory, or disks.
  • Horizontal scaling (scale out): increase the processing power of the whole system by adding more machines to a distributed system.

Before distributed technology matured, progressively upgrading from minicomputers to midrange systems, mainframes, and supercomputers was almost the only option for large companies. But vertical scaling has a ceiling: data volumes grow far faster than hardware can be upgraded, and even supercomputers cannot keep up with the demand for computing resources.

Horizontal scaling, the approach of continuously adding machines to a system, then took the stage. This is what we call distributed technology today.

In this article, I will introduce RAID storage on a single machine and replica distribution in distributed systems. The two are very similar in spirit, and I hope readers will take the time to appreciate the connection.

RAID

RAID stands for Redundant Array of Inexpensive/Independent Disks, a redundant array of disks. The "I" has two readings: Inexpensive (cheap) and Independent. RAID combines multiple disks and abstracts them into one large disk with large capacity, high read/write speed, and good reliability.

I like the concept of "abstraction" because it shields us from lower-level details, just as the file system and virtual memory do in an operating system. In my view, RAID is an abstraction over multiple independent disks.

Note that these three aspects (storage capacity, read/write speed, and data reliability) are the key criteria for measuring any storage system, and we will meet them again in distributed systems. But first, let's look at the commonly used RAID levels.

RAID 0

With RAID 0, when data is written from the memory buffer to disk, it is split into N parts according to the number of disks and written to all N disks concurrently, with each disk holding different data. The overall write speed is therefore N times that of a single disk, and reads are, of course, also performed concurrently.

RAID 0 therefore offers extremely fast reads and writes. However, it keeps no backup of the data: if any one of the N disks fails, data integrity is destroyed and the data on the remaining disks becomes unusable.
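To make striping concrete, here is a minimal Python sketch that treats in-memory byte arrays as disks. The function names and the tiny 4-byte stripe unit are my own choices for illustration; this is not a real block-device implementation.

```python
# A minimal sketch of RAID 0 striping, using in-memory byte arrays as "disks".

CHUNK = 4  # stripe unit size in bytes (tiny, for demonstration only)

def raid0_write(data: bytes, disks: list[bytearray]) -> None:
    """Split data into CHUNK-sized pieces and spread them round-robin,
    so N disks can (conceptually) absorb the write concurrently."""
    for i, start in enumerate(range(0, len(data), CHUNK)):
        disks[i % len(disks)].extend(data[start:start + CHUNK])

def raid0_read(disks: list[bytearray], total: int) -> bytes:
    """Reassemble the stream by visiting the disks in the same round-robin order."""
    out = bytearray()
    pos = [0] * len(disks)
    i = 0
    while len(out) < total:
        d = i % len(disks)
        out += disks[d][pos[d]:pos[d] + CHUNK]
        pos[d] += CHUNK
        i += 1
    return bytes(out[:total])

disks = [bytearray() for _ in range(3)]          # N = 3 disks
raid0_write(b"hello raid zero striping", disks)
assert raid0_read(disks, 24) == b"hello raid zero striping"
```

Note that losing any one of the three byte arrays makes the stream unrecoverable, which is exactly RAID 0's weakness.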

RAID 1

RAID 1's strategy is simpler: no matter how many disks there are, they all store the same data. Data reliability is therefore extremely high, but write speed suffers badly.

Any read request can be serviced by any drive in the set. If a request is broadcast to every drive in the set, it can be serviced by the drive that accesses the data first (depending on its seek time and rotational latency), improving performance. Sustained read throughput, if the controller or software is optimized for it, approaches the sum of throughputs of every drive in the set, just as for RAID 0. Actual read throughput of most RAID 1 implementations is slower than the fastest drive. Write throughput is always slower because every drive must be updated, and the slowest drive limits the write performance. The array continues to operate as long as at least one drive is functioning.^1

This passage means that RAID 1's read speed depends on which disk can reach the requested data first; with software optimized for it, read throughput can approach that of RAID 0. Write speed, however, is limited by the slowest disk, because the system must wait for every disk to finish writing. RAID 1's reliability is excellent: as long as any one disk in the array still works, the array keeps working, and when a new disk replaces a failed one, the system automatically copies the data back.
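For contrast, here is a mirroring sketch in the same style. A real controller would route each read to the fastest responder; this simplified version just takes the first surviving mirror.

```python
# A minimal sketch of RAID 1 mirroring, again with byte arrays as "disks".

def raid1_write(data: bytes, disks: list) -> None:
    """Every disk receives the full copy, so the slowest disk bounds write speed."""
    for d in disks:
        d.extend(data)

def raid1_read(disks: list) -> bytes:
    """Any surviving disk can serve the read (None models a failed disk)."""
    for d in disks:
        if d is not None:
            return bytes(d)
    raise IOError("all mirrors failed")

disks = [bytearray() for _ in range(3)]
raid1_write(b"same data everywhere", disks)
disks[0] = None                                      # one mirror dies...
assert raid1_read(disks) == b"same data everywhere"  # ...array still works
```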

RAID 10

RAID 0 reads and writes quickly but has no data redundancy; RAID 1 backs up the data but its read/write speed is limited. So we want to combine RAID 0 and RAID 1 to get the best of both, and that is exactly what RAID 10 does.

RAID 10 splits the N disks into two equal halves that mirror each other, which is equivalent to RAID 1; within each half of N/2 disks, data is stored the same way as in RAID 0, with concurrent reads and writes. This strikes a compromise, balancing read/write speed against fault tolerance.
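Following the layout described above, and reusing raid0_write and raid0_read from the RAID 0 sketch, the scheme can be sketched as a mirror of two striped halves:

```python
# A sketch of RAID 10 composed from the two previous sketches: the N disks
# form two mirrored halves, and data is striped (RAID 0) inside each half.
# Assumes raid0_write/raid0_read from the earlier sketch are in scope.

def raid10_write(data: bytes, disks: list[bytearray]) -> None:
    half = len(disks) // 2
    raid0_write(data, disks[:half])   # stripe across the first half...
    raid0_write(data, disks[half:])   # ...and mirror the stripes on the second

disks = [bytearray() for _ in range(4)]
raid10_write(b"striped and mirrored data!!", disks)
# Either half alone can reconstruct the data:
assert raid0_read(disks[:2], 27) == raid0_read(disks[2:], 27)
```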

It is not hard to see that RAID 10's disk utilization is low: half of the disks are used for backup, which is indeed a luxury.

Generally speaking, it is rare for two disks on a server to fail at the same time. More often, when one disk fails, it is replaced with a new one and the damaged disk's data is restored with recovery techniques. We can exploit this to design a scheme with higher disk utilization.

RAID 3 and RAID 5

Following this line of thought: if the data on any one disk can be recovered from the data on the other N-1 disks, wouldn't our problem be solved?

A parity mechanism meets exactly this requirement.

When writing to disk, we split the data into N-1 parts and write them concurrently to N-1 disks, then use the remaining disk to record the parity data. This lets us tolerate the failure of any single disk.
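Here is a minimal sketch of XOR parity, the mechanism RAID 3/5 actually use. It relies on the identity a ^ b ^ b == a: XOR-ing the surviving chunks with the parity chunk rebuilds the lost one. Equal chunk sizes are assumed.

```python
# A minimal sketch of XOR parity over equal-length chunks.

def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR all chunks together byte by byte to produce the parity chunk."""
    parity = bytearray(len(chunks[0]))
    for c in chunks:
        for i, byte in enumerate(c):
            parity[i] ^= byte
    return bytes(parity)

data = [b"disk", b"one!", b"two!"]      # N-1 = 3 data chunks
parity = xor_parity(data)               # stored on the Nth disk

# Disk holding data[1] fails: rebuild it from the survivors plus the parity.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == b"one!"
```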

Depending on where the parity data is written, there are two schemes:

  • RAID 3: all parity data is written to the same disk. In workloads where data is modified frequently, a modification on any disk forces the parity disk to rewrite its data, so the parity disk wears out faster than the others. In professional terms, the load is unbalanced. RAID 3 is therefore rarely used in practice.
  • RAID 5: parity data is written across all disks in rotation, so each disk takes on part of the parity work and the pressure of updating parity is spread over the whole array. This is the load balancing we want, which is why RAID 5 is the more widely used scheme. A small sketch of the placement difference follows this list.
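The sketch below shows only the placement rule. parity_disk is a hypothetical helper, and the RAID 5 rotation shown is one common layout (left-asymmetric), chosen purely for illustration.

```python
# RAID 3 pins parity to one disk; RAID 5 rotates it stripe by stripe.

def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Index of the disk holding parity for a given stripe."""
    if level == 3:
        return n_disks - 1                   # always the last disk
    return (n_disks - 1 - stripe) % n_disks  # RAID 5: rotate each stripe

print([parity_disk(s, 4, 3) for s in range(4)])  # [3, 3, 3, 3]
print([parity_disk(s, 4, 5) for s in range(4)])  # [3, 2, 1, 0]
```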

RAID 6

Compared with RAID 5, RAID 6 is more reliable: it rotates two independent parity blocks across the disks, so it can tolerate two disks failing at the same time.

When is this level of fault tolerance needed? On large servers, each disk's capacity is often huge. After a disk fails, even if a new one is installed immediately, restoring all the data takes a long time. If a second disk fails during that window, the data can no longer be recovered, which is unacceptable. RAID 6 exists to guarantee data integrity in this situation.

Distributed storage solution

PS: this article focuses on the relationship between replicas and data distribution in distributed systems, because the ideas here parallel RAID. Consistent hashing and related topics will get a separate article.

The storage scale of a distributed system is far larger than a single machine's, but the basic ideas and design goals are the same:

  • Increase system throughput
  • Increase the storage capacity of the system
  • Improve system reliability with data backup

Unlike the single-machine case, a distributed system faces many more problems: data travels between servers over the network, with high latency and even the possibility of network interruptions that make some machines unreachable. This strongly shapes our storage scheme. For example, can we still use RAID 5-style parity for redundancy?

The answer is no: the cost of parity is too high. A single parity update requires responses from the other N-1 machines, which already takes tens of milliseconds; the efficiency is extremely low and the network load is too heavy. Instead, a RAID 10-like replication scheme looks better suited to this setting.

Machine-level replicas

In this mode, several machines are replicas of one another and hold exactly the same data, just like RAID 1. The advantage is simplicity, but the disadvantages are obvious:

  • Data recovery is slow: if machine 3's disk fails and all its data is lost, we bring a new machine into the group. For the new machine to serve requests as soon as possible, it must copy all the data from the other two machines, but network bandwidth limits make this recovery slow.
  • Scalability is poor: each group has three machines, so expansion must add three machines at a time.
  • Fault tolerance is poor: if one machine goes down, its read/write load falls on the remaining two, a 50% increase for each, which can easily exceed a single machine's capacity.

Therefore, using a whole machine as the replica unit does not fit this scenario, and we need another approach.

Segment-level replicas

Compared with using whole machines as the replica unit, splitting the data into segments and replicating at segment granularity is far more flexible. A more concrete example will illustrate the advantages.

In this example, all of machine 1's data is distributed across the other 7 machines (ignoring the rest of the cluster).

What benefits does this approach bring us?

  • Data recovery is fast. Suppose machine 1 loses its data and everything must be re-copied. Since the data is spread across the remaining 7 machines, we can copy from all of them in parallel, and recovery completes quickly. Note that the larger the cluster, the smaller each machine's share of the work: this is load balancing.
  • The cluster scales well. When a new machine is added, we only need to migrate 1/8 of the segments from each existing machine to reach a new balance.
  • Fault tolerance is high. If machine 1 goes down and temporarily cannot serve requests, the load on each of the remaining 7 machines rises by only 1/7, about 14.3%, which is acceptable.

But this scheme is not without problems: we need a server to record the mapping between data segments and machines, called a metadata server. As the cluster grows, the metadata to be managed keeps growing too, and replica maintenance becomes harder. A common compromise is to group several data segments together and manage replicas at the granularity of segment groups, which keeps the metadata within a reasonable range. A sketch of segment-level metadata follows.
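As a rough illustration, here is a sketch in which a (hypothetical) metadata server maps each segment to the machines holding its replicas, and recovery after a failure pulls each lost segment from a different surviving machine. All names here are made up for the example.

```python
# Sketch: segment-to-machine metadata and parallel recovery sources.

import random
from collections import defaultdict

def place_segments(n_segments: int, machines: list[str], replicas: int = 2):
    """Assign each segment's replicas to distinct, randomly chosen machines."""
    meta = {}
    for seg in range(n_segments):
        meta[seg] = random.sample(machines, replicas)
    return meta

def recovery_sources(meta: dict, failed: str) -> dict:
    """On failure, each lost segment is re-copied from a surviving holder,
    so recovery traffic spreads across many machines instead of one or two."""
    sources = defaultdict(list)
    for seg, holders in meta.items():
        if failed in holders:
            survivor = next(m for m in holders if m != failed)
            sources[survivor].append(seg)
    return sources

random.seed(42)  # deterministic placement for the example
machines = [f"machine-{i}" for i in range(8)]
meta = place_segments(64, machines, replicas=2)
print(recovery_sources(meta, "machine-1"))  # lost segments, per source machine
```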

That concludes this introduction to replica distribution in distributed storage. I hope you gained something from reading it!

