Computer Basics: RAID Technology

Overview

RAID (Redundant Array of Independent Disks) is a method of combining multiple independent hard disks (physical disks) in different ways into a disk group (a logical disk), providing higher storage performance than a single disk as well as data backup. It is a common disk management technology in commercial servers today. Its main purpose is to increase storage capacity and read/write speed, and to improve the availability and fault tolerance of the disks. Server-class machines currently support installing multiple disks (8 or more); RAID makes it possible to read and write data concurrently across those disks and to keep redundant copies of it. Its basic features are data redundancy and improved performance:

  • Data redundancy: verification (parity) information is stored on redundant disks. When the data on some disk is damaged, it can be reconstructed from the undamaged disks.
  • Performance improvement: the transfer rate increases. The array combines multiple disks and presents them as a single disk, storing data in segments across the different disks. When data is accessed, the relevant disks in the array work together, which significantly reduces access time and gives better space utilization.

Principle

Commonly used RAID technologies include the following:

RAID0

Assume that the server has N disks. When data is written from the memory buffer to disk, it is divided into N parts, one per disk, and the parts are written to the N disks simultaneously, so the overall write speed is up to N times that of a single disk. The same holds for reads, so RAID0 has extremely fast read and write speeds.
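
The striping idea can be sketched with in-memory byte arrays standing in for disks. This is only an illustration: the function names and the fixed `chunk_size` are assumptions, and a real controller works on disk blocks rather than Python objects.

```python
def stripe_write(data: bytes, n_disks: int, chunk_size: int = 4) -> list:
    """Distribute data round-robin across n_disks in chunk_size pieces."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk_size):
        disk_index = (i // chunk_size) % n_disks  # next disk in rotation
        disks[disk_index] += data[i:i + chunk_size]
    return disks


def stripe_read(disks: list, total_len: int, chunk_size: int = 4) -> bytes:
    """Reassemble the data by reading chunks in the same round-robin order."""
    if total_len > sum(len(d) for d in disks):
        raise ValueError("total_len exceeds the data stored on the disks")
    out = bytearray()
    offsets = [0] * len(disks)  # read position on each simulated disk
    disk_index = 0
    while len(out) < total_len:
        start = offsets[disk_index]
        out += disks[disk_index][start:start + chunk_size]
        offsets[disk_index] += chunk_size
        disk_index = (disk_index + 1) % len(disks)
    return bytes(out)


disks = stripe_write(b"ABCDEFGHIJKLMNOPQRST", n_disks=3)
restored = stripe_read(disks, total_len=20)
```

With three disks and 4-byte chunks, the first disk ends up holding `ABCDMNOP`, the second `EFGHQRST`, and the third `IJKL`. Losing any one of them leaves gaps in the data, which is the weakness described below.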

Advantages: Improved I/O performance.

Disadvantages: There is no data backup and therefore no redundancy (error repair) capability. If any one of the N disks is damaged, data integrity is destroyed and the data on all disks becomes unusable.

Calculation: Two 40G hard drives in RAID0 give a total usable capacity equal to their sum, 80G, for a utilization of 100%.

RAID1

RAID1 addresses the weakness of RAID0. When data is written, a copy is written to two disks at the same time, so damage to either disk does not cause data loss. After a new disk is inserted, the array repairs itself automatically by copying the data onto it, which gives extremely high reliability.
Advantages: Data reliability
Disadvantages: Disk utilization is only 50%
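
A toy model of mirroring, assuming two in-memory dictionaries in place of real disks (the `Mirror` class and its method names are hypothetical, chosen only for this illustration):

```python
class Mirror:
    """Toy RAID1: every block is written to two simulated disks."""

    def __init__(self):
        # Each disk maps block number -> bytes; None marks a failed disk.
        self.disks = [{}, {}]

    def write(self, block_no, data):
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def read(self, block_no):
        # Any surviving disk can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise IOError("both disks have failed")

    def fail(self, index):
        self.disks[index] = None

    def replace(self, index):
        # Rebuild the new disk by copying everything from the survivor.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise IOError("no surviving disk to copy from")
        self.disks[index] = dict(survivor)


m = Mirror()
m.write(0, b"hello")
m.fail(0)
after_failure = m.read(0)   # still readable from the second disk
m.replace(0)                # the new disk receives a full copy
```

The cost is visible in the model as well: two disks are occupied, but only one disk's worth of distinct data is stored.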

RAID3

Under normal circumstances, two disks in a server are not damaged at the same time. If the data on a single damaged disk can be restored from the data on the other disks, reliability and performance are both preserved and disk utilization improves greatly.
When data is written, it is divided into N-1 parts that are written concurrently to N-1 disks, and the verification (parity) data is recorded on the Nth disk. If any one disk is damaged (including the parity disk), its contents can be rebuilt from the other N-1 disks.
However, in workloads with many modifications, a change on any disk forces the Nth disk to rewrite its parity data. Because it is written so frequently, the Nth disk fails more readily than the others and has to be replaced often, so RAID3 is rarely used in practice.
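
The rebuild step can be illustrated with XOR parity, the usual choice for single-parity arrays. The sketch below assumes equally sized blocks held in memory; the function names are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the N-1 data blocks followed by one parity block (the Nth disk)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from the other N-1 blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
recovered = rebuild(stripe, lost_index=1)  # pretend the second disk failed
```

Because XOR is its own inverse, the same `rebuild` call works whether the lost block held data or parity.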

RAID5

Compared with RAID3, the more commonly used scheme is RAID5. RAID5 is very similar to RAID3, except that the parity data is not written to a fixed Nth disk but is rotated across all disks in a spiral pattern. Parity updates are therefore spread evenly over all disks, which avoids RAID3's frequent writes to a single disk.
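
One possible rotation rule is sketched below. Real implementations offer several layouts, so the formula in `parity_disk` is an assumption used only to show how the parity block moves from stripe to stripe.

```python
def parity_disk(stripe_no, n_disks):
    """Index of the disk holding parity for a stripe (one possible rotation)."""
    return (n_disks - 1 - stripe_no) % n_disks


def layout(n_stripes, n_disks):
    """Return one row per stripe, marking each disk as data (D) or parity (P)."""
    rows = []
    for stripe_no in range(n_stripes):
        p = parity_disk(stripe_no, n_disks)
        rows.append(["P" if disk == p else "D" for disk in range(n_disks)])
    return rows


for row in layout(n_stripes=4, n_disks=4):
    print(" ".join(row))
```

For 4 disks this places parity on disk 3, then 2, then 1, then 0 in successive stripes, so over every 4 stripes each disk carries exactly one parity block.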

At least 3 hard drives are required. A common configuration uses 4 drives, with the capacity of one drive used for redundancy. If a drive fails in a RAID5 server, remove the bad drive and replace it with a new one; the system then resynchronizes the data automatically.

Available capacity: capacity of a single disk * (n-1), where n is the number of disks.

In terms of data safety, RAID1 ranks highest and RAID5 is second to it.

Disadvantages: Only a single disk failure can be tolerated, so a failed disk must be dealt with as soon as possible. While a disk is failed, RAID5 I/O and CPU performance drop sharply and the array performs very poorly.

Suggestion: If you do not have many disks and need both data security and performance, RAID5 is a good choice. If you have many disks, consider RAID10.

RAID6

If the data requires high reliability, it must still be recoverable when two disks are damaged at the same time (or when operations and maintenance are lax and a damaged disk is not replaced before a second one fails). This is where RAID6 can be used. RAID6 is similar to RAID5, but each stripe holds data on only N-2 disks, and two sets of parity information (generated with different algorithms) are rotated across the disks.

At least 4 hard drives are required. RAID6 strengthens the data protection of RAID5 and tolerates the failure of 2 hard drives.
Available capacity:

C=(N-2)×D
C = available capacity, N = number of disks, D = capacity of a single disk

RAID10

RAID10 mirrors a stripe set by combining the RAID0 and RAID1 schemes (it uses mirroring rather than parity). All disks are divided into two equal halves, and data is written to both halves at the same time, which is equivalent to RAID1. Within each half of N/2 disks, RAID0 striping is used to read and write concurrently. This improves both reliability and performance, but disk utilization is low because half of the disks hold backup data.

It requires at least 4 hard drives and is a "three highs" array technology: high cost, high reliability, and high storage performance.

Disadvantages: It needs a somewhat larger number of disks, and disk utilization is only 50%.
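
As a rough sketch of this layout, the function below maps a logical chunk to the two disks that would store it, assuming disks `0 .. N/2-1` form one half and `N/2 .. N-1` the other. The function name and the numbering scheme are illustrative assumptions, not a description of any particular controller.

```python
def raid10_targets(chunk_no, n_disks):
    """Return the two disk indices that store a logical chunk."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least 4")
    half = n_disks // 2
    position = chunk_no % half        # RAID0: round-robin inside one half
    return position, position + half  # RAID1: same position in the other half


placements = [raid10_targets(chunk, n_disks=4) for chunk in range(4)]
# placements == [(0, 2), (1, 3), (0, 2), (1, 3)]
```

Every chunk lands on two disks, which is where the 50% utilization comes from, while consecutive chunks go to different disks within a half, which is where the striping speed-up comes from.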

Comparison

RAID can be implemented in hardware, for example with a dedicated RAID card or a motherboard that supports it directly, or in software, where the operating system combines multiple disks into a RAID that is accessed logically as a single directory. RAID is widely used with traditional relational databases and file systems and is an important means of improving a computer's storage characteristics.

RAID only forms an array from multiple disks on a single server, while big data needs more storage space and higher access speed. Applying the principles of RAID to a distributed cluster of servers leads to the architectural idea behind the Hadoop Distributed File System (HDFS).

Comparison of various levels of RAID

| RAID level | Alias | Fault tolerance / redundancy | Read performance | Write performance | Space utilization | Data reliability | Maximum number of failed disks tolerated |
|---|---|---|---|---|---|---|---|
| RAID0 | Striping | No | N times that of a single disk | N times that of a single disk | 100% | Very low | 0 |
| RAID1 | Mirroring | Yes | N times that of a single disk | Performance of the slowest disk | 50% | Very high | N-1 |
| RAID2 | - | - | Less than N times that of a single disk | Write speed of a single disk * number of check disks | Less than 100% | | Depends on the number of Hamming error-correction code bits |
| RAID3 | Dedicated parity striping | Yes | N-1 times that of a single disk | Write speed of the check disk | (N-1)/N | | 1 |
| RAID4 | - | - | N-1 times that of a single disk | Write speed of the check disk | (N-1)/N | | 1 |
| RAID5 | Distributed parity striping | Yes | N times that of a single disk | Slightly lower than N times that of a single disk | (N-1)/N | Higher | 1 |
| RAID6 | Double parity striping | Yes | N times that of a single disk | Slightly lower than N times that of a single disk, worse than RAID5 | (N-2)/N | Higher than RAID5 | 2 |
| RAID10 | Mirroring and striping | Yes | | | 50% | | |

In practice

Capacity calculation

If hard disks of unequal capacity are used in a RAID, the total capacity of the array is calculated from the smallest disk.

With three drives, RAID5 stores data on two drives and stores the parity computed from those two on the third. Once the array is built, if one drive is damaged, its contents can be calculated from the data on the other two, which is why at least 3 drives are required.

RAID5 is an independent-access array with rotating parity. It differs from RAID3 and RAID4 in that there is no fixed parity disk; instead, the parity information is distributed evenly over the disks of the array according to a fixed rule, so every disk holds both data and parity. This removes the contention for the parity disk and allows multiple writes within the same group to proceed concurrently. RAID5 is therefore suitable both for operations on large volumes of data and for all kinds of transaction processing. It is a fast, large-capacity disk array with well-distributed fault tolerance. With N disks in the array, the user-visible capacity is that of N-1 disks.

Three 80G hard disks in a RAID5 array give a capacity of 160G.
Two 80G hard disks and one 40G hard disk in a RAID5 array give a capacity of 80G.
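
The capacity rules from this article can be collected into one small helper. It is a sketch under the assumptions stated here: every member disk is treated as having the size of the smallest one, RAID1 is taken to hold the same copy on every disk, and the function name and level strings are hypothetical.

```python
def usable_capacity(disk_sizes_gb, level):
    """Usable capacity in GB, treating every disk as the size of the smallest."""
    n = len(disk_sizes_gb)
    smallest = min(disk_sizes_gb)
    # level -> (minimum number of disks, number of disks' worth of usable space)
    rules = {
        "RAID0": (2, n),        # pure striping, nothing reserved
        "RAID1": (2, 1),        # assumption: all disks hold the same copy
        "RAID5": (3, n - 1),    # one disk's worth of parity
        "RAID6": (4, n - 2),    # two disks' worth of parity
        "RAID10": (4, n // 2),  # half of the disks hold mirror copies
    }
    if level not in rules:
        raise ValueError(f"unsupported level: {level}")
    min_disks, data_disks = rules[level]
    if n < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    if level == "RAID10" and n % 2:
        raise ValueError("RAID10 needs an even number of disks")
    return data_disks * smallest


print(usable_capacity([80, 80, 80], "RAID5"))  # 160
print(usable_capacity([80, 80, 40], "RAID5"))  # 80
print(usable_capacity([40, 40], "RAID0"))      # 80
```

The three printed values reproduce the RAID5 and RAID0 examples given in this article.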

Origin blog.csdn.net/lonelymanontheway/article/details/118527664