RAID Principles

1、RAID 0

  RAID 0 combines n physical disks into one virtual logical disk. Address space that is contiguous on the virtual disk is split into pieces, each of which is contiguous on one of the physical disks. When the disk controller (meaning the controller that uses the virtual disk; if the host connects to an external disk array through an adapter card, this means the disk controller on the host) issues an instruction to the virtual disk, the RAID controller receives and analyzes it, converts the IO request into commands for each of the real physical disks that make up the RAID 0 set according to the block-mapping formula of the striping algorithm, and after collecting or writing the data, reports back to the host's disk controller.

 

  RAID 0 is also known as striping and offers the highest storage performance of all RAID levels, but it performs no data verification. The access process of a RAID 0 disk is analyzed below, step by step.

  Suppose that at some moment the host controller issues the instruction: read 128 sectors starting at sector 10000.

  After the RAID controller receives this instruction, it immediately computes, using the mapping formula, which physical disk and which physical sector logical sector 10000 corresponds to, and then works out in turn where each of the following 127 logical sectors lands. It then issues an instruction to each disk for the sectors that belong to it. For a read, each disk delivers its data to the RAID controller, which assembles the pieces in its cache and submits the result to the host controller.
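  The mapping step above can be sketched as follows. This is a minimal illustration, assuming a 4-disk set and a stripe depth of 128 sectors (the 64KB figure used in this article); the function name `map_lba` is ours, not from any real controller.

```python
# Sketch of RAID 0 address mapping: logical sector -> (disk, physical sector).
# STRIPE_DEPTH and N_DISKS are illustrative values, not fixed by the article.
STRIPE_DEPTH = 128   # sectors per Segment
N_DISKS = 4

def map_lba(logical_lba):
    """Map a logical sector number to (disk index, physical sector)."""
    stripe = logical_lba // (STRIPE_DEPTH * N_DISKS)   # which stripe
    offset = logical_lba % (STRIPE_DEPTH * N_DISKS)    # offset inside the stripe
    disk = offset // STRIPE_DEPTH                      # which Segment, i.e. disk
    sector = stripe * STRIPE_DEPTH + offset % STRIPE_DEPTH
    return disk, sector

# The host request "start 10000, length 128" becomes per-disk sector lists:
requests = {}
for lba in range(10000, 10000 + 128):
    disk, sector = map_lba(lba)
    requests.setdefault(disk, []).append(sector)
```

  With these parameters the 128-sector request happens to straddle a Segment boundary and is split across two member disks.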

  If in the process above all 128 sectors turn out to fall within the same Segment — that is, the stripe depth is at least 128 sectors (64KB) — then this IO can truly be served by only one physical disk, and performance is actually worse than a single disk: nothing has been parallelized, yet the RAID controller still adds computational overhead. Therefore, to improve performance in this situation — to spread a single IO across multiple physical disks — the stripe depth must be reduced. With the same number of disks, this means reducing the stripe size (Stripe SIZE, i.e. the stripe length), so that the controller splits the IO data: the first Segment of the stripe is filled first, then the second Segment, and so on, letting one IO occupy many physical disks at once.

  So for RAID 0 to boost the performance of a single IO, the stripe should be made as small as possible. But there is a contradiction here: if the stripe is too small, the probability of concurrent IO drops, because each IO then occupies most or all of the physical disks, and queued IOs can only wait for it to finish before using the disks; if the stripe is too large, a single IO cannot be spread widely enough to raise its transfer speed. These two goals conflict, and the stripe size must be chosen according to the workload.
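  The trade-off can be made concrete with a small helper (our own illustrative function, assuming the IO starts on a Segment boundary): the number of member disks one IO occupies grows as the stripe depth shrinks.

```python
import math

def disks_touched(io_size_sectors, stripe_depth, n_disks):
    """How many member disks a single IO occupies, assuming the IO
    starts on a Segment boundary (the best case for spreading)."""
    segments = math.ceil(io_size_sectors / stripe_depth)
    return min(segments, n_disks)

# A 128-sector IO on a 4-disk RAID 0:
#   depth 128 -> only 1 disk works (no speedup for this IO)
#   depth  32 -> all 4 disks work, but a queued IO must wait
```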

2、RAID 1

  RAID 1 is called mirroring: it writes exactly the same data to both a working disk and a mirror disk, so its disk space utilization is 50%. Write response time suffers in RAID 1, but read performance is unaffected.

  For write IO, RAID 1 not only fails to improve speed but actually lowers it: the same data must be written to multiple physical disks at the same time, and because the writes are synchronized, the IO takes as long as the slowest disk. For read IO, however, RAID 1 can help: even for sequential IO, the controller may, much as in RAID 0, read data from the two physical disks simultaneously and raise throughput.

 

  In concurrent-read mode, N concurrent IOs can each occupy one physical disk, which is equivalent to an N-fold improvement in IOPS. Since each IO still has exclusive use of only one physical disk, the transfer rate per IO remains that of a single disk, whether the IO is random or sequential.
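  A hypothetical round-robin read scheduler for a two-way mirror illustrates the point (class and method names are ours): reads alternate between the copies so N concurrent reads keep both disks busy, while every write must go to all copies.

```python
import itertools

class Mirror:
    """Toy mirror scheduler: round-robin reads, write-to-all writes."""

    def __init__(self, n_copies=2):
        self.n_copies = n_copies
        self._next = itertools.cycle(range(n_copies))

    def pick_disk_for_read(self):
        # Each read IO goes to the next copy in turn, so concurrent
        # reads spread over all physical disks.
        return next(self._next)

    def disks_for_write(self):
        # Writes must hit every copy; the IO completes only when the
        # slowest disk finishes (the synchronization cost in the text).
        return list(range(self.n_copies))
```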

3、RAID 2

  RAID 2, called the error-correcting Hamming-code disk array, uses redundant Hamming codes for data verification. A Hamming code adds check bits to the original data for error detection and correction: the bits in positions 2^n (1, 2, 4, 8, ...) are check bits, and the remaining positions hold data bits. RAID 2 stores data at the bit level, each disk storing one bit of the encoded data, and the number of disks required depends on the data width, which is set by the user. The figure shows a RAID 2 with a data width of 4; it requires 4 data disks and 3 check disks. With a 64-bit data width, 64 data disks and 7 check disks are required. Clearly, the larger the data width, the higher the storage-space utilization of RAID 2 — but also the more disks it needs.
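  The disk counts quoted above follow from the standard Hamming bound: the smallest number of check bits r satisfying 2^r ≥ d + r + 1 for d data bits. A one-line check (function name is ours):

```python
def hamming_parity_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting
    Hamming code): the number of check disks RAID 2 needs for a given
    data width."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

# data width 4  -> 3 check disks; data width 64 -> 7 check disks
```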

  Because Hamming codes can correct errors on their own, RAID 2 can correct data errors when they occur, ensuring data safety. Its data transfer performance is very high, and its design complexity is lower than that of RAID 3, RAID 4, and RAID 5, described later.

4、RAID 3

  RAID 3 is a parallel-access array with a dedicated parity disk: one disk is used exclusively for parity, the remaining disks are data disks, and data is interleaved across the data disks bit- or byte-wise. RAID 3 requires at least three disks. The data in the same stripe region on the different disks is XORed to produce a parity value, which is written to the parity disk. RAID 3's read performance matches RAID 0's: data is read in parallel from multiple striped disks, giving very high performance, while fault tolerance is provided as well. When data is written to RAID 3, the parity must be recomputed over the whole stripe and the new value written to the parity disk. A write operation thus involves writing the data block, reading the rest of the stripe, computing the parity, and writing the parity value — considerable system overhead, so write performance is lower.

 

  If one disk in a RAID 3 set fails, reads are not interrupted: the parity and the surviving data can be used to reconstruct the lost data. If a block to be read happens to reside on the failed disk, the system must read all the blocks in the same stripe and reconstruct the lost data from them and the parity value, so system performance suffers. When the failed disk is replaced, the system rebuilds its data onto the new disk in the same way.
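  XOR parity and the reconstruction described above can be shown in a few lines (function names are ours; real implementations operate on whole sectors the same byte-wise way):

```python
def xor_parity(blocks):
    """Parity block = byte-wise XOR of all data blocks in a stripe."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def reconstruct(surviving_blocks, parity):
    """Rebuild the block of a failed disk: XOR of the survivors and
    the parity gives back the lost block, since XOR is its own inverse."""
    return xor_parity(surviving_blocks + [parity])
```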

  Because RAID 3 uses only a single parity disk, array storage utilization is high, and combined with parallel access it can deliver high read/write bandwidth for large transfers, making it suitable for applications that access large amounts of data sequentially, such as image processing and streaming media services. However, RAID 3's performance drops once a disk has gone bad, and RAID 5 algorithms keep improving and can emulate RAID 3 when reading large amounts of data; so RAID 5 is now often used in place of RAID 3 to run applications with continuous, high-bandwidth, high-volume read/write characteristics.

  In RAID 3, the stripe length is designed to match the file-system block size, and the stripe depth depends on the number of disks, with a minimum depth of one sector; so each Segment is typically one sector or a few sectors in size.

  Worked example: the RAID 3 mechanism

  Take a RAID 3 system with 4 data disks and 1 parity disk, a Segment SIZE of two sectors (1KB), and therefore a stripe length of 4KB.

  The RAID 3 controller receives this IO: write starting at sector 10000, length 8 — a total of 8 × 512B = 4KB.

  The controller first locates the real physical LBA corresponding to LBA 10000. Suppose LBA 10000 happens to be the first sector of the first Segment of the first stripe; the controller then writes the 1st and 2nd 512B pieces of the IO data into this Segment. At the same moment, the remaining pieces go into the 2nd, 3rd, and 4th Segments — exactly 4KB of data in all. In other words, the 4KB of IO data is written to the 4 disks simultaneously, each disk receiving two sectors, i.e. one Segment. The writes proceed in parallel, including the write to the parity disk, so the parity disk of RAID 3 is not a bottleneck — though there is extra latency, because computing the parity adds overhead.

  If the IO SIZE exceeds the stripe length — say the controller receives a 16KB IO — then since only 4KB can be written in parallel at a time, the 16KB must be spread over 4 stripes. These 4 "batches" are not written one after another but simultaneously: of the 16KB, the 1st, 5th, 9th, and 13th KB are written contiguously to disk 1 by the controller, the 2nd, 6th, 10th, and 14th contiguously to disk 2, and so on, until all 16KB is written in one parallel pass. The parity disk can likewise compute its parity values once and write them in parallel alongside the data rather than in batches.
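  The 16KB layout just described can be computed directly (variable names are ours; disks and chunks are numbered from 1 to match the text):

```python
# Distribute a 16KB IO over a 4-data-disk RAID 3 with 1KB Segments
# (4KB stripes): chunk k goes to disk ((k-1) mod 4) + 1, so disk 1
# receives KB 1, 5, 9, 13; disk 2 receives KB 2, 6, 10, 14; etc.
N_DATA = 4

layout = {d: [] for d in range(1, N_DATA + 1)}
for kb in range(1, 17):                  # 16 chunks of 1KB each
    disk = (kb - 1) % N_DATA + 1
    layout[disk].append(kb)
```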

 

5、RAID 4

  RAID 4 is not common. Its principle is that a layer above the disk-controller driver — the file-system layer — scans the target LBAs of the IOs in the queue and lets IOs whose targets lie in the same stripe be written concurrently. That is, write operations from two different transactions are placed on the same stripe whenever possible, to raise write efficiency. The most typical example is NetApp's famous WAFL file system, whose design maximizes the chance of full-stripe writes: WAFL always tries to write blocks that can be combined into the same stripe at the same time, eliminating the write penalty and increasing IO concurrency.
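  A hypothetical sketch of this kind of scheduling (our own simplification, not WAFL's actual code): scan the queue and group write IOs whose target LBAs fall in the same stripe, so each group can be issued as one combined write.

```python
from collections import defaultdict

STRIPE_SECTORS = 512  # assumed stripe size in sectors, for illustration

def group_by_stripe(queued_lbas):
    """Group queued write targets by the stripe they fall into, so
    IOs in the same group can be coalesced into one stripe write."""
    groups = defaultdict(list)
    for lba in queued_lbas:
        groups[lba // STRIPE_SECTORS].append(lba)
    return dict(groups)
```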

6、RAID 5

  First, a few concepts:

  Full-stripe write: every stripe unit in the parity group is modified, so the new XOR parity can be computed from the new stripe data alone, with no additional reads or writes. The full-stripe write is therefore the most efficient write type. RAID 2 and RAID 3 are examples: almost every IO they perform occupies all the disks, so every Segment in the stripe is updated, and the controller can compute the parity directly from the updated data and write it to the parity disk at the same moment the data is written to the data disks.

  Reconstruct write: if more than half of the array's disks are to be written, the reconstruct-write method can be used. The controller reads the old data from the Segments of the stripe that are not being modified, computes the new XOR parity from that data together with the new data for all the Segments being modified, and then writes the new Segment data, the unchanged Segment data, and the new XOR parity together. Reconstruct writes clearly involve more I/O operations, so they are less efficient than full-stripe writes.

  Read-modify write: if fewer than half of the array's disks are to be written, the read-modify-write method can be used. The process is: first read the old data from the Segments to be modified and the old parity from the stripe; then compute the stripe's new parity from the old data, the old parity, and the new data; finally write the new data and the new parity. Because the cycle consists of a read, a modify, and a write, it is called read-modify-write. The formula for the new parity is: new parity = (old data XOR new data) XOR old parity. If the Segments to be updated exceed half of the Segments in the stripe, read-modify-write is no longer appropriate, because it would have to read the data and parity of all those Segments, whereas a reconstruct write only needs to read the Segments that are not being updated — and there are fewer of those. So: more than half updated, use a reconstruct write; less than half, use read-modify-write; the whole stripe updated, use a full-stripe write.
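  The read-modify-write parity formula can be verified directly: the shortcut (old XOR new) XOR old-parity yields exactly the same result as recomputing the parity over the whole stripe (function names are ours):

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def rmw_parity(old_data, new_data, old_parity):
    """New parity = (old data XOR new data) XOR old parity.
    Only the modified Segment and the parity need to be read."""
    return xor_bytes(xor_bytes(old_data, new_data), old_parity)
```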

  Write efficiency, from best to worst: full-stripe write > reconstruct write > read-modify-write.

  RAID 5 solves the parity-disk contention problem by distributing the parity across every disk in the RAID group. Each stripe has one parity Segment, but its position differs from stripe to stripe, rotating cyclically between adjacent stripes. To preserve concurrent IO, RAID 5 also keeps the stripe fairly large, so that a single IO does not fill the whole stripe and force the other IOs in the queue to wait. RAID 5 therefore depends on a high concurrency rate: whenever concurrency fails at some moment, the IO essentially falls into read-modify-write mode, which is why RAID 5 carries a relatively high write penalty.
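  One common rotation scheme (the "left" layout; the article only requires that the parity position cycles between adjacent stripes, so this is one possible choice) can be written as:

```python
def parity_disk(stripe_index, n_disks):
    """Disk holding the parity Segment of a given stripe, rotating
    backwards one disk per stripe ("left" RAID 5 layout)."""
    return (n_disks - 1 - stripe_index) % n_disks
```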

  Let us analyze RAID 5's mechanism concretely. Assume a stripe size of 80KB with 16KB Segments. At some moment the upper layer issues a write IO: write 8 sectors starting at sector 10000, i.e. 4KB of data. On receiving this IO, the controller first locates the real LBA address; suppose it maps to the 1st sector of the 2nd Segment of the 1st stripe (on disk 2). The controller first issues a read IO to the disk holding this Segment to fetch the old data of those 8 sectors into its cache. At the same time, it issues a read IO to the disk holding the stripe's parity Segment to fetch the corresponding old parity sectors into the cache. The XOR circuitry then computes the new parity using the formula: new parity = (old data XOR new data) XOR old parity. The cache now holds the old data, the new data, the old parity, and the new parity. The controller immediately issues simultaneous write IOs to the corresponding disks, writing the new data to the data Segment and the new parity to the parity Segment, then discards the old data and old parity.

  Throughout this process the IO occupies only disks 1 and 2, because the parity Segment corresponding to the data Segment being updated resides on disk 1; no other disk is touched at any point. Now suppose another IO in the queue targets a data Segment on disk 4, with a length that also fits within one Segment, and the parity Segment of that stripe sits on disk 3. Since those two disks are not occupied by any other IO, the controller can process this IO and the previous one concurrently.
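  The concurrency condition just described reduces to a disjointness check on the disk sets each IO touches (its data Segment's disk plus that stripe's parity disk); the helper names below are ours:

```python
def disks_used(data_disk, parity_disk):
    """Set of disks a small RAID 5 write occupies: the data Segment's
    disk and the parity Segment's disk of that stripe."""
    return {data_disk, parity_disk}

def can_run_concurrently(io_a, io_b):
    """Two queued IOs can be serviced at the same time when their
    disk sets do not overlap."""
    return disks_used(*io_a).isdisjoint(disks_used(*io_b))
```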

  Compared with even a specially optimized RAID 4, RAID 5 achieves this concurrency at the bottom layer, independent of any file-system intervention.

7、RAID 6

  Every RAID level before RAID 6 can guarantee data access with at most one failed disk; if two disks fail at the same time, data is lost. RAID 6 was created to raise RAID 5's safety margin. It adds one more parity disk's worth of capacity compared with RAID 5, likewise distributed across every disk, except that the second parity is computed with a different equation. A RAID 6 stripe thus holds two mathematically independent parity values, so that even with two disks failed simultaneously, the lost data can still be recovered by solving the two equations together. Compared with RAID 5, each RAID 6 write reads or writes one extra set of parity, but since the operations run in parallel, it is not much slower than RAID 5. Its other characteristics resemble RAID 5's.
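  A minimal sketch of the "two independent equations" idea, using the common P+Q scheme over GF(2^8) (this is one standard realization, e.g. the one used by Linux md, not necessarily what every RAID 6 product does; all function names are ours). Single-byte "blocks" keep the math visible; real arrays apply the same operations byte by byte.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) with reduction polynomial x^8+x^4+x^3+x^2+1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

def gf_pow(a, n):
    r = 1
    while n:
        if n & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        n >>= 1
    return r

def gf_inv(a):
    return gf_pow(a, 254)        # a^255 = 1 for a != 0, so a^254 = a^-1

def pq(data):
    """P = plain XOR parity; Q = XOR of g^k * d_k with generator g = 2.
    These are the two mathematically independent parity values."""
    P = Q = 0
    for k, d in enumerate(data):
        P ^= d
        Q ^= gf_mul(gf_pow(2, k), d)
    return P, Q

def recover_two(data, i, j, P, Q):
    """Recover data disks i and j (i != j), treated as lost, by solving:
       d_i ^ d_j               = P ^ (XOR of survivors)        = A
       c_i*d_i ^ c_j*d_j       = Q ^ (weighted XOR of survivors) = B
    """
    A, B = P, Q
    for k, d in enumerate(data):
        if k in (i, j):
            continue             # these disks are "failed"
        A ^= d
        B ^= gf_mul(gf_pow(2, k), d)
    ci, cj = gf_pow(2, i), gf_pow(2, j)
    di = gf_mul(B ^ gf_mul(cj, A), gf_inv(ci ^ cj))
    return di, A ^ di
```

  Because the coefficients g^k are all distinct, c_i XOR c_j is never zero, so the pair of equations always has a unique solution — which is exactly why two simultaneous disk failures remain recoverable.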


Origin www.cnblogs.com/liuldexiaoche/p/10965672.html