NVMe and NVMe-oF explained: transport options and concepts

The NVMe transport protocol layer is an abstraction designed to provide reliable NVMe command and data transmission. To extend NVMe into data-center storage networks, the standard was taken beyond the PCIe bus via NVMe over Fabrics, with the aim of challenging the dominance of SCSI-based SANs. NVMe over Fabrics supports mapping NVMe onto multiple transport options, including FC, InfiniBand, RoCE v2, iWARP, and TCP.

However, among these fabric options, InfiniBand, RoCE v2 (routable RoCE), and iWARP are often regarded as the ideal fabrics, because they support RDMA.

  • InfiniBand (IB): a next-generation network protocol that has supported RDMA from the start. Because it is a new network technology, it requires NICs and switches that support it.
  • RDMA over Converged Ethernet (RoCE): a protocol that allows RDMA over an Ethernet network. Its lower network headers are Ethernet headers, while the upper headers (including the data) are InfiniBand headers. This allows RDMA to be used over standard Ethernet infrastructure (switches); only the NICs need to be special and support RoCE.
  • Internet Wide Area RDMA Protocol (iWARP): a protocol that allows RDMA to be performed over TCP. It lacks some features present in IB and RoCE, but it also allows RDMA over standard Ethernet infrastructure (switches). Only the NICs need to be special and support iWARP (if CPU offload is used); otherwise the whole iWARP stack can be implemented in software, losing most of the RDMA performance advantage.

So why does RDMA support give a fabric an inherent advantage as an NVMe over Fabrics transport? The answer lies in the features and advantages of RDMA itself.

RDMA is a memory-access technology that allows a computer to access the memory of another computer directly, bypassing the time-consuming involvement of the processor. RDMA moves data quickly from the memory of one system to that of a remote system, without affecting either operating system. The figure below compares the principle of RDMA with the traditional TCP/IP architecture.

Thus, RDMA can be simply understood as follows: with the right hardware and network, the NIC of server 1 can directly read and write the memory of server 2, ultimately achieving high bandwidth, low latency, and low resource consumption. As shown in the figure below, the application does not need to take part in the data transmission itself; it only needs to specify the memory address, start the transfer, and wait for completion. The main advantages of RDMA are summarized below (a minimal code sketch follows the list):

1) Zero-copy: data is not copied back and forth between the layers of the network protocol stack, which shortens the data path.

2) Kernel bypass: the application drives the interface directly, without system calls that switch into kernel mode, so there is no kernel context-switch overhead.

3) No CPU involvement: data transfer does not involve the CPU; it is handled entirely by the NIC, and there is no interrupt handling for sending or receiving packets, so the CPU is not burdened.
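To make these three points concrete, here is a minimal sketch of a one-sided RDMA WRITE using the libibverbs API. It assumes a protection domain, a connected queue pair, and a completion queue have already been created; the remote buffer address and rkey are placeholders that would be exchanged out of band, and error handling is reduced to the essentials.

```c
/* Minimal one-sided RDMA WRITE sketch (link with -libverbs). */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp, struct ibv_cq *cq,
                       void *local_buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register the local buffer so the NIC can DMA it directly (zero-copy). */
    struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: remote CPU not involved */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* Post the work request straight from user space: no system call on the
     * data path (kernel bypass). */
    if (ibv_post_send(qp, &wr, &bad_wr)) {
        ibv_dereg_mr(mr);
        return -1;
    }

    /* Poll the completion queue instead of waiting for an interrupt. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;  /* busy-wait until the NIC reports completion */

    int ok = (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
    ibv_dereg_mr(mr);
    return ok;
}
```

The key point is that ibv_post_send and ibv_poll_cq operate on hardware queues entirely from user space, which is exactly where the zero-copy, kernel-bypass, and CPU-offload advantages come from.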

To sum up, all of these advantages come down to improved efficiency and reduced latency. But if another fabric can meet the efficiency and latency requirements of NVMe over Fabrics using techniques similar to RDMA, can it also serve as an NVMe-oF fabric? Before answering that, consider the difference between NVMe-oF and NVMe.

The main difference between NVMe-oF and NVMe is the mechanism used to transfer commands. NVMe maps requests and responses to shared memory in the host over the Peripheral Component Interconnect Express (PCIe) interface. NVMe-oF instead uses a message-based model to send requests and responses between the host and the target storage device over a network.
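As a rough illustration of the message-based model (a simplified sketch, not the specification's authoritative layout): an NVMe-oF command capsule carries the fixed 64-byte submission queue entry, optionally followed by in-capsule data, and a response capsule carries the 16-byte completion queue entry.

```c
/* Simplified sketch of the NVMe-oF capsule model; field names are illustrative. */
#include <stdint.h>

#define NVME_SQE_SIZE 64   /* fixed submission queue entry size */
#define NVME_CQE_SIZE 16   /* fixed completion queue entry size */

struct nvmeof_cmd_capsule {
    uint8_t sqe[NVME_SQE_SIZE];   /* NVMe command (SQE), sent as a message */
    uint8_t in_capsule_data[];    /* optional data carried in the same capsule */
};

struct nvmeof_rsp_capsule {
    uint8_t cqe[NVME_CQE_SIZE];   /* NVMe completion (CQE) */
};
```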

NVMe-oF extends NVMe beyond PCIe so that a host and an NVMe storage subsystem can communicate over distance. Compared with the latency of a local host accessing an NVMe storage device over the PCIe bus, the original design goal of NVMe-oF was to add no more than 10 microseconds of latency between a host and an NVMe storage target connected over a suitable network fabric.

Beyond that, the two differ considerably in working mechanism and technical detail. NVMe-oF is an extension and refinement of NVMe (NVMe over PCIe); the specific differences are as follows:

  • Extends the naming mechanism while remaining compatible with NVMe over PCIe, for example by introducing SUBNQN (the subsystem NVMe Qualified Name).
  • Changes terminology: Command Capsule and Response Capsule are used to describe the transport packets.
  • Extends Scatter Gather Lists (SGLs) to support In Capsule Data transfer; NVMe over PCIe did not previously support SGL In Capsule Data transmission.
  • Adds Discovery and Connect mechanisms for discovering and connecting to the NVM Subsystem topology.
  • Adds a queue-creation mechanism based on the Connect command and removes the queue creation and deletion commands of NVMe over PCIe.
  • The interrupt mechanism of the PCIe architecture does not exist in NVMe-oF.
  • NVMe-oF does not support CQ flow control; the number of outstanding capsules on each queue must not exceed the number of entries in the corresponding CQ, which avoids CQ overrun.
  • NVMe-oF supports only SGLs, whereas NVMe over PCIe supports both SGLs and PRPs (see the sketch after this list).
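For reference, the basic building block behind the SGL-only rule is the 16-byte SGL Data Block descriptor. The sketch below follows the layout described in the NVMe specification (and mirrored in the Linux kernel's nvme.h), with simplified field names:

```c
/* 16-byte NVMe SGL Data Block descriptor (simplified sketch). */
#include <stdint.h>

struct nvme_sgl_desc {
    uint64_t addr;     /* starting address of the data block */
    uint32_t length;   /* length of the data block in bytes */
    uint8_t  rsvd[3];  /* reserved */
    uint8_t  type;     /* descriptor type (high nibble) and sub type (low nibble) */
};
```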

Turning to the FC fabric that Brocade has long championed: FC-NVMe reduces the NVMe command set to basic FCP instructions. Because Fibre Channel was designed specifically for storage traffic, functions such as discovery, management, and end-device authentication are built into the system.

Fibre Channel is a transport option for NVMe over Fabrics (NVMe-oF), as defined in the specification developed by NVM Express Inc. (a nonprofit organization with more than 100 member technology companies). Other NVMe transport options include Remote Direct Memory Access (RDMA) over Ethernet and InfiniBand. NVM Express Inc. released version 1.0 of NVMe-oF on June 5, 2016.

The T11 committee of the International Committee for Information Technology Standards (INCITS) defined a frame format and protocol mapping that applies NVMe-oF to Fibre Channel. The T11 committee completed the first edition of the FC-NVMe standard in August 2017 and submitted it to INCITS for publication.

The Fibre Channel Protocol (FCP) allows upper-layer transport protocols, such as NVMe, the Small Computer System Interface (SCSI), and IBM's proprietary Fibre Connection (FICON), to be mapped onto it, enabling data and command transfer between a host and a target peripheral storage device or system.

Large-scale, block-based flash storage environments are the most likely to adopt NVMe over FC. FC-NVMe gives the NVMe-oF fabric the same predictability and reliability characteristics that Fibre Channel provides for SCSI. In addition, NVMe-oF traffic and traditional SCSI-based traffic can run simultaneously on the same FC fabric.

The FC-NVMe standard defines the FC-NVMe protocol layer, the NVMe over Fabrics specification defines the NVMe-oF protocol layer, and the NVMe specification defines the NVMe host software and NVM subsystem protocol layers.

To realize the potential benefits of NVMe over Fibre Channel, the infrastructure components must support its requirements, including the storage operating system (OS) and network adapter cards. FC storage system vendors must make their products comply with the FC-NVMe requirements. Host bus adapter (HBA) vendors that currently support FC-NVMe include Broadcom and Cavium. Broadcom and Cisco are the major FC switch vendors, and the current generation of Brocade Gen 6 FC switches already supports the NVMe-oF protocol.

The NVMe over Fabrics white paper explicitly lists Fibre Channel as a fabric option for NVMe over Fabrics, and it also notes that a suitable fabric should provide reliable, credit-based flow control and delivery mechanisms. Credit-based flow control, however, is a native capability of the FC and PCIe transports. The white paper does not list RDMA as an essential attribute of an "ideal" NVMe-oF fabric, which suggests that RDMA is just one way to implement an NVMe fabric, nothing more.

FC also provides zero-copy support for DMA data transfer. RDMA shares local memory with a remote server by passing a scatter-gather list from the local server to the remote server, allowing the remote server to read from or write to the local server's memory directly.

Next, consider the fabrics that implement NVMe over Fabrics using RDMA. RDMA first appeared in InfiniBand networks, where it was used to interconnect HPC (high-performance computing) clusters. NVMe over InfiniBand tends to attract high-performance computing workloads that need extremely high bandwidth and low latency. InfiniBand networks are typically used for communication within back-end storage systems rather than for host-to-storage communication. Like FC, InfiniBand is a lossless network that requires special hardware, and it offers advantages such as flow and congestion control and quality of service (QoS). Unlike FC, however, InfiniBand lacks a discovery service that automatically adds nodes to the fabric.

Finally, consider the NVMe/TCP option (referred to here as NVMe over TCP). A few years ago, the NVM Express organization planned a transport option based on the Transmission Control Protocol (TCP), distinct from TCP-based iWARP. Recently, after 16 months of work, NVM Express Inc. released the first version of NVMe over TCP. The appearance of this fabric standard answers the earlier question: a fabric that can carry the NVMe protocol can indeed serve as an NVMe over Fabrics fabric.

However, TCP introduces network latency far higher than local PCIe access, undermining the low-latency goal of the NVMe protocol. Without RDMA, what makes NVMe/TCP viable as a transport? The points below draw on the views of Yang Ziye (a storage software engineer at Intel) about the technical reasons that led to the birth of NVMe/TCP:

1. The emergence of NVMe virtualization: with NVMe virtualization, the NVMe-oF target side does not necessarily need a real NVMe device; it can be a virtual NVMe device abstracted by a distributed system, which does not necessarily inherit the high-performance characteristics of a physical NVMe device. Under that premise, using the slower TCP protocol is perfectly reasonable.

2. Backward compatibility: to some extent, the NVMe-oF protocol aims to replace the iSCSI protocol (originally specified in RFC 3720, with many extensions). iSCSI runs only over Ethernet and places few demands on the NIC; it does not require RDMA support. If RDMA is available, the iSER protocol can be used to offload the CPU work of data transfer. But NVMe-oF initially had no TCP support, so when users migrated from iSCSI to NVMe-oF, much of their existing network equipment could not be reused, which reduced the acceptance of NVMe-oF. For users whose primary concern is not performance, the hardware requirements of the existing NVMe-oF protocol clearly created an obstacle to migration and prevented data centers from being upgraded smoothly.

3. TCP offloading: although TCP significantly reduces performance, TCP offloading can be used, for example with a SmartNIC or an FPGA, so the potential performance loss can be partially compensated. In short, there is a performance cost in the short term, but in the long term the protocol's hardware requirements are lower and performance can be improved, so overall acceptance should rise.

4. Compared with Soft-RoCE: before the TCP transport existed, a user without an RDMA-capable NIC who wanted to test NVMe-oF had to use Soft-RoCE to emulate an RDMA-capable device and then run the tests. Soft-RoCE is implemented by a kernel module that encapsulates the emulated RDMA protocol in UDP packets. With the TCP transport, things are much simpler: users can run NVMe-oF tests over the more reliable TCP protocol, which is simpler and more effective from a test-deployment perspective.

The NVMe/TCP (NVMe over TCP) protocol borrows to some extent from the iSCSI protocol, for example the way iSCSI transfers data for reads and writes. This is not surprising, since some of the people who drafted it were also involved in specifying the iSCSI protocol, and some parts of iSCSI are indeed well designed. NVMe/TCP, however, is simpler than iSCSI; one could say it takes the best parts.
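One place where the resemblance shows is in the framing: NVMe/TCP exchanges typed PDUs over the TCP byte stream, each starting with a small common header. The sketch below follows the layout described in the NVMe/TCP transport specification, with simplified field names:

```c
/* 8-byte common header at the start of every NVMe/TCP PDU (simplified sketch). */
#include <stdint.h>

struct nvme_tcp_pdu_hdr {
    uint8_t  type;   /* PDU type, e.g. ICReq/ICResp, CapsuleCmd, CapsuleResp, H2CData, C2HData */
    uint8_t  flags;  /* PDU-specific flags, e.g. whether header/data digests are present */
    uint8_t  hlen;   /* length of the PDU header */
    uint8_t  pdo;    /* PDU data offset: where the data begins within the PDU */
    uint32_t plen;   /* total PDU length (header + data + digests), little-endian on the wire */
};
```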
