RDMA Architecture Overview

1.1. InfiniBand

         InfiniBand supports native Remote Direct Memory Access (RDMA).
        InfiniBand uses I/O channels for data communication (up to 16 million per host),
        where each channel provides the semantics of a virtualized NIC or HCA (security, isolation, etc.).

1.2. Virtual Protocol Interconnect® (VPI)

         A VPI adapter or switch can be set to deliver either InfiniBand or Ethernet semantics on each port. A dual-port VPI adapter, for example, can be configured as one of the following:
        • An adapter (HCA) with two InfiniBand ports
        • A NIC with two Ethernet ports
        • An adapter with one InfiniBand port and one Ethernet port at the same time
        Similarly, a VPI switch can have InfiniBand-only ports, Ethernet-only ports, or a mix of InfiniBand and Ethernet ports working at the same time.
        Mellanox-based VPI adapters and switches support both the InfiniBand RDMA and the Ethernet RoCE solutions.

1.3. RDMA over Converged Ethernet (RoCE)

         RoCE is a standard for RDMA over Ethernet, also defined and specified by the IBTA.
        RoCE provides true RDMA semantics over Ethernet, since it does not require the complex, lower-performance TCP transport (needed for iWARP, for example).
        RoCE has been fully supported by the OpenFabrics software since the release of OFED.

1.4. Comparison of RDMA Technologies

        Currently, three technologies support RDMA: InfiniBand, Ethernet RoCE, and Ethernet iWARP.
        All three share a common user API, which is defined in this document, but have different physical and link layers.
        RoCE is supported by many leading solutions and is incorporated into Windows Server software (as is InfiniBand).
        The key difference from traditional networking is that RDMA provides a messaging service which applications can use to directly access the virtual memory of remote computers.
        The messaging service can be used for Inter-Process Communication (IPC), for communication with remote servers, and for communication with storage devices using Upper Layer Protocols (ULPs) such as iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP), Server Message Block (SMB), Samba, Lustre, ZFS, and many more.
        RDMA provides channel-based I/O: a channel allows an application using an RDMA device to directly read and write remote virtual memory.
        RDMA uses the OS only to establish a channel; applications then exchange messages directly without further OS intervention.
        A message can be an RDMA Read, an RDMA Write, or a Send/Receive operation. IB and RoCE also support multicast transmission.
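The distinction between one-sided RDMA Read/Write and two-sided Send/Receive can be sketched as follows. This is a toy Python model of the semantics only, not a real verbs API; the class and function names are invented for illustration.

```python
# Toy model of RDMA message semantics (not a real verbs API).

class RemoteMemory:
    """A registered memory region on the remote node."""
    def __init__(self, size):
        self.buf = bytearray(size)

def rdma_write(remote, offset, data):
    # One-sided: the initiator places bytes directly into remote
    # virtual memory; the remote CPU is not involved or notified.
    remote.buf[offset:offset + len(data)] = data

def send(recv_queue, data):
    # Two-sided: the message lands in a buffer the receiver posted
    # in advance, and the receiver sees a completion.
    posted = recv_queue.pop(0)      # receiver must have posted a buffer
    posted[:len(data)] = data
    return len(data)                # bytes delivered

mem = RemoteMemory(16)
rdma_write(mem, 4, b"RDMA")         # remote side takes no action
assert bytes(mem.buf[4:8]) == b"RDMA"

rq = [bytearray(16)]                # receiver pre-posts one buffer
n = send(rq, b"hello")
assert n == 5
```

The asymmetry shown here is the essential point: a Write names the remote buffer explicitly, while a Send relies on the receiver having posted a buffer beforehand.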
        The IB Link Layer offers features such as a credit-based flow control mechanism for congestion control.
        It also allows the use of Virtual Lanes (VLs), which simplify higher-level protocols and enable advanced Quality of Service. It guarantees strong ordering within a VL along a given path.
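Credit-based flow control can be illustrated with a small simulation: the receiver grants one credit per free receive buffer, and the sender may transmit only while it holds credits, so packets are never dropped for lack of buffer space. The model below is a sketch of the idea, not of the actual IB wire protocol.

```python
# Toy model of credit-based link-level flow control.

class Receiver:
    """Grants one credit per free receive buffer."""
    def __init__(self, buffers):
        self.free = buffers
    def grant_credits(self):
        granted, self.free = self.free, 0   # advertise all free buffers
        return granted

class Sender:
    """May put a packet on the wire only while holding a credit."""
    def __init__(self):
        self.credits = 0
    def transmit(self, packets):
        sent = min(packets, self.credits)   # never exceed granted credits
        self.credits -= sent
        return sent

rx, tx = Receiver(buffers=4), Sender()
tx.credits += rx.grant_credits()
assert tx.transmit(6) == 4   # only 4 credits -> only 4 packets go out
assert tx.transmit(1) == 0   # sender stalls instead of dropping packets
```

Because the sender stalls rather than transmits into a full receiver, loss-based congestion signals (and the retransmissions they cause) are avoided at the link layer.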
        The IB Transport layer provides reliability and delivery guarantees.
        The Network Layer used by IB has features which make it simple to transport messages directly between applications' virtual memory, even when the applications are physically located on different servers. Thus the combination of the IB Transport Layer with the Software Transport Interface is better thought of as an RDMA message transport service. The entire stack, including the Software Transport Interface, comprises the IB messaging service.

          In IB, a complete message is delivered directly to an application. Once an application has requested transport of an RDMA Read or Write, the IB hardware segments the outbound message as needed into packets whose size is determined by the fabric path maximum transmission unit (MTU).
        These packets are transmitted through the IB network and delivered directly into the receiving application's virtual buffer, where they are reassembled into a complete message. The receiving application is notified once the entire message has been received. Thus neither the sending nor the receiving application is involved until the entire message is delivered into the receiving application's buffer.
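The segmentation and reassembly described above is done in hardware by the HCA; the following sketch only shows the arithmetic. The MTU value of 4096 is one common IB path MTU, used here purely for illustration.

```python
# Sketch of message segmentation/reassembly by path MTU
# (done by the HCA in hardware; shown here only for the arithmetic).

PATH_MTU = 4096   # illustrative path MTU in bytes

def segment(message: bytes, mtu: int = PATH_MTU):
    """Split an outbound message into packets no larger than the MTU."""
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]

def reassemble(packets):
    """Rebuild the complete message in the receiver's buffer."""
    return b"".join(packets)

msg = bytes(10_000)                  # a 10,000-byte message
pkts = segment(msg)
assert len(pkts) == 3                # 4096 + 4096 + 1808 bytes
assert reassemble(pkts) == msg       # delivered as one complete message
```

The application sees only the final assertion's view: one complete message in its buffer, with no per-packet involvement.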

1.5. Key Components

        These components are presented only in the context of the advantages of deploying IB and RoCE; cables and connectors are not discussed.
        Host Channel Adapter
        HCAs provide the point at which an IB end node (for example, a server) connects to an IB network. They are the equivalent of an Ethernet network interface card (NIC), but they do much more. HCAs provide an address translation mechanism, under the control of the operating system, which allows an application to access the HCA directly.

        The same address translation mechanism is the means by which an HCA accesses memory on behalf of a user-level application. The application refers to virtual addresses, while the HCA is able to translate these addresses into physical addresses in order to effect the actual message transfer.
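A page-table-style model makes the translation step concrete. This is a toy sketch: the page size matches a common 4 KB page, but the virtual base and physical frame numbers are invented, and a real HCA holds these mappings for pinned, registered memory regions.

```python
# Toy model of the HCA's virtual-to-physical address translation
# for a registered memory region (addresses are invented).

PAGE = 4096

class MemoryRegion:
    """Registered region: one physical frame per virtual page."""
    def __init__(self, virt_base, phys_pages):
        self.virt_base = virt_base
        self.phys_pages = phys_pages   # frames need not be contiguous

    def translate(self, virt_addr):
        offset = virt_addr - self.virt_base
        page, within = divmod(offset, PAGE)
        return self.phys_pages[page] + within

# Register a 2-page region starting at virtual address 0x10000,
# backed by two non-contiguous (fictitious) physical frames.
mr = MemoryRegion(0x10000, [0x7A000, 0x31000])
assert mr.translate(0x10010) == 0x7A010   # falls in the first page
assert mr.translate(0x11004) == 0x31004   # falls in the second page
```

The point of the two non-contiguous frames is that the application can hand the HCA a single virtual address range even though the OS scattered it across physical memory.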

        Range Extenders
        InfiniBand range extension is accomplished by encapsulating the InfiniBand traffic onto the WAN link and extending sufficient buffer credits to ensure full bandwidth across the WAN.

        Subnet Manager
        The InfiniBand Subnet Manager assigns Local Identifiers (LIDs) to each port connected to the InfiniBand fabric and develops a routing table based on the assigned LIDs.
        The IB Subnet Manager embodies the concept of Software-Defined Networking (SDN), which eliminates interconnect complexity and enables the creation of very large-scale compute and storage infrastructures.
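The two Subnet Manager tasks named above, LID assignment and routing-table construction, can be sketched as follows. The topology, port names, and switch port numbers are invented; a real Subnet Manager discovers the fabric and computes routes over multi-switch topologies.

```python
# Minimal sketch of Subnet Manager duties: assign LIDs, then build
# a per-switch forwarding table (destination LID -> output port).

def assign_lids(ports):
    """Give each discovered end-node port a unique LID (>= 1)."""
    return {port: lid for lid, port in enumerate(ports, start=1)}

def build_forwarding_table(lids, links):
    """Map each destination LID to the switch output port reaching it."""
    return {lids[port]: out_port for port, out_port in links.items()}

ports = ["hca-A", "hca-B", "hca-C"]       # discovered end-node ports
lids = assign_lids(ports)                 # hca-A -> 1, hca-B -> 2, ...
links = {"hca-A": 1, "hca-B": 2, "hca-C": 2}   # switch port per node
fdb = build_forwarding_table(lids, links)
assert fdb[lids["hca-B"]] == 2   # traffic to LID 2 leaves switch port 2
```

Because a central manager computes all tables, individual switches stay simple: they only look up the destination LID, which is the SDN-style division of labor the text refers to.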

        Switches
        IB switches are conceptually similar to standard networking switches but are designed to meet IB performance requirements. They implement the flow control of the IB Link Layer to prevent packet dropping, and they support congestion avoidance, adaptive routing, and advanced Quality of Service. Many switches include a Subnet Manager; at least one Subnet Manager is required to configure an IB fabric.

1.6. Support for Existing Applications and ULPs

        IP applications can run over an InfiniBand fabric using the IP over IB (IPoIB), Ethernet over IB (EoIB), or RDS ULPs. Storage applications are supported via iSER, SRP, RDS, NFS, ZFS, SMB, and others. MPI and Network Direct are also supported ULPs, but are outside the scope of this document.

Reprinted from blog.csdn.net/x13262608581/article/details/125023998