InfiniBand technology architecture and protocol analysis

InfiniBand is an open-standard interconnect technology that simplifies and accelerates connections between servers, and also supports connections from servers to remote storage and networking devices.

IB technology development

Drafting of the specification began in 1999 and the standard was published in 2000, but adoption grew more slowly than RapidIO, PCI-X, PCI-E and FC, while Ethernet advanced from 1 Gbps to 10 Gbps. It was not until after 2005 that the InfiniBand Architecture (IBA) became widely used in cluster supercomputers; a considerable number of systems in the global Top 500 use IBA.

As more and more large companies joined (or returned to) the camp, including Cisco, IBM, HP, Sun, NEC, Intel, LSI and others, InfiniBand became the mainstream interconnect technology for high-performance computing. To meet the high I/O throughput requirements of HPC, enterprise data centers and cloud computing environments, a new generation of high-speed FDR (Fourteen Data Rate, 56 Gbps) and EDR InfiniBand technology emerged.

Advantages of IB technology

InfiniBand connects large numbers of FC/IP SANs, NAS devices and servers, and iSER, the RDMA storage protocol for iSCSI, has been standardized by the IETF. EMC has switched an entire product line to InfiniBand networking, and IBM/TMS's FlashSystem series, IBM's XIV Gen3 storage system, and DDN's SFA series all use InfiniBand networks.

The main advantage over FC is performance: InfiniBand bandwidth is about 3.5 times that of FC, InfiniBand switch latency is about 1/10 that of FC switches, and both SAN and NAS are supported.

The conventional bare FC SAN architecture connecting servers and storage no longer satisfies storage network demands. HP SFS and IBM GPFS build parallel file systems over an InfiniBand fabric, connecting servers and storage via iSER, and completely break through the system performance bottleneck.

InfiniBand HCAs use high-bandwidth serial links, evolving from SDR, DDR, QDR and FDR to EDR, and achieve fine-grained, low latency, even at the nanosecond level, with an advanced congestion-control mechanism based on link-layer flow control.

InfiniBand uses virtual lanes (VLs) to implement QoS. A virtual lane is one of several discrete logical communication links sharing a single physical link; each physical link can support up to 15 data virtual lanes plus one dedicated management lane (VL15).
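The idea can be illustrated with a toy scheduler in which multiple VL queues share one physical link. This is a minimal sketch only: it assumes strict priority for VL15 and a simple in-order scan of the data VLs, whereas real IB hardware uses configurable high/low-priority arbitration tables.

```python
from collections import deque

NUM_DATA_VLS = 15   # VL0-VL14 carry data traffic
MGMT_VL = 15        # VL15 is reserved for subnet management

# One queue per virtual lane, all multiplexed onto one physical link.
lanes = {vl: deque() for vl in range(NUM_DATA_VLS)}
lanes[MGMT_VL] = deque()

def enqueue(vl, packet):
    lanes[vl].append(packet)

def transmit_next():
    """Pick the next packet to put on the shared physical link."""
    if lanes[MGMT_VL]:                 # management traffic goes first
        return lanes[MGMT_VL].popleft()
    for vl in range(NUM_DATA_VLS):     # then scan the data VLs in order
        if lanes[vl]:
            return lanes[vl].popleft()
    return None                        # link idle

enqueue(0, "data-A")
enqueue(3, "data-B")
enqueue(MGMT_VL, "subnet-mgmt")
print(transmit_next())  # subnet-mgmt: VL15 always wins arbitration
```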

RDMA is a kernel-bypass technology that provides remote read and write access between nodes, almost completely offloading the workload from the CPU; reliable transport implemented in hardware brings higher performance.

Compared with the TCP/IP protocol stack, IB's credit-based flow-control mechanism ensures connection integrity, and packet loss is rare. After consuming received data, the receiver returns a signal marking the availability of buffer space, so the IB protocol natively eliminates the delay caused by packet loss and retransmission, improving efficiency and overall performance.
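The credit mechanism can be sketched as follows. This is an illustrative model, not IB's wire protocol: the buffer count and class names are made up, but the invariant is the real one, namely that a sender never transmits into a buffer the receiver has not advertised, so nothing is ever dropped.

```python
class Receiver:
    """Advertises buffer space as credits; never has to drop a packet."""
    def __init__(self, buffers=4):
        self.free = buffers
        self.delivered = []

    def accept(self, pkt):
        assert self.free > 0, "sender violated its credit limit"
        self.free -= 1
        self.delivered.append(pkt)

    def drain(self, n=1):
        """Application consumes n packets, freeing buffers; returns credits."""
        n = min(n, len(self.delivered))
        self.delivered = self.delivered[n:]
        self.free += n
        return n

class Sender:
    def __init__(self, rx, credits):
        self.rx, self.credits = rx, credits

    def send(self, pkt):
        if self.credits == 0:   # no advertised buffer space:
            return False        # wait, rather than transmit and lose the packet
        self.credits -= 1
        self.rx.accept(pkt)
        return True

rx = Receiver(buffers=2)
tx = Sender(rx, credits=2)
assert tx.send("p1") and tx.send("p2")
assert not tx.send("p3")        # blocked, not dropped (unlike lossy TCP/IP)
tx.credits += rx.drain(1)       # credit returned once a buffer frees
assert tx.send("p3")
```

This is why IB avoids the retransmission penalty described above: the loss case simply never arises on a healthy link.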

TCP/IP can tolerate packet loss during forwarding, but because it relies on continual acknowledgment and retransmission, protocols built on it are slower, which greatly affects performance.

IB Basic Concepts

IB is a channel-based, bidirectional serial transport whose connection topology is a switched fabric; when a link is not long enough, an IBA repeater can extend it. Each IBA network is called a subnet, and each subnet can contain up to 65,536 nodes. IBA switches and repeaters operate only within a subnet; to cross multiple IBA subnets, an IBA router or IBA gateway is required.
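The 65,536-node limit follows from the width of the Local Identifier (LID) used to address endpoints within a subnet, a 16-bit field carried in the Local Route Header. A minimal sketch of that arithmetic:

```python
# The per-subnet node limit is simply the 16-bit LID address space.
LID_BITS = 16
MAX_NODES = 2 ** LID_BITS   # 65,536 addressable endpoints per subnet

def valid_lid(lid):
    """Illustrative range check; real IB reserves parts of this space
    (e.g. for multicast and the permissive LID)."""
    return 0 <= lid < MAX_NODES

print(MAX_NODES)  # 65536
```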

Each node must connect to an IBA subnet through an adapter: a node's CPU and memory connect to the subnet through an HCA (Host Channel Adapter), while hard disks and I/O devices connect through a TCA (Target Channel Adapter). Together this topology constitutes a complete IBA.

The IB transmission medium is quite flexible: within a chassis, copper traces on a printed circuit board (a backplane) can be used, while externally either copper cable or optical fiber is supported. Copper reaches up to 17 m, and optical fiber up to 10 km. IBA also supports hot-swapping, along with intelligent active cables that provide automatic detection and self-adjusting link activation.

IB Protocol Overview

InfiniBand is a layered protocol (similar to TCP/IP): each layer is responsible for different functions, lower layers serve upper layers, and the layers are independent of one another. Packet headers include the Local Route Header (LRH), Global Route Header (GRH), Base Transport Header (BTH), and so on; the GRH adopts the IPv6 header format.

1. Physical layer

The physical layer defines electrical and mechanical characteristics, including connectors for optical-fiber and copper media, backplane connectors, and hot-swap characteristics. It defines three kinds of physical port: backplane, copper cable, and optical cable.

It also defines the symbols used to form data frames: packet delimiters (including start and end), data symbols, and the idle fillers transmitted between packets. It details the signaling protocol for building valid packets, such as symbol encoding, framing and alignment marks, the start and end delimiters that separate data symbols from non-data symbols, disparity-error detection, and the synchronization method.

2. Link layer

The link layer describes the packet format and the protocol for packet operations, such as flow control and packet routing within a subnet. The link layer carries two types of packets: link-management packets and data packets.

3. Network layer

The network-layer protocol forwards packets between subnets, similar to the network layer of an IP network. Data routed within a single subnet does not involve the network layer.

Packets forwarded between subnets carry a Global Route Header (GRH). The GRH uses the IPv6 header format and identifies the source and destination ports by their Global Identifiers (GIDs); routers forward packets based on the GRH. Each GID is formed by combining the subnet prefix with the port's GUID, which makes it globally unique.
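Because a GID has the IPv6 address format, its construction is easy to sketch: a 64-bit subnet prefix in the high bits concatenated with the port's 64-bit GUID in the low bits. The prefix below is the well-known link-local default used before a subnet manager assigns one; the GUID value is made up for illustration.

```python
import ipaddress

def make_gid(subnet_prefix, guid):
    """GID = 64-bit subnet prefix (high half) || 64-bit port GUID (low half)."""
    assert 0 <= subnet_prefix < 2**64 and 0 <= guid < 2**64
    return ipaddress.IPv6Address((subnet_prefix << 64) | guid)

# Default link-local subnet prefix fe80::/64; illustrative GUID.
gid = make_gid(0xFE80_0000_0000_0000, 0x0002_C903_00A1_B2C3)
print(gid)  # fe80::2:c903:a1:b2c3
```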

4. Transport layer

The transport layer is responsible for delivering packets, channel multiplexing, basic transport services, and the segmentation, sending, receiving, and reassembly of messages. The transport layer delivers each packet to the specified queue pair (QP) and instructs the QP how to process the packet. When a message's payload is larger than the maximum transmission unit (MTU) of the path, the transport layer segments the message into multiple packets.

The receiving-side queue reassembles the data into the specified data buffer. All packets except raw datagrams contain a BTH, which specifies the destination queue and indicates the operation type, packet sequence number, and partition information.
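Segmentation and reassembly can be sketched in a few lines. This is a simplified model: it assumes an illustrative 2048-byte path MTU (IB MTUs range from 256 to 4096 bytes) and omits the per-packet BTH sequence numbering that real reassembly relies on.

```python
PATH_MTU = 2048  # illustrative path MTU in bytes

def segment(message, mtu=PATH_MTU):
    """Sender side: split a message larger than the path MTU into packets."""
    return [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]

def reassemble(packets):
    """Receiver side: concatenate in-order segments back into the message."""
    return b"".join(packets)

msg = bytes(5000)           # a 5000-byte message payload
pkts = segment(msg)
print(len(pkts))            # 3 packets: 2048 + 2048 + 904 bytes
assert reassemble(pkts) == msg
```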

5. Upper-layer protocols

InfiniBand provides different upper-layer protocols for different classes of users, and defines messages and protocols for certain management functions. InfiniBand mainly supports SDP, SRP, iSER, RDS, IPoIB, uDAPL and other upper-layer protocols.

SDP (Sockets Direct Protocol) is a protocol developed by the InfiniBand Trade Association (IBTA) on top of InfiniBand; it allows applications using the existing TCP/IP socket interface to run over high-speed InfiniBand.

SRP (SCSI RDMA Protocol) is a communication protocol over InfiniBand that encapsulates SCSI commands in InfiniBand, allowing SCSI commands to be exchanged between different systems via RDMA (Remote Direct Memory Access), so that shared storage devices can communicate using RDMA services.

iSER (iSCSI Extensions for RDMA) is a protocol similar to SRP; it is an IB SAN protocol whose main role is to carry iSCSI commands and data over RDMA on an InfiniBand network. As the RDMA storage protocol for iSCSI, iSER has been standardized by the IETF.

RDS (Reliable Datagram Sockets) is a UDP-like socket protocol designed for transmitting and receiving data over InfiniBand. It was developed by Oracle Corporation and runs directly over InfiniBand as an IPC protocol.

IPoIB (IP-over-InfiniBand) is a protocol enacted to make InfiniBand networks compatible with TCP/IP. It is transparent to user applications based on TCP/IP and provides greater bandwidth, so applications that originally use the TCP/IP stack can use IPoIB without any modification.

uDAPL (User Direct Access Programming Library) is a standard API library for direct user-space access; it improves the performance, scalability, and reliability of data-center messaging applications by using the RDMA capability of RDMA-capable interconnects such as InfiniBand.

IB application scenarios

InfiniBand flexibly supports both direct connections and multi-switch networking. It is mainly used in high-performance computing (HPC) scenarios and in high-performance storage scenarios such as large data centers. HPC applications commonly demand low latency (< 10 μs), low CPU utilization (< 10%), and high bandwidth (mainly 56 or 100 Gbps).

On the one hand, using InfiniBand RDMA on the host side releases CPU load and can reduce host data-processing delay from tens of microseconds to about 1 microsecond; on the other hand, InfiniBand's high-bandwidth (40G, 56G and 100G), low-latency (hundreds of nanoseconds) and lossless network characteristics combine the reliability of FC with the flexible scalability of Ethernet networks.
---------------------
Author: Hardy Han Di
Source: CSDN
Original: https://blog.csdn.net/swingwang/article/details/72887367
