High Performance Network SIG Monthly News: virtio-net supports dynamic interrupt adjustment, SMC v2 protocol adds new extensions

High Performance Network SIG (Special Interest Group): In the cloud computing era, software and hardware are evolving rapidly, and new application forms such as cloud-native and microservices have emerged, so more and more data flows between processes. The network, as the carrier of these data flows, plays an unprecedentedly important role. In this era of the Internet of Everything, the efficiency of network communication in the cloud is crucial to all kinds of services. The High Performance Network SIG is committed to using new high-efficiency communication technologies such as XDP, RDMA, and virtio, combined with the idea of software and hardware co-design, to build a high-performance network protocol stack and improve the network performance of data center applications in the cloud computing era.

01 Overall progress of SIG this month

This month the High Performance Network SIG focused mainly on SMC, virtio, and general networking in the Anolis OS kernel.

Key developments this month:

  • This month, the SIG team and the IBM SMC development team reached agreement on the SMC v2 protocol extensions. So far, the key protocol extensions promoted by the SIG (single-connection support, a negotiable maximum number of connections per LGR, RDMA write with immediate, extensible options, and so on) have largely been discussed at the SMC developer meeting.
  • The proposal "virtio-net: support the virtqueue coalescing moderation" (https://lists.oasis-open.org/archives/virtio-dev/202303/msg00415.html), proposed and advanced by the SIG, was approved by vote in the virtio community. The proposal allows virtio-net to support dynamic interrupt adjustment and improves virtio-net performance.

02 Anolis OS

Fixes

Anolis OS ANCK 5.10 adds the following fixes:

This month, 17 network-related CVEs were fixed, covering modules such as ulp/wifi/sched/ipv6/sctp/mpls/usb/nfc/netfilter/bluetooth. CVE list: CVE-2022-4129, CVE-2023-0461, CVE-2023-23455, CVE-2023-23559, CVE-2022-47929, CVE-2023-0394, CVE-2023-1074, CVE-2023-26545, CVE-2022-2964, CVE-2023-23454, CVE-2023-1281, CVE-2020-25672, CVE-2023-0590, CVE-2020-25670, CVE-2023-1095, CVE-2022-20566, CVE-2020-25671.

03 High performance network protocol stack SMC

This month, the OpenAnolis community High Performance Network SIG's work on SMC focused mainly on standardized protocol extensions, a native high-performance communication solution, and eBPF-based policy replacement.

Native high-performance communication solution

Local loopback and inter-container (cross-netns) communication is already a common data path, widely used in data processing and cloud-native scenarios. For example, in cloud-native scenarios, a service mesh communicates with business processes through sidecar proxy processes. SMC provides a native (loopback and inter-container) high-performance communication solution. Compared with traditional user-space IPC, kernel TCP loopback, and UNIX domain sockets, it has a clear performance advantage (for detailed data, see the LWN link: https://lwn.net/Articles/927410/) and is transparent to applications, requiring no intrusion or modification. This month we continued to refine the RFC that had been posted to the Linux community and sent out v4 (LWN link). Judging from the feedback so far, the Linux community has reached consensus on the applicable scenarios and feasibility of the solution. The current focus is on how the SMC protocol can uniquely identify a local loopback device. The current approach distinguishes loopback devices by multiple IDs, with a collision probability of 1/2^(64*3). We will discuss further SMC loopback topics with the Linux community maintainers at the SMC developer meeting.
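To put the quoted collision probability in perspective, the arithmetic can be checked directly. The sketch below assumes, as the figure 1/2^(64*3) suggests, that a loopback device is identified by three independent, uniformly random 64-bit IDs. The function names are illustrative only and are not taken from the SMC patches.

```python
from fractions import Fraction
import secrets

ID_BITS = 64   # width of each identifier, read off the 1/2^(64*3) figure
ID_COUNT = 3   # number of identifiers combined (assumption from the same figure)


def new_loopback_identity():
    """Draw a hypothetical device identity: a tuple of random 64-bit IDs."""
    return tuple(secrets.randbits(ID_BITS) for _ in range(ID_COUNT))


def collision_probability(id_bits=ID_BITS, id_count=ID_COUNT):
    """Chance that two independently drawn identities are identical."""
    return Fraction(1, 2 ** (id_bits * id_count))


if __name__ == "__main__":
    print(new_loopback_identity())
    print(float(collision_probability()))  # about 1.6e-58
```

Under these assumptions the probability is 2^-192, so an accidental clash between two loopback devices is negligible in practice.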

Protocol extension

The SMC protocol is defined in IETF RFC 7609 and in IBM's SMC PDF manual, which specify how the two peers perform the handshake, establish the connection, and then communicate. The current protocol has some problems: for example, SMC v2 has not yet been standardized, and the protocol has no extensions for scenarios beyond those originally anticipated. This month, the OpenAnolis community and the Linux community reached agreement on protocol extensions. So far, the key protocol extensions we promoted have been discussed at the regular SMC developer meeting and, after a series of reviews and comments, have entered the formal protocol update and release process. The extensions agreed this time mainly include: single-connection support, a negotiable maximum number of connections per LGR, RDMA write with immediate, and extensible options.

eBPF Policy Replacement

SMC can dynamically fall back to TCP. The current fallback decision depends mainly on whether the RDMA/ISM connection is established successfully. Because SMC performs worse than TCP for short-lived connections, we plan to add policy-based fallback to TCP so that SMC is more general and adapts better to different application models and scenarios. Against this background, we posted an SMC policy fallback patch based on the eBPF struct ops feature to the Linux BPF community and discussed the implementation details with the BPF maintainers and the struct ops authors. Based on that discussion, we sent an updated version of the patch. If the eBPF policy replacement solution is accepted by the community, we will submit a user-space implementation to the SMC tools (smc-tools) to simplify usage, and the SMC ULP solution will be dropped.
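The idea of a policy-based fallback can be illustrated with a small user-space model. This is only a sketch of the decision logic, not the eBPF struct ops patch: the class, its parameters, and the choice of "average connection lifetime per port" as the signal are all hypothetical.

```python
from collections import defaultdict, deque


class FallbackPolicy:
    """Toy model: fall back to TCP for ports whose recent connections were short-lived."""

    def __init__(self, min_lifetime_s=1.0, window=8):
        self.min_lifetime_s = min_lifetime_s  # hypothetical threshold
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, port, lifetime_s):
        """Remember how long a finished connection to `port` lasted."""
        self._history[port].append(lifetime_s)

    def use_smc(self, port):
        """Return True to attempt SMC, False to fall back to TCP."""
        samples = self._history[port]
        if not samples:
            return True  # no evidence yet: try SMC
        average = sum(samples) / len(samples)
        return average >= self.min_lifetime_s


if __name__ == "__main__":
    policy = FallbackPolicy()
    for _ in range(8):
        policy.record(6379, 0.05)    # many short-lived connections
    policy.record(5432, 120.0)       # one long-lived connection
    print(policy.use_smc(6379))      # False: prefer TCP
    print(policy.use_smc(5432))      # True: SMC is worthwhile
```

A port with no history defaults to SMC here, mirroring the existing behavior where fallback only happens when something argues against SMC.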

04 virtio

virtio specification supports virtqueue notification coalescing

Background: the Net DIM (Generic Network Dynamic Interrupt Moderation, https://www.kernel.org/doc/html/next/networking/net_dim.html) algorithm collects the traffic statistics and interrupt count of a single queue, adaptively computes the direction and step size of the interrupt moderation adjustment, and sends the resulting configuration to the device in order to improve network throughput. Because the current virtio specification does not support setting interrupt parameters for a single queue, virtio-net cannot support netdim yet.

The proposal "virtio-net: support the virtqueue coalescing moderation", promoted by the High Performance Network SIG, enables virtio-net to support dynamic interrupt adjustment by setting interrupt moderation parameters for each queue. This work has now been approved by vote in the virtio community (https://lists.oasis-open.org/archives/virtio-dev/202303/msg00415.html).
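The per-queue adaptive loop described in the background can be sketched roughly as follows. This is a deliberately simplified model, not the kernel's Net DIM code: it looks only at packet rate (the real algorithm also weighs interrupt rate and other statistics), and the profile table, tolerance, and class name are illustrative assumptions.

```python
# (usecs, max_packets) coalescing profiles, from least to most coalescing.
# The values are illustrative, not taken from the kernel or the virtio spec.
PROFILES = [(1, 64), (8, 64), (64, 64), (128, 64), (256, 64)]


class QueueModerator:
    """Keeps the moderation state of one queue and picks its next profile."""

    def __init__(self, profiles=PROFILES, tolerance=0.10):
        self.profiles = profiles
        self.tolerance = tolerance
        self.index = 0        # start with the least coalescing
        self.direction = 1    # +1: coalesce more, -1: coalesce less
        self.prev_rate = None

    def _step(self):
        last = len(self.profiles) - 1
        self.index = max(0, min(last, self.index + self.direction))

    def update(self, packets_per_sec):
        """Feed one traffic sample; return the profile to program for this queue."""
        if not self.prev_rate:
            self._step()  # no usable baseline yet: probe one step
        else:
            change = (packets_per_sec - self.prev_rate) / self.prev_rate
            if change < -self.tolerance:
                self.direction = -self.direction  # got worse: reverse
            if abs(change) > self.tolerance:
                self._step()  # otherwise stay where we are
        self.prev_rate = packets_per_sec
        return self.profiles[self.index]


if __name__ == "__main__":
    # One independent moderator per queue, which is what per-queue
    # coalescing parameters make possible.
    moderators = {q: QueueModerator() for q in range(4)}
    for rate in (1000, 1500, 1200, 1210):
        print(moderators[0].update(rate))
```

Each call to `update` stands for one measurement interval, and the returned tuple is what a driver would send to the device for that queue.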

virtio-net inner header hash

To allow the hash for tunneling protocols such as VXLAN, GENEVE, and GRE to be calculated from the inner header, and thereby improve receive performance in RSS and monitoring scenarios, the High Performance Network SIG team proposed "virtio_net: support inner header hash" (https://lists.oasis-open.org/archives/virtio-dev/202303/msg00415.html).

This month's discussion (v9 to v12) focused mainly on how the inner header hash can better support symmetric hashing:

Legacy tunneling protocols such as GRE are carried directly over the IP header, which has neither an outer port number nor any field that identifies a specific flow, so the hash must be calculated from the inner header. Modern tunneling protocols such as VXLAN and GENEVE are carried over a transport header, so the outer port number can be used to add entropy. However, some monitoring scenarios require packets of the same flow to be hashed to the same receive queue even though the flow may be encapsulated in different tunnels. In that case, a symmetric hash must be calculated over the inner header.
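The requirement can be illustrated with a small sketch: hash only the inner 5-tuple, and order the two endpoints canonically so that both directions of a flow give the same value regardless of the outer tunnel. Note the assumptions: real devices typically use a keyed Toeplitz hash, whereas this example swaps in endpoint sorting plus CRC32 purely for illustration, and the function names are hypothetical.

```python
import zlib
from ipaddress import ip_address


def symmetric_flow_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Hash an inner 5-tuple so that both directions give the same value."""
    a = (ip_address(src_ip).packed, src_port)
    b = (ip_address(dst_ip).packed, dst_port)
    lo, hi = sorted((a, b))  # canonical endpoint order removes direction
    data = b"".join([
        lo[0], lo[1].to_bytes(2, "big"),
        hi[0], hi[1].to_bytes(2, "big"),
        bytes([proto]),
    ])
    return zlib.crc32(data)


def pick_queue(inner_tuple, num_queues):
    """Map an inner flow to a receive queue; outer headers are never consulted."""
    return symmetric_flow_hash(*inner_tuple) % num_queues


if __name__ == "__main__":
    forward = ("10.0.0.1", 40000, "10.0.0.2", 80, 6)
    reverse = ("10.0.0.2", 80, "10.0.0.1", 40000, 6)
    print(pick_queue(forward, 8) == pick_queue(reverse, 8))  # True
```

Because the function never sees the outer header, the same inner flow lands on the same queue whichever tunnel carries it, and the same code path handles IPv4 and IPv6 addresses.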

There is not yet consensus on how to find a key or hash algorithm that hashes both IPv4 and IPv6 packets symmetrically. We will continue the discussion with the community to make the inner header hash more general.

virtio-net supports AF_XDP zerocopy

Background: AF_XDP is a new kernel-bypass framework for sending and receiving packets. Packets received by the driver can be delivered directly to user space, and user space can hand packets directly to the driver for transmission. Compared with kernel UDP, its PPS can be 3 to 7 times higher. However, zerocopy requires support in the driver.

Because this series contains a large number of patches, we plan to split it into two parts:

  • The virtio core supports premapped DMA. This part enables the virtio core to accept buffers whose DMA addresses have already been mapped by the caller. In the current implementation, all DMA operations are performed inside the virtio core. Because AF_XDP maps all addresses for DMA ahead of time, the virtio core must allow DMA addresses to be passed in. This part also includes some virtqueue reset related changes.
  • This part has largely passed review in the community.
  • virtio-net supports AF_XDP. This work will be submitted to the net-next branch on top of the previous patch set.
  • This part of the patch set has also been split, and the refactoring of virtio-net's XDP code is currently under way. The implementation has largely reached consensus with the community.
  • Subsequent patches will be submitted after the next version is released, when the net tree contains the virtio core code updates.

That concludes the High Performance Network SIG's monthly update for March. Everyone is welcome to join and contribute (see the end of the article for how to join the group). For more SIG news, visit the OpenAnolis official website:

OpenAnolis SIG homepage: https://openanolis.cn/sig

High Performance Network SIG homepage: https://openanolis.cn/sig/high-perf-network

-- End --
