FPGA-Based 10G NIC with UDP Protocol Stack, Written in Pure Verilog, with Engineering Source Code and Technical Support

1. Introduction

A network interface controller (NIC) is the gateway through which a computer interacts with a network. The NIC forms the bridge between the software protocol stack and the network, and the functions of this bridge define the network interface. Both the capabilities of network interfaces and the implementation of those capabilities are evolving rapidly. These changes are driven by the dual requirements of increasing line speeds and supporting NIC features for high-performance distributed computing and virtualization. Increasing line speeds have forced many NIC functions to be implemented in hardware rather than software. At the same time, new network capabilities, such as precise transmission control over multiple queues, are required to enable advanced protocols and network architectures.

To meet the need for an open development platform for new networking protocols and architectures at practical line speeds, this project provides an open-source FPGA-based high-performance NIC prototyping platform. This 10G network card platform runs at 10 Gbps and, together with its driver, can be used with the full host network protocol stack. The design is portable and compact, supporting many different devices while leaving enough resources for further customization even on smaller parts. The modular design and scalability of this 10G NIC allow co-optimized hardware/software solutions to be developed and tested for advanced network applications in real-world environments.

FPGA-based NICs combine the capabilities of ASIC-based and software NICs: they can run at wire speed and provide low latency and precise timing, while the development cycle for new features is relatively short. High-performance proprietary FPGA-based NICs have also been developed; for example, Alibaba built a fully custom FPGA-based RDMA-only NIC to run a hardware implementation of its High Precision Congestion Control protocol (HPCC). Commercial products also exist, including those offered by Exablaze and Netcope. Unfortunately, similar to ASIC-based NICs, commercially available FPGA-based NICs tend to provide basic "black box" functionality that cannot be modified. The closed nature of these basic NIC functions severely limits their utility and flexibility when developing new network applications.

Commercially available high-performance DMA components, such as the Xilinx XDMA and QDMA cores and the Atomic Rules Arkville DPDK acceleration core, do not provide fully configurable hardware for controlling the transmit data flow. The Xilinx XDMA core is designed for compute-offload applications and therefore offers very limited queuing capability and no easy way to control transfer scheduling. The Xilinx QDMA core and the Atomic Rules Arkville DPDK acceleration core are geared toward network applications, supporting a modest number of queues and providing DPDK drivers; however, the number of supported queues is small (2K queues for the QDMA core versus 128 for the Arkville core), and neither core provides an easy way to precisely control packet transmission. FPGA-based packet-processing solutions include Catapult for network application offload and FlowBlaze for reconfigurable match engines on FPGAs. However, these platforms leave standard NIC functions to separate ASIC-based NICs and operate entirely as a "bump-in-the-wire", providing no explicit control over the NIC scheduler or queues. Other projects are implemented in software or only partially implemented in hardware. Shoal describes a network architecture that uses custom NICs and fast Layer 1 electrical crosspoint switches to perform small-scale routing; Shoal was built in hardware but only evaluated with synthetic traffic and no host connection. SENIC describes scalable NIC-based rate limiting; the hardware implementation of the scheduler was evaluated separately, while the system-level evaluation was performed in software with a custom queuing discipline (qdisc) module. PIEO describes a flexible NIC scheduler that was evaluated separately in hardware. NDP is a pull-mode transport protocol for data center applications, evaluated with DPDK software NICs and FPGA-based switches. Loom describes an efficient NIC design evaluated in software with BESS.

The development of this 10G network card differs from all of these projects: its Verilog source code is fully visible, and it runs with standard host network protocol stacks at actual wire speed. It provides thousands of transmit queues together with a scalable transmit scheduler for fine-grained control over flows. The result is a powerful and flexible open-source platform for developing network applications that combine hardware and software capabilities.

This design is a 10G network card. It targets the Xilinx Virtex-7 xc7vx690t and uses the Virtex-7 FPGA Gen3 Integrated Block for PCI Express IP core to implement data exchange between the FPGA NIC and the host computer; the PCIe side is paired with a DMA controller written in pure Verilog that moves the PCIe data. The 10G Ethernet PCS/PMA (10GBASE-R/KR) core, running on the GTH transceivers, implements data exchange between the FPGA NIC and other NICs; the transceiver side is configured to present an XGMII interface, and an XGMII transmit/receive interface written in pure Verilog connects it to the UDP protocol stack. The UDP protocol stack is likewise written in pure Verilog and is equipped with an XGMII data interface. The FPGA NIC connects to other NICs or switches through optical fiber on its SFP ports, and to the host computer through PCIe, realizing the functions of a high-speed network card. The overall architecture is as follows:
(figure: overall architecture of the FPGA NIC)
UDP data is sent and received through the SFP optical ports. The interface between the UDP protocol stack and the MAC is XGMII, running at the 10G line rate. The user side of the UDP protocol stack also exposes an XGMII-style interface, so users do not need to concern themselves with the complexities of the UDP protocol and only need to follow simple user-interface timing to send and receive UDP data.

This design has proven stable and reliable through extensive repeated testing and can be ported directly into real projects. The engineering code can be synthesized, compiled, and debugged on the board as-is. It is suitable for project development by undergraduate and graduate students as well as working engineers, and can be applied to digital communication in the medical, military, and other industries;
complete, working engineering source code and technical support are provided;
the method for obtaining the engineering source code and technical support is at the end of the article; please read to the end.

2. The UDP solutions I already have

At present, I have a large number of UDP project source codes, including UDP data loopback, video transmission, AD acquisition and transmission, and so on, as well as TCP protocol projects. Readers who need network communication designs can take a look: click to go directly.

3. Introduction to the basic performance of the 10G network card

This 10G network card is based on a platform for Xilinx high-end FPGAs that targets network interface development at rates up to 100 Gbps and beyond. The platform includes several core functions for real-time operation at high line rates: a high-performance datapath, 10G/25G/100G Ethernet MACs, PCI Express Gen 3, a custom PCIe DMA engine, and native high-precision IEEE 1588 PTP timestamping. A key feature is scalable queue management, which can support over 10,000 queues together with a scalable transmit scheduler, allowing fine-grained hardware control over packet transmission. Combined with multiple network interfaces, multiple ports per interface, and per-port event-driven transmit scheduling, these capabilities enable the development of advanced network interfaces, architectures, and protocols. The software interface to these hardware functions is a high-performance driver for the Linux network protocol stack. The platform also supports scatter/gather DMA, checksum offload, receive flow hashing, and receive-side scaling. Development and debugging are facilitated by a comprehensive, open-source, Python-based simulation framework that covers the entire system, from the driver and a simulation model of the PCI Express interface to the Ethernet interfaces. The power and flexibility of the platform has been demonstrated by implementing a microsecond-precision time-division multiple access (TDMA) hardware scheduler that performs TDMA scheduling at 100 Gbps line rate with no CPU overhead.
This 10G network card has several distinctive architectural features. The hardware queue state is stored efficiently in FPGA block RAM, enabling thousands of individually controllable queues. Queues are associated with interfaces, and each interface can have multiple ports, each with its own independent transmit scheduler. This allows extremely fine-grained control over packet transmission. The scheduler module is designed to be modified or swapped out, so different transmission scheduling schemes, including experimental schedulers, can be implemented. Coupled with PTP time synchronization, this enables time-based scheduling, including high-precision TDMA.
The design of this 10G network card is modular and highly parameterized. Many configuration and structural options can be set via Verilog parameters at synthesis time, including interface and port counts, queue counts, memory size, scheduler type, and more. These design parameters are exposed in configuration registers that the driver reads to determine the NIC configuration, enabling the same driver to support many different boards and configurations without modification.
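
As a rough illustration of this approach (the module name, parameter names, values, and register offsets below are hypothetical, not the project's actual ones), a parameterized module can expose its compile-time configuration through read-only registers like this:

```verilog
// Hypothetical sketch: synthesis-time parameters exposed through read-only
// configuration registers. Names, values, and the register map are
// illustrative only, not the actual project code.
module nic_config_regs #(
    parameter IF_COUNT        = 1,      // OS-level network interfaces
    parameter PORTS_PER_IF    = 1,      // ports per interface
    parameter TX_QUEUE_COUNT  = 8192,   // transmit queues
    parameter RX_QUEUE_COUNT  = 8192,   // receive queues
    parameter AXIS_DATA_WIDTH = 64      // 64-bit datapath for 10G
)(
    input  wire        clk,
    input  wire        rst,
    input  wire [7:0]  reg_addr,   // register offset read by the driver
    output reg  [31:0] reg_rdata   // read data returned to the driver
);
    always @(posedge clk) begin
        if (rst) begin
            reg_rdata <= 32'd0;
        end else begin
            case (reg_addr)
                8'h00:   reg_rdata <= IF_COUNT;
                8'h04:   reg_rdata <= PORTS_PER_IF;
                8'h08:   reg_rdata <= TX_QUEUE_COUNT;
                8'h0C:   reg_rdata <= RX_QUEUE_COUNT;
                8'h10:   reg_rdata <= AXIS_DATA_WIDTH;
                default: reg_rdata <= 32'd0;
            endcase
        end
    end
endmodule
```

The driver would read such registers once at probe time, which is how a single driver binary can serve many different parameterizations of the same design.
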
The current design supports a PCIe DMA component interfaced to the Xilinx UltraScale PCIe hard IP core. Support for the PCIe TLP interfaces commonly used on other FPGAs has not yet been implemented and is left as future work; adding it would allow the design to operate on a wider range of FPGAs.

4. Detailed design scheme

The detailed design scheme is as follows:
(figure: detailed design block diagram)
From a high-level view, the NIC consists of three main nested modules. The top-level module mainly contains support and interface components, including the PCI Express hard IP core and DMA interface, the PTP hardware clock, and the Ethernet interface components (MAC, PHY, and associated serializers). The top-level module also includes one or more interface module instances. Each interface module corresponds to an operating-system-level network interface (e.g., eth0) and contains the queue management logic as well as the descriptor and completion handling logic. The queue management logic maintains the state of all NIC queues: transmit, transmit completion, receive, receive completion, and event queues. Each interface module also contains one or more port module instances. Each port module provides an AXI stream interface to the MAC and contains a transmit scheduler, transmit and receive engines, transmit and receive datapaths, and a scratchpad RAM for temporary storage of incoming and outgoing packets during DMA operations.
For each port, the transmit scheduler in the port module decides which queues are designated for transmission. The transmit scheduler generates commands for the transmit engine, which coordinates operations on the transmit datapath. The scheduler module is a flexible functional block that can be modified or replaced to support arbitrary schedules, which may be event-driven. The default implementation is a simple round-robin scheduler. All ports associated with the same interface module share the same set of transmit queues and appear to the operating system as a single unified interface. This enables flows to be migrated between ports or load-balanced across multiple ports by changing only the transmit scheduler settings, without affecting the rest of the network stack. This dynamic, scheduler-defined queue-to-port mapping is a unique feature of this NIC and enables research into new protocols and network architectures, including parallel networks such as P-FatTree and optically switched networks such as RotorNet and Opera.

Interface Overview

The modules in the block diagram are explained as follows:
PCIe HIP: PCIe hard IP core; AXIL M: AXI-Lite master; DMA IF: DMA interface; PTP HC: PTP hardware clock; TXQ: transmit queue manager; TXCQ: transmit completion queue manager; RXQ: receive queue manager; RXCQ: receive completion queue manager; EQ: event queue manager; MAC + PHY: Ethernet media access controller (MAC) and physical interface layer (PHY).

PCIe HIP

The PCIe hard IP core used here is the Virtex-7 FPGA Gen3 Integrated Block for PCI Express, which implements data exchange between the FPGA NIC and the host computer. The location and configuration of this IP in the code are as follows:
(figures: location and configuration of the PCIe IP core in the project)
My FPGA board has ample performance, so PCIe 3.0 x8 mode is used directly, with each lane running at 8 GT/s;

DMA IF

DMA interface: a DMA controller written in pure Verilog that handles moving PCIe data. The location of this module in the code is as follows:
(figure: location of the DMA IF module in the code)
The DMA IF consists mainly of read and write modules and completes data transfers in a straightforward way;

AXI bus interface

AXIL M: the AXI-Lite bus master; the PCIe IP exposes an AXI-Lite configuration interface, so a master is required to drive that bus. AXI M: the AXI-Full bus master; the PCIe IP exposes an AXI user interface, so a master is likewise required to drive that bus.

Clock Synchronization

PHC: PTP hardware clock; it performs clock synchronization, mainly reconciling the asynchronous transmit and receive clocks. The code location is as follows:
(figure: location of the PTP hardware clock in the code)

TXQ and RXQ queues

TXQ: transmit queue manager, which manages the data transmit queues;
RXQ: receive queue manager, which manages the data receive queues;
the code location is as follows:
(figure: location of the TXQ and RXQ modules in the code)

TXCQ and RXCQ completion queues

TXCQ: transmit completion queue manager;
RXCQ: receive completion queue manager;
the code location is as follows:

EQ

EQ: event queue manager;

MAC + PHY

MAC + PHY: Ethernet media access controller (MAC) and physical interface layer (PHY). The code location is as follows:
(figure: MAC + PHY code location)
In the receive direction, incoming packets pass through a flow hash module that determines the target receive queue and generates commands for the receive engine, which coordinates operations on the receive datapath. Since all ports in the same interface module share the same set of receive queues, incoming flows on different ports are merged into the same set of queues. Custom modules can also be added to the NIC to preprocess and filter incoming packets before they are passed over the PCIe bus.

The components of the NIC are interconnected through several different interfaces, including AXI-Lite, AXI stream, and a custom segmented memory interface for DMA operations, which is discussed later. AXI-Lite is used for the control path from the driver to the NIC; it is used to initialize and configure the NIC components and to control the queue pointers during transmit and receive operations. AXI stream interfaces are used to transfer packetized data within the NIC, including PCIe Transaction Layer Packets (TLPs) and Ethernet frames. The segmented memory interface connects the PCIe DMA interface to the NIC datapath and to the descriptor and completion handling logic. Most of the NIC logic runs in the PCIe user clock domain, which is nominally 250 MHz for all current design variants. Asynchronous FIFOs are used to interface with the MACs, which run in the serializer transmit and receive clock domains (156.25 MHz for 10G, 390.625 MHz for 25G, and 322.266 MHz for 100G). The following sections describe several key functional blocks of the NIC.

Pipeline Queue Management

Packet data communication between the 10G NIC and its driver is mediated through descriptor and completion queues. Descriptor queues form the host-to-NIC communication channel, carrying information about where individual packets are stored in system memory. Completion queues form the NIC-to-host communication channel, carrying information about completed operations and associated metadata. The descriptor and completion queues are implemented as ring buffers residing in DMA-accessible system memory, while the NIC hardware maintains the necessary queue state information. This state includes the DMA address of the ring buffer, the size of the ring buffer, the producer and consumer pointers, and a reference to the associated completion queue. The required state per queue fits in 128 bits.

The queue management logic of this 10G NIC must be able to store and manage the state of thousands of queues efficiently. This means the queue state must be stored entirely in the FPGA's block RAM (BRAM) or ultra RAM (URAM). Since 128 bits of state are required per queue and a URAM block is 72 bits wide by 4096 entries deep, only two URAM instances are needed to store the state for 4096 queues. Using URAM instances, the queue management logic can be scaled to handle at least 32,768 queues per interface.
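
As a minimal sketch of this storage arrangement (module and signal names here are made up, and the real queue_manager stores more fields and adds pipeline registers), the per-queue state memory can be described as follows:

```verilog
// Hypothetical sketch of the per-queue state storage (not the actual
// queue_manager code): 128 bits of state for 4096 queues, mapped to UltraRAM.
module queue_state_ram #(
    parameter QUEUE_INDEX_WIDTH = 12,   // 4096 queues
    parameter STATE_WIDTH       = 128   // per-queue state (pointers, size, etc.)
)(
    input  wire                         clk,
    // write port, used by the pipeline's update stage
    input  wire                         wr_en,
    input  wire [QUEUE_INDEX_WIDTH-1:0] wr_addr,
    input  wire [STATE_WIDTH-1:0]       wr_data,
    // read port, used by the pipeline's request stage
    input  wire                         rd_en,
    input  wire [QUEUE_INDEX_WIDTH-1:0] rd_addr,
    output reg  [STATE_WIDTH-1:0]       rd_data
);
    // 128 x 4096 fits in two 72 x 4096 UltraRAM blocks
    (* ram_style = "ultra" *)
    reg [STATE_WIDTH-1:0] mem [0:(2**QUEUE_INDEX_WIDTH)-1];

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        if (rd_en)
            rd_data <= mem[rd_addr];  // registered read: latency the pipeline must hide
    end
endmodule
```

The registered read gives this memory at least one cycle of latency (more once extra output registers are added for timing), which is why the queue manager described below is pipelined.
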

To support high throughput, this 10G NIC must be able to process multiple descriptors in parallel. Therefore, the queue management logic must keep track of multiple in-flight operations and report updated queue pointers to the driver as operations complete. The state required to track in-flight operations is much smaller than the descriptor state, so it can be stored in flip-flops and distributed RAM.

This 10G NIC design uses two queue manager modules: queue_manager manages the host-to-NIC descriptor queues, while cpl_queue_manager manages the NIC-to-host completion queues. Apart from some minor differences in pointer handling, fill handling, and doorbell/event generation, the modules are similar, so this section only discusses the operation of the queue_manager module.

BRAM or URAM arrays used to store queue state information require several latency cycles for each read operation, so queue_manager is built using a pipelined structure to facilitate multiple concurrent operations. The pipeline supports four different operations: register read, register write, dequeue/enqueue request, and dequeue/enqueue commit. Register access operations through the AXI lite interface allow drivers to initialize queue state and provide pointers to allocated host memory, as well as access to producer and consumer pointers during normal operation.

The positions of the queue_manager module and the cpl_queue_manager module in the code are as follows:
(figure: location of queue_manager and cpl_queue_manager in the code)
In fact, to improve the utilization of data bandwidth inside the NIC, almost all modules use pipelined processing to achieve high concurrency. This section uses the queue management module to introduce the pipeline design approach based on an operation table and operation pointers.

The queue management logic of this 10G NIC must efficiently store and manage the state of thousands of queues, and to support high throughput it must process multiple descriptors in parallel. It therefore has to keep track of multiple operations in flight and report updated queue pointers to the driver as operations complete. Each operation table entry holds an active flag, a commit flag, the queue number, and a shadow pointer; the operation pointers consist of an operation table start pointer and an operation table commit pointer. By indexing different fields of the operation table with these pointers, the logic can track how far each in-flight operation has progressed and trigger the corresponding pipeline actions. In more detail, when the queue manager receives a dequeue request, it puts the command into the pipeline and at the same time kicks off a read of that queue's state; by the time the command reaches the processing stage, the queue information has been fetched and can be acted upon. If the dequeue is allowed, the necessary information is recorded in the operation table. The processing logic only needs to keep writing new entries at the start pointer and advancing it; when an operation is committed, the commit logic is triggered, which releases completed entries from the commit-pointer end of the table in order. Note that the operation table only tracks operations currently in flight, so it does not need to be very large. Together with the queue information RAM it forms a doubly linked structure: the queue information stores the index of the most recent operation table entry serving that queue, so that the correct shadow pointer can be maintained. Its principle block diagram is as follows:
(figure: operation table and pipeline principle block diagram)
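
The sketch below illustrates the operation table idea with its start and commit pointers; the module and signal names are hypothetical, and the real queue_manager additionally handles full-table backpressure and the doubly linked connection back to the queue state RAM described above:

```verilog
// Hypothetical sketch of the in-flight operation table (not the actual
// queue_manager code). Entries are allocated at the start pointer when a
// dequeue is accepted, marked committed when the engine commits, and released
// in order from the commit pointer. Full-table backpressure is omitted.
module op_table_sketch #(
    parameter OP_TABLE_SIZE     = 16,
    parameter QUEUE_INDEX_WIDTH = 12,
    parameter QUEUE_PTR_WIDTH   = 16
)(
    input  wire                             clk,
    input  wire                             rst,
    // allocate an entry when a dequeue request is accepted
    input  wire                             start_en,
    input  wire [QUEUE_INDEX_WIDTH-1:0]     start_queue,
    input  wire [QUEUE_PTR_WIDTH-1:0]       start_ptr_in,   // shadow consumer pointer
    output wire [$clog2(OP_TABLE_SIZE)-1:0] start_tag,      // tag handed to the engine
    // mark an entry committed when the engine commits the dequeue
    input  wire                             commit_en,
    input  wire [$clog2(OP_TABLE_SIZE)-1:0] commit_tag,
    // oldest entry is released once it has been committed
    output wire                             release_valid,
    output wire [QUEUE_INDEX_WIDTH-1:0]     release_queue,
    output wire [QUEUE_PTR_WIDTH-1:0]       release_ptr
);
    localparam TAG_WIDTH = $clog2(OP_TABLE_SIZE);

    reg [OP_TABLE_SIZE-1:0]     active_flags;
    reg [OP_TABLE_SIZE-1:0]     commit_flags;
    reg [QUEUE_INDEX_WIDTH-1:0] queue_mem [OP_TABLE_SIZE-1:0];
    reg [QUEUE_PTR_WIDTH-1:0]   ptr_mem   [OP_TABLE_SIZE-1:0];

    reg [TAG_WIDTH-1:0] op_tbl_start_ptr;   // next entry to allocate (new operations)
    reg [TAG_WIDTH-1:0] op_tbl_commit_ptr;  // oldest in-flight entry (release side)

    assign start_tag     = op_tbl_start_ptr;
    assign release_valid = active_flags[op_tbl_commit_ptr] && commit_flags[op_tbl_commit_ptr];
    assign release_queue = queue_mem[op_tbl_commit_ptr];
    assign release_ptr   = ptr_mem[op_tbl_commit_ptr];

    always @(posedge clk) begin
        if (rst) begin
            active_flags      <= {OP_TABLE_SIZE{1'b0}};
            commit_flags      <= {OP_TABLE_SIZE{1'b0}};
            op_tbl_start_ptr  <= {TAG_WIDTH{1'b0}};
            op_tbl_commit_ptr <= {TAG_WIDTH{1'b0}};
        end else begin
            if (start_en) begin
                active_flags[op_tbl_start_ptr] <= 1'b1;
                commit_flags[op_tbl_start_ptr] <= 1'b0;
                queue_mem[op_tbl_start_ptr]    <= start_queue;
                ptr_mem[op_tbl_start_ptr]      <= start_ptr_in;
                op_tbl_start_ptr               <= op_tbl_start_ptr + 1;
            end
            if (commit_en) begin
                commit_flags[commit_tag] <= 1'b1;
            end
            if (release_valid) begin
                active_flags[op_tbl_commit_ptr] <= 1'b0;
                op_tbl_commit_ptr               <= op_tbl_commit_ptr + 1;
            end
        end
    end
endmodule
```
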

Transmit Scheduler

The default transmit scheduler used in this 10G NIC is a simple round-robin scheduler implemented in the tx_scheduler_rr module. The scheduler sends commands to the transmit engine to start transmit operations from the NIC's transmit queues. The round-robin scheduler holds basic queue state for all queues, a FIFO that stores the currently active queues for round-robin selection, and an operation table that tracks transmit operations in progress. Like the queue management logic, the round-robin transmit scheduler stores queue state in BRAM or URAM on the FPGA so that it can scale to a large number of queues, and it uses a processing pipeline to hide the memory access latency. The location of the tx_scheduler_rr module in the code is as follows:
(figure: location of tx_scheduler_rr in the code)
The transmit scheduler module has four main interfaces: an AXI-Lite register interface and three streaming interfaces. The AXI-Lite interface allows the driver to change scheduler parameters and enable or disable queues. The first streaming interface delivers doorbell events from the queue management logic when the driver queues packets for transmission. The second streaming interface carries the transmit commands generated by the scheduler to the transmit engine; each command contains the index of the queue to transmit from and a tag for tracking the operation in flight. The final streaming interface returns transmit operation status to the scheduler, reporting the length of the transmitted packet or indicating that the transmit operation failed because the queue was empty or disabled.
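
As a bare-bones illustration of this structure (hypothetical names; the real tx_scheduler_rr also keeps per-queue enable/active state in RAM, re-queues based on the status feedback, and pipelines its memory accesses), a minimal round-robin command generator might look like this:

```verilog
// Bare-bones round-robin sketch (not the actual tx_scheduler_rr): doorbells
// push queue indices into a FIFO, and the head of the FIFO is issued to the
// transmit engine as the next transmit command. Per-queue state, re-queuing
// on status feedback, and FIFO overflow handling are omitted.
module rr_scheduler_sketch #(
    parameter QUEUE_INDEX_WIDTH = 12,
    parameter FIFO_ADDR_WIDTH   = 12
)(
    input  wire                         clk,
    input  wire                         rst,
    // doorbell from the queue manager: this queue has packets to send
    input  wire                         doorbell_valid,
    input  wire [QUEUE_INDEX_WIDTH-1:0] doorbell_queue,
    // transmit command to the transmit engine
    output reg                          cmd_valid,
    output reg  [QUEUE_INDEX_WIDTH-1:0] cmd_queue,
    input  wire                         cmd_ready
);
    // circular buffer of active queue indices
    reg [QUEUE_INDEX_WIDTH-1:0] fifo_mem [0:(2**FIFO_ADDR_WIDTH)-1];
    reg [FIFO_ADDR_WIDTH:0]     wr_ptr;
    reg [FIFO_ADDR_WIDTH:0]     rd_ptr;
    wire fifo_empty = (wr_ptr == rd_ptr);

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr    <= 0;
            rd_ptr    <= 0;
            cmd_valid <= 1'b0;
        end else begin
            // enqueue the doorbell's queue index at the tail
            if (doorbell_valid) begin
                fifo_mem[wr_ptr[FIFO_ADDR_WIDTH-1:0]] <= doorbell_queue;
                wr_ptr <= wr_ptr + 1;
            end
            // retire the current command once the engine accepts it
            if (cmd_valid && cmd_ready) begin
                cmd_valid <= 1'b0;
            end
            // issue the queue index at the head as the next command
            if (!cmd_valid && !fifo_empty) begin
                cmd_queue <= fifo_mem[rd_ptr[FIFO_ADDR_WIDTH-1:0]];
                cmd_valid <= 1'b1;
                rd_ptr    <= rd_ptr + 1;
            end
        end
    end
endmodule
```
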

The transmit scheduler module can be extended or replaced to implement arbitrary scheduling algorithms. This allows the NIC to be used as a platform for evaluating experimental scheduling algorithms, including those proposed in SENIC, Carousel, PIEO, and Loom. It is also possible to provide other inputs to the transmit scheduler block, including feedback from the receive path, which can be used to implement new protocols and congestion control techniques such as NDP and HPCC. Connecting the scheduler to the PTP hardware clock can be used to support TDMA, which in turn can be used to implement RotorNet, Opera, and other circuit-switched architectures.

Ports and Interfaces

A unique architectural feature of this 10G NIC is the separation between ports and network interfaces, so multiple ports can be associated with the same interface. Most current NICs support one port per interface, as shown in Figure a below:
(figure: (a) one port per interface; (b) multiple ports per interface)
When the network protocol stack queues a packet for transmission on a network interface, the packet is injected into the network through the network port associated with that interface. However, in Corundum, each interface can be associated with multiple ports, so the hardware can decide, at dequeue time, through which port to inject the packet into the network, as shown in Figure b above.

All ports associated with the same network interface module share the same set of transmit queues and appear to the operating system as a single unified interface. This way flows can be migrated between ports or load balanced across multiple ports by changing only the transmit scheduler settings without affecting the rest of the network stack. Dynamic, scheduler-defined queue-to-port mappings enable the study of new protocols and network architectures, including parallel networks such as P-FatTree and optically switched networks such as RotorNet and Opera.

Data Path and Transmit and Receive Engines

This 10G network card uses both memory-mapped and streaming interfaces in the datapath. AXI stream is used to move Ethernet packet data between the port DMA modules, the Ethernet MACs, and the checksum and hash calculation modules. AXI stream is also used to connect the PCIe hard IP core to the PCIe AXI-Lite master block and to the PCIe DMA interface block. A custom segmented memory interface connects the PCIe DMA interface block, the port DMA blocks, and the descriptor and completion handling logic to the internal scratchpad RAM.

The width of each AXI stream interface is determined by the required bandwidth. With the exception of the Ethernet MACs, the core datapath logic runs entirely in the 250 MHz PCIe user clock domain. Therefore, the AXI stream interface to the PCIe hard IP core must match the hard core's interface width: 256 bits for PCIe Gen 3 x8 and 512 bits for PCIe Gen 3 x16. On the Ethernet side, the interface width matches the MAC interface width, unless the 250 MHz clock is too slow to provide enough bandwidth at that width. For 10G Ethernet, the MAC interface is 64 bits at 156.25 MHz, which can be connected to the 250 MHz clock domain at the same width. For 25G Ethernet, the MAC interface is 64 bits at 390.625 MHz, so it must be converted to 128 bits to provide enough bandwidth at 250 MHz. For 100G Ethernet, the 100G variant of this design uses the Xilinx 100G hard CMAC core on UltraScale+ FPGAs; the MAC interface is 512 bits at 322.266 MHz, and it connects to the 250 MHz clock domain at the same 512-bit width, since only about 195 MHz is needed to sustain 100 Gbps at that width.
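
The crossings between the MAC clock domains and the 250 MHz core clock are handled by asynchronous FIFOs. As a generic illustration only (the project's actual CDC components are AXI-stream asynchronous FIFOs; the module below is a simplified, hypothetical dual-clock FIFO with Gray-coded pointers):

```verilog
// Generic dual-clock FIFO sketch for a MAC <-> core clock-domain crossing.
// Illustrative only; names are made up and not taken from the project.
module async_fifo_sketch #(
    parameter DATA_WIDTH = 64,   // e.g. one 64-bit MAC word
    parameter ADDR_WIDTH = 5     // 32 entries
)(
    // write side (e.g. MAC receive clock domain)
    input  wire                  wr_clk,
    input  wire                  wr_rst,
    input  wire                  wr_en,
    input  wire [DATA_WIDTH-1:0] wr_data,
    output wire                  full,
    // read side (e.g. 250 MHz core clock domain)
    input  wire                  rd_clk,
    input  wire                  rd_rst,
    input  wire                  rd_en,
    output reg  [DATA_WIDTH-1:0] rd_data,
    output wire                  empty
);
    reg [DATA_WIDTH-1:0] mem [0:(2**ADDR_WIDTH)-1];

    reg [ADDR_WIDTH:0] wr_ptr_bin, wr_ptr_gray;
    reg [ADDR_WIDTH:0] rd_ptr_bin, rd_ptr_gray;
    reg [ADDR_WIDTH:0] rd_ptr_gray_sync1, rd_ptr_gray_sync2;  // into wr_clk domain
    reg [ADDR_WIDTH:0] wr_ptr_gray_sync1, wr_ptr_gray_sync2;  // into rd_clk domain

    // write side: binary and Gray-coded pointers
    wire [ADDR_WIDTH:0] wr_ptr_bin_next  = wr_ptr_bin + (wr_en && !full);
    wire [ADDR_WIDTH:0] wr_ptr_gray_next = (wr_ptr_bin_next >> 1) ^ wr_ptr_bin_next;
    assign full = (wr_ptr_gray == {~rd_ptr_gray_sync2[ADDR_WIDTH:ADDR_WIDTH-1],
                                    rd_ptr_gray_sync2[ADDR_WIDTH-2:0]});

    always @(posedge wr_clk) begin
        if (wr_rst) begin
            wr_ptr_bin        <= 0;
            wr_ptr_gray       <= 0;
            rd_ptr_gray_sync1 <= 0;
            rd_ptr_gray_sync2 <= 0;
        end else begin
            if (wr_en && !full)
                mem[wr_ptr_bin[ADDR_WIDTH-1:0]] <= wr_data;
            wr_ptr_bin  <= wr_ptr_bin_next;
            wr_ptr_gray <= wr_ptr_gray_next;
            // two-flop synchronizer for the read pointer
            rd_ptr_gray_sync1 <= rd_ptr_gray;
            rd_ptr_gray_sync2 <= rd_ptr_gray_sync1;
        end
    end

    // read side: binary and Gray-coded pointers
    wire [ADDR_WIDTH:0] rd_ptr_bin_next  = rd_ptr_bin + (rd_en && !empty);
    wire [ADDR_WIDTH:0] rd_ptr_gray_next = (rd_ptr_bin_next >> 1) ^ rd_ptr_bin_next;
    assign empty = (rd_ptr_gray == wr_ptr_gray_sync2);

    always @(posedge rd_clk) begin
        if (rd_rst) begin
            rd_ptr_bin        <= 0;
            rd_ptr_gray       <= 0;
            wr_ptr_gray_sync1 <= 0;
            wr_ptr_gray_sync2 <= 0;
        end else begin
            if (rd_en && !empty)
                rd_data <= mem[rd_ptr_bin[ADDR_WIDTH-1:0]];
            rd_ptr_bin  <= rd_ptr_bin_next;
            rd_ptr_gray <= rd_ptr_gray_next;
            // two-flop synchronizer for the write pointer
            wr_ptr_gray_sync1 <= wr_ptr_gray;
            wr_ptr_gray_sync2 <= wr_ptr_gray_sync1;
        end
    end
endmodule
```

In the real datapath the payload would be a full AXI-stream word (tdata, tkeep, tlast, tuser) rather than raw data, and the depth would be sized to absorb the rate difference between the two clock domains.
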

The block diagram of the NIC data path of this 10G network card is shown in the figure below:
(figure: NIC datapath block diagram)
It is a simplified version of the detailed design scheme diagram in Chapter 4 above. The PCIe hard IP core (PCIe HIP) connects the NIC to the host. Two AXI stream interfaces connect the PCIe DMA interface block to the PCIe hard IP core: one carries read and write requests, the other carries read data. The PCIe DMA interface module is then connected to the descriptor fetch module, the completion write module, the port scratchpad RAM modules, and the RX and TX engines through a set of DMA interface multiplexers. In the direction toward the DMA interface, the multiplexers combine DMA transfer commands from multiple sources; in the opposite direction, they route transfer status responses. They also manage the segmented memory interface for reads and writes. The top-level multiplexer combines descriptor traffic with packet data traffic, giving descriptor traffic higher priority. Next, a pair of multiplexers combines traffic from multiple interface modules. Finally, an additional multiplexer within each interface module combines packet data traffic from multiple port instances.
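
To illustrate the "descriptor traffic wins" arbitration mentioned above (a deliberately simplified, hypothetical sketch; the real multiplexers arbitrate complete DMA commands and carry the segmented memory interface alongside the command streams):

```verilog
// Hypothetical fixed-priority grant sketch: descriptor traffic is always
// preferred over packet data traffic toward the shared DMA interface.
module dma_mux_prio_sketch (
    input  wire clk,
    input  wire rst,
    // descriptor traffic request/grant (high priority)
    input  wire desc_req,
    output reg  desc_grant,
    // packet data traffic request/grant (low priority)
    input  wire data_req,
    output reg  data_grant
);
    always @(posedge clk) begin
        if (rst) begin
            desc_grant <= 1'b0;
            data_grant <= 1'b0;
        end else begin
            desc_grant <= desc_req;               // descriptor traffic always wins
            data_grant <= data_req && !desc_req;  // packet data proceeds only otherwise
        end
    end
endmodule
```
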

The transmit and receive engines are responsible for coordinating the operations required to send and receive packets, and both can handle multiple in-flight packets for high throughput. As shown in the detailed design scheme diagram in Chapter 4, the transmit and receive engines connect to several blocks in the transmit and receive datapaths, including the port DMA blocks and the hash and checksum offload blocks, as well as the descriptor and completion handling logic, the timestamping interface, and the Ethernet MAC modules.

The transmit engine is responsible for coordinating the transmission of packets. It handles transmit requests for specific queues from the transmit scheduler. After low-level processing using the PCIe DMA engine, the packet passes through the transmit checksum block, the MAC, and the PHY. Once the packet is sent, the transmit engine receives the PTP timestamp from the MAC, builds a completion record, and passes it to the completion write module.

Similarly, the receive engine is responsible for coordinating the reception of packets. Incoming packets pass through the PHY and MAC. After low-level processing that includes hashing and timestamping, the receive engine issues one or more write requests to the PCIe DMA engine to write the packet data out to host memory. After the write operation completes, the receive engine builds a completion record and passes it to the completion write module.

The descriptor read and completion write modules operate similarly to the transmit and receive engines. They handle descriptor/completion read/write requests from the transmit and receive engines, issue enqueue/dequeue requests to the queue managers to obtain queue element addresses in host memory, and then issue requests to the PCIe DMA interface to transfer the data. The completion write module is also responsible for handling events by enqueuing transmit and receive completions on the appropriate event queues and writing out event records.

Segmented Memory Interface

For high-performance DMA over PCIe, Corundum uses a custom segmented memory interface. The interface is divided into segments of up to 128 bits each, and its overall width is twice that of the AXI stream interface of the PCIe hard IP core. For example, a design using PCIe Gen 3 x16 with a 512-bit AXI stream interface in the PCIe hard core uses a 1024-bit segmented interface split into eight 128-bit segments. Compared with a single AXI interface, this interface provides better "impedance matching": it eliminates alignment and arbitration overhead in the DMA engine and interconnect logic, which removes backpressure and improves PCIe link utilization. Specifically, the interface guarantees that the DMA interface can perform a full-width, unaligned read or write every clock cycle. In addition, contention between the read and write paths is eliminated by using simple dual-port RAMs dedicated to traffic moving in a single direction.

Each segment operates similarly to AXI-Lite, except that it uses three channels instead of five: one channel provides the write address and data, one channel provides the read address, and one channel returns the read data. Unlike AXI, bursting and reordering are not supported, which simplifies the interface logic. An interconnect component (the multiplexer) is responsible for maintaining the order of operations, even when accessing multiple RAMs. The segments operate completely independently of each other, with separate flow control connections and separate instances of the interconnect sequencing logic. Additionally, operations are routed based on individual select signals rather than address decoding. This eliminates the need to assign addresses and allows the use of parameterizable interconnect components that route operations appropriately with minimal configuration.
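
A hypothetical sketch of a single segment is shown below: three channels (write address plus data, read address, read data) with valid/ready handshakes, no bursts, and no reordering. Signal names and widths are illustrative, not the actual interface definition:

```verilog
// Hypothetical sketch of one segment of the segmented memory interface,
// backing a simple RAM. Names and widths are illustrative only.
module seg_ram_sketch #(
    parameter SEG_ADDR_WIDTH = 12,
    parameter SEG_DATA_WIDTH = 128,
    parameter SEG_BE_WIDTH   = SEG_DATA_WIDTH/8
)(
    input  wire                      clk,
    input  wire                      rst,
    // channel 1: write address + data
    input  wire [SEG_ADDR_WIDTH-1:0] wr_cmd_addr,
    input  wire [SEG_DATA_WIDTH-1:0] wr_cmd_data,
    input  wire [SEG_BE_WIDTH-1:0]   wr_cmd_be,
    input  wire                      wr_cmd_valid,
    output wire                      wr_cmd_ready,
    // channel 2: read address
    input  wire [SEG_ADDR_WIDTH-1:0] rd_cmd_addr,
    input  wire                      rd_cmd_valid,
    output wire                      rd_cmd_ready,
    // channel 3: read data
    output wire [SEG_DATA_WIDTH-1:0] rd_resp_data,
    output wire                      rd_resp_valid,
    input  wire                      rd_resp_ready
);
    reg [SEG_DATA_WIDTH-1:0] mem [0:(2**SEG_ADDR_WIDTH)-1];

    // writes are always accepted (simple dual-port RAM, one direction per port)
    assign wr_cmd_ready = 1'b1;

    integer i;
    always @(posedge clk) begin
        if (wr_cmd_valid) begin
            for (i = 0; i < SEG_BE_WIDTH; i = i + 1) begin
                if (wr_cmd_be[i])
                    mem[wr_cmd_addr][8*i +: 8] <= wr_cmd_data[8*i +: 8];
            end
        end
    end

    // one-deep read response register with valid/ready flow control
    reg [SEG_DATA_WIDTH-1:0] rd_data_reg;
    reg                      rd_valid_reg;

    assign rd_cmd_ready  = !rd_valid_reg || rd_resp_ready;
    assign rd_resp_data  = rd_data_reg;
    assign rd_resp_valid = rd_valid_reg;

    always @(posedge clk) begin
        if (rst) begin
            rd_valid_reg <= 1'b0;
        end else if (rd_cmd_valid && rd_cmd_ready) begin
            rd_data_reg  <= mem[rd_cmd_addr];
            rd_valid_reg <= 1'b1;
        end else if (rd_resp_ready) begin
            rd_valid_reg <= 1'b0;
        end
    end
endmodule
```
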

Byte addresses are mapped onto segment interface addresses: the lowest address bits identify the byte lane within a segment, the next bits select the segment, and the highest bits form the word address for that segment. For example, in a 1024-bit segmented interface divided into eight 128-bit segments, the lowest 4 address bits identify the byte lane within a segment, the next 3 bits identify the segment, and the remaining bits form the word address for that segment.
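
For the concrete example above (a 1024-bit interface split into eight 128-bit segments), the address split can be written out directly; the module below is just that arithmetic, with made-up names and an example address width:

```verilog
// Address split for a 1024-bit segmented interface: eight 128-bit segments.
// Bits [3:0] select the byte lane within a segment, bits [6:4] select the
// segment, and the remaining bits form the word address within that segment.
module seg_addr_decode #(
    parameter ADDR_WIDTH     = 20,   // example byte address width
    parameter SEG_COUNT      = 8,
    parameter SEG_DATA_WIDTH = 128,
    parameter LANE_BITS      = $clog2(SEG_DATA_WIDTH/8),  // 4
    parameter SEG_BITS       = $clog2(SEG_COUNT)          // 3
)(
    input  wire [ADDR_WIDTH-1:0]                    byte_addr,
    output wire [LANE_BITS-1:0]                     byte_lane,
    output wire [SEG_BITS-1:0]                      seg_sel,
    output wire [ADDR_WIDTH-LANE_BITS-SEG_BITS-1:0] seg_word_addr
);
    assign byte_lane     = byte_addr[LANE_BITS-1:0];
    assign seg_sel       = byte_addr[LANE_BITS +: SEG_BITS];
    assign seg_word_addr = byte_addr[ADDR_WIDTH-1 : LANE_BITS+SEG_BITS];
endmodule
```

For instance, byte address 0x1234 maps to byte lane 0x4 of segment 3 at word address 0x24 within that segment.
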

5. Detailed explanation of the Vivado project

Development board FPGA model: Xilinx Virtex-7 xc7vx690tffg1761-3;
Development environment: Vivado 2022.2;
Input/output: PCIe 3.0 / SFP optical ports;
Application: 10G network card.
The project code structure is as follows:
(figure: project code structure)
The IP cores used in this project include the Virtex-7 FPGA Gen3 Integrated Block for PCI Express and the 10G Ethernet PCS/PMA (10GBASE-R/KR), among others. The connections and roles of the IP cores within the logic design are as follows:
(figure: IP core connections in the project)
Four channels of 10G Ethernet PCS/PMA (10GBASE-R/KR) are instantiated in the project. These four channels work independently, are connected to four SFP optical ports, and each channel runs 10G data transmission and reception on its own;

The 10G Ethernet PCS/PMA (10GBASE-R/KR) core is configured in BASE-R mode with a 64-bit data width, as follows:
(figure: 10G Ethernet PCS/PMA configuration)
The Virtex-7 FPGA Gen3 Integrated Block for PCI Express is configured for the PCIe 3.0 standard at 8 GT/s in x8 mode, as follows:
(figure: PCIe IP core configuration)
After synthesis and implementation, the FPGA resource utilization and power consumption estimates are as follows:
(figure: resource utilization and power estimate)

6. Board debugging and verification

(figure: on-board test)
On-board debugging needs to be carried out under Linux. It has been tested so far, but the measured throughput is slightly below 10G, at around 9.6G. I will post the on-board test method later; there is no time right now to write up detailed test methods and steps, so stay tuned. Friends who are interested in the project source code can take the code and study it first; the code is fairly complex and may take half a month to fully understand.

7. Bonus: obtaining the engineering code

The code is too large to be sent by email, so it is delivered via a network disk link.
How to obtain it: send a private message, or use the business card at the end of the article.
The network disk information is as follows:
(figures: network disk contents)


Origin blog.csdn.net/qq_41667729/article/details/132006190