FPGA 10G Ethernet UDP communication: the 10G Ethernet Subsystem replaces the network PHY chip, with project source code and technical support provided

1. Introduction

At present, the FPGA UDP implementations available online fall roughly into the following categories:
1: UDP transceivers written in Verilog but without ping support. Such code works, but without ping it is of little practical use and rarely makes it into real projects. Imagine several machines interconnected: if something goes wrong and your network interface cannot answer a ping, you lack even the most basic troubleshooting tool. Who would dare to use such code?
2: UDP transceivers with ping support. This code is excellent and easy to use, but it is mostly not open source, so you will not get the source code. The drawback is that when a problem occurs, you have no way of troubleshooting it;
3: Implementations based on Xilinx's Tri Mode Ethernet MAC (tri-speed Ethernet) IP. These are also very good, but have the same problem: there is no source code, and the IP requires a license. The tri-speed IP handles the RGMII-to-GMII conversion and then the conversion to AXI-Stream;
4: Implementations that use the FPGA's GTX resources and an SFP optical port for UDP communication. This approach needs no external PHY chip, and it is the approach taken in this design;

This design uses a UDP protocol stack to implement the MAC-layer side of UDP communication, and calls Xilinx's official 10G Ethernet Subsystem IP core to take over the role of the PHY chip, so UDP communication works without any external network chip and 10G (10 Gigabit) Ethernet communication is easy to achieve. The top-level code defines a user button: while the button is not pressed, the project runs a UDP data loopback test; after the button is pressed, a network speed test can be run. The UDP protocol stack used in this routine is currently not open source and is only provided as an IP core, but this does not affect its use. The stack exposes a user interface, so users do not need to care about the details of the UDP protocol and only need to follow the simple user-interface timing to send and receive UDP data. The design loops UDP data back through a FIFO, and the network debugging assistant on the computer side is used to verify UDP sending and receiving;

This design connects to one SFP optical port and is configured as a UDP server. It has proven stable and reliable after a large number of repeated tests and can be ported directly into a project. The project code can be synthesized, implemented and debugged on the board as-is. It is suitable for project development by undergraduate and postgraduate students as well as working engineers, and can be applied to digital communication in the medical, military and other industries;
Complete, working project source code and technical support are provided;
How to obtain the project source code and technical support is described at the end of the article, so please read through to the end;

2. UDP solutions I already have

At present, I have the following UDP solutions and application examples:
My blog home page has an FPGA Ethernet communication column, which is free and contains many FPGA UDP applications, including both conventional Gigabit and 10 Gigabit Ethernet solutions. Readers who need network communication can go and have a look;

3. Detailed design plan

Traditional FPGA UDP solution

Before going into the design, let's look at what an FPGA needs in order to implement UDP communication. Roughly, the pieces are as follows:
insert image description here
1: User logic:
The actual data that the developer needs to send and receive. It can take various forms, such as a custom format or an AXIS data stream; the interface timing of the user logic must match the interface timing of the MAC layer;
2: MAC layer:
Mainly made up of protocol logic such as UDP, IP, ARP and ICMP, which packs and unpacks network data. It does what a socket does in software: a socket relies on the CPU to build network packets, whereas the MAC layer here builds them directly in hardware, freeing the CPU from packet processing. This idea is clearly reflected in today's popular RDMA. The MAC layer of this design uses Milink's UDP protocol stack; for this part, please refer to my previous article;
3: Network PHY:
Mainly made up of the PCS and PMA. The PCS mainly encodes and decodes the parallel data, for example the classic 8b/10b coding; the PMA mainly performs parallel-to-serial and serial-to-parallel conversion. Its output is a high-speed differential signal that can connect directly to an SFP or RJ45 network port;
4: RJ45 network port:
Commonly known as the crystal-head socket, where the network cable plugs in;
5: Remote node

This FPGA 10G UDP solution (great)

The FPGA development board can be thought of as a network card, and the remote node is another network card connected to it;
The difference between this design and the traditional FPGA UDP solution above lies in the PHY part. In the traditional solution the PHY is a real PHY chip, such as the RTL8211, B50610 or 88E1518 that I often use. This design uses no external PHY chip; instead it calls the Xilinx official 10G Ethernet Subsystem IP core to perform the PHY function and connects to the remote node through the SFP optical port. The design block diagram is as follows:
insert image description here
This design communicates with the network debugging assistant on the computer side to perform the UDP data loopback test. No external PHY chip is used; the Xilinx official 10G Ethernet Subsystem IP core takes its place, and the data goes out through the SFP optical port.

Top-level file udp_test.v:
This file instantiates udp_ip and sets the local IP address (the development board's IP address) to 192.168.10.1, the local MAC address to 000a3501fec0, the destination IP address to 192.168.10.2, the source port to 61441 and the destination port to 61441. The user interface is split into a receive side and a send side, both AXI4-Stream; the MAC-side interface is likewise AXI4-Stream and connects to the 10G MAC.
The IP addresses and port numbers can be changed freely. They are set in udp_test.v at the following location:
insert image description here
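To make the values above concrete, here is a small Python sketch that converts the addresses and port numbers into hexadecimal literals of the kind usually written into Verilog parameters. Whether udp_test.v takes its settings in exactly this form is an assumption, and the helper names are hypothetical, chosen only for this illustration.

```python
import ipaddress

def ip_to_verilog(ip: str) -> str:
    """Return a dotted IPv4 address as a 32-bit Verilog hex literal."""
    return "32'h{:08X}".format(int(ipaddress.IPv4Address(ip)))

def mac_to_verilog(mac: str) -> str:
    """Return a MAC address (12 hex digits, separators optional) as a 48-bit literal."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError("a MAC address has 12 hex digits")
    return "48'h{:012X}".format(int(digits, 16))

def port_to_verilog(port: int) -> str:
    """Return a UDP port number as a 16-bit Verilog hex literal."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("a UDP port is a 16-bit value")
    return "16'h{:04X}".format(port)

# Values quoted in the text; the literal format is an assumption.
print(ip_to_verilog("192.168.10.1"))     # board (local) IP
print(ip_to_verilog("192.168.10.2"))     # destination IP
print(mac_to_verilog("000a3501fec0"))    # local MAC
print(port_to_verilog(61441))            # source and destination port
```

If you change the board address, remember that the computer's 10G network card must then be given the matching destination IP address.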
UDP loopback module:
The UDP loopback module is in udp_read_write_ctrl.v. It has two modes. In loopback mode, the host computer sends data to the development board and the board sends the received data back. In speed test mode, the development board continuously sends packets to the 10G device. The button switches between the two modes.

MAC layer:
Implemented with a custom IP, namely the UDP protocol stack, which supports dynamic ARP, ping and other functions and handles UDP sending and receiving;

The focus here is Xilinx's official 10G Ethernet Subsystem IP core;
Note!!!
The 10G Ethernet Subsystem IP core is only available on K7 and higher-end FPGAs;

10G Ethernet block diagram

The figure below is the functional block diagram of the 10G MAC; see the PG157 document for details. The transmit and receive paths are both AXI-Stream interfaces with a data width of 64 bits. There is also an AXI4-Lite interface for configuring the MAC registers. Our main focus is the data interface, that is, the transmit and receive paths.
insert image description here

10G Ethernet sending analysis

insert image description here
As the table above also shows, the AXIS interface can be configured as 64-bit or 32-bit, selected when the IP core is configured. The clock frequency differs with the data width: according to the document, the 64-bit interface runs at 156.25 MHz and the 32-bit interface at 312.5 MHz. This experiment uses the 64-bit interface, whose bandwidth is 156.25 MHz × 64 bit = 10 Gbps.
Since 10GBASE-R uses 64b/66b encoding, the line rate of the transceiver is 10 Gbps × 66/64 = 10.3125 Gbps.
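The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation; the function names are made up for the illustration and do not come from the project.

```python
def axis_bandwidth_bps(clock_hz: float, width_bits: int) -> float:
    """Raw bandwidth of an AXI-Stream interface: one word per clock cycle."""
    return clock_hz * width_bits

def line_rate_bps(data_rate_bps: float) -> float:
    """64b/66b coding puts 66 bits on the line for every 64 data bits."""
    return data_rate_bps * 66 / 64

bw64 = axis_bandwidth_bps(156.25e6, 64)   # 64-bit interface
bw32 = axis_bandwidth_bps(312.5e6, 32)    # 32-bit interface
print(bw64 / 1e9, "Gbps on the 64-bit interface")
print(bw32 / 1e9, "Gbps on the 32-bit interface")
print(line_rate_bps(bw64) / 1e9, "Gbps on the transceiver")
```

Both interface widths carry the same 10 Gbps; only the clock frequency changes.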

The following is the timing of the 64-bit transmit interface in normal mode, where DA is the destination MAC address, SA is the source MAC address, L/T is the length/type field, and D is the data. As the diagram shows, the user does not supply a CRC in normal mode; the MAC generates it.
insert image description here
The data part of a frame (the D part) must be 46 to 1500 bytes long. If it is shorter than 46 bytes, the MAC automatically pads it to 46 bytes. However, if the MAC IP is configured so that the user supplies the FCS (the CRC field), the user must make sure the length of the data part meets the requirement; otherwise the MAC IP treats the frame as a Bad Frame. This experiment uses normal mode, without a user-supplied FCS.
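The padding rule can be modelled in Python as follows. This is a behavioural sketch, not the MAC's implementation: padding with zero bytes is an assumption, and the function names are hypothetical.

```python
MIN_DATA = 46     # minimum length of the D field, from the text
MAX_DATA = 1500   # maximum length of the D field, from the text

def pad_data(data: bytes) -> bytes:
    """Pad the D field up to 46 bytes (zero padding is assumed here)."""
    if len(data) > MAX_DATA:
        raise ValueError("D field longer than 1500 bytes")
    return data + bytes(max(0, MIN_DATA - len(data)))

def frame_length(data: bytes) -> int:
    """Frame length: DA(6) + SA(6) + L/T(2) + padded D + FCS(4)."""
    return 6 + 6 + 2 + len(pad_data(data)) + 4

print(frame_length(b"abc"))        # a short payload is padded
print(frame_length(bytes(1500)))   # the largest standard frame
```

When the user supplies the FCS, this padding step has to be done in user logic instead, which is why the length requirement falls on the user in that mode.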

10G Ethernet receiving analysis

insert image description here
The biggest difference between the receive side and the transmit side is that there is no tready signal: the user cannot apply back-pressure and must accept data continuously. In addition, tuser indicates whether the received packet is good, so the user can judge the packet by this signal.
The following is the timing diagram for receiving a good packet on the 64-bit interface. When tlast is asserted, tuser is high at the same time, indicating that the packet is good.
insert image description here
The following shows an errored packet: when tlast is asserted, tuser is low.
insert image description here
A packet is flagged as errored in the following situations:
1: The FCS check finds an error
2: The packet is shorter than 64 bytes, counted as DA(6)+SA(6)+L/T(2)+D+FCS(4), where D must be at least 46 bytes
3: A jumbo frame is received while jumbo frames are not enabled
4: The packet is longer than the MTU allows
Errors can occur in other situations as well, which are not listed here. For details, please refer to the PG157 document.
Likewise, the receive side can be configured with or without the FCS. This experiment uses the default normal mode, in which the FCS is not included in the received data.
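The error conditions listed above can be summarised in a small Python model. It is a simplified sketch that covers only the listed cases; the 1518-byte limit assumes a 1500-byte data field with jumbo frames disabled, and the function names are hypothetical rather than taken from the IP.

```python
MIN_FRAME = 64     # DA(6) + SA(6) + L/T(2) + 46 bytes of D + FCS(4)
MAX_FRAME = 1518   # same fields with 1500 bytes of D (assumed limit)

def rx_frame_errors(frame_len: int, fcs_ok: bool,
                    max_frame: int = MAX_FRAME) -> list:
    """Return the reasons a frame would be flagged bad (empty list = good)."""
    errors = []
    if not fcs_ok:
        errors.append("fcs error")
    if frame_len < MIN_FRAME:
        errors.append("undersized")
    if frame_len > max_frame:
        errors.append("oversized")
    return errors

def tuser_at_tlast(frame_len: int, fcs_ok: bool) -> int:
    """Model of tuser sampled while tlast is asserted: 1 = good, 0 = bad."""
    return 0 if rx_frame_errors(frame_len, fcs_ok) else 1

print(tuser_at_tlast(64, True))     # minimum-size frame with a good FCS
print(rx_frame_errors(60, True))    # too short
print(rx_frame_errors(9000, True))  # jumbo frame with jumbo disabled
```

User logic on the receive side typically buffers a frame and only commits it once tuser has been sampled at tlast, since there is no tready to pause the stream.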

10G Ethernet register configuration

The document also describes many registers, which are configured over the AXI4-Lite bus. The configuration source code is included in the project: axi_10g_ethernet_0_axi_lite_sm.v is taken from the 10G MAC example project and can be used directly without changes. For the register descriptions, please refer to the PG157 document.
insert image description here

10G Ethernet UI Configuration

The IP configuration is as follows:
insert image description here
insert image description here
insert image description here
insert image description here
insert image description here

4. Detailed explanation of the Vivado project

Development board FPGA model: Xilinx xc7k325tffg900-2;
Development environment: Vivado 2019.1;
Input/output: SFP optical port;
Network speed: 10G;
Test items: UDP data loopback, ping, etc.;
Resource consumption and power consumption estimates are as follows:

insert image description here

insert image description here

5. Board debugging verification and demonstration

This experiment requires a 10G network card installed in the computer's motherboard. The picture below shows the second-hand card I bought on a second-hand marketplace.
insert image description here
The connection method is as follows:
insert image description here
The SFP connection on the development board side is as follows:
insert image description here

ping function test

After the board is powered on and the bitstream is downloaded, first test the ping function, as follows:
insert image description here
A single ping is not enough, so run a continuous ping, as follows:
insert image description here

Data sending and receiving test

Then use the network debugging assistant to run the data sending and receiving test. After the network card is connected successfully, the network debugging assistant receives the test string sent by the FPGA once every 1 s, as follows:
insert image description here
Then send a large amount of data for testing. The test results are as follows:
insert image description here
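If you would rather script this test than use the network debugging assistant, a minimal Python sketch looks like this. It assumes the computer's 10G network card has been given the address 192.168.10.2 and that the board uses the default 192.168.10.1 and port 61441 from udp_test.v; the function name is hypothetical.

```python
import socket

def udp_send_and_receive(board_ip: str, board_port: int, payload: bytes,
                         local_port: int = 61441,
                         timeout: float = 2.0) -> bytes:
    """Send one datagram and return the first datagram that comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # The FPGA sends to a fixed destination port, so the computer
        # listens on that port (passing 0 lets the OS pick a free one).
        sock.bind(("", local_port))
        sock.sendto(payload, (board_ip, board_port))
        data, _sender = sock.recvfrom(65535)
        return data
```

A call such as `udp_send_and_receive("192.168.10.1", 61441, b"hello fpga")` should return the echoed bytes in loopback mode. Because the FPGA also sends its test string periodically, the first datagram received may be that string rather than the echo. The call raises a timeout exception if nothing arrives within `timeout` seconds.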

10G network speed test

Be careful with the speed measurement function! ! ! It will cause the computer to freeze severely! ! !
You can open Task Manager first and watch the Ethernet transfer rate on the Performance tab.
insert image description here
Press the user button defined in the top level, and the Ethernet rate will change.
insert image description here
Observe on the computer the rate at which the development board's Ethernet port is sending. This measurement only shows the highest rate that can be reached; it does not represent the rate the computer can sustain without packet loss. The point-to-point loss-free UDP rate depends on the computer's network card, CPU speed, memory speed and operating system.
If the computer freezes badly, press the button again promptly to switch back to loopback mode!!!
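As a reference point for the numbers shown in Task Manager, the upper bound on UDP payload throughput over a 10G link can be estimated from the standard Ethernet, IPv4 and UDP overheads. The sketch below assumes an IPv4 header without options and no VLAN tag; it is not derived from the project code, and the function name is hypothetical.

```python
PREAMBLE_SFD = 8          # preamble + start-of-frame delimiter
ETH_HEADER = 14           # DA(6) + SA(6) + L/T(2)
FCS = 4                   # frame check sequence
IFG = 12                  # minimum inter-frame gap
IP_UDP_HEADERS = 20 + 8   # IPv4 header (no options) + UDP header
MIN_DATA = 46             # minimum Ethernet data field

def udp_goodput_bps(udp_payload: int, line_rate_bps: float = 10e9) -> float:
    """Upper bound on UDP payload throughput for back-to-back frames."""
    data_len = max(udp_payload + IP_UDP_HEADERS, MIN_DATA)
    wire_bytes = PREAMBLE_SFD + ETH_HEADER + data_len + FCS + IFG
    return line_rate_bps * udp_payload / wire_bytes

for size in (18, 512, 1472):
    print(size, "bytes:", round(udp_goodput_bps(size) / 1e9, 3), "Gbps")
```

Larger payloads come closer to the line rate because the fixed per-frame overhead is spread over more data, so the figure observed on the computer also depends on the packet size used in speed test mode.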

6. Bonus: getting the project code

The code is too large to send by email, so it is shared through a network-disk link.
How to get it: send me a private message, or use the WeChat business card at the end of the article.
The network disk information is as follows:
insert image description here

Origin blog.csdn.net/qq_41667729/article/details/130372967