Table of contents
- 1. Introduction
- 2. Recommendation of relevant solutions
- 3. Design idea framework
  - Video source selection
  - IT6802 decoding chip configuration and collection
  - Dynamic color bar
  - Cross-clock FIFO
  - Detailed explanation of image scaling module
  - UDP protocol stack
  - UDP video data packet
  - UDP protocol stack data sending
  - UDP protocol stack data buffer
  - Modification of IP address and port number
  - Introduction to Tri Mode Ethernet MAC and porting considerations
  - 88E1518 PHY
  - QT host computer and source code
- 4. Detailed explanation of vivado project
- 5. Project transplantation instructions
- 6. Board debugging, verification and demonstration
- 7. Benefits: Obtain project source code
1. Introduction
As a CSDN blogger once put it: anyone who has never worked with a UDP protocol stack can hardly claim to have worked with FPGAs. I firmly believe it...
The UDP protocol stack is widely used in real projects, especially in the medical and military industries. The image splicing solutions currently on the market fall into two main categories: the Video Mixer solution officially provided by Xilinx, and custom hand-written code. The Xilinx Video Mixer solution directly calls an IP core and can be configured through the SDK, but it is difficult to bring up and demands considerable FPGA resources, so it is not suitable for small FPGAs; it is widely used on Zynq, Kintex-7 and higher-end platforms. If you are interested in the Video Mixer solution, you can refer to my previous blog. Blog address:
Click to go directly
This article uses a Xilinx Kintex7 FPGA together with an 88E1518 network PHY chip to implement Gigabit UDP video transmission (the video is scaled down before transmission). There are two video sources, depending on whether the developer has a camera at hand: one is the onboard HDMI input interface (a laptop output simulates the HDMI source); the other, for readers who have no camera or whose development board lacks an HDMI input, is a dynamic color bar generated inside the code to simulate camera video. The video source is selected through a `define macro at the top level of the code; by default, the HDMI input interface is selected as the video source after power-on. After the FPGA captures the video, it first shrinks it with an image scaling module implemented in pure Verilog, reducing the input 1920x1080 resolution to 1280x720, because our QT host computer currently only supports 1280x720. FDMA then caches the video into DDR3; the video is read back out, grouped into UDP packets according to the communication protocol agreed with the QT host computer, and encapsulated by our UDP protocol stack. The data is then sent to the Tri Mode Ethernet MAC IP, output to the 88E1518 network PHY on the development board, and transmitted through the board's RJ45 network port over a network cable to the host PC, where the QT host computer program we provide receives and displays the images. The vivado2019.1 FPGA engineering source code and the QT host computer program with its source code are both provided.
This blog describes in detail the design of an FPGA system based on the 88E1518 network PHY chip that implements Gigabit UDP video transmission. The engineering code can be fully synthesized and debugged on the board, and can be transplanted directly into your own projects. It is suitable for undergraduate and graduate student project development, and also for working engineers to study and improve; it can be applied to high-speed interface or image processing fields in medical, military and other industries.
Complete, working engineering source code and technical support are provided;
the method of obtaining the engineering source code and technical support is given at the end of the article; please read patiently to the end.
Version update instructions
This is the 2nd version. Based on readers' suggestions, the following improvements and updates have been made to the 1st version of the project:
1: Added a dynamic color bar as an optional input video source. Some readers said that they do not have an OV5640 camera at hand, or that their camera's schematic differs from mine, which made transplantation very difficult. A dynamic color bar generated inside the FPGA was therefore added; it requires no external camera, and its usage is explained later. This routine uses an onboard HDMI input interface, so readers whose boards lack that interface can choose the dynamic color bar instead;
2: Optimized FDMA. The AXI4 read/write burst length in the previous FDMA was 256, which caused insufficient bandwidth, and thus poor image quality, on low-end FPGAs. The AXI4 read/write burst length in FDMA has been changed to 128;
3: Optimized the code of the UDP protocol stack and its data buffer FIFO group, and added a description of this part to the blog post;
4: Added instructions for using, updating, and modifying the Tri Mode Ethernet MAC IP core, provided as a separate document in the information package;
5: Optimized the overall code structure, making the previously messy code clean and concise;
Disclaimer
This project and its source code include both parts written by myself and parts obtained from public channels on the Internet (including CSDN, the Xilinx official website, the Altera official website, etc.). If you feel your rights have been infringed, please send me a private message. Accordingly, this project and its source code are limited to personal study and research by readers or fans, and commercial use is prohibited. If legal issues arise from commercial use by readers or fans themselves, this blog and the blogger bear no responsibility, so please use it with caution...
2. Recommendation of relevant solutions
UDP video transmission (no scaling)
I have a UDP video transmission solution similar to the one in this blog, except that its input video is not scaled: it is cached directly and sent out through the UDP protocol stack. The blog link is as follows: Click directly to go
FPGA image scaling solution
The image scaling scheme used in this blog comes from a blog post I published earlier. If you are interested in the image scaling part, you can refer to it. The blog link is as follows: Click directly to go
The Ethernet solution I already have here
At present, I have a large number of UDP protocol project source codes, including UDP data loopback, video transmission, AD acquisition and transmission, etc. There are also TCP protocol projects, as well as RDMA NIC project source code for 10G/25G/100G network cards. Readers who need network communication solutions can take a look: Click directly to go.
The engineering blog of the Gigabit TCP protocol is as follows:
Click directly to go.
3. Design idea framework
The FPGA engineering design block diagram is as follows:
Video source selection
There are two types of video sources, depending on whether the developer has a camera at hand: one is the onboard HDMI input interface; the other, for readers who have no camera or whose development board lacks an HDMI input interface, is the dynamic color bar generated inside the code to simulate camera video. The video source is selected through a `define macro at the top level of the code; by default, the HDMI input interface is selected as the video source after power-on. The macro definition is as follows:
The selection logic code part is as follows:
The selection logic is as follows:
when `define COLOR_IN is commented out, the input source video is the dynamic color bar;
when `define COLOR_IN is not commented out, the input source video is the HDMI input;
IT6802 decoding chip configuration and collection
The IT6802 decoding chip must be configured over I2C before it can be used. For the configuration and use of the IT6802 decoding chip, please refer to my previous blogs. Blog address: Click to go directly
Both the IT6802 configuration and the acquisition are implemented as Verilog code modules. The code location is as follows:
The code is configured as 1920x1080 resolution;
Dynamic color bar
The dynamic color bar can be configured for different video resolutions; the border width of the video, the size of the moving block, and its moving speed can all be parameterized. Here I configure the resolution as 1920x1080. The code location of the dynamic color bar module and its top-level interface are shown below:
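As a rough software model of what such a parameterized pattern generator produces (the bar colors, border width, block size, and speed below are illustrative defaults, not the RTL's exact parameters):

```python
# Software model of a parameterized dynamic color-bar generator.
# Bar colors, border width, block size and speed are illustrative
# defaults, not the exact parameters of the Verilog module.

BARS = [  # classic 8-bar pattern, 24-bit RGB
    0xFFFFFF, 0xFFFF00, 0x00FFFF, 0x00FF00,
    0xFF00FF, 0xFF0000, 0x0000FF, 0x000000,
]

def pixel(x, y, frame, h_res=1920, v_res=1080,
          border=8, block=64, speed=4):
    """Return the 24-bit RGB value at (x, y) for a given frame number."""
    # White border around the frame
    if x < border or x >= h_res - border or y < border or y >= v_res - border:
        return 0xFFFFFF
    # Moving block sweeps horizontally, advancing `speed` pixels per frame
    bx = (frame * speed) % (h_res - block)
    if bx <= x < bx + block and (v_res - block) // 2 <= y < (v_res + block) // 2:
        return 0x808080
    # Otherwise: vertical color bars
    return BARS[x * len(BARS) // h_res]
```

The RTL would evaluate a comparable per-pixel function from its line and pixel counters, one pixel per clock.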
Cross-clock FIFO
The function of the cross-clock FIFO is to solve the clock-domain-crossing problem. When the video is not scaled there is no clock-domain-crossing issue, but the issue appears when the video is reduced or enlarged. Using a FIFO buffer ensures that every read performed by the image scaling module returns valid input data. Note that the original video input timing is disrupted at this point;
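A minimal Python sketch of the behavioral idea (not the RTL): writes happen only when input data is valid, reads happen only when the FIFO is non-empty, so the downstream scaler always receives valid data in order, while the original input pacing (the gaps) is lost:

```python
from collections import deque

def cross_clock_fifo(samples, depth=16):
    """Behavioral model of the decoupling FIFO.
    `samples` is the input stream, with None marking cycles where no
    valid data arrives. The reader only ever sees valid data, in order;
    the gaps in the original pacing do not survive the FIFO."""
    fifo = deque()
    out = []
    for s in samples:
        if s is not None and len(fifo) < depth:
            fifo.append(s)        # write side: push only valid data, only when not full
        if fifo:
            out.append(fifo.popleft())  # read side: pop only when not empty
    while fifo:                   # drain whatever remains
        out.append(fifo.popleft())
    return out
```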
Detailed explanation of image scaling module
Because our QT host computer currently only supports 1280x720, scaling is required; that is, the input resolution is reduced from 1920x1080 to 1280x720. A laptop is used to simulate the HDMI video input source;
Design block diagram
This design integrates the commonly used bilinear interpolation and nearest-neighbor interpolation algorithms into one code base, and selects between them through an input parameter. The code is implemented in pure Verilog, without any IP, and can be freely transplanted between Xilinx, Intel, and domestic FPGAs. The code uses RAM and FIFO as the core for data caching and interpolation. The design architecture is as follows:
The video input timing requirements are as follows:
the input pixel data may only change when dInValid and nextDin are both high;
The video output timing requirements are as follows:
the output pixel data is only output when dOutValid and nextdOut are both high;
Code block diagram
The code is implemented in pure Verilog, without any IP, and can be freely transplanted between Xilinx, Intel, and domestic FPGAs.
There are many ways to implement image scaling; the simplest is Xilinx's HLS approach, which can be completed with the OpenCV library and a few lines of C++ code. For HLS image scaling, please refer to my earlier article: HLS implements image scaling.
There are other image scaling example codes on the Internet, but most of them use IP cores, making them difficult to transplant to other FPGA devices, so their portability is poor; by comparison, this design's code is universal. The code structure is as shown in the figure:
the top-level interface part is as follows:
Integration and selection of 2 interpolation algorithms
This design integrates the commonly used bilinear interpolation and nearest-neighbor interpolation algorithms into one code base, and selects between them through an input parameter.
The specific selection parameter is as follows:
input wire i_scaler_type //0-->bilinear;1-->neighbor
Select the algorithm by setting the value of i_scaler_type:
set 0 to select the bilinear interpolation algorithm;
set 1 to select the nearest-neighbor interpolation algorithm;
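For readers who want to see the two algorithms side by side, here is a floating-point software model of the selection. The RTL uses fixed-point arithmetic and line buffers, so this is only a functional sketch, not the hardware implementation:

```python
def scale(src, src_w, src_h, dst_w, dst_h, scaler_type):
    """Scale a grayscale image (row-major list of lists).
    scaler_type: 0 -> bilinear, 1 -> nearest neighbor,
    mirroring the i_scaler_type parameter described above."""
    out = []
    for dy in range(dst_h):
        row = []
        sy = dy * src_h / dst_h          # source coordinate for this output row
        for dx in range(dst_w):
            sx = dx * src_w / dst_w      # source coordinate for this output column
            if scaler_type == 1:         # nearest neighbor: pick the closest sample
                row.append(src[min(int(sy), src_h - 1)][min(int(sx), src_w - 1)])
            else:                        # bilinear: blend the 4 surrounding samples
                x0, y0 = int(sx), int(sy)
                x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
                fx, fy = sx - x0, sy - y0
                top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
                bot = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
                row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Calling `scale(frame, 1920, 1080, 1280, 720, 0)` models the 1080p-to-720p reduction performed in this project.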
Regarding the mathematical differences between these two algorithms, please refer to my previous article: HLS implements image scaling.
UDP protocol stack
This UDP protocol stack must be used together with Xilinx's Tri Mode Ethernet MAC tri-speed Ethernet IP. The UDP protocol stack is delivered as a netlist file: although the source code cannot be viewed, UDP communication works normally. The protocol stack is not currently open source and only the netlist file is provided, but this does not affect its use. The protocol stack has a user interface, so users do not need to deal with the complex UDP protocol itself; they only need to follow the simple user-interface timing to send and receive UDP data, which is very simple. The protocol stack architecture is as follows:
The protocol stack performance is as follows:
1: Supports UDP receive checksum verification, but does not support UDP transmit checksum generation;
2: Supports IP header checksum generation and verification, and supports the PING function of the ICMP protocol, so it can receive and respond to PING requests from devices within the same subnet;
3: Can automatically initiate or respond to ARP requests from devices within the same subnet; ARP transmission and reception are fully adaptive, and the ARP table can store 256 IP/MAC address pairs within the same subnet;
4: Supports an ARP timeout mechanism, which can detect whether the destination IP address of an outgoing data packet is reachable;
5: The protocol stack's transmit bandwidth utilization can reach 93%; under high transmit bandwidth, an internal arbitration mechanism ensures that the PING and ARP functions are not affected in any way;
6: The sending process does not cause packet loss;
7: Provides a 64-bit-wide AXI4-Stream MAC interface, which can be used with the official Xilinx Gigabit Ethernet IP core Tri Mode Ethernet MAC, as well as with the 10 Gigabit Ethernet Subsystem and 10 Gigabit Ethernet MAC IP cores;
With this protocol stack, we do not need to care about the implementation details of the complex UDP protocol and can simply call the interface directly...
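Item 2 above mentions IP header checksum generation and verification; this is the standard ones'-complement checksum of RFC 1071, which hardware computes the same way over 16-bit words. A minimal software model for reference:

```python
def ip_checksum(header: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words,
    as used for the IPv4 header (checksum field zeroed before computing)."""
    if len(header) % 2:            # pad odd-length input with a zero byte
        header += b"\x00"
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:             # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Verification works the same way: summing a header that already contains a correct checksum yields 0xFFFF before the final inversion, i.e. the function returns 0.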
The sending timing of this UDP protocol stack user interface is as follows:
The receiving timing of this UDP protocol stack user interface is as follows:
UDP video data packet
To packetize the UDP video data, the UDP data sent must match the receiving program of the QT host computer. The UDP frame format defined by the host computer consists of a frame header followed by the UDP data. The frame header is defined as follows:
The UDP data packaging code on the FPGA side must correspond to the data frame format in the figure above, otherwise QT cannot parse it. The packing state machine and data frame are defined in the code as follows:
In addition, since UDP transmission uses a 64-bit data width while image pixel data is 24 bits wide, the UDP data must be reassembled to keep the pixel data aligned. This is the main difficulty of the whole project, and of UDP video transmission on any FPGA;
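The alignment arithmetic behind this difficulty can be checked in software: lcm(24, 64) = 192 bits, so every 8 pixels (24 bytes) fill exactly three 64-bit words. A sketch of the repacking (big-endian packing order is an assumption for illustration; the project's actual byte order is defined by its packing state machine):

```python
def pack_pixels(pixels):
    """Pack 24-bit RGB pixels into 64-bit words.
    lcm(24, 64) = 192 bits, so the pattern repeats every
    8 pixels = 24 bytes = 3 x 64-bit words; keeping the pixel count
    per packet a multiple of 8 keeps every word fully aligned."""
    data = b"".join(p.to_bytes(3, "big") for p in pixels)
    assert len(data) % 8 == 0, "pixel count must be a multiple of 8"
    return [int.from_bytes(data[i:i + 8], "big") for i in range(0, len(data), 8)]
```

For example, a 1280-pixel line is 1280 * 3 = 3840 bytes = 480 whole 64-bit words, so choosing line-sized payloads keeps pixels aligned across packets.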
UDP protocol stack data sending
The UDP protocol stack has both sending and receiving functions, but only sending is used here. The code structure of this part is as follows:
I have already prepared the UDP protocol stack code group, and users can use it directly;
UDP protocol stack data buffer
Here is an explanation of the data buffer FIFO group used in the code:
Since the AXI-Stream data interface of the UDP protocol stack is 64 bits wide while the AXI-Stream data interface of the Tri Mode Ethernet MAC is 8 bits wide, interconnecting the UDP protocol stack and the Tri Mode Ethernet MAC through the AXI-Stream interface requires both clock-domain and data-width conversion. The implementation is shown in the figure below:
The transceiver path (this design only uses the transmit path) uses two AXI-Stream Data FIFOs: one FIFO implements the asynchronous clock-domain conversion, and the other FIFO implements the packet mode function. Since the AXI-Stream data interface clock of the Tri Mode Ethernet MAC at Gigabit rate is 125MHz, the clock of the 64-bit AXI-Stream data interface of the UDP protocol stack should be 125MHz/(64/8) = 15.625MHz; therefore, the two sides of the asynchronous AXI-Stream Data FIFO are clocked at 125MHz (8bit) and 15.625MHz (64bit) respectively. After the AXI-Stream interface of the UDP protocol stack passes through the FIFO clock-domain conversion, the data width still needs to be converted; this is done with the AXI4-Stream Data Width Converter: on the receive path from 8bit to 64bit, and on the transmit path from 64bit to 8bit;
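The rate relationship and the transmit-path width conversion described above can be sanity-checked in software (the byte order in the width-conversion sketch is assumed big-endian purely for illustration):

```python
def mac_side_rate(mac_clk_hz, udp_width_bits, mac_width_bits=8):
    """Clock the wide UDP side so both interfaces carry equal bandwidth:
    125 MHz x 8 bit = f x 64 bit  ->  f = 125e6 / (64/8) = 15.625 MHz."""
    return mac_clk_hz / (udp_width_bits / mac_width_bits)

def widen_to_bytes(word64):
    """64-bit -> 8-bit width conversion on the transmit path
    (big-endian byte order assumed for illustration)."""
    return [(word64 >> (8 * i)) & 0xFF for i in range(7, -1, -1)]
```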
Modification of IP address and port number
The UDP protocol stack exposes IP address and port number modification ports so users can change them freely. Their locations are as follows:
Introduction to Tri Mode Ethernet MAC and porting considerations
This design calls the official Xilinx IP: Tri Mode Ethernet MAC. Its position in the code is as follows:
You can see that the Tri Mode Ethernet MAC IP core is in a locked state. This is intentional: the purpose is to let us modify its internal code and internal timing-constraint code according to the delay parameters of different PHYs. Since the network PHY used in this design is the 88E1518, we focus here on modifying and transplanting the Tri Mode Ethernet MAC when using the 88E1518. When you need to transplant the project, or when your vivado version differs from mine, the Tri Mode Ethernet MAC needs to be upgraded in vivado; but since the IP has been deliberately locked by us, upgrading and modifying it requires some advanced operations. I have written a dedicated document describing the procedure, which is included in the information package as follows:
88E1518 PHY
The network PHY used on this design's development board is the 88E1518, which works in delay mode. The schematic brings out MDIO, but no MDIO configuration is needed in the code: the 88E1518 is put into delay mode by pull-up/pull-down resistors. The PHY supports up to Gigabit and can auto-negotiate 10M/100M/1000M, but this design is fixed at 1000M on the Tri Mode Ethernet MAC side. The 88E1518 schematic is provided in the data package;
QT host computer and source code
We provide a QT host computer program for capture and display, matching the UDP communication protocol, together with its source code. The directory is as follows:
Our QT program currently only supports video capture and display at 1280x720 resolution, but a 1080P interface is reserved; friends interested in QT development can try modifying the code to support 1080P. Since QT is only a verification tool here and not the focus of this project, I won't go into detail. For details, please refer to the QT source code in the reference package; its location is as follows:
4. Detailed explanation of vivado project
Development board FPGA model: Xilinx–Kintex7–xc7k325tffg676-2;
Development environment: Vivado2019.1;
Input: HDMI or dynamic color bar, resolution 1920x1080;
Output: Gigabit UDP protocol stack, 88E1518 PHY, RJ45 network port;
Engineering role: Gigabit UDP network video transmission;
The project BD is as follows:
The project code structure is as follows:
The resource consumption and power consumption of the project are as follows:
5. Project transplantation instructions
Vivado version inconsistency handling
1: If your vivado version is consistent with the vivado version of this project, open the project directly;
2: If your vivado version is lower than the vivado version of this project, you need to open the project and click File --> Save As; but this method is not safe. The safest way is to upgrade your vivado to this project's version or higher;
3: If your vivado version is higher than the vivado version of this project, the solution is as follows:
After opening the project, you will find that the IP is locked, as follows:
At this time, the IP needs to be upgraded, and the operation is as follows:
FPGA model inconsistency handling
If your FPGA model is inconsistent with mine, you need to change the FPGA model. The operation is as follows:
After changing the FPGA model, you also need to upgrade the IP. The method of upgrading the IP has been described previously;
Other things to note
1: Since the DDR on every board is not necessarily identical, the MIG IP must be configured according to your own schematic; you can even delete the MIG from my original project, re-add the IP, and reconfigure it;
2: Modify the pin constraints according to your own schematic; just edit them in the xdc file;
3: When transplanting pure FPGA to Zynq, you need to add the zynq soft core to the project;
6. Board debugging, verification and demonstration
Preparation
First connect the development board and the computer. After the development board is connected, it is as shown below:
Then change your computer's IP address to match the IP specified in the code. Of course, the IP in the code can be set arbitrarily; if you modify it, the IP on the computer must be changed accordingly. My settings are as follows:
ping
Before starting the test, we first use ping to check whether the UDP link is up, as follows:
Static presentation
After the 1920x1080 HDMI input is reduced to 1280x720 and transmitted over the UDP network, the QT host computer displays as follows:
After the 1920x1080 dynamic color bar is reduced to 1280x720 and transmitted over the UDP network, the QT host computer displays as follows:
Dynamic presentation
The dynamic video demonstration is as follows:
FPGA-UDP-Video scaling transmission-K7-16 to 9
7. Benefits: Obtain project source code
The code is too large to send by email; it will be shared via a network disk link.
To obtain it: send a private message, or use the WeChat business card at the end of the article.
The network disk information is as follows: