FPGA image scaling with Gigabit Ethernet UDP video transmission based on the 88E1518 PHY, with FPGA project source code, QT host computer source code, and technical support

1. Introduction

Anyone who has never worked with a UDP protocol stack should be embarrassed to say they have worked with FPGAs. That line comes from a well-known CSDN blogger, and I firmly agree.
The UDP protocol stack is widely used in real projects, especially in the medical and military industries. The image splicing solutions currently on the market are mainly the Video Mixer solution officially launched by Xilinx and custom hand-written code. The Video Mixer solution calls the IP directly and is configured through the SDK, but it is hard to get working and consumes a lot of FPGA resources, so it is not suitable for small FPGAs; it is mostly used on Zynq, K7, and higher platforms. If you are interested in the Video Mixer solution, you can refer to my previous blog. Blog address: Click to go directly

This article uses a Xilinx Kintex7 FPGA and the 88E1518 network PHY chip to implement Gigabit UDP video transmission, where the video is scaled before it is transmitted. There are two video sources, depending on whether the developer has a camera at hand. One is the onboard HDMI input interface (a laptop is used as the HDMI input source); the other, for readers who have no camera or whose development board has no HDMI input interface, is a dynamic color bar generated inside the code to simulate camera video. The video source is selected through a `define macro at the top level of the code, and the HDMI input interface is the default video source after power-on.

After the FPGA captures the video, an image scaling module implemented in pure Verilog first reduces it from the input resolution of 1920x1080 to 1280x720; scaling is required because our QT host computer currently only supports 1280x720. FDMA then caches the video in DDR3. The video is read back out and grouped into UDP data according to the communication protocol agreed with the QT host computer, and our UDP protocol stack encapsulates it as UDP data. The data is then sent to the Tri Mode Ethernet MAC IP and output to the 88E1518 network PHY on the development board. Finally, the UDP video travels through the RJ45 network port on the development board and a network cable to the computer, where the QT host computer we provide captures and displays the images. The FPGA project source code for Vivado 2019.1 and the QT host computer with its source code are provided;

This blog describes in detail how to design Gigabit UDP video transmission on an FPGA based on the 88E1518 network PHY chip. The project code can be synthesized, compiled, and debugged on the board, and can be ported directly into your own project. It is suitable for project development by undergraduate and graduate students, and also for working engineers who want to learn and improve; it can be applied to high-speed interfaces or image processing in the medical, military, and other industries.
Complete, working project source code and technical support are provided.
How to obtain the project source code and technical support is described at the end of the article; please read patiently to the end.

Version update instructions

This is the 2nd version. Based on readers' suggestions, the following improvements and updates have been made to the 1st version of the project:
1: Added a dynamic color bar as an input video option. Some readers said that they do not have an OV5640 camera, or that their camera schematic differs from mine, which made porting very difficult. For this reason a dynamic color bar was added; it is generated inside the FPGA and needs no external camera. How to use it is explained later. This example uses the onboard HDMI input interface; readers without this interface can choose the dynamic color bar;
2: Optimized FDMA. In the previous FDMA the AXI4 read and write burst length was 256, which led to insufficient bandwidth and therefore poor image quality on low-end FPGAs. For this reason the AXI4 read and write burst length in FDMA was changed to 128;
3: Optimized the code of the UDP protocol stack and its data buffer FIFO group, and added a description of this part of the code to the blog post;
4: Added instructions for using, updating, and modifying the Tri Mode Ethernet MAC IP core, placed in the information package as a separate document;
5: Optimized the overall code structure, making the previously messy code clean and concise;

Disclaimer

This project and its source code include both parts written by myself and parts obtained from public channels on the Internet (including CSDN, the Xilinx official website, the Altera official website, etc.). If you feel your rights have been infringed, please send me a private message with your criticism. Accordingly, this project and its source code are limited to personal study and research by readers or fans, and commercial use is prohibited. If legal issues arise from commercial use by readers or fans themselves, this blog and the blogger bear no responsibility, so please use it with caution.

2. Recommendation of relevant solutions

UDP video transport – no scaling

I have a similar UDP video transmission solution in which the input video is not scaled but is cached directly and sent to the UDP protocol stack for output. The blog link is as follows: Click directly to go

FPGA image scaling solution

The image scaling scheme used in this blog was covered in an earlier blog post of mine. If you are interested in the image scaling part, you can refer to it. The blog link is as follows: Click directly to go

My existing Ethernet solutions

At present I have a large amount of UDP protocol project source code, including UDP data loopback, video transmission, AD acquisition and transmission, and so on. There are also TCP protocol projects, as well as project source code for RDMA NIC 10G, 25G, and 100G network cards. Readers who need network communication can take a look: Click directly to go.
The project blog for the Gigabit TCP protocol is as follows:
Click directly to go.

3. Design idea framework

The FPGA engineering design block diagram is as follows:
Insert image description here

Video source selection

There are two video sources, depending on whether the developer has a camera at hand. One is the onboard HDMI input interface; the other, for readers who have no camera or whose development board has no HDMI input interface, is the dynamic color bar generated inside the code to simulate camera video. The video source is selected through a `define macro at the top level of the code, and the HDMI input interface is the default video source after power-on. The macro is defined as follows:
Insert image description here
The selection logic code is as follows:
Insert image description here
The selection logic is as follows:
when `define COLOR_IN is commented out, the input source video is the dynamic color bar;
when `define COLOR_IN is not commented out, the input source video is the HDMI input;

IT6802 decoding chip configuration and collection

The IT6802 decoding chip requires I2C configuration before it can be used. For the configuration and use of the IT6802 decoding chip, please refer to my previous blog. Blog address: Click to go directly
The IT6802 configuration and the video capture are both implemented as Verilog code modules. The code location is as follows:
Insert image description here
The code configures a resolution of 1920x1080;

Dynamic color bar

The dynamic color bar can be configured for different video resolutions. The border width of the video, the size of the moving block, its moving speed, and so on can all be set through parameters. Here I configure the resolution as 1920x1080. The code location of the dynamic color bar module and an example of its top-level interface are as follows:
Insert image description here
Insert image description here
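To make the parameterization described above concrete, here is a small behavioral model in Python. It is only a sketch of what a color bar source with a border and a moving block might compute per pixel; the function name, the colors, and the default values for `border`, `block`, and `speed` are all assumptions for illustration and are not taken from the project's Verilog module.

```python
# Behavioral model of a dynamic color bar source (illustrative only).
# All names, colors, and default values are assumptions, not the project's parameters.

BAR_COLORS = [
    (255, 255, 255),  # white
    (255, 255, 0),    # yellow
    (0, 255, 255),    # cyan
    (0, 255, 0),      # green
    (255, 0, 255),    # magenta
    (255, 0, 0),      # red
    (0, 0, 255),      # blue
    (0, 0, 0),        # black
]


def color_bar_pixel(x, y, frame, width=1920, height=1080,
                    border=4, block=100, speed=8):
    """Return the (R, G, B) value of pixel (x, y) in the given frame."""
    # A fixed-width border drawn around the whole picture.
    if x < border or x >= width - border or y < border or y >= height - border:
        return (255, 0, 0)
    # A square block that moves `speed` pixels per frame and wraps around.
    bx = (frame * speed) % (width - block)
    by = (frame * speed) % (height - block)
    if bx <= x < bx + block and by <= y < by + block:
        return (128, 128, 128)
    # Eight vertical bars of equal width as the background.
    bar_width = width // len(BAR_COLORS)
    return BAR_COLORS[min(x // bar_width, len(BAR_COLORS) - 1)]
```

Calling `color_bar_pixel(x, y, frame)` for every pixel of successive frames gives a picture whose block position changes from frame to frame, which is what makes the pattern useful for checking that the whole video path is live and not showing a frozen frame.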

Cross-clock FIFO

The cross-clock FIFO solves the clock domain crossing problem. When the video is not scaled there is no clock domain crossing problem for the video, but the problem appears when the video is reduced or enlarged. Buffering through the FIFO ensures that every read by the image scaling module returns valid input data. Note that the input timing of the original video is no longer preserved at this point;

Detailed explanation of image scaling module

Because our QT host computer currently only supports 1280x720, we need to scale, that is, reduce the input resolution from 1920x1080 to 1280x720; use a laptop to simulate the HDMI video input source;

Design block diagram

This design integrates the commonly used bilinear interpolation and nearest-neighbor interpolation algorithms into one code base, and an input parameter selects which algorithm is used. The code is implemented in pure Verilog without any IP, and can be ported freely between Xilinx, Intel, and domestic FPGAs. The code uses RAM and FIFO as the core to implement data caching and interpolation. The design architecture is as follows:
Insert image description here
The video input timing requirements are as follows:
Insert image description here
The input pixel data can only change when dInValid and nextDin are high at the same time;
the video output timing requirements are as follows:
Insert image description here
The output pixel data can only be output when dOutValid and nextdOut are high at the same time;
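The timing rule above is a two-signal handshake: a pixel is transferred only in a cycle where both signals are high. The following Python sketch models that rule cycle by cycle. The signal names dInValid and nextDin come from the text; the function name `transfer` and the way the patterns are passed in are assumptions made for illustration, and the model says nothing about which side of the interface drives which signal.

```python
def transfer(pixels, valid_pattern, next_pattern):
    """Cycle-level model: a pixel moves only when valid and next are both 1.

    valid_pattern and next_pattern hold one 0/1 value per clock cycle,
    standing in for dInValid and nextDin (or dOutValid and nextdOut).
    """
    accepted = []
    index = 0
    for valid, nxt in zip(valid_pattern, next_pattern):
        if index >= len(pixels):
            break
        if valid and nxt:
            accepted.append(pixels[index])
            index += 1  # the data may change only after an accepted cycle
    return accepted
```

For example, with the patterns `[1, 1, 0, 1, 1]` and `[1, 0, 1, 1, 1]` only cycles 0, 3, and 4 have both signals high, so three pixels are transferred in five cycles and the pixel presented in cycle 1 has to be held until cycle 3.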

Code block diagram

The code is implemented in pure Verilog without any IP, and can be ported freely between Xilinx, Intel, and domestic FPGAs.
There are many ways to implement image scaling. The simplest is Xilinx's HLS approach, which can do it with the OpenCV library and a few lines of C++; for image scaling with HLS, please refer to my earlier article, HLS implementation of image scaling.
There are other image scaling example codes on the Internet, but most of them use IP, which makes them hard to port to other FPGA devices and limits their generality. By comparison, this design is portable. The code structure is as follows:
Insert image description here
The top-level interface is as follows:
Insert image description here

Integration and selection of 2 interpolation algorithms

This design integrates the commonly used bilinear interpolation and nearest-neighbor interpolation algorithms into one code base, and an input parameter selects which algorithm is used.
The selection parameter is as follows:

input  wire i_scaler_type //0-->bilinear;1-->neighbor

The algorithm is selected by the value of i_scaler_type:

set it to 0 to select the bilinear interpolation algorithm;
set it to 1 to select the nearest-neighbor interpolation algorithm;

For the mathematical differences between the two algorithms, please refer to my previous article, HLS implementation of image scaling.
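As a rough software reference for the two modes, the sketch below scales a small grayscale image in Python, with a `scaler_type` argument that mirrors the 0/1 meaning of i_scaler_type. It is a floating-point model written for clarity, not a description of the Verilog implementation: the function name `scale`, the top-left-aligned coordinate mapping, and the rounding choices are assumptions, and the hardware may use fixed-point arithmetic with a different mapping.

```python
def scale(src, dst_w, dst_h, scaler_type=0):
    """Scale a 2-D grayscale image (list of rows).

    scaler_type 0 selects bilinear, 1 selects nearest neighbor.
    """
    src_h, src_w = len(src), len(src[0])
    x_ratio = src_w / dst_w   # 1920 / 1280 = 1.5 in this project
    y_ratio = src_h / dst_h   # 1080 / 720  = 1.5 in this project
    dst = []
    for dy in range(dst_h):
        row = []
        for dx in range(dst_w):
            sx, sy = dx * x_ratio, dy * y_ratio  # position in the source image
            if scaler_type == 1:
                # Nearest neighbor: copy the closest source sample.
                nx = min(int(sx + 0.5), src_w - 1)
                ny = min(int(sy + 0.5), src_h - 1)
                row.append(src[ny][nx])
            else:
                # Bilinear: weight the four surrounding samples by distance.
                x0, y0 = int(sx), int(sy)
                x1 = min(x0 + 1, src_w - 1)
                y1 = min(y0 + 1, src_h - 1)
                fx, fy = sx - x0, sy - y0
                top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
                bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
                row.append(round(top * (1 - fy) + bottom * fy))
        dst.append(row)
    return dst
```

With a ratio of 1.5, every second output pixel falls halfway between two source pixels. Bilinear mode averages the neighbors there, which gives smoother edges, while nearest-neighbor mode simply copies one of them, which is cheaper but can look blocky.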

UDP protocol stack

This UDP protocol stack must be used together with Xilinx's Tri Mode Ethernet MAC tri-speed Ethernet IP. The UDP protocol stack is delivered as a netlist file: it is not currently open source, so the source code cannot be viewed, but UDP communication works normally and use is not affected. The protocol stack exposes a user interface, so users do not need to care about the complex UDP protocol and only need to follow the simple user interface timing to send and receive UDP data. The protocol stack architecture is as follows:
Insert image description here
The protocol stack performance is as follows:
1: Supports UDP receive checksum verification, but does not support UDP send checksum generation;
2: Supports IP header checksum generation and verification, and supports the PING function of the ICMP protocol, so it can receive and respond to PING requests from devices within the same subnet;
3: Can automatically initiate ARP requests and respond to ARP requests from devices within the same subnet; ARP transmission and reception are fully adaptive. The ARP table can hold 256 IP and MAC address pairs within the same subnet;
4: Supports an ARP timeout mechanism, which can detect whether the destination IP address of a data packet to be sent is reachable;
5: The transmit bandwidth utilization of the protocol stack can reach 93%. Under high transmit bandwidth, the internal arbitration mechanism ensures that the PING and ARP functions are not affected;
6: The sending process does not cause packet loss;
7: Provides a 64-bit wide AXI4-Stream MAC interface, which can be used together with Xilinx's official Gigabit Ethernet IP core Tri Mode Ethernet MAC, and with the 10 Gigabit Ethernet IP cores 10 Gigabit Ethernet Subsystem and 10 Gigabit Ethernet MAC;
With this protocol stack, we do not need to care about the implementation of the complex UDP protocol and can use the interface directly.
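The 93% bandwidth figure in item 5 can be turned into a rough upper bound on the video frame rate. The Python sketch below does that arithmetic for 1280x720 video with 24-bit pixels over a 1 Gbit/s link. It is a back-of-the-envelope estimate under stated assumptions: the function name is hypothetical, the custom frame header and any other per-packet overhead are ignored, and the text does not say how the 93% figure itself was measured.

```python
def max_frame_rate(width=1280, height=720, bits_per_pixel=24,
                   line_rate_bps=1_000_000_000, utilization=0.93):
    """Upper bound on frames per second, ignoring header and protocol overhead."""
    bits_per_frame = width * height * bits_per_pixel
    return line_rate_bps * utilization / bits_per_frame


fps = max_frame_rate()
print(f"at most about {fps:.1f} frames per second")
```

One 1280x720 frame at 24 bits per pixel is 22,118,400 bits, so under these assumptions the link cannot carry a full 60 frames per second of uncompressed video; the achievable rate is bounded by a little over 42 frames per second.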
The sending timing of this UDP protocol stack user interface is as follows:
Insert image description here
The receiving timing of this UDP protocol stack user interface is as follows:
Insert image description here
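The protocol stack is a closed netlist, but the IP header checksum it generates and verifies (item 2 of the performance list) is the standard Internet checksum, which can be reproduced on the host when inspecting captured packets. The Python function below is a reference sketch of that standard algorithm, not code from the protocol stack; the function name is an assumption.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, then inverted.

    To generate a checksum, pass the header with its checksum field zeroed.
    To verify one, pass the header as received; the result should be 0.
    """
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF
```

A typical use is to copy the 20 header bytes of a packet captured on the computer, zero bytes 10 and 11, and compare the function's result with the checksum the FPGA actually sent.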

UDP video data packet

To package the UDP video data, the UDP data sent by the FPGA must match what the receiving program of the QT host computer expects. The UDP frame format defined by the host computer consists of a frame header and UDP data. The frame header is defined as follows:
Insert image description here
The UDP data packaging code on the FPGA side must correspond to the data frame format in the figure above, otherwise QT cannot parse it. The data packet state machine and data frame are defined in the code, as follows:
Insert image description here
In addition, since UDP transmission uses a 64-bit data width while image pixel data is 24 bits wide, the UDP data must be regrouped to keep the pixel data aligned. This is the difficult part of the whole project, and of UDP data transmission on FPGAs in general;
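The alignment problem comes from the fact that 24 does not divide 64. The smallest common unit is 192 bits, which is 8 pixels or 3 words of 64 bits, so pixels can be regrouped 8 at a time without leftover bits. The Python sketch below shows one way to do this regrouping and to undo it. The function names and the ordering (first pixel in the most significant bits) are assumptions for illustration; the actual order used between the FPGA and the QT host computer is defined by the project's frame format and may differ.

```python
WORD_MASK = 0xFFFFFFFFFFFFFFFF


def pack_pixels(pixels):
    """Pack 24-bit pixels into 64-bit words, 8 pixels (192 bits) per 3 words."""
    if len(pixels) % 8:
        raise ValueError("pixel count must be a multiple of 8")
    words = []
    for i in range(0, len(pixels), 8):
        group = 0
        for p in pixels[i:i + 8]:
            group = (group << 24) | (p & 0xFFFFFF)  # 8 x 24 = 192 bits
        for shift in (128, 64, 0):                  # 3 x 64 = 192 bits
            words.append((group >> shift) & WORD_MASK)
    return words


def unpack_pixels(words):
    """Inverse of pack_pixels: recover 24-bit pixels from 64-bit words."""
    if len(words) % 3:
        raise ValueError("word count must be a multiple of 3")
    pixels = []
    for i in range(0, len(words), 3):
        group = (words[i] << 128) | (words[i + 1] << 64) | words[i + 2]
        for shift in range(168, -1, -24):
            pixels.append((group >> shift) & 0xFFFFFF)
    return pixels
```

A 1280-pixel line divides evenly into 160 groups of 8 pixels, that is 480 words of 64 bits, so with this grouping a line never ends in the middle of a word.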

UDP protocol stack data sending

The UDP protocol stack can both send and receive, but only sending is used here. The code structure of this part is as follows:
Insert image description here
I have already prepared the UDP protocol stack code, and users can use it directly;

UDP protocol stack data buffer

The data buffer FIFO group used in the code is explained below:
The AXI-Stream data interface of the UDP IP protocol stack is 64 bits wide, while the AXI-Stream data interface of the Tri Mode Ethernet MAC is 8 bits wide. Therefore, connecting the UDP IP protocol stack to the Tri Mode Ethernet MAC through the AXI-Stream interface requires both clock domain conversion and data width conversion. The implementation is shown in the figure below:
Insert image description here
Each of the transmit and receive paths (this design only uses transmit) uses two AXI-Stream Data FIFOs. One FIFO performs the asynchronous clock domain conversion, and the other provides the packet mode function. At Gigabit rate the AXI-Stream data interface clock of the Tri Mode Ethernet MAC is 125MHz, so the 64-bit AXI-Stream data interface clock of the UDP protocol stack should be 125MHz/(64/8) = 15.625MHz. The clocks at the two ends of the asynchronous AXI-Stream Data FIFO are therefore 125MHz (8bit) and 15.625MHz (64bit). After the FIFO clock domain conversion, the data width of the UDP IP protocol stack's AXI-Stream interface still has to be converted, which is done by the AXI4-Stream Data Width Converter: from 8bit to 64bit in the receive path, and from 64bit to 8bit in the transmit path;
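The clock figure above follows from requiring the same throughput on both sides of the FIFO: the wider interface needs a proportionally slower clock. The short Python check below reproduces the 15.625MHz value and confirms that both sides carry 1000 Mbit/s; the function names are hypothetical and exist only for this calculation.

```python
def matched_clock_mhz(mac_clock_mhz=125.0, mac_width_bits=8, user_width_bits=64):
    """Clock needed on the wide side to match the narrow side's throughput."""
    return mac_clock_mhz / (user_width_bits / mac_width_bits)


def throughput_mbps(clock_mhz, width_bits):
    """Raw throughput of an interface moving width_bits on every clock cycle."""
    return clock_mhz * width_bits


user_clock = matched_clock_mhz()
print(user_clock, throughput_mbps(user_clock, 64), throughput_mbps(125.0, 8))
```

The same relation gives the clock for any other width pairing, which is useful if the protocol stack is later connected to a MAC with a different data width.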

Modification of IP address and port number

The UDP protocol stack exposes ports for the IP address and port number so that users can modify them freely. The locations are as follows:
Insert image description here

Introduction to Tri Mode Ethernet MAC and porting considerations

This design calls Xilinx's official IP: Tri Mode Ethernet MAC. Its position in the code is as follows:
Insert image description here
You can see that the Tri Mode Ethernet MAC IP core is in a locked state. This is intentional: it allows the IP's internal code and internal timing constraints to be modified according to the delay parameters of different PHYs. Since the network PHY used in this design is the 88E1518, the focus here is on modifying and porting the Tri Mode Ethernet MAC for use with the 88E1518. When you port the project, or when your Vivado version differs from mine, the Tri Mode Ethernet MAC needs to be upgraded in Vivado. Because we locked the IP deliberately, upgrading and modifying it takes some extra steps. I wrote a separate document describing the procedure and included it in the information package, as follows:
Insert image description here

88E1518 PHY

The network PHY on the development board used in this design is the 88E1518, which works in delay mode. MDIO is routed out on the schematic, but the code does not need to configure the PHY over MDIO; the 88E1518 is put into delay mode through pull-up and pull-down resistors. The PHY supports up to Gigabit and can auto-negotiate between 10M/100M/1000M, but this design fixes the speed at 1000M on the Tri Mode Ethernet MAC side. The schematic of the 88E1518 is provided in the data package;
Insert image description here

QT host computer and source code

We provide a QT host computer program that captures and displays the images and matches the UDP communication protocol, together with its source code. The directory is as follows:
Insert image description here
Our QT program currently only supports displaying video at 1280x720 resolution, but an interface for 1080P is reserved. Readers interested in QT development can try modifying the code to support 1080P. QT is only a verification tool here and not the focus of this project, so I won't go into detail. For details, please refer to the QT source code in the information package; its location is as follows:
Insert image description here

4. Detailed explanation of vivado project

Development board FPGA model: Xilinx–Kintex7–xc7k325tffg676-2;
Development environment: Vivado2019.1;
Input: HDMI or dynamic color bar, resolution 1920x1080;
Output: Gigabit UDP protocol stack, 88E1518 PHY, RJ45 network port;
Project function: Gigabit UDP network video transmission;
The project Block Design (BD) is as follows:
Insert image description here
The project code structure is as follows:
Insert image description here
The resource consumption and power consumption of the project are as follows:
Insert image description here

5. Project transplantation instructions

Vivado version inconsistency handling

1: If your Vivado version is the same as the Vivado version of this project, open the project directly;
2: If your Vivado version is lower than the Vivado version of this project, open the project and click File –> Save As. This method is not safe, however; the safest way is to upgrade your Vivado to the version of this project or a higher version;
Insert image description here
3: If your Vivado version is higher than the Vivado version of this project, the solution is as follows:
Insert image description here
After opening the project, you will find that the IP is locked, as follows:
Insert image description here
The IP then needs to be upgraded, as follows:
Insert image description here
Insert image description here

FPGA model inconsistency handling

If your FPGA model is inconsistent with mine, you need to change the FPGA model. The operation is as follows:
Insert image description here
Insert image description here
Insert image description here
After changing the FPGA model, you also need to upgrade the IP. The method of upgrading the IP has been described previously;

Other things to note

1: Since the DDR on each board is not necessarily the same, the MIG IP needs to be configured according to your own schematic. You can even delete the MIG from my original project, then add the IP again and reconfigure it;
2: Modify the pin constraints according to your own schematic; this is done in the xdc file;
3: When porting from a pure FPGA to Zynq, you need to add the Zynq processing system IP to the project;

6. Board debugging, verification and demonstration

Preparation

First connect the development board to the computer. The connected development board is shown below:
Insert image description here
Then change the IP address of your computer to match the IP specified in the code. The IP in the code can be set to any value, but after you change it, the IP on the computer must be changed accordingly. My settings are as follows:
Insert image description here
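Once the computer's IP matches the destination configured in the FPGA code, a quick way to confirm that UDP datagrams are arriving, independently of the QT host computer, is a few lines of Python. This is only a sketch of a generic UDP listener: the function names are hypothetical, the IP address and port must be whatever you configured in the FPGA code and on the computer, and the payload is not parsed because the frame header format is defined only in the project's figures.

```python
import socket


def open_receiver(ip, port, timeout=2.0):
    """Bind a UDP socket on the computer's address and the port the FPGA sends to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((ip, port))
    sock.settimeout(timeout)  # raise socket.timeout if nothing arrives in time
    return sock


def receive_datagrams(sock, count, bufsize=2048):
    """Collect `count` UDP payloads from an already bound socket."""
    payloads = []
    for _ in range(count):
        data, _addr = sock.recvfrom(bufsize)
        payloads.append(data)
    return payloads
```

Printing the length of each returned payload is usually enough to see whether packets of the expected size are arriving. Close the socket before starting the QT host computer, since both would otherwise try to bind the same port.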

ping

Before starting the test, we first ping the board to check that the network connection works, as follows:
Insert image description here

Static demonstration

After the 1920x1080 HDMI input is reduced to 1280x720 and transmitted over the UDP network, the QT host computer displays the following:
Insert image description here
After the 1920x1080 dynamic color bar is reduced to 1280x720 and transmitted over the UDP network, the QT host computer displays the following:
Insert image description here

Dynamic demonstration

The dynamic video demonstration is as follows:

FPGA-UDP-Video scaling transmission-K7-16 to 9

7. Benefits: obtaining the project source code

The code is too large to send by email, so it is shared through a network disk link.
To obtain the information package, send a private message or use the V business card at the end of the article.
The network disk contents are as follows:
Insert image description here
Insert image description here

Origin blog.csdn.net/qq_41667729/article/details/133071169