FPGA high-end project: image acquisition + UltraScale GTH + PCIE, Aurora 8b/10b codec + PCIE video transmission, with engineering source code, QT host-computer source code, and technical support


1. Introduction

There is a saying from a CSDN veteran that anyone who has never played with GT resources can hardly claim to have really played with FPGAs, and I firmly believe it. GT resources are an important selling point of Xilinx FPGAs and the foundation of their high-speed interfaces: PCIE, SATA, Ethernet MACs and so on all rely on GT resources for high-speed serialization and deserialization. Different Xilinx families carry different GT types: the low-end Artix-7 has GTP, Kintex-7 has GTX, Virtex-7 has GTH, and the higher-end UltraScale+ parts have GTY; the line rates keep climbing and the application scenarios keep moving up-market. The UltraScale GTH is found in Xilinx UltraScale-family devices such as Virtex UltraScale, Kintex UltraScale and related parts; within the Kintex UltraScale family used here, the serial transceivers are GTH. Compared with the 7-series GTH, the UltraScale GTH offers a higher line rate, supports more protocol types, consumes less power, and provides more bandwidth.

This article presents an UltraScale GTH + PCIE experiment on Xilinx's Kintex UltraScale xcku060-ffva1156-2-i FPGA: Aurora 8b/10b encoding/decoding plus PCIE video transmission. Two video sources are supported, matching whatever development board you have in hand. If the board has an HDMI input, a laptop provides the HDMI video and the ADV7611 chip decodes it into RGB for the FPGA; if the board has no HDMI input, or its HDMI decoder chip is not an ADV7611, a dynamic color bar generated inside the code simulates the camera video. The video source is selected with a `define macro at the top level of the code, with HDMI input as the default. After the FPGA captures the video, it first enters the data-packing module, which packs the video and adds frame-header/frame-tail control words built around the K character BC; the design then calls the official Xilinx UltraScale GTH IP core, configured for 8b/10b encoding/decoding at a 5 Gb/s line rate; the 8b/10b-encoded video is sent out through the on-board SFP optical port, received back through the SFP optical port, and 8b/10b-decoded by the UltraScale GTH; the decoded data goes to the data-alignment module for alignment, then to the unpacking module, which removes the frame header/tail and restores the video timing; my commonly used FDMA image-buffer architecture then writes the image into DDR4 for three-frame buffering; finally the official Xilinx XDMA reads the video from DDR4 and sends it to the PC over the PCIE bus, and the QT host computer running on the PC receives the images in interrupt mode and displays them.

The project provides a set of Vivado 2022.2 FPGA engineering source code, the source code of a Windows QT host computer, and a modified XDMA driver (with source code) adapted for interrupt mode;

This blog describes in detail the design of a video transmission experiment built on the UltraScale GTH resources of Xilinx's Kintex UltraScale xcku060-ffva1156-2-i FPGA. The engineering code has been compiled and verified on the board and can be ported directly; it is suitable for project development by undergraduate and graduate students as well as for further study by working engineers, and can be applied to high-speed interfaces and image processing in fields such as the medical and military industries;
A complete, board-verified set of engineering source code and technical support is provided;
Instructions for obtaining the engineering source code and technical support are at the end of the article; please be patient and read to the end;

Disclaimer

This project and its source code include parts written by myself as well as parts gathered from public channels on the Internet (including CSDN, the Xilinx website, the Altera website, and so on); if you feel offended by any of it, please send me a private message. On that basis, this project and its source code are limited to readers' or fans' personal study and research, and commercial use is prohibited; if legal problems arise from commercial use by readers or fans themselves, this blog and the blogger bear no responsibility, so please use it with caution.

2. Recommendation of relevant solutions

GT high-speed interface solutions I already have

My homepage has an FPGA GT high-speed interface column, containing video-transmission routines and PCIE-transmission routines for GT resources such as GTP, GTX, GTH and GTY. The GTP examples are built on Artix-7 FPGA development boards, GTX on Kintex-7 or ZYNQ boards, GTH on Kintex UltraScale (KU) or Virtex-7 boards, and GTY on Kintex UltraScale+ (KU+) boards. The column address is below:
Click to go directly

My existing PCIE solution

My homepage has a PCIE communication column that implements data interaction with a QT host computer based on the polling mode of XDMA. It contains both RIFFA-based and XDMA-based PCIE solutions, covering simple data interaction and speed testing as well as application-level image acquisition and transmission. The column address is below:
Click to go directly
In addition, my homepage has a PCIE interrupt-mode communication column, which implements data interaction with the QT host computer based on the interrupt mode of XDMA. The column address is below: Click to go directly

3. Detailed design plan

As summarized in the introduction, the design flow of this UltraScale GTH + PCIE experiment on the Kintex UltraScale xcku060-ffva1156-2-i FPGA is: video input (HDMI decoded by the ADV7611, or an internally generated dynamic color bar, selected by a `define macro with HDMI as the default) → video data packing (frame-header/frame-tail control words built around the K character BC) → the official Xilinx UltraScale GTH IP core configured for Aurora 8b/10b at a 5 Gb/s line rate → transmission through the on-board SFP optical port and reception back through the SFP optical port → 8b/10b decoding in the GTH → data alignment → data unpacking (remove the frame header/tail and restore the video timing) → FDMA three-frame buffering in DDR4 → the official Xilinx XDMA reads the video from DDR4 and sends it to the PC over the PCIE bus → the QT host computer receives the images in interrupt mode and displays them.

Design block diagram

The detailed design block diagram is shown below; the arrows represent the data flow direction, the text inside an arrow gives the data format at that point, and the numbers beside the arrows mark the steps of the data flow:
(Figure: overall design block diagram)

Video source selection

There are two video sources, corresponding to whether the development board in your hands has an HDMI input interface. One option is to let a laptop supply the HDMI video, which the ADV7611 chip decodes into RGB for the FPGA; if your board has no HDMI input interface, or its HDMI decoder chip is not an ADV7611, you can instead use the dynamic color bar generated inside the code to simulate a camera. The video source is selected with a `define macro at the top level of the code, with HDMI input as the default. The selection logic is:
When `define COLOR_TEST is commented out, the input video source is the HDMI input;
When `define COLOR_TEST is left uncommented, the input video source is the dynamic color bar;
A minimal sketch of this selection is given below.
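The snippet below is a minimal sketch of the macro-based selection, written from the description above; the macro name COLOR_TEST comes from the article, but the module name, signal names and the 16-bit pixel width are my own illustration, not the project's actual top-level code.

```verilog
// Minimal sketch of the `define-based video source selection.
`define COLOR_TEST   // comment this line out to take video from the HDMI/ADV7611 path

module video_source_select (
    // HDMI path, decoded by the ADV7611
    input  wire        hdmi_de,  hdmi_vs,  hdmi_hs,
    input  wire [15:0] hdmi_data,
    // internally generated dynamic color bar
    input  wire        bar_de,   bar_vs,   bar_hs,
    input  wire [15:0] bar_data,
    // selected source, fed to the packing module
    output wire        src_de,   src_vs,   src_hs,
    output wire [15:0] src_data
);
`ifdef COLOR_TEST
    assign {src_de, src_vs, src_hs, src_data} = {bar_de,  bar_vs,  bar_hs,  bar_data};
`else
    assign {src_de, src_vs, src_hs, src_data} = {hdmi_de, hdmi_vs, hdmi_hs, hdmi_data};
`endif
endmodule
```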

ADV7611 decoding chip configuration and acquisition

The ADV7611 decodes the input HDMI video, matching FPGA development boards that carry an on-board ADV7611 decoder. The ADV7611 must be configured over I2C before it can be used; both the configuration and the video acquisition are implemented as Verilog modules in the project, and the resolution used in the code is 1920x1080. A minimal sketch of the acquisition side is given below.
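As a hedged illustration (the real project module also contains the I2C configuration state machine and uses its own port names), capturing the ADV7611's parallel output essentially amounts to registering the pixel bus and syncs on the chip's LLC pixel clock:

```verilog
// Hedged sketch of the ADV7611 capture side; all names are illustrative.
module adv7611_capture (
    input  wire        llc_clk,      // pixel clock driven by the ADV7611
    input  wire        hs_in, vs_in, de_in,
    input  wire [23:0] p_data_in,    // decoded RGB from the ADV7611
    output reg         hs_out, vs_out, de_out,
    output reg  [23:0] rgb_out
);
    // one register stage gives downstream logic a clean, source-synchronous stream
    always @(posedge llc_clk) begin
        hs_out  <= hs_in;
        vs_out  <= vs_in;
        de_out  <= de_in;
        rgb_out <= p_data_in;
    end
endmodule
```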

Dynamic color bar

The dynamic color bar can be generated for different resolutions, and the border width of the video, the size of the moving block and its moving speed are all parameterizable; here the resolution is configured as 1920x1080. The dynamic color bar module and its top-level interface can be found in the project source tree.

Video data packing

Since the video is sent and received through the GTH using the Aurora 8b/10b protocol, the data must first be packed to match this 8b/10b framing; the video data packing module sits in the project source tree and works as follows:
First, the 16-bit video is written into a FIFO; when a full line has been buffered it is read out of the FIFO and sent to the GTH for transmission. Before that, each frame of video is tagged with numbered control words, here called commands: when packing, data is sent according to these fixed commands, and when unpacking, the receive side uses the same fixed commands to restore the frame-sync signal and the data-valid signal of the video. When the rising edge of the frame sync arrives, frame-start command 0 is sent; on its falling edge, frame-start command 1 is sent; during the blanking interval, idle words 0 and 1 are sent; when the data-valid signal arrives, each video line is numbered: a line-start command is sent first, then the current line number, and when the line has been sent a line-end command follows. After the whole frame has been sent, frame-end command 0 and then frame-end command 1 are sent, which completes one frame. This module is not easy to follow, so I added detailed Chinese comments in the code; to keep the Chinese comments from displaying as garbled text, please open the code with the Notepad++ editor.
The command values can be changed freely, but the lowest byte must remain BC (the 8b/10b K28.5 comma character); an illustrative set of command definitions is given below.
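The following constants are only an illustration of what such command words could look like, written from the description above; the actual 32-bit values used in the project appear only in the original post's screenshot, so treat these as placeholders that merely respect the rule that the lowest byte stays 0xBC.

```verilog
// Illustrative command-word definitions (these constants live inside the
// packing/unpacking modules; the real values in the project may differ).
// What matters is that the lowest byte of every control word stays 0xBC,
// i.e. the 8b/10b K28.5 comma, so the receiver can recognise and align to it.
localparam [31:0] CMD_FRAME_START_0 = 32'h01_00_00_BC;  // vsync rising edge
localparam [31:0] CMD_FRAME_START_1 = 32'h02_00_00_BC;  // vsync falling edge
localparam [31:0] CMD_IDLE_0        = 32'h03_00_00_BC;  // blanking idle word 0
localparam [31:0] CMD_IDLE_1        = 32'h04_00_00_BC;  // blanking idle word 1
localparam [31:0] CMD_LINE_START    = 32'h05_00_00_BC;  // followed by the line number
localparam [31:0] CMD_LINE_END      = 32'h06_00_00_BC;
localparam [31:0] CMD_FRAME_END_0   = 32'h07_00_00_BC;
localparam [31:0] CMD_FRAME_END_1   = 32'h08_00_00_BC;
```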

UltraScale GTH: the most detailed interpretation on the web

The most detailed introduction to the UltraScale GTH is without doubt Xilinx's official "ug576-ultrascale-gth-transceivers", so let us interpret it here:
I have put the PDF of "ug576-ultrascale-gth-transceivers" in the information package; how to obtain it is described at the end of the article;
The FPGA on my development board is a Kintex UltraScale xcku060-ffva1156-2-i. The UltraScale GTH transceiver runs at line rates from 500 Mb/s to 16.375 Gb/s, roughly 3 Gb/s higher than the 7-series GTH, and supports many serial transmission interfaces and protocols, such as PCIE 1.1/2.0, the 10G Ethernet XAUI interface, OC-48, Serial RapidIO, SATA (Serial ATA), and the serial digital interface (SDI);
The project calls the UltraScale GTH to encode and decode data for the Aurora 8b/10b protocol; the corresponding code can be found in the project source tree.
The basic configuration of the UltraScale GTH is: an on-board 125 MHz differential crystal oscillator as the reference clock, a line rate of 5 Gb/s, and the Aurora 8b/10b protocol preset.
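As a sanity check derived from the figures quoted in this article (not copied from the project files), the clock and bandwidth numbers for this configuration work out as follows:

```verilog
// Clock/bandwidth sanity check for this configuration:
//   line rate              : 5.0 Gb/s per lane
//   8b/10b overhead        : x 8/10           -> 4.0 Gb/s of payload
//   user data width        : 32 bit (4 bytes) -> usrclk2 = 5.0e9 / 10 / 4 = 125 MHz
//   1920x1080@60 at 16 bpp : ~2.0 Gb/s of video, so one 5 Gb/s lane has headroom
localparam integer LINE_RATE_MBPS  = 5000;
localparam integer USER_DATA_BYTES = 4;
localparam integer USRCLK2_MHZ     = LINE_RATE_MBPS / 10 / USER_DATA_BYTES;  // = 125
```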

UltraScale GTH basic structure

Xilinx groups its serial high-speed transceivers into Quads: four serial high-speed transceivers plus one COMMON (QPLL) block form a Quad, and each transceiver is called a Channel. A schematic of the UltraScale GTH transceivers in a Kintex UltraScale FPGA is given on page 19 of "ug576-ultrascale-gth-transceivers".
In the UltraScale/UltraScale+ architecture FPGAs, the GTH high-speed transceivers are organized in Quads. A Quad consists of four GTHE3/4_CHANNEL primitives and one GTHE3/4_COMMON primitive. Each GTHE3/4_COMMON contains two LC-tank PLLs (QPLL0 and QPLL1); GTHE3/4_COMMON only needs to be instantiated when the application uses a QPLL. Each GTHE3/4_CHANNEL consists of a channel PLL (CPLL), a transmitter and a receiver, and a reference clock can be connected directly to a GTHE3/4_CHANNEL primitive without instantiating GTHE3/4_COMMON.

The transmitter and receiver of the UltraScale GTH are independent of each other and each consists of a Physical Media Attachment (PMA) sublayer and a Physical Coding Sublayer (PCS). The PMA integrates the serializer/deserializer (PISO/SIPO), pre-emphasis, receive equalization, the clock generator and clock recovery; the PCS integrates the 8b/10b encoder/decoder, the elastic buffer, channel bonding and clock correction. The logic of a GTHE3/4_CHANNEL primitive is shown on page 20 of "ug576-ultrascale-gth-transceivers".
There is not much point in saying more here, because without having done a few sizeable projects the internals will not mean much anyway. For first-time users, or readers who want to get something working quickly, most of the effort should go into calling and using the IP core, and that is also what I will focus on below.

Reference clock selection and distribution

GTH transceivers in UltraScale devices offer several reference clock input options. The reference clock selection architecture supports QPLL0, QPLL1 and the CPLL. Architecturally, each Quad contains four GTHE3/4_CHANNEL primitives, one GTHE3/4_COMMON primitive, two dedicated external reference clock pin pairs and dedicated reference clock routing. If the high-performance QPLL is used, GTHE3/4_COMMON must be instantiated; a detailed view of the GTHE3/4_COMMON clock multiplexer structure is given on page 33 of "ug576-ultrascale-gth-transceivers". A Quad can choose from six reference clock pin pairs: two local pairs, GTREFCLK0 and GTREFCLK1; two pairs from the two Quads above, GTSOUTHREFCLK0 and GTSOUTHREFCLK1; and two pairs from the two Quads below, GTNORTHREFCLK0 and GTNORTHREFCLK1. A hedged sketch of bringing the external reference clock into the Quad is given below.
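For reference, the external reference clock enters the Quad through the dedicated IBUFDS_GTE3 buffer primitive; the sketch below shows the usual instantiation, but the net and pin names are my own illustration rather than the project's actual top-level code.

```verilog
// Sketch of bringing the 125 MHz differential reference clock into the Quad
// (placed in the top level). IBUFDS_GTE3 is the dedicated UltraScale GTH
// reference clock buffer; net and pin names here are illustrative.
wire gt_refclk;        // to the GTH wizard / QPLL-CPLL reference clock input
wire gt_refclk_div2;   // optional divided copy for fabric logic

IBUFDS_GTE3 #(
    .REFCLK_EN_TX_PATH (1'b0),
    .REFCLK_HROW_CK_SEL(2'b00),
    .REFCLK_ICNTL_RX   (2'b00)
) u_gt_refclk_ibuf (
    .I     (mgtrefclk_p),   // dedicated MGTREFCLK pin pair on the board
    .IB    (mgtrefclk_n),
    .CEB   (1'b0),
    .O     (gt_refclk),
    .ODIV2 (gt_refclk_div2)
);
```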

UltraScale GTH send and receive processing flow

On the transmit side, the user data is first 8B/10B encoded and then enters a transmit buffer (phase-adjust FIFO); this buffer mainly isolates the clock domains of the PMA and PCS sublayers and absorbs the rate and phase differences between them; finally the high-speed SerDes performs the parallel-to-serial conversion (PISO), and pre-emphasis and post-emphasis can be applied if necessary. It is worth mentioning that if the TXP and TXN differential pins were accidentally swapped during PCB design, that mistake can be compensated by the polarity control (Polarity). The receive path is the reverse of the transmit path and largely mirrors it, so I will not repeat it here; what deserves attention is the elastic buffer on the RX side, which provides clock correction and channel bonding. Each of these feature points could fill a paper or even a book, so for now it is enough to know the concepts and apply them in a concrete project. Once again: for first-time users, or readers who want to get something working quickly, most of the effort should go into calling and using the IP core.

UltraScale GTH transmit interface

Pages 104 to 179 of "ug576-ultrascale-gth-transceivers" give a detailed introduction to the transmit processing. Most of that content does not need to be studied in depth, because the manual is largely explaining Xilinx's own design and leaves few interfaces for the user to operate; with that in mind, we focus on the interfaces the user actually needs when instantiating the UltraScale GTH.
On the transmit side the user only needs to care about the clock and the data. The corresponding ports of the UltraScale GTH instantiation are in the file gth_aurora_example_wrapper.v, which is generated automatically by the tool when the IP example design is created.
In the code I have already re-wrapped these ports for you and brought them up to the module top level; the file is gth_aurora_example_top.v, which instantiates the official gth_aurora_example_wrapper.v. A hedged sketch of the user-side transmit connection is given below.
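The sketch below illustrates the transmit-side hookup under the assumption that the wrapper exposes wizard-style port names (gtwiz_userdata_tx_in for the 32-bit data and txctrl2_in for the per-byte "char is K" flag); the project's own wrapper may name these differently, and the send_idle / packed_video_word signals and the command value are illustrative only.

```verilog
// Hedged sketch of the user-side TX logic.
module gth_tx_user_sketch (
    input  wire        txusrclk2,          // 125 MHz at 5 Gb/s with 32-bit data
    input  wire        send_idle,          // no payload available this cycle
    input  wire [31:0] packed_video_word,  // payload from the packing FIFO
    output reg  [31:0] tx_data,            // -> gtwiz_userdata_tx_in
    output reg  [3:0]  tx_charisk          // -> txctrl2_in (1 = byte is a K character)
);
    localparam [31:0] CMD_IDLE_0 = 32'h03_00_00_BC;  // idle word, BC = K28.5

    always @(posedge txusrclk2) begin
        if (send_idle) begin
            tx_data    <= CMD_IDLE_0;
            tx_charisk <= 4'b0001;          // only the BC byte is sent as K28.5
        end else begin
            tx_data    <= packed_video_word;
            tx_charisk <= 4'b0000;          // plain data bytes
        end
    end
endmodule
```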

UltraScale GTH receive interface

Pages 181 to 314 of "ug576-ultrascale-gth-transceivers" give a detailed introduction to the receive processing. As on the transmit side, most of that content does not need to be studied in depth, because the manual is largely explaining Xilinx's own design and leaves few interfaces for the user to operate; with that in mind, we focus on the receive-side interfaces the user actually needs when instantiating the UltraScale GTH.
On the receive side, too, the user only needs to care about the clock and the data. The corresponding ports of the UltraScale GTH instantiation are likewise in gth_aurora_example_wrapper.v, generated automatically when the IP example design is created.
In the code these ports are also re-wrapped and brought up to the module top level in gth_aurora_example_top.v, which instantiates the official gth_aurora_example_wrapper.v. A hedged sketch of the user-side receive connection is given below.
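Again assuming wizard-style port names (gtwiz_userdata_rx_out, rxctrl0_out, rxusrclk2), the receive-side signals the user cares about look roughly like this; the flag produced here is what the next sections call rx_ctrl, and all names are illustrative.

```verilog
// Hedged sketch of the receive-side signals. After 8b/10b decoding, rxctrl0
// flags which received bytes were K characters.
module gth_rx_user_sketch (
    input  wire        rxusrclk2,
    input  wire [31:0] rx_data,     // <- gtwiz_userdata_rx_out
    input  wire [3:0]  rx_ctrl,     // <- low bits of rxctrl0_out ("char is K")
    output reg         comma_seen   // a K character is present somewhere in rx_data
);
    always @(posedge rxusrclk2)
        comma_seen <= |rx_ctrl;
endmodule
```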

UltraScale GTH IP core calling and usage

The basic configuration of the UltraScale GTH IP is the one given earlier: an on-board 125 MHz differential crystal oscillator as the reference clock, a line rate of 5 Gb/s, and the Aurora 8b/10b protocol preset.
For the full configuration please refer to the Vivado project. After configuring the IP you need to open its example design and copy the generated files into your own project; this step has already been done in my project. The example design is opened by right-clicking the IP in Vivado and selecting Open IP Example Design.

Data alignment

Since Aurora 8b/10b transmission and reception over GT resources inherently suffers from word misalignment, the received, decoded data must be re-aligned; the data alignment module sits in the project source tree.
The K-code control word format I defined is XX_XX_XX_BC, so a 4-bit rx_ctrl flag is used to indicate whether, and where, the received word contains a K-code COM (comma) character:
rx_ctrl = 4'b0000 means none of the 4 bytes is a COM code;
rx_ctrl = 4'b0001 means bits [7:0] of the 4-byte word are the COM code;
rx_ctrl = 4'b0010 means bits [15:8] of the 4-byte word are the COM code;
rx_ctrl = 4'b0100 means bits [23:16] of the 4-byte word are the COM code;
rx_ctrl = 4'b1000 means bits [31:24] of the 4-byte word are the COM code;
Based on this, the data is aligned whenever a K code is received: the incoming word is registered for one clock cycle and then recombined, with the appropriate byte shift, with the next incoming word. This is a basic FPGA operation, so I will not elaborate here; a hedged sketch is given below.
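The module below is a minimal sketch of that register-and-splice alignment, written from the rx_ctrl definition above; the module and signal names are illustrative and the project's real alignment module certainly contains more bookkeeping.

```verilog
// Hedged sketch of the byte alignment: the previous word is kept for one
// cycle and spliced with the new word according to where the K28.5 / 0xBC
// comma was seen, so downstream logic always sees BC in the lowest byte.
module rx_byte_align (
    input  wire        rxusrclk2,
    input  wire [31:0] rx_data,
    input  wire [3:0]  rx_ctrl,
    output reg  [31:0] rx_data_aligned
);
    reg [31:0] rx_data_d;
    reg [1:0]  shift_sel;   // latched byte position of the last comma

    always @(posedge rxusrclk2) begin
        rx_data_d <= rx_data;
        case (rx_ctrl)
            4'b0001: shift_sel <= 2'd0;   // already aligned
            4'b0010: shift_sel <= 2'd1;
            4'b0100: shift_sel <= 2'd2;
            4'b1000: shift_sel <= 2'd3;
            default: ;                    // no comma this word: keep the old shift
        endcase
    end

    always @(posedge rxusrclk2) begin
        case (shift_sel)
            2'd0: rx_data_aligned <= rx_data_d;
            2'd1: rx_data_aligned <= {rx_data[ 7:0], rx_data_d[31: 8]};
            2'd2: rx_data_aligned <= {rx_data[15:0], rx_data_d[31:16]};
            2'd3: rx_data_aligned <= {rx_data[23:0], rx_data_d[31:24]};
        endcase
    end
endmodule
```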

Video data unpacking

Data unpacking is the reverse of data packing; the unpacking module sits in the project source tree.
When unpacking, the design restores the video frame-sync signal and data-valid signal from the fixed commands; these signals are essential for the image buffering that follows. At this point the whole path into and out of the GTH has been covered, and I have drawn a block diagram of the entire flow in the code, in the file helai_GT_8b10b_video.v. A hedged sketch of the timing recovery is given below.
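The sketch below shows the idea of the timing recovery, reusing the illustrative command values from the packing sketch; the real module also checks the line number that follows the line-start command and handles the frame-end commands, so treat this purely as a reading aid.

```verilog
// Hedged sketch of regenerating the video timing from the received commands.
module video_unpack_sketch (
    input  wire        rxusrclk2,
    input  wire        cmd_word_flag,      // aligned word carries a BC control byte
    input  wire [31:0] rx_data_aligned,
    output reg         vs_out,             // recovered frame sync
    output reg         de_out,             // recovered data valid
    output reg  [31:0] pixel_word          // recovered pixel payload
);
    localparam [31:0] CMD_FRAME_START_0 = 32'h01_00_00_BC;
    localparam [31:0] CMD_FRAME_START_1 = 32'h02_00_00_BC;
    localparam [31:0] CMD_LINE_START    = 32'h05_00_00_BC;
    localparam [31:0] CMD_LINE_END      = 32'h06_00_00_BC;

    always @(posedge rxusrclk2) begin
        if (cmd_word_flag) begin
            case (rx_data_aligned)
                CMD_FRAME_START_0: vs_out <= 1'b1;   // vsync rising edge
                CMD_FRAME_START_1: vs_out <= 1'b0;   // vsync falling edge
                CMD_LINE_START:    de_out <= 1'b1;   // line payload follows
                CMD_LINE_END:      de_out <= 1'b0;
                default: ;                           // idle / frame-end words
            endcase
        end else if (de_out) begin
            pixel_word <= rx_data_aligned;           // pixels of the current line
        end
    end
endmodule
```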

Image cache

Long-time readers of my blog will know that my image-buffering routine is FDMA. Its job is to write the image into DDR for three-frame buffering and read it back out for display, in order to bridge the clock difference between input and output and to improve the output video quality. For FDMA itself, please refer to my earlier blog post, address: Click to go directly
Note that FDMA must be used together with a controller. The controller used in this design has some code changes compared with the earlier one and is not packaged as an IP, which makes it easier to modify parameters; see the code in the project source tree for details.

The use of XDMA and its interrupt mode

This design uses Xilinx's official XDMA solution to build a PCIE communication platform on the Xilinx FPGA and uses the interrupt mode of XDMA to communicate with the QT host computer; in other words, the QT host computer exchanges data with the FPGA through software interrupts. XDMA reads the video received over the SFP link from DDR4 and sends it to the host PC over the PCIE bus; the PC runs the QT host-computer software, which receives the image data sent over PCIE in interrupt mode and displays the images in real time.

The key to this design is the XDMA interrupt module we wrote, xdma_inter.v, which works together with the driver to handle interrupts. xdma_inter.v provides an AXI-LITE interface, and the host computer reads and writes its registers by accessing the XDMA user address space. The module registers the interrupt bit number presented on its user_irq_req_i input and forwards the request to the XDMA IP; when the host driver responds to the interrupt, it writes a register in xdma_inter.v from inside the interrupt handler to clear the interrupt that has been processed. The DMA interrupt module sits in the project source tree; a hedged sketch of the idea is given below.
For background, refer to my PCIE communication column, column address: Click to go directly
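To make the mechanism concrete, here is a much-simplified sketch of the idea (not the project's xdma_inter.v): a frame-done pulse raises an interrupt flag that drives the XDMA user interrupt request, and the driver clears the flag with an AXI-Lite write from inside its interrupt service routine. The register offset, reset polarity and all signal names are my own illustration.

```verilog
// Hedged, much-simplified sketch of a user interrupt register for XDMA.
module xdma_irq_sketch (
    input  wire        axil_clk,
    input  wire        axil_rst_n,
    input  wire        frame_done_pulse,   // e.g. one frame written into DDR4
    input  wire        axil_wr_en,         // decoded AXI-Lite write strobe
    input  wire [7:0]  axil_wr_addr,
    input  wire [31:0] axil_wr_data,
    output wire        usr_irq_req         // to the XDMA IP user interrupt input
);
    reg  irq_flag;
    wire host_clear = axil_wr_en && (axil_wr_addr == 8'h04) && axil_wr_data[0];

    always @(posedge axil_clk or negedge axil_rst_n) begin
        if (!axil_rst_n)             irq_flag <= 1'b0;
        else if (frame_done_pulse)   irq_flag <= 1'b1;   // raise the interrupt
        else if (host_clear)         irq_flag <= 1'b0;   // driver acknowledges it
    end

    assign usr_irq_req = irq_flag;
endmodule
```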

QT host computer and its source code

The QT host-computer development environment is built with VS2015 + Qt 5.12.10. The QT program calls the official XDMA API and uses interrupt mode to exchange data with the FPGA; the routine also implements read/write speed measurement. The QT host-computer software and its source code are provided with the project.

4. Detailed explanation of vivado project

Development board FPGA part: Xilinx Kintex UltraScale xcku060-ffva1156-2-i;
Development environment: Vivado2022.2;
Input: HDMI or dynamic color bar, resolution 1920x1080@60Hz;
Output: the 8B/10B loopback runs over the optical fiber of the SFP optical port; the FPGA board connects to the PC over the PCIE3.0 bus;
Application: FPGA high-end project: image acquisition + UltraScale GTH + PCIE, aurora 8b/10b codec, PCIE video transmission;
The project Block Design, the FPGA resource utilization and power estimate after synthesis and implementation, and the project code architecture can all be inspected in the Vivado project.

5. Project transplantation instructions

Vivado version inconsistency handling

1: If your vivado version is consistent with the vivado version of this project, open the project directly;
2: If your vivado version is lower than the project's, you need to open the project and then click File –> Save As; this approach is not guaranteed to work, however, and the safest way is to upgrade your vivado to the project's version or higher;
3: If your vivado version is higher than the project's, after opening the project you will find that the IPs are locked; in that case the IPs need to be upgraded, which is done from Reports –> Report IP Status and then upgrading the listed IPs.

FPGA model inconsistency handling

If your FPGA part differs from mine, you need to change the FPGA part in the Vivado project settings (Settings –> General –> Project device).
After changing the FPGA part you also need to upgrade the IPs; the upgrade procedure was described above;

Other things to note

1: Since the DDR on every board is not necessarily identical, the MIG IP must be configured according to your own schematic; you can even delete the MIG from my original project and re-add and reconfigure the IP;
2: Modify the pin constraints according to your own schematic; this is done in the xdc file;
3: When porting from a pure FPGA to a Zynq device, a Zynq processing system must be added to the project;

6. Board debugging and verification

Preparation

FPGA development board;
A laptop as HDMI source (if your board has no HDMI input interface, you can choose the dynamic color bar instead);
SFP optical port module and optical fiber;
Desktop computer that supports PCIE3.0;
Connect the optical fiber, power on the board, and download the bit;
The fiber on the board is connected so that the SFP transmit is looped back to the SFP receive.

Static presentation

HDMI input: with the UltraScale GTH running at the 5 Gb/s line rate, the video is received and displayed by the QT host computer;
Dynamic color bar input: with the UltraScale GTH running at the 5 Gb/s line rate, the color bar is likewise received and displayed by the QT host computer;

Dynamic presentation

A short video of the dynamic color-bar output was recorded as a demonstration:

V7-GTH-COLOR

7. Benefits: Obtaining engineering codes

The code is too large to send by email; it is delivered via a netdisk link.
To obtain it, send me a private message, or use the V (WeChat) business card at the end of the article.

Origin blog.csdn.net/qq_41667729/article/details/134794728