FPGA GTH Aurora 8b/10b codec PCIE video transmission, with 2 sets of project source code, QT host computer source code, and technical support

1. Introduction

As one CSDN expert put it, if you have never worked with GT resources, you can hardly claim to have worked with FPGAs. I firmly believe it.
GT resources are a major selling point of Xilinx FPGAs and the foundation of their high-speed interfaces. PCIE, SATA, MAC and similar interfaces all rely on GT resources for high-speed serialization and deserialization. Different Xilinx FPGA families carry different GT types: the low-end A7 has GTP, the K7 has GTX, the V7 has GTH, and the higher-end U+ series has GTY. Each step up brings higher line rates and more demanding application scenarios.

This article uses the GTH resources of a Xilinx Virtex7 FPGA for a video transmission experiment. There are two video sources, depending on whether the developer has a camera. The first uses a laptop to simulate HDMI video: the silicon9011 chip decodes the HDMI input into RGB for the FPGA. If you do not have a camera, or your development board has no HDMI input interface, you can use the dynamic color bar generated inside the code to simulate camera video. The video source is selected through a `define macro at the top level of the code, and HDMI input is the default. The design calls the GTH IP core and uses Verilog modules for encoding and decoding the video data and for data alignment; the two SFP optical ports on the development board carry the transmitted and received data. After the FPGA receives the high-speed data from the SFP, FDMA writes it into the DDR3 cache, XDMA reads it back from DDR3 and sends it to the host computer over the PCIE2.0 bus, and a QT host program receives and displays the image. This blog provides 2 sets of Vivado project source code; the difference between them is whether 1 SFP optical port or 2 SFP optical ports are used for sending and receiving.

This blog describes the design of FPGA GTH 8b/10b codec PCIE video transmission in detail. The project code has been synthesized, implemented, and debugged on the board, and can be ported directly into your own project. It is suitable for students working on coursework or graduation projects and for working engineers who want to study and improve their skills, and it can be applied to high-speed interfaces or image processing in medical, military, and other industries.
Complete, fully working project source code and technical support are provided.
How to obtain the project source code and technical support is described at the end of the article, so please read to the end.

Disclaimer

Part of this project and its source code was written by me, and part was obtained from public channels on the Internet (including CSDN, the Xilinx official website, the Altera official website, etc.). The project and its source code are for readers' personal study and research only; commercial use is prohibited. Any legal issues arising from commercial use by readers are their own responsibility and have nothing to do with this blog or the blogger, so please use it with caution.

2. My existing GT high-speed interface solutions

My homepage has an FPGA GT high-speed interface column, with video transmission and PCIE transmission routines for GT resources such as GTP, GTX, GTH, and GTY. The GTP routines are built on A7 series FPGA development boards, GTX on K7 or ZYNQ series boards, GTH on KU or V7 series boards, and GTY on KU+ series boards. The column address is:
click to go directly

3. The most detailed interpretation of GTH on the entire network

The most detailed introduction to GTH is without doubt Xilinx's official "ug476_7Series_Transceivers", so let's use it as the basis for this walkthrough.
I have put the PDF of "ug476_7Series_Transceivers" in the information package; see the end of the article for how to obtain it.
The FPGA on my development board is a Xilinx Virtex7 xc7vx690tffg1761-3. It has 8 GTH resources, 2 of which are connected to 2 SFP optical ports, and each channel supports line rates from 500 Mb/s to 13.1 Gb/s. The GTH transceiver supports various serial interfaces and protocols, such as PCIE 1.1/2.0, 10GbE XAUI, OC-48, Serial RapidIO, SATA (Serial ATA), and the serial digital interface (SDI).

GTH basic structure

Xilinx groups its high-speed serial transceivers into Quads. Four transceivers plus one COMMON block (QPLL) form a Quad, and each transceiver is called a Channel. The figure below is the schematic of the GTH transceivers in a Virtex7 FPGA, from page 24 of "ug476_7Series_Transceivers":
insert image description here
The internal logic block diagram of GTH is shown below. It consists of four GTHE2_CHANNEL transceiver primitives and one GTHE2_COMMON primitive. Each GTHE2_CHANNEL contains a transmit circuit (TX) and a receive circuit (RX). The GTHE2_CHANNEL clock can come from either the CPLL or the QPLL, as selected in the IP configuration interface; see page 25 of "ug476_7Series_Transceivers":
insert image description here
The logic of each GTHE2_CHANNEL is shown in the figure below, from page 26 of "ug476_7Series_Transceivers":
insert image description here
The transmitter and receiver of a GTHE2_CHANNEL are functionally independent, and each consists of two sublayers: the PMA (Physical Medium Attachment) sublayer and the PCS (Physical Coding Sublayer). The PMA sublayer contains the high-speed serializer/deserializer (SerDes), pre-/post-emphasis, receive equalization, the clock generator, and clock recovery circuits. The PCS sublayer contains the 8B/10B encoder/decoder, buffers, channel bonding, and clock correction circuits.
There is little point in going deeper here, because the internals only make sense after you have done several large projects. If you are a first-time user or want to get going quickly, put your energy into calling and using the IP core, which is also what I focus on below.

GTH send and receive processing flow

On the transmit side, user logic data is first 8B/10B encoded and then enters a transmit buffer (Phase Adjust FIFO). This buffer isolates the clock domains of the PMA and PCS sublayers and resolves the rate matching and phase differences between them. The high-speed SerDes then performs parallel-to-serial conversion (PISO), with optional pre-emphasis (TX Pre-emphasis) and post-emphasis. It is worth mentioning that if the TXP and TXN differential pins are accidentally swapped during PCB design, the error can be compensated for with polarity control (Polarity). The receive side mirrors the transmit side and has much in common with it, so I won't go into detail. The one thing to note is the RX elastic buffer, which provides clock correction and channel bonding. Each function here could fill a paper or even a book, so it is enough to know the concepts and apply them in specific projects. Again: first-time users, or those who want to get going quickly, should focus on calling and using the IP core.

GTH reference clock

The GTH module has two differential reference clock input pin pairs (MGTREFCLK0P/N and MGTREFCLK1P/N), and the user selects one as the reference clock source. On typical development boards, a dedicated oscillator (for example 148.5 MHz; my board uses 156.25 MHz) is connected to MGTREFCLK0 as the GTH reference clock. The differential reference clock is converted to a single-ended clock by an IBUFDS buffer and then feeds the QPLL in GTHE2_COMMON or the CPLL in GTHE2_CHANNEL, which generate the clock frequencies required by the TX and RX circuits. If TX and RX run at the same line rate, they can share the clock generated by the same PLL; if they run at different rates, they need clocks from different PLLs. The reference clocking in Xilinx's GT example design is already well done, and we do not need to modify it when calling it. The GTH reference clock structure is shown below, from page 31 of "ug476_7Series_Transceivers":
insert image description here

GTH sending interface

Pages 107 to 165 of "ug476_7Series_Transceivers" describe the transmit processing in detail. Most of it does not need deep study, because the manual mainly explains Xilinx's own design and leaves few interfaces for the user to operate. With that in mind, we focus on the transmit-side interfaces exposed to the user when GTH is instantiated:
insert image description here

Users only need to care about the clock and data of the sending interface. This part of the interface of the GTH instantiated module is as follows:
insert image description here
insert image description here
In the code, I have already re-bound these signals and brought them up to the top level of the module. The code is as follows:
insert image description here

GTH receiving interface

Pages 167 to 295 of "ug476_7Series_Transceivers" describe the receive processing in detail. Most of it can be ignored by users, because the manual mainly explains Xilinx's own design and leaves few interfaces for the user to operate. With that in mind, we focus on the receive-side interfaces exposed to the user when GTH is instantiated:
insert image description here
Users only need to care about the clock and data of the receive interface. This part of the interface of the GTH instantiated module is as follows:
insert image description here
insert image description here
In the code, I have already re-bound these signals and brought them up to the top level of the module. The code is as follows:
insert image description here

GTH IP core calling and usage

insert image description here
Unlike other bloggers' tutorials online, I prefer to use the shared logic option as shown below:
insert image description here
This choice has two benefits. One is that it makes DRP rate changes easier; the other is that it makes modifying the IP core easier: after you modify the IP core, the project can be compiled directly, and you no longer need to open the example project and copy a pile of files from it into your own project. Does playing with GTH really need to be that complicated?
insert image description here
Here is an explanation of the numbers in the above picture:
1: Line rate. Set it according to your project requirements; the GTH range is 0.5 to 13.1G. Since my project is video transmission, the rate only needs to be within the GTH range, and this routine uses 5G;
2: Reference clock. This depends on your schematic and can be 80M, 125M, 148.5M, 156.25M, etc.; my development board uses 156.25M;
4: GTH group binding. This is very important. There are two references for the binding: one is your development board schematic, and the other is the official "ug476_7Series_Transceivers". Xilinx divides the GTH resources into groups by BANK. Because GT resources are dedicated resources of Xilinx FPGAs, they occupy dedicated Banks, so their pins are dedicated as well. So how do the GTH groups correspond to pins? "ug476_7Series_Transceivers" describes it as follows; the red box marks the FPGA pins that correspond to my development board schematic:
insert image description here
My board schematic is as follows:
insert image description here
insert image description here
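As a quick sanity check on the 5G line rate from item 1, the arithmetic below is a rough sketch, not part of the project code. It assumes the 32-bit external data width and 16-bit pixels used elsewhere in this design, counts only active 1920x1080@60Hz pixels, and ignores the overhead of the packing commands; all function names are made up for illustration.

```python
LINE_RATE_BPS = 5_000_000_000   # 5G line rate chosen in the IP core
USER_WIDTH_BITS = 32            # external data width of the 8b/10b interface


def payload_rate_bps(line_rate_bps):
    """8b/10b carries 8 payload bits in every 10 line bits."""
    return line_rate_bps * 8 // 10


def user_clock_hz(line_rate_bps, user_width_bits):
    """Each user word becomes user_width * 10 / 8 bits on the line."""
    encoded_width = user_width_bits * 10 // 8   # 32 -> 40
    return line_rate_bps // encoded_width


def video_rate_bps(width, height, fps, bits_per_pixel):
    """Active-pixel bandwidth only (blanking is not counted)."""
    return width * height * fps * bits_per_pixel


payload = payload_rate_bps(LINE_RATE_BPS)
video = video_rate_bps(1920, 1080, 60, 16)
print("payload rate :", payload, "bit/s")
print("user clock   :", user_clock_hz(LINE_RATE_BPS, USER_WIDTH_BITS), "Hz")
print("video rate   :", video, "bit/s")
print("link usage   : %.1f %%" % (100 * video / payload))
```

Under these assumptions the video needs roughly half of the payload bandwidth of a single 5G lane, which is why one lane is enough here.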
Select the 8b/10b codec with an external data width of 32 bits, as follows:
insert image description here
Next is K-code detection:
insert image description here
K28.5 is selected here, the so-called COM code, whose hexadecimal value is bc. It has many uses: it can serve as an idle symbol or as a data alignment marker, and here it is used to mark data alignment. The 8b/10b protocol defines the K codes as follows:
insert image description here
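To see why K28.5 is written as bc, note that an 8b/10b control character Kx.y is named after its 8-bit value: x is the low 5 bits and y is the high 3 bits. The short sketch below decodes the name from the byte and lists the twelve control characters defined by standard 8b/10b coding; the function name is only illustrative.

```python
def k_code_name(byte):
    """Format an 8-bit control value as Kx.y (x = low 5 bits, y = high 3 bits)."""
    return f"K{byte & 0x1F}.{byte >> 5}"


# The twelve control characters of standard 8b/10b coding.
VALID_K_CODES = [0x1C, 0x3C, 0x5C, 0x7C, 0x9C, 0xBC, 0xDC, 0xFC,
                 0xF7, 0xFB, 0xFD, 0xFE]

for code in VALID_K_CODES:
    print(f"0x{code:02X} -> {k_code_name(code)}")
```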
Next is clock correction, which corresponds to the elastic buffer inside the GTH receiver:
insert image description here
This involves the concept of clock frequency offset, which matters especially when the transmitter and receiver clocks come from different sources. The frequency offset set here is 100ppm, and the sender is required to send a 4-byte sequence every 5000 data packets. Based on this 4-byte sequence and the position of the data in the buffer, the receiver's elastic buffer decides whether to delete or insert a byte of the 4-byte sequence, which keeps the data stable from sender to receiver and removes the effect of the clock frequency offset.
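The two numbers above fit together if you make two assumptions that the text does not state explicitly: the 100ppm tolerance applies to each clock independently (so the worst-case mismatch is 200ppm), and the period of 5000 is counted in bytes. Under those assumptions the elastic buffer drifts by at most one byte per 5000 bytes, which is exactly the spacing of the correction sequence. A rough sketch of that arithmetic, with illustrative function names:

```python
def symbols_per_slip(ppm_tx, ppm_rx):
    """Bytes sent before the two clocks drift apart by one byte (worst case)."""
    return 1_000_000 / (ppm_tx + ppm_rx)


def correction_capacity_ppm(seq_len_bytes, period_bytes):
    """Largest offset that removing or adding a whole sequence per period could absorb."""
    return seq_len_bytes * 1_000_000 / period_bytes


print("bytes per one-byte slip :", symbols_per_slip(100, 100))
print("capacity of 4 per 5000  :", correction_capacity_ppm(4, 5000), "ppm")
print("bandwidth overhead      : %.2f %%" % (100 * 4 / 5000))
```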

4. Design idea framework

This blog provides 2 sets of Vivado project source code. The difference between them is whether one SFP optical port or two SFP optical ports are used for transmit and receive. With 1 SFP optical port, an optical fiber connects the RX and TX of the same SFP; with 2 SFP optical ports, an optical fiber connects the RX of one SFP to the TX of the other. The design framework is as follows.
The block diagram using 2 SFP optical ports is as follows:
insert image description here
The block diagram of using 1 SFP optical port is as follows:
insert image description here

Video source selection

There are two video sources, depending on whether the developer has a camera on hand. If you have a camera, or your development board has an HDMI input interface, use HDMI as the video input source; here I use a laptop to simulate HDMI video, and the silicon9011 decoder chip decodes the HDMI. If you do not have a camera, or your development board has no HDMI input interface, you can use the dynamic color bar generated inside the code to simulate camera video. The dynamic color bar is a moving picture and can fully simulate video. HDMI input is the default video source. The video source is selected through a `define macro at the top level of the code, as follows:
insert image description here
The selection logic code is as follows:
insert image description here
The selection logic is as follows:
When define COLOR_TEST is commented out, the input source video is the dynamic color bar;
When define COLOR_TEST is not commented out, the input source video is the HDMI input;

Silicon9011 decoding chip configuration and acquisition

The silicon9011 decoder chip must be configured over i2c before use. For its configuration and use, please refer to my previous blog. Blog address: Click to go directly
Both the configuration and the acquisition of the silicon9011 decoder chip are implemented as Verilog modules. The code location is as follows:
insert image description here
1920x1080 resolution is configured in the code;

Dynamic color bar

The dynamic color bar can be configured for different resolutions, and the border width of the video, the size of the moving square, and its moving speed are all parameterized. Here I configure the resolution as 1920x1080. The code location of the dynamic color bar module, its top-level interface, and an instantiation example are as follows:
insert image description here
insert image description here
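For readers who want a feel for what such a generator produces, here is a small software model. It is only a sketch: the RGB565 colors, the default sizes, and the horizontal-only motion are my assumptions for illustration, not the behavior of the Verilog module in the project.

```python
# Hypothetical 16-bit RGB565 colors.
WHITE, SQUARE_COLOR = 0xFFFF, 0x8410
BARS = [0xFFFF, 0xFFE0, 0x07FF, 0x07E0, 0xF81F, 0xF800, 0x001F, 0x0000]


def color_bar_frame(frame_idx, width=64, height=36, border=2, square=8, speed=3):
    """Return one frame as a list of rows of 16-bit pixels."""
    travel = width - 2 * border - square          # horizontal range of the square
    x0 = border + (frame_idx * speed) % travel    # square moves and wraps around
    y0 = (height - square) // 2
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            on_border = (x < border or x >= width - border or
                         y < border or y >= height - border)
            in_square = x0 <= x < x0 + square and y0 <= y < y0 + square
            if on_border:
                row.append(WHITE)
            elif in_square:
                row.append(SQUARE_COLOR)
            else:
                row.append(BARS[x * len(BARS) // width])   # eight vertical bars
        frame.append(row)
    return frame


frame0 = color_bar_frame(0)
frame1 = color_bar_frame(1)
print(len(frame0), "rows x", len(frame0[0]), "pixels")
```

Calling it with width=1920 and height=1080 gives the resolution used in this design; the small defaults just keep the example fast.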

Video data packing

Since the video is sent and received in GTH through the Aurora 8b/10b protocol, the data must be packed to fit the Aurora 8b/10b protocol standard. The code location of the video data packing module is as follows:
insert image description here
First, we store the 16-bit video in a FIFO; when a full line has been stored, it is read from the FIFO and sent to the GTH for transmission. Before that, each frame of video has to be tagged with numbers, also called instructions: when packing, GTH sends data according to fixed instructions, and when unpacking, GTH restores the video's field synchronization signal and video valid signal according to the same fixed instructions. When the rising edge of a frame's field synchronization signal arrives, frame start command 0 is sent; when the falling edge of the field synchronization signal arrives, frame start command 1 is sent. During the video blanking period, invalid data 0 and invalid data 1 are sent. When the video valid signal arrives, each line of video is numbered: first a line start command is sent, then the current line number, and after the line of video has been sent, a line end command is sent. After one frame of video has been sent, frame end command 0 is sent first, followed by frame end command 1; at this point, one frame of video is complete. This module is not easy to understand, so I have added detailed Chinese comments in the code. Note that to keep the Chinese comments from displaying as garbled text, please open the code with the notepad++ editor. The instructions are defined as follows:
insert image description here
The instructions can be changed arbitrarily, but the lowest byte must be bc;
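The ordering of the instructions is easier to follow as a software model. The sketch below is not the project's Verilog: the command values are hypothetical (only the low byte bc comes from the text), two 16-bit pixels per 32-bit word is my assumption, and the invalid data words sent during blanking are left out for brevity.

```python
# Hypothetical command words; only the low byte 0xBC is fixed by the text.
FRAME_START_0 = 0x000001BC
FRAME_START_1 = 0x000002BC
LINE_START = 0x000003BC
LINE_END = 0x000004BC
FRAME_END_0 = 0x000005BC
FRAME_END_1 = 0x000006BC

K = 0b0001   # control flag: byte [7:0] of the word is a K code
D = 0b0000   # control flag: plain data word


def pack_frame(lines):
    """Turn a frame (list of lines of 16-bit pixels) into (word, ctrl) pairs."""
    stream = [(FRAME_START_0, K), (FRAME_START_1, K)]
    for line_no, pixels in enumerate(lines):
        stream.append((LINE_START, K))
        stream.append((line_no, D))                 # current line number
        for i in range(0, len(pixels), 2):          # assumes an even pixel count
            stream.append(((pixels[i + 1] << 16) | pixels[i], D))
        stream.append((LINE_END, K))
    stream += [(FRAME_END_0, K), (FRAME_END_1, K)]
    return stream


stream = pack_frame([[1, 2, 3, 4], [5, 6, 7, 8]])
for word, ctrl in stream:
    print(f"{word:08X} ctrl={ctrl:04b}")
```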

GTH aurora 8b/10b

This part calls GTH to perform the Aurora 8b/10b encoding and decoding. GTH has already been covered in detail above, so I won't repeat it here. The code location is as follows:
insert image description here

Data alignment

Because Aurora 8b/10b transmission over GT resources is inherently subject to data misalignment, the received, decoded data has to be realigned. The code location of the data alignment module is as follows:
insert image description here
The K-code control character format I defined is XX_XX_XX_BC, so an rx_ctrl signal indicates whether the data contains the K-code COM symbol:
rx_ctrl = 4'b0000 means the 4-byte data contains no COM code;
rx_ctrl = 4'b0001 means [7:0] of the 4-byte data is the COM code;
rx_ctrl = 4'b0010 means [15:8] of the 4-byte data is the COM code;
rx_ctrl = 4'b0100 means [23:16] of the 4-byte data is the COM code;
rx_ctrl = 4'b1000 means [31:24] of the 4-byte data is the COM code;
Based on this, when a K code is received, the data is aligned: the data is delayed by one clock cycle and combined with the newly arriving data. This is a basic FPGA operation, so I won't go into detail here;
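The "delay by one cycle and combine" step can be modeled in a few lines. This is a behavioral sketch under my own assumptions rather than the project's module: the byte offset is re-latched every time a COM is seen, and misalign is only a test helper that imitates a link delivering words shifted by some number of bytes.

```python
MASK32 = 0xFFFFFFFF

# One-hot rx_ctrl value -> byte offset of the COM character inside the word.
CTRL_TO_SHIFT = {0b0001: 0, 0b0010: 1, 0b0100: 2, 0b1000: 3}


def align_stream(words, ctrls):
    """Realign 32-bit words so that the COM byte lands in bits [7:0] again."""
    shift = 0   # current byte offset, updated whenever a COM is seen
    prev = 0    # previous received word (the one-cycle delay)
    aligned = []
    for word, ctrl in zip(words, ctrls):
        # Low bytes come from the delayed word, high bytes from the new word.
        merged = ((prev >> (8 * shift)) | (word << (32 - 8 * shift))) & MASK32
        aligned.append(merged)
        if ctrl in CTRL_TO_SHIFT:
            shift = CTRL_TO_SHIFT[ctrl]
        prev = word
    return aligned


def misalign(words, k_flags, shift):
    """Test helper: deliver the word stream `shift` bytes late."""
    data = b"".join(w.to_bytes(4, "little") for w in words)
    data = bytes(shift) + data + bytes(4 - shift)      # pad to whole words
    rx = [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]
    ctrl = [(1 << shift) if i < len(k_flags) and k_flags[i] else 0
            for i in range(len(rx))]
    return rx, ctrl


sent = [0x000000BC, 0x11223344, 0x55667788, 0x99AABBCC]
rx, ctrl = misalign(sent, [True, False, False, False], 2)
print([f"{w:08X}" for w in rx])
print([f"{w:08X}" for w in align_stream(rx, ctrl)])
```

The first output word is produced before any COM has been seen and carries no useful data; from the second word on, the stream is back in the XX_XX_XX_BC layout.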

Video data unpacking

Data unpacking is the reverse of data packing. The code location is as follows:
insert image description here
When unpacking, GTH restores the video's field synchronization signal and video valid signal according to the fixed instructions; these signals are essential for the subsequent image cache.
At this point, the data path into and out of GTH has been fully covered. I have described the block diagram of the whole process in the code, as follows:
insert image description here
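As a rough illustration of the unpacking side, the model below walks a stream of (word, rx_ctrl) pairs and rebuilds the video lines. It is a sketch with hypothetical command values (only the low byte bc is taken from the text) and it assumes two 16-bit pixels per 32-bit word; the in_frame and in_line flags play the role of the restored field synchronization and video valid signals.

```python
# Hypothetical command words; only the low byte 0xBC is fixed by the text.
FRAME_START_0 = 0x000001BC
FRAME_START_1 = 0x000002BC
LINE_START = 0x000003BC
LINE_END = 0x000004BC
FRAME_END_0 = 0x000005BC
FRAME_END_1 = 0x000006BC


def unpack_frame(stream):
    """Rebuild {line_number: [pixels]} from aligned (word, rx_ctrl) pairs."""
    lines = {}
    in_frame = False    # stands in for the field synchronization signal
    in_line = False     # stands in for the video valid signal
    line_no = None
    for word, ctrl in stream:
        if ctrl & 0b0001:                      # command word (K code in [7:0])
            if word == FRAME_START_1:
                in_frame = True
            elif word == FRAME_END_0:
                in_frame = False
            elif word == LINE_START and in_frame:
                in_line, line_no = True, None
            elif word == LINE_END:
                in_line = False
        elif in_line:
            if line_no is None:                # first data word is the line number
                line_no = word
                lines[line_no] = []
            else:                              # two 16-bit pixels per word
                lines[line_no] += [word & 0xFFFF, word >> 16]
    return lines


rx_stream = [
    (FRAME_START_0, 1), (FRAME_START_1, 1),
    (LINE_START, 1), (0, 0), (0x00020001, 0), (LINE_END, 1),
    (LINE_START, 1), (1, 0), (0x00040003, 0), (LINE_END, 1),
    (FRAME_END_0, 1), (FRAME_END_1, 1),
]
print(unpack_frame(rx_stream))
```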

Image cache

Longtime readers of my blog will know that my usual approach to image caching is FDMA. It writes the image into DDR with 3-frame buffering and then reads it out for display; the purpose is to absorb the clock difference between input and output and improve the output video quality. For details on FDMA, please refer to my previous blog, blog address: Click to go directly
Note that the DDR3 on my development board is not a soldered-down DDR3 chip but a DDR3 memory module with a SODIMM interface, and the MIG is configured differently in that case. I have previously written a blog on how to configure a SODIMM DDR3 memory module in the FPGA MIG; please refer to it, blog address: click to go directly
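To make the idea of 3-frame buffering concrete, here is a minimal address bookkeeping sketch. It is not the FDMA controller's actual interface: the class, its method names, the base address, and the 16-bit pixel size are assumptions for illustration. The point is simply that the writer rotates through three frame slots while the reader always takes the most recently completed one.

```python
FRAME_BYTES = 1920 * 1080 * 2    # one 1920x1080 frame of 16-bit pixels


class FrameRing:
    """Hypothetical bookkeeping for a ring of frame buffers in DDR."""

    def __init__(self, base_addr=0, num_frames=3):
        self.base_addr = base_addr
        self.num_frames = num_frames
        self.write_idx = 0       # slot currently being written
        self.read_idx = None     # last completed slot, None until one exists

    def write_addr(self):
        return self.base_addr + self.write_idx * FRAME_BYTES

    def frame_done(self):
        """Call when a full frame has been written."""
        self.read_idx = self.write_idx
        self.write_idx = (self.write_idx + 1) % self.num_frames

    def read_addr(self):
        if self.read_idx is None:
            return None
        return self.base_addr + self.read_idx * FRAME_BYTES


ring = FrameRing()
print("bytes per frame :", FRAME_BYTES)
print("bytes for 3     :", 3 * FRAME_BYTES)
ring.frame_done()
print("read from", ring.read_addr(), "while writing to", ring.write_addr())
```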

The use of XDMA and its interrupt mode

This design uses Xilinx's official XDMA solution to build a PCIE communication platform on a Xilinx FPGA, and uses XDMA's interrupt mode to communicate with the QT host computer; that is, the QT host computer exchanges data with the FPGA through software interrupts. XDMA reads the video received from the SFP out of DDR3 and sends it to the host computer over the PCIE bus. The host computer runs the QT host software, which receives the image data sent over PCIE in interrupt mode and displays the image in real time;

The key to this design is an XDMA interrupt module that we wrote ourselves, which works with the driver to handle interrupts. xdma_inter.v provides an AXI-LITE interface, and the host computer reads and writes the registers of xdma_inter.v by accessing the user space address. The module latches the interrupt bits that arrive on user_irq_req_i and outputs them to the XDMA IP. When the host driver responds to an interrupt, it writes the xdma_inter.v register inside the interrupt handler to clear the interrupt that has been handled.
insert image description here
The code location of the DMA interrupt module is as follows:
insert image description here
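The latch-then-clear handshake described above can be modeled in software as follows. This is a behavioral sketch only: the class name, the 4-bit width, and the write-1-to-clear register semantics are my assumptions, and the real register map of xdma_inter.v should be taken from the source code.

```python
class IrqLatch:
    """Hypothetical model: latch request bits until the host clears them."""

    def __init__(self, width=4):
        self.mask = (1 << width) - 1
        self.pending = 0

    def request(self, user_irq_req):
        """Pulse on user_irq_req_i; returns the level driven to the XDMA IP."""
        self.pending |= user_irq_req & self.mask
        return self.pending

    def host_clear(self, bits):
        """Host register write from the interrupt handler (write 1 to clear)."""
        self.pending &= ~bits & self.mask
        return self.pending


latch = IrqLatch()
latch.request(0b0001)                    # e.g. "a frame is ready"
print(f"{latch.request(0):04b}")         # still pending after the pulse ends
print(f"{latch.host_clear(0b0001):04b}") # cleared once the host has handled it
```

Latching matters because the request from user logic may be a short pulse, while the interrupt has to stay asserted until the driver has actually serviced it.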

QT host computer and its source code

The QT host computer program was developed with VS2015 + Qt 5.12.10. The QT program calls the official XDMA API to exchange data with the FPGA in interrupt mode. This routine also implements read/write speed measurement. The QT host computer software and its source code are provided at the following path:
insert image description here
A screenshot of the QT source code is as follows:
insert image description here

5. Vivado project 1: 2-channel SFP transmission

Development board FPGA model: Xilinx–Virtex7–xc7vx690tffg1761-3;
Development environment: Vivado2019.1;
Input: HDMI or dynamic color bar, resolution 1920x1080@60Hz;
Output: PCIE2.0 X8;
Application: 2-channel SFP GTH 8b/10b codec PCIE video transmission;
The project Block Design is as follows:
insert image description here
The project code structure is as follows:
insert image description here
FPGA resource consumption and power estimates after synthesis and implementation are as follows:
insert image description here
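The PCIE2.0 X8 output listed above leaves a lot of headroom for this video stream. The estimate below is a rough sketch: it uses the standard PCIe 2.0 figures (5 GT/s per lane with 8b/10b encoding), ignores packet and protocol overhead, and assumes the video is transferred as 16-bit pixels.

```python
def pcie_bytes_per_s(gt_per_s, lanes):
    """Raw link bandwidth after 8b/10b encoding, before protocol overhead."""
    return gt_per_s * lanes * 8 // 10 // 8


def video_bytes_per_s(width, height, fps, bytes_per_pixel):
    return width * height * fps * bytes_per_pixel


link = pcie_bytes_per_s(5_000_000_000, 8)        # PCIE2.0 X8
video = video_bytes_per_s(1920, 1080, 60, 2)     # 1920x1080@60Hz, 16-bit pixels
print("link  :", link, "B/s")
print("video :", video, "B/s")
print("usage : %.1f %%" % (100 * video / link))
```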

6. Vivado project 2: 1-channel SFP transmission

Development board FPGA model: Xilinx–Virtex7–xc7vx690tffg1761-3;
Development environment: Vivado2019.1;
Input: HDMI or dynamic color bar, resolution 1920x1080@60Hz;
Output: PCIE2.0 X8;
Application: 1-channel SFP GTH 8b/10b codec PCIE video transmission;
The project Block Design is as follows:
insert image description here
The project code structure is as follows:
insert image description here
FPGA resource consumption and power estimates after synthesis and implementation are as follows:
insert image description here

7. Project porting instructions

Handling Vivado version mismatches

1: If your Vivado version matches the Vivado version of this project, simply open the project;
2: If your Vivado version is lower than this project's, open the project and click File -> Save As. This method is not entirely safe, however; the safest approach is to upgrade your Vivado to this project's version or higher;
insert image description here
3: If your Vivado version is higher than this project's, the solution is as follows:
insert image description here
After opening the project, you will find that the IPs are locked, as follows:
insert image description here
The IPs then need to be upgraded, as follows:
insert image description here
insert image description here

Handling FPGA model mismatches

If your FPGA model is inconsistent with mine, you need to change the FPGA model. The operation is as follows:
insert image description here
insert image description here
insert image description here
After changing the FPGA model, you also need to upgrade the IP. The method of upgrading the IP has been described previously;

Other things to note

1: Since the DDR on each board is not necessarily the same, the MIG IP must be configured according to your own schematic. You can even delete the MIG from my original project, re-add the IP, and reconfigure it;
2: Modify the pin constraints according to your own schematic; just edit them in the xdc file;
3: When porting from a pure FPGA to Zynq, you need to add the Zynq processing system IP to the project;

8. Board debugging and verification

Fiber optic connection

For project 2 (1-channel SFP transmission), the optical fiber connection is as follows:
insert image description here
Output with HDMI input:
When GTH runs at a 5G line rate, the output is as follows:
insert image description here
Output with the dynamic color bar:
When GTH runs at a 5G line rate, the output is as follows:
insert image description here

Dynamic presentation

Dynamic color bar output effect demonstration video:

V7-GTH-COLOR

9. Bonus: obtaining the project code

The code is too large to send by email, so it is shared through a network disk link.
How to obtain the materials: send a private message, or use the V business card at the end of the article.
The network disk contents are as follows:
insert image description here
insert image description here


Origin blog.csdn.net/qq_41667729/article/details/132700767