The most detailed guide on the web to FPGA GTP: board-to-board OV5640 camera video transmission over the aurora 8b/10b protocol, with 2 sets of project source code (4 projects) and technical support

1. Introduction

"If you have never worked with GT resources, you would be embarrassed to say you have worked with FPGAs." A CSDN veteran said this, and I firmly believe it.
GT resources are a major selling point of Xilinx FPGAs and the foundation of their high-speed interfaces. Whether it is PCIe, SATA, or a MAC, GT resources are needed to serialize and deserialize the high-speed data. Different Xilinx FPGA families carry different GT types: the low-end A7 has GTP, the K7 has GTX, the V7 has GTH, and the higher-end U+ series has GTY. Their speeds keep rising, and their application scenarios keep moving upmarket.

This article uses the GTP resources of a Xilinx Artix7 FPGA for a board-to-board video transmission experiment. The video source is an inexpensive OV5640 camera module. The design calls the GTP IP core, implements the video data packing and unpacking modules and the data alignment module in Verilog, and sends and receives the data through the two SFP optical ports on the development board hardware. This blog provides two sets of Vivado project source code; the two sets differ in which of the two FPGA development boards sends and which receives. This blog describes the FPGA GTP video transmission design in detail. The project code synthesizes, builds, and can be debugged on the board, and the projects can be ported directly. It suits student and graduate project development as well as working engineers who want to learn and improve, and it can be applied to high-speed interfaces or image processing in the medical, military, and other industries;
Complete, working project source code and technical support are provided;
How to obtain the project source code and technical support is explained at the end of the article, so please read patiently to the end;

Disclaimer

Part of this project and its source code was written by me, and part was obtained from public channels on the Internet (including CSDN, the Xilinx official website, the Altera official website, etc.). The project and its source code are for the personal study and research of readers and followers only; commercial use is prohibited. Any legal issues arising from commercial use by readers or followers are their own responsibility and have nothing to do with this blog or the blogger, so please use the material with care.

2. My GT high-speed interface solutions

My homepage has an FPGA GT high-speed interface column containing video transmission and PCIe transmission examples for GTP, GTX, GTH, and GTY resources. The GTP examples are built on A7-series FPGA development boards, the GTX examples on K7 or ZYNQ-series boards, the GTH examples on KU or V7-series boards, and the GTY examples on KU+-series boards; the column address is as follows:
Click to go directly

3. The most detailed GTP walkthrough on the web

The most detailed introduction to GTP is surely Xilinx's official "ug482_7Series_GTP_Transceivers", so we use it as the basis for this walkthrough;
I have put the PDF of "ug482_7Series_GTP_Transceivers" in the material package; how to obtain it is explained at the end of the article;
The FPGA on my development board is a Xilinx Artix7 xc7a35tfgg484-2. It has 4 GTP channels, and each channel sends and receives at between 500 Mb/s and 6.6 Gb/s. GTP transceivers support various serial interfaces and protocols, such as PCIe 1.1/2.0, the 10-Gigabit Ethernet XAUI interface, OC-48, Serial RapidIO, SATA (Serial ATA), and the serial digital interface (SDI);

GTP basic structure

Xilinx groups its high-speed serial transceivers into Quads. Four transceivers plus one COMMON block (which holds the shared PLLs) form a Quad, and each transceiver is called a Channel. The figure below shows the four GTP transceivers in an Artix-7 FPGA; see page 13 of "ug482_7Series_GTP_Transceivers";
[Figure: GTP transceivers in an Artix-7 FPGA]
The internal logic block diagram of GTP is shown below. It consists of four GTPE2_CHANNEL primitives and one GTPE2_COMMON primitive, and each GTPE2_CHANNEL contains a transmit circuit (TX) and a receive circuit (RX); see page 14 of "ug482_7Series_GTP_Transceivers";
[Figure: GTP Quad block diagram]
The logic of each GTPE2_CHANNEL is shown in the following figure; see page 15 of "ug482_7Series_GTP_Transceivers";
[Figure: GTPE2_CHANNEL logic]
The transmitter and receiver of a GTPE2_CHANNEL operate independently, and each consists of two sublayers: the PMA (Physical Medium Attachment) and the PCS (Physical Coding Sublayer). The PMA sublayer contains the high-speed serializer/deserializer (SerDes), pre-/post-emphasis, receive equalization, clock generation, and clock recovery circuits. The PCS sublayer contains the 8B/10B encoder/decoder, buffers, channel bonding, and clock correction circuits.
There is little point in going deeper here, because without a few large projects behind you the internals will not mean much. First-time users, or those who want to get going quickly, should put their energy into calling and using the IP core, which is also what I focus on below;

GTP send and receive processing flow

On the transmit side, user logic data is 8B/10B encoded and then enters a transmit buffer (Phase Adjust FIFO). This buffer isolates the clock domains of the PMA and PCS sublayers and resolves the rate-matching and phase differences between them. The data is then converted from parallel to serial (PISO) by the high-speed SerDes, with pre-emphasis (TX Pre-emphasis) and post-emphasis applied if needed. It is worth mentioning that if the TXP and TXN differential pins were accidentally swapped during PCB design, polarity control (Polarity) can make up for the mistake. The receive side mirrors the transmit side and has much in common with it, so I won't go into detail; note only that the elastic buffer on the RX side provides the clock correction and channel bonding functions. Each of these function points could fill a paper or even a book, so here you only need the concepts and can dig deeper in a specific project. Again: first-time users, or those who want to get going quickly, should put their energy into calling and using the IP core.

Reference clock for GTP

The GTP module has two differential reference clock input pairs (MGTREFCLK0P/N and MGTREFCLK1P/N), and the user selects which one serves as the reference clock source of the GTP module. On typical A7-series development boards, a 125 MHz reference clock is connected to MGTREFCLK0/1. The differential reference clock is converted to a single-ended clock by the IBUFDS module and fed to PLL0 and PLL1 of GTPE2_COMMON, which generate the clock frequencies needed by the TX and RX circuits. If TX and RX run at the same line rate, they can share the clock generated by one PLL; if they run at different rates, they need clocks generated by different PLLs. The GT example design provided by Xilinx already handles the reference clock well, and we don't need to modify it when calling it. The reference clock structure of GTP is shown below; see page 21 of "ug482_7Series_GTP_Transceivers";
[Figure: GTP reference clock structure]

GTP send interface

Pages 75 to 123 of "ug482_7Series_GTP_Transceivers" describe the transmit path in detail. Users can skip most of it, because the manual mainly explains Xilinx's own design and leaves few interfaces for the user to operate. With that in mind, we focus on the transmit-side interfaces that the user needs when instantiating GTP;
[Figure: GTP transmit path]
Users only need to care about the clock and data of the transmit interface. The corresponding ports of the GTP instantiation module are as follows:
[Figure: transmit interface ports]
[Figure: transmit interface ports, continued]
In the code, I have already re-bound these ports and brought them up to the top level of the module; the code is as follows:
[Figure: top-level transmit port code]

GTP receiving interface

Pages 125 to 213 of "ug482_7Series_GTP_Transceivers" describe the receive path in detail. Users can skip most of it, because the manual mainly explains Xilinx's own design and leaves few interfaces for the user to operate. With that in mind, we focus on the receive-side interfaces that the user needs when instantiating GTP;
[Figure: GTP receive path]
Users only need to care about the clock and data of the receive interface. The corresponding ports of the GTP instantiation module are as follows:
[Figure: receive interface ports]
[Figure: receive interface ports, continued]
In the code, I have already re-bound these ports and brought them up to the top level of the module; the code is as follows:
[Figure: top-level receive port code]

GTP IP core call and use

[Figure: GTP IP core]
Unlike the tutorials of other bloggers, I prefer to use the shared logic option shown in the figure below:
[Figure: shared logic option]
This choice has two advantages: it makes DRP rate changes easier, and it makes modifying the IP core easier. After modifying the IP core you simply rebuild; there is no need to open the example project and copy a pile of files into your own project. Does playing with GTP really need to be that complicated?
[Figure: IP core configuration page with numbered labels]
The labels in the picture above are explained here:
1: Line rate. Set it according to your project; the GTP range is 0.5 to 6.25G. My project is video transmission, so any rate within the GTP range works; for generality, I instantiated GTP in the Vivado project at rates of 1G, 2G, 4G, and 5G;
2: Reference clock. This depends on your schematic; it can be 80M, 125M, 148.5M, 156.25M, and so on. My development board uses 125M;
4: GTP group binding. This is very important. There are two references for the binding: one is your development board schematic, the other is the official "ug482_7Series_GTP_Transceivers". The official GTP resources are divided into 4 groups named X0Y0, X0Y1, X0Y2, and X0Y3. Because GT resources are dedicated resources of Xilinx FPGAs and occupy dedicated Banks, their pins are dedicated too. So how do these GTP groups map to pins? "ug482_7Series_GTP_Transceivers" describes it as follows; the red box marks the FPGA pins that match my development board's schematic;
[Figure: GTP group to pin mapping from ug482]
The schematic of my board is as follows:
[Figure: board schematic]
[Figure: board schematic, continued]
Select 8b/10b encoding and decoding with an external data width of 32 bits, as follows:
[Figure: 8b/10b and data width settings]
Next is K-code detection:
[Figure: K-code detection settings]
Choose K28.5 here, the so-called COM code, whose hexadecimal value is BC. It serves many purposes: it can represent idle and out-of-order symbols as well as data misalignment, and here it is used to mark data misalignment. The 8b/10b protocol defines the K codes as follows:
[Figure: 8b/10b K-code table]
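The BC value follows directly from the Kx.y naming of 8b/10b control symbols: x is the low 5 bits of the byte and y is the high 3 bits. Here is a minimal Python sketch of that arithmetic; the function and constant names are mine, for illustration only, and are not taken from the project code:

```python
# The 12 control symbols defined by 8b/10b, as (x, y) pairs of Kx.y.
VALID_K_CODES = {(28, y) for y in range(8)} | {(23, 7), (27, 7), (29, 7), (30, 7)}


def k_code_byte(x: int, y: int) -> int:
    """Return the 8-bit value of control symbol Kx.y (x = bits 4..0, y = bits 7..5)."""
    if (x, y) not in VALID_K_CODES:
        raise ValueError(f"K{x}.{y} is not a valid 8b/10b control symbol")
    return (y << 5) | x


COM = k_code_byte(28, 5)   # K28.5, the comma selected in the IP core
print(hex(COM))            # 0xbc
```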
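Next is clock correction, which corresponds to the elastic buffer in the receive part of GTP;
[Figure: clock correction settings]
There is a concept of clock frequency offset here, which matters especially when the clocks of the sender and receiver do not come from the same source. The frequency offset set here is 100 ppm, and the sender is required to send a 4-byte sequence every 5000 data packets. Based on this 4-byte sequence and the position of the data in the buffer, the receiver's elastic buffer decides whether to delete or insert one byte of the 4-byte sequence. The goal is to keep the data stable from sender to receiver and eliminate the effect of the clock frequency offset;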
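One way to see where a figure like 5000 can come from (this is my own back-of-the-envelope reading, not something the manual is quoted as saying here): if each side's clock may be off by up to 100 ppm, the worst-case difference between them is 200 ppm, so the receiver gains or loses about one symbol every 5000 symbols. A small Python sketch, with a hypothetical function name:

```python
def symbols_per_slip(tx_ppm: int, rx_ppm: int) -> int:
    """Worst-case number of symbols between one-symbol slips.

    tx_ppm and rx_ppm are the absolute clock tolerances of each side; the
    worst case is when the two clocks are off in opposite directions.
    """
    worst_case_ppm = tx_ppm + rx_ppm
    if worst_case_ppm <= 0:
        raise ValueError("tolerances must add up to a positive value")
    return 1_000_000 // worst_case_ppm


print(symbols_per_slip(100, 100))  # 5000
```

Sending a correction sequence at least this often gives the elastic buffer something it can drop or repeat before it overflows or underflows.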

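4. Design approach and framework

This blog provides 2 sets of Vivado project source code; the two sets differ in which of the two FPGA development boards receives and which sends. I have 2 FPGA development boards here, called board 1 and board 2; both have an OV5640 camera and an HDMI output interface. The 2 sets of Vivado project source code are as follows:
Vivado project set 1: board 1 captures data from its own OV5640 camera, encodes it with its own GTP, and sends it out through the TX interface of its own SFP optical port; board 2 receives the data on the RX interface of its SFP optical port, decodes it with its own GTP, buffers the image in three frames, and then sends HDMI video to the monitor through the SiI9134 chip; the block diagram is as follows:
[Figure: block diagram of project set 1]
Vivado project set 2: board 2 captures data from its own OV5640 camera, encodes it with its own GTP, and sends it out through the TX interface of its own SFP optical port; board 1 receives the data on the RX interface of its SFP optical port, decodes it with its own GTP, buffers the image in three frames, and then sends HDMI video to the monitor through the HDMI transmit module; the block diagram is as follows:
[Figure: block diagram of project set 2]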

OV5640 camera configuration and capture

The OV5640 camera must be configured over I2C before use, and the video data on its DVP interface must be captured as video data in RGB565 or RGB888 format. Both parts are implemented as Verilog modules; the code location is as follows:
[Figure: code location of the camera modules]
The camera is configured for a resolution of 1280x720@60Hz, as follows:
[Figure: camera resolution configuration]
The camera capture module supports video output in RGB565 and RGB888 formats, selectable by a parameter, as follows:
[Figure: RGB_TYPE parameter]
RGB_TYPE=0 outputs the original RGB565 format;
RGB_TYPE=1 outputs the original RGB888 format;
This design selects the RGB565 format;
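As a rough illustration of the two output formats, the Python sketch below assembles 16-bit RGB565 pixels from an 8-bit DVP byte stream and expands one pixel to RGB888. The high-byte-first order and the bit-replication expansion are assumptions on my part (the byte order depends on how the sensor is configured, and the capture module may expand differently); the function names are illustrative only:

```python
def assemble_rgb565(dvp_bytes):
    """Pair up DVP bytes into 16-bit pixels, assuming the high byte arrives first."""
    if len(dvp_bytes) % 2:
        raise ValueError("RGB565 needs two bytes per pixel")
    return [(dvp_bytes[i] << 8) | dvp_bytes[i + 1]
            for i in range(0, len(dvp_bytes), 2)]


def rgb565_to_rgb888(pixel):
    """Expand one RGB565 pixel to (r, g, b), each 0..255, by bit replication."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Copy the top bits into the new low bits so full scale maps to 255.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return r8, g8, b8


pixels = assemble_rgb565([0xF8, 0x00, 0x07, 0xE0])
print([rgb565_to_rgb888(p) for p in pixels])  # [(255, 0, 0), (0, 255, 0)]
```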

Video data packing

Since the video is sent and received through the aurora 8b/10b protocol in GTP, the data must be packed to fit the aurora 8b/10b protocol; the code location of the video data packing module is as follows:
[Figure: code location of the video packing module]
First, the 16-bit video is stored in a FIFO; when a full line has been stored, it is read out of the FIFO and sent to GTP for transmission. Before that, each frame of video has to be tagged with markers, which I call commands. When packing, GTP sends data according to fixed commands; when unpacking, GTP restores the video vertical sync signal and the video valid signal from those fixed commands. On the rising edge of the vertical sync signal, frame-start command 0 is sent; on its falling edge, frame-start command 1 is sent. During the video blanking period, invalid data 0 and invalid data 1 are sent. When the video valid signal arrives, each line is numbered: first a line-start command is sent, then the current line number, and when the line has been sent, a line-end command. After a frame has been sent, frame-end command 0 is sent first and then frame-end command 1; at that point one frame of video has been sent. This module is not easy to understand, so I wrote detailed Chinese comments in the code. Note that to avoid garbled Chinese comments, you should open the code with the Notepad++ editor. The commands are defined as follows:
[Figure: command definitions]
The commands can be changed freely, but the lowest byte must be bc;
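To make the command sequence concrete, here is a minimal Python model of how one frame could be turned into 32-bit words. Everything except the BC low byte is an assumption for illustration: the command tag values, the function names, and the choice of packing two 16-bit pixels into one 32-bit word with the first pixel in the low half. The invalid data sent during blanking is left out:

```python
COM = 0xBC  # K28.5, must sit in the lowest byte of every command word


def command(tag: int) -> int:
    """Build a 32-bit command word XX_XX_XX_BC from a 24-bit tag (hypothetical values)."""
    return ((tag & 0xFFFFFF) << 8) | COM


FRAME_START_0, FRAME_START_1 = command(0x000001), command(0x000002)
LINE_START, LINE_END = command(0x000003), command(0x000004)
FRAME_END_0, FRAME_END_1 = command(0x000005), command(0x000006)


def pack_frame(lines):
    """Return a list of (data32, is_k) words for one frame.

    lines: list of lines, each a list with an even number of 16-bit pixels.
    is_k is True for command words (K character in byte 0), False for data.
    """
    words = [(FRAME_START_0, True), (FRAME_START_1, True)]
    for line_no, pixels in enumerate(lines):
        if len(pixels) % 2:
            raise ValueError("need an even number of 16-bit pixels per line")
        words.append((LINE_START, True))
        words.append((line_no, False))              # current line number
        for i in range(0, len(pixels), 2):          # two pixels per 32-bit word
            words.append(((pixels[i + 1] << 16) | pixels[i], False))
        words.append((LINE_END, True))
    words += [(FRAME_END_0, True), (FRAME_END_1, True)]
    return words


for data, is_k in pack_frame([[1, 2], [3, 4]]):
    print(f"{data:08X} {'K' if is_k else 'D'}")
```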

GTP aurora 8b/10b


[Figure: code location of the GTP aurora 8b/10b module]
This part calls GTP to encode and decode the data for the aurora 8b/10b protocol. I have already given a detailed overview of GTP above, so I won't repeat it here. GTP is instantiated at rates of 1G, 2G, 4G, and 5G; the code uses a parameter to select the rate, as follows:
[Figure: GTP_RATE parameter]
GTP_RATE=8'd1, GTP runs at 1G line rate;
GTP_RATE=8'd2, GTP runs at 2G line rate;
GTP_RATE=8'd4, GTP runs at 4G line rate;
GTP_RATE=8'd5, GTP runs at 5G line rate;
In my tests, video transmission works best when GTP runs at the 4G line rate;
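For a rough feel of how these line rates compare with the video bandwidth, the Python sketch below works out the numbers. It assumes the camera really delivers 1280x720 at 60 frames per second in 16-bit RGB565, counts active pixels only, and ignores the command words and clock correction sequences; 8b/10b coding carries 8 payload bits in every 10 line bits. The function names are mine:

```python
def payload_bps(line_rate_bps: int) -> int:
    """Usable bit rate after 8b/10b coding (8 payload bits per 10 line bits)."""
    return line_rate_bps * 8 // 10


def video_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Bit rate of the active video only (no blanking, no packet overhead)."""
    return width * height * fps * bits_per_pixel


need = video_bps(1280, 720, 60, 16)
for gbps in (1, 2, 4, 5):
    have = payload_bps(gbps * 1_000_000_000)
    print(f"{gbps}G line rate: payload {have / 1e6:.0f} Mb/s, "
          f"video needs {need / 1e6:.1f} Mb/s, ratio {have / need:.2f}")
```

Under these assumptions the active video needs about 885 Mb/s, so the 800 Mb/s of payload at 1G would fall short, while 2G and above leave headroom.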

Data alignment

Because aurora 8b/10b transmission over GT resources is inherently subject to data misalignment, the received, decoded data must be realigned. The code location of the data alignment module is as follows:
[Figure: code location of the data alignment module]
The K-code control character format I defined is XX_XX_XX_BC, so rx_ctrl is used to indicate whether the data contains the K-code COM symbol;
rx_ctrl = 4'b0000 means the 4-byte data contains no COM code;
rx_ctrl = 4'b0001 means bits [7:0] of the 4-byte data are the COM code;
rx_ctrl = 4'b0010 means bits [15:8] of the 4-byte data are the COM code;
rx_ctrl = 4'b0100 means bits [23:16] of the 4-byte data are the COM code;
rx_ctrl = 4'b1000 means bits [31:24] of the 4-byte data are the COM code;
Based on this, when a K code is received the data is aligned: the data is registered for one clock cycle and then recombined with the new data at the proper byte offset. This is a basic FPGA operation, so I won't go into more detail;
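The "register one cycle and recombine" step can be modeled in a few lines of Python. This is a behavioral sketch of the idea, not the project's Verilog; it assumes byte lane [7:0] is the earliest byte on the wire, and the function names are mine:

```python
MASK32 = 0xFFFFFFFF


def k_lane(rx_ctrl: int):
    """Return the byte lane (0..3) flagged by a one-hot rx_ctrl, or None."""
    return {0b0001: 0, 0b0010: 1, 0b0100: 2, 0b1000: 3}.get(rx_ctrl)


def align_words(words):
    """Realign a stream of (rx_data, rx_ctrl) 32-bit words.

    The lane of the most recent COM sets the byte shift. Each output word is
    built from the upper bytes of the previous word and the lower bytes of
    the current word, so the COM ends up in bits [7:0] again.
    """
    shift = None   # unknown until the first COM has been seen
    prev = None    # previous rx_data (the "registered" copy)
    out = []
    for data, ctrl in words:
        if prev is not None and shift is not None:
            if shift == 0:
                out.append(prev)
            else:
                merged = (prev >> (8 * shift)) | (data << (32 - 8 * shift))
                out.append(merged & MASK32)
        lane = k_lane(ctrl)
        if lane is not None:
            shift = lane
        prev = data
    return out


# Sender words 000001BC, 11223344, 55667788, AABBCCDD arriving two bytes late.
rx = [(0x01BC0000, 0b0100), (0x33440000, 0), (0x77881122, 0),
      (0xCCDD5566, 0), (0x0000AABB, 0)]
print([f"{w:08X}" for w in align_words(rx)])
# ['000001BC', '11223344', '55667788', 'AABBCCDD']
```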

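Video data unpacking

Data unpacking is the reverse of data packing; the code location is as follows:
[Figure: code location of the video unpacking module]
When unpacking, GTP restores the video vertical sync signal and the video valid signal according to the fixed commands; these signals are important for the image buffering that follows;
At this point, the path of data into and out of GTP has been fully covered. I described the block diagram of the whole process in the code, as follows:
[Figure: block diagram of the GTP data path, from the code comments]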
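As a sketch of what unpacking has to do, the Python model below walks an already aligned word stream and rebuilds the lines of each frame from the commands. The command tag values are hypothetical placeholders (only the BC low byte comes from the text), the two-pixels-per-word layout is an assumption, and the real module produces sync and valid signals rather than Python lists:

```python
COM = 0xBC


def command(tag: int) -> int:
    """32-bit command word XX_XX_XX_BC built from a hypothetical 24-bit tag."""
    return ((tag & 0xFFFFFF) << 8) | COM


FRAME_START_0, FRAME_START_1 = command(0x000001), command(0x000002)
LINE_START, LINE_END = command(0x000003), command(0x000004)
FRAME_END_0, FRAME_END_1 = command(0x000005), command(0x000006)


def unpack(words):
    """Recover frames as {line_number: [pixels]} from (data32, is_k) words."""
    frames = []
    frame = None             # dict while inside a frame (vsync active)
    pixels = None            # list while inside a line (data valid)
    line_no = None
    expect_line_no = False
    for data, is_k in words:
        if is_k:
            if data == FRAME_START_0:
                frame = {}
            elif data == LINE_START and frame is not None:
                pixels, expect_line_no = [], True
            elif data == LINE_END and pixels is not None and line_no is not None:
                frame[line_no] = pixels
                pixels, line_no = None, None
            elif data == FRAME_END_1 and frame is not None:
                frames.append(frame)
                frame = None
            continue             # other commands and idle words carry no pixels
        if expect_line_no:
            line_no, expect_line_no = data, False
        elif pixels is not None and line_no is not None:
            pixels += [data & 0xFFFF, data >> 16]   # low half is the first pixel
    return frames


stream = [(FRAME_START_0, True), (FRAME_START_1, True),
          (LINE_START, True), (0, False), (0x00020001, False), (LINE_END, True),
          (LINE_START, True), (1, False), (0x00040003, False), (LINE_END, True),
          (FRAME_END_0, True), (FRAME_END_1, True)]
print(unpack(stream))  # [{0: [1, 2], 1: [3, 4]}]
```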

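Image buffering

Long-time readers of my blog will know that my usual approach to image buffering is FDMA. It writes the image into DDR, buffers 3 frames, and reads them out for display, in order to absorb the clock difference between input and output and to improve the output video quality. For more on FDMA, please see my earlier blog post; blog address: click to go directly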
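A common way to run a 3-frame buffer is to let the reader take the frame the writer finished most recently, so the two never touch the same frame at the same time. The Python sketch below models only that index rotation; it is my assumption about the general technique, not a description of how FDMA is actually written, and the class name is illustrative:

```python
class FrameRing:
    """Index bookkeeping for an n-frame ring buffer in external memory."""

    def __init__(self, n: int = 3):
        self.n = n
        self.write_idx = 0

    def frame_written(self) -> None:
        """Call at the end of each incoming frame (for example on vsync)."""
        self.write_idx = (self.write_idx + 1) % self.n

    @property
    def read_idx(self) -> int:
        """Most recently completed frame: one slot behind the writer."""
        return (self.write_idx - 1) % self.n


ring = FrameRing()
for _ in range(4):
    print("write", ring.write_idx, "read", ring.read_idx)
    ring.frame_written()
```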

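Video output

After the video is read out of FDMA, it passes through the VGA timing module and the HDMI transmit module and is output to the monitor; the code location is as follows:
[Figure: code location of the video output modules]
The VGA timing is configured for 1280x720. The HDMI transmit module is hand-written in Verilog and can be used for HDMI transmission in FPGA applications; for this module, please see my earlier blog post; blog address: click to go directly
The HDMI output of board 2 is implemented with the SiI9134 chip, which must be configured over I2C before use; for the configuration of the SiI9134 chip, please see my earlier blog post; blog address: click to go directly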
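For reference, the widely used 1280x720 at 60 Hz timing has a 1650x750 total raster and a 74.25 MHz pixel clock. The Python sketch below just checks that arithmetic; whether the project's timing module uses exactly these porch and sync widths is an assumption on my part, and the names are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One direction of a video raster, in pixels (horizontal) or lines (vertical)."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch


# Standard 720p60 raster (assumed to match the VGA timing module).
H_720P = Axis(active=1280, front_porch=110, sync=40, back_porch=220)
V_720P = Axis(active=720, front_porch=5, sync=5, back_porch=20)


def pixel_clock_hz(h: Axis, v: Axis, fps: int) -> int:
    """Pixel clock needed to scan the whole raster fps times per second."""
    return h.total * v.total * fps


print(H_720P.total, V_720P.total, pixel_clock_hz(H_720P, V_720P, 60))
# 1650 750 74250000
```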

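5. Vivado project set 1 in detail

GTP transmit project for the OV5640 camera on board 1

Development board FPGA model: Xilinx–Artix7–xc7a35tfgg484-2;
Development environment: Vivado 2019.1;
Input: OV5640 camera on board 1, resolution 1280x720@60Hz;
Output: TX interface of the SFP optical port on board 1;
Application: OV5640 camera board-to-board video transmission;
The project Block Design is as follows:
[Figure: Block Design]
The project code structure is as follows:
[Figure: code structure]
FPGA resource usage and estimated power after synthesis and implementation are as follows:
[Figure: resource usage and power estimate]

GTP receive and HDMI display project for board 2

Development board FPGA model: Xilinx–Artix7–xc7a35tfgg484-2;
Development environment: Vivado 2019.1;
Input: RX interface of the SFP optical port on board 2;
Output: HDMI transmit interface of board 2, sent to the monitor for display;
Application: OV5640 camera board-to-board video transmission;
The project Block Design is as follows:
[Figure: Block Design]
The project code structure is as follows:
[Figure: code structure]
FPGA resource usage and estimated power after synthesis and implementation are as follows:
[Figure: resource usage and power estimate]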

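6. Vivado project set 2 in detail

GTP transmit project for the OV5640 camera on board 2

Development board FPGA model: Xilinx–Artix7–xc7a35tfgg484-2;
Development environment: Vivado 2019.1;
Input: OV5640 camera on board 2, resolution 1280x720@60Hz;
Output: TX interface of the SFP optical port on board 2;
Application: OV5640 camera board-to-board video transmission;
The project code structure is as follows:
[Figure: code structure]
FPGA resource usage and estimated power after synthesis and implementation are as follows:
[Figure: resource usage and power estimate]

GTP receive and HDMI display project for board 1

Development board FPGA model: Xilinx–Artix7–xc7a35tfgg484-2;
Development environment: Vivado 2019.1;
Input: RX interface of the SFP optical port on board 1;
Output: HDMI transmit interface of board 1, sent to the monitor for display;
Application: OV5640 camera board-to-board video transmission;
The project Block Design is as follows:
[Figure: Block Design]
The project code structure is as follows:
[Figure: code structure]
FPGA resource usage and estimated power after synthesis and implementation are as follows:
[Figure: resource usage and power estimate]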

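7. On-board debugging and verification

Fiber connection

The fiber connection between the two boards for Vivado project set 1 is as follows:
[Figure: fiber connection for project set 1]
The fiber connection between the two boards for Vivado project set 2 is as follows:
[Figure: fiber connection for project set 2]

Static demonstration

The output is shown below using the two boards of Vivado project set 1 as an example:
When GTP runs at the 4G line rate, the output is as follows:
[Figure: display output at 4G line rate]

Dynamic demonstration

A short video of the two boards running Vivado project set 1 is below;

GTP-OV5640 camera board-to-board video transmission 1

A short video of the two boards running Vivado project set 2 is below;

GTP-OV5640 camera board-to-board video transmission 2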

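8. Bonus: getting the project code

The code is too large to send by email, so it is sent as a netdisk link;
How to get the materials: send me a private message, or use the V business card at the end of the article.
The netdisk contents are as follows:
[Figure: netdisk contents]
[Figure: netdisk contents, continued]
[Figure: netdisk contents, continued]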


Origin blog.csdn.net/qq_41667729/article/details/132306518