H265 video compression on an FPGA in pure Verilog: supports resolutions up to 4K at 30 fps, with project source code and technical support

1. Introduction

H265 video compression and decoding are widely used in FPGA image transmission. High-end Xilinx devices embed H265 hardware accelerators, which can be used by calling an API under Linux. However, for H265 compression applications that require custom private algorithms or protocols, and for learners and researchers, a pure Verilog implementation of H265 video compression still has practical value. This design implements H265 video compression in pure Verilog without using any IP, and can serve as a reference;

This article describes in detail the design of an H265 video compressor written in pure Verilog for FPGAs. The project code can be synthesized and is intended for on-board debugging, although so far it has only been verified in simulation, and it can be ported directly into your own project. It is suitable for working engineers doing project development and can be applied to digital imaging and image compression in medical, military and other industries;
Complete, working project source code and technical support are provided;
How to obtain the project source code and technical support is described at the end of the article;

2. My other video and image codec solutions

I have solutions for JPEG image decompression, JPEG-LS compression, H264 encoding and decoding, H265 encoding and decoding and others, and more will follow. I am collecting them into a column that I will keep updating.

3. H265 – Video Compression Theory

For the theory, please refer to this article, which I find very well written: https://blog.csdn.net/qq_39969848/article/details/129003896

4. H265 – Video Compression – Performance

The performance is as follows:
input video format: YUV 4:2:0;
output video stream: H265 bitstream, 8 bits wide, with an accompanying data-valid signal;
supported resolution: maximum 4K@30Hz;
GOP: I/P frames, that is, both intra-frame and inter-frame prediction;
up to 35 intra prediction modes;
CTU: 64x64;
CU: 8x8~64x64;
PU: 4x4~64x64;
TU: 4x4/8x8/16x16/32x32;
1/4 sub-pixel motion estimation;
Search range: 64;
CABAC Entropy coding;
SAO (Sample Adaptive Offset);
SKIP/MERGE;
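
To get a feel for what 4K@30Hz with 64x64 CTUs means for the design, the figures above can be turned into a rough throughput budget. The following is a simulation-only sketch, not part of the project source. It assumes a 3840x2160 frame (the article only says "4K") and the 400 MHz maximum clock mentioned in the top-level port comments; the module name and all parameter names are made up for illustration.

```verilog
// Simulation-only sketch: sizing arithmetic for 4K@30 with 64x64 CTUs.
// Assumptions (not from the project source): 3840x2160 frame size,
// 400 MHz clock, and every name in this module.
module ctu_budget_sketch;
  localparam integer WIDTH    = 3840;
  localparam integer HEIGHT   = 2160;
  localparam integer FPS      = 30;
  localparam integer CTU_SIZE = 64;
  localparam integer CLK_HZ   = 400000000;

  // Ceiling division: partial CTUs at the right/bottom edge still count.
  localparam integer CTU_COLS       = (WIDTH  + CTU_SIZE - 1) / CTU_SIZE; // 60
  localparam integer CTU_ROWS       = (HEIGHT + CTU_SIZE - 1) / CTU_SIZE; // 34
  localparam integer CTUS_PER_FRAME = CTU_COLS * CTU_ROWS;                // 2040
  localparam integer CTUS_PER_SEC   = CTUS_PER_FRAME * FPS;               // 61200

  // YUV 4:2:0 carries 1.5 samples per pixel
  // (one luma plane plus two quarter-size chroma planes).
  localparam integer SAMPLES_PER_SEC = (WIDTH * HEIGHT * 3 / 2) * FPS;    // 373248000

  // Average clock cycles available per CTU (integer division, rounds down).
  localparam integer CYCLES_PER_CTU = CLK_HZ / CTUS_PER_SEC;              // 6535

  initial begin
    $display("CTU grid        : %0d x %0d", CTU_COLS, CTU_ROWS);
    $display("CTUs per frame  : %0d", CTUS_PER_FRAME);
    $display("CTUs per second : %0d", CTUS_PER_SEC);
    $display("Samples per sec : %0d", SAMPLES_PER_SEC);
    $display("Cycles per CTU  : %0d", CYCLES_PER_CTU);
    $finish;
  end
endmodule
```

Under these assumptions each 64x64 CTU has an average budget of about 6500 clock cycles, which is one way to judge whether a lower clock frequency on your own board would still sustain 4K@30Hz.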

5. H265 – Video Compression – Design Scheme

The basic process of video compression coding can be summarized as follows:
1. Use some method to predict the pixels of the image block currently being processed;
2. Subtract the predicted pixel values from the original pixel values to obtain the residual;
3. Transform and quantize the residual to obtain the output residual coefficients, which are then entropy coded to form the final compressed bitstream;
4. Dequantize and inverse transform the residual coefficients, then add them to the previously obtained predicted pixels to obtain reconstructed pixels, which are stored as reference pixels for subsequent prediction.
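
The four steps above can be sketched for a single 8-bit sample. This is a deliberately simplified illustration, not the project's implementation: the transform and inverse transform are left out (treated as identity), entropy coding is omitted, and the quantizer is a plain division by a hypothetical step size `QSTEP`. The module, port and parameter names are all assumptions.

```verilog
// Simplified per-sample sketch of the predict / residual / quantize /
// reconstruct loop. Transform and entropy coding are intentionally omitted.
// QSTEP and all names are hypothetical, chosen only for illustration.
module recon_loop_sketch #(
  parameter integer QSTEP = 8
)(
  input  wire [7:0]        orig,      // original pixel
  input  wire [7:0]        pred,      // predicted pixel (step 1, given)
  output wire signed [8:0] residual,  // step 2: orig - pred
  output wire signed [8:0] level,     // step 3: quantized residual
  output wire [7:0]        recon      // step 4: reconstructed pixel
);
  // Step 2: residual = original - prediction (range -255..255).
  assign residual = $signed({1'b0, orig}) - $signed({1'b0, pred});

  // Step 3: quantize (signed division truncates toward zero).
  assign level = residual / QSTEP;

  // Step 4: dequantize, add the prediction back, clip to 0..255.
  wire signed [15:0] deq = level * QSTEP;
  wire signed [15:0] sum = $signed({1'b0, pred}) + deq;
  assign recon = (sum < 0)   ? 8'd0   :
                 (sum > 255) ? 8'd255 : sum[7:0];
endmodule

// Minimal testbench for the sketch above.
module recon_loop_sketch_tb;
  reg  [7:0]        orig, pred;
  wire signed [8:0] residual, level;
  wire [7:0]        recon;

  recon_loop_sketch #(.QSTEP(8)) dut (
    .orig(orig), .pred(pred),
    .residual(residual), .level(level), .recon(recon)
  );

  initial begin
    orig = 8'd100; pred = 8'd90; #1;  // residual 10, level 1, recon 98
    $display("residual=%0d level=%0d recon=%0d", residual, level, recon);
    orig = 8'd50;  pred = 8'd60; #1;  // residual -10, level -1, recon 52
    $display("residual=%0d level=%0d recon=%0d", residual, level, recon);
    $finish;
  end
endmodule
```

Note that the reconstructed value differs from the original because of quantization loss. This is why the encoder predicts from reconstructed pixels rather than original ones: the decoder only ever sees the reconstructed data, so both sides must use the same reference.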

The top-level interface of the H265 video compression code is as follows:

module helai_h265_encode_2023(
  input                         clk                   , // system clock, up to 400 MHz supported
  input                         rstn                  , // system reset, active low
  // Initialization configuration
  // db  --> deblocking filter
  // sao --> sample adaptive offset
  input                         i_h265_start          , // input: start H265 compression, active high
  output                        o_h265_ec_OK          , // output: H265 compression done flag, active high
  input  [`PIC_WIDTH -1 :0]     i_yuv_width           , // input: total width of the video to compress
  input  [`PIC_HEIGHT-1 :0]     i_yuv_height          , // input: total height of the video to compress
  input  [5             :0]     i_init_QP             , // input: initial QP value; the example uses 27
  input                         i_DataRate_type       , // input: stream type: 0 --> Intra; 1 --> Inter
  input                         i_IinP_en             , // input: enable I blocks in P frames
  input                         i_db_en               , // input: deblocking filter enable
  input                         i_sao_en              , // input: SAO (Sample Adaptive Offset) enable
  input  [5-1:0]                i_blok4x4bit          , // input: set to 4 for 4x4 blocks
  input   [32-1:0]              i_skip_cost_thre_08   , // input: skip cost thresholds; the example sets them to 0
  input   [32-1:0]              i_skip_cost_thre_16   ,
  input   [32-1:0]              i_skip_cost_thre_32   ,
  input   [32-1:0]              i_skip_cost_thre_64   ,
  // RC (rate control) configuration
  // roi (region of interest)
  output [32             -1 :0] o_rc_mod64_sum        , // output: mod64 check
  input  [32             -1 :0] i_rc_bitnum           , // input: number of CABAC-coded bits, typically 10000
  input  [16             -1 :0] i_rc_k                , // input: rc k value, typically 0
  input  [6              -1 :0] i_rc_roi_height       , // input: roi height
  input  [7              -1 :0] i_rc_roi_width        , // input: roi width
  input  [7              -1 :0] i_rc_roi_x            , // input: roi start X coordinate
  input  [7              -1 :0] i_rc_roi_y            , // input: roi start Y coordinate
  input                         i_rc_roi_en           , // input: roi enable
  input  [10             -1 :0] i_rc_L1_frame_byte    , // input: number of frames processed by pipeline stage 1
  input  [10             -1 :0] i_rc_L2_frame_byte    , // input: number of frames processed by pipeline stage 2
  input                         i_rc_lcu_en           , // input: LCU enable
  input  [6              -1 :0] i_rc_QP_max           , // input: maximum QP
  input  [6              -1 :0] i_rc_QP_min           , // input: minimum QP
  input  [6              -1 :0] i_rc_QP_delta         , // input: roi QP value
  // IME configuration
  input  [CMD_NUM_WIDTH  -1 :0] i_IME_cmd_num         , // input: IME command
  input  [CMD_DAT_WIDTH  -1 :0] i_IME_cmd_data        , // input: IME data
  // External memory consists of the original input video buffer and the reconstruction buffer
  // ori --> original : ori_yuv_mem --> original input video buffer
  // rb  --> rebuild  : rb_pix_mem  --> reconstructed pixel buffer
  output [1-1              : 0] o_rb_pix_mem_video_vs , // output: tells the reconstructed pixel buffer to start receiving; equivalent to vs in VGA timing
  input  [1-1              : 0] i_rb_pix_mem_video_OK , // input: reconstructed pixel buffer tells the H265 encoder that one frame has been received
  output [5-1              : 0] o_ext_mem_do_mode     , // output: operation mode for the external memory; see the article for details
  output [6+`PIC_X_WIDTH -1: 0] o_ext_mem_x_position  , // output: tells the external memory the current X coordinate in the video
  output [6+`PIC_Y_WIDTH -1: 0] o_ext_mem_y_position  , // output: tells the external memory the current Y coordinate in the video
  output [8-1              : 0] o_ext_mem_video_width , // output: tells the external memory the total video width
  output [8-1              : 0] o_ext_mem_video_height, // output: tells the external memory the total video height
  input                         i_ori_yuv_mem_video_de, // input: original input video buffer, YUV 4:2:0 video data valid
  input  [16*`PIXEL_WIDTH-1: 0] i_ori_yuv_mem_video   , // input: original input video buffer, YUV 4:2:0 video data
  input                         i_rb_pix_mem_video_de , // input: reconstructed pixel buffer requests pixel data from the H265 encoder
  output [16*`PIXEL_WIDTH-1: 0] o_rb_pix_mem_video    , // output: pixel data written to the reconstructed pixel buffer
  // h265 output
  output                        o_h265_video_de       , // output: compressed H265 video data valid
  output  [7               : 0] o_h265_video            // output: compressed H265 video data
  );
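
On the output side, the interface is just an 8-bit data bus with a valid flag, so a testbench can capture the bitstream with a very small helper. The sketch below is an assumption-based example and not part of the project source: it assumes the data is valid on the rising clock edge whenever the valid flag is high, and it logs one byte per line as hex text (to be converted to a binary .h265 file offline) rather than writing binary directly. The module name, its port names and the default file name are hypothetical.

```verilog
// Simulation-only sketch: log the compressed byte stream to a text file.
// Assumes each byte is valid on the rising edge of clk while the valid
// flag is high. All names here are hypothetical.
module h265_stream_sink_sketch #(
  parameter OUT_FILE = "h265_out_hex.txt"
)(
  input wire       clk,
  input wire       rstn,        // active-low reset, as on the encoder
  input wire       i_video_de,  // connect to o_h265_video_de
  input wire [7:0] i_video      // connect to o_h265_video
);
  integer fd;
  integer byte_count;

  initial begin
    byte_count = 0;
    fd = $fopen(OUT_FILE, "w");
    if (fd == 0) begin
      $display("ERROR: could not open %0s", OUT_FILE);
      $finish;
    end
  end

  always @(posedge clk) begin
    if (rstn && i_video_de) begin
      // One byte per line, two hex digits, so 0x00 bytes are preserved.
      $fdisplay(fd, "%02x", i_video);
      byte_count = byte_count + 1;
    end
  end
endmodule
```

In a testbench it would be instantiated next to the encoder with `i_video_de` and `i_video` wired to `o_h265_video_de` and `o_h265_video`. Hex text is used because H265 start codes contain 0x00 bytes, and writing zero bytes with a character format is not handled consistently by every simulator.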

6. H265 – Video Compression – Timing

For the timing of input and output operations, please refer to the simulation waveforms in the Vivado project in the data package, which also includes a simulation tutorial;
The timing of the input part is as follows:
(figures: input timing waveforms)
The timing of the output part is as follows:
(figures: output timing waveforms)

7. Detailed explanation of Vivado project

Create a Vivado project and copy the source code in for functional simulation and synthesis;
development board FPGA model: xc7k410tffg676-2;
input: test stimulus YUV 4:2:0 video stream file;
application: functional simulation and synthesis;
FPGA resource consumption is as follows:
(figure: FPGA resource utilization)
Resource consumption is fairly high, so please check whether your FPGA meets the resource requirements;

8. Porting to a board

I have only taken this design to the simulation level and have not yet debugged it on a board, but you can port it to one. Porting requires solving only two problems: the input timing and the FPGA resource estimate. For the input timing, refer to the simulation files in the Vivado project: study the input stimulus files and the simulation waveforms to understand how the inputs are driven, then feed in your own data in the same way. For FPGA resource consumption, see the resource figures in "7. Detailed explanation of Vivado project" above. Technical support is also available for porting to a board;

9. Vivado functional simulation

Because the simulation process is complex, I made a step-by-step PPT tutorial that describes it in detail. The tutorial is included in the data package at the following location:
(figure: location of the tutorial in the data package)
The correct simulation results are as follows:
(figures: simulation results)

10. Bonus: obtaining the project code

The code is too large to send by email, so it is shared through a network disk link;
How to obtain the materials: use the V business card at the end of the article.
The network disk contents are as follows:
(figures: network disk contents)


Origin blog.csdn.net/qq_41667729/article/details/130774853