FFmpeg AVFrame structure and related functions

0. Introduction

AVFrame stores raw (uncompressed) data, for example YUV or RGB for video and PCM for audio, together with related information. For example, when decoding it can carry the macroblock type table, the QP table, and motion vector data. Related data is stored when encoding as well.

1. AVFrame data structure definition

FFmpeg version 3.4.1

struct AVFrame is defined in <libavutil/frame.h>.

Source structure (I removed the comments):

typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    uint8_t *data[AV_NUM_DATA_POINTERS];
    int linesize[AV_NUM_DATA_POINTERS];
    uint8_t **extended_data;
    int width, height;
    int nb_samples;
    int format;
    int key_frame;
    enum AVPictureType pict_type;
    AVRational sample_aspect_ratio;
    int64_t pts;
#if FF_API_PKT_PTS
    attribute_deprecated
    int64_t pkt_pts;
#endif
    int64_t pkt_dts;
    int coded_picture_number;
    int display_picture_number;
    int quality;
    void *opaque;
#if FF_API_ERROR_FRAME
    attribute_deprecated
    uint64_t error[AV_NUM_DATA_POINTERS];
#endif
    int repeat_pict;
    int interlaced_frame;
    int top_field_first;
    int palette_has_changed;
    int64_t reordered_opaque;
    int sample_rate;
    uint64_t channel_layout;
    AVBufferRef *buf[AV_NUM_DATA_POINTERS];
    AVBufferRef **extended_buf;
    int nb_extended_buf;
    AVFrameSideData **side_data;
    int nb_side_data;
#define AV_FRAME_FLAG_CORRUPT       (1 << 0)
#define AV_FRAME_FLAG_DISCARD       (1 << 2)
    int flags;
    enum AVColorRange color_range;
    enum AVColorPrimaries color_primaries;
    enum AVColorTransferCharacteristic color_trc;
    enum AVColorSpace colorspace;
    enum AVChromaLocation chroma_location;
    int64_t best_effort_timestamp;
    int64_t pkt_pos;
    int64_t pkt_duration;
    AVDictionary *metadata;
    int decode_error_flags;
#define FF_DECODE_ERROR_INVALID_BITSTREAM   1
#define FF_DECODE_ERROR_MISSING_REFERENCE   2
    int channels;
    int pkt_size;
#if FF_API_FRAME_QP
    attribute_deprecated
    int8_t *qscale_table;
    attribute_deprecated
    int qstride;
    attribute_deprecated
    int qscale_type;
    AVBufferRef *qp_table_buf;
#endif
    AVBufferRef *hw_frames_ctx;
    AVBufferRef *opaque_ref;
    size_t crop_top;
    size_t crop_bottom;
    size_t crop_left;
    size_t crop_right;
} AVFrame;

Fields wrapped in #if ... #endif are deprecated or about to be deprecated. They are not explained here.

An AVFrame must be allocated with av_frame_alloc(). This allocates only the AVFrame itself, not its data buffers.
An AVFrame must be freed with av_frame_free().

uint8_t *data[AV_NUM_DATA_POINTERS];

Raw data (YUV or RGB for video, PCM for audio).

data is an array of pointers. Each element points to one image plane of a video frame or to one channel plane of an audio frame.

For packed formats, the Y, U, and V samples of a YUV image are interleaved in a single plane, for example YUVYUVYUV..., and data[0] points to that plane.

Likewise, the left channel L and right channel R of a two-channel packed audio frame are interleaved in a single plane, for example LRLRLR..., and data[0] points to that plane.

For planar formats, a YUV image has three planes, Y, U, and V: data[0] points to the Y plane, data[1] to the U plane, and data[2] to the V plane.

A two-channel planar audio frame has two planes, L and R: data[0] points to the L plane and data[1] to the R plane.
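To make the packed/planar difference concrete, here is a minimal sketch in plain C that splits a packed stereo buffer into two planes. It does not use libavutil. The function name deinterleave_stereo_s16 and the choice of signed 16-bit samples are my own assumptions for illustration; packed stands for what data[0] points to in a packed frame, and left/right for what data[0] and data[1] would point to in a planar frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a packed stereo buffer (L R L R ...) into two planes.
 * packed holds 2 * nb_samples values; left and right hold nb_samples each.
 * Hypothetical helper, not part of FFmpeg. */
static void deinterleave_stereo_s16(const int16_t *packed, size_t nb_samples,
                                    int16_t *left, int16_t *right)
{
    assert(packed && left && right);
    for (size_t i = 0; i < nb_samples; i++) {
        left[i]  = packed[2 * i];      /* even slots: left channel */
        right[i] = packed[2 * i + 1];  /* odd slots: right channel */
    }
}
```

Calling it on the six values {1, -1, 2, -2, 3, -3} with nb_samples = 3 fills left with 1, 2, 3 and right with -1, -2, -3.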

int linesize[AV_NUM_DATA_POINTERS];

For video, linesize is the size in bytes of each image row, including any alignment padding.

For audio, linesize is the size in bytes of each plane. Only linesize[0] may be set for audio. For planar audio, every plane must have the same size.

For performance reasons, extra padding may be added to the data, so linesize may be larger than the size of the actual audio or video data.
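The practical consequence is that rows must be addressed through linesize, not through the picture width. A small sketch in plain C, assuming 8-bit samples; the helper names padded_linesize and sample_at are hypothetical, and the alignment passed in is up to the caller (FFmpeg chooses its own alignment when it allocates buffers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a row size in bytes up to a multiple of align (a power of two). */
static int padded_linesize(int row_bytes, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (row_bytes + align - 1) & ~(align - 1);
}

/* Address of the sample at column x, row y in a plane of 8-bit samples.
 * Rows are linesize bytes apart, not width bytes apart. */
static uint8_t *sample_at(uint8_t *plane, int linesize, int x, int y)
{
    return plane + (ptrdiff_t)y * linesize + x;
}
```

For example, a 100-byte row padded to a 32-byte boundary gives a linesize of 128, so row 2 starts 256 bytes into the plane rather than 200.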

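uint8_t **extended_data;

Pointers to the data planes. For video this simply points to data[]. For planar audio with more channels than fit in data[], extended_data must be used to reach all channels.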

int width, height;

Width and height of the video frame, in pixels.

int nb_samples;

Number of audio samples (per channel) contained in this frame.

int format;

Format of the frame. The value is -1 if the format is unknown or not set.

For video frames, the value is one of enum AVPixelFormat:

enum AVPixelFormat {
    AV_PIX_FMT_NONE = -1,
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...
    AV_PIX_FMT_BGR24,     ///< packed RGB 8:8:8, 24bpp, BGRBGR...
    AV_PIX_FMT_YUV422P,   ///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)
    AV_PIX_FMT_YUV444P,   ///< planar YUV 4:4:4, 24bpp, (1 Cr & Cb sample per 1x1 Y samples)
    ...
};
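The "12bpp" in the AV_PIX_FMT_YUV420P comment can be checked with a little arithmetic. The following plain C sketch computes tightly packed plane sizes (no row padding) for 8-bit planar YUV 4:2:0. The struct and function names are hypothetical, and rounding the chroma size up for odd dimensions is my assumption about how the subsampled planes are sized.

```c
#include <assert.h>

/* Tightly packed plane sizes in bytes for 8-bit planar YUV 4:2:0. */
struct yuv420p_sizes {
    int y_bytes;   /* luma plane */
    int u_bytes;   /* Cb plane   */
    int v_bytes;   /* Cr plane   */
};

static struct yuv420p_sizes yuv420p_plane_sizes(int width, int height)
{
    assert(width > 0 && height > 0);
    /* One chroma sample per 2x2 luma samples; round up for odd sizes. */
    int cw = (width + 1) / 2;
    int ch = (height + 1) / 2;
    struct yuv420p_sizes s = { width * height, cw * ch, cw * ch };
    return s;
}
```

For 640x480 this gives 307200 + 76800 + 76800 = 460800 bytes, which is exactly 12 bits per pixel.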

For audio frames, the value is one of enum AVSampleFormat:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};
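The sample format, nb_samples, and the channel count together determine how many bytes of sample data each plane holds. Below is a sketch in plain C that uses a local stand-in enum in the same order as AVSampleFormat; all names in it are hypothetical. In real code, libavutil's av_get_bytes_per_sample() and av_sample_fmt_is_planar() answer the first two questions.

```c
#include <assert.h>

/* Local stand-in for enum AVSampleFormat, same order as the listing. */
enum sample_fmt {
    FMT_U8, FMT_S16, FMT_S32, FMT_FLT, FMT_DBL,
    FMT_U8P, FMT_S16P, FMT_S32P, FMT_FLTP, FMT_DBLP,
    FMT_S64, FMT_S64P, FMT_NB
};

static int bytes_per_sample(enum sample_fmt f)
{
    static const int size[FMT_NB] = { 1, 2, 4, 4, 8, 1, 2, 4, 4, 8, 8, 8 };
    assert((int)f >= 0 && (int)f < FMT_NB);
    return size[f];
}

static int is_planar(enum sample_fmt f)
{
    return (f >= FMT_U8P && f <= FMT_DBLP) || f == FMT_S64P;
}

/* Bytes of sample data in one plane of an audio frame (no padding).
 * Packed: one plane holds every channel. Planar: one channel per plane. */
static int plane_bytes(enum sample_fmt f, int nb_samples, int channels)
{
    int channels_in_plane = is_planar(f) ? 1 : channels;
    return nb_samples * channels_in_plane * bytes_per_sample(f);
}
```

A stereo frame of 1024 samples needs 4096 bytes in its single plane as packed S16, and 4096 bytes in each of its two planes as planar float.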

int key_frame;

Indicates whether the video frame is a key frame: 1 for a key frame, 0 otherwise.

enum AVPictureType pict_type;

The type of video frame (I, B, P, etc.)

Definition of enum AVPictureType:

enum AVPictureType {
    AV_PICTURE_TYPE_NONE = 0, ///< Undefined
    AV_PICTURE_TYPE_I,     ///< Intra
    AV_PICTURE_TYPE_P,     ///< Predicted
    AV_PICTURE_TYPE_B,     ///< Bi-dir predicted
    AV_PICTURE_TYPE_S,     ///< S(GMC)-VOP MPEG-4
    AV_PICTURE_TYPE_SI,    ///< Switching Intra
    AV_PICTURE_TYPE_SP,    ///< Switching Predicted
    AV_PICTURE_TYPE_BI,    ///< BI type
};

AVRational sample_aspect_ratio;

Sample (pixel) aspect ratio of the video frame, that is, the width-to-height ratio of a single pixel. 0/1 means unknown or unspecified.
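Since sample_aspect_ratio describes the shape of one pixel, the shape of the displayed picture follows from it and the frame size. A minimal plain C sketch; struct ratio is a hypothetical stand-in for AVRational, and treating an unknown ratio as square pixels is my assumption, not something FFmpeg prescribes.

```c
#include <assert.h>
#include <stdint.h>

struct ratio { int64_t num, den; };   /* stand-in for AVRational */

static int64_t gcd64(int64_t a, int64_t b)
{
    while (b) { int64_t t = a % b; a = b; b = t; }
    return a;
}

/* Display aspect ratio = (width * sar.num) : (height * sar.den), reduced.
 * An unknown SAR (0/1) is treated as square pixels here (assumption). */
static struct ratio display_aspect_ratio(int width, int height, struct ratio sar)
{
    assert(width > 0 && height > 0);
    if (sar.num <= 0 || sar.den <= 0)
        sar = (struct ratio){ 1, 1 };
    int64_t num = (int64_t)width * sar.num;
    int64_t den = (int64_t)height * sar.den;
    int64_t g = gcd64(num, den);
    return (struct ratio){ num / g, den / g };
}
```

A 720x576 frame with a sample aspect ratio of 16:15 is displayed at 4:3, and a 1920x1080 frame with an unknown ratio comes out as 16:9.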

int64_t pts;

Presentation timestamp, in time_base units.
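A pts only becomes a wall-clock time once it is multiplied by the time base. The sketch below does this in plain C with the time base passed as two integers; the function name is hypothetical. With libavutil the same thing is usually written as pts * av_q2d(time_base), and which time base applies depends on where the frame came from.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a timestamp counted in time_base units (tb_num/tb_den seconds
 * per tick) to seconds. */
static double pts_to_seconds(int64_t pts, int tb_num, int tb_den)
{
    assert(tb_den != 0);
    return (double)pts * tb_num / tb_den;
}
```

With a time base of 1/90000, a pts of 90000 is one second and a pts of 45000 is half a second.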

int64_t pkt_dts;

The decoding timestamp (DTS) of the corresponding packet. The value is copied from that packet.

If the corresponding packet provides only a dts and no pts, this value also serves as the pts of the frame.

int coded_picture_number;

Picture number in coded (bitstream) order.

int display_picture_number;

Picture number in display order.

int quality;

Quality, between 1 (good) and FF_LAMBDA_MAX (bad).

void *opaque;

Private data of the user.

int repeat_pict;

When decoding, this signals how much the picture must be delayed:

extra_delay = repeat_pict / (2*fps)
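The formula can be written out directly. This is a sketch in plain C with hypothetical function names; adding the extra delay to one nominal frame period to get the on-screen time is my reading of how a player would use the value, not something the header states.

```c
#include <assert.h>

/* Extra display delay in seconds signalled by repeat_pict. */
static double extra_delay(int repeat_pict, double fps)
{
    assert(fps > 0.0);
    return repeat_pict / (2.0 * fps);
}

/* Time the picture stays on screen: one frame period plus the extra delay
 * (assumed usage). */
static double display_duration(int repeat_pict, double fps)
{
    return 1.0 / fps + extra_delay(repeat_pict, fps);
}
```

At 25 fps, repeat_pict = 1 adds 0.02 s (half a frame period), so the picture is shown for about 0.06 s instead of 0.04 s.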

int interlaced_frame;

Whether the content of the picture is interlaced.

int top_field_first;

If the content is interlaced, whether the top field is displayed first.

int palette_has_changed;

Tells the user application that the palette has changed from the previous frame.

int sample_rate;

Audio sample rate.

uint64_t channel_layout;

Audio channel layout. Each bit represents a specific channel.

For reference, the definitions in channel_layout.h:

#define AV_CH_FRONT_LEFT             0x00000001
#define AV_CH_FRONT_RIGHT            0x00000002
#define AV_CH_FRONT_CENTER           0x00000004
#define AV_CH_LOW_FREQUENCY          0x00000008
#define AV_CH_BACK_LEFT              0x00000010
...

#define AV_CH_LAYOUT_MONO              (AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_2POINT1           (AV_CH_LAYOUT_STEREO|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_1               (AV_CH_LAYOUT_STEREO|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_3POINT1           (AV_CH_LAYOUT_SURROUND|AV_CH_LOW_FREQUENCY)
...
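Because every bit of channel_layout stands for one channel, counting the set bits gives the number of channels, and a bitwise AND tests for a particular channel. A plain C sketch with local copies of a few of the masks (the CH_ names and both helper functions are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of a few channel bits from the listing. */
#define CH_FRONT_LEFT     0x00000001ULL
#define CH_FRONT_RIGHT    0x00000002ULL
#define CH_FRONT_CENTER   0x00000004ULL
#define CH_LOW_FREQUENCY  0x00000008ULL

#define CH_LAYOUT_STEREO   (CH_FRONT_LEFT | CH_FRONT_RIGHT)
#define CH_LAYOUT_SURROUND (CH_LAYOUT_STEREO | CH_FRONT_CENTER)
#define CH_LAYOUT_3POINT1  (CH_LAYOUT_SURROUND | CH_LOW_FREQUENCY)

/* Number of channels = number of set bits in the layout mask. */
static int count_channels(uint64_t layout)
{
    int n = 0;
    while (layout) {
        layout &= layout - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Non-zero if the layout contains the given channel bit. */
static int has_channel(uint64_t layout, uint64_t channel)
{
    assert(channel != 0);
    return (layout & channel) != 0;
}
```

The stereo mask has two bits set and the 3.1 mask has four; stereo does not contain the low-frequency channel, while 3.1 does.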

AVBufferRef *buf[AV_NUM_DATA_POINTERS];

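The data of this frame can be managed by AVBufferRef, which provides a reference mechanism for AVBuffer.

If all elements of buf[] are NULL, this frame is not reference counted.

buf[] must be filled contiguously: if buf[i] is non-NULL, then buf[j] must also be non-NULL for all j < i.

For video, buf[] contains all the AVBufferRef pointers.

For planar audio with more than AV_NUM_DATA_POINTERS channels, buf[] may not be able to hold all the AVBufferRef pointers. The extra AVBufferRef pointers are stored in the extended_buf array.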
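The two rules about buf[] (all NULL means not reference counted, and no gaps are allowed) are easy to express as checks. The sketch below is plain C and works on an array of generic pointers instead of real AVBufferRef pointers; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_DATA_POINTERS 8   /* same value as AV_NUM_DATA_POINTERS */

/* With contiguous filling, the frame is reference counted exactly when
 * buf[0] is set. */
static int is_refcounted(void *const buf[NUM_DATA_POINTERS])
{
    assert(buf != NULL);
    return buf[0] != NULL;
}

/* Returns 1 if no non-NULL entry follows a NULL entry. */
static int is_contiguous(void *const buf[NUM_DATA_POINTERS])
{
    int seen_null = 0;
    assert(buf != NULL);
    for (int i = 0; i < NUM_DATA_POINTERS; i++) {
        if (!buf[i])
            seen_null = 1;
        else if (seen_null)
            return 0;           /* gap: buf[i] set but an earlier one is NULL */
    }
    return 1;
}
```

An array that starts with two valid pointers followed by NULLs passes both checks, while an array with a NULL between two valid pointers fails the contiguity check.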

AVBufferRef **extended_buf;

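For planar audio with more than AV_NUM_DATA_POINTERS channels, buf[] may not be able to hold all the AVBufferRef pointers. The extra AVBufferRef pointers are stored in the extended_buf array.

int nb_extended_buf;

Number of elements in extended_buf.

AVFrameSideData **side_data;

Side data.

int nb_side_data;

Number of side data entries.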

int64_t best_effort_timestamp;

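Estimated frame timestamp, in the stream time base.

Unused when encoding.

When decoding, it is set by the decoder and read by the user.

int64_t pkt_pos;

Byte offset in the input file of the last packet fed into the decoder.

int64_t pkt_duration;

Duration of the corresponding packet, in AVStream->time_base units.

int channels;

Number of audio channels.

int pkt_size;

Size of the corresponding packet.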

size_t crop_top;
size_t crop_bottom;
size_t crop_left;
size_t crop_right;

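Used for cropping the video frame. The four values are the number of pixels to discard from the top, bottom, left, and right borders of the frame.

I have not yet found a good explanation for the following members (they may be less important or rarely used):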
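As a worked example of the crop fields, the following plain C sketch computes the visible size that remains after the four borders are discarded and rejects crop values that would leave no picture. The function name and its return convention are hypothetical; it only does the arithmetic and does not touch the frame data.

```c
#include <assert.h>
#include <stddef.h>

/* Visible size after discarding the crop borders.
 * Returns 0 on success, -1 if the crop values leave no picture. */
static int cropped_size(int width, int height,
                        size_t crop_top, size_t crop_bottom,
                        size_t crop_left, size_t crop_right,
                        int *out_w, int *out_h)
{
    assert(out_w && out_h && width >= 0 && height >= 0);
    if (crop_left >= (size_t)width ||
        crop_right >= (size_t)width - crop_left ||
        crop_top >= (size_t)height ||
        crop_bottom >= (size_t)height - crop_top)
        return -1;
    *out_w = (int)((size_t)width - crop_left - crop_right);
    *out_h = (int)((size_t)height - crop_top - crop_bottom);
    return 0;
}
```

A 1920x1088 coded frame with crop_bottom = 8 yields a visible picture of 1920x1080, while cropping 100 pixels from both the left and the right of a 150-pixel-wide frame is rejected.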

int flags;

enum AVColorRange color_range;

enum AVColorPrimaries color_primaries;

enum AVColorTransferCharacteristic color_trc;

enum AVColorSpace colorspace;

enum AVChromaLocation chroma_location;

AVDictionary *metadata;

int decode_error_flags;

AVBufferRef *hw_frames_ctx;

AVBufferRef *opaque_ref;

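2. Related functions

AVFrame *av_frame_alloc(void);

Allocates an AVFrame and sets its members to default values.

This function allocates only the AVFrame object itself, not the data buffers of the AVFrame.

void av_frame_free(AVFrame **frame);

Frees the AVFrame.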

int av_frame_ref(AVFrame *dst, const AVFrame *src);

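Sets up a new reference to the data in src.

The properties of the frame in src are copied to dst, and a new reference is created for each AVBufferRef in src.

If src is not reference counted, new data buffers are allocated for dst and the data in the buffers of src is copied into them.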

AVFrame *av_frame_clone(const AVFrame *src);

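Creates a new AVFrame that shares the same data buffers as src. The buffers are managed by reference counting.

void av_frame_unref(AVFrame *frame);

Drops the references this AVFrame holds to all of its buffers and resets the members of the AVFrame.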

void av_frame_move_ref(AVFrame *dst, AVFrame *src);

Moves everything contained in src to dst and resets src.

To avoid memory leaks, call av_frame_unref(dst) before calling av_frame_move_ref(dst, src).
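The difference between taking a reference, dropping one, and moving one is easiest to see on a toy model. The following plain C sketch is NOT FFmpeg's implementation: toy_buffer and toy_frame are hypothetical stand-ins for AVBuffer and AVFrame with a single buffer and a single property, and the toy_ functions only imitate the behaviour described above for av_frame_ref(), av_frame_unref(), and av_frame_move_ref().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_buffer { unsigned char *data; int refcount; };  /* like AVBuffer */
struct toy_frame  { struct toy_buffer *buf; int width; };  /* like AVFrame  */

static struct toy_buffer *toy_buffer_new(size_t size)
{
    struct toy_buffer *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size);
    if (!b->data) { free(b); return NULL; }
    b->refcount = 1;
    return b;
}

/* Drop this frame's reference and reset its fields. */
static void toy_frame_unref(struct toy_frame *f)
{
    assert(f);
    if (f->buf && --f->buf->refcount == 0) {
        free(f->buf->data);      /* last reference gone: release the data */
        free(f->buf);
    }
    *f = (struct toy_frame){ 0 };
}

/* Copy the properties and take a new reference to src's buffer. */
static void toy_frame_ref(struct toy_frame *dst, const struct toy_frame *src)
{
    *dst = *src;
    if (dst->buf)
        dst->buf->refcount++;
}

/* Hand src's reference over to dst and reset src; the count is unchanged.
 * dst is overwritten, so it must be unreferenced first if it held data. */
static void toy_frame_move_ref(struct toy_frame *dst, struct toy_frame *src)
{
    *dst = *src;
    *src = (struct toy_frame){ 0 };
}
```

In this model, referencing raises the count to 2, moving leaves it at 2 while emptying the source frame, and each unref lowers it by one until the buffer is released. Skipping the unref of a non-empty dst before a move would overwrite its pointer without lowering the count, which is the leak the advice above guards against.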

int av_frame_get_buffer(AVFrame *frame, int align);

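Allocates new buffers for audio or video data.

The following members of the frame must be set before calling this function:

- format (pixel format for video, sample format for audio)
- width, height (for video)
- nb_samples, channel_layout (for audio)

This function fills the AVFrame.data and AVFrame.buf arrays and, if necessary, also allocates and fills AVFrame.extended_data and AVFrame.extended_buf.

For planar formats, one buffer is allocated for each plane.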

int av_frame_copy(AVFrame *dst, const AVFrame *src);

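Copies the frame data from src to dst.

This function does not allocate anything. Before calling it, dst must already be initialized with the same parameters as src.

This function copies only the contents of the data buffers of the frame, not any of its other properties.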
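Copying frame data is more than a single memcpy because the source and destination planes may have different linesize values. A minimal plain C sketch of a per-plane copy, assuming positive linesizes; copy_plane is a hypothetical helper and not the FFmpeg function itself (libavutil offers av_image_copy_plane() for this job).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy height rows of row_bytes bytes each between two planes whose
 * linesizes may differ. Padding bytes in dst are left untouched.
 * Assumes positive linesizes. */
static void copy_plane(uint8_t *dst, int dst_linesize,
                       const uint8_t *src, int src_linesize,
                       int row_bytes, int height)
{
    assert(dst && src);
    assert(row_bytes >= 0 && row_bytes <= dst_linesize && row_bytes <= src_linesize);
    for (int y = 0; y < height; y++) {
        memcpy(dst, src, (size_t)row_bytes);
        dst += dst_linesize;   /* step by the stride, not by row_bytes */
        src += src_linesize;
    }
}
```

Copying two 3-byte rows from a plane with linesize 4 into a plane with linesize 8 writes bytes 0..2 and 8..10 of the destination and leaves everything else as it was.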
