10. AVFrame structure

typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
	uint8_t *data[AV_NUM_DATA_POINTERS];
	int linesize[AV_NUM_DATA_POINTERS];
	
	uint8_t **extended_data;
	
	/** width and height */
	int width, height;
	
	int nb_samples;
	int format;
	
	/** whether this is a key frame */
	int key_frame;
	
	/** frame type (I, B, P...) */
	enum AVPictureType pict_type;
	
	uint8_t *base[AV_NUM_DATA_POINTERS];
	AVRational sample_aspect_ratio;
	
	int64_t pts;
	int64_t pkt_pts;
	int64_t pkt_dts;
	
	int coded_picture_number;
	int display_picture_number;
	int quality;
	int reference;
	
	/** QP table */
	int8_t *qscale_table;
	
	int qstride;
	int qscale_type;
	
	/** skipped-macroblock table */
	uint8_t *mbskip_table;
	
	/** motion vector table */
	int16_t (*motion_val[2])[2];
	
	/** macroblock type table */
	uint32_t *mb_type;
	
	/** DCT coefficients */
	short *dct_coeff;
	
	/** reference frame list */
	int8_t *ref_index[2];
	
	void *opaque;
	uint64_t error[AV_NUM_DATA_POINTERS];
	
	int type;
	int repeat_pict;
	int interlaced_frame;
	int top_field_first;
	int palette_has_changed;
	int buffer_hints;
	
	AVPanScan *pan_scan;
	int64_t reordered_opaque;
	void *hwaccel_picture_private;
	struct AVCodecContext *owner;
	void *thread_opaque;
	
	/**
	 * log2 of the size of the block which a single vector in motion_val represents:
	 * (4->16x16, 3->8x8, 2->4x4, 1->2x2)
	 * - encoding: unused
	 * - decoding: Set by libavcodec.
	 */
	uint8_t motion_subsample_log2;
	
	/** (audio) sample rate */
	int sample_rate;
	
	uint64_t channel_layout;
	int64_t best_effort_timestamp;
	int64_t pkt_pos;
	int64_t pkt_duration;
	
	AVDictionary *metadata;
	
	int decode_error_flags;
	
	#define FF_DECODE_ERROR_INVALID_BITSTREAM 1
	#define FF_DECODE_ERROR_MISSING_REFERENCE 2
	
	int channels;
	
	...
	
} AVFrame;

1. The AVFrame structure generally stores raw data (i.e., uncompressed data: YUV or RGB for video, PCM for audio), along with related information.
For example, during decoding it also holds data such as the macroblock type table, QP table, and motion vector table, and related data is stored
during encoding as well. AVFrame is therefore a very important structure when using FFmpeg for stream analysis.

2. Let's look at the role of several main variables (considering the decoding case here):
uint8_t *data[AV_NUM_DATA_POINTERS]: The original data after decoding (YUV, RGB for video, PCM for audio)

int linesize[AV_NUM_DATA_POINTERS]: the size in bytes of one row of each plane in data (the stride); not necessarily equal to the picture width

int width, height: video frame width and height (1920x1080,1280x720...)

int nb_samples: the number of audio samples (per channel) contained in this frame; one audio AVFrame may hold many samples

int format: the original data type after decoding (YUV420, YUV422, RGB24...)

int key_frame: whether it is a key frame

enum AVPictureType pict_type: frame type (I, B, P...)

AVRational sample_aspect_ratio: sample (pixel) aspect ratio; together with width/height it determines the display aspect ratio (16:9, 4:3...)

int64_t pts: display time stamp

int coded_picture_number: coded frame number

int display_picture_number: display frame number

int8_t *qscale_table: QP table

uint8_t *mbskip_table: skipped-macroblock table

int16_t (*motion_val[2])[2]: Motion vector table

uint32_t *mb_type: Macro block type table

short *dct_coeff: DCT coefficients (these have not been extracted here)

int8_t *ref_index[2]: motion estimation reference frame list (relatively new standards such as H.264 support multiple reference frames)

int interlaced_frame: whether it is interlaced

uint8_t motion_subsample_log2: log2 of the size of the block that one motion vector in motion_val covers. Other variables will not be listed one by one; the source code contains detailed comments.

3. Here we focus on analyzing a few variables that need some additional explanation:
1) data[]
For packed formats (such as RGB24), all the data is stored in data[0].
For planar formats (such as YUV420P), the data is split across data[0], data[1], data[2]...
(for YUV420P, data[0] stores Y, data[1] stores U, and data[2] stores V).
For details, please refer to: FFMPEG realizes the conversion between YUV and RGB various image raw data (swscale)
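The planar layout above can be sketched in plain C. This is a minimal, self-contained illustration (the buffer and the `fill_yuv420p_planes` helper are hypothetical, not FFmpeg API); it mimics how an AVFrame's data[]/linesize[] point into a contiguous YUV420P picture:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: given a contiguous YUV420P buffer, fill
 * data[]/linesize[] the way an AVFrame lays out planar video.
 * For YUV420P the U and V planes have half the width and half the
 * height of the Y plane. */
static void fill_yuv420p_planes(uint8_t *buf, int width, int height,
                                uint8_t *data[3], int linesize[3])
{
    linesize[0] = width;       /* Y stride: one byte per pixel         */
    linesize[1] = width / 2;   /* U stride: half horizontal resolution */
    linesize[2] = width / 2;   /* V stride                             */

    data[0] = buf;                                  /* Y plane */
    data[1] = data[0] + linesize[0] * height;       /* U plane */
    data[2] = data[1] + linesize[1] * (height / 2); /* V plane */
}
```

With this layout, the luma sample at pixel (x, y) is data[0][y * linesize[0] + x], and the chroma samples for that pixel are data[1][(y/2) * linesize[1] + x/2] and the same index into data[2].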

2) pict_type
pict_type contains the following types:

enum AVPictureType {
	AV_PICTURE_TYPE_NONE = 0, ///< Undefined
	AV_PICTURE_TYPE_I,        ///< Intra
	AV_PICTURE_TYPE_P,        ///< Predicted
	AV_PICTURE_TYPE_B,        ///< Bi-dir predicted
	AV_PICTURE_TYPE_S,        ///< S(GMC)-VOP MPEG4
	AV_PICTURE_TYPE_SI,       ///< Switching Intra
	AV_PICTURE_TYPE_SP,       ///< Switching Predicted
	AV_PICTURE_TYPE_BI,       ///< BI type
};

3) sample_aspect_ratio
The aspect ratio is a fraction. In FFmpeg, AVRational is used to represent fractions:

/**
* rational number numerator/denominator
*/
typedef struct AVRational{
	int num; ///< numerator
	int den; ///< denominator
} AVRational;
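As a quick worked example (a sketch; `display_aspect` and the local `Rational` type are ours, not FFmpeg API), the display aspect ratio follows from the frame size and the sample aspect ratio as DAR = (width * sar.num) / (height * sar.den):

```c
#include <assert.h>

/* Local stand-in for AVRational: a fraction num/den. */
typedef struct Rational { int num, den; } Rational;

/* Display aspect ratio from frame size and sample (pixel) aspect ratio:
 * DAR = (width * sar.num) / (height * sar.den). */
static double display_aspect(int width, int height, Rational sar)
{
    return (double)(width * sar.num) / ((double)height * sar.den);
}
```

For example, a 720x576 PAL DV frame with a sample aspect ratio of 16:15 displays at 4:3, since 720*16 / (576*15) = 11520/8640 ≈ 1.333.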

4) qscale_table
The QP table points to a block of memory that stores the QP value of each macroblock.
Macroblocks are numbered from left to right, row by row, and each macroblock corresponds to one QP value.
qscale_table[0] is the QP of the macroblock in row 1, column 1;
qscale_table[1] is the QP of the macroblock in row 1, column 2;
qscale_table[2] is the QP of the macroblock in row 1, column 3;
and so on...

The number of macro blocks is calculated by the following formula:
Note: The size of the macro block is 16x16.

Number of macro blocks per line:
int mb_stride = pCodecCtx->width/16+1

Total number of macro blocks:
int mb_sum = ((pCodecCtx->height+15)>>4)*(pCodecCtx->width/16+1)
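The two formulas above can be checked with a small self-contained sketch (plain C, no FFmpeg headers required; the helper names are ours):

```c
#include <assert.h>

/* Macroblocks are 16x16. mb_stride is the number of entries per row in
 * tables such as qscale_table (note the one extra padding column). */
static int mb_stride(int width)
{
    return width / 16 + 1;
}

/* Total number of macroblock entries: the number of rows (height rounded
 * up to a multiple of 16) times the padded row width. */
static int mb_sum(int width, int height)
{
    return ((height + 15) >> 4) * (width / 16 + 1);
}
```

For 1920x1080: mb_stride(1920) = 121 and mb_sum(1920, 1080) = 68 * 121 = 8228. The QP of the macroblock at (row, col) is then qscale_table[row * mb_stride + col].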

5) motion_subsample_log2
The size of the block (width or height, in pixels) that a single motion vector represents. Note that log2 is taken.
The code comments give the following values (2^4, 2^3, 2^2, 2^1):
4->16x16, 3->8x8, 2->4x4, 1->2x2

That is, when one motion vector represents a 16x16 block, the value is 4;
when one motion vector represents an 8x8 block, the value is 3;
and so on...

6) motion_val
The motion vector table stores all motion vectors of one video frame. They are stored in a special way:

int16_t (*motion_val[2])[2];

A piece of code is given in the comments:

int mv_sample_log2= 4 - motion_subsample_log2;
int mb_width= (width+15)>>4;
int mv_stride= (mb_width << mv_sample_log2) + 1;
motion_val[direction][x + y*mv_stride][0->mv_x, 1->mv_y];

From this we can see the structure of the data:
1. It is first divided into two lists, L0 and L1.
2. Each list (L0 or L1) stores a series of MVs (each MV corresponds to one block, whose size is determined by motion_subsample_log2).
3. Each MV has a horizontal and a vertical component (x, y).
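The indexing rule from the comment quoted above can be packaged into a small sketch (the helper names are ours, not FFmpeg API):

```c
#include <assert.h>

/* Size in pixels of the block one motion vector covers:
 * 4 -> 16x16, 3 -> 8x8, 2 -> 4x4, 1 -> 2x2. */
static int mv_block_size(int motion_subsample_log2)
{
    return 1 << motion_subsample_log2;
}

/* Linear index of the MV for the block at (x, y), where x and y are
 * measured in block units, following the quoted comment. */
static int mv_index(int frame_width, int motion_subsample_log2, int x, int y)
{
    int mv_sample_log2 = 4 - motion_subsample_log2;
    int mb_width  = (frame_width + 15) >> 4;
    int mv_stride = (mb_width << mv_sample_log2) + 1;
    return x + y * mv_stride;
}
```

For a 1920-wide frame with motion_subsample_log2 = 3 (8x8 blocks): mv_sample_log2 = 1, mb_width = 120, mv_stride = 241, so the MV of the block at x = 2, y = 1 is motion_val[list][243][0] (x component) and motion_val[list][243][1] (y component).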

Note that the MVs stored in this structure are not tied to macroblocks: the first MV is the MV of the block in the top-left
corner of the picture (the block size depends on motion_subsample_log2), the second MV is that of the block in the first row,
second column, and so on. The motion vectors of a single macroblock (16x16) may therefore be laid out as shown in the figure
below (line denotes the number of motion vectors per row):
Figure 10.1. The relationship between motion vectors on an 8x8 grid and the macroblock
7) mb_type
The macroblock type table stores the types of all macroblocks in one video frame. It is stored the same way as the QP table,
except that each entry is 32 bits rather than 8 bits. Each macroblock corresponds to one type variable.

The macro block type is defined as follows:

//The following defines may change, don't expect compatibility if you use them.
#define MB_TYPE_INTRA4x4 0x0001
#define MB_TYPE_INTRA16x16 0x0002 //FIXME H.264-specific
#define MB_TYPE_INTRA_PCM 0x0004 //FIXME H.264-specific
#define MB_TYPE_16x16 0x0008
#define MB_TYPE_16x8 0x0010
#define MB_TYPE_8x16 0x0020
#define MB_TYPE_8x8 0x0040
#define MB_TYPE_INTERLACED 0x0080
#define MB_TYPE_DIRECT2 0x0100 //FIXME
#define MB_TYPE_ACPRED 0x0200
#define MB_TYPE_GMC 0x0400
#define MB_TYPE_SKIP 0x0800
#define MB_TYPE_P0L0 0x1000
#define MB_TYPE_P1L0 0x2000
#define MB_TYPE_P0L1 0x4000
#define MB_TYPE_P1L1 0x8000
#define MB_TYPE_L0 (MB_TYPE_P0L0 | MB_TYPE_P1L0)
#define MB_TYPE_L1 (MB_TYPE_P0L1 | MB_TYPE_P1L1)
#define MB_TYPE_L0L1 (MB_TYPE_L0 | MB_TYPE_L1)
#define MB_TYPE_QUANT 0x00010000
#define MB_TYPE_CBP 0x00020000
//Note bits 24-31 are reserved for codec specific use (h264 ref0, mpeg1 0mv, ...)

If a macroblock matches one or more of the above definitions, the corresponding bits of its type variable are set to 1.
Note: a macroblock can have several types at once, but some types are mutually exclusive; for example, a macroblock cannot be both 16x16 and 8x8.
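The flag test itself is a plain bitwise AND. A minimal sketch (the masks are copied from the list above so it runs stand-alone; the helper names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* A few of the MB_TYPE_* masks listed above, reproduced locally. */
#define MB_TYPE_INTRA16x16 0x0002
#define MB_TYPE_16x16      0x0008
#define MB_TYPE_SKIP       0x0800

/* A macroblock's type word may combine several flags;
 * test each one with a bitwise AND. */
static int mb_is_skip(uint32_t mb_type)
{
    return (mb_type & MB_TYPE_SKIP) != 0;
}

static int mb_is_intra16x16(uint32_t mb_type)
{
    return (mb_type & MB_TYPE_INTRA16x16) != 0;
}
```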

8) ref_index
The motion estimation reference frame list stores the reference frame indexes of all macroblocks in one video frame.
In earlier compression coding standards this list goes effectively unused;
only standards such as H.264 introduce the concept of multiple reference frames.
Each macroblock contains 4 such values, giving the index of its reference frame.

Figure 10.2. Video frame
Figure 10.3. Result of QP parameter extraction
Figure 10.4. Beautified (with color added)
Figure 10.5. Result of macroblock type parameter extraction
Figure 10.6. Beautified (with color added, clearer; S stands for skipped macroblock)
Figure 10.7. Result of motion vector parameter extraction (List0)
Figure 10.8. Result of motion estimation reference frame parameter extraction

Origin blog.csdn.net/yanghangwww/article/details/104529238