Important structures of FFmpeg (adapted from Lei Xiaohua's blog series)


This article consolidates a series of posts on the relationships between the most important structures in FFmpeg, to make later study and reference easier.

link between structures

[Figure: relationship between the FFmpeg structures (struct-relationship.jpeg); original image unavailable]

a) Protocol handling (http, rtsp, rtmp, mms)

AVIOContext, URLProtocol, and URLContext mainly store the type and state of the protocol used by the video/audio data. Each protocol corresponds to one URLProtocol structure. (Note: FFmpeg also treats a local file as a protocol, "file".)

b) Demuxing (flv, avi, rmvb, mp4)

AVFormatContext mainly stores the information contained in the container format of the video/audio; AVInputFormat stores the container format used by the input. Each container format corresponds to one AVInputFormat structure.

c) Decoding (h264, mpeg2, aac, mp3)

Each AVStream stores the data of one video or audio stream; each AVStream corresponds to one AVCodecContext, which stores data about the codec that stream uses; each AVCodecContext corresponds to one AVCodec, the decoder for that stream. Each decoder corresponds to one AVCodec structure.

d) Data storage

For video, each structure generally stores one frame; for audio, one structure may hold several frames.
Data before decoding: AVPacket
Data after decoding: AVFrame

AVFrame (libavutil/frame.h)

AVFrame is a structure containing many stream parameters. It is generally used to store raw data (that is, uncompressed data: YUV or RGB for video, PCM for audio), together with related information. For example, during decoding it also holds data such as the macroblock type table, the QP table, and the motion vector table; related data is stored during encoding as well. AVFrame is therefore a very important structure when using FFmpeg for stream analysis.

  • #define AV_NUM_DATA_POINTERS 8
  • uint8_t *data[AV_NUM_DATA_POINTERS] : decoded raw data (YUV, RGB for video; PCM for audio)
  • int linesize[AV_NUM_DATA_POINTERS] : size in bytes of one "line" of data. Note: not necessarily equal to the image width; it is generally larger because of padding.
  • int width, height : video frame width and height (1920x1080, 1280x720...)
  • int nb_samples : one audio AVFrame may span several audio frames; this field gives the number of samples (per channel) it holds
  • int format : raw data format after decoding (YUV420, YUV422, RGB24...); -1 if unknown or unset
  • int key_frame : whether this is a key frame
  • enum AVPictureType pict_type : frame type (I, B, P...)
  • AVRational sample_aspect_ratio : aspect ratio (16:9, 4:3...)
  • int64_t pts : presentation timestamp
  • int coded_picture_number : picture number in coded order
  • int display_picture_number : picture number in display order
  • int8_t *qscale_table : QP table
  • int channels : number of audio channels
  • int interlaced_frame : whether the frame is interlaced

Among them, the sample_aspect_ratio aspect ratio is a fraction, stored in the AVRational structure:

typedef struct AVRational{
    int num; ///< Numerator
    int den; ///< Denominator
} AVRational;

QP table qscale_table :
The QP table points to a block of memory storing the QP value of each macroblock. Macroblocks are numbered left to right, row by row, one QP per macroblock: qscale_table[0] is the macroblock in row 1, column 1; qscale_table[1] is row 1, column 2; qscale_table[2] is row 1, column 3; and so on.
The number of macroblocks is computed with the following formulas (note: the macroblock size is 16x16). Macroblocks per row:

int mb_stride = pCodecCtx->width / 16 + 1;

Total number of macroblocks:

int mb_sum = ((pCodecCtx->height + 15) >> 4) * (pCodecCtx->width / 16 + 1);

AVFormatContext (libavformat/avformat.h)

When developing with FFmpeg, AVFormatContext is a data structure that appears throughout: many functions take it as a parameter. It is the structure behind FFmpeg's demuxing of container formats (flv, mp4, rmvb, avi). (Only decoding is considered here.)

  • AVInputFormat *iformat : input container format
  • AVOutputFormat *oformat : output container format
  • AVIOContext *pb : I/O context (buffer for input data)
  • unsigned int nb_streams : number of video/audio streams
  • AVStream **streams : the video/audio streams
  • char filename[1024] : file name
  • char *url : input/output URL
  • int64_t duration : duration (in microseconds; divide by 1,000,000 for seconds)
  • int bit_rate : bit rate (in bps; divide by 1000 for kbps)
  • int packet_size : packet length
  • AVDictionary *metadata : metadata

Among them, the video's metadata is retrieved with the av_dict_get() function and is stored in AVDictionary and AVDictionaryEntry:

struct AVDictionary {
    int count;
    AVDictionaryEntry *elems;
};

typedef struct AVDictionaryEntry {
    char *key;
    char *value;
} AVDictionaryEntry;

AVStream (libavformat/avformat.h)

AVStream is a structure that stores information about each video/audio stream.

  • int index : index identifying this stream within the AVFormatContext
  • AVCodecContext *codec : points to the AVCodecContext of this stream (one-to-one correspondence)
  • AVRational time_base : time base. PTS and DTS are counted in units of time_base, so multiplying by it converts them to real time. Other FFmpeg structures also carry a time_base field, but in my experience only the one in AVStream is reliable. PTS * time_base = real time
  • int64_t duration : duration of the stream
  • int64_t nb_frames : number of frames in this stream, when known
  • AVDictionary *metadata : metadata
  • AVRational avg_frame_rate : average frame rate (note: important for video)
  • AVPacket attached_pic : attached picture, e.g. the album art embedded in some MP3 and AAC files

AVIOContext (libavformat/avio.h)

AVIOContext is the structure FFmpeg uses to manage buffered input/output data.

  • unsigned char *buffer : start of the buffer
  • int buffer_size : buffer size (default 32768)
  • unsigned char *buf_ptr : current read position
  • unsigned char *buf_end : end of the buffer
  • void *opaque : points to a URLContext structure

Among them, the URLContext pointed to by opaque :

typedef struct URLContext {
    const AVClass *av_class;    /**< information for av_log(). Set by url_open(). */
    const struct URLProtocol *prot;
    void *priv_data;
    char *filename;             /**< specified URL */
    int flags;
    int max_packet_size;        /**< if non zero, the stream is packetized with this max packet size */
    int is_streamed;            /**< true if streamed (no seek possible), default = false */
    int is_connected;
    AVIOInterruptCB interrupt_callback;
    int64_t rw_timeout;         /**< maximum time to wait for (network) read/write operation completion, in mcs */
    const char *protocol_whitelist;
    const char *protocol_blacklist;
    int min_packet_size;        /**< if non zero, the stream is packetized with this min packet size */
} URLContext;

URLContext in turn contains a URLProtocol structure. Note: each protocol (rtp, rtmp, file, etc.) corresponds to one URLProtocol. This structure is also not in FFmpeg's public headers; its definition comes from the FFmpeg source code:

typedef struct URLProtocol {
    const char *name;
    int     (*url_open)( URLContext *h, const char *url, int flags);
    int     (*url_open2)(URLContext *h, const char *url, int flags, AVDictionary **options);
    int     (*url_accept)(URLContext *s, URLContext **c);
    int     (*url_handshake)(URLContext *c);
    int     (*url_read)( URLContext *h, unsigned char *buf, int size);
    int     (*url_write)(URLContext *h, const unsigned char *buf, int size);
    int64_t (*url_seek)( URLContext *h, int64_t pos, int whence);
    int     (*url_close)(URLContext *h);
    int (*url_read_pause)(URLContext *h, int pause);
    int64_t (*url_read_seek)(URLContext *h, int stream_index,
                             int64_t timestamp, int flags);
    int (*url_get_file_handle)(URLContext *h);
    int (*url_get_multi_file_handle)(URLContext *h, int **handles,
                                     int *numhandles);
    int (*url_get_short_seek)(URLContext *h);
    int (*url_shutdown)(URLContext *h, int flags);
    int priv_data_size;
    const AVClass *priv_data_class;
    int flags;
    int (*url_check)(URLContext *h, int mask);
    int (*url_open_dir)(URLContext *h);
    int (*url_read_dir)(URLContext *h, AVIODirEntry **next);
    int (*url_close_dir)(URLContext *h);
    int (*url_delete)(URLContext *h);
    int (*url_move)(URLContext *h_src, URLContext *h_dst);
    const char *default_whitelist;
} URLProtocol;

AVCodecContext (libavcodec/avcodec.h)

AVCodecContext is a structure with a very large number of fields (probably the most of any FFmpeg structure). This section briefly covers the meaning and purpose of its main fields. (Only decoding is considered here.)

  • enum AVMediaType codec_type : type of codec (video, audio, subtitle...)
  • struct AVCodec *codec : the decoder in use (H.264, MPEG-2...)
  • int bit_rate : average bit rate
  • uint8_t *extradata; int extradata_size : extra codec-specific information (for the H.264 decoder, this holds the SPS, PPS, etc.)
  • AVRational time_base : with this parameter, PTS can be converted to actual time (in seconds)
  • int width, height : width and height (video only)
  • int refs : number of reference frames (H.264 may use several; MPEG-2 generally uses only one)
  • int sample_rate : sample rate (audio)
  • int channels : number of channels (audio)
  • enum AVSampleFormat sample_fmt : sample format (audio)
  • int frame_size : number of samples per channel in an audio frame
  • int profile : profile (defined for H.264; other coding standards have profiles too)
  • int level : level (similar in spirit to profile)

Among them, 1. Codec type: codec_type

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

2. Audio sampling format in FFMPEG: sample_fmt

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

3. Profiles in FFmpeg: profile


#define FF_PROFILE_UNKNOWN -99
#define FF_PROFILE_RESERVED -100
 
#define FF_PROFILE_AAC_MAIN 0
#define FF_PROFILE_AAC_LOW  1
#define FF_PROFILE_AAC_SSR  2
#define FF_PROFILE_AAC_LTP  3
#define FF_PROFILE_AAC_HE   4
#define FF_PROFILE_AAC_HE_V2 28
#define FF_PROFILE_AAC_LD   22
#define FF_PROFILE_AAC_ELD  38
 
#define FF_PROFILE_DTS         20
#define FF_PROFILE_DTS_ES      30
#define FF_PROFILE_DTS_96_24   40
#define FF_PROFILE_DTS_HD_HRA  50
#define FF_PROFILE_DTS_HD_MA   60
 
#define FF_PROFILE_MPEG2_422    0
#define FF_PROFILE_MPEG2_HIGH   1
#define FF_PROFILE_MPEG2_SS     2
#define FF_PROFILE_MPEG2_SNR_SCALABLE  3
#define FF_PROFILE_MPEG2_MAIN   4
#define FF_PROFILE_MPEG2_SIMPLE 5
 
#define FF_PROFILE_H264_CONSTRAINED  (1<<9)  // 8+1; constraint_set1_flag
#define FF_PROFILE_H264_INTRA        (1<<11) // 8+3; constraint_set3_flag
 
#define FF_PROFILE_H264_BASELINE             66
#define FF_PROFILE_H264_CONSTRAINED_BASELINE (66|FF_PROFILE_H264_CONSTRAINED)
#define FF_PROFILE_H264_MAIN                 77
#define FF_PROFILE_H264_EXTENDED             88
#define FF_PROFILE_H264_HIGH                 100
#define FF_PROFILE_H264_HIGH_10              110
#define FF_PROFILE_H264_HIGH_10_INTRA        (110|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_HIGH_422             122
#define FF_PROFILE_H264_HIGH_422_INTRA       (122|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_HIGH_444             144
#define FF_PROFILE_H264_HIGH_444_PREDICTIVE  244
#define FF_PROFILE_H264_HIGH_444_INTRA       (244|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_CAVLC_444            44
 
#define FF_PROFILE_VC1_SIMPLE   0
#define FF_PROFILE_VC1_MAIN     1
#define FF_PROFILE_VC1_COMPLEX  2
#define FF_PROFILE_VC1_ADVANCED 3
 
#define FF_PROFILE_MPEG4_SIMPLE                     0
#define FF_PROFILE_MPEG4_SIMPLE_SCALABLE            1
#define FF_PROFILE_MPEG4_CORE                       2
#define FF_PROFILE_MPEG4_MAIN                       3
#define FF_PROFILE_MPEG4_N_BIT                      4
#define FF_PROFILE_MPEG4_SCALABLE_TEXTURE           5
#define FF_PROFILE_MPEG4_SIMPLE_FACE_ANIMATION      6
#define FF_PROFILE_MPEG4_BASIC_ANIMATED_TEXTURE     7
#define FF_PROFILE_MPEG4_HYBRID                     8
#define FF_PROFILE_MPEG4_ADVANCED_REAL_TIME         9
#define FF_PROFILE_MPEG4_CORE_SCALABLE             10
#define FF_PROFILE_MPEG4_ADVANCED_CODING           11
#define FF_PROFILE_MPEG4_ADVANCED_CORE             12
#define FF_PROFILE_MPEG4_ADVANCED_SCALABLE_TEXTURE 13
#define FF_PROFILE_MPEG4_SIMPLE_STUDIO             14
#define FF_PROFILE_MPEG4_ADVANCED_SIMPLE           15

AVCodec (libavcodec/avcodec.h)

AVCodec is a structure that stores codec information.

  • const char *name : short name of the codec
  • const char *long_name : full, longer name of the codec
  • enum AVMediaType type : media type (video, audio, or subtitle)
  • enum AVCodecID id : unique codec ID
  • const AVRational *supported_framerates : supported frame rates (video only)
  • const enum AVPixelFormat *pix_fmts : supported pixel formats (video only)
  • const int *supported_samplerates : supported sample rates (audio only)
  • const enum AVSampleFormat *sample_fmts : supported sample formats (audio only)
  • const uint64_t *channel_layouts : supported channel layouts (audio only)
  • int priv_data_size : size of the private data

Among them, the AVMediaType enum is the same one listed above under AVCodecContext.

The AVCodecID enum:

enum AVCodecID {
    AV_CODEC_ID_NONE,
    /* video codecs */
    AV_CODEC_ID_MPEG1VIDEO,
    AV_CODEC_ID_MPEG2VIDEO, ///< preferred ID for MPEG-1/2 video decoding
    AV_CODEC_ID_H261,
    AV_CODEC_ID_H263,
    //... (truncated, the list is very long)
};

The AVPixelFormat enum:

enum AVPixelFormat {
    AV_PIX_FMT_NONE = -1,
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...
    //... (truncated, the list is very long)
};

AVPacket (libavcodec/avcodec.h)

AVPacket is a structure that stores compressed (encoded) data. Taking H.264 as an example: the data of one AVPacket usually corresponds to one NAL unit. Note that this is only a correspondence, not an exact equality; there are small differences between the two. Even so, this correspondence makes it possible to use the FFmpeg libraries to separate out, say, the H.264 stream from a multimedia file: when processing video/audio with FFmpeg, the data field of each AVPacket can often be written directly to a file to obtain the raw video/audio bitstream.

  • int64_t pts : presentation timestamp
  • int64_t dts : decoding timestamp
  • uint8_t *data : the compressed data
  • int size : size of data
  • int stream_index : index of the video/audio stream this AVPacket belongs to


Origin: blog.csdn.net/github_38117599/article/details/112748155